Repost: a tech expert uses benfort's law to analyze whether some counties show signs of fraud

yanhren
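OP (北美华人网)
I don't understand the technical side. There are plenty of experts online, so could someone assess whether the author's argument holds up? https://github.com/cjph8914/2020_benfords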
handsugar
Don't forget to post Philadelphia 😂
ACoolDude
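People are making a case out of this? Can it be used as evidence in court? If it can't, there's no point talking about it.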
ecko
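It's Benford's law. Could you fix the title? :)
Haha, science in pursuit of truth. It certainly can't be used as legal evidence, but the more analysis there is, the better everyone can judge for themselves.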
woshihai
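An election system is a system of trust. As soon as someone acts dishonestly, the system is junk; China's way, with no dissenting votes at all, would at least be more straightforward.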
yanhren
Reply to ecko's post (#4)
This is the first time I've heard of this law, sorry for the typo! Let me see where I can edit it.
yanhren
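Can anyone tell me where to edit the title? I don't understand the technical details and am completely non-technical; it just looked like it made some sense, so I brought it here to ask everyone.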
张国荣
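Hasn't this law been used for a while now to claim that Taobao's or JD's Singles' Day (Double 11) numbers were faked? How did that investigation turn out? Teacher Li Yongle has covered this law; it's quite interesting.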
yanhren
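This is presumably a law-of-large-numbers kind of thing; unless the fraud is large-scale, it would probably be hard to detect any problem.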
handsugar
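Hasn't this law been used for a while now to claim that Taobao's or JD's Singles' Day (Double 11) numbers were faked? How did that investigation turn out? Teacher Li Yongle has covered this law; it's quite interesting.
Posted by 张国荣 on 2020-11-06 14:55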

Drop a link so the rest of us can go learn about it too.
揽月听风
There are quite a few papers on Benford's law.
chmod999
Going by the charts, so Biden's votes in Milwaukee, Chicago, and Allegheny are the ones that stand out?
yanhren
I don't understand the details; my impression is that it's saying Biden's vote counts don't conform to this law.
古天乐.
Look at both sides of the argument.
Unconvincing (to me) Use of Benford’s Law to Demonstrate Election Fraud in Iran
https://fivethirtyeight.com/features/unconvincing-to-me-use-of-benfords-law/
liheng
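When something gets caught, it's just a "typo"... and what about what wasn't caught? I trust the whole world is watching the "pig party"; with technology this advanced, it's impossible to do that many bad things without leaving a trace.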
ecko
Look at both sides of the argument.
Unconvincing (to me) Use of Benford’s Law to Demonstrate Election Fraud in Iran
https://fivethirtyeight.com/features/unconvincing-to-me-use-of-benfords-law/

Posted by 古天乐. on 2020-11-06 15:17

quote from the article you pasted:
Maybe there’s something I’m missing here, but that’s my quick take. This is not to say that I think the election was fair, or rigged, or whatever–I have absolutely zero knowledge on that matter–just that I don’t find this analysis convincing of anything. I will say, though, that Roukema deserves credit for presenting the analysis clearly. P.S. In response to comments: let me emphasize that I’m not saying that I think nothing funny was going on in the election. As I wrote, I’m commenting on the statistics, I don’t know the facts on the ground. To move my comments in a more constructive direction (I hope), let me pull out this useful comment from Roukema’s article: “One possible method to test whether this is just an odd fluke would be to check the validity of the vote counts for candidate K in the voting areas where the official number of votes for K starts with the digit 7.” Further investigation could be a good thing here.

Also, what he is rebutting is not the visualized data distribution that the OP posted. Of course, same here: This is not to say that I think the election was fair, or rigged, or whatever–I have absolutely zero knowledge on that matter
张国荣
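When something gets caught, it's just a "typo"... and what about what wasn't caught? I trust the whole world is watching the "pig party"; with technology this advanced, it's impossible to do that many bad things without leaving a trace.
Posted by liheng on 2020-11-06 15:19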

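So annoying. It's Trump fans like you who have wrecked Trump's reputation. Plenty of people ended up deciding not to vote for Trump because of the low-quality Trump fans around them. It's laughable: you rant away on WeChat Moments and think you're winning votes for Trump. Every person who didn't give you a like is a person who is fed up with you.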
urthur
If I'm understanding correctly, Biden's charts all deviate somewhat from this pattern? I hadn't heard of this law before either.
张国荣
quote from the article you pasted:
Maybe there’s something I’m missing here, but that’s my quick take. This is not to say that I think the election was fair, or rigged, or whatever–I have absolutely zero knowledge on that matter–just that I don’t find this analysis convincing of anything. I will say, though, that Roukema deserves credit for presenting the analysis clearly. P.S. In response to comments: let me emphasize that I’m not saying that I think nothing funny was going on in the election. As I wrote, I’m commenting on the statistics, I don’t know the facts on the ground. To move my comments in a more constructive direction (I hope), let me pull out this useful comment from Roukema’s article: “One possible method to test whether this is just an odd fluke would be to check the validity of the vote counts for candidate K in the voting areas where the official number of votes for K starts with the digit 7.” Further investigation could be a good thing here.

Also, what he is rebutting is not the visualized data distribution that the OP posted. Of course, same here: This is not to say that I think the election was fair, or rigged, or whatever–I have absolutely zero knowledge on that matter
Posted by ecko on 2020-11-06 15:23

That article is from 2009; it questions whether this law applies to election data at all. That kind of skepticism is nothing new.
soup0708
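Has this expert never studied probability and statistics? Each batch of reported numbers comes from a different county, and there's also the difference between mail-in and in-person votes. I was watching the live coverage in the middle of that night: at 4am Milwaukee reported all of its mail-in ballots at once, 140,000 for Biden and 30,000 for Trump, so of course the pattern is hugely different from other, smaller counties.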
computer101
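Sigh, sometimes it really is infuriating to see perfectly good statistics twisted and misused.
Take the Milwaukee data as an example. The overall Biden:Trump split in Milwaukee is 69%:29%; to simplify, call it 70%:30%. Let's simply assume that all 478 wards in the city roughly follow this ratio (the differences are small; anyone interested can quickly scan https://county.milwaukee.gov/EN/County-Clerk/Off-Nav/Election-Results/Election-Results-Fall-2020).
However, note that when a city draws its wards, they are sized roughly by population rather than completely at random! For Milwaukee's 478 wards specifically, 382 of them (80%) have between 300 and 1400 voters. With simple arithmetic you can check that 70% of any number between 300 and 1400 lies between 210 and 980, so it never starts with a 1. That means Biden's vote count is unlikely to start with a 1 in these 80% of wards, and the expected share of counts starting with 1 in this sample should be about 20% at most, which matches the chart the original author posted.
Conversely, for Trump with 30% of the vote, getting a count that starts with 1 requires the ward's voter count to fall in the 333~666 range. In the ward distribution, a total of 160 wards (33%) fall in the 300~700 range, which lands right around the roughly 30% that Benford's law predicts for a leading 1!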
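The arithmetic above is easy to check. Here is a minimal sketch that assumes, as this post does, that every ward splits exactly 70%:30%. The helper names `leading_digit`, `benford_probability`, and `leading_digits_over_range` are hypothetical and made up for this illustration; they are not taken from the GitHub repo.

```python
import math


def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def benford_probability(d: int) -> float:
    """Benford's law: P(first digit = d) = log10(1 + 1/d), for d in 1..9."""
    return math.log10(1 + 1 / d)


def leading_digits_over_range(share_pct: int, lo: int, hi: int) -> set[int]:
    """First digits of a candidate's count for every ward size in [lo, hi].

    Assumption (a simplification from the post): the candidate gets exactly
    share_pct percent of the voters in every ward, rounded down.
    """
    return {leading_digit(share_pct * voters // 100) for voters in range(lo, hi + 1)}


# Biden at ~70% in wards of 300..1400 voters: counts run from 210 to 980.
biden_digits = leading_digits_over_range(70, 300, 1400)

# Trump at ~30%: counts of 100..199 need 334..666 voters
# (the post rounds this to 333~666).
trump_digits = leading_digits_over_range(30, 334, 666)

print("Biden first digits:", sorted(biden_digits))
print("Trump first digits:", sorted(trump_digits))
print("Benford P(1):", round(benford_probability(1), 3))
print("Share of wards with 300..1400 voters:", f"{382 / 478:.1%}")
print("Share of wards with 300..700 voters:", f"{160 / 478:.1%}")
```

A uniform 70/30 split is of course a simplification, and real wards vary. The sketch only shows why ward sizing alone can push first digits away from Benford's curve for one candidate while leaving the other close to it.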
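The author of this GitHub repo carefully cherry-picked 5 areas to selectively prove his point, which really is a stain on the Statistician profession.
Below is a histogram of the number of voters per ward in Milwaukee for your reference: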

upendown
The current process feels a lot like boiling a frog in warm water. Is it cooked yet...