https://x.com/chamath/status/1883579259769462819 “With R1, DeepSeek essentially cracked one of the holy grails of AI: getting models to reason step-by-step without relying on massive supervised datasets. Their DeepSeek-R1-Zero experiment showed something remarkable: using pure reinforcement learning with carefully crafted reward functions, they managed to get models to develop sophisticated reasoning capabilities completely autonomously. This wasn't just about solving problems— the model organically learned to generate long chains of thought, self-verify its work, and allocate more computation time to harder problems. The technical breakthrough here was their novel approach to reward modeling. Rather than using complex neural reward models that can lead to "reward hacking" (where the model finds bogus ways to boost their rewards that don't actually lead to better real-world model performance), they developed a clever rule-based system that combines accuracy rewards (verifying final answers) with format rewards (encouraging structured thinking). This simpler approach turned out to be more robust and scalable than the process-based reward models that others have tried.” This is already a genuine glimmer of AGI: the AI discovering, adjusting, and working out how to solve problems on its own.
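The rule-based reward described in the quote (accuracy reward on the final answer plus a format reward for structured thinking) can be sketched in a few lines. This is a minimal illustration, not DeepSeek's actual implementation: the `<think>`/`<answer>` tags, the `\boxed{}` answer convention, and the equal weighting of the two rewards are assumptions made for the example.

```python
import re

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """Rule-based accuracy reward: 1.0 if the final boxed answer
    matches the ground truth exactly, else 0.0."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match and match.group(1).strip() == ground_truth.strip():
        return 1.0
    return 0.0

def format_reward(completion: str) -> float:
    """Rule-based format reward: 1.0 if the completion wraps its
    reasoning in <think>...</think> followed by <answer>...</answer>."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, completion, re.DOTALL) else 0.0

def total_reward(completion: str, ground_truth: str) -> float:
    """Combined training signal: no learned reward model to hack,
    just two cheap, verifiable rules."""
    return accuracy_reward(completion, ground_truth) + format_reward(completion)
```

The point the quote makes is that both checks are deterministic string rules, so there is no neural reward model for the policy to exploit.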
Yet it can't even do something any middle-school student can: looking up and citing references.
Believe it or not, go try it yourselves and see whether I'm talking nonsense.
There's another skill that everyone has but AI doesn't: trading stocks. Even an uncle from the countryside can do it, but AI can't.
Can you explain in detail how these AIs work, step by step?
Also, how would one go about copying what others have built?
First split the problem across tens of billions of boxes?
Then each box corresponds to a template, and answering a question just means following the template?
Impressive.
It's not that ChatGPT isn't good at this; it's gotten into the habit of generative nonsense, and the so-called references are fabricated. The first time I asked it a professional question, its answer was muddled, so I told it to give me a few recent references. It listed a whole pile, with authors, titles, journal names, and page numbers, and not a damn one of them could be found. Going to the journal websites to track them down turned up nothing either.
Could the experts weigh in: what's actually hard about building software that handles citations properly?
I'm not a professional, just guessing, but I think this is inherent to the nature of generative AI. Someone should build a tool combining generation with smart search: first decide whether generation is appropriate; for things that shouldn't be generated, like news or literature references, use smart search instead, then narrow the scope based on the search results and summarize them.
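The "decide first, then generate or search" idea in the post above can be sketched as a small router. Everything here is a hypothetical illustration: the keyword heuristic and the stub backend signatures are assumptions, not any real product's design.

```python
# Queries about facts with external sources (references, news) go to
# retrieval; everything else goes to free generation. A production
# router would use a classifier, not keywords.
RETRIEVAL_KEYWORDS = ("reference", "citation", "news", "paper", "source", "doi")

def needs_retrieval(query: str) -> bool:
    """Crude heuristic: route to search if the query mentions
    anything that should be grounded in real documents."""
    q = query.lower()
    return any(keyword in q for keyword in RETRIEVAL_KEYWORDS)

def answer(query: str, search_backend, generator) -> str:
    """Dispatch: retrieve-then-summarize for grounded queries,
    plain generation otherwise."""
    if needs_retrieval(query):
        hits = search_backend(query)           # real documents with URLs
        return generator(query, context=hits)  # summarize grounded results
    return generator(query, context=None)      # pure generation
```

This is essentially the retrieval-augmented generation pattern: the model only summarizes documents that were actually found, so citations point at things that exist.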
No real difficulty there. pp is exactly that: search combined with generation. It's all the same kind of thing.
With deepseek, if you enable the deepthink and search options, it can provide links to the sources of the data and materials it references.