DeepSeek ... implemented cross-GPU communications ... using PTX. They did not use CUDA ... THAT is crazy significant.
H100s were prohibited by the chip ban, but not H800s. Everyone assumed that training leading edge models required more interchip memory bandwidth, but that is exactly what DeepSeek optimized both their model structure and infrastructure around.
Again, just to emphasize this point, all of the decisions DeepSeek made in the design of this model only make sense if you are constrained to the H800; if DeepSeek had access to H100s, they probably would have used a larger training cluster with much fewer optimizations specifically focused on overcoming the lack of bandwidth.
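For anyone unsure what "using PTX instead of CUDA" actually means: PTX is Nvidia's low-level virtual instruction set that CUDA C++ compiles down to, and CUDA code can drop into it through inline asm statements. The toy kernel below is a minimal sketch of that mechanism only, not DeepSeek's code and nothing like their actual cross-GPU communication work; it just replaces one C++ addition with the equivalent PTX instruction to show what programming at that level looks like (file name and build line are illustrative).

// Toy CUDA example: one addition written as inline PTX instead of C++.
// Compile with: nvcc ptx_demo.cu -o ptx_demo   (hypothetical file name)
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int v = data[i];
        int out;
        // Inline PTX: out = v + 1, the same thing "out = v + 1;" would do in C++.
        asm("add.s32 %0, %1, 1;" : "=r"(out) : "r"(v));
        data[i] = out;
    }
}

int main() {
    const int n = 8;
    int host[n] = {0, 1, 2, 3, 4, 5, 6, 7};
    int *dev;
    cudaMalloc(&dev, n * sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);
    add_one<<<1, n>>>(dev, n);
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; i++) printf("%d ", host[i]);  // expect: 1 2 3 4 5 6 7 8
    printf("\n");
    cudaFree(dev);
    return 0;
}

Note that PTX is still Nvidia-specific and tied to particular GPU generations, which is why most teams stay at the CUDA C++ level; the portability worry raised later in this thread is a real trade-off of going this low.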
Large AI models can be trained without CUDA.
Before, AI couldn't move forward without Nvidia's chips.
Now other chips can be used to develop AI.
Reportedly DS now runs entirely on Huawei's Ascend chips.
The next large model training will supposedly use Huawei Ascend chips exclusively.
Even a 100% chip blockade against other countries can no longer stop the pace of everyone else's AI development.
Nvidia's stock price drop is entirely understandable.
YMYD.
In theory any chip will do; the only difference is efficiency.
I almost followed the crowd into Nvidia and VGT.
That part of the backing has all been pulled; hard to go it alone.
DS comes from distillation; it's compression, not originality. Wisdom comes from originality, efficiency comes from compression. It's like adapting a long novel into a short story: it reads faster.
That's crazy, using assembly to squeeze out efficiency; won't it be even harder to port later? Then again, labor is cheap, haha.
Reading the books yourself takes years; he only needs a few months.
https://stratechery.com/2025/deepseek-faq/
If you don't understand the technology, don't spread rumors. Those of us working in this field read the DeepSeek papers every day.
Their most recent DS R1 paper is at https://arxiv.org/pdf/2501.12948
which DeepSeek has also demonstrated — is great for Big Tech
DeepSeek brings healthy competition to the US AI industry, which will lead to lower costs and better AI tools for everyone.