DeepSeek is releasing one open-source project a day this week

平明寻白羽
Original poster (北美华人网)
The first one was just released.
It optimizes efficiency on NVIDIA H800 GPUs.
🚀 Day 1 of #OpenSourceWeek: FlashMLA

Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.

✅ BF16 support
✅ Paged KV cache (block size 64)
⚡ 3000 GB/s memory-bound & 580 TFLOPS… — DeepSeek (@deepseek_ai) February 24, 2025

https://x.com/deepseek_ai/status/1893836827574030466
Riverss
Damn my NVDA is cooked