🚀 Day 1 of #OpenSourceWeek: FlashMLA

Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.

✅ BF16 support
✅ Paged KV cache (block size 64)
⚡ 3000 GB/s memory-bound & 580 TFLOPS…

— DeepSeek (@deepseek_ai) February 24, 2025
It optimizes efficiency on NVIDIA H800 GPUs.
System notice: if the video does not play, please click the link below.
https://x.com/deepseek_ai/status/1893836827574030466
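
To make the "Paged KV cache (block size 64)" item in the announcement more concrete, here is a minimal sketch of how a paged cache can map token positions to fixed-size blocks when sequences have different lengths. Only the block size of 64 comes from the announcement; the function names, the block-table layout, and the simple sequential allocator are hypothetical illustrations and are not FlashMLA's actual API or implementation.

```python
BLOCK_SIZE = 64  # tokens per KV cache block, as stated in the announcement


def blocks_needed(seq_len: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of fixed-size blocks required to hold seq_len tokens (ceiling division)."""
    return (seq_len + block_size - 1) // block_size


def allocate(seq_lens: list[int], block_size: int = BLOCK_SIZE) -> list[list[int]]:
    """Hypothetical allocator: hand out physical block ids from one shared pool.

    Returns one block table per sequence, listing its physical block ids
    in logical order.
    """
    tables: list[list[int]] = []
    next_free = 0
    for n in seq_lens:
        k = blocks_needed(n, block_size)
        tables.append(list(range(next_free, next_free + k)))
        next_free += k
    return tables


def locate_token(block_table: list[int], token_idx: int,
                 block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Map a logical token position to (physical block id, offset inside that block)."""
    logical_block, offset = divmod(token_idx, block_size)
    return block_table[logical_block], offset


if __name__ == "__main__":
    # Three sequences of different lengths share one pool of 64-token blocks.
    tables = allocate([130, 64, 1])
    print(tables)
    print(locate_token(tables[0], 129))
```

Under this scheme a sequence wastes at most one partially filled block, which is one reason paging suits variable-length batches: memory is reserved per block rather than per maximum sequence length.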