DeepSeek's own recommended configuration and budget for running a 671B local node

dukan
OP (北美华人网)
It says about $100K-$300K will do, and it even recommends AMD as an option... NVDA must be spitting blood.
Running a model like **DeepSeek** locally with **671 billion parameters** is an extremely resource-intensive task. Models of this scale are typically designed for **data center-grade hardware** and are not feasible to run on consumer-grade systems. However, I can provide an overview of the **recommended hardware** and considerations for running such a massive model:
---
### 1. **GPU Requirements**
- **Type**: High-end GPUs like **NVIDIA A100**, **H100**, or **AMD MI250X** are recommended.
- **VRAM**: Each GPU should have **80 GB of VRAM** or more. For a 671B parameter model, you'll likely need **multiple GPUs** (e.g., 8-16 GPUs) to fit the model and its activations in memory (see the sizing sketch after this list).
- **Interconnect**: Use **NVLink** or **InfiniBand** for high-speed communication between GPUs to handle the massive data transfer requirements.
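A quick back-of-envelope check on that GPU count (my own sketch, not part of the quoted answer; the precision and overhead factors are assumptions):

```python
# Estimate how many 80 GB GPUs it takes just to hold 671B parameters.
# Assumptions: FP8 weights (1 byte/param) and ~20% headroom for the
# KV cache and activations; FP16/BF16 doubles the bytes per parameter.
import math

PARAMS = 671e9
BYTES_PER_PARAM = 1.0    # FP8; use 2.0 for FP16/BF16
OVERHEAD = 1.2           # KV cache + activation headroom
GPU_VRAM_GB = 80         # A100/H100 80 GB class

total_gb = PARAMS * BYTES_PER_PARAM * OVERHEAD / 1e9
print(f"~{total_gb:.0f} GB -> {math.ceil(total_gb / GPU_VRAM_GB)} x 80 GB GPUs")
# FP8: ~805 GB -> 11 GPUs, inside the quoted 8-16 range;
# FP16 roughly doubles it (~1610 GB -> 21 GPUs).
```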
---
### 2. **CPU Requirements**
- **Cores**: A high-core-count CPU (e.g., **AMD EPYC** or **Intel Xeon**) with at least **64 cores** is recommended.
- **Clock Speed**: Higher clock speeds will help with preprocessing and data pipeline tasks.
- **RAM**: At least **1-2 TB of system RAM** is necessary to handle the model's memory requirements and data loading.
---
### 3. **Storage**
- **Type**: Use **NVMe SSDs** for fast data loading and checkpoint storage.
- **Capacity**: At least **10-20 TB of storage** is recommended to store the model weights, datasets, and intermediate checkpoints.
---
### 4. **Networking**
- **Bandwidth**: If running in a distributed setup, ensure high-speed networking (e.g., **100 Gbps or higher**) to handle communication between nodes.
- **Latency**: Low-latency networking is critical for distributed training or inference.
---
### 5. **Software and Frameworks**
- **Deep Learning Frameworks**: Use frameworks like **PyTorch**, **TensorFlow**, or **JAX** with optimized libraries (e.g., **DeepSpeed**, **Megatron-LM**) for distributed training and inference.
- **Mixed Precision**: Enable **mixed precision** (FP16 or BF16) to reduce memory usage and improve performance.
- **Model Parallelism**: Use **tensor parallelism** and **pipeline parallelism** to split the model across multiple GPUs or nodes (a minimal loading sketch follows this list).
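To make the software stack concrete, here is a minimal multi-GPU loading sketch using Hugging Face `transformers` (my addition, not part of the quoted answer; the repo name is an assumption, and `device_map="auto"` gives layer-wise sharding across GPUs rather than true tensor parallelism):

```python
# Minimal multi-GPU loading sketch with Hugging Face transformers.
# Assumes `accelerate` is installed and the weights fit across the
# locally visible GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"   # assumed repo name; may differ
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # mixed precision halves memory vs FP32
    device_map="auto",            # shard layers across all visible GPUs
    trust_remote_code=True,
)

inputs = tok("Explain NVLink in one sentence.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```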
---
### 6. **Power and Cooling**
- **Power Supply**: Ensure a sufficient power supply (e.g., **10-20 kW** per node) to handle the high energy consumption of GPUs and CPUs (see the quick estimate below).
- **Cooling**: Use advanced cooling solutions (e.g., liquid cooling) to manage the heat generated by the hardware.
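Where the 10-20 kW figure plausibly comes from (my own estimate; the per-GPU draw and overhead are assumptions):

```python
# Rough node power draw: 16 H100-class GPUs at ~700 W each, plus an
# assumed ~1.5 kW for CPUs, fans, drives, and PSU losses.
gpus = 16
print(f"~{(gpus * 700 + 1500) / 1000:.1f} kW")   # ~12.7 kW, inside 10-20 kW
```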
---
### 7. **Cost Estimate**
- **GPUs**: A single NVIDIA A100 GPU costs around **$10,000-$15,000**. For 8-16 GPUs, this alone could cost **$80,000-$240,000**.
- **CPU and RAM**: A high-end server with 64 cores and 1 TB of RAM could cost **$10,000-$20,000**.
- **Storage and Networking**: Additional costs for NVMe SSDs and networking equipment could add **$10,000-$30,000**.
- **Total**: Expect to spend **$100,000-$300,000+** for a single-node setup, and significantly more for a multi-node cluster (the totals are checked in the snippet after this list).
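The component ranges do add up to the quoted total (a quick arithmetic check, using only the numbers above):

```python
# Sum the low and high ends of the component ranges quoted above.
low  = 8 * 10_000 + 10_000 + 10_000    # 8 GPUs + server + storage/networking
high = 16 * 15_000 + 20_000 + 30_000   # 16 GPUs + server + storage/networking
print(f"${low:,} - ${high:,}")         # $100,000 - $290,000, i.e. ~$100K-$300K
```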
---
### 8. **Cloud Alternatives**
- If running locally is not feasible, consider using **cloud platforms** like **AWS**, **Google Cloud**, or **Azure** with access to high-end GPUs and distributed computing resources.
- Cloud providers offer pre-configured instances for large-scale AI workloads, which can be more cost-effective for occasional use.
---
### Summary
Running a **671B parameter model** like DeepSeek locally requires **data center-grade hardware** with multiple high-end GPUs, a high-core-count CPU, massive RAM, and fast storage. The cost and complexity make it impractical for most users, so cloud-based solutions are often a better option. Let me know if you'd like more details on specific setups or cloud alternatives!
Silverwing
dukan posted on 2025-01-31 11:26 (quoting the full hardware rundown from the first post)

I spun up the 7B beggar edition (the stripped-down distill) and my GPU is howling in pain. This is still hard for an individual to play with.
That said, it should breathe a little life back into the personal PC market, if only a bit.
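If anyone wants to poke at the same setup, here is a minimal sketch of querying a locally served 7B distill through Ollama's HTTP API (my addition; it assumes `ollama pull deepseek-r1:7b` has already been run and the daemon is listening on its default port 11434):

```python
# Query a locally served DeepSeek-R1 7B distill via Ollama's HTTP API.
import json
import urllib.request

payload = {
    "model": "deepseek-r1:7b",
    "prompt": "Why is my 1080 Ti struggling with you?",
    "stream": False,           # return one JSON object instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```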
shanggj
Silverwing posted on 2025-01-31 11:29:
I spun up the 7B beggar edition and my GPU is howling in pain. This is still hard for an individual to play with.
That said, it should breathe a little life back into the personal PC market, if only a bit.

What tier of GPU does the 7B beggar edition need before it stops howling in pain?
OLOAHA
Will DeepSeek going open source actually boost NVIDIA card sales? Now anyone can download and deploy it; the software side is complete and only the hardware is missing. It might drive some companies to buy GPUs and run the model themselves.
dukan
Replying to post #4 by OLOAHA
There will be some lift, but NVDA's competition just got much stiffer, since no one has to beg NVDA for cards anymore. I bought a little INTC today. This might just be a turning point for Intel.
OLOAHA
Replying to post #4 by OLOAHA
There will be some lift, but NVDA's competition just got much stiffer, since no one has to beg NVDA for cards anymore. I bought a little INTC today. This might just be a turning point for Intel.
dukan posted on 2025-01-31 11:54

The recommended GPU configs only list NVIDIA and AMD, no Intel. Shouldn't that make AMD the better buy?
Harenough
Within five years, could we see a $3,000 home-edition AI plus a $30K robot?
One that handles security, vacuuming, laundry, hanging clothes to dry, homework tutoring, waiting outside for the kids' school bus, picking the kids up from school, watering the lawn and vegetables, feeding the chickens, and so on.
dukan
Replying to post #6 by OLOAHA
Either works. AMD is at a $190B market cap versus $86B for INTC, so INTC feels cheaper. Though buying both also feels like a fine move.
dukan
Replying to post #8 by dukan
I don't think it's far off. Since the model can already do step-by-step logical analysis, couldn't you just map each step to a sequence of mechanical commands and get it to do all kinds of housework?
Silverwing
shanggj posted on 2025-01-31 11:49:
What tier of GPU does the 7B beggar edition need before it stops howling in pain?

My machine is getting old, a 1080 Ti, and like an old ox pulling a cart it really strains. I figure a 4080-series card would be much better and should handle a 14B DeepSeek without much trouble.
But then again, if I bought a 4080 I'd definitely go have fun gaming first and forget all about DeepSeek.
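For a rough sense of which cards fit which distills (my own estimates, assuming 4-bit quantization at ~0.5 bytes per parameter plus a fixed overhead):

```python
# Rough VRAM needs for 4-bit quantized DeepSeek-R1 distills (estimates).
def q4_vram_gb(params_b: float, overhead_gb: float = 1.5) -> float:
    """~0.5 bytes/param at 4-bit, plus assumed KV-cache/runtime overhead."""
    return params_b * 0.5 + overhead_gb

for size, card in [(7, "1080 Ti, 11 GB"), (14, "RTX 4080, 16 GB"), (32, "RTX A4000, 16 GB")]:
    print(f"{size}B Q4 ~ {q4_vram_gb(size):.1f} GB  (card: {card})")
# 7B ~ 5.0 GB fits an 11 GB card; 14B ~ 8.5 GB fits 16 GB;
# 32B ~ 17.5 GB would spill into system RAM on a 16 GB card.
```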
mariaapple
Replying to post #8 by dukan
Intel's decline is genuinely heartbreaking. I'll keep supporting them out of loyalty.
OLOAHA
Silverwing posted on 2025-01-31 12:12:
My machine is getting old, a 1080 Ti, and like an old ox pulling a cart it really strains. I figure a 4080-series card would be much better and should handle a 14B DeepSeek without much trouble.
But then again, if I bought a 4080 I'd definitely go have fun gaming first and forget all about DeepSeek.

I bought a 3090 in the name of AI and ended up spending most of my time gaming. Shame on me.
layjohns
Replying to post #10 by Silverwing
How much RAM? Looks like my 3070 Ti + 64 GB RAM can only manage the 7B beggar edition too.
shanggj
layjohns posted on 2025-01-31 12:24:
Replying to post #10 by Silverwing
How much RAM? Looks like my 3070 Ti + 64 GB RAM can only manage the 7B beggar edition too.

What ?B version could an RTX A4000 + 512 GB RAM run?
平明寻白羽
In the future, I'd guess every household will have a server running an AI that controls all the home robots.
You could connect to it remotely while out.
This generation won't have to worry about old age: no nursing home needed, you can stay at home.
The AI is an expert in everything; a pile of sensors monitors your health and diagnoses in real time. With emergency equipment at home, the robots can keep you alive while waiting for the ambulance.
Silverwing
In the future, I'd guess every household will have a server running an AI that controls all the home robots.
You could connect to it remotely while out.
This generation won't have to worry about old age: no nursing home needed, you can stay at home.
The AI is an expert in everything; a pile of sensors monitors your health and diagnoses in real time. With emergency equipment at home, the robots can keep you alive while waiting for the ambulance.
平明寻白羽 posted on 2025-01-31 12:46

That line of thinking works. If it's only controlling household chores, things can actually be simplified a lot.
After deploying it locally, the first thing I did was unplug the network cable and give it a closed-book exam... It could still answer questions.
So in theory a home-server AI can run and manage itself with no network at all.
This directly overturns the hive-queen image of AI that old movies and games left us with, where killing the mothership crashes everything the AI controls. The reality may be that every "worker bee" with its own GPU and CPU keeps on living independently.