DeepSeek's own recommended configuration and budget for running a 671B local node

dukan
OP (北美华人网)
It says about $100K-$300K will do, and it even recommends AMD as an option... NVDA must be spitting blood.
Running a model like **DeepSeek** locally with **671 billion parameters** is an extremely resource-intensive task. Models of this scale are typically designed for **data center-grade hardware** and are not feasible to run on consumer-grade systems. However, I can provide an overview of the **recommended hardware** and considerations for running such a massive model:
---
### 1. **GPU Requirements**
- **Type**: High-end GPUs like **NVIDIA A100**, **H100**, or **AMD MI250X** are recommended.
- **VRAM**: Each GPU should have **80 GB of VRAM** or more. For a 671B parameter model, you'll likely need **multiple GPUs** (e.g., 8-16 GPUs) to fit the model and its activations in memory (see the sizing sketch after this list).
- **Interconnect**: Use **NVLink** or **InfiniBand** for high-speed communication between GPUs to handle the massive data transfer requirements.
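A quick back-of-envelope check on that GPU count (my own sketch, not part of the quoted answer; the precision and overhead factors are assumptions):

```python
# Estimate how many 80 GB GPUs it takes just to hold 671B parameters.
# Assumptions: FP8 weights (1 byte/param) and ~20% headroom for the
# KV cache and activations; FP16/BF16 doubles the bytes per parameter.
import math

PARAMS = 671e9
BYTES_PER_PARAM = 1.0    # FP8; use 2.0 for FP16/BF16
OVERHEAD = 1.2           # KV cache + activation headroom
GPU_VRAM_GB = 80         # A100/H100 80 GB class

total_gb = PARAMS * BYTES_PER_PARAM * OVERHEAD / 1e9
print(f"~{total_gb:.0f} GB -> {math.ceil(total_gb / GPU_VRAM_GB)} x 80 GB GPUs")
# FP8: ~805 GB -> 11 GPUs, inside the quoted 8-16 range;
# FP16 roughly doubles it (~1610 GB -> 21 GPUs).
```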
---
### 2. **CPU Requirements**
- **Cores**: A high-core-count CPU (e.g., **AMD EPYC** or **Intel Xeon**) with at least **64 cores** is recommended.
- **Clock Speed**: Higher clock speeds will help with preprocessing and data pipeline tasks.
- **RAM**: At least **1-2 TB of system RAM** is necessary to handle the model's memory requirements and data loading.
---
### 3. **Storage**
- **Type**: Use **NVMe SSDs** for fast data loading and checkpoint storage.
- **Capacity**: At least **10-20 TB of storage** is recommended to store the model weights, datasets, and intermediate checkpoints.
---
### 4. **Networking**
- **Bandwidth**: If running in a distributed setup, ensure high-speed networking (e.g., **100 Gbps or higher**) to handle communication between nodes.
- **Latency**: Low-latency networking is critical for distributed training or inference.
---
### 5. **Software and Frameworks**
- **Deep Learning Frameworks**: Use frameworks like **PyTorch**, **TensorFlow**, or **JAX** with optimized libraries (e.g., **DeepSpeed**, **Megatron-LM**) for distributed training and inference.
- **Mixed Precision**: Enable **mixed precision** (FP16 or BF16) to reduce memory usage and improve performance.
- **Model Parallelism**: Use **tensor parallelism** and **pipeline parallelism** to split the model across multiple GPUs or nodes (a minimal loading sketch follows this list).
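To make the software stack concrete, here is a minimal multi-GPU loading sketch using Hugging Face `transformers` (my addition, not part of the quoted answer; the repo name is an assumption, and `device_map="auto"` gives layer-wise sharding across GPUs rather than true tensor parallelism):

```python
# Minimal multi-GPU loading sketch with Hugging Face transformers.
# Assumes `accelerate` is installed and the weights fit across the
# locally visible GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"   # assumed repo name; may differ
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # mixed precision halves memory vs FP32
    device_map="auto",            # shard layers across all visible GPUs
    trust_remote_code=True,
)

inputs = tok("Explain NVLink in one sentence.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```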
---
### 6. **Power and Cooling**
- **Power Supply**: Ensure a sufficient power supply (e.g., **10-20 kW** per node) to handle the high energy consumption of GPUs and CPUs (see the quick estimate below).
- **Cooling**: Use advanced cooling solutions (e.g., liquid cooling) to manage the heat generated by the hardware.
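Where the 10-20 kW figure plausibly comes from (my own estimate; the per-GPU draw and overhead are assumptions):

```python
# Rough node power draw: 16 H100-class GPUs at ~700 W each, plus an
# assumed ~1.5 kW for CPUs, fans, drives, and PSU losses.
gpus = 16
print(f"~{(gpus * 700 + 1500) / 1000:.1f} kW")   # ~12.7 kW, inside 10-20 kW
```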
---
### 7. **Cost Estimate**
- **GPUs**: A single NVIDIA A100 GPU costs around **$10,000-$15,000**. For 8-16 GPUs, this alone could cost **$80,000-$240,000**.
- **CPU and RAM**: A high-end server with 64 cores and 1 TB of RAM could cost **$10,000-$20,000**.
- **Storage and Networking**: Additional costs for NVMe SSDs and networking equipment could add **$10,000-$30,000**.
- **Total**: Expect to spend **$100,000-$300,000+** for a single-node setup, and significantly more for a multi-node cluster (the totals are checked in the snippet after this list).
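The component ranges do add up to the quoted total (a quick arithmetic check, using only the numbers above):

```python
# Sum the low and high ends of the component ranges quoted above.
low  = 8 * 10_000 + 10_000 + 10_000    # 8 GPUs + server + storage/networking
high = 16 * 15_000 + 20_000 + 30_000   # 16 GPUs + server + storage/networking
print(f"${low:,} - ${high:,}")         # $100,000 - $290,000, i.e. ~$100K-$300K
```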
---
### 8. **Cloud Alternatives**
- If running locally is not feasible, consider using **cloud platforms** like **AWS**, **Google Cloud**, or **Azure** with access to high-end GPUs and distributed computing resources.
- Cloud providers offer pre-configured instances for large-scale AI workloads, which can be more cost-effective for occasional use.
---
### Summary
Running a **671B parameter model** like DeepSeek locally requires **data center-grade hardware** with multiple high-end GPUs, a high-core-count CPU, massive RAM, and fast storage. The cost and complexity make it impractical for most users, so cloud-based solutions are often a better option. Let me know if you'd like more details on specific setups or cloud alternatives!
Silverwing
dukan posted on 2025-01-31 11:26 (quoting the full hardware rundown from the first post)

I spun up the 7B beggar edition (the stripped-down distill) and my GPU is howling in pain. This is still hard for an individual to play with.
That said, it should breathe a little life back into the personal PC market, if only a bit.
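If anyone wants to poke at the same setup, here is a minimal sketch of querying a locally served 7B distill through Ollama's HTTP API (my addition; it assumes `ollama pull deepseek-r1:7b` has already been run and the daemon is listening on its default port 11434):

```python
# Query a locally served DeepSeek-R1 7B distill via Ollama's HTTP API.
import json
import urllib.request

payload = {
    "model": "deepseek-r1:7b",
    "prompt": "Why is my 1080 Ti struggling with you?",
    "stream": False,           # return one JSON object instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```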
shanggj
Silverwing posted on 2025-01-31 11:29:
I spun up the 7B beggar edition and my GPU is howling in pain. This is still hard for an individual to play with.
That said, it should breathe a little life back into the personal PC market, if only a bit.

What tier of GPU does the 7B beggar edition need before it stops howling in pain?
OLOAHA
Will DeepSeek going open source actually boost NVIDIA card sales? Now anyone can download and deploy it; the software side is complete and only the hardware is missing. It might drive some companies to buy GPUs and run the model themselves.
dukan
Replying to post #4 by OLOAHA
There will be some lift, but NVDA's competition just got much stiffer, since no one has to beg NVDA for cards anymore. I bought a little INTC today. This might just be a turning point for Intel.
OLOAHA
Replying to post #4 by OLOAHA
There will be some lift, but NVDA's competition just got much stiffer, since no one has to beg NVDA for cards anymore. I bought a little INTC today. This might just be a turning point for Intel.
dukan posted on 2025-01-31 11:54

The recommended GPU configs only list NVIDIA and AMD, no Intel. Shouldn't that make AMD the better buy?
Harenough
Within five years, could we see a $3,000 home-edition AI plus a $30K robot?
One that handles security, vacuuming, laundry, hanging clothes to dry, homework tutoring, waiting outside for the kids' school bus, picking the kids up from school, watering the lawn and vegetables, feeding the chickens, and so on.
dukan
Replying to post #6 by OLOAHA
Either works. AMD is at a $190B market cap versus $86B for INTC, so INTC feels cheaper. Though buying both also feels like a fine move.
dukan
Replying to post #8 by dukan
I don't think it's far off. Since the model can already do step-by-step logical analysis, couldn't you just map each step to a sequence of mechanical commands and get it to do all kinds of housework?
Silverwing
shanggj posted on 2025-01-31 11:49:
What tier of GPU does the 7B beggar edition need before it stops howling in pain?

My machine is getting old, a 1080 Ti, and like an old ox pulling a cart it really strains. I figure a 4080-series card would be much better and should handle a 14B DeepSeek without much trouble.
But then again, if I bought a 4080 I'd definitely go have fun gaming first and forget all about DeepSeek.
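For a rough sense of which cards fit which distills (my own estimates, assuming 4-bit quantization at ~0.5 bytes per parameter plus a fixed overhead):

```python
# Rough VRAM needs for 4-bit quantized DeepSeek-R1 distills (estimates).
def q4_vram_gb(params_b: float, overhead_gb: float = 1.5) -> float:
    """~0.5 bytes/param at 4-bit, plus assumed KV-cache/runtime overhead."""
    return params_b * 0.5 + overhead_gb

for size, card in [(7, "1080 Ti, 11 GB"), (14, "RTX 4080, 16 GB"), (32, "RTX A4000, 16 GB")]:
    print(f"{size}B Q4 ~ {q4_vram_gb(size):.1f} GB  (card: {card})")
# 7B ~ 5.0 GB fits an 11 GB card; 14B ~ 8.5 GB fits 16 GB;
# 32B ~ 17.5 GB would spill into system RAM on a 16 GB card.
```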
mariaapple
Replying to post #8 by dukan
Intel's decline is genuinely heartbreaking. I'll keep supporting them out of loyalty.
OLOAHA
Silverwing posted on 2025-01-31 12:12:
My machine is getting old, a 1080 Ti, and like an old ox pulling a cart it really strains. I figure a 4080-series card would be much better and should handle a 14B DeepSeek without much trouble.
But then again, if I bought a 4080 I'd definitely go have fun gaming first and forget all about DeepSeek.

I bought a 3090 in the name of AI and ended up spending most of my time gaming. Shame on me.
layjohns
Replying to post #10 by Silverwing
How much RAM? Looks like my 3070 Ti + 64 GB RAM can only manage the 7B beggar edition too.
shanggj
layjohns posted on 2025-01-31 12:24:
Replying to post #10 by Silverwing
How much RAM? Looks like my 3070 Ti + 64 GB RAM can only manage the 7B beggar edition too.

What ?B version could an RTX A4000 + 512 GB RAM run?
平明寻白羽
In the future, I'd guess every household will have a server running an AI that controls all the home robots.
You could connect to it remotely while out.
This generation won't have to worry about old age: no nursing home needed, you can stay at home.
The AI is an expert in everything; a pile of sensors monitors your health and diagnoses in real time. With emergency equipment at home, the robots can keep you alive while waiting for the ambulance.
Silverwing
In the future, I'd guess every household will have a server running an AI that controls all the home robots.
You could connect to it remotely while out.
This generation won't have to worry about old age: no nursing home needed, you can stay at home.
The AI is an expert in everything; a pile of sensors monitors your health and diagnoses in real time. With emergency equipment at home, the robots can keep you alive while waiting for the ambulance.
平明寻白羽 posted on 2025-01-31 12:46

That line of thinking works. If it's only controlling household chores, things can actually be simplified a lot.
After deploying it locally, the first thing I did was unplug the network cable and give it a closed-book exam... It could still answer questions.
So in theory a home-server AI can run and manage itself with no network at all.
This directly overturns the hive-queen image of AI that old movies and games left us with, where killing the mothership crashes everything the AI controls. The reality may be that every "worker bee" with its own GPU and CPU keeps on living independently.