QQ: Who are you?
ChatGPT: I'm ChatGPT, an AI designed to assist with answering questions, brainstorming ideas, and having conversations. What's on your mind?
QQ: How do you think about DeepSeek? Does it copy you in any sense?
ChatGPT: DeepSeek is another AI model developed independently, and while it may share similarities with me in terms of being a large language model, it isn't a direct copy. Many modern AI models are built using similar underlying techniques, such as transformer architectures and large-scale training on diverse datasets, but the implementation details, training data, and fine-tuning approaches can vary significantly between models.
ChatGPT: Do you have any specific concerns or comparisons in mind?
QQ: Want to know some technical details comparing DeepSeek and you.
ChatGPT: Certainly! Let's delve into a technical comparison between DeepSeek and ChatGPT, focusing on their architectures, training methodologies, and performance characteristics.
Architecture and Model Design
- ChatGPT: Developed by OpenAI, ChatGPT is based on the GPT (Generative Pre-trained Transformer) architecture. It uses a dense transformer model in which all parameters are active during inference, which consumes substantial computational resources.
- DeepSeek: DeepSeek employs a Mixture-of-Experts (MoE) architecture that activates only a subset of its parameters during inference. This design improves computational efficiency by engaging only the "experts" needed for a given input, reducing resource usage and improving inference speed (see the routing sketch below).
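To make the dense-vs-MoE distinction concrete, here is a minimal, illustrative top-k routing layer in PyTorch. All the names and sizes here (Expert, MoELayer, 8 experts, top_k=2) are toy assumptions, not DeepSeek's actual implementation, which is far larger and adds refinements such as shared experts and load balancing that are omitted here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """One feed-forward block. A dense model runs the equivalent of every
    expert for every token; an MoE layer runs only a few."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                                 nn.Linear(d_hidden, d_model))

    def forward(self, x):
        return self.net(x)

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([Expert(d_model, d_hidden)
                                      for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)  # router: scores each expert per token
        self.top_k = top_k

    def forward(self, x):                          # x: (n_tokens, d_model)
        scores = self.gate(x)                      # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for k in range(self.top_k):                # only top_k of n_experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e              # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer()
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

The key point is in the forward pass: every token touches the small gate, but only 2 of the 8 expert networks, so most of the layer's parameters sit idle on any given token.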
Training Methodology
- ChatGPT: Training involves large-scale pretraining on internet text, followed by supervised fine-tuning and reinforcement learning from human feedback (RLHF) to align the model's outputs with human preferences.
- DeepSeek: DeepSeek pretrains on a diverse multilingual corpus with a heavier emphasis on mathematical and programming data. The model then undergoes supervised fine-tuning on both human-generated and model-generated data, particularly in reasoning domains, and is further refined with reinforcement learning (a sketch of the core objectives follows below).
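Both alignment ideas can be summarized in a few lines. The sketch below shows the standard pairwise (Bradley-Terry) reward-model loss used in RLHF, plus a group-relative advantage in the spirit of DeepSeek's GRPO; both functions are illustrative toys, not either lab's production code:

```python
import torch
import torch.nn.functional as F

def preference_loss(r_chosen, r_rejected):
    """Pairwise (Bradley-Terry) reward-model loss used in RLHF: minimized when
    the human-preferred response scores higher than the rejected one."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def group_relative_advantages(rewards):
    """Advantages normalized within a group of sampled answers to the same
    prompt, in the spirit of DeepSeek's GRPO; scoring each sample against its
    own group's mean removes the need for a separate value (critic) network."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Toy usage with made-up scalar rewards:
print(preference_loss(torch.tensor([1.2, 0.8]), torch.tensor([0.3, 0.9])))
print(group_relative_advantages(torch.tensor([0.0, 1.0, 1.0, 0.0])))
```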
Performance and Efficiency
- ChatGPT: While it delivers high-quality responses, ChatGPT's dense architecture incurs higher computational costs at inference, which affects scalability and energy efficiency.
- DeepSeek: DeepSeek's MoE architecture yields significant computational savings, since only the relevant experts are activated per token, lowering energy consumption and operational costs. Notably, DeepSeek has achieved performance on par with leading models like OpenAI's GPT series at a fraction of the training and inference cost; the back-of-envelope comparison below shows where the savings come from.
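The dense parameter count below is a hypothetical stand-in (GPT-3 scale, since OpenAI has not disclosed ChatGPT's current size); the 671B-total/37B-active figures are the ones DeepSeek published for DeepSeek-V3:

```python
def flops_per_token(params):
    """~2 FLOPs per active parameter per token for a forward pass."""
    return 2 * params

dense_params = 175e9                 # hypothetical dense model (GPT-3 scale)
moe_total, moe_active = 671e9, 37e9  # DeepSeek-V3's published total vs. activated

print(f"dense: {flops_per_token(dense_params):.2e} FLOPs/token")
print(f"MoE:   {flops_per_token(moe_active):.2e} FLOPs/token "
      f"({moe_active / moe_total:.1%} of parameters active)")
```

Even though the MoE model stores far more parameters in total, only about 5.5% of them do work on any given token.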
Open-Source Availability
- ChatGPT: OpenAI's models, including ChatGPT, are proprietary; access is provided through APIs and specific platforms.
- DeepSeek: In contrast, DeepSeek has embraced an open-source approach, releasing models like DeepSeek-R1 to the public. This openness fosters collaborative innovation and allows for broader adoption and adaptation within the AI community (see the loading sketch below).
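Because the weights are public, anyone can pull a checkpoint and run it locally. Here is a minimal sketch using Hugging Face transformers; the distilled model id below is an assumption for illustration (substitute whichever released checkpoint you want; the full DeepSeek-R1 is far too large for a single consumer GPU):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face model id; swap in the checkpoint you actually intend to run.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

inputs = tokenizer("Explain Mixture-of-Experts in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```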
Conclusion
While both DeepSeek and ChatGPT are advanced language models that excel at natural language understanding and generation, they differ in architectural design and training methodology.
DeepSeek's MoE architecture offers a more resource-efficient alternative to the dense transformer model used by ChatGPT, and its open-source release could help democratize access to advanced AI capabilities.
DeepSeek's Impact on the AI Industry
wired.com: How Chinese AI Startup DeepSeek Made a Model that Rivals OpenAI (Jan 25, 2025)
theverge.com: Why everyone is freaking out about DeepSeek (Jan 28, 2025)
theguardian.com: Tuesday briefing: How an unknown Chinese startup wiped $593bn from the value of an AI giant (Jan 27, 2025)
azure.microsoft.com: https://azure.microsoft.com/en-us/blog/deepseek-r1-is-now-available-on-azure-ai-foundry-and-github/
QQ: I heard it was targeted by malicious cyberattacks.
BTW, interesting findings! Thank you for sharing :-)