QQ: Who are you?
ChatGPT: I'm ChatGPT, an AI designed to assist with answering questions, brainstorming ideas, and having conversations. What's on your mind?
QQ: How do you think about DeepSeek? Does it copy you in any sense?
ChatGPT: DeepSeek is another AI model developed independently, and while it may share similarities with me in terms of being a large language model, it isn't a direct copy. Many modern AI models are built using similar underlying techniques, such as transformer architectures and large-scale training on diverse datasets, but the implementation details, training data, and fine-tuning approaches can vary significantly between models.
ChatGPT: Do you have any specific concerns or comparisons in mind?
QQ: Want to know some technical details comparing DeepSeek and you.
ChatGPT: Certainly! Let's delve into a technical comparison between DeepSeek and ChatGPT, focusing on their architectures, training methodologies, and performance characteristics.
Architecture and Model Design
- ChatGPT: Developed by OpenAI, ChatGPT is based on the GPT (Generative Pre-trained Transformer) architecture. It uses a dense transformer model in which all parameters are active during inference, which consumes substantial computational resources.
- DeepSeek: DeepSeek employs a Mixture-of-Experts (MoE) architecture that activates only a subset of its parameters during inference. This design improves computational efficiency by engaging only the "experts" needed for a given input, reducing resource usage and improving inference speed (see the routing sketch below).
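To make the dense-vs-MoE distinction concrete, here is a minimal, illustrative top-k routing layer in PyTorch. All the names and sizes here (Expert, MoELayer, 8 experts, top_k=2) are toy assumptions, not DeepSeek's actual implementation, which is far larger and adds refinements such as shared experts and load balancing that are omitted here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """One feed-forward block. A dense model runs the equivalent of every
    expert for every token; an MoE layer runs only a few."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                                 nn.Linear(d_hidden, d_model))

    def forward(self, x):
        return self.net(x)

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([Expert(d_model, d_hidden)
                                      for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)  # router: scores each expert per token
        self.top_k = top_k

    def forward(self, x):                          # x: (n_tokens, d_model)
        scores = self.gate(x)                      # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for k in range(self.top_k):                # only top_k of n_experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e              # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer()
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

The key point is in the forward pass: every token touches the small gate, but only 2 of the 8 expert networks, so most of the layer's parameters sit idle on any given token.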
Training Methodology
- ChatGPT: Training involves large-scale pretraining on internet text, followed by supervised fine-tuning and reinforcement learning from human feedback (RLHF) to align the model's outputs with human preferences.
- DeepSeek: DeepSeek pretrains on a diverse multilingual corpus with a heavier emphasis on mathematical and programming data. The model then undergoes supervised fine-tuning on both human-generated and model-generated data, particularly in reasoning domains, and is further refined with reinforcement learning (a sketch of the core objectives follows below).
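Both alignment ideas can be summarized in a few lines. The sketch below shows the standard pairwise (Bradley-Terry) reward-model loss used in RLHF, plus a group-relative advantage in the spirit of DeepSeek's GRPO; both functions are illustrative toys, not either lab's production code:

```python
import torch
import torch.nn.functional as F

def preference_loss(r_chosen, r_rejected):
    """Pairwise (Bradley-Terry) reward-model loss used in RLHF: minimized when
    the human-preferred response scores higher than the rejected one."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def group_relative_advantages(rewards):
    """Advantages normalized within a group of sampled answers to the same
    prompt, in the spirit of DeepSeek's GRPO; scoring each sample against its
    own group's mean removes the need for a separate value (critic) network."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Toy usage with made-up scalar rewards:
print(preference_loss(torch.tensor([1.2, 0.8]), torch.tensor([0.3, 0.9])))
print(group_relative_advantages(torch.tensor([0.0, 1.0, 1.0, 0.0])))
```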
Performance and Efficiency
- ChatGPT: While it delivers high-quality responses, ChatGPT's dense architecture incurs higher computational costs at inference, which affects scalability and energy efficiency.
- DeepSeek: DeepSeek's MoE architecture yields significant computational savings, since only the relevant experts are activated per token, lowering energy consumption and operational costs. Notably, DeepSeek has achieved performance on par with leading models like OpenAI's GPT series at a fraction of the training and inference cost; the back-of-envelope comparison below shows where the savings come from.
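The dense parameter count below is a hypothetical stand-in (GPT-3 scale, since OpenAI has not disclosed ChatGPT's current size); the 671B-total/37B-active figures are the ones DeepSeek published for DeepSeek-V3:

```python
def flops_per_token(params):
    """~2 FLOPs per active parameter per token for a forward pass."""
    return 2 * params

dense_params = 175e9                 # hypothetical dense model (GPT-3 scale)
moe_total, moe_active = 671e9, 37e9  # DeepSeek-V3's published total vs. activated

print(f"dense: {flops_per_token(dense_params):.2e} FLOPs/token")
print(f"MoE:   {flops_per_token(moe_active):.2e} FLOPs/token "
      f"({moe_active / moe_total:.1%} of parameters active)")
```

Even though the MoE model stores far more parameters in total, only about 5.5% of them do work on any given token.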
Open-Source Availability
- ChatGPT: OpenAI's models, including ChatGPT, are proprietary; access is provided through APIs and specific platforms.
- DeepSeek: In contrast, DeepSeek has embraced an open-source approach, releasing models like DeepSeek-R1 to the public. This openness fosters collaborative innovation and allows for broader adoption and adaptation within the AI community (see the loading sketch below).
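Because the weights are public, anyone can pull a checkpoint and run it locally. Here is a minimal sketch using Hugging Face transformers; the distilled model id below is an assumption for illustration (substitute whichever released checkpoint you want; the full DeepSeek-R1 is far too large for a single consumer GPU):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face model id; swap in the checkpoint you actually intend to run.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

inputs = tokenizer("Explain Mixture-of-Experts in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```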
Conclusion
While both DeepSeek and ChatGPT are advanced language models that excel at natural language understanding and generation, they differ in architectural design and training methodology.
DeepSeek's MoE architecture offers a more resource-efficient alternative to the dense transformer model used by ChatGPT, and its open-source release could help democratize access to advanced AI capabilities.
DeepSeek's Impact on the AI Industry
wired.com: How Chinese AI Startup DeepSeek Made a Model that Rivals OpenAI (Jan 25, 2025)
theverge.com: Why everyone is freaking out about DeepSeek (Jan 28, 2025)
theguardian.com: Tuesday briefing: How an unknown Chinese startup wiped $593bn from the value of an AI giant (Jan 27, 2025)
azure.microsoft.com: https://azure.microsoft.com/en-us/blog/deepseek-r1-is-now-available-on-azure-ai-foundry-and-github/
QQ: I heard it was targeted by malicious cyberattacks.
BTW, interesting findings! Thank you for sharing :-)