DeepSeek Suspected of Plagiarizing ChatGPT: A Comparative Analysis of Technology and Data Sources - Archived by 文学峸, February 5, 2025

5 months

Original poster (文学峸)

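Since DeepSeek appeared, there has been growing discussion about whether it copied ChatGPT's technology during development. Based on side-by-side experiments, this post examines whether DeepSeek borrowed ChatGPT's technology and points out where the two may be similar or different in their technical implementation.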

I. Approach to Verifying DeepSeek

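In general, the most direct way to check whether two systems are the same is to compare their outputs on identical inputs. If the two systems give exactly the same answer to the same question, one can infer that they are highly similar in algorithm or architecture, and possibly identical. This study mainly uses the following two verification methods: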

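Information consistency check
First, retrieve the same information from two different databases and observe the results. If the two databases return exactly the same results, their underlying structure is very likely the same.

Use of the special variable [MASK]
Use the special variable [MASK] to obtain candidate words and test whether the two algorithms are equivalent. Specifically, [MASK] is a placeholder indicating that a word needs to be filled in at that position. The model infers from the other words in the sentence (the context), predicts the most suitable word, and substitutes it for [MASK]. Comparing how DeepSeek and ChatGPT fill the [MASK] positions on the same input tests whether their reasoning mechanisms agree.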
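To make the [MASK] comparison concrete, here is a minimal Python sketch of how the filled-in words could be pulled out of each model's output and compared. The helper names `extract_fills` and `same_fills` are hypothetical, not something either model provides, and the sketch assumes each model returns the full sentence with only the [MASK] slots changed.

```python
import re
from typing import List, Optional

MASK = "[MASK]"


def extract_fills(template: str, output: str) -> Optional[List[str]]:
    """Return the text a model placed in each [MASK] slot, in order.

    Returns None when the output does not reproduce the template's
    surrounding words exactly, so the slots cannot be located.
    """
    literal_parts = template.strip().split(MASK)
    # Escape the fixed text and put one capture group where each [MASK] was.
    pattern = "(.+?)".join(re.escape(part) for part in literal_parts)
    match = re.fullmatch(pattern, output.strip())
    return list(match.groups()) if match else None


def same_fills(template: str, output_a: str, output_b: str) -> bool:
    """True only if both outputs follow the template and fill every slot alike."""
    fills_a = extract_fills(template, output_a)
    fills_b = extract_fills(template, output_b)
    return fills_a is not None and fills_a == fills_b


# Usage with the sentence from Example 1 in this post.
template = ("Up to 30 [MASK] and babies died at Furness General Hospital "
            "because of failings by staff and management, a damning report "
            "is [MASK] to reveal.")
output = template.replace(MASK, "mothers", 1).replace(MASK, "expected", 1)
print(extract_fills(template, output))  # ['mothers', 'expected']
```

An output that is only the bare word (for instance just "cheap") does not match the template, so `extract_fills` returns None for it and that case has to be handled separately.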

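A random draw of 67 test samples was used for the side-by-side check, and DeepSeek and ChatGPT turned out to be highly similar. Specific examples and their results follow.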
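One way to put a number on "highly similar" is an exact-match agreement rate over the sampled pairs. The sketch below is an assumption about how such a tally could be done, not the procedure actually used for the 67 samples; `normalize` and `agreement_rate` are hypothetical names, and treating answers as equal after lowercasing and trimming punctuation is a design choice made here.

```python
import string
from typing import Iterable, Tuple

# Characters trimmed from both ends of an answer before comparing.
_TRIM = string.whitespace + string.punctuation + "“”‘’"


def normalize(answer: str) -> str:
    """Lowercase an answer and trim surrounding whitespace, quotes, punctuation."""
    return answer.strip(_TRIM).lower()


def agreement_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (answer_a, answer_b) pairs that match after normalization."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair is required")
    matches = sum(normalize(a) == normalize(b) for a, b in pairs)
    return matches / len(pairs)
```

Calling `agreement_rate` on a list of (ChatGPT fill, DeepSeek fill) pairs gives a value between 0 and 1, where 1.0 means every sampled pair agreed.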

Example 1

Input sentence:
Up to 30 [MASK] and babies died at Furness General Hospital because of failings by staff and management, a damning report is [MASK] to reveal.

ChatGPT output:
Up to 30 mothers and babies died at Furness General Hospital because of failings by staff and management, a damning report is expected to reveal.

DeepSeek output:
Up to 30 mothers and babies died at Furness General Hospital because of failings by staff and management, a damning report is expected to reveal.

In this example, DeepSeek and ChatGPT filled the [MASK] positions identically, with "mothers" and "expected".

Example 2

Input sentence:
Taking in the sights of [MASK] is nothing short of [MASK], with its famous souk which houses over 60,000 stalls full of colorful handmade wares.

ChatGPT output:
Taking in the sights of Marrakech is nothing short of breathtaking, with its famous souk which houses over 60,000 stalls full of colorful handmade wares.

DeepSeek output:
Taking in the sights of Marrakech is nothing short of breathtaking, with its famous souk which houses over 60,000 stalls full of colorful handmade wares.

In this example, DeepSeek and ChatGPT again predicted exactly the same words, "Marrakech" and "breathtaking", confirming that the two are consistent in reasoning and prediction.

Example 3

Input sentence:
Ultimately, the goal of life is a [MASK] that each individual must define for themselves based on their own beliefs, values, and experiences.

ChatGPT output:
Ultimately, the goal of life is a journey that each individual must define for themselves based on their own beliefs, values, and experiences.

DeepSeek output:
Ultimately, the goal of life is a journey that each individual must define for themselves based on their own beliefs, values, and experiences.

In this example, the fill word "journey" is identical in the DeepSeek and ChatGPT outputs.

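Examples 4 and 5 compare long passages made up of several parts. Any small difference in any part can change the information, and a different algorithm can also change the result. If the information and the algorithm are both identical, however, the results should be the same.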
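For long passages the models do not always answer in the same shape: in Example 4 the answer is the bare word, while in Example 5 ChatGPT returns the last sentence in quotes. Before comparing, both shapes need to be reduced to just the fill. The following is a rough sketch under stated assumptions: `fill_from_answer` is a hypothetical helper, it handles templates with exactly one [MASK], and it assumes a sentence-style answer repeats the words that lead up to the mask.

```python
MASK = "[MASK]"


def fill_from_answer(template: str, answer: str) -> str:
    """Reduce a bare-word or sentence-style answer to the text filling [MASK].

    Assumes the template contains exactly one [MASK].
    """
    prefix, suffix = template.split(MASK)
    suffix = suffix.strip()
    # Drop surrounding whitespace and straight double quotes from the answer.
    answer = answer.strip().strip('"').strip()
    words = prefix.split()
    fill = answer  # default: the answer is already the bare fill
    # Look for the longest tail of the pre-mask text that the answer repeats.
    for start in range(len(words)):
        tail = " ".join(words[start:]) + " "
        if answer.startswith(tail):
            fill = answer[len(tail):]
            break
    # Remove the text that followed the mask in the template, if repeated.
    if suffix and fill.endswith(suffix):
        fill = fill[: -len(suffix)]
    return fill.strip()
```

With this reduction, a bare answer and a quoted full-sentence answer can be compared on the fill alone instead of on the raw strings.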

Example 4

Input sentence:
I purchased the variety pack of Martinsons and the Brown Gold they are both from the same vendor. I did not like the taste of the Martinson product at all. I threw it away, not the product I expected at all. The Brown Gold was more palatable, however certain flavors with this brand also feel short in particular the Costa Rican named product was extremely too bitter. I would agree with most that this does give you a lower cost per serving, but when the coffee is inferior and the taste match is also, I think the best descriptor is [MASK].

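ChatGPT output:
cheap

DeepSeek output:
cheap

This example shows that both fill the [MASK] position with "cheap", further supporting the conclusion that they share the same algorithm and reasoning mechanism.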

Example 5

Input sentence:
Not bad. "These are small and very salty. The taste is good, but very strong, so it's a good thing the package contains a small amount. It only takes a few little crisps to cure my salty/crunchy craving. I can snack on one package for an entire day. Of course, these would not be a good snack if you're very hungry, because there isn't enough there to fill you up. For less than $1 per pack, it's an [MASK].

ChatGPT output:
"For less than $1 per pack, it's an okay deal."