Humanity's Last Exam

未完的歌
Original poster (文学城)

Everyone gossips about 大S with relish. So AI is off-limits?

Good news for anyone writing research reports: OpenAI has just released Deep Research

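Looking things up yourself is hard work. DS can do a large amount of the basic investigation and analysis, and then produce a summary report.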

Today we are launching our next agent capable of doing work for you independently—deep research.

Give it a prompt and ChatGPT will find, analyze & synthesize hundreds of online sources to create a comprehensive report in tens of minutes vs what would take a human many hours. pic.twitter.com/03PPi4cdqi

— OpenAI (@OpenAI) February 3, 2025

 

===============

Exam results

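Humanity's Last Exam (HLE) is a global collaborative project. Its questions come from nearly 1,000 subject experts at more than 500 institutions in 50 countries, most of them professors, researchers, and graduate degree holders.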

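It consists of more than 3,000 multiple-choice and short-answer questions covering over 100 subjects, from linguistics to rocket science and from classical literature to ecology.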

This question set is used to test the level of various AI models. There are two scores: accuracy and calibration error.

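Accuracy. All frontier models achieve low accuracy on Humanity's Last Exam, which highlights how much room remains to narrow the gap between current LLMs and expert-level academic ability on closed-ended questions.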

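Calibration error. Given the low performance on Humanity's Last Exam, models should be calibrated: they should recognize their uncertainty rather than confidently give incorrect answers, which indicates confabulation/hallucination. To measure calibration, we prompt models to provide both an answer and their confidence in it, from 0% to 100%.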
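
The post does not say how the two scores are actually computed, so the following is only a rough sketch. It assumes each graded answer is a pair of (stated confidence as a fraction, whether the answer was correct) and that calibration error is a binned root-mean-square gap between stated confidence and observed accuracy. The function names, the bin count, and the sample data are all made up for illustration.

```python
from math import sqrt


def accuracy(results):
    """Fraction of answers that were graded correct."""
    return sum(1 for _, correct in results if correct) / len(results)


def rms_calibration_error(results, n_bins=10):
    """Binned RMS gap between stated confidence and observed accuracy.

    `results` is a list of (confidence, correct) pairs, with confidence
    given as a fraction in [0, 1] (so 90% is 0.9). The binning scheme is
    an assumption, not something taken from the post.
    """
    bins = [[] for _ in range(n_bins)]
    for confidence, correct in results:
        # A confidence of exactly 1.0 goes into the last bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, correct))

    total = len(results)
    weighted_squared_gap = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(c for c, _ in members) / len(members)
        bin_accuracy = sum(1 for _, ok in members if ok) / len(members)
        gap = mean_confidence - bin_accuracy
        weighted_squared_gap += (len(members) / total) * gap ** 2
    return sqrt(weighted_squared_gap)


# Hypothetical graded answers: an overconfident model that says "90%"
# four times but is right only once, plus two honest 50% guesses.
sample = [
    (0.9, False), (0.9, False), (0.9, True), (0.9, False),
    (0.5, True), (0.5, False),
]

if __name__ == "__main__":
    print("accuracy:", accuracy(sample))
    print("RMS calibration error:", rms_calibration_error(sample))
```

A model that says "50%" and is right half the time scores a calibration error of zero under this sketch, even though its accuracy is poor. That is why the two numbers are reported separately.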

 

Let's hope Deep Research can reach L4

三花锦鲤
Once it reaches 95%, humans won't be needed anymore.
weihua2021
You're hopelessly obsessed with AI now. I asked an AI a question once, and it got the answer wrong.
未完的歌
If it gets a passing grade, it will basically outclass 90% of people.
akuan
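To be rigorous you'd need a double-blind comparison, and with new questions, because anything even slightly well known may already be in the training data.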
未完的歌
Which AI did you ask? What version?
三花锦鲤
There is nothing new under the sun.
一笑拉
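None of this matters. Whether GPT is good or DS is good isn't the key issue. The key issue is whether AI really needs such high-end chips and such massive investment, and whether that becomes the potential anchor for the US dollar in the future.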

Once the stock market opens in a little while, we'll know how capital reacts and whether Nvidia keeps falling.

未完的歌
A few days ago I posted a Mensa test. That person worked with Mensa members to design new questions and had the AIs take them.
十块糍饭糕
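Does this mean schools will stop requiring students to memorize formulas and texts? It seems there's no need even to learn foreign languages.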
有个用户名
We used to be able to talk about it, and talked a lot, because there was no DS and it didn't start fights. Now everyone is jumpy.
外乡人
What matters is the stock market. Do you have enough money for retirement?
瓜籽
Long-term investors don't watch the short-term ups and downs.
lzr
AI is especially suited to things that have no clear answer.

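AI is good at listing piles of specific words and phrases. It looks like it has said a great deal, and when there is no clear line between right and wrong, it's easy to accept.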

未完的歌
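As AI develops, high compute will remain essential. Next to compute, painstakingly optimized algorithms have no special value. Years of computer development have taught us this too...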

瓜籽
Too highbrow to find an audience.
未完的歌
You're finally here. I was calling for you yesterday.
一笑拉
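That's what you say. Back in the TikTok days you also said the algorithm was simple and that you had studied it. I do admire your confidence.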
heka
Just waiting for the money to be handed out.
瓜籽
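I did come by yesterday. I asked about making zongzi with pork belly, then boiled a big pot of them. The zongzi leaves you can buy this season are small...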
akuan
There's a saying that the dollar's anchor is chips. Compute will remain the key in the future.
一笑拉
Haha, so annoying. I haven't been looking at it these past few days...
未完的歌
Human knowledge, our own individual knowledge: more than 90% of it is just repetition, like a game of passing messages along.
有个用户名
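Even so, there's no need to go out of your way to fit in and play along. It's a momentary buzz, a false prosperity. You can only keep playing along by constantly forcing yourself, and who would really force themselves forever?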
一笑拉
Some people say cryptocurrency too. For now everyone is still looking for a direction.
遥遥
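It will probably fall again today. Tariffs are the main theme lately. The upshot of DS is that Trump will tighten chip controls. The three-tier scheme Biden introduced will probably be put into effect.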
未完的歌
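It's not that the algorithm is simple. It's that compute is the key. Do you know what the people racking their brains over algorithm optimization are after? Saving a little bit of compute. Compare a 286-era CPU with today's CPUs: now that processing power is so much higher, do we still need to sweat over that little bit of algorithm optimization?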

瓜籽
I don't care whether ChatGPT or DS is better, or who copied whom. I use both. More choices beat fewer. I'm stingy and don't pay.
三花锦鲤
Is there a suitable PAI model, one that an individual could train up on their own?
有个用户名
Now we know 外乡人 can run tests too, so we can go to him in future :D
未完的歌
Help me ask it a question. I want to confirm whether it's true. Those folks over on 家坛 have all gone quiet.


一笑拉
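I don't know about these things. They're too narrow for me. But you don't need to spin either. Back then you said the algorithm TikTok refused to sell was simple and that you had studied it. I'll say upfront that I didn't take a screenshot.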

Beyond that, suit yourself.

未完的歌
He did ask, but he swapped in a different word. Didn't dare face it head-on, haha.
未完的歌
You all keep feeling it's spin. I feel it's that you know too little.
瓜籽
DS has stopped writing.

外乡人
It finally answered why DS doesn't know about DS. I couldn't get a connection for ages.

(Due to technical issues, the search service is temporarily unavailable.)

Great question! My relative lack of detailed familiarity with **DeepSeek** compared to **ChatGPT** stems from a few key factors:

### 1. **Knowledge Cutoff and Training Data**  
   - My training data includes publicly available information up to **July 2024**, but details about specific regional or niche AI systems (like DeepSeek) may be limited if they aren’t widely documented in English or global sources.  
   - ChatGPT, developed by OpenAI, has been extensively covered in global tech media, research papers, and public APIs, making it easier to analyze. In contrast, DeepSeek’s technical details, use cases, and updates might be less accessible outside its primary market (China) due to language or regional reporting gaps.

### 2. **Regional Focus and Visibility**  
   - DeepSeek is designed for **Chinese-language applications** and tailored to comply with China’s regulatory framework, which often limits its international visibility. Many Chinese AI tools prioritize domestic integration (e.g., with WeChat, Alibaba Cloud) over global outreach.  
   - By contrast, ChatGPT is built for a **global audience** and has been widely adopted and discussed across English-dominated platforms (e.g., Twitter, Reddit, international tech blogs).

### 3. **Enterprise vs. Consumer Focus**  
   - DeepSeek appears to focus on **B2B and industry-specific solutions** (e.g., finance, government), which are less publicly scrutinized compared to consumer-facing tools like ChatGPT. Enterprise AI systems often operate behind closed doors, with details shared only with clients under NDAs.  
   - ChatGPT, meanwhile, is openly accessible to individuals and developers, fostering widespread experimentation, critique, and documentation.

### 4. **Language and Cultural Barriers**  
   - Much of DeepSeek’s documentation, updates, and case studies may exist primarily in **Chinese**, which isn’t as comprehensively represented in my training data compared to English. This creates a knowledge asymmetry.  
   - For example, innovations or technical breakthroughs by Chinese AI firms sometimes go underreported in Western media, even if they’re significant.

### 5. **Regulatory and Compliance Differences**  
   - AI systems like DeepSeek must adhere to **China’s strict data laws** (e.g., data localization, content moderation rules), which can limit how they share technical details or collaborate internationally. This creates opacity compared to OpenAI, which operates under different norms.

---

### Why This Matters  
If you’re exploring DeepSeek for specific use cases (e.g., Chinese NLP tasks or enterprise analytics), I’d recommend consulting **official DeepSeek documentation** or case studies for precise technical details. For broader AI comparisons, I’m happy to help analyze based on available data!

瓜籽
The server is busy. Please try again later.

未完的歌
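Looks like it was patched in a hurry, because countless people, from different countries too, tried it and it said OpenAI's policy doesn't allow it...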
未完的歌
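For example, this part is very odd: DS doesn't know about itself because many of its technical details and so on can't be obtained outside China. What is that supposed to mean?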

  - ChatGPT, developed by OpenAI, has been extensively covered in global tech media, research papers, and public APIs, making it easier to analyze. In contrast, DeepSeek’s technical details, use cases, and updates might be less accessible outside its primary market (China) due to language or regional reporting gaps.