Chi Heem @chwong0 X Profile

Chi Heem

@chwong0

Followers: 4

Following: 18

Media: 0

Statuses: 15

Joined May 2024

Percy Liang

@percyliang

6 months

What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:

51

216

1K

Percy Liang

@percyliang

6 months

Announcing VHELM v2.1.2 for VLMs: We added the latest Gemini models, Qwen2.5-VL Instruct models, GPT 4.5 preview, o3, o4-mini, and Llama 4 Scout/Maverick. Prompts and predictions can be found on our website: https://t.co/44qPb5agWP

3

27

70

Chi Heem

@chwong0

9 months

If we max out the need for compute, it also means we have maxed out data. The internet archive may be finite but the real world is not.

0

Chi Heem

@chwong0

9 months

Is the need for compute dead? Unless DeepSeek provides more information about the training infrastructure, data used, and parameters, I wouldn't be betting against it. Remember, ML = compute + data + luck! You may need to revise statistical learning theory if you think otherwise.

1

0

Chi Heem

@chwong0

9 months

My opinions probably do not matter, but I think people are greatly overreacting. Does DeepSeek have the best LLM/VLM? Maybe it does well in some areas, but definitely not all.

1

0

Chi Heem

@chwong0

10 months

alphaXiv makes it easier to find relevant papers!

alphaXiv

@askalphaxiv

10 months

Goodreads for arXiv papers💡 What if instead of arbitrary algorithms and tweets, arXiv papers were curated by your research community? Introducing communities on alphaXiv: bridging papers, discussions, and people in one space.

0

3

Tony Lee

@tonyh_lee

10 months

🚀 VHELM v2.1.1 (leaderboard for VLMs - https://t.co/vWjccpJptE) is out! We added 5 new models: o1 (2024-12-17), GPT-4o (2024-11-20), Gemini 2.0 Flash Experimental, and Qwen2-VL 7B/72B. 🥇 Leaderboard/prompts with images/raw predictions: https://t.co/6X0i2pbyPK See 🧵 below.

1

11

19

Chi Heem

@chwong0

11 months

I will be at #NeurIPS2024 presenting papers with my collaborators! Hit me up if you are there! 📜Image2Struct: Benchmarking Structure Extraction for Vision-Language Models 📅Fri, 13 Dec 11 a.m. - 2 p.m. 📍East Exhibit Hall A-C #3608 2/2

0

1

Chi Heem

@chwong0

11 months

I will be at #NeurIPS2024 presenting papers with my collaborators! Hit me up if you are there! 📜VHELM: A Holistic Evaluation of Vision Language Models 📅Thu, 12 Dec, 11 a.m. - 2 p.m. 📍East Exhibit Hall A-C #3603 1/2

1

0

1

Percy Liang

@percyliang

1 year

Image2Struct is not just a new, challenging VLM benchmark, but a sustainable process for creating fresh evals from the never-ending stream of webpages, papers, and music scores! Hosted on HELM with full transparency.

Josselin Somerville

@JossSomerville

1 year

📢 NEW EVAL - We introduce Image2Struct: Benchmarking Structure Extraction for Vision-Language Models (VLMs). Automatic (no human eval), open-ended, with fresh data, and real use cases! 📝 Paper: https://t.co/AWpw4ZHb1b 🥇 Website: https://t.co/3ZhK5naxzr See 🧵 below. (1/10)

2

12

50

Josselin Somerville

@JossSomerville

1 year

📢 NEW EVAL - We introduce Image2Struct: Benchmarking Structure Extraction for Vision-Language Models (VLMs). Automatic (no human eval), open-ended, with fresh data, and real use cases! 📝 Paper: https://t.co/AWpw4ZHb1b 🥇 Website: https://t.co/3ZhK5naxzr See 🧵 below. (1/10)

2

5

27

Tony Lee

@tonyh_lee

1 year

📢 Announcing Holistic Evaluation of Vision-Language Models (VHELM), the HELM extension for VLMs, where we holistically evaluated 22 VLMs across 9 different aspects: 📝 Paper: https://t.co/vWjccpJptE 🥇 Leaderboard/prompts/raw predictions: https://t.co/NmRuy8XBbH See 🧵 below

arxiv.org

Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality,...

4

25

90

Andrew Ng

@AndrewYNg

1 year

A decision on SB-1047 is due soon. Governor @GavinNewsom has said he's concerned about its "chilling effect, particularly in the open source community". He's right, and I hope he will veto this. If you agree, please like/retweet this to show your support for VETOing SB-1047!

72

479

2K

Percy Liang

@percyliang

1 year

GPT-4o tops the VHELM leaderboard.

Tony Lee

@tonyh_lee

1 year

Added GPT-4o to VHELM v1 ✅ For MMMU, we noticed a 6% gap from what was reported because OpenAI evaluated with zero-shot CoT. Stay tuned for the VHELM v2 update! 💯 Leaderboard + raw predictions:

7

33

Tony Lee

@tonyh_lee

1 year

📢 HELM now supports VLM evaluation to evaluate VLMs in a standardized and transparent way. We started with 6 VLMs on 3 scenarios: MMMU, VQAv2 and VizWiz. Stay tuned for more - this is v1! ✍️ Blog post: https://t.co/kkYae5dvFs 💯 Raw predictions/results: https://t.co/eHRJtAXo3r

2

23

89