Chi Heem Profile
Chi Heem

@chwong0

Followers
4
Following
18
Media
0
Statuses
15

Joined May 2024
@percyliang
Percy Liang
6 months
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
51
216
1K
@percyliang
Percy Liang
6 months
Announcing VHELM v2.1.2 for VLMs: We added the latest Gemini models, Qwen2.5-VL Instruct models, GPT 4.5 preview, o3, o4-mini, and Llama 4 Scout/Maverick. Prompts and predictions can be found on our website: https://t.co/44qPb5agWP
3
27
70
@chwong0
Chi Heem
9 months
If we max out the need for compute, it also means we have maxed out data. The internet archive may be finite but the real world is not.
0
0
0
@chwong0
Chi Heem
9 months
Is the need for compute dead? Unless DeepSeek provides more information about the training infrastructure, data used, and parameters, I wouldn't be betting against it. Remember, ML = compute + data + luck! You may need to revise statistical learning theory if you think otherwise.
1
0
0
@chwong0
Chi Heem
9 months
My opinions probably do not matter, but I think people are greatly overreacting. Does DeepSeek have the best LLM/VLM? Maybe it does well in some areas, but definitely not all.
1
0
0
@chwong0
Chi Heem
10 months
alphaXiv makes it easier to find relevant papers!
@askalphaxiv
alphaXiv
10 months
Goodreads for arXiv papers💡 What if instead of arbitrary algorithms and tweets, arXiv papers were curated by your research community? Introducing communities on alphaXiv: bridging papers, discussions, and people in one space.
0
0
3
@tonyh_lee
Tony Lee
10 months
🚀 VHELM v2.1.1 (leaderboard for VLMs - https://t.co/vWjccpJptE) is out! We added 5 new models: o1 (2024-12-17), GPT-4o (2024-11-20), Gemini 2.0 Flash Experimental, and Qwen2-VL 7B/72B. 🥇 Leaderboard/prompts with images/raw predictions: https://t.co/6X0i2pbyPK See 🧵 below.
1
11
19
@chwong0
Chi Heem
11 months
I will be at #NeurIPS2024 presenting papers with my collaborators! Hit me up if you are there! 📜Image2Struct: Benchmarking Structure Extraction for Vision-Language Models 📅Fri, 13 Dec 11 a.m. - 2 p.m. 📍East Exhibit Hall A-C #3608 2/2
0
0
1
@chwong0
Chi Heem
11 months
I will be at #NeurIPS2024 presenting papers with my collaborators! Hit me up if you are there! 📜VHELM: A Holistic Evaluation of Vision Language Models 📅Thu, 12 Dec, 11 a.m. - 2 p.m. 📍East Exhibit Hall A-C #3603 1/2
1
0
1
@percyliang
Percy Liang
1 year
Image2Struct is not just a new, challenging VLM benchmark, but a sustainable process for creating fresh evals from the never ending stream of webpages, papers, and music scores! Hosted on HELM with full transparency.
@JossSomerville
Josselin Somerville
1 year
📢 NEW EVAL - We introduce Image2Struct: Benchmarking Structure Extraction for Vision-Language Models (VLMs). Automatic (no human eval), open-ended, with fresh data, and real use cases! 📝 Paper: https://t.co/AWpw4ZHb1b 🥇 Website: https://t.co/3ZhK5naxzr See 🧵 below. (1/10)
2
12
50
@JossSomerville
Josselin Somerville
1 year
📢 NEW EVAL - We introduce Image2Struct: Benchmarking Structure Extraction for Vision-Language Models (VLMs). Automatic (no human eval), open-ended, with fresh data, and real use cases! 📝 Paper: https://t.co/AWpw4ZHb1b 🥇 Website: https://t.co/3ZhK5naxzr See 🧵 below. (1/10)
2
5
27
@tonyh_lee
Tony Lee
1 year
📢 Announcing Holistic Evaluation of Vision-Language Models (VHELM), the HELM extension for VLMs, where we holistically evaluated 22 VLMs across 9 different aspects: 📝 Paper: https://t.co/vWjccpJptE 🥇 Leaderboard/prompts/raw predictions: https://t.co/NmRuy8XBbH See 🧵 below
arxiv.org
Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality,...
4
25
90
@AndrewYNg
Andrew Ng
1 year
A decision on SB-1047 is due soon. Governor @GavinNewsom has said he's concerned about its "chilling effect, particularly in the open source community". He's right, and I hope he will veto this. If you agree, please like/retweet this to show your support for VETOing SB-1047!
72
479
2K
@percyliang
Percy Liang
1 year
GPT-4o tops the VHELM leaderboard.
@tonyh_lee
Tony Lee
1 year
Added GPT-4o to VHELM v1 ✅ For MMMU, we noticed a 6% gap from what was reported because OpenAI evaluated with zero-shot CoT. Stay tuned for the VHELM v2 update! 💯 Leaderboard + raw predictions:
7
7
33
@tonyh_lee
Tony Lee
1 year
📢 HELM now supports VLM evaluation to evaluate VLMs in a standardized and transparent way. We started with 6 VLMs on 3 scenarios: MMMU, VQAv2 and VizWiz. Stay tuned for more - this is v1! ✍️ Blog post: https://t.co/kkYae5dvFs 💯 Raw predictions/results: https://t.co/eHRJtAXo3r
2
23
89