Chi Heem (@chwong0)
Followers: 4 · Following: 18 · Media: 0 · Statuses: 15 · Joined May 2024
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
51 · 216 · 1K
Announcing VHELM v2.1.2 for VLMs: We added the latest Gemini models, Qwen2.5-VL Instruct models, GPT-4.5 Preview, o3, o4-mini, and Llama 4 Scout/Maverick. Prompts and predictions can be found on our website: https://t.co/44qPb5agWP
3 · 27 · 70
If we have maxed out the need for compute, then we have also maxed out data. The internet archive may be finite, but the real world is not.
0 · 0 · 0
Is the need for compute dead? Unless DeepSeek provides more information about the training infrastructure, data used, and parameters, I wouldn't bet against it. Remember, ML = compute + data + luck! You may need to revise statistical learning theory if you think otherwise.
1 · 0 · 0
My opinions probably do not matter, but I think people are greatly overreacting. Does DeepSeek have the best LLM/VLM? Maybe it does well in some areas, but definitely not all.
1 · 0 · 0
VHELM v2.1.1 (leaderboard for VLMs - https://t.co/vWjccpJptE) is out! We added 5 new models: o1 (2024-12-17), GPT-4o (2024-11-20), Gemini 2.0 Flash Experimental, and Qwen2-VL 7B/72B. Leaderboard/prompts with images/raw predictions: https://t.co/6X0i2pbyPK See the thread below.
1 · 11 · 19
I will be at #NeurIPS2024 presenting papers with my collaborators! Hit me up if you are there! Image2Struct: Benchmarking Structure Extraction for Vision-Language Models
Fri, 13 Dec, 11 a.m. - 2 p.m., East Exhibit Hall A-C #3608 2/2
0 · 0 · 1
I will be at #NeurIPS2024 presenting papers with my collaborators! Hit me up if you are there! VHELM: A Holistic Evaluation of Vision Language Models
Thu, 12 Dec, 11 a.m. - 2 p.m., East Exhibit Hall A-C #3603 1/2
1 · 0 · 1
Image2Struct is not just a new, challenging VLM benchmark, but a sustainable process for creating fresh evals from the never-ending stream of webpages, papers, and music scores! Hosted on HELM with full transparency.
NEW EVAL - We introduce Image2Struct: Benchmarking Structure Extraction for Vision-Language Models (VLMs). Automatic (no human eval), open-ended, with fresh data, and real use cases! Paper: https://t.co/AWpw4ZHb1b Website: https://t.co/3ZhK5naxzr See the thread below. (1/10)
2 · 12 · 50
NEW EVAL - We introduce Image2Struct: Benchmarking Structure Extraction for Vision-Language Models (VLMs). Automatic (no human eval), open-ended, with fresh data, and real use cases! Paper: https://t.co/AWpw4ZHb1b Website: https://t.co/3ZhK5naxzr See the thread below. (1/10)
2 · 5 · 27
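To make the "automatic, no human eval" claim concrete, here is a minimal sketch of the round-trip idea behind this kind of structure-extraction eval: render the model's predicted structure (e.g., a LaTeX snippet) back into an image and score it against the reference image with an automatic similarity measure. The rendering helper and the naive pixel metric below are illustrative assumptions, not the exact pipeline or metric used by Image2Struct.

import io
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

def render_latex(latex: str, dpi: int = 200) -> Image.Image:
    # Render a LaTeX math snippet to a grayscale image with matplotlib's built-in mathtext.
    fig = plt.figure(figsize=(4, 1))
    fig.text(0.5, 0.5, f"${latex}$", ha="center", va="center", fontsize=20)
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi)
    plt.close(fig)
    buf.seek(0)
    return Image.open(buf).convert("L")

def pixel_similarity(a: Image.Image, b: Image.Image) -> float:
    # Naive stand-in metric in [0, 1]: mean pixel agreement after resizing to a common size.
    b = b.resize(a.size)
    x = np.asarray(a, dtype=np.float32) / 255.0
    y = np.asarray(b, dtype=np.float32) / 255.0
    return float(1.0 - np.abs(x - y).mean())

# Hypothetical example: the reference comes from ground-truth LaTeX, the prediction
# from a VLM that misread the superscript as a subscript.
reference = render_latex(r"\frac{a}{b} + c^2")
prediction = render_latex(r"\frac{a}{b} + c_2")
print(f"round-trip similarity: {pixel_similarity(reference, prediction):.3f}")

The same round trip would apply to the other data sources named above (webpages rendered from predicted HTML, music scores from predicted notation), which is what keeps the scoring fully automatic.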
Announcing Holistic Evaluation of Vision-Language Models (VHELM), the HELM extension for VLMs, where we holistically evaluated 22 VLMs across 9 different aspects. Paper: https://t.co/vWjccpJptE Leaderboard/prompts/raw predictions: https://t.co/NmRuy8XBbH See the thread below.
4 · 25 · 90
A decision on SB-1047 is due soon. Governor @GavinNewsom has said he's concerned about its "chilling effect, particularly in the open source community". He's right, and I hope he will veto this. If you agree, please like/retweet this to show your support for VETOing SB-1047!
72 · 479 · 2K
HELM now supports VLM evaluation, so VLMs can be evaluated in a standardized and transparent way. We started with 6 VLMs on 3 scenarios: MMMU, VQAv2, and VizWiz. Stay tuned for more - this is v1! Blog post: https://t.co/kkYae5dvFs Raw predictions/results: https://t.co/eHRJtAXo3r
2 · 23 · 89