Michi Yasunaga
@michiyasunaga
Followers: 4K · Following: 751 · Media: 48 · Statuses: 310
Stanford, CA
Joined October 2019
@michiyasunaga · 18 days
gpt-oss (open models) are out - they can reason, code, and use tools like browsing and Python to solve agentic tasks. Hope they are useful for the community!

Quoting @OpenAI · 18 days
Our open models are here. Both of them.
@michiyasunaga · 18 days
RT @ren_hongyu: Check out the latest open models. Absolutely no competitor of the same scale. Towards intelligence too cheap to meter. http…
@michiyasunaga · 5 months
RT @zhaofeng_wu: Robust reward models are critical for alignment/inference-time algos, auto eval, etc. (e.g. to prevent reward hacking whic…
@michiyasunaga · 6 months
RT @gh_marjan: As Vision-Language Models (VLMs) grow more powerful, we need better reward models to align them with human intent. But how…
@michiyasunaga · 6 months
🔗 Check out the benchmark here: This is joint work with @gh_marjan and @LukeZettlemoyer at @AIatMeta. Huge thanks to all who gave us feedback and support. [4/4]

Link card: github.com · Multimodal RewardBench (facebookresearch/multimodal_rewardbench)
@michiyasunaga · 6 months
Our findings reveal that even top models like Gemini 1.5 Pro, Claude 3.5 Sonnet, and GPT-4o achieve just 72% accuracy (where random guessing gives 50%), struggling with knowledge, reasoning, and safety. This highlights a tough but important testbed for advancing VLMs. [3/n]

[image]
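The evaluation described above amounts to pairwise accuracy: given a (prompt, chosen, rejected) triple, does the reward model score the chosen response higher? A minimal sketch of that metric, with `reward_fn` and the toy examples as my own illustrative stand-ins, not the benchmark's actual interface:

```python
import random

def pairwise_accuracy(examples, reward_fn):
    """Fraction of (prompt, chosen, rejected) triples where the
    reward model scores the chosen response above the rejected one.
    Ties are broken by coin flip, so a constant scorer lands near 50%."""
    correct = 0
    for prompt, chosen, rejected in examples:
        s_chosen = reward_fn(prompt, chosen)
        s_rejected = reward_fn(prompt, rejected)
        if s_chosen > s_rejected:
            correct += 1
        elif s_chosen == s_rejected:
            correct += random.random() < 0.5  # coin flip on ties
    return correct / len(examples)

# Toy check: a (bad) scorer that just prefers longer responses.
examples = [
    ("q1", "a detailed answer", "short"),
    ("q2", "thorough response here", "meh"),
    ("q3", "ok", "a much longer but rejected answer"),
]
length_scorer = lambda prompt, response: len(response)
acc = pairwise_accuracy(examples, length_scorer)  # 2 of 3 correct
```

This framing makes the 50% random baseline concrete: a scorer carrying no signal gets each pair right half the time.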
@michiyasunaga · 6 months
Reward models are essential for training vision-language models (VLMs), yet we lack benchmarks for them. Multimodal RewardBench is a new expert-annotated benchmark covering holistic aspects like reasoning & safety, with 5,000+ (prompt, chosen response, rejected response) examples. [2/n]

[image]
@michiyasunaga · 6 months
📢 Introducing Multimodal RewardBench: a holistic, human-annotated benchmark for evaluating VLM reward models (judges) across diverse dimensions: correctness, preference, knowledge, reasoning, safety, and more. Paper: Data: [1/n]

[image]
@michiyasunaga · 8 months
RT @JunhongShen1: Introducing Content-Adaptive Tokenizer (CAT) 🐈! An image tokenizer that adapts token count based on image complexity, off…
@michiyasunaga · 8 months
RT @WeijiaShi2: Introducing 𝐋𝐥𝐚𝐦𝐚𝐅𝐮𝐬𝐢𝐨𝐧: empowering Llama 🦙 with diffusion 🎨 to understand and generate text and images in arbitrary sequen…
@michiyasunaga · 8 months
RT @liliyu_lili: We scaled up Megabyte and ended up with a BLT! A pure byte-level model has a steeper scaling law than the BPE-based mod…
@michiyasunaga · 8 months
RT @__JohnNguyen__: 🥪 New Paper! 🥪 Introducing Byte Latent Transformer (BLT) - A tokenizer-free model scales better than BPE-based models wit…
@michiyasunaga · 9 months
RT @gh_marjan: Everyone’s talking about synthetic data generation — but what’s the recipe for scaling it without model collapse? 🤔 Meet AL…
@michiyasunaga · 9 months
This is joint work with @gh_marjan, Leonid Shamis, @violet_zct, @andrew_e_cohen, @jaseweston, and @LukeZettlemoyer at @AIatMeta. Huge thanks to the collaborators and all who gave us feedback and support. [6/6]
@michiyasunaga · 9 months
As a result, ALMA improves for 10+ rounds of training (exceeding the ceiling of existing self-alignment methods). It achieves performance close to Llama3 Instruct across alignment benchmarks (MT-Bench, Arena-Hard, AlpacaEval 2.0, etc.), while using <1% of the human labels. [5/n]

[image]
@michiyasunaga · 9 months
(3) Improved LLM-as-a-Judge with score aggregation and self-distillation. (4) Iterative data synthesis & RLHF that prevents saturation. Notably, ALMA is fully self-bootstrapping, relying solely on the base model and small seed data; no distillation from external models. [4/n]
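Score aggregation for an LLM-as-a-Judge, mentioned in component (3), can be sketched as sampling the judge several times per candidate and averaging before picking a winner. A minimal illustration, where `judge_fn` and the noisy toy judge are hypothetical stand-ins for a real (stochastic) LLM judge call:

```python
import random
import statistics

def judge_preference(prompt, responses, judge_fn, n_samples=5):
    """Score each candidate response by sampling the judge several
    times and aggregating (here: mean of the samples), then return
    the top-scoring response. Averaging reduces the variance of any
    single noisy judge call."""
    aggregated = []
    for r in responses:
        scores = [judge_fn(prompt, r) for _ in range(n_samples)]
        aggregated.append(statistics.mean(scores))
    best = max(range(len(responses)), key=lambda i: aggregated[i])
    return responses[best], aggregated

# Toy judge: bounded noise around a score that secretly prefers "B".
random.seed(0)
noisy_judge = lambda p, r: (2.0 if r == "B" else 1.0) + random.uniform(-0.2, 0.2)
best, scores = judge_preference("some prompt", ["A", "B", "C"], noisy_judge)
# best == "B": the averaged scores separate cleanly despite the noise
```

With bounded noise of ±0.2 against a 1.0 score gap, the aggregate always recovers the preferred response; a single sample would too here, but averaging is what makes the choice robust when the noise is larger than the gap.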
@michiyasunaga · 9 months
To achieve this, ALMA introduces technical improvements for 4 core components in alignment: (1) Diverse prompt synthesis via few-shot prompting & clustering. (2) Diverse response synthesis by sampling from multiple model checkpoints. [3/n]
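The diversity filtering in component (1) can be approximated in a few lines. This sketch uses greedy near-duplicate removal by word overlap as a cheap stand-in for the clustering step the tweet names; the function, threshold, and candidate prompts are all illustrative, not ALMA's actual pipeline:

```python
def diversify(prompts, threshold=0.5):
    """Greedy near-duplicate filtering: keep a prompt only if its
    word-overlap (Jaccard similarity) with every already-kept prompt
    stays below `threshold`. A cheap proxy for clustering synthesized
    prompts and keeping one representative per cluster."""
    def jaccard(a, b):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb)
    kept = []
    for p in prompts:
        if all(jaccard(p, q) < threshold for q in kept):
            kept.append(p)
    return kept

candidates = [
    "Write a poem about the sea",
    "Write a poem about the sea at night",   # near-duplicate, dropped
    "Explain how TCP congestion control works",
    "Summarize the plot of Hamlet",
]
diverse = diversify(candidates)  # keeps 3 of the 4 candidates
```

In practice one would cluster embedding vectors rather than bags of words, but the goal is the same: a synthesized prompt set that covers distinct regions rather than rephrasings of one request.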
@michiyasunaga · 9 months
We present ALMA, a new self-alignment recipe that takes only a base LLM (Llama3 Base), minimal SFT data (5k) and preference data (4k) as input, and achieves competitive alignment performance (close to Llama3 Instruct) via iterative data synthesis and training. [2/n]
@michiyasunaga · 9 months
📣 Introducing ALMA: Alignment with Minimal Annotation. Idea: conventional LLM alignment (post-training) methods use millions of human-labeled examples for SFT and RLHF, which is costly. Can a base LLM self-align from far less (<1%) seed data? [1/n]

[image]
@michiyasunaga · 9 months
RT @AkariAsai: 🚨 I’m on the job market this year! 🚨 I’m completing my @uwcse Ph.D. (2025), where I identify and tackle key LLM limitations…