Michi Yasunaga
@michiyasunaga
Followers: 4K · Following: 751 · Media: 48 · Statuses: 310
Stanford, CA
Joined October 2019
@michiyasunaga · 18 days
gpt-oss (open models) are out - they can reason, code, and use tools like browsing and Python to solve agentic tasks. Hope they are useful for the community!

Quoting @OpenAI · 18 days
Our open models are here. Both of them.
@michiyasunaga · 18 days
RT @ren_hongyu: Check out the latest open models. Absolutely no competitor of the same scale. Towards intelligence too cheap to meter. http…
@michiyasunaga · 5 months
RT @zhaofeng_wu: Robust reward models are critical for alignment/inference-time algos, auto eval, etc. (e.g. to prevent reward hacking whic…
@michiyasunaga · 6 months
RT @gh_marjan: As Vision-Language Models (VLMs) grow more powerful, we need better reward models to align them with human intent. But how…
@michiyasunaga · 6 months
🔗 Check out the benchmark here: This is joint work with @gh_marjan and @LukeZettlemoyer at @AIatMeta. Huge thanks to all who gave us feedback and support. [4/4]

Link card: github.com · Multimodal RewardBench (facebookresearch/multimodal_rewardbench)
@michiyasunaga · 6 months
Our findings reveal that even top models like Gemini 1.5 Pro, Claude 3.5 Sonnet, and GPT-4o achieve just 72% accuracy (where random guessing gives 50%), struggling with knowledge, reasoning, and safety. This highlights a tough but important testbed for advancing VLMs. [3/n]

[image]
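The evaluation described above amounts to pairwise accuracy: given a (prompt, chosen, rejected) triple, does the reward model score the chosen response higher? A minimal sketch of that metric, with `reward_fn` and the toy examples as my own illustrative stand-ins, not the benchmark's actual interface:

```python
import random

def pairwise_accuracy(examples, reward_fn):
    """Fraction of (prompt, chosen, rejected) triples where the
    reward model scores the chosen response above the rejected one.
    Ties are broken by coin flip, so a constant scorer lands near 50%."""
    correct = 0
    for prompt, chosen, rejected in examples:
        s_chosen = reward_fn(prompt, chosen)
        s_rejected = reward_fn(prompt, rejected)
        if s_chosen > s_rejected:
            correct += 1
        elif s_chosen == s_rejected:
            correct += random.random() < 0.5  # coin flip on ties
    return correct / len(examples)

# Toy check: a (bad) scorer that just prefers longer responses.
examples = [
    ("q1", "a detailed answer", "short"),
    ("q2", "thorough response here", "meh"),
    ("q3", "ok", "a much longer but rejected answer"),
]
length_scorer = lambda prompt, response: len(response)
acc = pairwise_accuracy(examples, length_scorer)  # 2 of 3 correct
```

This framing makes the 50% random baseline concrete: a scorer carrying no signal gets each pair right half the time.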
@michiyasunaga · 6 months
Reward models are essential for training vision-language models (VLMs), yet we lack benchmarks for them. Multimodal RewardBench is a new expert-annotated benchmark covering holistic aspects like reasoning & safety, with 5,000+ (prompt, chosen response, rejected response) examples. [2/n]

[image]
@michiyasunaga · 6 months
📢 Introducing Multimodal RewardBench: a holistic, human-annotated benchmark for evaluating VLM reward models (judges) across diverse dimensions: correctness, preference, knowledge, reasoning, safety, and more. Paper: Data: [1/n]

[image]
@michiyasunaga · 8 months
RT @JunhongShen1: Introducing Content-Adaptive Tokenizer (CAT) 🐈! An image tokenizer that adapts token count based on image complexity, off…
@michiyasunaga · 8 months
RT @WeijiaShi2: Introducing 𝐋𝐥𝐚𝐦𝐚𝐅𝐮𝐬𝐢𝐨𝐧: empowering Llama 🦙 with diffusion 🎨 to understand and generate text and images in arbitrary sequen…
@michiyasunaga · 8 months
RT @liliyu_lili: We scaled up Megabyte and ended up with a BLT! A pure byte-level model has a steeper scaling law than the BPE-based mod…
@michiyasunaga · 8 months
RT @__JohnNguyen__: 🥪 New Paper! 🥪 Introducing Byte Latent Transformer (BLT) - A tokenizer-free model scales better than BPE-based models wit…
@michiyasunaga · 9 months
RT @gh_marjan: Everyone’s talking about synthetic data generation — but what’s the recipe for scaling it without model collapse? 🤔 Meet AL…
@michiyasunaga · 9 months
This is joint work with @gh_marjan, Leonid Shamis, @violet_zct, @andrew_e_cohen, @jaseweston, and @LukeZettlemoyer at @AIatMeta. Huge thanks to the collaborators and all who gave us feedback and support. [6/6]
@michiyasunaga · 9 months
As a result, ALMA improves for 10+ rounds of training (exceeding the ceiling of existing self-alignment methods). It achieves performance close to Llama3 Instruct across alignment benchmarks (MT-Bench, Arena-Hard, AlpacaEval 2.0, etc.), while using <1% of the human labels. [5/n]

[image]
@michiyasunaga · 9 months
(3) Improved LLM-as-a-Judge with score aggregation and self-distillation. (4) Iterative data synthesis & RLHF that prevents saturation. Notably, ALMA is fully self-bootstrapping, relying solely on the base model and small seed data; no distillation from external models. [4/n]
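Score aggregation for an LLM-as-a-Judge, mentioned in component (3), can be sketched as sampling the judge several times per candidate and averaging before picking a winner. A minimal illustration, where `judge_fn` and the noisy toy judge are hypothetical stand-ins for a real (stochastic) LLM judge call:

```python
import random
import statistics

def judge_preference(prompt, responses, judge_fn, n_samples=5):
    """Score each candidate response by sampling the judge several
    times and aggregating (here: mean of the samples), then return
    the top-scoring response. Averaging reduces the variance of any
    single noisy judge call."""
    aggregated = []
    for r in responses:
        scores = [judge_fn(prompt, r) for _ in range(n_samples)]
        aggregated.append(statistics.mean(scores))
    best = max(range(len(responses)), key=lambda i: aggregated[i])
    return responses[best], aggregated

# Toy judge: bounded noise around a score that secretly prefers "B".
random.seed(0)
noisy_judge = lambda p, r: (2.0 if r == "B" else 1.0) + random.uniform(-0.2, 0.2)
best, scores = judge_preference("some prompt", ["A", "B", "C"], noisy_judge)
# best == "B": the averaged scores separate cleanly despite the noise
```

With bounded noise of ±0.2 against a 1.0 score gap, the aggregate always recovers the preferred response; a single sample would too here, but averaging is what makes the choice robust when the noise is larger than the gap.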
@michiyasunaga · 9 months
To achieve this, ALMA introduces technical improvements for 4 core components in alignment: (1) Diverse prompt synthesis via few-shot prompting & clustering. (2) Diverse response synthesis by sampling from multiple model checkpoints. [3/n]
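The diversity filtering in component (1) can be approximated in a few lines. This sketch uses greedy near-duplicate removal by word overlap as a cheap stand-in for the clustering step the tweet names; the function, threshold, and candidate prompts are all illustrative, not ALMA's actual pipeline:

```python
def diversify(prompts, threshold=0.5):
    """Greedy near-duplicate filtering: keep a prompt only if its
    word-overlap (Jaccard similarity) with every already-kept prompt
    stays below `threshold`. A cheap proxy for clustering synthesized
    prompts and keeping one representative per cluster."""
    def jaccard(a, b):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb)
    kept = []
    for p in prompts:
        if all(jaccard(p, q) < threshold for q in kept):
            kept.append(p)
    return kept

candidates = [
    "Write a poem about the sea",
    "Write a poem about the sea at night",   # near-duplicate, dropped
    "Explain how TCP congestion control works",
    "Summarize the plot of Hamlet",
]
diverse = diversify(candidates)  # keeps 3 of the 4 candidates
```

In practice one would cluster embedding vectors rather than bags of words, but the goal is the same: a synthesized prompt set that covers distinct regions rather than rephrasings of one request.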
@michiyasunaga · 9 months
We present ALMA, a new self-alignment recipe that takes only a base LLM (Llama3 Base), minimal SFT data (5k) and preference data (4k) as input, and achieves competitive alignment performance (close to Llama3 Instruct) via iterative data synthesis and training. [2/n]
@michiyasunaga · 9 months
📣 Introducing ALMA: Alignment with Minimal Annotation. Idea: conventional LLM alignment (post-training) methods use millions of human-labeled examples for SFT and RLHF, which is costly. Can a base LLM self-align from far less (<1%) seed data? [1/n]

[image]
@michiyasunaga · 9 months
RT @AkariAsai: 🚨 I’m on the job market this year! 🚨 I’m completing my @uwcse Ph.D. (2025), where I identify and tackle key LLM limitations…