Seungone Kim

@seungonekim

Followers: 2K · Following: 5K · Media: 50 · Statuses: 822

Ph.D. student @LTIatCMU and intern at @AIatMeta (FAIR) working on (V)LM Evaluation & Systems that Self-Improve | Prev: @kaist_ai @yonsei_u

Pittsburgh, PA
Joined November 2021
@seungonekim
Seungone Kim
7 months
#NLProc New paper on "evaluation-time scaling", a new dimension for leveraging test-time compute! We replicate the test-time scaling behaviors observed in generators (e.g., o1, r1, s1) with evaluators by forcing them to generate additional reasoning tokens. https://t.co/Qaxhdap52S
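A minimal sketch of what forcing an evaluator to generate extra reasoning tokens could look like, assuming a generic `generate` completion call (a hypothetical stand-in, not the paper's code; the continuation cue and budget are illustrative choices):

```python
# Sketch: force an LLM judge to keep reasoning before it commits to a
# verdict. `generate` is a hypothetical stand-in for any LM completion API.

def generate(prompt: str, stop: str, max_tokens: int) -> str:
    """Hypothetical LM call; returns a completion truncated at `stop`."""
    raise NotImplementedError

def judge_with_forced_reasoning(task: str, response: str,
                                min_reasoning_words: int = 512) -> tuple[str, str]:
    prompt = (
        f"Task: {task}\nResponse to evaluate: {response}\n"
        "Think step by step, then write 'Verdict:' followed by a 1-5 score.\n"
    )
    reasoning = ""
    while len(reasoning.split()) < min_reasoning_words:
        # Stop before the verdict; when the judge tries to finish early,
        # append a continuation cue so it keeps reasoning (s1-style forcing).
        chunk = generate(prompt + reasoning, stop="Verdict:", max_tokens=256)
        reasoning += chunk + "\nWait, let me double-check.\n"
    verdict = generate(prompt + reasoning + "Verdict:", stop="\n", max_tokens=8)
    return reasoning, verdict.strip()
```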
@seungonekim
Seungone Kim
11 days
We are gathering problems to build a challenging math benchmark (collaboration between @AiEleuther and @withmsit). The compensation per problem is up to ~$3,623 and the due date is Nov 10th! https://t.co/TdUG5xvTr2
@AiEleuther
EleutherAI
12 days
We are announcing an opportunity for paid question writers to contribute to a new PhD-level math benchmark. Accepted contributors will be paid per question and will be invited to be authors on the resulting dataset paper. Check out the link below for more information!
@andre_t_martins
Andre Martins
21 days
2) M-Prometheus: A Suite of Open Multilingual LLM Judges w/ @zmprcp @dongkeun_yoon @psanfernandes @ianwu97 @seungonekim @RicardoRei7 @gneubig - (Poster session 1, Tue Oct 7, 11:00 AM – 1:00 PM)
@AkariAsai
Akari Asai
1 month
Grad school season reminder: many CS departments run student-led pre-application mentorship programs for prospective PhD applicants (due in Oct.). You can get feedback from current PhD students! E.g.: - UW’s CSE PAMS: https://t.co/RYw4mbD47h - MIT EECS GAAP: https://t.co/piD6hkmHzq 🧵
cs.washington.edu
Pre-Application Mentorship Service (PAMS)
@wellecks
Sean Welleck
2 months
Excited to teach Advanced NLP at CMU again this semester! Slides will be posted on the course page as the course proceeds: https://t.co/xsqARaZEK9 Lectures will be uploaded to YouTube: https://t.co/4kfXvS2MCb
@jiseungh99
Jiseung Hong
2 months
Introducing ⚔️PR Arena⚔️ - free AI coding agents to fix real GitHub issues. Claude Sonnet 4 vs Gemini 2.5 Pro… Who writes better pull requests? 👉 Install here: https://t.co/bk19LcnBVf Powered by @allhands_ai
@tli104
Tianjian Li
2 months
Language models often produce repetitive responses, and this issue is further amplified by post-training. In this work, we introduce DARLING, a method that explicitly optimizes for both response diversity and quality within online reinforcement learning!
@jaseweston
Jason Weston
2 months
🌀Diversity Aware RL (DARLING)🌀 📝: https://t.co/MH0tui34Cb - Jointly optimizes for quality & diversity using a learned partition function - Outperforms standard RL in quality AND diversity metrics, e.g. higher pass@1/p@k - Works for both non-verifiable & verifiable tasks 🧵1/5
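A minimal sketch of a diversity-aware reward in the spirit of the thread's description (jointly optimizing quality and diversity via a learned partition of responses); `same_meaning` and `quality` are hypothetical stand-ins, not DARLING's actual components:

```python
# Sketch: partition a group of sampled responses with a learned equivalence
# classifier, then scale each response's quality reward by how rare its
# partition is, so online RL is pushed toward diverse high-quality outputs.

def same_meaning(a: str, b: str) -> bool:
    """Hypothetical learned classifier: are two responses semantically equivalent?"""
    raise NotImplementedError

def quality(response: str) -> float:
    """Hypothetical quality reward (verifier or reward model)."""
    raise NotImplementedError

def diversity_aware_rewards(responses: list[str]) -> list[float]:
    # Greedy partition: put each response into the first matching cluster.
    clusters: list[list[int]] = []
    for i, r in enumerate(responses):
        for c in clusters:
            if same_meaning(responses[c[0]], r):
                c.append(i)
                break
        else:
            clusters.append([i])
    cluster_of = {i: c for c in clusters for i in c}
    # Responses in rare partitions keep more of their quality reward.
    return [quality(r) / len(cluster_of[i]) for i, r in enumerate(responses)]
```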
@swarnaNLP
Swarnadeep Saha
2 months
Got a new efficient/optimally-thinking LLM? Does your model answer simple queries quickly and spend compute on the harder ones? Test it on our new benchmark, OptimalThinkingBench! 👇 Work led by the amazing @PranjalAggarw16 during his internship!
@jaseweston
Jason Weston
2 months
🤖Introducing OptimalThinkingBench 🤖 📝: https://t.co/aufQVJp8aC - Thinking LLMs use a lot of tokens & overthink; non-thinking LLMs underthink & underperform. - We introduce a benchmark which scores models in the quest to find the best mix. - OptimalThinkingBench reports the F1
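The announcement is cut off at "reports the F1", but an F1-style aggregate over the two failure modes it names (overthinking on easy queries, underthinking on hard ones) might look like this sketch; the sub-score definitions here are assumptions for illustration, not the benchmark's published metric:

```python
# Sketch: harmonic-mean (F1-style) combination of an overthinking score and
# an underthinking score. Sub-score definitions are illustrative assumptions.

def harmonic_mean(a: float, b: float) -> float:
    return 2 * a * b / (a + b) if a + b > 0 else 0.0

def optimal_thinking_score(easy_results, hard_results, token_budget=256):
    # Overthinking score: accuracy on easy queries, discounted when the
    # model spends far more tokens than a small budget warrants.
    over = sum(
        correct * min(1.0, token_budget / max(tokens, 1))
        for correct, tokens in easy_results
    ) / len(easy_results)
    # Underthinking score: plain accuracy on hard queries.
    under = sum(correct for correct, _ in hard_results) / len(hard_results)
    return harmonic_mean(over, under)

# Usage: each result is a (correct 0/1, tokens_used) pair per query.
print(optimal_thinking_score([(1, 100), (1, 4000)], [(1, 900), (0, 50)]))
```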
@Jeande_d
Jean de Nyandwi
2 months
Current multimodal LLMs excel in English and Western contexts but struggle with cultural knowledge from underrepresented regions and languages. How can we build truly globally inclusive vision-language models? We are introducing CulturalGround, a large-scale dataset with 22M
@wellecks
Sean Welleck
3 months
Excited about CMU's new Institute for Computer-Aided Reasoning in Mathematics (ICARM), an NSF Mathematical Sciences Research Institute. I'm honored to serve as an Assistant Director focusing on machine learning and mathematics.
@CarnegieMellon
Carnegie Mellon University
3 months
A new federally funded national institute at CMU will help mathematicians use AI to make mathematical reasoning faster and more reliable in solving pressing challenges across science, security and the economy. Read more, and scroll for further details:
@gneubig
Graham Neubig
3 months
Here is another one, forgot to post before the session 😅
@seungonekim
Seungone Kim
3 months
Also check out our LLM-as-an-Interviewer work that will be presented by @euns0o_kim! I think there is a lot of future work to be done in dynamic evals🙂 https://t.co/5EclVtTOW4
@euns0o_kim
Eunsu Kim
3 months
I’ll be presenting our LLM-as-an-Interviewer work at #ACL2025! 📅 When: July 30 (Wed) 11:00-12:30 📍 Where: Hall 4/5 https://t.co/dreWbCy0Pb Feel free to stop by! Looking forward to discussing (m)LLM evaluation and more!
@gneubig
Graham Neubig
3 months
I'll try to do my best 😳 There's no substitute for @seungonekim
@seungonekim
Seungone Kim
3 months
Unfortunately, I won't be at @aclmeeting this year, but my advisor @gneubig will thankfully be presenting this work! (It's so cool to have an advisor who presents your paper☺️) 📆 July 29th (Tuesday), 10:30AM-12:00PM 📍Hall 4/5, Session 7: IP-Posters (Poster Session 2)
@seungonekim
Seungone Kim
11 months
#NLProc Just because GPT-4o is 17 times more expensive than GPT-4o-mini, does that mean it generates synthetic data 17 times better? Introducing AgoraBench, a benchmark for evaluating the data generation capabilities of LMs.
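A minimal sketch of the measurement loop the tweet implies: have each generator produce synthetic training data, train the same fixed student on each set, and compare downstream scores per dollar. All function names here are hypothetical stand-ins, not AgoraBench's code:

```python
# Sketch: score an LM's data generation capability by the downstream
# performance of a fixed student trained on its synthetic data.

def generate_synthetic_data(generator: str, seed_tasks: list[str]) -> list[dict]:
    """Hypothetical: prompt `generator` to produce instruction-response pairs."""
    raise NotImplementedError

def train_and_evaluate(student: str, data: list[dict]) -> float:
    """Hypothetical: fine-tune a fixed student on `data`, score on held-out tasks."""
    raise NotImplementedError

def data_generation_scores(generators, seed_tasks, cost_per_generator,
                           student="fixed-student"):
    scores = {}
    for g in generators:
        data = generate_synthetic_data(g, seed_tasks)
        perf = train_and_evaluate(student, data)
        # Report raw capability and capability per dollar: a 17x price gap
        # need not mean 17x better data.
        scores[g] = {"performance": perf,
                     "per_dollar": perf / cost_per_generator[g]}
    return scores
```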
@vijaytarian
Vijay V.
3 months
We’ve prepared a tutorial for ACL this year to give you some answers. Come join @xiangyue96, @alisawuffles, @yizhongwyz, @gneubig, and me for “Synthetic Data in the Era of LLMs.” 📍 Sunday 2–3:30pm, Hall B #ACL2025
@euns0o_kim
Eunsu Kim
3 months
I’ll be presenting our LLM-as-an-Interviewer work at #ACL2025! 📅 When: July 30 (Wed) 11:00-12:30 📍 Where: Hall 4/5 https://t.co/dreWbCy0Pb Feel free to stop by! Looking forward to discussing (m)LLM evaluation and more!
arxiv.org
We introduce LLM-as-an-Interviewer, a novel paradigm for evaluating large language models (LLMs). This approach leverages multi-turn interactions where the LLM interviewer actively provides...
@euns0o_kim
Eunsu Kim
10 months
[1/7] 🚨 New LLM Evaluation Paper Alert! How can we better understand LLMs' abilities? Why not interview them across multiple turns? 🎤 We introduce the LLM-as-an-Interviewer Framework, along with its summarized interview report! 👉 https://t.co/dreWbCyyEJ
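A minimal sketch of the multi-turn interview loop as the abstract describes it: the interviewer model inspects each answer and actively provides feedback and follow-ups before writing a summarized report. `ask_interviewer` and `ask_candidate` are hypothetical LM calls, not the framework's API:

```python
# Sketch: interview-style evaluation where an interviewer LM probes the
# evaluated model over multiple turns, then summarizes its performance.

def ask_interviewer(transcript: str, instruction: str) -> str:
    raise NotImplementedError  # hypothetical interviewer-model call

def ask_candidate(transcript: str) -> str:
    raise NotImplementedError  # hypothetical evaluated-model call

def interview(initial_question: str, turns: int = 3) -> str:
    transcript = f"Interviewer: {initial_question}\n"
    for _ in range(turns):
        transcript += f"Candidate: {ask_candidate(transcript)}\n"
        follow_up = ask_interviewer(
            transcript,
            "Give feedback on the answer and ask one follow-up question.")
        transcript += f"Interviewer: {follow_up}\n"
    # Summarized interview report, as the thread mentions.
    return ask_interviewer(transcript,
                           "Write a short report summarizing performance.")
```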
@PranjalAggarw16
Pranjal Aggarwal ✈️ COLM 🍁
3 months
Can LLMs self-improve on code generation? Check out our work AlphaVerus, where the model generates provably correct code and self-improves without any weight updates! At #ICML2025 today: 📆: 11:00 AM - 1:30 PM 📷: Poster #East-2912 https://t.co/53AIFOaEBY w/ Bryan, @wellecks
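A minimal sketch of a verify-and-refine loop in the spirit of the tweet: sample a candidate program, check it with a formal verifier, feed errors back, and recycle verified solutions as in-context exemplars so the system improves with no weight updates. `propose` and `verify` are hypothetical stand-ins, not AlphaVerus's implementation:

```python
# Sketch: verifier-guided self-improvement without gradient updates.

def propose(spec: str, exemplars: list[str], feedback: str | None) -> str:
    raise NotImplementedError  # hypothetical LM call producing candidate code

def verify(code: str) -> tuple[bool, str]:
    raise NotImplementedError  # hypothetical formal verifier (ok, error log)

def solve(spec: str, exemplar_pool: list[str], max_rounds: int = 8) -> str | None:
    feedback = None
    for _ in range(max_rounds):
        candidate = propose(spec, exemplar_pool, feedback)
        ok, errors = verify(candidate)
        if ok:
            # Verified solutions grow the exemplar pool: self-improvement
            # happens in-context rather than through weight updates.
            exemplar_pool.append(candidate)
            return candidate
        feedback = errors  # refine against the verifier's error messages
    return None
```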
@AkariAsai
Akari Asai
3 months
Some updates 🚨 I finished my Ph.D. at @uwcse in June 2025! After a year at AI2 as a Research Scientist, I am joining CMU @LTIatCMU & @mldcmu (courtesy) as an Assistant Professor in Fall 2026. The journey, acknowledgments & recruiting in 🧵