Seungone Kim

@seungonekim

Followers: 2K · Following: 5K · Media: 50 · Statuses: 822

Ph.D. student @LTIatCMU and intern at @AIatMeta (FAIR) working on (V)LM Evaluation & Systems that Self-Improve | Prev: @kaist_ai @yonsei_u

Pittsburgh, PA
Joined November 2021
@seungonekim
Seungone Kim
7 months
#NLProc New paper on "evaluation-time scaling", a new dimension for leveraging test-time compute! We replicate the test-time scaling behaviors observed in generators (e.g., o1, r1, s1) with evaluators by forcing them to generate additional reasoning tokens. https://t.co/Qaxhdap52S
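A minimal sketch of what forcing an evaluator to generate extra reasoning tokens could look like, assuming a generic `generate` completion call (a hypothetical stand-in, not the paper's code; the continuation cue and budget are illustrative choices):

```python
# Sketch: force an LLM judge to keep reasoning before it commits to a
# verdict. `generate` is a hypothetical stand-in for any LM completion API.

def generate(prompt: str, stop: str, max_tokens: int) -> str:
    """Hypothetical LM call; returns a completion truncated at `stop`."""
    raise NotImplementedError

def judge_with_forced_reasoning(task: str, response: str,
                                min_reasoning_words: int = 512) -> tuple[str, str]:
    prompt = (
        f"Task: {task}\nResponse to evaluate: {response}\n"
        "Think step by step, then write 'Verdict:' followed by a 1-5 score.\n"
    )
    reasoning = ""
    while len(reasoning.split()) < min_reasoning_words:
        # Stop before the verdict; when the judge tries to finish early,
        # append a continuation cue so it keeps reasoning (s1-style forcing).
        chunk = generate(prompt + reasoning, stop="Verdict:", max_tokens=256)
        reasoning += chunk + "\nWait, let me double-check.\n"
    verdict = generate(prompt + reasoning + "Verdict:", stop="\n", max_tokens=8)
    return reasoning, verdict.strip()
```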
@seungonekim
Seungone Kim
11 days
We are gathering problems to build a challenging math benchmark (collaboration between @AiEleuther and @withmsit). The compensation per problem is up to ~$3,623 and the due date is Nov 10th! https://t.co/TdUG5xvTr2
@AiEleuther
EleutherAI
12 days
We are announcing an opportunity for paid question writers to contribute to a new PhD-level math benchmark. Accepted contributors will be paid per question and will be invited to be authors on the resulting dataset paper. Check out the link below for more information!
@andre_t_martins
Andre Martins
21 days
2) M-Prometheus: A Suite of Open Multilingual LLM Judges w/ @zmprcp @dongkeun_yoon @psanfernandes @ianwu97 @seungonekim @RicardoRei7 @gneubig - (Poster session 1, Tue Oct 7, 11:00 AM – 1:00 PM)
@AkariAsai
Akari Asai
1 month
Grad school season reminder: many CS departments run student-led pre-application mentorship programs for prospective PhD applicants (due in Oct.). You can get feedback from current PhD students! E.g.: - UW’s CSE PAMS: https://t.co/RYw4mbD47h - MIT EECS GAAP: https://t.co/piD6hkmHzq 🧵
cs.washington.edu
Pre-Application Mentorship Service (PAMS)
@wellecks
Sean Welleck
2 months
Excited to teach Advanced NLP at CMU again this semester! Slides will be posted on the course page as the course proceeds: https://t.co/xsqARaZEK9 Lectures will be uploaded to YouTube: https://t.co/4kfXvS2MCb
@jiseungh99
Jiseung Hong
2 months
Introducing ⚔️PR Arena⚔️ - free AI coding agents to fix real GitHub issues. Claude Sonnet 4 vs Gemini 2.5 Pro… Who writes better pull requests? 👉 Install here: https://t.co/bk19LcnBVf Powered by @allhands_ai
@tli104
Tianjian Li
2 months
Language models often produce repetitive responses, and this issue is further amplified by post-training. In this work, we introduce DARLING, a method that explicitly optimizes for both response diversity and quality within online reinforcement learning!
@jaseweston
Jason Weston
2 months
🌀Diversity Aware RL (DARLING)🌀 📝: https://t.co/MH0tui34Cb - Jointly optimizes for quality & diversity using a learned partition function - Outperforms standard RL in quality AND diversity metrics, e.g. higher pass@1/p@k - Works for both non-verifiable & verifiable tasks 🧵1/5
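A minimal sketch of a diversity-aware reward in the spirit of the thread's description (jointly optimizing quality and diversity via a learned partition of responses); `same_meaning` and `quality` are hypothetical stand-ins, not DARLING's actual components:

```python
# Sketch: partition a group of sampled responses with a learned equivalence
# classifier, then scale each response's quality reward by how rare its
# partition is, so online RL is pushed toward diverse high-quality outputs.

def same_meaning(a: str, b: str) -> bool:
    """Hypothetical learned classifier: are two responses semantically equivalent?"""
    raise NotImplementedError

def quality(response: str) -> float:
    """Hypothetical quality reward (verifier or reward model)."""
    raise NotImplementedError

def diversity_aware_rewards(responses: list[str]) -> list[float]:
    # Greedy partition: put each response into the first matching cluster.
    clusters: list[list[int]] = []
    for i, r in enumerate(responses):
        for c in clusters:
            if same_meaning(responses[c[0]], r):
                c.append(i)
                break
        else:
            clusters.append([i])
    cluster_of = {i: c for c in clusters for i in c}
    # Responses in rare partitions keep more of their quality reward.
    return [quality(r) / len(cluster_of[i]) for i, r in enumerate(responses)]
```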
@swarnaNLP
Swarnadeep Saha
2 months
Got a new efficient/optimally-thinking LLM? Does your model answer simple queries quickly and spend compute on the harder ones? Test it on our new benchmark, OptimalThinkingBench! 👇 Work led by the amazing @PranjalAggarw16 during his internship!
@jaseweston
Jason Weston
2 months
🤖Introducing OptimalThinkingBench 🤖 📝: https://t.co/aufQVJp8aC - Thinking LLMs use a lot of tokens & overthink; non-thinking LLMs underthink & underperform. - We introduce a benchmark which scores models in the quest to find the best mix. - OptimalThinkingBench reports the F1
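The announcement is cut off at "reports the F1", but an F1-style aggregate over the two failure modes it names (overthinking on easy queries, underthinking on hard ones) might look like this sketch; the sub-score definitions here are assumptions for illustration, not the benchmark's published metric:

```python
# Sketch: harmonic-mean (F1-style) combination of an overthinking score and
# an underthinking score. Sub-score definitions are illustrative assumptions.

def harmonic_mean(a: float, b: float) -> float:
    return 2 * a * b / (a + b) if a + b > 0 else 0.0

def optimal_thinking_score(easy_results, hard_results, token_budget=256):
    # Overthinking score: accuracy on easy queries, discounted when the
    # model spends far more tokens than a small budget warrants.
    over = sum(
        correct * min(1.0, token_budget / max(tokens, 1))
        for correct, tokens in easy_results
    ) / len(easy_results)
    # Underthinking score: plain accuracy on hard queries.
    under = sum(correct for correct, _ in hard_results) / len(hard_results)
    return harmonic_mean(over, under)

# Usage: each result is a (correct 0/1, tokens_used) pair per query.
print(optimal_thinking_score([(1, 100), (1, 4000)], [(1, 900), (0, 50)]))
```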
@Jeande_d
Jean de Nyandwi
2 months
Current multimodal LLMs excel in English and Western contexts but struggle with cultural knowledge from underrepresented regions and languages. How can we build truly globally inclusive vision-language models? We are introducing CulturalGround, a large-scale dataset with 22M
@wellecks
Sean Welleck
3 months
Excited about CMU's new Institute for Computer-Aided Reasoning in Mathematics (ICARM), an NSF Mathematical Sciences Research Institute. I'm honored to serve as an Assistant Director focusing on machine learning and mathematics.
@CarnegieMellon
Carnegie Mellon University
3 months
A new federally funded national institute at CMU will help mathematicians use AI to make mathematical reasoning faster and more reliable in solving pressing challenges across science, security and the economy. Read more, and scroll for further details:
@gneubig
Graham Neubig
3 months
Here is another one, forgot to post before the session 😅
@seungonekim
Seungone Kim
3 months
Also check out our LLM-as-an-Interviewer work that will be presented by @euns0o_kim! I think there is a lot of future work to be done in dynamic evals🙂 https://t.co/5EclVtTOW4
@euns0o_kim
Eunsu Kim
3 months
I’ll be presenting our LLM-as-an-Interviewer work at #ACL2025! 📅 When: July 30 (Wed) 11:00-12:30 📍 Where: Hall 4/5 https://t.co/dreWbCy0Pb Feel free to stop by! Looking forward to discussing (m)LLM evaluation and more!
@gneubig
Graham Neubig
3 months
I'll try to do my best 😳 There's no substitute for @seungonekim
@seungonekim
Seungone Kim
3 months
Unfortunately, I won't be at @aclmeeting this year, but my advisor @gneubig will thankfully be presenting this work! (It's so cool to have an advisor who presents your paper☺️) 📆 July 29th (Tuesday), 10:30AM-12:00PM 📍Hall 4/5, Session 7: IP-Posters (Poster Session 2)
@seungonekim
Seungone Kim
11 months
#NLProc Just because GPT-4o is 17 times more expensive than GPT-4o-mini, does that mean it generates synthetic data 17 times better? Introducing AgoraBench, a benchmark for evaluating the data generation capabilities of LMs.
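A minimal sketch of the measurement loop the tweet implies: have each generator produce synthetic training data, train the same fixed student on each set, and compare downstream scores per dollar. All function names here are hypothetical stand-ins, not AgoraBench's code:

```python
# Sketch: score an LM's data generation capability by the downstream
# performance of a fixed student trained on its synthetic data.

def generate_synthetic_data(generator: str, seed_tasks: list[str]) -> list[dict]:
    """Hypothetical: prompt `generator` to produce instruction-response pairs."""
    raise NotImplementedError

def train_and_evaluate(student: str, data: list[dict]) -> float:
    """Hypothetical: fine-tune a fixed student on `data`, score on held-out tasks."""
    raise NotImplementedError

def data_generation_scores(generators, seed_tasks, cost_per_generator,
                           student="fixed-student"):
    scores = {}
    for g in generators:
        data = generate_synthetic_data(g, seed_tasks)
        perf = train_and_evaluate(student, data)
        # Report raw capability and capability per dollar: a 17x price gap
        # need not mean 17x better data.
        scores[g] = {"performance": perf,
                     "per_dollar": perf / cost_per_generator[g]}
    return scores
```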
@vijaytarian
Vijay V.
3 months
We’ve prepared a tutorial for ACL this year to give you some answers. Come join @xiangyue96, @alisawuffles, @yizhongwyz, @gneubig, and me for “Synthetic Data in the Era of LLMs.” 📍 Sunday 2–3:30pm, Hall B #ACL2025
@euns0o_kim
Eunsu Kim
3 months
I’ll be presenting our LLM-as-an-Interviewer work at #ACL2025! 📅 When: July 30 (Wed) 11:00-12:30 📍 Where: Hall 4/5 https://t.co/dreWbCy0Pb Feel free to stop by! Looking forward to discussing (m)LLM evaluation and more!
arxiv.org
We introduce LLM-as-an-Interviewer, a novel paradigm for evaluating large language models (LLMs). This approach leverages multi-turn interactions where the LLM interviewer actively provides...
@euns0o_kim
Eunsu Kim
10 months
[1/7] 🚨 New LLM Evaluation Paper Alert! How can we better understand LLMs' abilities? Why not interview them across multiple turns? 🎤 We introduce the LLM-as-an-Interviewer Framework, along with its summarized interview report! 👉 https://t.co/dreWbCyyEJ
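A minimal sketch of the multi-turn interview loop as the abstract describes it: the interviewer model inspects each answer and actively provides feedback and follow-ups before writing a summarized report. `ask_interviewer` and `ask_candidate` are hypothetical LM calls, not the framework's API:

```python
# Sketch: interview-style evaluation where an interviewer LM probes the
# evaluated model over multiple turns, then summarizes its performance.

def ask_interviewer(transcript: str, instruction: str) -> str:
    raise NotImplementedError  # hypothetical interviewer-model call

def ask_candidate(transcript: str) -> str:
    raise NotImplementedError  # hypothetical evaluated-model call

def interview(initial_question: str, turns: int = 3) -> str:
    transcript = f"Interviewer: {initial_question}\n"
    for _ in range(turns):
        transcript += f"Candidate: {ask_candidate(transcript)}\n"
        follow_up = ask_interviewer(
            transcript,
            "Give feedback on the answer and ask one follow-up question.")
        transcript += f"Interviewer: {follow_up}\n"
    # Summarized interview report, as the thread mentions.
    return ask_interviewer(transcript,
                           "Write a short report summarizing performance.")
```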
@PranjalAggarw16
Pranjal Aggarwal ✈️ COLM 🍁
3 months
Can LLMs self-improve on code generation? Check out our work AlphaVerus, where the model generates provably correct code and self-improves without any weight updates! At #ICML2025 today: 📆: 11:00 AM - 1:30 PM 📷: Poster #East-2912 https://t.co/53AIFOaEBY w/ Bryan, @wellecks
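A minimal sketch of a verify-and-refine loop in the spirit of the tweet: sample a candidate program, check it with a formal verifier, feed errors back, and recycle verified solutions as in-context exemplars so the system improves with no weight updates. `propose` and `verify` are hypothetical stand-ins, not AlphaVerus's implementation:

```python
# Sketch: verifier-guided self-improvement without gradient updates.

def propose(spec: str, exemplars: list[str], feedback: str | None) -> str:
    raise NotImplementedError  # hypothetical LM call producing candidate code

def verify(code: str) -> tuple[bool, str]:
    raise NotImplementedError  # hypothetical formal verifier (ok, error log)

def solve(spec: str, exemplar_pool: list[str], max_rounds: int = 8) -> str | None:
    feedback = None
    for _ in range(max_rounds):
        candidate = propose(spec, exemplar_pool, feedback)
        ok, errors = verify(candidate)
        if ok:
            # Verified solutions grow the exemplar pool: self-improvement
            # happens in-context rather than through weight updates.
            exemplar_pool.append(candidate)
            return candidate
        feedback = errors  # refine against the verifier's error messages
    return None
```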
@AkariAsai
Akari Asai
3 months
Some updates 🚨 I finished my Ph.D. at @uwcse in June 2025! After a year at AI2 as a Research Scientist, I am joining CMU @LTIatCMU & @mldcmu (courtesy) as an Assistant Professor in Fall 2026. The journey, acknowledgments & recruiting in 🧵