
Seungone Kim
@seungonekim
Followers: 2K | Following: 5K | Media: 50 | Statuses: 806
Ph.D. student @LTIatCMU and intern at @AIatMeta (FAIR) working on (V)LM Evaluation & Systems that Self-Improve | Prev: @kaist_ai @yonsei_u
Pittsburgh, PA
Joined November 2021
#NLProc New paper on "evaluation-time scaling," a new dimension for leveraging test-time compute! We replicate the test-time scaling behaviors observed in generators (e.g., o1, r1, s1) in evaluators by forcing them to generate additional reasoning tokens.
3
37
170
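To make the idea concrete, here is a minimal sketch of one way to force an evaluator to spend more reasoning tokens before committing to a score, in the spirit of s1-style budget forcing. The generate() helper, prompt template, and continuation cue are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of "evaluation-time scaling" via budget forcing.
# Assumption: generate(prompt, stop) wraps any completion-style LM API
# and returns text up to the stop sequence; it is not from the paper.

def generate(prompt: str, stop: str) -> str:
    raise NotImplementedError  # plug in your LM client here

def evaluate_with_budget(instruction: str, response: str, min_rounds: int = 3) -> str:
    """Ask an evaluator LM to judge a response, forcing extra reasoning
    rounds before it is allowed to emit a score."""
    prompt = (
        "You are an evaluator. Think step by step, then output 'Score: <1-5>'.\n"
        f"Instruction: {instruction}\nResponse: {response}\nReasoning:"
    )
    for _ in range(min_rounds):
        # Stop decoding before the score line so the model cannot finish early.
        prompt += generate(prompt, stop="Score:")
        # Appending a continuation cue forces additional reasoning tokens.
        prompt += "\nWait, let me double-check that judgment.\n"
    # Only now allow the model to commit to a verdict.
    return generate(prompt + "Score:", stop="\n")
```

The key move is stopping decoding before the score token and appending a continuation cue, so the extra evaluation-time compute goes into reasoning rather than a longer verdict.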
RT @wellecks: Excited about CMU's new Institute for Computer-Aided Reasoning in Mathematics (ICARM), a new NSF Mathematical Sciences Resear….
0
21
0
Also check out our LLM-as-an-Interviewer work that will be presented by @euns0o_kim! I think there is a lot of future work to be done in dynamic evals🙂.
I’ll be presenting our LLM-as-an-Interviewer work at #ACL2025! 📅 When: July 30 (Wed) 11:00-12:30 📍 Where: Hall 4/5. Feel free to stop by! Looking forward to discussing (m)LLM evaluation and more!
0
0
3
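For readers curious what a dynamic eval looks like in code, here is a hedged sketch of an interviewer-style evaluation loop; the chat() wrapper and both prompts are hypothetical stand-ins, not the paper's actual templates.

```python
# Sketch of a multi-turn "LLM-as-an-Interviewer" style evaluation loop.
# Assumption: chat(system, history) wraps any chat-completion LM API.

def chat(system: str, history: list[dict]) -> str:
    raise NotImplementedError  # plug in your chat client here

def interview(interviewee_system: str, seed_question: str, turns: int = 3) -> list[dict]:
    """Interviewer asks a question, inspects the answer, then adapts its
    follow-up each turn, instead of scoring a single static response."""
    history: list[dict] = [{"role": "user", "content": seed_question}]
    for _ in range(turns):
        answer = chat(interviewee_system, history)
        history.append({"role": "assistant", "content": answer})
        follow_up = chat(
            "You are an interviewer evaluating the assistant. Probe the "
            "weakest part of its last answer with one follow-up question.",
            history,
        )
        history.append({"role": "user", "content": follow_up})
    return history  # full transcript, to be graded afterwards
```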
Unfortunately, I won't be at @aclmeeting this year, but my advisor @gneubig will thankfully be presenting this work! (It's so cool to have an advisor who presents your paper☺️) 📆 July 29th (Tuesday), 10:30AM-12:00PM 📍 Hall 4/5, Session 7: IP-Posters (Poster Session 2).
#NLProc Just because GPT-4o is 17 times more expensive than GPT-4o-mini, does that mean it generates synthetic data 17 times better? Introducing AgoraBench, a benchmark for evaluating the data generation capabilities of LMs.
1
7
63
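A rough sketch of the bookkeeping behind that question: AgoraBench compares LMs by the downstream gains a student model gets from training on their synthetic data, so a natural summary statistic is gain per dollar. The dataclass below is an illustrative assumption, not the benchmark's actual code.

```python
# Sketch of the cost-vs-quality comparison the tweet alludes to.
# Assumption: student_gain is a downstream improvement measured by
# training a fixed student model on each generator's synthetic data.

from dataclasses import dataclass

@dataclass
class GeneratorReport:
    name: str
    usd_per_1k_instances: float  # API cost of producing the data
    student_gain: float          # downstream gain from training on it

    @property
    def gain_per_dollar(self) -> float:
        return self.student_gain / self.usd_per_1k_instances

def rank_generators(reports: list[GeneratorReport]) -> list[GeneratorReport]:
    # A 17x price gap only matters if the gain scales with it.
    return sorted(reports, key=lambda r: r.gain_per_dollar, reverse=True)
```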
RT @vijaytarian: We’ve prepared a tutorial for ACL this year to give you some answers. Come join @xiangyue96, @alisawuffles, @yizhongwyz, @….
0
8
0
RT @euns0o_kim: I’ll be presenting our LLM-as-an-Interviewer work at #ACL2025! 📅 When: July 30 (Wed) 11:00-12:30 📍 Where: Hall 4/5. https:/….
arxiv.org
We introduce LLM-as-an-Interviewer, a novel paradigm for evaluating large language models (LLMs). This approach leverages multi-turn interactions where the LLM interviewer actively provides...
0
4
0
RT @PranjalAggarw16: Can LLMs self-improve on code generation? Check out our work AlphaVerus where model generates provably correct code an….
0
10
0
RT @AkariAsai: Some updates 🚨.I finished my Ph.D at @uwcse in June 2025!.After a year at AI2 as a Research Scientist, I am joining CMU @LTI….
0
61
0
RT @sukjun_hwang: Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical netw….
0
733
0
RT @xiangyue96: People are racing to push math reasoning performance in #LLMs—but have we really asked why? The common assumption is that i….
0
126
0
RT @Benjamin_eecs: We've always been excited about self-play unlocking continuously improving agents. Our insight: RL selects generalizable….
0
51
0
RT @apurvasgandhi: New preprint on web agents🚨.Go-Browse: Training Web Agents with Structured Exploration. Problem: LLMs lack prior underst….
0
6
0
RT @jiseungh99: 👆 OpenAI recently rolled back its GPT- 4o update due to Sycophancy—being overly flattering and agreeable. 🧐 However, can we….
0
4
0
RT @wellecks: New paper by Andre He:. Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening. Tired o….
0
53
0
RT @JiayiiGeng: Using LLMs to build AI scientists is all the rage now (e.g., Google’s AI co-scientist [1] and Sakana’s Fully Automated Scie….
0
81
0
RT @ShayneRedford: 🚨 @frimelle and I are looking for a junior collaborator to research the Open Model Ecosystem! 🤖. Ideally, someone w/ AI/….
docs.google.com
This is an interest form to contribute/collaborate on a research project, investigating the open model ecosystem. What? A research project, doing analysis on the downstream use of open models. We...
0
25
0
RT @yizhongwyz: Thrilled to announce that I will be joining @UTAustin @UTCompSci as an assistant professor in fall 2026! . I will continue….
0
55
0
RT @ronalhwang: 🚨 New Paper co-led with @bkjeon1211 🚨. Q. Can we adapt Language Models, trained to predict next token, to reason in sentenc….
0
44
0
Within the RAG pipeline, the retriever often acts as the bottleneck! Instead of training a better embedding model, we explore using a reasoning model as both the retriever and the generator. To do this, we add MCTS to the generative retrieval pipeline. Check out @chaechaek1214's post!
❓What if your RAG didn’t need a separate retrieval model at all? We present 🧊FREESON, a new framework for retriever-FREE retrieval-augmented reasoning. With FREESON, a single LRM acts as both generator and retriever, shifting the focus from seq2seq matching to locating…
0
2
31
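As a rough illustration of the idea, here is a skeleton of MCTS layered over generative retrieval, where one reasoning model both proposes passages (propose) and judges evidence chains (score). Every name here is a hypothetical stand-in; FREESON's actual algorithm may differ.

```python
# Skeleton of MCTS over a generative-retrieval pipeline: a single LRM
# supplies both propose() (retriever role) and score() (judge role).
# All helpers and hyperparameters are illustrative assumptions.

import math, random

class Node:
    def __init__(self, passage_id: str | None, parent=None):
        self.passage_id, self.parent = passage_id, parent
        self.children: list["Node"] = []
        self.visits, self.value = 0, 0.0

    def ucb(self, c: float = 1.4) -> float:
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )

def mcts_retrieve(query: str, propose, score, iters: int = 50) -> str:
    """propose(query, path) -> candidate passage ids (LRM as retriever);
    score(query, path) -> usefulness of the evidence chain (LRM as judge)."""
    root = Node(None)
    for _ in range(iters):
        node, path = root, []
        while node.children:                       # selection
            node = max(node.children, key=Node.ucb)
            path.append(node.passage_id)
        for pid in propose(query, path):           # expansion
            node.children.append(Node(pid, parent=node))
        leaf = random.choice(node.children or [node])
        extra = [leaf.passage_id] if leaf.passage_id else []
        reward = score(query, path + extra)        # simulation
        while leaf:                                # backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    best = max(root.children, key=lambda n: n.visits)
    return best.passage_id
```

Treating retrieval as tree search lets the same model revisit weak evidence paths instead of committing to a single nearest-neighbor lookup.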