Sungmin Cha
@_sungmin_cha
Followers
463
Following
2K
Media
24
Statuses
432
Faculty Fellow @nyuniversity | PhD @SeoulNatlUni
Manhattan, NY
Joined July 2019
How can we be sure a generative model (LLMs, diffusion models) has truly unlearned something? What if existing evaluation metrics are misleading us? In our new paper, we introduce FADE, a new metric that assesses genuine unlearning by measuring distributional alignment, moving beyond…
1
10
38
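FADE's exact formulation isn't spelled out in the post above, but "measuring distributional alignment" has a concrete flavor that can be sketched. Below is a toy illustration under my own assumptions (the function names and the symmetric-KL choice are illustrative, not the paper's definition): it compares an unlearned model's per-prompt output distributions against a retrained-from-scratch reference, where a gap of zero means the two models are behaviorally indistinguishable.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions on the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distributional_gap(unlearned_dists, retrained_dists):
    """Average symmetric KL over prompts: 0 means the unlearned model's
    outputs are indistinguishable from a model that never saw the data."""
    gaps = [0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
            for p, q in zip(unlearned_dists, retrained_dists)]
    return sum(gaps) / len(gaps)

# Identical next-token distributions give zero gap; divergent ones, positive.
aligned = distributional_gap([[0.7, 0.3]], [[0.7, 0.3]])
misaligned = distributional_gap([[0.9, 0.1]], [[0.5, 0.5]])
```

A classifier- or reference-answer-based metric could score two models identically while their output distributions still differ, which is the failure mode a distribution-level comparison is meant to expose.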
I'm excited to share recent research I've been working on with my amazing co-authors since joining @thomsonreuters Foundational Research, tackling two critical challenges in LLM evals: reliable auto-evals & compute-efficient evals (Thread 👇)
1
1
4
Happy to introduce my internship work at @Google and @GoogleDeepMind, collab w/ @googlecloud. We introduce TIR-Judge, an end-to-end agentic RL framework that trains LLM judges with tool-integrated reasoning. https://t.co/rtfqlvuzJ0
#Agents #LLMs #Judges #RL #reasoning
6
40
385
Launching our Research Lab: advancing experience-powered, decentralized superintelligence, built for continual learning, generalization & model-based planning. Press release: https://t.co/iPYXb1nzYr We're solving the hardest challenges in real-world industries, robotics,…
businesswire.com
ExperienceFlow.AI, a pioneer in delivering autonomous enterprise operations and decision-making platforms, announces launch of their Superintelligence Resear...
2
10
98
New changes for ICML 2026:
- Attendance not required for acceptance
- Original submission published alongside the camera-ready version
- New reciprocal reviewing requirements
- New guidelines on generative AI considerations
Check out the full CfPs! Papers: https://t.co/4ppHEb6w1c Position Papers: https://t.co/HS6AXFehDW
5
6
119
Let's go to Seoul!
📢 We are excited to release the call for papers for #ICML2026, held in Seoul, South Korea next year!
Key dates:
Abstract deadline: Jan 23, 2026 AOE
Full paper deadline: Jan 28, 2026 AOE
Main Track → https://t.co/CYBD7dxFJv
Position Papers →
0
0
27
First principle of Context Engineering: the Human-Machine Intelligence Gap. Humans naturally "fill in the blanks," machines don't. Context Engineering is fundamentally about entropy reduction, translating high-entropy human intent into machine-understandable signals. Every…
🚨 RIP "Prompt Engineering." The GAIR team just dropped Context Engineering 2.0, and it completely reframes how we think about human-AI interaction. Forget prompts. Forget "few-shot." Context is the real interface. Here's the core idea: "A person is the sum of their…
1
5
18
Stop writing better prompts. Start engineering the system that feeds your LLM the right context at the right time. We've just released our new e-book on Context Engineering going into details on exactly this 👇 Download it for free
16
71
429
Cohere's models and the fabulous @JayAlammar are hard at work to help us explore all that NeurIPS 2025 has to offer!
The Illustrated NeurIPS 2025: A Visual Map of the AI Frontier. New blog post! NeurIPS 2025 papers are out, and it's a lot to take in. This visualization lets you explore the entire research landscape interactively, with clusters, summaries, and @cohere LLM-generated explanations
1
11
84
This seems like a major breakthrough for AI advancement. Tencent and Tsinghua introduced CALM (Continuous Autoregressive Language Models), a new approach that replaces next-token prediction with continuous vector prediction, allowing the model to think in ideas instead of words.
Holy shit... this might be the next big paradigm shift in AI. 🤯 Tencent + Tsinghua just dropped a paper called Continuous Autoregressive Language Models (CALM) and it basically kills the "next-token" paradigm every LLM is built on. Instead of predicting one token at a time,…
63
239
2K
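As described above, CALM swaps softmax-over-vocabulary prediction for regression in a continuous latent space. A minimal sketch of that one change, with tiny hand-written weights and an MSE-style distance loss purely for illustration (the paper's actual objective and architecture may well differ):

```python
def predict_next_vector(hidden, W):
    """One autoregressive step in latent space: hidden state -> the next
    continuous vector (a chunk of meaning), not a single discrete token."""
    return [sum(h * w for h, w in zip(hidden, col)) for col in W]

def regression_loss(pred, target):
    """Continuous prediction replaces cross-entropy over a vocabulary with
    a distance in vector space (illustrative MSE here)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

hidden = [1.0, 0.5]                       # toy hidden state
W = [[0.2, -0.1], [0.0, 0.4]]             # toy output head, one column per latent dim
pred = predict_next_vector(hidden, W)     # approx. [0.15, 0.2]
loss = regression_loss(pred, [0.1, 0.3])  # approx. 0.00625
```

Because each predicted vector can encode several tokens' worth of content, fewer autoregressive steps are needed per sequence, which is the efficiency argument the post gestures at.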
Prompt Engineering is dead. The GAIR team just dropped Context Engineering 2.0 and it completely reframes how we think about human-AI interaction. Forget prompts. Forget few-shot. Context is the real interface. Their core idea: "A person is the sum of their contexts."
32
53
202
Google DeepMind release: Towards Robust Mathematical Reasoning. Introduces IMO-Bench, a suite of advanced reasoning benchmarks that played a crucial role in GDM's IMO-gold journey. Vetted by a panel of IMO medalists and mathematicians. IMO-AnswerBench, a large-scale test on…
25
154
977
AI Agent's Memory is the most important piece of Context Engineering, and this is how we define it 👇 In general, the memory for an agent is something that we provide via context in the prompt passed to the LLM that helps the agent to better plan and…
44
111
751
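The definition quoted above (memory is whatever we supply via context so the agent plans better) is easy to make concrete. A hedged sketch with all names invented for illustration: recent memory entries are injected into the prompt string before each LLM call.

```python
def build_prompt(task, memory, max_items=3):
    """Assemble the agent's prompt, injecting the most recent memory
    entries as context so the LLM can plan with past information."""
    lines = ["You are an agent. Relevant past context:"]
    lines += [f"- {entry}" for entry in memory[-max_items:]]  # recency window
    lines.append(f"Task: {task}")
    return "\n".join(lines)

memory = ["user prefers metric units", "last answer cited NeurIPS 2025"]
prompt = build_prompt("summarize today's papers", memory)
```

A real agent would also decide what to write back into `memory` after each step; only the read path is shown here.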
This was almost 4 years before the term "foundation model" was even coined. Early continual learning research was genuinely ahead of its time. It was also nice to see an early synergy of nascent ideas in this paper (Bayesian CL + replay through coresets) 🧠
The VCL paper has arguably the first example of modern continual learning for GenAI: VAEs trained on digit/alphabet images 1-by-1 https://t.co/iiMQtOOAt2 Coded by yours truly ☺️ who was (and still is) 🥰 in generative models. Time to get back to continual learning again?
0
1
6
🚨 RIP "Prompt Engineering." The GAIR team just dropped Context Engineering 2.0, and it completely reframes how we think about human-AI interaction. Forget prompts. Forget "few-shot." Context is the real interface. Here's the core idea: "A person is the sum of their…
106
401
2K
This broke my brain. A team at Sea AI Lab just discovered that most of the chaos in reinforcement learning (training collapse, unstable gradients, inference drift) wasn't caused by the algorithms at all. It was caused by numerical precision. The default BF16 format, used across…
19
54
257
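The mechanism the post points at is easy to demonstrate: bfloat16 keeps float32's 8-bit exponent but only 7 mantissa bits, so small updates near 1.0 simply vanish. A self-contained sketch (conversion here is by bit truncation for simplicity; real implementations typically round to nearest even):

```python
import struct

def to_bf16(x):
    """Emulate bfloat16 by zeroing the low 16 bits of the float32 pattern:
    same exponent range as float32, but only 7 mantissa bits survive."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

# A gradient-sized nudge near 1.0 is below bf16's resolution (~0.0078 at
# this scale), so the update is silently dropped: exactly the kind of tiny
# train/inference mismatch the post attributes to BF16.
lost = to_bf16(1.0 + 1e-3) == to_bf16(1.0)   # True: the update vanished
kept = to_bf16(1.0 + 1e-2) == to_bf16(1.0)   # False: this one is big enough
```

The same arithmetic done in FP32 preserves both updates, which is why a precision change alone can alter training dynamics.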
This thought is converging from many sides. Transformer-based LLMs are not going to take us to human-level AI. That famous Yann LeCun interview: "We are not going to get to human level AI by just scaling up LLMs. This is just not going to happen. There's no way. Okay, absolutely…"
Fei-Fei Li (@drfeifei) on limitations of LLMs: "There's no language out there in nature. You don't go out in nature and there's words written in the sky for you... There is a 3D world that follows laws of physics." Language is purely generated signal. https://t.co/FOomRpGTad
117
167
1K
@prfsanjeevarora As a researcher deeply interested in unlearning, this is a fascinating paper! Your "path-dependence" theory explains why true RE is theoretically impossible & why we observed the "universal failure" in our new work. We proposed FADE, a metric measuring distributional similarity
arxiv.org
Current unlearning metrics for generative models evaluate success based on reference responses or classifier outputs rather than assessing the core objective: whether the unlearned model behaves...
0
1
5
Going to San Diego for NeurIPS? We at @evaluatingevals, along with the UK @AISecurityInst, are hosting a closed-door state-of-evals workshop at @UCSanDiego on Dec 8th. Request to join below! :) https://t.co/45Wmjc0Lwo
evalevalai.com
EvalEval, UK AI Security Institute (AISI), and UC San Diego (UCSD) are excited to announce the upcoming Evaluating AI in Practice workshop, happening on December 8, 2025, in San Diego, California.
6
12
100
Very excited about this new finding from our lab: machine unlearning as currently defined (i.e., the model should behave as if it had never seen the unlearned data) may be impossible. Main reason: we show that the outcome of unlearning (mathematically speaking, gradient ascent) is…
Can AI truly forget? Machine Unlearning (MU) aims to make AI behave as if it has never seen some training data. Our paper: 🚨 MU may be impossible as currently conceived 🚨. Paper: https://t.co/hQOHp5S7Rk Homepage: https://t.co/zUyyD1Cq92
11
18
146
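For readers unfamiliar with the recipe being analyzed: "unlearning by gradient ascent" means stepping the weights to increase the loss on the forget data, the sign-flipped twin of ordinary training. A toy sketch on a linear model (all numbers illustrative; the paper's impossibility argument concerns where this path can and cannot end up, not this particular model):

```python
def unlearning_step(w, x, y, lr=0.1):
    """One gradient-ASCENT step on a forget example (x, y) for a linear
    model with squared error: note the '+' where training would use '-'."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    grad = [2.0 * (pred - y) * xi for xi in x]       # d(loss)/d(w)
    return [wi + lr * gi for wi, gi in zip(w, grad)]

def sq_loss(w, x, y):
    return (sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2

w, x, y = [0.5, -0.2], [1.0, 2.0], 0.3
w_new = unlearning_step(w, x, y)
# Loss on the forget point rises (about 0.04 to about 0.16), but nothing
# guarantees the weights land where a model that never saw (x, y) would be.
```

The gap between "loss went up on the forget set" and "weights match a never-trained model" is exactly why behavior-as-if-never-seen is a much stronger requirement than ascent can certify.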