Justin Cho
@HJCH0
Followers
970
Following
3K
Media
102
Statuses
621
building collaborative and contextualized AI | final-year PhD student at @USC_ISI | on the job market!
Los Angeles
Joined October 2018
🚨 New preprint! ❓Asking questions shapes how we learn, but which questions help the most? A question's value is often measured by indirect metrics like expected info gain or salience. 🎯 What if we measure a question's usefulness **directly** based on its impact on learning?
1
2
10
Thank you for inviting me for this talk! It was my honor to give my first talk as Dr. Cho to a group of very inquisitive students :)
Thank you @HJCH0 (with a cute baby! :)) for an insightful invited talk with SKKU students, "Context Synchronization for Collaborative and Personalized AI": bridging human-AI context gaps via grounding evaluation, simulated contexts, and multimodal understanding
0
0
5
Can’t recommend working with Alex enough! It’s hard to think of a positive trait that doesn’t apply to Alex: he’s creative, caring, passionate, and super sharp. You shouldn’t miss out on this opportunity if ur planning on doing a PhD
✨ Very overdue update: I'll be starting as an Assistant Professor in CS at University of Minnesota, Twin Cities, Fall 2026. I will be recruiting PhD students!! Please help me spread the word! [Thread] 1/n
1
0
5
No better time to learn about that #AI thing everyone's talking about... 📢 I'm recruiting PhD students in Computer Science or Information Science @Cornell_Bowers! If you're interested, apply to either department (yes, either program!) and list me as a potential advisor!
18
105
540
Can you tell what actions are being mimed in this video? If so, you’re smarter than AI models! Check the last tweet in this thread for answers. In a new paper, we present MIME, which evaluates whether vision language models (VLMs) have a robust understanding of human actions. 🧵
1
9
23
X algorithm is interesting to say the least 😂 a tweet that I put a lot of thought into gets <50 views and a tweet mistakenly posted out of context that was supposed to be a reply gets >500 views. causing confusion is a nice recipe to engagement i guess?
2
0
3
📣Hiring! Two opportunities 1. Research internship @MSR AI Interaction and Learning (Current PhD students) 2. Multiple positions in my lab @UTiSchool on LLM Personalization/Human-AI Alignment (Prospective PhD students) Details in thread below👇
4
44
250
the tree that keeps on giving 🙏
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,
0
0
6
Can someone for the life of me explain to me why the whole LLM RL community is researching GSM8K and making general claims from this? It's f*king arithmetic. I kinda got it during NeurIPS, rushed, not much better training data around. But 9 months after R1? Inexcusable.
12
3
141
📢 New Preprint! 📢 https://t.co/WkzBsrDWau TL;DR: textual entailment and token probability behave very differently as bias evaluation metrics, even on the exact same bias definitions. Also, I'm looking for summer 2026 research internships in responsible AI - please reach out!
arxiv.org
Measurement of social bias in language models is typically by token probability (TP) metrics, which are broadly applicable but have been criticized for their distance from real-world language...
0
1
4
That's a wrap!🎬 It was a pleasure to partner with @Dongho_Lee_ for this work, as well as @jay_mlr and @jonathanmay! Check out the paper for more details: https://t.co/tthWDUSUTF Code and data: https://t.co/Hb3vesoHEP
0
0
4
📍QUEST is a paradigm shift for question generation, grounded in direct, outcome-driven utility and leveraging LMs as simulators ✅ The QUEST framework generalizes to any task that can benefit from questions and has measurable outcomes, e.g., medical diagnosis, software debugging!
1
0
1
⚠️Setting a higher threshold for rejection sampling yields higher exam scores, but it is nontrivial to filter for a meaningful number of training samples as these simulations are inference-heavy. Future work that scales QUEST could further yield outsized gains.
1
0
1
🔎 We find that QUEST uncovers a unique signal that proxies miss! Question utility correlates only weakly with salience, information gain, or semantic & lexical similarity with the exam questions.
1
0
1
📈QUEST can train models to generate better questions by filtering for high-utility questions and fine-tuning with rejection sampling. QUEST-trained models generate questions that boost scores by 20+% 📈 compared to strong baselines, even direct SFT on exam questions!
1
0
1
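The rejection-sampling step described above can be sketched as a simple filter: keep only generated questions whose measured utility clears a threshold, then fine-tune on the survivors. This is a minimal illustration, not the paper's implementation; the questions and utility values below are made up for the example.

```python
# Sketch of rejection sampling over generated questions: keep only samples
# whose estimated utility (exam-score gain) clears a threshold, then use
# the survivors as fine-tuning targets. Utilities here are illustrative.

def filter_by_utility(samples: list[tuple[str, float]], threshold: float) -> list[str]:
    # Keep questions whose measured utility exceeds the threshold.
    return [question for question, utility in samples if utility > threshold]

samples = [
    ("What is the oxidation state of Fe here?", 0.12),
    ("What color is the textbook cover?", -0.01),
    ("Why does entropy increase in this reaction?", 0.08),
]

# A higher threshold yields higher-utility training data but fewer samples,
# and each utility estimate costs a full simulated study-and-exam run.
kept = filter_by_utility(samples, threshold=0.05)
```

This makes the tradeoff in the thread concrete: raising the threshold improves the average utility of the retained set while shrinking it, which is expensive to offset because every candidate's utility requires inference-heavy simulation.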
To evaluate QUEST, we curate TEXTBOOK-EXAM 📚, a dataset that maps textbook sections to end-of-chapter exam questions across 5 subjects, letting us directly test question impact.
1
0
1
Instead of relying on indirect measures of a question’s value like salience or expected info gain, QUEST directly measures: “Does this question improve learning outcomes?” See this example for chemistry:
1
0
1
We introduce QUEST (Question Utility Estimation with Simulated Tests) 🚀 QUEST uses language models as simulated learners that study, ask questions, and take exams to measure question utility: how much each question improves exam performance.
1
0
2
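The utility measure in the tweet above can be sketched as a toy computation: utility is the change in a simulated learner's exam score when it studies with versus without a question. The `simulate_exam` scorer below is a hypothetical word-overlap stand-in (the actual system uses LM-simulated learners), so the numbers are purely illustrative.

```python
# Toy sketch of QUEST-style question utility. A question's utility is the
# simulated learner's exam score when studying with the question, minus the
# score without it. `simulate_exam` is a hypothetical stand-in scorer.

def simulate_exam(material: str, questions: list[str]) -> float:
    # Stand-in scorer: score grows with how many of a question's words
    # appear in the study material (a real system would run an LM learner).
    words = set(material.lower().split())
    if not questions:
        return 0.5  # baseline score without any study questions
    overlap = sum(
        len(set(q.lower().split()) & words) / max(len(q.split()), 1)
        for q in questions
    )
    return min(1.0, 0.5 + 0.1 * overlap)

def question_utility(material: str, question: str) -> float:
    # Utility = exam score with the question minus the score without it.
    return simulate_exam(material, [question]) - simulate_exam(material, [])

material = "acids donate protons and bases accept protons"
u = question_utility(material, "what do acids donate")
```

A question whose words never touch the material gets zero utility under this toy scorer, mirroring the idea that utility is measured directly from learning outcomes rather than from salience or info-gain proxies.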