Jinu Lee
@jinulee_v
Followers 472 · Following 182 · Media 17 · Statuses 57
PhD Student @UIUC_NLP. Interested in *semantics of reasoning*, from neuro-symbolic methods to reasoning evaluation/improvement in LLMs. Ex-Intern @MSFTResearch
Joined November 2023
Happy to announce that "Evaluating step-by-step reasoning traces" is accepted to EMNLP 2025 Findings! Check out the survey for (1) different criteria about what is a good reasoning trace/step (2) datasets and methods for evaluating reasoning traces: https://t.co/pcNtI8cHdD
5
8
81
Balancing versatility and reliability in long-form text evaluation is no easy task. Yukyung’s work (among other pioneering efforts) inspired me to explore rubric-based checklists, and they’re proving powerful for evaluating complex chains of thought! Check out her cool work at EMNLP✅
How reliable is your LLM-as-a-Judge?⚖️ Existing methods suffer from (a) rating inconsistencies and (b) low stability of correlation with human judgments across evaluator models. Excited to share CheckEval (@emnlpmeeting), a framework that improves reliability for LLM-as-a-Judge.
0
0
4
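Checklist-style judging as described above can be sketched in a few lines. This is a hypothetical simplification, not CheckEval's actual interface: the `checks` predicates stand in for an LLM judge's yes/no answers, and all names are mine.

```python
def checklist_score(answer, checks):
    """Score an answer against a binary checklist instead of one holistic
    1-10 rating. `checks` maps question -> predicate; each predicate is a
    stand-in for an LLM judge's yes/no verdict on that question.
    Returns (fraction of checks passed, per-question results)."""
    results = {q: bool(pred(answer)) for q, pred in checks.items()}
    return sum(results.values()) / len(results), results
```

Binary questions are individually easier to answer consistently than a single scalar rating, which is the intuition behind the reliability gain.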
I'm at #EMNLP2025 this week in Suzhou to present LegalSearchLM! 🤗 Wed, 16:30-18:00 📄paper: https://t.co/lqu90I1596 Come see how we - use first-token-aware autoregressive LMs as retrievers with an FM-index for legal-element reasoning in complex legal case retrieval - release
0
2
16
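The retrieval idea above (an autoregressive LM whose decoding is constrained by an index over the corpus) can be sketched minimally. This is a hypothetical simplification: LegalSearchLM uses a real FM-index, whereas here a naive substring scan plays that role, and a static `scores` dict stands in for LM logits.

```python
def allowed_next_tokens(corpus_docs, prefix):
    """Tokens that can extend `prefix` so it still occurs verbatim in some
    document (the role an FM-index plays efficiently)."""
    allowed = set()
    for doc in corpus_docs:
        tokens = doc.split()
        n = len(prefix)
        for i in range(len(tokens) - n):
            if tokens[i:i + n] == prefix:
                allowed.add(tokens[i + n])
    return allowed

def constrained_decode(corpus_docs, scores, max_len=3):
    """Greedy decoding, restricted at each step to corpus-attested tokens,
    so the generated sequence is guaranteed to occur in the corpus."""
    prefix = []
    for _ in range(max_len):
        options = allowed_next_tokens(corpus_docs, prefix)
        if not options:
            break
        prefix.append(max(options, key=lambda t: scores.get(t, 0)))
    return prefix
```

The constraint guarantees every generated n-gram is grounded in an actual document, which is what makes generation usable as retrieval.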
How do LLMs really navigate the thinking space? Do they head straight for a final answer or follow a wiggly path? Commit decisively or get stuck in “infinite” self-doubt? In our latest study, we unravel (over-)thinking through the lens of sub-thoughts: https://t.co/Wb5AIcbI6a more in 🧵
2
25
59
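Segmenting a trace into sub-thoughts can be sketched as splitting at transition cues. The marker list here is an assumption of mine for illustration; the paper defines its own segmentation.

```python
import re

# Hypothetical transition cues marking the start of a new sub-thought.
MARKERS = ("Wait", "Alternatively", "Hmm", "But wait")

def split_subthoughts(trace):
    """Split a reasoning trace into sub-thoughts at transition markers,
    using a zero-width lookahead so the marker stays with its segment."""
    pattern = r"(?=\b(?:" + "|".join(MARKERS) + r")\b)"
    return [p.strip() for p in re.split(pattern, trace) if p.strip()]
```

Once a trace is segmented this way, one can ask, per sub-thought, what answer the model would have committed to at that point.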
Life updates: I am back at Illinois! I had a wonderful summer in @MSFTResearch training reasoning models for verifiable programming😎
1
0
49
🔄 We were nominated for Oral + Top-1 at the MATH-AI workshop at #ICML! 🚨Why? ≈46% of GitHub commits are AI-generated, but can we verify that they are correct? 📢 VeriBench challenges agents to turn Python into Lean code! 🧵1/14 📃 Paper: https://t.co/QPCxg5lKM4
1
19
39
I will be presenting at 7/28(Mon) 11:00-12:30, Hall 4/5. Would love to chat about reasoning, CoT, NLP+formal methods and more! Can't wait to meet old and new friends😁
I am happy to announce that my first-author paper has been accepted to ACL 2025 Main! 🇦🇹 https://t.co/LUnMGpueTF We tackle natural language to first-order logic (NL2FOL) translation using reinforcement learning with NLI labels as rewards. (1/7)
1
1
22
New blog post about asymmetry of verification and "verifier's law": https://t.co/bvS8HrX1jP Asymmetry of verification, the idea that some tasks are much easier to verify than to solve, is becoming an important idea now that we have RL that finally works generally. Great examples of
53
245
2K
We always want to scale up RL, yet simply training longer doesn't necessarily push the limits - exploration gets impeded by entropy collapse. We show that the performance ceiling is surprisingly predictable, and the collapse is driven by covariance between logp and advantage.
9
94
540
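The covariance claim above can be illustrated with a toy softmax policy: to first order, a small policy-gradient step changes entropy by roughly -lr · Cov_p(log p, A), so positive covariance between log-probability and advantage drives entropy down. This is an illustrative toy, not the paper's derivation; the setup (8 actions, advantages set equal to log-probs to force positive covariance) is my own.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
z = rng.normal(size=8)            # logits of a toy categorical policy
p = softmax(z)
A = np.log(p)                     # advantages correlated with log-prob

# Cov_p(log p, A) under the policy distribution
mean_logp = p @ np.log(p)
mean_A = p @ A
cov = p @ (np.log(p) * A) - mean_logp * mean_A

lr = 0.05
z_new = z + lr * p * (A - mean_A)  # policy-gradient step on E_p[A]
dH = entropy(softmax(z_new)) - entropy(p)
```

With `cov > 0`, the step sharpens the distribution and `dH` comes out negative: entropy collapses exactly when high-probability actions keep receiving high advantages.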
📢 New paper! FoVer enhances PRMs for step-level verification of LLM reasoning w/o human annotation 🚀 We synthesize training data using formal verification tools and improve LLMs at step-level verification of LLM responses on MATH, AIME, MMLU, BBH, etc. https://t.co/gdK6BC7rJv
4
26
127
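The data-synthesis recipe above (step-level labels from automatic verifiers rather than human annotators) can be sketched as follows. The verifier here is a stub; in the paper formal tools provide the verdicts, and the function and field names are my own.

```python
def synthesize_prm_data(solutions, verify_step):
    """Turn multi-step solutions into step-labeled PRM training data.
    `verify_step(context, step)` is a stand-in for a formal verification
    tool judging whether `step` is valid given the preceding steps.
    Returns (context, step, label) triples."""
    data = []
    for steps in solutions:
        for i, step in enumerate(steps):
            label = 1 if verify_step(steps[:i], step) else 0
            data.append((tuple(steps[:i]), step, label))
    return data
```

Because labels come from a tool rather than annotators, the recipe scales to as much data as the verifier can check.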
Life update: Just started my summer internship in @MSFTResearch (Redmond, WA)! Happy to chat with fellow MSR people, or anyone around the wider Seattle area 🏙️
3
2
113
Finally, I sincerely thank my co-authors, Qi, Runzhi, Vincent, Ziqi @wzq016, Heng @hengjinlp , and Julia. See you in Vienna! 🇦🇹 (7/7)
0
0
1
We also show that RL reduces the arbitrariness of FOL. Arbitrariness means that the same NL phrase can be expressed as different FOLs (different predicate names/arities). We show that rounds of RL reduce corpus-wide arbitrariness, which explains the gain in the entailment-preserving rate. (6/7)
1
0
2
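A simple proxy for the corpus-wide arbitrariness described above is the mean number of distinct FOL renderings per NL phrase. This metric is my own illustrative stand-in, not necessarily the paper's exact measure.

```python
from collections import defaultdict

def arbitrariness(translations):
    """translations: (nl_phrase, fol_string) pairs collected over a corpus.
    Returns the mean number of distinct FOL renderings per NL phrase;
    1.0 means every phrase is always translated the same way."""
    forms = defaultdict(set)
    for nl, fol in translations:
        forms[nl].add(fol)
    return sum(len(v) for v in forms.values()) / len(forms)
```

Under this proxy, RL rounds that push the translator toward consistent predicate choices drive the score toward 1.0.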
As a result, the model trained with our objective achieves the best entailment-preserving rate (EPR) across all three datasets (EntailmentBank, eQASC, and e-SNLI). It outperforms other sentence-to-FOL translation systems, including semantic representation-based methods and end-to-end generative models. (5/7)
1
0
2
Next, we train an NL2FOL translator using NLI labels as rewards. First, we sample 16 translations from a base model for each premise and hypothesis. We then reward FOLs that participate in any entailment-preserving combination, and repeat this for 5 rounds of training. (4/7)
1
0
1
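The reward scheme in the step above can be sketched directly: a candidate FOL earns reward if it appears in at least one premise-hypothesis combination that a symbolic prover judges entailment-preserving. The prover here is a stub, and the function names are mine.

```python
from itertools import product

def entailment_rewards(premise_fols, hypothesis_fols, entails):
    """premise_fols / hypothesis_fols: candidate translations (e.g. 16 each).
    `entails(p, h)` is a stand-in for a symbolic prover checking p |- h.
    A candidate gets reward 1.0 if it occurs in at least one
    entailment-preserving (p, h) combination, else 0.0."""
    rewarded_p, rewarded_h = set(), set()
    for p, h in product(premise_fols, hypothesis_fols):
        if entails(p, h):
            rewarded_p.add(p)
            rewarded_h.add(h)
    return (
        [1.0 if p in rewarded_p else 0.0 for p in premise_fols],
        [1.0 if h in rewarded_h else 0.0 for h in hypothesis_fols],
    )
```

Rewarding combinations rather than single translations lets the prover's verdict supervise both sides of the pair at once.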
NL2FOL translation provides a reliable method for logical reasoning. However, its application is mostly limited to (near-)synthetic reasoning tasks. How can we improve an NL2FOL translator to catch the diverse semantics of natural language expressed in NLI tasks? (2/7)
1
0
1
I am happy to announce that my first-author paper has been accepted to ACL 2025 Main! 🇦🇹 https://t.co/LUnMGpueTF We tackle natural language to first-order logic (NL2FOL) translation using reinforcement learning with NLI labels as rewards. (1/7)
3
12
96
🚀Our ICML 2025 paper introduces "Premise-Augmented Reasoning Chains" - a structured approach to induce explicit dependencies in reasoning chains. By revealing the dependencies within chains, we significantly improve how LLM reasoning can be verified. 🧵[1/n]
1
25
74
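The premise-augmented idea above can be sketched as a chain whose steps carry explicit pointers to the earlier steps they depend on, so each step is verified against only its premises rather than the whole prefix. The checker is a stub for an LLM/entailment verifier, and the representation is my own simplification.

```python
def verify_chain(steps, check_step):
    """steps: list of (text, premise_ids) where premise_ids index earlier
    steps in the chain. `check_step(text, premises)` stands in for a
    verifier judging one step given only its stated premises.
    Returns the indices of steps that fail verification."""
    errors = []
    for i, (text, premise_ids) in enumerate(steps):
        premises = [steps[j][0] for j in premise_ids]
        if not check_step(text, premises):
            errors.append(i)
    return errors
```

Making dependencies explicit localizes errors: a failed step implicates its premises, not every step before it.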
📢 I will be presenting **SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning** at NAACL 2025! Poster: 5/1, 2:00-3:30, Hall 3 Let's chat about neurosymbolic reasoning and reasoning evaluation! I will also attend the complex reasoning BoF😁 See you in ABQ!
I am happy to announce that my first-author paper is accepted to NAACL 2025 Main! Existing backward chaining (top-down reasoning) methods are incomplete, leading to suboptimal performance. We build SymBa, a complete neuro-symbolic backward chaining method using SLD-Resolution.
0
7
36
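The top-down strategy SymBa builds on can be illustrated with a tiny propositional backward chainer: to prove a goal, either it is a known fact, or some rule concludes it and every body atom is provable in turn. This toy is my own sketch; the real system is first-order, SLD-resolution based, and neuro-symbolic.

```python
def backward_chain(goal, rules, facts, depth=10):
    """Minimal propositional backward chaining. `rules` is a list of
    (head, body) Horn clauses; `facts` is a set of known atoms.
    `depth` bounds recursion so cyclic rule sets cannot loop forever."""
    if depth == 0:
        return False
    if goal in facts:
        return True
    return any(
        head == goal and all(backward_chain(b, rules, facts, depth - 1) for b in body)
        for head, body in rules
    )
```

The completeness issue the tweet mentions is exactly about cases a naive top-down search misses; the depth bound here is the crudest possible guard, not SymBa's solution.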