Jinu Lee Profile
Jinu Lee

@jinulee_v

Followers 472 · Following 182 · Media 17 · Statuses 57

PhD Student @UIUC_NLP. Interested in *semantics of reasoning*, from neuro-symbolic methods to reasoning evaluation/improvement in LLMs. Ex-Intern @MSFTResearch

Joined November 2023
@jinulee_v
Jinu Lee
3 months
Happy to announce that "Evaluating step-by-step reasoning traces" is accepted to EMNLP 2025 Findings! Check out the survey for (1) different criteria about what is a good reasoning trace/step (2) datasets and methods for evaluating reasoning traces: https://t.co/pcNtI8cHdD
5
8
81
@jinulee_v
Jinu Lee
15 hours
Balancing versatility and reliability in long-form text evaluation is no easy task. Yukyung’s work (and other pioneers) inspired me to explore rubric-based checklists, and they’re proving powerful for evaluating complex chains of thought! Check out her cool work at EMNLP✅
@yukyunglee_
Yukyung Lee
16 hours
How reliable is your LLM-as-a-Judge?⚖️ Existing methods suffer from (a) rating inconsistencies and (b) low stability of correlation with human judgments across evaluator models. Excited to share CheckEval (@emnlpmeeting), a framework that improves reliability for LLM-as-a-Judge.
0
0
4
@chaechaek1214
Chaeeun Kim @EMNLP2025
1 day
I'm at #EMNLP2025 this week in Suzhou to present LegalSearchLM! 🤗 Wed, 16:30-18:00 📄paper: https://t.co/lqu90I1596 Come see how we - use first-token-aware autoregressive LMs as retrievers with an FM-index for legal-element reasoning in complex legal case retrieval - release
0
2
16
@FrederickXZhang
Xinliang (Frederick) Zhang
21 days
How do LLMs really navigate the thinking space? Straight off to a final answer OR follow a wiggly path? Definitely commit OR get stuck to “infinite” self-doubting? In our latest study, we unravel (over-)thinking through the lens of sub-thoughts: https://t.co/Wb5AIcbI6a more in 🧵
2
25
59
@jinulee_v
Jinu Lee
3 months
Life updates: I am back at Illinois! I had a wonderful summer in @MSFTResearch training reasoning models for verifiable programming😎
1
0
49
@BrandoHablando
Brando Miranda
4 months
🔄 We were nominated for Oral + top-1 in the MATH-AI workshop at #ICML! 🚨Why? ≈46% of GitHub commits are AI-generated, but can we verify that they are correct? 📢 VeriBench challenges agents: turn Python into Lean code! 🧵1/14 📃 Paper: https://t.co/QPCxg5lKM4
1
19
39
@jinulee_v
Jinu Lee
4 months
I will be presenting at 7/28(Mon) 11:00-12:30, Hall 4/5. Would love to chat about reasoning, CoT, NLP+formal methods and more! Can't wait to meet old and new friends😁
@jinulee_v
Jinu Lee
6 months
I am happy to announce that my first-author paper has been accepted to ACL 2025 Main! 🇦🇹 https://t.co/LUnMGpueTF We tackle natural language-first-order logic translation (NL2FOL) using reinforcement learning with NLI labels as rewards. (1/7)
1
1
22
@_jasonwei
Jason Wei
4 months
New blog post about asymmetry of verification and "verifier's law": https://t.co/bvS8HrX1jP Asymmetry of verification, the idea that some tasks are much easier to verify than to solve, is becoming an important idea as we have RL that finally works generally. Great examples of
53
245
2K
@lifan__yuan
Lifan Yuan
5 months
We always want to scale up RL, yet simply training longer doesn't necessarily push the limits - exploration gets impeded by entropy collapse. We show that the performance ceiling is surprisingly predictable, and the collapse is driven by covariance between logp and advantage.
9
94
540
@RyoKamoi
Ryo Kamoi
6 months
📢 New paper! FoVer enhances PRMs for step-level verification of LLM reasoning w/o human annotation 🚀 We synthesize training data using formal verification tools and improve LLMs at step-level verification of LLM responses on MATH, AIME, MMLU, BBH, etc. https://t.co/gdK6BC7rJv
4
26
127
@jinulee_v
Jinu Lee
6 months
Life update: Just started my summer internship in @MSFTResearch (Redmond, WA)! Happy to chat with fellow MSR people, or anyone around the wider Seattle area 🏙️
3
2
113
@jinulee_v
Jinu Lee
6 months
Finally, I sincerely thank my co-authors, Qi, Runzhi, Vincent, Ziqi @wzq016, Heng @hengjinlp, and Julia. See you in Vienna! 🇦🇹 (7/7)
0
0
1
@jinulee_v
Jinu Lee
6 months
We also show that RL reduces the arbitrariness of FOL. Arbitrariness means that the same NL phrase can be expressed as different FOLs (differing predicate names/arities). We show that successive rounds of RL reduce corpus-wide arbitrariness, which explains the gain in the entailment-preserving rate. (6/7)
1
0
2
@jinulee_v
Jinu Lee
6 months
As a result, the model trained with our objective achieves the best EPR across all three datasets (EntailmentBank, eQASC, and e-SNLI). It outperforms other sentence-to-FOL translation systems, including semantic representation-based methods and end-to-end generative models. (5/7)
1
0
2
@jinulee_v
Jinu Lee
6 months
Next, we train an NL2FOL translator using NLI labels as rewards. First, using a base model, we obtain 16 translations from each of the premises and hypotheses. We reward FOLs that are involved in any entailment-preserving combination. We repeat 5 rounds of training. (4/7)
1
0
1
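One way to picture the reward assignment described in the tweet above (a hypothetical reconstruction for illustration, not the authors' code): collect every candidate FOL that participates in at least one entailment-preserving combination and give it reward 1, all others reward 0. The `entails` argument stands in for a real FOL prover.

```python
from itertools import product

def rewarded_fols(premise_cands, hypo_cands, entails):
    """Return the set of candidate FOL strings that appear in at least
    one entailment-preserving combination (one candidate per premise,
    plus a hypothesis candidate). These would receive reward 1."""
    rewarded = set()
    for combo in product(*premise_cands):
        for h in hypo_cands:
            if entails(list(combo), h):
                rewarded.update(combo)
                rewarded.add(h)
    return rewarded

# Toy stand-in 'prover': a hypothesis is entailed iff it matches a premise.
toy_entails = lambda premises, h: h in premises

print(sorted(rewarded_fols([["P(a)", "Q(a)"], ["R(b)"]],
                           ["Q(a)", "S(c)"], toy_entails)))  # → ['Q(a)', 'R(b)']
```

With 16 candidates per sentence, as in the tweet, the product over premises grows quickly, so a real implementation would need pruning; this sketch only shows the reward criterion.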
@jinulee_v
Jinu Lee
6 months
First, we formally define the entailment-preserving rate (EPR). Given a set of premises and a following hypothesis, we want the FOL translation of the premises to also logically entail the hypothesis. Based on this idea, we define three metrics: EPR, EPR@K, EPR@K-Oracle. (3/7)
1
0
0
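The EPR@K idea above can be sketched as a small search over candidate translations: the instance scores 1 if any combination of the K candidate FOLs per sentence is entailment-preserving. All names here are illustrative, and `entails` is a placeholder for a real FOL prover, not the paper's implementation.

```python
from itertools import product

def epr_at_k(premise_cands, hypo_cands, entails):
    """Sketch of EPR@K for one instance: 1 if ANY combination of
    candidate FOL translations (one per premise, any hypothesis
    candidate) makes the premises entail the hypothesis, else 0."""
    for combo in product(*premise_cands):
        for h in hypo_cands:
            if entails(list(combo), h):
                return 1
    return 0

# Toy stand-in 'prover': a hypothesis is entailed iff it matches a premise.
toy_entails = lambda premises, h: h in premises

print(epr_at_k([["P(a)", "Q(a)"], ["R(b)"]], ["Q(a)", "S(c)"], toy_entails))  # → 1
```

Averaging this score over a corpus would give EPR@K; plain EPR corresponds to K=1 (only the top translation per sentence).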
@jinulee_v
Jinu Lee
6 months
NL2FOL translation provides a reliable method for logical reasoning. However, its application is mostly limited to (near-)synthetic reasoning tasks. How can we improve an NL2FOL translator to catch the diverse semantics of natural language expressed in NLI tasks? (2/7)
1
0
1
@jinulee_v
Jinu Lee
6 months
I am happy to announce that my first-author paper has been accepted to ACL 2025 Main! 🇦🇹 https://t.co/LUnMGpueTF We tackle natural language-first-order logic translation (NL2FOL) using reinforcement learning with NLI labels as rewards. (1/7)
3
12
96
@saagnikkk
Sagnik Mukherjee
6 months
🚀Our ICML 2025 paper introduces "Premise-Augmented Reasoning Chains" - a structured approach to induce explicit dependencies in reasoning chains. By revealing the dependencies within chains, we significantly improve how LLM reasoning can be verified. 🧵[1/n]
1
25
74
@jinulee_v
Jinu Lee
6 months
📢 I will be presenting **SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning** at NAACL 2025! Poster: 5/1 2:00-3:30 Hall 3 Let's chat about neurosymbolic reasoning and reasoning evaluation! I will also attend the complex reasoning BoF😁 See you at ABQ!
@jinulee_v
Jinu Lee
10 months
I am happy to announce that my first-author paper is accepted to NAACL 2025 Main! Existing backward chaining (top-down reasoning) methods are incomplete, leading to suboptimal performance. We build SymBa, a complete neuro-symbolic backward chaining method using SLD-Resolution.
0
7
36
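As a rough illustration of the backward-chaining idea behind the tweet above (a propositional simplification; SymBa's SLD-Resolution also handles unification over variables, which this sketch omits):

```python
def backward_chain(goal, rules, facts, depth=20):
    """Minimal backward chaining over propositional Horn clauses:
    a goal is proved if it is a known fact, or if some rule whose
    head matches the goal has a fully provable body. The depth bound
    keeps the search from looping forever on cyclic rule sets."""
    if depth == 0:
        return False
    if goal in facts:
        return True
    return any(
        head == goal
        and all(backward_chain(b, rules, facts, depth - 1) for b in body)
        for head, body in rules
    )

# Each rule is (head, body): head holds if every body atom is proved.
rules = [("mortal", ["human"]), ("human", ["philosopher"])]
facts = {"philosopher"}

print(backward_chain("mortal", rules, facts))  # → True
```

A complete backward chainer must also handle recursive rules and variable bindings without missing proofs, which is the incompleteness gap in prior methods that the tweet refers to.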