Alan Li

@alanli2020

Followers 117 · Following 145 · Media 5 · Statuses 63

PhD @YaleNLP | undergrad @UWcse @nlpnoah | interned at @kotoba_tech

Joined November 2021
@alanli2020
Alan Li
1 year
Thank you @kotoba_tech and special thanks to @jungokasai and @noriyuki_kojima! Wonderful and rewarding experience in Tokyo for the summer, surrounded by such a passionate team of talented engineers. Always excited about Kotoba's next release and look forward to keeping in touch!
@kotoba_tech
Kotoba Technologies
1 year
Kotoba's former intern, Alan Li (@alanli2020), is starting his CS PhD at @Yale. Best of luck on your PhD journey, and we'll stay in touch!
Replies: 1 · Reposts: 2 · Likes: 6
@KeisukeKamahori
Keisuke Kamahori
26 days
I will be attending #EMNLP2025 this week to present LiteASR, a compression method for speech encoders (a collaborative work with @kotoba_tech). Catch our poster at the first poster session on Wednesday morning. Happy to chat about efficiency, speech, or both!
@bariskasikci
Baris Kasikci
3 months
🚀 Presenting LiteASR: a method that halves the compute cost of speech encoders by leveraging low-rank approximation of activations. LiteASR has been accepted to #EMNLP2025 (main) @emnlpmeeting
Replies: 1 · Reposts: 3 · Likes: 10
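The low-rank trick the tweet above describes can be sketched in a few lines of NumPy: project a linear layer's inputs onto the top principal components of some calibration activations, then fold that projection into the weights so one large matmul becomes two small ones. This is a hedged illustration of the general idea only; the shapes, the rank k=64, and the synthetic data are assumptions for the demo, not details from the LiteASR paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration activations that mostly live in a
# low-dimensional subspace (rank-64 signal plus small noise).
X = rng.normal(size=(1024, 64)) @ rng.normal(size=(64, 512))
X += 0.01 * rng.normal(size=X.shape)
W = rng.normal(size=(512, 512))      # weight of one linear layer

# PCA of the activations via SVD: keep the top-k right singular vectors.
k = 64
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Pk = Vt[:k].T                        # (512, k) projection basis

# Factor the layer: X @ W ≈ (X @ A) @ B, with A = Pk and B = Pk.T @ W.
A = Pk                               # (512, k)
B = Pk.T @ W                         # (k, 512)

rel_err = np.linalg.norm(X @ W - (X @ A) @ B) / np.linalg.norm(X @ W)
print(f"rank-{k} relative error: {rel_err:.4f}")
```

Because the activations are (nearly) rank-64, the factored layer reproduces the full output almost exactly while the matmul cost drops from n·d·d to 2·n·d·k, a saving whenever k < d/2 (here 64 < 256).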
@ed_li_tianjin
Ed Li
1 month
As PhD students, we believe research automation systems should belong to everyone, not just Google, so we built freephdlabor. Customize your multi-agent system for end-to-end research that WORKS FOR YOUR DOMAIN within hours. full source code: https://t.co/NkiFnLwqVM
Replies: 3 · Reposts: 5 · Likes: 8
@YilunZhao_NLP
Yilun Zhao
1 month
If you are at #ICCV2025, the Knowledge-Intensive Multimodal Reasoning Workshop is about to start in Room 313C!
@armancohan
Arman Cohan
1 month
If you are at #ICCV2025, join us today for the multimodal reasoning workshop! We have an amazing lineup of speakers and an exciting panel on the future of multimodal reasoning!
Replies: 0 · Reposts: 7 · Likes: 18
@alanli2020
Alan Li
3 months
Love the thread, thank you Rohan!
@rohanpaul_ai
Rohan Paul
3 months
New Harvard+Yale paper says strong reasoning helps, but accessing the right knowledge first is what really limits performance. So knowledge recall is the main bottleneck in scientific problem solving with LLMs. They build the benchmark suites SCIREAS and SCIREAS‑PRO to measure
Replies: 1 · Reposts: 1 · Likes: 5
@alanli2020
Alan Li
3 months
9/9 Thank you to all collaborators! @YixinLiu17 @arpsark @_DougDowney @armancohan
Replies: 0 · Reposts: 0 · Likes: 4
@alanli2020
Alan Li
3 months
8/9 This work is a collaboration between YaleNLP @yalenlp and Ai2 @allen_ai . Code/benchmark 📈 https://t.co/uCVKwpXhvl. Paper: 📄 https://t.co/XP8011DqsU Models: 🤗
Replies: 1 · Reposts: 0 · Likes: 3
@alanli2020
Alan Li
3 months
7/9 Takeaways:
- Knowledge access remains a bottleneck.
- Reasoning improves knowledge recall, even w/o knowledge injection.
- Best results: reasoners + external knowledge.
- Practitioners: run task-specific evals for cost-efficient large-scale application.
Replies: 1 · Reposts: 0 · Likes: 2
@alanli2020
Alan Li
3 months
6/9 Finally, building on our investigation, we release SciLit01, a strong 8B baseline model SFTed from Qwen3-Base using our Math+STEM data composition. Our data composition is competitive on scientific reasoning among concurrent reasoning-enhancement SFT efforts.
Replies: 1 · Reposts: 0 · Likes: 2
@alanli2020
Alan Li
3 months
5/9 We SFT models in a controlled way on different sources of data. Using KRUX, we find: (i) retrieving task-relevant knowledge from parameters is a key bottleneck; (ii) reasoning-fine-tuned models show complementary gains from explicit knowledge access; (iii) Long CoT
Replies: 1 · Reposts: 0 · Likes: 2
@alanli2020
Alan Li
3 months
4/9 Next, we design a new framework to study the separate roles of knowledge vs. reasoning: KRUX (Knowledge & Reasoning Utilization eXams). It pulls atomic “knowledge ingredients” (KIs) from reasoning traces, prepends those KIs to the original question, and tests another model. KRUX
Replies: 1 · Reposts: 0 · Likes: 2
@alanli2020
Alan Li
3 months
3/9 Evaluating frontier models on SciReas, we observe patterns that would otherwise remain obscure when looking only at individual benchmarks. Different LLMs have expertise in different tasks, and even the same LLM can show significant performance gaps under different reasoning
Replies: 1 · Reposts: 0 · Likes: 2
@alanli2020
Alan Li
3 months
2/9 We introduce SciReas and SciReas-Pro: efficient, comprehensive, and reasoning-focused benchmarks for evaluating scientific problem-solving.
Replies: 1 · Reposts: 0 · Likes: 2
@alanli2020
Alan Li
3 months
1/9 🚀 New paper: Demystifying Scientific Problem-Solving in LLMs — How does reasoning enhancement affect knowledge recall, and do LLMs benefit from external knowledge complementary to reasoning? TL;DR: 📊 SciReas: a holistic and efficient evaluation suite for scientific reasoning
Replies: 1 · Reposts: 2 · Likes: 16
@alanli2020
Alan Li
3 months
Update: it’s happening at 2 PM! Come and join us for an exciting journey!
@AsafYehudai
Asaf Yehudai
3 months
Today at 4 PM, we’re presenting our tutorial: “Evaluating LLM-based Agents: Foundations, Best Practices, & Open Challenges” If you’re in Montreal for @IJCAIconf, come join us to dive into the future of #AgentEvaluation! 🇨🇦🤖 w. @RoyBarHaim @LilachEdel and @alanli2020
Replies: 0 · Reposts: 0 · Likes: 6
@armancohan
Arman Cohan
5 months
Excited for the release of SciArena with @allen_ai! LLMs are now an integral part of research workflows, and SciArena helps measure progress on scientific literature tasks. Also checkout the preprint for a lot more results/analyses. Led by: @YilunZhao_NLP, @kaiyan_z 📄 paper:
@allen_ai
Ai2
5 months
Introducing SciArena, a platform for benchmarking models across scientific literature tasks. Inspired by Chatbot Arena, SciArena applies a crowdsourced LLM evaluation approach to the scientific domain. 🧵
Replies: 1 · Reposts: 12 · Likes: 82
@HanSineng
Sophia S. Han @ NeurIPS
5 months
Excited to see more investigation into LLM creativity. We have some pioneering work on this topic as well: Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models. https://t.co/QNyQp1Zs80.
@YiyouSun
Yiyou Sun
5 months
🚨 New study on LLM's reasoning boundary! Can LLMs really think out of the box? We introduce OMEGA—a benchmark probing how they generalize: 🔹 RL boosts accuracy on slightly harder problems with familiar strategies, 🔹 but struggles with creative leaps & strategy composition. 👇
Replies: 0 · Reposts: 10 · Likes: 18
@pybeebee
Gabrielle Kaili-May Liu
6 months
🔥 Excited to share MetaFaith: Understanding and Improving Faithful Natural Language Uncertainty Expression in LLMs🔥 How can we make LLMs talk about uncertainty in a way that truly reflects what they internally "know"? Check out our new preprint to find out! Details in 🧵(1/n):
Replies: 2 · Reposts: 4 · Likes: 13
@HanSineng
Sophia S. Han @ NeurIPS
6 months
Besides natural language and formal language, truth tables are also a great medium for logical reasoning, with a synergistic effect. Check out this cool idea from @LichangChen2!
@omarsar0
elvis
6 months
Learn to Reason via Mixture-of-Thought Interesting paper to improve LLM reasoning utilizing multiple reasoning modalities: - code - natural language - symbolic (truth-table) representations Cool idea and nice results. My notes below:
Replies: 2 · Reposts: 5 · Likes: 17
@overleaf
Overleaf
7 months
⚠️ Attention: The site is currently down. Our engineering team is investigating. We will update as soon as possible. You can track progress here: https://t.co/y7aRh5SBN4 Sorry for any inconvenience.
Replies: 221 · Reposts: 202 · Likes: 788
@hskhalaf
Hadi Khalaf
8 months
Happy to share we received the best paper award at the NENLP workshop at Yale 🥳🥳! TL;DR: current alignment methods give excessive discretion to annotators in defining what good behavior means. This means we don't know what we are aligning to ‼️ We formalize discretion in alignment and
Replies: 3 · Reposts: 4 · Likes: 23