Alan Li
@alanli2020
Followers 117 · Following 145 · Media 5 · Statuses 63
PhD @YaleNLP | undergrad @UWcse @nlpnoah | interned at @kotoba_tech
Joined November 2021
Thank you @kotoba_tech and special thanks to @jungokasai and @noriyuki_kojima! Wonderful and rewarding experience in Tokyo for the summer, surrounded by such a passionate team of talented engineers. Always excited about Kotoba's next release and look forward to keeping in touch!
Kotoba's former intern, Alan Li (@alanli2020), is starting his CS PhD at @Yale. Best of luck on your PhD journey, and we'll stay in touch!
I will be attending #EMNLP2025 this week to present LiteASR, a compression method for speech encoders (a collaborative work with @kotoba_tech). Catch our poster at the first poster session on Wednesday morning. Happy to chat about efficiency, speech, or both!
🚀 Presenting LiteASR: a method that cuts the compute cost of speech encoders by 2x, leveraging low-rank approximation of activations. LiteASR is accepted to #EMNLP2025 (main) @emnlpmeeting
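The core idea behind activation-aware low-rank compression can be illustrated in a few lines of numpy. This is not the paper's exact algorithm, just a minimal sketch: use calibration activations to find the subspace they actually occupy, then replace one large matmul with two smaller ones through that subspace. The function name and shapes are illustrative assumptions.

```python
import numpy as np

def low_rank_factorize(W, X, rank):
    """Approximate a linear layer y = X @ W with two smaller matmuls,
    using the principal subspace of the calibration activations X.

    W: (d_in, d_out) weight matrix
    X: (n, d_in) calibration activations
    Returns (A, B) with A: (d_in, rank), B: (rank, d_out),
    so that X @ A @ B ≈ X @ W.
    """
    # Top-`rank` right singular vectors span the activations' principal subspace.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:rank].T        # (d_in, rank) projection into the subspace
    A = P                  # project the input down to `rank` dims
    B = P.T @ W            # push the projection through the original weights
    return A, B

# Toy usage: activations with intrinsic rank 32 through a 256x256 layer.
rng = np.random.default_rng(0)
X = rng.standard_normal((1024, 32)) @ rng.standard_normal((32, 256))
W = rng.standard_normal((256, 256))
A, B = low_rank_factorize(W, X, rank=32)
err = np.linalg.norm(X @ A @ B - X @ W) / np.linalg.norm(X @ W)
```

The original layer costs n·d_in·d_out multiply-adds per batch; the factorized version costs n·d_in·r + n·r·d_out, which is roughly a 2x saving when r ≈ d_in/4 and d_in = d_out.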
As PhD students, we believe research automation systems should belong to everyone, not just Google, so we built freephdlabor. Customize your multi-agent system for end-to-end research that WORKS FOR YOUR DOMAIN within hours. full source code: https://t.co/NkiFnLwqVM
Love the thread, thank you Rohan!
New Harvard+Yale paper says strong reasoning helps, but accessing the right knowledge first is what really limits performance. So knowledge recall is the main bottleneck in scientific problem solving with LLMs. They build benchmark suites SCIREAS and SCIREAS-PRO to measure
8/9 This work is a collaboration between YaleNLP @yalenlp and Ai2 @allen_ai. Code/benchmark: 📈 https://t.co/uCVKwpXhvl Paper: 📄 https://t.co/XP8011DqsU Models: 🤗 huggingface.co
7/9 Takeaways:
- Knowledge access remains a bottleneck.
- Reasoning improves knowledge recall, even w/o knowledge injection.
- Best results: reasoners + external knowledge.
- Practitioners: run task-specific evals for cost-efficient large-scale application.
6/9 Finally, learning from our investigation, we release SciLit01, a strong 8B baseline model SFTed from Qwen3-Base using our Math+STEM data composition, which is competitive on scientific reasoning among concurrent reasoning-enhancement SFT efforts.
5/9 We SFT models in a controlled way on different sources of data. Using KRUX we find: (i) Retrieving task-relevant knowledge from parameters is a key bottleneck; (ii) Reasoning-fine-tuned models show complementary gains from explicit knowledge access; (iii) Long CoT
4/9 Next, we design a new framework to study the separate roles of knowledge vs. reasoning: KRUX (Knowledge & Reasoning Utilization eXams). It pulls atomic “knowledge ingredients” (KIs) from reasoning traces. Prepend those KIs to the original question and test another model. KRUX
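The prepend-and-test step described above is just prompt construction. A minimal sketch, assuming a simple list-of-facts format (the actual KRUX prompt template is not specified in the thread, so the wording below is a hypothetical illustration):

```python
def build_krux_prompt(question, knowledge_ingredients):
    """Prepend atomic knowledge ingredients (KIs) extracted from one model's
    reasoning trace to a question, so another model can be tested with the
    relevant facts already in context."""
    ki_block = "\n".join(f"- {ki}" for ki in knowledge_ingredients)
    return (
        "Relevant facts:\n"
        f"{ki_block}\n\n"
        f"Question: {question}\n"
        "Answer with step-by-step reasoning."
    )

prompt = build_krux_prompt(
    "What is the boiling point of water at 2 atm?",
    ["Boiling point rises with pressure.",
     "At 2 atm, water boils near 120 °C."],
)
```

Comparing a model's accuracy with and without the KI block isolates knowledge access from reasoning ability.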
3/9 Evaluating frontier models on SciReas, we observe patterns that otherwise remain obscure when looking only at individual benchmarks. Different LLMs may have expertise in different tasks; even the same LLM can show significant performance gaps under different reasoning
2/9 We introduce SciReas and SciReas-Pro, efficient, comprehensive, and reasoning-focused benchmarks for evaluating scientific problem-solving.
1/9 🚀 New paper: Demystifying Scientific Problem-Solving in LLMs. How does reasoning enhancement affect knowledge recall, and do LLMs benefit from external knowledge complementary to reasoning? TL;DR: 📊 SciReas: holistic and efficient evaluation suite for scientific reasoning
Update: It’s happening at 2 PM! Exciting journey, come and join us!
Today at 4 PM, we’re presenting our tutorial: “Evaluating LLM-based Agents: Foundations, Best Practices, & Open Challenges” If you’re in Montreal for @IJCAIconf, come join us to dive into the future of #AgentEvaluation! 🇨🇦🤖 w. @RoyBarHaim @LilachEdel and @alanli2020
Excited for the release of SciArena with @allen_ai! LLMs are now an integral part of research workflows, and SciArena helps measure progress on scientific literature tasks. Also check out the preprint for a lot more results and analyses. Led by: @YilunZhao_NLP, @kaiyan_z 📄 paper:
Introducing SciArena, a platform for benchmarking models across scientific literature tasks. Inspired by Chatbot Arena, SciArena applies a crowdsourced LLM evaluation approach to the scientific domain. 🧵
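Chatbot-Arena-style platforms typically turn crowdsourced pairwise votes into a leaderboard with an Elo or Bradley-Terry model. A minimal Elo sketch, assuming the standard update rule (not SciArena's exact aggregation method):

```python
def elo_update(r_a, r_b, winner, k=32):
    """One Elo update from a single pairwise preference vote.
    winner is 'a' or 'b'; k controls the step size."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))
    score_a = 1.0 if winner == "a" else 0.0
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# Toy leaderboard: model_x wins two votes, loses one.
ratings = {"model_x": 1000.0, "model_y": 1000.0}
votes = [("model_x", "model_y", "a"),
         ("model_x", "model_y", "a"),
         ("model_x", "model_y", "b")]
for a, b, w in votes:
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], w)
```

Because the two updates are symmetric, total rating mass is conserved; a model that wins the majority of its comparisons drifts above its opponents.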
Excited to see more investigation into LLM creativity. We have some pioneering work on this topic as well: Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models. https://t.co/QNyQp1Zs80.
🚨 New study on LLM's reasoning boundary! Can LLMs really think out of the box? We introduce OMEGA—a benchmark probing how they generalize: 🔹 RL boosts accuracy on slightly harder problems with familiar strategies, 🔹 but struggles with creative leaps & strategy composition. 👇
🔥 Excited to share MetaFaith: Understanding and Improving Faithful Natural Language Uncertainty Expression in LLMs🔥 How can we make LLMs talk about uncertainty in a way that truly reflects what they internally "know"? Check out our new preprint to find out! Details in 🧵(1/n):
Besides natural language and formal language, truth tables are also a great medium for logical reasoning, with a synergistic effect. Check out this cool idea from @LichangChen2!
Learn to Reason via Mixture-of-Thought Interesting paper to improve LLM reasoning utilizing multiple reasoning modalities: - code - natural language - symbolic (truth-table) representations Cool idea and nice results. My notes below:
⚠️ Attention: The site is currently down. Our engineering team is investigating. We will update as soon as possible. You can track progress here: https://t.co/y7aRh5SBN4 Sorry for any inconvenience.
Happy to share we received best paper at NENLP workshop at Yale 🥳🥳! tldr: Current alignment methods give excessive discretion to annotators in defining what good behavior means. This means we don't know what we are aligning to ‼️ We formalize discretion in alignment and