Nan Jiang
@nanjiangwill
Followers
84
Following
128
Media
5
Statuses
31
building stuff
San Francisco, CA
Joined January 2018
This paper by Ivan Lee (@ivn1e) & @BergKirkpatrick was great! Best thing I’ve seen at #COLM2025 so far! Readability ≠ Learnability: Rethinking the Role of Simplicity in Training Small Language Models https://t.co/lDpWWKaBLg
openreview.net
Recent studies suggest that very small language models (SLMs) can generate surprisingly coherent text when trained on simplified, child-directed corpora such as TinyStories. These findings have...
5
34
274
📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks https://t.co/nJItUuYKWZ
arxiv.org
The attention mechanism is a core primitive in modern large language models (LLMs) and AI more broadly. Since attention by itself is permutation-invariant, position encoding is essential for...
9
94
542
Some personal news: I'll join @UMassAmherst CS as an assistant professor in fall 2026. Until then, I'll postdoc at @Meta NYC. Reasoning will continue to be my main interest, with a focus on data-centric approaches🤩 If you're also interested, apply to work with me (PhDs & a postdoc)!
94
31
852
amazing Jason, amazing Nexad, please check this out!
Let’s be real—ads have annoyed me for years. Pop-ups, spam, etc… while the world is moving towards AGI, the ad world felt stuck in the past. So I decided to flip the script. Today, I’m proud to share: Nexad has raised a $6M seed round, led by @a16z SR04, @Prosus_Ventures ,
1
0
1
Coding agents can debug their own outputs, but what if none of the fixes are correct? We overcome sparse rewards by making them continuous📈 Instead of having binary execution rewards, we introduce a learned verifier to measure how close the current solution is to a correct one📏
2
26
208
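The idea in the tweet above, replacing a sparse binary execution reward with a dense one informed by a learned verifier, can be sketched as a toy illustration (the function names and the blending weights here are my assumptions, not the paper's):

```python
# Toy sketch: dense reward for a coding agent instead of all-or-nothing.

def binary_reward(passed: list[bool]) -> float:
    """Sparse reward: 1.0 only if every test passes, else 0.0."""
    return 1.0 if all(passed) else 0.0

def continuous_reward(passed: list[bool], verifier_score: float) -> float:
    """Dense reward: fraction of tests passed, blended with a learned
    verifier's estimate (0..1) of how close the solution is to correct."""
    frac = sum(passed) / len(passed) if passed else 0.0
    return 0.5 * frac + 0.5 * verifier_score

# A partially correct fix now receives a nonzero learning signal:
print(binary_reward([True, True, False]))           # 0.0
print(continuous_reward([True, True, False], 0.8))  # ~0.733
```

The point is only the shape of the signal: a solution that passes two of three tests is no longer indistinguishable from one that passes none.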
I teach a class where students code up an ML library from scratch in Python. Wenting showed me that a Claude Agent (with interactive unit test feedback and the spec) could solve it 100%. We thought it would be fun to scale this idea to every Python library in the world.
Introducing the commit0 interactive environment for coding agents. Challenge: generate Python libraries from scratch. Commit0 is designed with interactivity, dependencies, and specifications as first-class considerations. We include a benchmark with 50+ challenging libraries.
8
37
392
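The "interactive unit test feedback" loop described above can be sketched as a minimal agent cycle: run the tests, and if any fail, hand the failures back to the model for another attempt. Everything here is illustrative (`propose_fix` stands in for an LLM call; the toy spec is not from commit0):

```python
# Toy sketch of an interactive coding-agent loop with unit-test feedback.

def run_tests(impl):
    """Toy spec: impl(x) should double x. Returns (name, passed) pairs."""
    return [("doubles_1", impl(1) == 2), ("doubles_0", impl(0) == 0)]

def propose_fix(failures):
    """Stand-in for an LLM call that sees the failing test names."""
    return lambda x: x * 2

def agent_loop(impl, max_iters=3):
    for _ in range(max_iters):
        results = run_tests(impl)
        failures = [name for name, ok in results if not ok]
        if not failures:
            return impl            # all tests pass; done
        impl = propose_fix(failures)  # feed failures back to the model
    return impl

final = agent_loop(lambda x: x + 1)   # buggy initial attempt
print(all(ok for _, ok in run_tests(final)))  # True
```

The benchmark's interesting twist is scale: the same loop, but against the specs and test suites of 50+ real Python libraries.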
So... can agents now build a package from scratch? Test them on Commit0! This is an amazing and fun project this summer! Huge thanks to Wenting and to everyone in the lab for their support and guidance! 🚀👏
0
2
9
Handling long context in LLMs is expensive, but can we cut the cost by learning them offline for a specific set/genre of documents? Introducing LLoCO, our new technique that learns documents offline through context compression and in-domain finetuning using LoRA, which achieves
7
52
262
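The offline-compression idea in the tweet above can be illustrated with a toy stand-in: pool a long document's token embeddings into a few summary vectors once, offline, so later queries attend over far fewer positions. This is only a sketch of the shape of the technique (mean-pooling stands in for LLoCO's learned context encoder):

```python
# Toy sketch of offline context compression (illustrative, not LLoCO's code).
import numpy as np

def compress(doc_embeddings: np.ndarray, n_summary: int) -> np.ndarray:
    """Offline step: pool chunks of token embeddings into summary vectors
    (a stand-in for a learned context encoder)."""
    chunks = np.array_split(doc_embeddings, n_summary)
    return np.stack([c.mean(axis=0) for c in chunks])

rng = np.random.default_rng(0)
doc = rng.normal(size=(4096, 64))      # 4096 "tokens", embedding dim 64
summary = compress(doc, n_summary=32)  # 128x fewer vectors to attend over
print(summary.shape)                   # (32, 64)
```

Since attention cost grows with context length, shrinking 4096 positions to 32 at inference time is where the savings come from; the in-domain LoRA finetuning then teaches the model to read the compressed form.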
🚀 Introducing RouterBench, the first comprehensive benchmark for evaluating LLM routers! 🎉 In a collaboration between @withmartian and Prof. @KurtKeutzer at @UCBerkeley, we've created the first holistic framework to assess LLM routing systems. 🧵1/8 To read more:
8
30
132
1/ Today in Science, we train a neural net from scratch through the eyes and ears of one child. The model learns to map words to visual referents, showing how grounded language learning from just one child's perspective is possible with today's AI tools. https://t.co/hPZiiQt6Vv
50
681
3K
We're excited to contribute to the exploration of alternative architectures and emergent capabilities!! 🎉🎉🎉 Huge congrats and many thanks to Ivan Lee and Prof. Taylor Berg-Kirkpatrick @BergKirkpatrick 🧵[9/9]
0
0
1
Section 3.1: A Simple Few-Shot Natural Language Task 1) Stronger models tend to perform worse when they cannot rely on semantics 2) Most architectures fail in the flipped setting, while Hyena performs best among the models that are not pre-trained. 🧵[8/9]
1
0
1
Section 3: ICL in the real world: LM and Commonsense Reasoning 1) Most architectures exhibit an abrupt improvement in ICL score 2) Models in the same family behave similarly 3) We use these models, trained from scratch, to perform a generation task; see Section 3.1 🧵[7/9]
1
0
1
Section 2: Effects of Training Data Distribution on Omniglot 1) ICL does not emerge when trained on purely non-bursty examples. 2) Some architectures (Llama 2, GPT-2, Hyena, H3, RWKV) are predisposed towards ICL, and others are predisposed towards memorization. 🧵[6/9]
1
0
2
Section 1: ICL on Associative Recall, Linear Regression, and Multiclass Classification 1) All architectures can perform ICL 2) Many architectures are poor at extrapolation, while some attention alternatives are good at it 3) Many models are comparable to the transformer 🧵[5/9]
1
0
2
We find that 1) all architectures can perform ICL under certain conditions, and 2) emerging attention alternatives with sub-quadratic time and memory complexity are more robust in-context learners than transformers. 🧵[4/9]
1
0
3
We evaluate 12 architectures from 4 families across a suite of synthetic ICL tasks. 🧵[3/9]
1
0
2
Background: While transformers demonstrate impressive in-context learning (ICL) capabilities, they face challenges such as quadratic time complexity and increased memory demand. So, could other architectures effectively perform ICL? 🧵[2/9]
1
0
1
❓Are attention-based models needed for In-Context Learning (ICL)? 🤔Can emerging architectures perform ICL? 🎉Check out our #ICLR2024 paper "Exploring the Relationship Between Model Architecture and In-Context Learning Ability" 🎉 #LLM Paper: https://t.co/NzCMVRnPWg 🧵[1/9]
arxiv.org
What is the relationship between model architecture and the ability to perform in-context learning? In this empirical study, we take the first steps toward answering this question. We evaluate...
1
29
123
🚨 #GPT4 doesn't understand the code/specification written by itself!? 🚨 🥳 Check out our #ICLR2024 paper "Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain" 🥳 #LLM Paper: https://t.co/Mu7MTvd6t7 Code: https://t.co/aPQL0GptPB [1/6]
10
5
12
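The self-consistency claim above is a round-trip property: if a model writes code from its own specification, then describes that code, then codes it up again, it should land back where it started. A toy sketch of that check (my reading of the idea; the stand-in "model" functions are illustrative, not IdentityChain's API):

```python
# Toy sketch of a round-trip self-consistency check for a code model.

def code_from_spec(spec: str) -> str:
    """Stand-in for an LLM generating code from a specification."""
    return "def f(x): return x * 2" if "double" in spec else "def f(x): return x"

def spec_from_code(code: str) -> str:
    """Stand-in for an LLM describing code in natural language."""
    return "double the input" if "* 2" in code else "return the input"

def self_consistent(spec: str) -> bool:
    """The model is self-consistent on `spec` if regenerating code from
    its own description of its own code reproduces the same code."""
    code1 = code_from_spec(spec)
    spec2 = spec_from_code(code1)
    code2 = code_from_spec(spec2)
    return code1 == code2

print(self_consistent("double the input"))  # True
```

With a real model in place of the stand-ins, a failed round trip is exactly the "doesn't understand its own code" behavior the tweet describes.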