Learning Q* with
+ poly-sized exploratory data
+ an arbitrary Q-class that contains Q*
...has seemed impossible for yrs, or so I believed when I talked at
@RLtheory
2mo ago.
And what's the saying? Impossible is NOTHING
Exciting new work w/
@tengyangx
! 1/
after consulting my colleagues, I decided to make my 598 lectures publicly available. The video links can be found on the course website or from this list (). We just started proofs of VI and PI; check it out if you are interested in a stat theory of RL!
Alekh,
@ShamKakade6
and I have a (quite drafty) monograph on rl theory . I am also teaching a phd seminar course on this topic (w/ recordings): ; just did 1st lec 2h ago! still figuring out if I can share the videos publicly...
I received the NSF CAREER award. Each submission was a month+ effort and I'm glad I got it the 2nd time.
Also, the detailed reviews & the process were not as delightful as the decision. Some experience & thoughts below: 1/
The entirety of RL theory is built on objects like V^π, Q*, π*, T (Bellman up. op.), etc... until you realize that this foundation is quite shaky. Spoiler: no big deal (yet), but thinking thru this is super useful for resolving some confusions. (1/x)
once
@ylecun
told me (heavily paraphrased), it's not F=ma but \min (F-ma)^2. I didn't realize its importance at the time, but it is perhaps the most enlightening perspective I've ever heard.
this paper got Outstanding Paper Award! Congrats to my coauthors (esp. Ching-An and Tengyang). More reasons to check out the details!
List of all paper awards:
Tmr
@icmlconf
2:15pm R301, Ching-An will present our ATAC alg: w/ a clever transformation by PD lemma, we turn initial-state pessimistic term from our prior work into *relative* pess and smoothly bridge IL & offline RL, with robust improvement guarantees.
@thienan496
@quocleix
@Miles_Brundage
@mpd37
We have a monograph on deep reinforcement learning () which covers some of the recent work. Otherwise, much of the non-deep RL work is theory, in which case I am not the expert but perhaps
@nanjiang_cs
has suggestions.
Paper I've wanted to share for a while: model-free RL w/o value fns, but w/ *density estimators*! Featuring very unique *double-chain* error induction to overcome seemingly inevitable error exponentiation. Jt w/ students Audrey Huang and Jinglin Chen 1/
As semester draws to end, I want to share this *identity* (h/t
@tengyangx
) that connects so many fundamental pieces of the RL theory together: optimism, pessimism, policy opt, proved by PD lemma + Bellman-error telescoping, all in one equation! 1/3
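For context, the two standard ingredients named above, in generic notation (the actual one-equation identity is in the linked thread, not reproduced here):

```latex
% Performance difference lemma (Kakade & Langford):
J(\pi) - J(\pi')
  \;=\; \frac{1}{1-\gamma}\,
        \mathbb{E}_{(s,a)\sim d^{\pi}}\!\left[A^{\pi'}(s,a)\right].

% Bellman-error telescoping: for any function f and policy \pi,
f(s_0, \pi) - J(\pi)
  \;=\; \sum_{t\ge 0} \gamma^t\,
        \mathbb{E}_{(s_t,a_t)\sim \pi}\!\left[f(s_t,a_t) - (\mathcal{T}^{\pi} f)(s_t,a_t)\right].
```

Both hold for arbitrary f and policies; the telescoping follows by expanding \mathcal{T}^{\pi} f and cancelling successive \gamma^t f terms.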
In a few years the next gen of young researchers will find you all weird using the word "agent" in RL as it is supposed to be a dedicated terminology for LLM agents 🫠
ICML results out! 3/4 acc (congrats to students; thread later).
And
@tengyangx
eventually got a rejection after all. I was worried about whether I should graduate him, like c'mon, how can a PhD be complete w/o rejections 😜.
Now such a relief 😆
had a very intriguing conversation w/ Alyosha Efros who visited us. agreed on many issues but also debated quite a bit on "RL tests on training data hence overfits". thought it's a good time to organize my thoughts on this... bottom line: the statement is wrong if taken literally.
2 papers accepted to
#icml2020
! the MWQL one I have tweeted about quite a bit b4. the other one is an interesting connection between variance reduction in IS for OPE and that in PG—guess what, they are the same thing! w/ my student Jiawei Huang. congrats Jiawei!
@neu_rips
the 1001st way to derive PG (originally by Jie Tang &
@pabbeel
here ). turns out you can also derive its entire var reduction family this way... and a new estimator that subsumes most previous ones pops up in this process!
Our ICML paper () is online! We revisit core assumptions in the analysis of batch RL (ADP) algorithms and ask whether they are inevitable & hold in interesting settings. 1/x
The densest paper I've written in a while: . My fav part is how a new world pops up when you swap the roles of importance weights & value functions in the "breaking the curse of horizon" method (Liu,
@LihongLi20
et al) 😲 (1/x)
super lucky to get this tea with a mere 30min wait (peak wait time can be ~4h). pretty sure any other milk tea will feel basically tasteless for a while…
An often confused point:
Worst-case regret minimization & return maximization are 𝐧𝐨𝐭 the same in offline RL!
This is perhaps retrospectively obvious (see🧵below), but do you know there are 𝐢𝐧𝐟𝐢𝐧𝐢𝐭𝐞𝐥𝐲 𝐦𝐚𝐧𝐲 alternatives to regret min and return max? 1/x
that feeling of "ok, I'm now considered someone who gives ok talks" when your advisor, who used to stop students' practice talks (mine no exception) within the first 3 slides, praises your presentation 😅
can't thank Satinder enough tho for the communication skills I learned from him
#icml2020
causal RL tutorial is interesting! quick notes: (1) combine confounded offline data + online exploration: identify the lower/upper bound of treatment effect from offline data and use it to refine model space (keep those whose predicted effect is in range).
Coverage is the core concept in offline RL, and in MDPs we use state density ratios… but what is the right concept for POMDPs?
Extremely proud of this ICML *rejection* where we discover the right coverage condition for model-free OPE in POMDPs!
1/
Causal inf community: am I missing something super basic? Claim: if behavior/logging policy only depends on observables, then there is no confounding whatsoever, no???
rev claims that reward depending on the latent state creates confounding. AC doubles down and further claims following
Will
#neurips
provide free reg & hotel for top reviewers?
@kchonyc
My student Jinglin Chen is a top reviewer (his *3rd* (!) reviewer award at neurips), has a 1st-author paper at main conf, and is not given travel award 🙃
writing teaching statement for 3rd yr rev. thought it'd be painful and useless. turns out it brought up nice memories I'd like to share!
in fa18, when I taught RL thry the 1st time, a student frequently challenged me like: "practice doesn't work acc to ur thry. is this really relevant?" 1/
moved to new house on thursday and had no internet. At some point I was prepared to give the
@RLtheory
talk with my neighbor’s wifi in the yard, holding an umbrella as sun shade... now that the internet is fixed, I’m sry u guys will miss that fun part :P
En route to
#ICML2023
. My first in-person big conf ever since the pandemic. Last time I did this I had just started my faculty job and still felt like a PhD student :) Looking fwd to meeting old & new friends in RL & its theory. Happy to chat and you can also find me at the posters.
After yrs I am finally gonna teach regret min in linear MDPs properly...!
A long note, but most of it is "tech prep" on topics of relevance outside RL (eg elliptical potential). The core analysis is surprisingly short: merely *2 pgs* (excl standard covering arg)!
paper accepted to neurips! we are also changing terminology ("confidence interval" to "value interval") to avoid possible confusions pointed out by the reviewers.
Previously we split minimax OPE into 2 styles (value-learning, in addition to existing weight/ratio-learning), and now it's time to merge them back---a surprising byproduct when we try to quantify bias and relax realizability of these methods: (1/3)
Still, the most insightful slide in all artificial intelligence introductions, if you ask me
(From David Silver's 2015 Introduction to Reinforcement Learning)
prepping for a lecture on LP for MDPs and shocked: it is said the dual constraint characterizes the occupancies of all stat policies, and I was always under the impression that non-stat/history-dep policies might induce occupancies outside this space.
turns out… no?? (1/x)
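For reference, the constraint set in question (discounted, finite MDP; my notation, with initial distribution ν and discount γ) is the occupancy-measure polytope:

```latex
% Dual LP feasibility set over occupancy measures d(s,a):
d(s,a) \ge 0 \quad \forall s,a, \qquad
\sum_a d(s,a)
  \;=\; (1-\gamma)\,\nu(s)
  \;+\; \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a')
  \quad \forall s.
```

The classical resolution: any feasible d is exactly the normalized occupancy of the stationary policy π(a|s) = d(s,a) / Σ_{a'} d(s,a'), so even history-dependent policies cannot induce occupancies outside this polytope.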
Re planning w/ a representation learned w/o reconstruction loss:
The discussion (not specifically here, but more general in the community) will be so much more informed if everyone knows what a bisimulation is.
Yann is advocating Model predictive control in a latent space , which is learned without a reconstruction loss, as a way to solve planning, and get truly controllable behavior. I agree.
recent paper accepted to
#UAI2020
w/ my student
@tengyangx
, on how Bellman error minimization style algorithms for learning Q* save you a factor of horizon in error prop and give you a more straightforwardly defined concentrability coeff compared to AVI.
General learnability conditions for offline & online RL have been better understood in recent years, tho mostly in parallel. In , we show an interesting connection: the good-old "concentrability" from offline RL implies online learnability!
A harsh piece of advice I got during my PhD: "No one is obliged to read a poorly written doc unless you proved P!=NP."
Write the draft, let it sit for a while, read it and edit the parts that are nonsense to ppl other than the authors, and iterate a couple times before submission.
...seems a luxury these days?
after getting stuck on q1 for ~2 weeks, found a surprisingly simple & elegant proof: see bottom of
All other ans (incl. in a diff thread) are complicated with unknown dim-dependent const, while this is a few lines & elementary. yet almost no upvotes???
Concentration ineq twitter(?): in the setting of linear reg (X in R^d, Y in R, Σ=E[XX^T], ||X|| and |Y| bounded), I want to bound the estimation errors of the plug-in estimators for
1. Σ^{½}
2. Σ^{-½} E[XY]
w/o paying σ_min(Σ) or alike. Pointers plz (ideally ready to use...)!
My talk at MSR is online now! on our findings and open problems in figuring out minimal assumptions that enable theoretical guarantees for RL. talk was meant to offer a minimalist view of RL accessible to learning theoreticians or even TCS audience. (1/2)
me: I _really_ need to start writing this offline RL theory survey I agreed to.
Also me:
* get into rabbit hole with authors in ICML AC batch
* play with OPE code
* tweak visualization until satisfaction
🫠
I am telling this to many ppl recently, that I can't believe I missed this technical point for so long...
What's the right notion of coverage in linear MDP? Poll below!
A thread that discusses the nuances, connections to OOD/mean matching, and subtle (open?) questions... 1/
I am co-organizing an ICERM virtual workshop on theory and algos for Deep RL on Aug 2-4, with Sanjay Shakkottai, R Srikant, and Mengdi Wang. You can check out the line-up of speakers & the tentative schedule and register for the event at:
Prospective student interested in RL4edu said he's scared of meeting w/ me in 2 ways: b4, he thought I'd kick him out the moment he mentions the word "applied"; after, he's scared of my enthusiasm.
Oh am I SUPER HYPED when it’s RL for *real* x instead of RL for simulator of x. (1/x)
I am surprised by how many people showed up in the poster session for the DR-PG paper () and that we had an hour long in-depth discussion! (esp. given that I forgot to tweet about it...😂) thanks everyone, and this is an amazing night!
wonderful talks in the morning! what I *particularly* liked is that these talks not only tell you how well their methods worked, but also *when they will fail*, both by theoretical reasoning and simple and intuitive examples, which I feel is missing in many deep RL papers
Boarding flight to free company t-shirts… I mean NeurIPS. Happy to chat!
I mean seriously, don’t take all the t-shirts and leave some to me 🫠
@jasondeanlee
wrote this down more formally so that I can get it off my mind...
If you find the original tweets lack context/background but find the topic interesting, the note might be helpful
At CISS hearing nice talks on model-based RL. MBRL has the reputation of bad "error compounding", but I realized recently that its theoretical root may be different from what ppl think...
The problem may not be error accumulation over *time*, but the one-step error itself! 1/
went to grab a lunch box at visit day, and the volunteer looked at me and was like "hey, grad students that haven't signed up shouldn't steal the food here". glad that a staff member beside her recognized me as a faculty member... this has happened to me a couple of times already 😜🤣
It's intriguing to observe the use of REINFORCE in RLHF. REINFORCE is a classical algorithm utilized to estimate policy gradients for episodic Markov Decision Processes (MDPs). Another notable method is GPOMDP. While both are effective estimators, it's worth noting that neither
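As a refresher, here is a minimal REINFORCE sketch on a toy problem of my own choosing (a horizon-1 "episodic MDP", i.e., a bandit, where REINFORCE and GPOMDP coincide); the Monte-Carlo estimate is checked against the analytic policy gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical, for illustration): a 3-armed bandit with a
# softmax policy pi_theta(a) = softmax(theta)_a and deterministic rewards.
theta = np.array([0.2, -0.1, 0.4])
rewards = np.array([1.0, 0.0, 0.5])

def softmax(v):
    z = np.exp(v - v.max())
    return z / z.sum()

p = softmax(theta)

# Analytic gradient of J(theta) = sum_a pi(a) r(a):
# dJ/dtheta_i = pi_i * (r_i - J).
J = p @ rewards
grad_true = p * (rewards - J)

# REINFORCE: grad J = E[ grad log pi(a) * r(a) ],
# where (grad log pi(a))_i = 1{i = a} - pi_i.
n = 200_000
a = rng.choice(3, size=n, p=p)
score = np.eye(3)[a] - p               # (n, 3) score functions
grad_hat = (score * rewards[a][:, None]).mean(axis=0)

print(grad_true, grad_hat)             # agree up to Monte-Carlo error
```

With horizon > 1, GPOMDP lowers variance over vanilla REINFORCE by multiplying each score term only with the rewards that come *after* it, without changing the mean.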
as
#icml2020
starts, I eventually got time to... catch up on the real-life RL conf I missed! among the amazing talks, I highly recommend by
@prasadNiranjani
. check out how various principled RL methods are adapted and integrated in a medical scenario!
@ylecun
By RL do you mean
1. Current algorithms in RL
2. Current problem paradigms in RL research
3. RL as a problem formulation?
I’d think world models that you advocate for are captured in 3
seeing other tweets about Lean recently on how difficult it is to formalize proofs in Lean, and was wondering whether LLMs can help… and then saw this..!
How long until we have software that can just take one of my papers and turn its proof into Lean? 🤔
Launching Lean Co-pilot for LLM-human collaboration to write formal mathematical proofs that are 100% accurate. We use LLMs to suggest proof tactics in Lean and also allow humans to intervene and modify in a seamless manner.
Automating theorem proving
every semester I teach the RL thry course, crazy restructuring ideas come to mind. like getting rid of the tabular learning section (that's just a special case of Tf in F for all f… right??) or of neutral DP algs (don't optimistic/pess algs basically cover all use cases…?)
yes yes dimensional analysis
In RL, think every reward / value function has a $ sign. make sure to cancel them out cleanly in your sample complexity expressions etc
just received this today! T-shirt for 50th anni of dept of automation in Tsinghua. The only thing is that the design of the T-shirt is quite... simplistic...
speaking of which, can schools make the “top X%” q’s optional? I never look at them at recruiting and it is such a pain when I have to submit a letter to ~20 places. I always tell students not to worry about # apps, but hey grad programs plz make it easier 4 all of us.
This is the time of year that I apologize to all fellow faculty about Waterloo's absolutely atrocious recommendation letter submission system. I hate it too.
I'm not at
@iclr_conf
, but Phil will present our spotlight poster in a few hours. Come see online RL using density ratios---which are **not even well defined**😱 b4 exploratory data is collected, plus very cool **black-box** online-to-offline reduction!
Envy all of you at NeurIPS! While I'm not there*, my students will present their 1st-author works Wed/Thu. Please stop by their posters if interested! I will tweet about each paper when it gets close to the session.
In some cases we probably need to ask whether individual states are physically meaningful at all. This totally shook my basic understanding of RL from when I was a grad student. If u had similar confusions, read the paper and let me know what u think (like, at ICML)! (6/end)
Still
#MBRL
:
@KaiqingZhang
's
#ciss
talk touched on MuZero loss. I happened to have discussed its issues w/ someone else recently:
1. Wrong model can get lower loss than true model in stoch env.
2. Even in dtmn env, dist shift can be exponentially bad!
Detail & proof in 🧵 1/
I find myself often referring to tweets like this, and it can be hard (even for me) to find them, so I decide to set up a page:
Blogposts are probably much better for dissemination (some of my tweets are not very readable...), but I'm too lazy 🫠
Robust MDP folks:
(1) How common is the computational step of finding the worst-case transition against a given policy?
(2) Have you seen algs that run natural policy gradient (i.e., state-wise mirror descent) on the Q-fn from the worst-case transition? (1/3)
Perspective: You are the AC and need to write a meta review. You don’t have time to read the paper, so you rely on the reviews. But all of the reviews are this template. What do you do?
yup, conf is weird when u attend for the 1st time, and becomes fun as u make friends. my own story: phil thomas and I met at icml-15 as student volunteers and we talked endlessly at the reg desk, as back then it was quite difficult to find an RL person to talk to :) (1/2)
Many years ago, my late PhD advisor John Riedl said: "Not to worry about not knowing anybody at conferences, because if you keep going, they will all become your friends."
Great advice. Gratefully true, as some confs are now inviting me to give talks to my friends.
#PhDChat
I had the impression that we would be able to upload new figures during
#NeurIPS
rebuttal (probably b/c other confs using openreview allow pdf updates?). the char limit is generous and there will even be rolling discussions, so why limit the response format to text only?
most interesting paper to me today: seems a very nice middle-ground between importance sampling (exp variance) and model-based (bias amplified by horizon). In their method, you need func approx similar to model-based, but impact of bias is much milder
daughter’s new fav bedtime story is 唐诗, poems from Tang dynasty.
hard to describe the feeling of reading a very well-known poem to her, which depicts a place exactly next to where I went to primary school, now in a place thousands of miles away
@swetaagrawal20
sometimes you find related content in the appendix. Also, a legitimate reason may be that ppl typically spend less time rigorously verifying that what failed *really* didn't work, so this experience may not stand up to the same level of scrutiny as the main (positive) results
caught the tail of
@svlevine
's talk and caught up w/
@tengyuma
's and
@EmmaBrunskill
's videos. great to see a focused discussions on pessimism for off-policy RL! looking forward to the afternoon sessions starting w/
@ofirnachum
Next week,
@marcgbellemare
and I are organizing a Deep RL workshop as part of Simons Institute's Theoretical RL program, with a great lineup of speakers. All talks will be recorded, and can be viewed live on YouTube channel. See for more details!
the really neat part of this work is the "traj simulator" that shows that off-policy Monte-Carlo (which we usually associate w/, say, imp sampling) can be very sample-efficient, at least in the tabular setting. (1/3)
2/ This was the COLT 2018 open problem from
@nanjiang_cs
and Alekh, who conjectured a poly(H) lower bound. New work refutes this, showing only logarithmic in H episodes are needed to learn. So, in a minimax sense, long horizons are not more difficult than short ones!
The connection between symmetry and conservation laws discovered by Emmy Noether is so beautiful and profound, and was completely eye-opening to me when I thought I had enough exposure (as an outsider) to modern physics.
HAPPY BIRTHDAY EMMY NOETHER! Perhaps the greatest woman of mathematics of all time, colleague of Einstein and profoundly influential scientist who basically invented abstract algebra, Noether's legacy is massive. Learn about her below!
It’s RL problem with a specific kind of structure: rand init state + deterministic transitions. You can also view it as SL with a huge label space (and thus reduction to RL makes sense).
And guess what, this exact setting is called structured prediction
If we view LLMs in an RL way, then outputs are just rollouts from the policy. The 'prompt' is just the initial state, it encodes the starting condition of the agent and, implicitly, a goal. Prompt engineering == finding a good initial state for the agent to achieve your goal.
**
Workshop, TTIC, July 13-15th: Online decision-making and real-world applications
**
-) Why is it challenging to deploy online decision-making alg. in real-world problems?🤨
-) Which models describe these challenges?🤔
-) What is the path towards making RL practical?😲
In case anyone who read it is still wondering: see the simple lemma below (h/t Akshay Krishnamurthy).
z = c log p/p* for different c (+-1, +- 1/2 etc) gives different useful results.
and apparently log(p/p*) can only be way better? Hence the question in the quoted thread.
My attempt so far: p/p* is non-neg & E[p/p*] = 1, so by Markov ineq: Pr[p/p* >= t] <= 1/t
So Pr[log(p/p*) >= t] <= exp(-t)
ok sub-exponential instead of sub-gaussian 🫠
thoughts? 5/end
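A quick empirical sanity check of the sub-exponential tail above, with a concrete pair of Gaussian densities of my choosing (p = N(0,1), p* = N(0.5,1), sampling X ~ p*, so that E_{p*}[p/p*] = 1):

```python
import numpy as np

rng = np.random.default_rng(0)

# X ~ p* = N(0.5, 1); p = N(0, 1). Since E_{p*}[p/p*] = 1, Markov's
# inequality gives Pr[p/p* >= e^t] <= e^{-t}, i.e.
# Pr[log(p/p*) >= t] <= e^{-t}: a sub-exponential upper tail.
n = 1_000_000
x = rng.normal(0.5, 1.0, size=n)
log_ratio = (-(x**2) / 2) - (-((x - 0.5) ** 2) / 2)   # log p(x) - log p*(x)

for t in [0.5, 1.0, 2.0]:
    emp = (log_ratio >= t).mean()
    print(t, emp, np.exp(-t))          # empirical tail vs. the e^{-t} bound
```

The normalizing constants of p and p* cancel in the log-ratio here since both are unit-variance Gaussians, which is why only the exponents appear.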
speaking of the noisy TV, I've always had a problem: isn't *non-noisy* TV more problematic? after all, pure noise is unpredictable; in contrast, things like TV shows are somewhat predictable yet highly complex, which will distract learning algs into spending resources predicting them
just gave a virtual talk in this (in-person) workshop at RLDM. pity that I couldn't go, and thx organizers for being accommodating! looking fwd to the panel shortly
(the talk is on a 4-yr-old paper w/ 3 conf rej & 1 journal desk rej 😜)
gave a tutorial on rl thry virtually @ NUS yesterday & enjoyed the interaction w/ the audience. Also got my fav questions: once FQI/E are intro'd, ppl start asking about convergence for SGD under convexity etc
TD: hold my *divergence* under inf data and 1-d realizable linear features 😎
Online RL agents learn by trial-and-error—not an option for tasks like training self-driving cars. Learn how Microsoft researchers used game theory to design offline RL algorithms that can learn good policies with state-of-the-art empirical performance:
diffusion model twitter: suppose you have N candidate conditional generation models P_i(image|prompt) with only sampling access (ie no explicit probabilities). How do you do model selection on holdout data???
Eyeball???