Nan Jiang

@nanjiang_cs

Followers 10K · Following 14K · Media 150 · Statuses 2K

machine learning researcher, with focus on reinforcement learning. assoc prof @ uiuc cs. Course on RL theory (w/ videos): https://t.co/vqVKwY4RJE

Joined November 2017
@nanjiang_cs
Nan Jiang
5 years
Learning Q* with
+ poly-sized exploratory data
+ an arbitrary Q-class that contains Q*
...has seemed impossible for yrs, or so I believed when I talked at @RLtheory 2mo ago. And what's the saying? Impossible is NOTHING https://t.co/5pib5AxUOz Exciting new work w/ @tengyangx! 1/
6
15
118
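For readers outside RL theory, here is a minimal sketch of the setting this thread describes, in standard batch-RL notation; the exact assumptions and guarantees in the paper with @tengyangx may differ.

```latex
% Sketch of the batch-RL setting referred to above (the paper's exact assumptions may differ).
\begin{itemize}
  \item Data: $D=\{(s_i,a_i,r_i,s_i')\}_{i=1}^n$ with $(s_i,a_i)\sim\mu$, an exploratory
        distribution, e.g.\ $\sup_{\pi}\sup_{s,a}\, d^{\pi}(s,a)/\mu(s,a)\le C<\infty$.
  \item Function class: $\mathcal{Q}$ with $Q^\star\in\mathcal{Q}$ (realizability only;
        no Bellman-completeness assumed).
  \item Goal: with $n=\mathrm{poly}\big(C,\ \tfrac{1}{1-\gamma},\ \log|\mathcal{Q}|,\ \tfrac{1}{\epsilon}\big)$
        samples, return $\hat\pi$ such that $J(\pi^\star)-J(\hat\pi)\le\epsilon$.
\end{itemize}
```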
@nanjiang_cs
Nan Jiang
3 days
On one hand I hope this is just yet another bill put out to attract eyes. otoh, with increasing warnings from univs ("don't give talks at Chinese univs or your federal funding may be banned"), part of me is like "just implement it already and give us peace of mind"
@yoavgo
(((ل()(ل() 'yoav))))👾
3 days
the language here is somewhat ambiguous but if it indeed includes a US funding ban on any researcher who supervised a PhD student with Chinese or Iranian citizenship, this is an INSANE bill.
2
0
5
@ShuaichenChang
Shuaichen Chang
5 days
ICLR reviews are out. One of my assigned papers was withdrawn right after reviews were released. This has happened to me several times this year across top conferences. These papers are often poorly written: full of undefined new terms, missing citations, and sometimes
11
7
169
@nanjiang_cs
Nan Jiang
6 days
aurora over cornfield (literally) tonight
caveat: colors look way dimmer to the naked eye compared to what the phone camera captures… when I first saw the green part I thought it was just clouds 😅
3
1
42
@nanjiang_cs
Nan Jiang
12 days
quack quack
1
0
11
@nanjiang_cs
Nan Jiang
15 days
Another rabbit hole that haunted me for YEARS and I am glad I finally figured it out! Not directly useful in the project, yet I spent a good chunk of the last week on it 🫠
In-Sample Moments "Generalize" under Overfitting https://t.co/GWFI7RIhHC
0
2
73
@nanjiang_cs
Nan Jiang
20 days
One of the consequences that's my pet peeve: we say it's an MDP and take state reset for granted, but how ridiculously difficult it is to rigorously reset the state (when the agent only sees pixel obs) can be surprising https://t.co/lH6Pp18Onh
@pcastr
Pablo Samuel Castro
20 days
What's the MDP state space? Atari frames? Nope, single Atari frames are not Markovian => for Markovian policies, design choices like frame skipping/stacking & max-pooling were made. *This means we're dealing with a POMDP!* And these choices matter a ton (see image below)! 4/X
3
4
42
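The design choices Pablo lists are easy to see in code. Below is a minimal illustrative sketch (not from any particular codebase; `env` is a hypothetical ALE-like object whose `step` returns `(frame, reward, done)`) of the usual preprocessing: repeat the action for a few raw frames, max-pool the last two frames to undo sprite flickering, and stack recent frames so the resulting "state" is approximately Markovian.

```python
import numpy as np
from collections import deque


def preprocessed_step(env, action, stack, frame_skip=4):
    """One 'agent step' on a hypothetical ALE-like env whose env.step(action)
    returns (frame, reward, done). `stack` is a deque(maxlen=k) of past frames."""
    total_reward, recent, done = 0.0, [], False
    for _ in range(frame_skip):                 # act only every `frame_skip` raw frames
        frame, reward, done = env.step(action)
        total_reward += reward
        recent = (recent + [frame])[-2:]        # keep the last two raw frames
        if done:
            break
    pooled = np.maximum.reduce(recent)          # element-wise max undoes sprite flickering
    stack.append(pooled)
    while len(stack) < stack.maxlen:            # pad with copies at episode start
        stack.append(pooled)
    state = np.stack(stack, axis=0)             # the "state" is a stack of frames, not one frame
    return state, total_reward, done


# usage sketch: frames = deque(maxlen=4); state, r, done = preprocessed_step(env, a, frames)
```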
@pcastr
Pablo Samuel Castro
20 days
🚨The Formalism-Implementation Gap in RL research🚨 Lots of progress in RL research over the last 10 years, but too much of it is performance-driven => overfitting to benchmarks (like the ALE). 1⃣ Let's advance the science of RL 2⃣ Let's be explicit about how benchmarks map to the formalism 1/X
2
27
155
@nanjiang_cs
Nan Jiang
24 days
got confused by something basic and went down a rabbit hole, so I just wrote a blog post about it: "Is Density vs. Feature Coverage That Different?" https://t.co/rbHPx2g6rv
0
6
47
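Since the post's title may be opaque outside offline RL, here are two standard coverage conditions it presumably contrasts (written with linear features $\phi$; the post itself may define them differently).

```latex
% Two standard coverage conditions (the post may define them differently).
\begin{align*}
\text{density(-ratio) coverage:}\quad
  & \sup_{s,a}\ \frac{d^{\pi}(s,a)}{\mu(s,a)} \;\le\; C_{\mathrm{dens}}, \\
\text{feature coverage:}\quad
  & \mathbb{E}_{d^{\pi}}[\phi(s,a)]^{\top}\, \Sigma_{\mu}^{-1}\, \mathbb{E}_{d^{\pi}}[\phi(s,a)] \;\le\; C_{\mathrm{feat}},
  \qquad \Sigma_{\mu}=\mathbb{E}_{\mu}\!\left[\phi(s,a)\,\phi(s,a)^{\top}\right].
\end{align*}
```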
@CsabaSzepesvari
Csaba Szepesvari
29 days
@karpathy I think it would be good to distinguish RL as a problem from the algorithms that people use to address RL problems. This would allow us to discuss whether the problem is with the algorithms, or with posing a problem as an RL problem. 1/x
9
38
416
@shaneguML
Shane Gu
1 month
It's the "model" in model-based RL
@JustinLin610
Junyang Lin
1 month
what is a world model?
4
7
97
@nanjiang_cs
Nan Jiang
1 month
asked LLMs and model responses are mixed, and as usual grok (free ver) kept making up plausible-sounding arguments and pissed me off 🙃 if it's indeed a problem, what's the community's perception of the issue? is it widely known and corrected later, or do people just not care?
0
0
1
@nanjiang_cs
Nan Jiang
1 month
problem is, if you just draw the random noise in the most straightforward way, the actions observed in the data may not be optimal for the random draw in the simulation, but we assume Bayesian-optimal agents. seems some rejection sampling/importance reweighting is needed...
1
0
0
@nanjiang_cs
Nan Jiang
1 month
Econ ppl: I learned Erdem & Keane '96 and maximum simulated likelihood from my wife (who will teach this in a phd seminar). my understanding is that when they simulate the distribution of the purchasing action at time t, they just replay the previous actions from the data w/o reweighting. isn't that biased??
2
0
2
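A toy sketch of the reweighting fix these two tweets discuss. Everything here is made up for illustration (`choice_prob`, the argument names, the estimator structure); it is not Erdem & Keane's actual procedure, just the generic "weight each simulated draw by the likelihood of the observed actions under that draw" idea.

```python
import numpy as np


def simulated_choice_dist(observed_history, state_t, noise_draws, choice_prob):
    """Illustrative only. Simulate the distribution over actions at time t.

    observed_history: [(state, action), ...] pairs before time t, replayed from data.
    state_t:          the state at time t whose action distribution we simulate.
    noise_draws:      list of simulated latent draws (e.g. taste shocks / belief paths).
    choice_prob(state, noise): hypothetical model primitive returning a numpy array of
        action probabilities for a Bayesian-optimal agent under that latent draw.
    Returns (naive, reweighted) simulated action distributions at time t.
    """
    naive = np.zeros_like(choice_prob(state_t, noise_draws[0]), dtype=float)
    reweighted = np.zeros_like(naive)
    total_w = 0.0
    for noise in noise_draws:
        p_t = choice_prob(state_t, noise)
        # naive: replay the observed actions and average uniformly over draws --
        # ignores that some draws make the observed history very unlikely
        naive += p_t / len(noise_draws)
        # reweighted: weight each draw by the likelihood of the observed actions
        # under that draw (self-normalized importance weighting)
        w = np.prod([choice_prob(s, noise)[a] for s, a in observed_history])
        reweighted += w * p_t
        total_w += w
    return naive, reweighted / max(total_w, 1e-12)
```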
@nanjiang_cs
Nan Jiang
1 month
is this useful anywhere: a feed-fwd net, when the inputs are viewed as weights and the weights are viewed as inputs, is a recurrent net (is that even true?)...? I imagine this could be relevant for those who perturb inputs (adv. robustness?).
3
0
18
@nanjiang_cs
Nan Jiang
1 month
Reliability from the perspective of content creators who dug into the model-generated summaries (e.g., deep research) and fact-checked them with experts. This kind of report seems valuable and complementary to what's done in academic/industry research
0
1
6
@canondetortugas
Dylan Foster 🐢
1 month
Excited to announce our NeurIPS ’25 tutorial: Foundations of Imitation Learning: From Language Modeling to Continuous Control. With Adam Block & Max Simchowitz (@max_simchowitz)
6
51
360
@goyal__pramod
Pramod Goyal
1 month
My favorite thing to do: dive deep into the research blogs of people from different labs (Thinking Machines in this case!).
4
29
442
@nanjiang_cs
Nan Jiang
1 month
Happy Mid-Autumn Festival! 中秋快乐!
1
2
55
@nanjiang_cs
Nan Jiang
1 month
I'm probably one of the very few who still cover PSRs in a course! With some cute matrix-multiplication animations in the slides. It has been treated as a very obscure topic, but really it's just low rank + Hankelness... slides: https://t.co/ZjVTyXEIc2 video:
@tomssilver
Tom Silver
1 month
This week's #PaperILike is "Predictive Representations of State" (Littman et al., 2001). A lesser-known classic that is overdue for a revival. Fans of POMDPs will enjoy. PDF:
4
12
87
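For context on the "low rank + Hankelness" remark: the core PSR observation from the Littman et al. (2001) line of work fits in a few lines. This is the standard statement, not necessarily the presentation in the linked slides.

```latex
% System-dynamics matrix: rows are histories $h$, columns are tests $\tau$
% (future action--observation sequences), with Hankel-like entries
\[
  \mathcal{D}[h,\tau] \;=\; \Pr(\tau \mid h).
\]
% If $\mathrm{rank}(\mathcal{D})=k$, there is a core set of $k$ tests $\tau_1,\dots,\tau_k$ whose
% prediction vector $b(h)=\big(\Pr(\tau_1\mid h),\dots,\Pr(\tau_k\mid h)\big)$ is a sufficient state:
% every other test's prediction is a fixed linear function of $b(h)$, and $b(h)$ can be updated
% recursively (a linear map followed by normalization) as each action--observation pair arrives.
```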
@nanjiang_cs
Nan Jiang
1 month
back to building a sim from data: while the job after that is classic RL (as sample-based planning), the entire pipeline is really one of offline RL. the current practice of divide-and-conquer is likely naive, and one day we may build the sim in an "RL-aware" manner. (3/3)
1
0
12