
Saining Xie
@sainingxie
Followers
21K
Following
5K
Media
57
Statuses
583
researcher in #deeplearning #computervision | assistant professor at @NYU_Courant @nyuniversity | previous: research scientist @metaai (FAIR) @UCSanDiego #YNWA
Joined July 2020
The three biggest hps for stable training in everything are lr, bs, and beta2. We’ve built up good intuitions on how to tune them over time, but this lays it all out analytically and convincingly. this is definitely my new handbook for training big models on small gpus.
🚨 Did you know that small-batch vanilla SGD without momentum (i.e. the first optimizer you learn about in intro ML) is virtually as fast as AdamW for LLM pretraining on a per-FLOP basis? 📜 1/n
3
21
204
RT @_albertgu: I converted one of my favorite talks I've given over the past year into a blog post. "On the Tradeoffs of SSMs and Transfor…
0
113
0
RT @jonLorraine9: @sainingxie @joserf28323 @CVPR @ICCVConference @nyuniversity As the original poster of this strategy (sorry!), I agree th…
0
1
0
internet at its peak--just look at how people roasted him in quotes/comments 6 months ago. Again, I think this is very wrong, but can you really blame the students if the community was encouraging this idea, and then suddenly next day they’re being treated like the big villain?
DO NOT DO THIS. I have previously raised this for Ethics Review when I saw it in a paper. You are not sneaky.
4
2
56
RT @AlexiGlad: How can we unlock generalized reasoning? ⚡️Introducing Energy-Based Transformers (EBTs), an approach that out-scales (feed-…
0
244
0
@joserf28323 @CVPR @ICCVConference @nyuniversity Thanks for bringing this to my attention. I honestly wasn’t aware of the situation until the recent posts started going viral. I would never encourage my students to do anything like this—if I were serving as an Area Chair, any paper with this kind of prompt would be.
3
1
48
RT @sedielem: This looks like a great deep dive on neural network architectures for diffusion models. tl;dr use a Transformer, but there's…
0
10
0
RT @ManlingLi_: Can VLMs build Spatial Mental Models like humans? Reasoning from limited views? Reasoning from partial observations? Reaso…
0
56
0
awesome work by @jiacheng_chen_ and @sanghyunwoo1219 on 3D-grounded visual compositing (and nice demos!).
Introducing BlenderFusion: Reassemble your visual elements—objects, camera, and background—to compose a new visual narrative. Play the interactive demo:
4
9
56
metaquery is now open-source — with both the data and code available.
The code and instruction-tuning data for MetaQuery are now open-sourced! Code: Data: Two months ago, we released MetaQuery, a minimal training recipe for SOTA unified understanding and generation models. We showed that tuning few.
2
7
56
RT @tallinzen: I'm hiring at least one post-doc! We're interested in creating language models that process language more like humans than m…
0
52
0
guys, real geospatial data is a total goldmine for digital agents. step away from the web browser and get real. (we explored a bit in but building a simulation-ready pipeline like this could take things way further).
Virtual Community provides an online pipeline that automatically generates 3D scenes from real geospatial data, performing comprehensive cleaning and enhancement of both geometry and texture — including mesh simplification, texture refinement, object placement, and automatic
4
18
103
wait, speaking of false dichotomies---during your phd, you *can* write code, dive into data and systems, collaborate with a team, and build useful things---all while enjoying complete openness and the freedom to pursue what *genuinely* excites you.
i left my phd before joining openai. working in industry demands more rigor – you don’t just need to convince reviewer 2 with a nice graph and an ego-cite, it better actually work if it’s underwriting billions in research investment. not saying it always pans out that way in.
11
10
298
RT @mathusmassias: New paper on the generalization of Flow Matching 🤯 Why does flow matching generalize? Did you k…
0
232
0
RT @FeuerBenjamin: So excited to announce the DCVLR (Data Curation for Vision-Language Reasoning) competition at NeurIPS 2025, led by @Oumi…
0
11
0
RT @rohanpaul_ai: This is really BAD news of LLM's coding skill. ☹️ The best Frontier LLM models achieve 0% on hard real-life Programming…
0
317
0
RT @deedydas: LLMs are far worse at competitive programming than we thought. Every one scored 0% on Hard problems. LiveCodeBench-Pro is a…
0
218
0