Yulu Gan

@yule_gan

Followers: 1K · Following: 729 · Media: 10 · Statuses: 80

PhD student @MITEECS @MIT_CSAIL @MIT_CBMM / ex @PKU1898 @MSFTResearch (M)LLM Reasoning, Neuroevolution, Emergence of Intelligence, Understanding Intelligence

Cambridge, MA
Joined October 2022
@yule_gan
Yulu Gan
4 days
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for
88
386
3K
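The parameter-space exploration this thread describes can be illustrated with a minimal Evolution Strategies loop. This is a toy sketch of the general technique, not the paper's implementation; the fitness function, hyperparameters, and antithetic-sampling details here are illustrative assumptions.

```python
import random

def es_step(theta, fitness, pop_size=10, sigma=0.1, lr=0.05, seed=0):
    """One antithetic ES update: sample Gaussian perturbations of the
    parameter vector, score each perturbed copy with a black-box
    fitness function, and move theta toward high-scoring directions.
    No gradients of the fitness are ever computed."""
    rng = random.Random(seed)
    grad_est = [0.0] * len(theta)
    for _ in range(pop_size):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        f_plus = fitness([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = fitness([t - sigma * e for t, e in zip(theta, eps)])
        w = (f_plus - f_minus) / 2.0  # antithetic pair reduces variance
        for i, e in enumerate(eps):
            grad_est[i] += w * e
    # Standard ES estimator: score-weighted noise approximates the gradient.
    return [t + lr * g / (pop_size * sigma) for t, g in zip(theta, grad_est)]

# Toy stand-in for an outcome-only reward: maximize -||theta - target||^2.
target = [1.0, -2.0]
fitness = lambda th: -sum((a - b) ** 2 for a, b in zip(th, target))

theta = [0.0, 0.0]
for step in range(150):
    theta = es_step(theta, fitness, seed=step)
```

The same loop applies unchanged whether theta has two entries or billions; scaling it to full LLMs is the contribution the thread announces.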
@yule_gan
Yulu Gan
8 hours
We’ve opened a Discussion Forum on GitHub for our paper on using evolution strategies to fine-tune LLMs. If you have any questions, suggestions, or thoughts about this research direction, feel free to join the discussion — there are already a few discussion threads live:
2
0
22
@cognizantailab
Cognizant AI Lab
3 days
Our researchers show Evolution Strategies (ES) scale to billions of LLM parameters, outperforming RL (PPO, GRPO) in robustness, sample efficiency, and long-horizon stability. 📄 Paper: https://t.co/nj3SJbLa2N 💻 Code: https://t.co/dNg9njvGcV #LLMs #AIResearch #RL
1
2
6
@SophieLWang
Sophie Wang
3 days
LLMs, trained only on text, might already know more about other modalities than we realized; we just need to find ways to elicit it. project page: https://t.co/8cIf1DW0OQ w/ @phillip_isola and @thisismyhat
16
62
567
@yule_gan
Yulu Gan
3 days
Thanks to my amazing collaborators — Xin Qiu, @conorfhayes, @catherineliangq, Elliot Meyerson, @babakatwork and Risto Miikkulainen — and to Cognizant @cognizantailab for an incredible summer internship experience in SF! 🌉
1
1
30
@yule_gan
Yulu Gan
4 days
To recap — ES can outperform RL for LLM fine-tuning. No gradients. No reward hacking. Just stability, efficiency, and scalability. ES shows low variance across seeds, minimal hyperparameter sensitivity, and strong reward–KL tradeoffs — all without actor-critic complexity.
0
3
77
@yule_gan
Yulu Gan
4 days
Another key advantage of ES fine-tuning is its reliability. It runs stably across seeds, barely depends on hyperparameters, and avoids reward hacking — all while skipping gradients and actor-critic setups. In the figure, you can see ES finds a much better reward–KL balance than
1
4
70
@yule_gan
Yulu Gan
4 days
On the symbolic-reasoning Countdown task, ES beats PPO/GRPO across Qwen-2.5 (0.5B–7B) & Llama-3 (1B–8B) with huge gains. Moreover, as shown in TinyZero by @jiayi_pirate and DeepSeek-R1, RL fails on small models like Qwen-0.5B — yet ES succeeds! 🚀
1
3
63
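The Countdown task mentioned above uses an outcome-only signal: a whole attempt is scored by whether the final expression hits the target. A hypothetical scorer of that shape might look like the sketch below (illustrative only; the paper's exact evaluation setup may differ).

```python
import ast
import operator

# Allowed arithmetic operators for a Countdown-style expression.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def countdown_reward(expr, numbers, target):
    """Binary outcome reward: 1.0 iff the arithmetic expression uses
    only the given numbers (each at most once) and evaluates to the
    target; 0.0 for anything else, including malformed expressions.
    There is no per-token credit assignment, just one scalar."""
    used = []

    def ev(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            used.append(node.value)
            return node.value
        raise ValueError("disallowed syntax")

    try:
        value = ev(ast.parse(expr, mode="eval").body)
    except (ValueError, SyntaxError, ZeroDivisionError):
        return 0.0
    pool = list(numbers)
    for n in used:
        if n in pool:
            pool.remove(n)
        else:
            return 0.0  # number reused or not among the given ones
    return 1.0 if value == target else 0.0
```

A reward this sparse is exactly where token-level RL struggles with credit assignment, while ES only ever consumes the final scalar.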
@yule_gan
Yulu Gan
4 days
As noted in DeepSeek-R1 and other studies, RL fine-tuning has several limitations, including challenges with long-horizon and outcome-only rewards, low sample efficiency, high-variance credit assignment, instability, and reward hacking. ES sidesteps these issues: it perturbs
4
6
79
@yule_gan
Yulu Gan
4 days
Our work stands on the shoulders of giants: @jeffclune and @kenneth0stanley demonstrated the potential of ES in several very insightful papers, including https://t.co/z5iTMEPc2u and https://t.co/nBCOz3VpQF. Earlier, @SchmidhuberAI proposed Natural Evolution Strategies (NES)
2
8
119
@yule_gan
Yulu Gan
4 days
Thanks for sharing, @rohanpaul_ai! We hope our paper provides insights into a new direction for LLM fine-tuning.
@rohanpaul_ai
Rohan Paul
4 days
The paper shows that evolution strategies can fine tune full LLMs at scale and often beat reinforcement learning on reasoning. The key finding is that parameter space search with only outcome scores can outperform token level RL across models and tasks. It tweaks whole models,
2
4
19
@phillip_isola
Phillip Isola
4 months
Our computer vision textbook is now available for free online here: https://t.co/ERy2Spc7c2 We are working on adding some interactive components like search and (beta) integration with LLMs. Hope this is useful and feel free to submit Github issues to help us improve the text!
visionbook.mit.edu
35
618
3K
@RichardSSutton
Richard Sutton
6 months
I’ve changed so little. From my 1978 Bachelor’s thesis: “The adult human mind is very complex, but the question remains open whether the learning processes that constructed it in interaction with the environment are similarly complex. Much evidence and many peoples’ intuitions
10
64
551
@sainingxie
Saining Xie
9 months
When I first saw diffusion models, I was blown away by how naturally they scale during inference: you train them with fixed flops, but during test time, you can ramp it up by like 1,000x. This was way before it became a big deal with o1. But honestly, the scaling isn’t that
@ma_nanye
Willis (Nanye) Ma
9 months
Inference-time scaling for LLMs drastically improves the model's ability in many perspectives, but what about diffusion models? In our latest study—Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps—we reframe inference-time scaling as a search problem
9
70
474
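Reframing inference-time scaling as search, in its simplest form, means drawing many candidates and keeping the one a verifier scores highest. The sketch below shows that generic best-of-N pattern; it is a minimal stand-in, not the paper's algorithm, and the sampler and scorer here are made-up toys.

```python
import random

def best_of_n(sample, score, n=50, seed=0):
    """Generic best-of-N search: draw n candidates from a sampler and
    keep the one the verifier scores highest. Spending more compute
    (larger n) buys a better result without retraining anything."""
    rng = random.Random(seed)
    best, best_s = None, float("-inf")
    for _ in range(n):
        cand = sample(rng)
        s = score(cand)
        if s > best_s:
            best, best_s = cand, s
    return best, best_s

# Toy example: candidates are numbers and the verifier prefers values
# close to 10; a real system would sample generations and score them.
sample = lambda rng: rng.uniform(0.0, 20.0)
score = lambda x: -abs(x - 10.0)

best, best_score = best_of_n(sample, score, n=50)
```

For diffusion models the search space is noise seeds and denoising trajectories rather than scalar draws, but the structure is the same.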
@deepseek_ai
DeepSeek
9 months
🚀 DeepSeek-R1 is here! ⚡ Performance on par with OpenAI-o1 📖 Fully open-source model & technical report 🏆 MIT licensed: Distill & commercialize freely! 🌐 Website & API are live now! Try DeepThink at https://t.co/v1TFy7LHNy today! 🐋 1/n
2K
7K
37K
@DrJimFan
Jim Fan
9 months
Introducing NVIDIA Cosmos, an open-source, open-weight Video World Model. It's trained on 20M hours of videos and weighs from 4B to 14B. Cosmos offers two flavors: diffusion (continuous tokens) and autoregressive (discrete tokens); and two generation modes: text->video and
96
744
4K
@kenneth0stanley
Kenneth Stanley
9 months
@daniel_mac8 @nickcammarata @jeffclune I think my former OpenAI colleague @nickcammarata makes a good point here that is very aligned with arguments from open-endedness, though said in his own unique way. I’d add that I think Nick is largely pointing to the perennial benchmark obsession that dominates the field. And
1
8
48
@haotiant1998
Haotian Tang
9 months
Personal update: I am excited to share that I will join @GoogleDeepMind next week after defending my PhD thesis @MITEECS earlier last month. I will be working on generative models that simulate the physical world. Looking forward to the new journey ahead in 2025!
73
53
2K
@akarshkumar0101
Akarsh Kumar
10 months
Very excited to share ASAL! Artificial Life aims to recreate natural evolution, but is severely bottlenecked by hand-designed simulations. We propose using CLIP to automatically discover the interesting ALife simulations!
@SakanaAILabs
Sakana AI
10 months
Introducing ASAL: Automating the Search for Artificial Life with Foundation Models https://t.co/uUq63UNrjv Artificial Life (ALife) research holds key insights that can transform and accelerate progress in AI. By speeding up ALife discovery with AI, we accelerate our
7
29
236
@yule_gan
Yulu Gan
10 months
RT @JeffDean: I and other members of the Gemini team are looking forward to chatting with @NeurIPS attendees tomorrow at the @GoogleDeepMin
0
1
0