Yulu Gan

@yule_gan

Followers: 1K · Following: 729 · Media: 10 · Statuses: 80

PhD student @MITEECS @MIT_CSAIL @MIT_CBMM / ex @PKU1898 @MSFTResearch (M)LLM Reasoning, Neuroevolution, Emergence of Intelligence, Understanding Intelligence

Cambridge, MA
Joined October 2022
@yule_gan
Yulu Gan
4 days
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for
88
386
3K
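The parameter-space exploration this thread describes can be illustrated with a minimal Evolution Strategies loop. This is a toy sketch of the general technique, not the paper's implementation; the fitness function, hyperparameters, and antithetic-sampling details here are illustrative assumptions.

```python
import random

def es_step(theta, fitness, pop_size=10, sigma=0.1, lr=0.05, seed=0):
    """One antithetic ES update: sample Gaussian perturbations of the
    parameter vector, score each perturbed copy with a black-box
    fitness function, and move theta toward high-scoring directions.
    No gradients of the fitness are ever computed."""
    rng = random.Random(seed)
    grad_est = [0.0] * len(theta)
    for _ in range(pop_size):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        f_plus = fitness([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = fitness([t - sigma * e for t, e in zip(theta, eps)])
        w = (f_plus - f_minus) / 2.0  # antithetic pair reduces variance
        for i, e in enumerate(eps):
            grad_est[i] += w * e
    # Standard ES estimator: score-weighted noise approximates the gradient.
    return [t + lr * g / (pop_size * sigma) for t, g in zip(theta, grad_est)]

# Toy stand-in for an outcome-only reward: maximize -||theta - target||^2.
target = [1.0, -2.0]
fitness = lambda th: -sum((a - b) ** 2 for a, b in zip(th, target))

theta = [0.0, 0.0]
for step in range(150):
    theta = es_step(theta, fitness, seed=step)
```

The same loop applies unchanged whether theta has two entries or billions; scaling it to full LLMs is the contribution the thread announces.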
@yule_gan
Yulu Gan
8 hours
We’ve opened a Discussion Forum on GitHub for our paper on using evolution strategies to fine-tune LLMs. If you have any questions, suggestions, or thoughts about this research direction, feel free to join the discussion — there are already a few discussion threads live:
2
0
22
@cognizantailab
Cognizant AI Lab
3 days
Our researchers show Evolution Strategies (ES) scale to billions of LLM parameters, outperforming RL (PPO, GRPO) in robustness, sample efficiency, and long-horizon stability. 📄 Paper: https://t.co/nj3SJbLa2N 💻 Code: https://t.co/dNg9njvGcV #LLMs #AIResearch #RL
1
2
6
@SophieLWang
Sophie Wang
3 days
LLMs, trained only on text, might already know more about other modalities than we realized; we just need to find ways to elicit it. project page: https://t.co/8cIf1DW0OQ w/ @phillip_isola and @thisismyhat
16
62
567
@yule_gan
Yulu Gan
3 days
Thanks to my amazing collaborators — Xin Qiu, @conorfhayes, @catherineliangq, Elliot Meyerson, @babakatwork and Risto Miikkulainen — and to Cognizant @cognizantailab for an incredible summer internship experience in SF! 🌉
1
1
30
@yule_gan
Yulu Gan
4 days
To recap — ES can outperform RL for LLM fine-tuning. No gradients. No reward hacking. Just stability, efficiency, and scalability. ES shows low variance across seeds, minimal hyperparameter sensitivity, and strong reward–KL tradeoffs — all without actor-critic complexity.
0
3
77
@yule_gan
Yulu Gan
4 days
Another key advantage of ES fine-tuning is its reliability. It runs stably across seeds, barely depends on hyperparameters, and avoids reward hacking — all while skipping gradients and actor-critic setups. In the figure, you can see ES finds a much better reward–KL balance than
1
4
70
@yule_gan
Yulu Gan
4 days
On the symbolic-reasoning Countdown task, ES beats PPO/GRPO across Qwen-2.5 (0.5B–7B) & Llama-3 (1B–8B) with huge gains. Moreover, as shown in TinyZero by @jiayi_pirate and DeepSeek-R1, RL fails on small models like Qwen-0.5B — yet ES succeeds! 🚀
1
3
63
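The Countdown task mentioned above uses an outcome-only signal: a whole attempt is scored by whether the final expression hits the target. A hypothetical scorer of that shape might look like the sketch below (illustrative only; the paper's exact evaluation setup may differ).

```python
import ast
import operator

# Allowed arithmetic operators for a Countdown-style expression.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def countdown_reward(expr, numbers, target):
    """Binary outcome reward: 1.0 iff the arithmetic expression uses
    only the given numbers (each at most once) and evaluates to the
    target; 0.0 for anything else, including malformed expressions.
    There is no per-token credit assignment, just one scalar."""
    used = []

    def ev(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            used.append(node.value)
            return node.value
        raise ValueError("disallowed syntax")

    try:
        value = ev(ast.parse(expr, mode="eval").body)
    except (ValueError, SyntaxError, ZeroDivisionError):
        return 0.0
    pool = list(numbers)
    for n in used:
        if n in pool:
            pool.remove(n)
        else:
            return 0.0  # number reused or not among the given ones
    return 1.0 if value == target else 0.0
```

A reward this sparse is exactly where token-level RL struggles with credit assignment, while ES only ever consumes the final scalar.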
@yule_gan
Yulu Gan
4 days
As noted in DeepSeek-R1 and other studies, RL fine-tuning has several limitations, including challenges with long-horizon and outcome-only rewards, low sample efficiency, high-variance credit assignment, instability, and reward hacking. ES sidesteps these issues: it perturbs
4
6
79
@yule_gan
Yulu Gan
4 days
Our work stands on the shoulders of giants: @jeffclune and @kenneth0stanley demonstrated the potential of ES in several very insightful papers, including https://t.co/z5iTMEPc2u and https://t.co/nBCOz3VpQF. Earlier, @SchmidhuberAI proposed Natural Evolution Strategies (NES)
2
8
119
@yule_gan
Yulu Gan
4 days
Thanks for sharing, @rohanpaul_ai! We hope our paper provides insights into a new direction for LLM fine-tuning.
@rohanpaul_ai
Rohan Paul
4 days
The paper shows that evolution strategies can fine tune full LLMs at scale and often beat reinforcement learning on reasoning. The key finding is that parameter space search with only outcome scores can outperform token level RL across models and tasks. It tweaks whole models,
2
4
19
@phillip_isola
Phillip Isola
4 months
Our computer vision textbook is now available for free online here: https://t.co/ERy2Spc7c2 We are working on adding some interactive components like search and (beta) integration with LLMs. Hope this is useful and feel free to submit Github issues to help us improve the text!
visionbook.mit.edu
35
618
3K
@RichardSSutton
Richard Sutton
6 months
I’ve changed so little. From my 1978 Bachelor’s thesis: “The adult human mind is very complex, but the question remains open whether the learning processes that constructed it in interaction with the environment are similarly complex. Much evidence and many peoples’ intuitions
10
64
551
@sainingxie
Saining Xie
9 months
When I first saw diffusion models, I was blown away by how naturally they scale during inference: you train them with fixed flops, but during test time, you can ramp it up by like 1,000x. This was way before it became a big deal with o1. But honestly, the scaling isn’t that
@ma_nanye
Willis (Nanye) Ma
9 months
Inference-time scaling for LLMs drastically improves the model's ability in many perspectives, but what about diffusion models? In our latest study—Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps—we reframe inference-time scaling as a search problem
9
70
474
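Reframing inference-time scaling as search, in its simplest form, means drawing many candidates and keeping the one a verifier scores highest. The sketch below shows that generic best-of-N pattern; it is a minimal stand-in, not the paper's algorithm, and the sampler and scorer here are made-up toys.

```python
import random

def best_of_n(sample, score, n=50, seed=0):
    """Generic best-of-N search: draw n candidates from a sampler and
    keep the one the verifier scores highest. Spending more compute
    (larger n) buys a better result without retraining anything."""
    rng = random.Random(seed)
    best, best_s = None, float("-inf")
    for _ in range(n):
        cand = sample(rng)
        s = score(cand)
        if s > best_s:
            best, best_s = cand, s
    return best, best_s

# Toy example: candidates are numbers and the verifier prefers values
# close to 10; a real system would sample generations and score them.
sample = lambda rng: rng.uniform(0.0, 20.0)
score = lambda x: -abs(x - 10.0)

best, best_score = best_of_n(sample, score, n=50)
```

For diffusion models the search space is noise seeds and denoising trajectories rather than scalar draws, but the structure is the same.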
@deepseek_ai
DeepSeek
9 months
🚀 DeepSeek-R1 is here! ⚡ Performance on par with OpenAI-o1 📖 Fully open-source model & technical report 🏆 MIT licensed: Distill & commercialize freely! 🌐 Website & API are live now! Try DeepThink at https://t.co/v1TFy7LHNy today! 🐋 1/n
2K
7K
37K
@DrJimFan
Jim Fan
9 months
Introducing NVIDIA Cosmos, an open-source, open-weight Video World Model. It's trained on 20M hours of videos and weighs from 4B to 14B. Cosmos offers two flavors: diffusion (continuous tokens) and autoregressive (discrete tokens); and two generation modes: text->video and
96
744
4K
@kenneth0stanley
Kenneth Stanley
9 months
@daniel_mac8 @nickcammarata @jeffclune I think my former OpenAI colleague @nickcammarata makes a good point here that is very aligned with arguments from open-endedness, though said in his own unique way. I’d add that I think Nick is largely pointing to the perennial benchmark obsession that dominates the field. And
1
8
48
@haotiant1998
Haotian Tang
9 months
Personal update: I am excited to share that I will join @GoogleDeepMind next week after defending my PhD thesis @MITEECS earlier last month. I will be working on generative models that simulate the physical world. Looking forward to the new journey ahead in 2025!
73
53
2K
@akarshkumar0101
Akarsh Kumar
10 months
Very excited to share ASAL! Artificial Life aims to recreate natural evolution, but is severely bottlenecked by hand-designed simulations. We propose using CLIP to automatically discover the interesting ALife simulations!
@SakanaAILabs
Sakana AI
10 months
Introducing ASAL: Automating the Search for Artificial Life with Foundation Models https://t.co/uUq63UNrjv Artificial Life (ALife) research holds key insights that can transform and accelerate progress in AI. By speeding up ALife discovery with AI, we accelerate our
7
29
236
@yule_gan
Yulu Gan
10 months
RT @JeffDean: I and other members of the Gemini team are looking forward to chatting with @NeurIPS attendees tomorrow at the @GoogleDeepMin
0
1
0