Conor Hayes
@conorfhayes
Followers: 158
Following: 417
Media: 0
Statuses: 73
Research scientist @ Cognizant AI Lab
Oakland, CA
Joined September 2017
🚨🚨 New 📜 from Cognizant AI Lab! 🚨🚨 We show that Evolution Strategies (ES) can scale to fine-tune LLMs with billions of parameters, beating RL on robustness, sample efficiency, and tolerance to long-horizon tasks, all without gradient computation. See @yule_gan's amazing 🧵 below:
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for
1
0
4
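A minimal sketch of what "exploring directly in parameter space" in the quoted thread means in practice: perturb the weights with Gaussian noise, score each perturbed copy with a scalar outcome reward, and move toward the higher-scoring perturbations. The reward function, dimensions, and hyperparameters below are toy placeholders, not the paper's setup.

```python
import numpy as np

def es_step(theta, reward_fn, sigma=0.05, lr=0.005, population=32, rng=None):
    """One Evolution Strategies update: sample Gaussian perturbations of the
    parameters, score each perturbed copy with a scalar outcome reward, and
    move theta along the reward-weighted average of the noise. No gradients
    of reward_fn are ever computed."""
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal((population, theta.size))              # explore in parameter space
    rewards = np.array([reward_fn(theta + sigma * n) for n in noise])  # only outcome scores
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)   # fitness shaping
    grad_estimate = advantages @ noise / (population * sigma)          # ES search-gradient estimate
    return theta + lr * grad_estimate

# Toy usage: a quadratic stand-in for "task accuracy" over a 10-dim parameter vector.
theta = np.zeros(10)
target = np.linspace(-1.0, 1.0, 10)
reward = lambda p: -float(np.sum((p - target) ** 2))
for _ in range(300):
    theta = es_step(theta, reward)
print("final reward (should approach 0):", reward(theta))
```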
Check out this nice tutorial video (https://t.co/n63MxIhaso) from @yacinelearning. I also did a live chat with him this morning; check out the recording (https://t.co/aQBOPCIBVb), where I answered some questions from Yacine and the audience about our work :)
alright, we're live in about 3 min to figure out how the guys managed to make evolution strategies work for fine-tuning LLMs. tune in!
0
4
23
Join to ask questions about our paper with @yule_gan 🔥🔥
ladies and gentlemen, this thursday at 10:00 AM EST we are going to run a Q&A with @yule_gan, one of the authors of that nice LLM fine-tuning paper with evolution strategies. tune in to ask him any dumb questions you might have on ES, RL, tickling LLMs, or what's next.
0
0
5
ladies and gentlemen, this thursday at 10:00 AM EST we are going to run a Q&A with @yule_gan, one of the authors of that nice LLM fine-tuning paper with evolution strategies. tune in to ask him any dumb questions you might have on ES, RL, tickling LLMs, or what's next.
this sunday we are figuring out how folks scaled evolutionary optimization methods to LLMs. kinda cool that old tricks are being used again in modern times to great effect
5
2
78
we are live over here folks for the live paper review on evolution strategies at scale:
1
1
11
this sunday we are figuring out how folks scaled evolutionary optimization methods to LLMs. kinda cool that old tricks are being used again in modern times to great effect
19
22
457
🧵 As AI labs race to scale RL, one question matters: when should you stop pre-training and start RL? We trained 5 Qwen models (0.6B→14B) with RL on GSM8K and found something wild: Small models see EMERGENCE-LIKE jumps. Large models see diminishing returns. The scaling law?
38
120
758
If you are attending @COLM_conf and looking to hire a research scientist, I highly recommend you talk to my postdoc, Mathieu Reymond, who is on the job market and at the conference! Mathieu is an expert in multi-objective RL, multi-agent RL, RL for scientific discovery, and RL for
2
12
37
If ES really beats PPO/GRPO on reasoning, this could be super compelling. Nice work @yule_gan & team.
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for
0
1
4
Evolution Strategies can be applied at scale to fine-tune LLMs and outperform PPO and GRPO in many model settings! Fantastic paper “Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning” by @yule_gan, Risto Miikkulainen and team. https://t.co/CEyX6Z5ulG
arxiv.org
Fine-tuning pre-trained large language models (LLMs) for down-stream tasks is a critical step in the AI deployment pipeline. Reinforcement learning (RL) is arguably the most prominent fine-tuning...
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for
10
36
279
Super cool work that shows finetuning with Evolution Strategies (ES) in the parameter space outperforms GRPO! Check out
github.com
This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" - VsonicV/es-fine-tuning-paper
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for
1
3
14
damn never thought my joke last year (noisekit, https://t.co/46fhRupnd3) could be made into a viable training algorithm 😂
github.com
Contribute to pharaouk/noisekit development by creating an account on GitHub.
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for
1
1
16
These are some of my favorite papers I have been a part of, because of the deep insights we gained while making them (driven by @joelbot3000's genius). Glad to see them inspire cool new work!
Our work stands on the shoulders of giants: @jeffclune and @kenneth0stanley demonstrated the potential of ES in several very insightful papers, including https://t.co/z5iTMEPc2u and https://t.co/nBCOz3VpQF. Earlier, @SchmidhuberAI proposed Natural Evolution Strategies (NES)
1
4
26
What comes after Reinforcement Learning? Cognizant AI Lab scaled Evolution Strategies (ES) to fine-tune LLMs with billions of parameters — no gradients, less instability, and more efficiency. #finetuningllm #reinforcementlearning A new path forward begins here. 🔗 Blog | Paper
0
1
1
Our work stands on the shoulders of giants: @jeffclune and @kenneth0stanley demonstrated the potential of ES in several very insightful papers, including https://t.co/z5iTMEPc2u and https://t.co/nBCOz3VpQF. Earlier, @SchmidhuberAI proposed Natural Evolution Strategies (NES)
2
8
121
Nice to see an exploration of the potential for ES (evolution strategies) in LLM fine-tuning! Many potential advantages are discussed in this thread from @yule_gan.
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for
4
16
147
Thanks for sharing, @rohanpaul_ai! We hope our paper provides insights into a new direction for LLM fine-tuning.
The paper shows that evolution strategies can fine-tune full LLMs at scale and often beat reinforcement learning on reasoning. The key finding is that parameter-space search with only outcome scores can outperform token-level RL across models and tasks. It tweaks whole models,
2
4
19
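For reference, the "parameter-space search with only outcome scores" described in the quoted summary is usually written as the standard ES search-gradient estimator (generic textbook form; the notation is an assumption here, not taken from the paper):

```latex
\nabla_{\theta}\,\mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\big[R(\theta + \sigma\epsilon)\big]
\;\approx\; \frac{1}{N\sigma}\sum_{i=1}^{N} R(\theta + \sigma\epsilon_{i})\,\epsilon_{i}
```

where R is the scalar outcome score for a full generation, σ is the perturbation scale, and N is the population size; only forward evaluations of R are needed, with no backpropagation through the model.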
despite the bitter lesson, i’m still convinced that evolutionary strategies—particularly with an elegant genotype-to-phenotype map that directs the development of a neural architecture—will play a major role in future advances in ai.
The paper shows that evolution strategies can fine-tune full LLMs at scale and often beat reinforcement learning on reasoning. The key finding is that parameter-space search with only outcome scores can outperform token-level RL across models and tasks. It tweaks whole models,
0
1
2
The paper shows that evolution strategies can fine-tune full LLMs at scale and often beat reinforcement learning on reasoning. The key finding is that parameter-space search with only outcome scores can outperform token-level RL across models and tasks. It tweaks whole models,
8
40
237