Jyo Pari

@jyo_pari

Followers 2K · Following 755 · Media 25 · Statuses 121

Working on continual learning | PhD @MIT

Cambridge, MA
Joined December 2021
@jyo_pari
Jyo Pari
2 months
What if an LLM could update its own weights? Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs. Self-editing is learned via RL, using the updated model’s downstream performance as reward.
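The SEAL loop described above can be caricatured in a few lines. This is a toy sketch, not the paper's implementation: `generate_self_edit`, `apply_update`, and `downstream_score` are hypothetical stand-ins, and the RL step is reduced to keeping only those self-edits that raise the downstream reward of the *updated* model.

```python
# Toy sketch of the SEAL outer loop (assumed names, not the paper's API):
# the model proposes a "self-edit", we apply a weight update with it, and
# the downstream score of the updated model serves as the RL reward.
import random

random.seed(0)

def generate_self_edit(weights, context):
    # stand-in policy: propose a small random "training signal"
    return [random.gauss(0, 0.1) for _ in weights]

def apply_update(weights, self_edit, lr=1.0):
    # stand-in for an SFT step on the self-edit
    return [w + lr * e for w, e in zip(weights, self_edit)]

def downstream_score(weights, target):
    # stand-in eval: negative squared distance to a target solution
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

weights, target = [0.0, 0.0], [1.0, -1.0]
for step in range(200):
    edit = generate_self_edit(weights, context=None)
    updated = apply_update(weights, edit)
    # crude "RL": keep the edit only if the updated model scores higher
    if downstream_score(updated, target) > downstream_score(weights, target):
        weights = updated
```

In the actual paper the policy is the LLM itself and the reward updates it via RL; here acceptance-based hill climbing just makes the reward-on-the-updated-model idea concrete.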
@jyo_pari
Jyo Pari
8 hours
RT @a1zhang: announcing the @GPU_MODE x @scaleml summer speaker series happening next week, a 5⃣-day series where top researchers will teac….
@jyo_pari
Jyo Pari
8 hours
Finally, this wouldn't be possible without the amazing speakers! @sewon__min, @Guangxuan_Xiao, @chrismdesa, @SonglinYang4, @simran_s_arora, @exists_forall, and the co-organizers! @a1zhang, @HanGuo97, @SonglinYang4, @pulkitology, @yoonrkim, @tri_dao, @lateinteraction.
@jyo_pari
Jyo Pari
8 hours
from aug 25 (mon) - aug 29 (fri), we dedicate each day to an invited speaker on a specific component of frontier models, e.g. PEs, MoEs, GPU programming, etc. for more details, see . the event will be live-streamed and recorded:
youtube.com
A GPU reading group and community https://discord.gg/gpumode Supplementary content here https://github.com/gpu-mode Created by Mark Saroufim and Andreas Köpf
@jyo_pari
Jyo Pari
8 hours
We have a fun collaboration of @GPU_MODE x @scaleml coming up! We’re hosting a week-long online bootcamp that explores the core components of GPT-OSS while also diving into cutting-edge research that pushes beyond what’s currently in GPT-OSS! For example, how can MoEs power
@jyo_pari
Jyo Pari
1 month
If you are interested in questioning how we should pretrain models and create new architectures for general reasoning, then check out E606 @ ICML, our position paper by @seungwookh and me on potential directions for the next generation of reasoning models!
@jyo_pari
Jyo Pari
1 month
MoE routers are trained a bit strangely, but things seem to still work. @minyoung_huh and I got curious about combining specialized experts at test time through routing… and ended up deep in the weeds of MoE optimization. Here's a blog post!
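"Combining specialized experts at test time through routing" can be illustrated with a minimal dense-routing sketch. This is an assumed toy setup, not the blog post's actual experiments: a softmax router mixes the outputs of two hand-written "experts" per input.

```python
# Minimal sketch of routing over specialized experts (hypothetical toy):
# a softmax over router logits gates a weighted sum of expert outputs.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# two toy experts "specialized" for different behaviors
experts = [lambda x: 2 * x,    # expert 0
           lambda x: -3 * x]   # expert 1

def route(x, router_logits):
    # dense routing: every expert contributes, weighted by its gate
    gates = softmax(router_logits)
    return sum(g * f(x) for g, f in zip(gates, experts))

# logits strongly favoring expert 0, so the output is close to 2*x
y = route(1.0, router_logits=[2.0, -2.0])
```

Real MoE layers typically use sparse top-k routing (only the highest-gated experts run); the dense version above just makes the gating arithmetic visible.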
@jyo_pari
Jyo Pari
1 month
Current adaptive tokenizers still rely on humans to set the desired fidelity a priori. But what if the model could learn that itself? The part I like a lot about this paper, beyond the high-level idea, is the way @ShivamDuggal4 trained for this ability. Kudos 🎇!
@ShivamDuggal4
Shivam Duggal
1 month
Compression is the heart of intelligence. From Occam to Kolmogorov: shorter programs = smarter representations. Meet KARL: Kolmogorov-Approximating Representation Learning. Given an image, token budget T, and target quality 𝜖, KARL finds the smallest t ≤ T to reconstruct it within 𝜖 🧵
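The objective in the quoted thread, finding the smallest token count t ≤ T whose reconstruction stays within quality 𝜖, can be sketched as a linear search. Note this is only an illustration of the objective: per the tweet, KARL *learns* to pick t rather than searching, and `recon_error` here is a made-up stand-in for a decoder's reconstruction error.

```python
# Hypothetical sketch of the KARL objective: smallest budget t <= T
# such that reconstruction error is within eps.
def recon_error(t):
    # stand-in: error shrinks as the token budget grows
    return 1.0 / (t + 1)

def smallest_budget(T, eps):
    for t in range(1, T + 1):
        if recon_error(t) <= eps:
            return t
    return T  # fall back to the full budget if eps is unreachable

t = smallest_budget(T=32, eps=0.1)
```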
@jyo_pari
Jyo Pari
2 months
Thanks @willknight for covering SEAL! Really appreciate the thoughtful and insightful way you captured the work.
@WIRED
WIRED
2 months
Scientists at Massachusetts Institute of Technology have devised a way for large language models to keep learning on the fly—a step toward building AI that continually improves itself.
@jyo_pari
Jyo Pari
2 months
RT @AdamZweiger: An underrated and potentially more practical aspect of our Self-Adapting LMs paper is the potential for general pre/post-t….
@jyo_pari
Jyo Pari
2 months
RT @akyurekekin: There are three types of storage: activations (in-context), external memory, and model weights. If the models will spend….
@jyo_pari
Jyo Pari
2 months
RT @jxmnop: it really is incredible what kinds of things become possible when RL on LLMs works. clearly we’re just getting started.
@jyo_pari
Jyo Pari
2 months
@AdamZweiger and I had an amazing group to help us. Huge thanks to @HanGuo97 and @akyurekekin for the invaluable guidance throughout this project, and to @yoonrkim and @pulkitology for being incredible advisors. Paper: Website:
@jyo_pari
Jyo Pari
2 months
Limitations / Future Work: One of our original motivations was to work towards the ultimate goal of continual learning—think about agents continually self-adapting based on their interactions in an environment. While SEAL doesn't explicitly train for this, we were still curious
@jyo_pari
Jyo Pari
2 months
You may have noticed that generation length kept increasing after each round of RL. This is expected, since we get more diverse content containing relevant information. Could we just prompt the base model to generate longer sequences instead? We find that prompting for longer
@jyo_pari
Jyo Pari
2 months
Here is an example passage (Input Context) along with SEAL's self-edit generations (Rewrite) and subsequent responses to downstream questions after each round of RL.
@jyo_pari
Jyo Pari
2 months
While RL training is done in the single-passage regime, where we can easily quantify the contribution of each self-edit generation, the SEAL model's self-edits are still useful in a continued-pretraining setting, where we incorporate many passages in a single update.
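The single-passage vs. continued-pretraining distinction can be sketched numerically. This is an assumed toy, not the paper's training code: during RL each self-edit drives its own update (so its contribution is attributable), while continued pretraining folds the gradients from many passages' self-edits into one step.

```python
# Sketch (assumed, not the paper's code) of folding self-edits from many
# passages into a single update, vs. one update per passage during RL.
def sft_gradient(weights, self_edit):
    # stand-in gradient: pull weights toward the edit's "target"
    return [e - w for w, e in zip(weights, self_edit)]

def continued_pretrain(weights, self_edits, lr=0.5):
    # average the per-edit gradients across passages, then take one step
    grads = [sft_gradient(weights, e) for e in self_edits]
    n = len(self_edits)
    avg = [sum(col) / n for col in zip(*grads)]
    return [w + lr * g for w, g in zip(weights, avg)]

# two passages' self-edits, incorporated in a single update
w = continued_pretrain([0.0, 0.0], [[1.0, 1.0], [3.0, -1.0]])
```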
@jyo_pari
Jyo Pari
2 months
For incorporating knowledge from a passage into weights, we find that after 2 rounds of RL training, each on a batch of 50 passages, self-editing even matches using synthetic data generated by GPT-4.1.
@jyo_pari
Jyo Pari
2 months
In the few-shot domain, we outperform both ICL and self-edits from the base model, though we still don't reach the optimal human-crafted test-time training (TTT) configuration. Note: these results are from a curated subset that is easier for small LMs.
@jyo_pari
Jyo Pari
2 months
We explore two settings: (1) incorporating knowledge from a passage, where self-edits are generated text in the form of "implications" of the passage, and (2) adapting to few-shot examples on ARC, where self-edits are tool-calls for data augmentation and optimization params, as
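The two self-edit formats described in this tweet can be illustrated with toy payloads. These shapes are invented for illustration (the paper's actual schema, field names, augmentation names, and hyperparameters are not shown here): free-text "implications" for knowledge incorporation, and a tool-call carrying data-augmentation choices and optimization parameters for the ARC few-shot setting.

```python
# Illustrative (hypothetical) shapes for the two self-edit formats.

# Setting 1: knowledge incorporation — the self-edit is generated text,
# "implications" drawn from the passage.
knowledge_edit = {
    "kind": "implications",
    "passage_id": "p001",  # made-up identifier
    "text": [
        "Implication 1 restating a fact from the passage.",
        "Implication 2 drawing a consequence of that fact.",
    ],
}

# Setting 2: ARC few-shot — the self-edit is a tool-call selecting
# augmentations and optimization params for a test-time training run.
arc_edit = {
    "kind": "tool_call",
    "augmentations": ["rotate", "reflect"],          # assumed tool names
    "optim": {"lr": 1e-4, "epochs": 3},              # assumed params
}
```

In both cases the self-edit is just model-generated content that parameterizes a weight update, which is what lets a single RL recipe cover both settings.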