
Jyo Pari (@jyo_pari)
Followers: 2K · Following: 755 · Media: 25 · Statuses: 121
Working on continual learning | PhD @MIT
Cambridge, MA · Joined December 2021
What if an LLM could update its own weights? Meet SEAL: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs. Self-editing is learned via RL, using the updated model's downstream performance as reward.
131 · 529 · 3K
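The SEAL tweet above describes the core loop: the model writes its own training data (a self-edit), a copy of the model is updated on it, and the updated model's downstream score becomes the RL reward. Below is a minimal toy sketch of that outer loop in Python; the stub functions (generate_self_edit, finetune_on, evaluate_downstream), the dict-as-model stand-in, and the keep-the-best-edit selection are illustrative assumptions, not the paper's implementation.

```python
import copy
import random
from typing import Dict, List, Tuple

# Toy stand-ins: in the real system these would be an LLM, a finetuning step,
# and a downstream eval. Here they are placeholders that only show the control flow.

def generate_self_edit(model: Dict, context: str) -> str:
    """Sample a 'self-edit': synthetic training data the model writes for itself."""
    # Hypothetical: a real implementation would prompt the LLM with `context`.
    return f"restated fact derived from: {context} (variant {random.randint(0, 9)})"

def finetune_on(model: Dict, self_edit: str) -> Dict:
    """Return an updated copy of the model after supervised finetuning on the self-edit."""
    updated = copy.deepcopy(model)
    updated["num_updates"] = updated.get("num_updates", 0) + 1
    return updated

def evaluate_downstream(model: Dict, eval_task: str) -> float:
    """Score the *updated* model on a downstream task; this score is the RL reward."""
    return random.random()  # placeholder reward

def seal_outer_loop(model: Dict, contexts: List[str], eval_task: str,
                    samples_per_context: int = 4) -> List[Tuple[str, float]]:
    """One RL iteration: sample self-edits, apply them, keep the highest-reward one.

    The reward for a self-edit is the downstream performance of the model
    *after* training on that self-edit. The high-reward self-edits would then
    be used to reinforce the self-edit-generation policy (that update is not shown).
    """
    reinforced = []
    for context in contexts:
        candidates = [generate_self_edit(model, context) for _ in range(samples_per_context)]
        scored = [(edit, evaluate_downstream(finetune_on(model, edit), eval_task))
                  for edit in candidates]
        reinforced.append(max(scored, key=lambda pair: pair[1]))
    return reinforced

if __name__ == "__main__":
    base_model = {"num_updates": 0}
    winners = seal_outer_loop(base_model,
                              contexts=["new document A", "new document B"],
                              eval_task="qa_over_document")
    for edit, reward in winners:
        print(f"reward={reward:.2f}  self-edit={edit!r}")
```

The structural point the sketch tries to capture is that the reward is computed on the model after the weight update, which is what makes self-edit generation trainable with RL.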
Finally, this wouldn't be possible without the amazing speakers! @sewon__min, @Guangxuan_Xiao, @chrismdesa, @SonglinYang4, @simran_s_arora, @exists_forall, and the co-organizers! @a1zhang, @HanGuo97, @SonglinYang4, @pulkitology, @yoonrkim, @tri_dao, @lateinteraction.
0 · 0 · 1
From Aug 25 (Mon) to Aug 29 (Fri), we dedicate each day to an invited speaker on a specific component of frontier models, e.g. PEs, MoEs, GPU programming, etc. For more details, see . The event will be live-streamed and recorded:
youtube.com: A GPU reading group and community. https://discord.gg/gpumode. Supplementary content here: https://github.com/gpu-mode. Created by Mark Saroufim and Andreas Köpf.
1 · 0 · 1
If you are interested in questioning how we should pretrain models and create new architectures for general reasoning, then check out E606 @ ICML, our position paper by @seungwookh and me on potential directions for the next generation of reasoning models!
0 · 5 · 21
MoE routers are trained a bit strangely, but things seem to still work. @minyoung_huh and I got curious about combining specialized experts at test time through routing… and ended up deep in the weeds of MoE optimization. Here's a blog post!
2 · 19 · 140
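For readers unfamiliar with the routing mechanism mentioned above, here is a generic top-k MoE routing layer in PyTorch: a small gating layer scores experts per token, the top k are kept, and their outputs are combined with softmax-renormalized weights. This is a standard sketch of the technique, not the specific setup or optimizations studied in the blog post; the class and parameter names are made up for illustration.

```python
import torch
import torch.nn.functional as F

class TopKRouter(torch.nn.Module):
    """Combine a set of (possibly pre-trained, specialized) expert MLPs at inference
    time via learned top-k gating. Hypothetical illustration only."""

    def __init__(self, d_model: int, experts: list, k: int = 2):
        super().__init__()
        self.experts = torch.nn.ModuleList(experts)
        self.gate = torch.nn.Linear(d_model, len(experts), bias=False)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        logits = self.gate(x)                        # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)   # keep the k best experts per token
        weights = F.softmax(weights, dim=-1)         # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

if __name__ == "__main__":
    d = 16
    experts = [torch.nn.Sequential(torch.nn.Linear(d, 4 * d), torch.nn.GELU(),
                                   torch.nn.Linear(4 * d, d)) for _ in range(4)]
    layer = TopKRouter(d, experts, k=2)
    tokens = torch.randn(8, d)
    print(layer(tokens).shape)  # torch.Size([8, 16])
```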
Current adaptive tokenizers still rely on humans to set the desired fidelity a priori. But what if the model could learn that itself? The part I like a lot about this paper, beyond the high-level idea, is the way @ShivamDuggal4 trained for this ability. Kudos!
Compression is the heart of intelligence. From Occam to Kolmogorov: shorter programs = smarter representations. Meet KARL: Kolmogorov-Approximating Representation Learning. Given an image, a token budget T, and a target quality ε, KARL finds the smallest t ≤ T to reconstruct it within ε.
0 · 0 · 5
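The KARL description above frames the objective as: given a budget T and a target quality ε, find the smallest token count t ≤ T whose reconstruction stays within ε. KARL learns to do this in the model itself; the snippet below only shows the brute-force version of that objective, with a toy truncation-based "tokenizer" and MSE error as stand-ins (the function names and error metric are assumptions, not the paper's method).

```python
from typing import Optional
import numpy as np

def encode(image: np.ndarray, t: int) -> np.ndarray:
    """Toy 'adaptive tokenizer': keep only the first t coefficients of the image."""
    flat = image.ravel().copy()
    flat[t:] = 0.0
    return flat

def decode(tokens: np.ndarray, shape: tuple) -> np.ndarray:
    return tokens.reshape(shape)

def reconstruction_error(image: np.ndarray, recon: np.ndarray) -> float:
    return float(np.mean((image - recon) ** 2))

def smallest_budget(image: np.ndarray, T: int, eps: float) -> Optional[int]:
    """Return the smallest t <= T with reconstruction error <= eps, else None."""
    for t in range(1, T + 1):
        recon = decode(encode(image, t), image.shape)
        if reconstruction_error(image, recon) <= eps:
            return t
    return None

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(8, 8))
    print(smallest_budget(img, T=64, eps=0.05))
```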
Thanks @willknight for covering SEAL! Really appreciate the thoughtful and insightful way you captured the work.
Scientists at Massachusetts Institute of Technology have devised a way for large language models to keep learning on the fly, a step toward building AI that continually improves itself.
1 · 5 · 13
RT @AdamZweiger: An underrated and potentially more practical aspect of our Self-Adapting LMs paper is the potential for general pre/post-t…
0 · 5 · 0
RT @akyurekekin: There are three types of storage: activations (in-context), external memory, and model weights. If the models will spend…
0 · 16 · 0
@AdamZweiger and I had an amazing group to help us. Huge thanks to @HanGuo97 and @akyurekekin for the invaluable guidance throughout this project, and to @yoonrkim and @pulkitology for being incredible advisors. Paper: Website:
6 · 7 · 79