Brian Lester
@blester125
453 Followers · 93 Following · 12 Media · 93 Statuses
Senior Research Engineer at Google DeepMind working on parameter-efficient adaptation and few-shot generalization, mostly within NLP. Views are my own. he/him
Joined July 2013
Is Kevin onto something? We found that LLMs can struggle to understand compressed text, unless you do some specific tricks. Check out https://t.co/DRO2IbTFCg and help @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd, @noahconst, and me make Kevin’s dream a reality.
0 replies · 6 retweets · 15 likes
We just pushed a new update adding support for the (very impressive) safetensors library from our friends at @huggingface! Git-Theta's plug-in system meant that we spent more time waiting on CI/CD than actually adding support (I'll get off my soapbox now 🧼📦).
Introducing Git-Theta, a Git extension that enables collaborative and continual development of ML models with merges, diffs, and parameter-efficient updates—all using the standard Git workflow! 📄 https://t.co/UejQ1WWg85 💽 https://t.co/ED5K2ZvYA6 🗣️ https://t.co/ehMFk2E5sw 🧵⬇️
0 replies · 3 retweets · 20 likes
This was joint work with wonderful collaborators: @kandpal_nikhil @Muqeeth10 @anisham197 @montymevans Vishal Baskaran @TenghaoHuang45 @liu_haokun and @colinraffel
3 replies · 0 retweets · 10 likes
Git-Theta is designed around plug-ins—this means that if we don’t support your favorite framework, merging strategy, or parameter-efficient update yet, you can add it! Join us on GitHub https://t.co/ED5K2ZvYA6 or Zulip https://t.co/ehMFk2E5sw to start contributing!
1 reply · 1 retweet · 13 likes
In our ICML paper https://t.co/UejQ1WWg85, we describe the design and implementation of Git-Theta and show that it supports a collaborative workflow involving continually adapting and modifying a pre-trained model, all while saving significant communication and space.
1 reply · 0 retweets · 14 likes
All of this functionality is integrated with the standard Git workflow—after running git theta track on your model checkpoint, you can git add, branch, merge, and commit as usual! Git-Theta is compatible with any Git remote that supports Git LFS (GitHub, Hugging Face Hub, etc.)
1 reply · 0 retweets · 12 likes
Git-Theta leverages model checkpoint structure to provide meaningful diffs between model versions. During a `git merge`, Git-Theta offers a suite of interactive merge resolution strategies, such as parameter averaging, that can be applied to individual weights.
1 reply · 0 retweets · 15 likes
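To make the merge strategy concrete, here is a toy sketch of parameter averaging over two checkpoint versions treated as plain name-to-tensor dicts; it is not Git-Theta's actual merge implementation.

```python
# Toy parameter-averaging merge for two versions of the same checkpoint.
# Not Git-Theta's implementation; just illustrates the merge strategy.
import torch

def average_merge(ours: dict, theirs: dict) -> dict:
    """Average each parameter that appears in both checkpoints."""
    assert ours.keys() == theirs.keys(), "Checkpoints must share parameter names."
    return {name: (ours[name] + theirs[name]) / 2 for name in ours}

ours = {"linear.weight": torch.ones(4, 4)}
theirs = {"linear.weight": torch.zeros(4, 4)}
merged = average_merge(ours, theirs)  # every entry becomes 0.5
```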
When using Git/Git LFS to track a model checkpoint, any change to any parameter re-saves the whole checkpoint. Git-Theta supports incremental updates to ML models, either by changing a subset of the parameters or via parameter-efficient updates like LoRA.
1 reply · 0 retweets · 17 likes
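A rough illustration of why a parameter-efficient update like LoRA is cheap to version: only a small low-rank delta changes while the dense weight stays fixed. The shapes and names below are illustrative, not Git-Theta internals.

```python
# LoRA-style low-rank update: store only the small factors A and B,
# then apply W' = W + B @ A when the full weight is needed.
# Shapes and names are illustrative, not Git-Theta specifics.
import torch

d, r = 768, 8                      # hidden size and low rank
W = torch.randn(d, d)              # frozen pre-trained weight
A = torch.randn(r, d) * 0.01       # small trainable factor
B = torch.zeros(d, r)              # zero-initialized so the delta starts at 0

W_updated = W + B @ A              # dense weight reconstructed on demand
# Versioning A and B (2*d*r values) is far cheaper than re-saving W (d*d values).
```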
Introducing Git-Theta, a Git extension that enables collaborative and continual development of ML models with merges, diffs, and parameter-efficient updates—all using the standard Git workflow! 📄 https://t.co/UejQ1WWg85 💽 https://t.co/ED5K2ZvYA6 🗣️ https://t.co/ehMFk2E5sw 🧵⬇️
5 replies · 83 retweets · 409 likes
.@MotiveStudio, I saw @PyTorch in the licenses for @deadspace. Are you using it as a GPU-accelerated linear algebra library or are there actually neural nets running during the game? #deadspace #deadspaceremake
0 replies · 0 retweets · 0 likes
While parameter-efficient tuning methods were originally proposed to reduce computation & storage costs, it turns out they can help overcome catastrophic forgetting and thus improve performance on zero-shot cross-lingual generation. Check out our work @GoogleAI @emnlpmeeting👇1/10
1 reply · 30 retweets · 107 likes
Am I missing something wrt the name "gradient checkpointing"? Clearing cached activations and recomputing them in the backward pass seems like the opposite of checkpointing. The name makes it sound like we are storing the activations on disk. https://t.co/C1nKvpno0B
2 replies · 0 retweets · 1 like
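For reference, this is the behavior being discussed, shown with PyTorch's torch.utils.checkpoint: the wrapped block skips caching its intermediate activations and recomputes them during the backward pass. The toy module and shapes are made up.

```python
# Gradient/activation "checkpointing": the wrapped block does not cache its
# intermediate activations; they are recomputed during the backward pass.
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(128, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 128),
)

x = torch.randn(32, 128, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # activations recomputed on backward
y.sum().backward()
```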
We are presenting SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer @aclmeeting today during the 2pm in-person ML for NLP poster session and tomorrow at the 7:30am virtual poster session (virtual session w/ @tuvuumass). #acl2022 #NLProc #ACLinDublin #acl2022nlp
1 reply · 1 retweet · 8 likes
Happy to share our soft prompt transfer (SPoT) paper made it to #ACL2022 🎉. On the SuperGLUE leaderboard, SPoT is the first parameter-efficient approach that is competitive with methods that tune billions of parameters. w/ @blester125, @noahconst, @aboSamoor, @daniel_m_cer
Sharing my internship work @GoogleAI: 1) w/ Soft Prompt Transfer, Prompt Tuning matches or significantly outperforms Model Tuning across model sizes, 2) tasks can help each other via their prompts & task prompts can be used as task embeddings to formalize task similarity. 🧵 1/8
2 replies · 9 retweets · 55 likes
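A hand-wavy sketch of the two ideas above, with made-up shapes: initializing a target task's soft prompt from a prompt learned on a source task, and treating prompts as task embeddings compared via cosine similarity. This is not the SPoT codebase.

```python
# Soft prompt transfer, sketched: reuse a learned source-task prompt to
# initialize the target task, and compare prompts as task embeddings.
# Shapes and tensors are made up; this is not the SPoT implementation.
import torch
import torch.nn.functional as F

prompt_len, d_model = 100, 768
source_prompt = torch.randn(prompt_len, d_model)      # learned on a source task

# 1) Transfer: start the target task's prompt from the source prompt.
target_prompt = torch.nn.Parameter(source_prompt.clone())

# 2) Task similarity: flatten prompts and compare with cosine similarity.
def task_similarity(p1: torch.Tensor, p2: torch.Tensor) -> float:
    return F.cosine_similarity(p1.flatten(), p2.flatten(), dim=0).item()

other_task_prompt = torch.randn(prompt_len, d_model)
print(task_similarity(source_prompt, other_task_prompt))
```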
The blog post for my EMNLP 2021 paper on Prompt Tuning is out! Writing for a blog is pretty different than writing for a conference, so if anything was confusing in the paper maybe this will help it click (or you could have just asked me lol)
Fine-tuning pre-trained models is common in NLP, but forking the model for each task can be a burden. Prompt tuning adds a small set of learnable vectors to the input and can match fine-tuning quality while sharing the same frozen model across all tasks. https://t.co/NKHhMzk056
0 replies · 2 retweets · 14 likes
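A minimal sketch of the mechanism the post describes (not the released T5X implementation): a small matrix of learnable prompt vectors is prepended to the frozen model's input embeddings, and only those vectors are trained.

```python
# Prompt tuning, sketched: prepend k learnable vectors to the (frozen) model's
# input embeddings; only the prompt is trained. Names/shapes are illustrative.
import torch

class SoftPrompt(torch.nn.Module):
    def __init__(self, prompt_len: int = 20, d_model: int = 768):
        super().__init__()
        self.prompt = torch.nn.Parameter(torch.randn(prompt_len, d_model) * 0.5)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: [batch, seq_len, d_model]
        batch = input_embeds.shape[0]
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# Only soft_prompt.parameters() go to the optimizer; the LM stays frozen.
soft_prompt = SoftPrompt()
embeds = torch.randn(4, 32, 768)
extended = soft_prompt(embeds)     # shape [4, 52, 768]
```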
Huge thanks to my collaborators, the people who have put this library through its paces, and the T5X and Flaxformer authors! @noahconst @aboSamoor @tuvuumass @daniel_m_cer @GreenBeanDou @ada_rob @hwchung27 @anselmlevskaya and so many more.
0 replies · 0 retweets · 7 likes
It took a bit, but, like the best desserts, it needed to cool before we could bite in. Our code for Prompt Tuning has been open sourced! It enables training with all T5 sizes on TPU, reproducing our results, and is a great starting point for YOUR work. https://t.co/hxqaIOe0oG
3 replies · 10 retweets · 43 likes
In addition to the impressive performance gains, I'm incredibly excited about how this work opens new exploration of targeted transfer learning via prompt similarity. I can't wait to see what gets built on this!
Sharing my internship work @GoogleAI: 1) w/ Soft Prompt Transfer, Prompt Tuning matches or significantly outperforms Model Tuning across model sizes, 2) tasks can help each other via their prompts & task prompts can be used as task embeddings to formalize task similarity. 🧵 1/8
0 replies · 1 retweet · 5 likes
A huge shout out to my amazing mentors, @noahconst and @aboSamoor, who were a big part of making this project possible. (7/7)
0 replies · 0 retweets · 5 likes