Brian Lester

@blester125

Followers: 453 · Following: 93 · Media: 12 · Statuses: 93

Senior Research Engineer at Google DeepMind working on parameter-efficient adaptation and few-shot generalization, mostly within NLP. Views are my own. he/him

Joined July 2013
@blester125
Brian Lester
2 years
Is Kevin onto something? We found that LLMs can struggle to understand compressed text unless you do some specific tricks. Check out https://t.co/DRO2IbTFCg and help @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd, @noahconst, and me make Kevin’s dream a reality.
0 · 6 · 15
@blester125
Brian Lester
2 years
We just pushed a new update adding support for the (very impressive) safetensors library from our friends at @huggingface! Git-Theta's plug-in system meant that we spent more time waiting on CI/CD than actually adding support (I'll get off my soapbox now 🧼📦).
@blester125
Brian Lester
2 years
Introducing Git-Theta, a Git extension that enables collaborative and continual development of ML models with merges, diffs, and parameter-efficient updates—all using the standard Git workflow! 📄 https://t.co/UejQ1WWg85 💽 https://t.co/ED5K2ZvYA6 🗣️ https://t.co/ehMFk2E5sw 🧵⬇️
0 · 3 · 20
@blester125
Brian Lester
2 years
This was joint work with wonderful collaborators: @kandpal_nikhil @Muqeeth10 @anisham197 @montymevans Vishal Baskaran @TenghaoHuang45 @liu_haokun and @colinraffel
3 · 0 · 10
@blester125
Brian Lester
2 years
Git-Theta is designed around plug-ins—this means that if we don’t support your favorite framework, merging strategy, or parameter-efficient update yet, you can add it! Join us on GitHub https://t.co/ED5K2ZvYA6 or Zulip https://t.co/ehMFk2E5sw to start contributing!
1 · 1 · 13
@blester125
Brian Lester
2 years
In our ICML paper https://t.co/UejQ1WWg85, we describe the design and implementation of Git-Theta and show that it supports a collaborative workflow involving continually adapting and modifying a pre-trained model, all while saving significant communication and space.
1 · 0 · 14
@blester125
Brian Lester
2 years
All of this functionality is integrated with the standard Git workflow—after running `git theta track` on your model checkpoint, you can `git add`, branch, merge, and commit as usual! Git-Theta is compatible with any Git remote that supports Git LFS (GitHub, Hugging Face Hub, etc.)
1 · 0 · 12
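A minimal sketch of the workflow described in the tweet above, driven from Python purely for illustration. The checkpoint name, branch name, and the assumption that `git theta track` records its filter in `.gitattributes` (as Git LFS does) are placeholders, and it presumes git-theta is installed inside an existing Git repository.

```python
# Sketch of the Git-Theta workflow described above. Assumes git-theta is
# installed and we are inside an initialized Git repo; "model.ckpt" and the
# branch name are placeholders.
import subprocess

def git(*args):
    """Run a git subcommand and raise if it fails."""
    subprocess.run(["git", *args], check=True)

git("theta", "track", "model.ckpt")         # hand the checkpoint over to Git-Theta
git("add", "model.ckpt", ".gitattributes")  # .gitattributes is assumed to hold the filter, as with Git LFS
git("commit", "-m", "Track base checkpoint with Git-Theta")

# From here on it is the ordinary Git workflow: branch, merge, push, etc.
git("checkout", "-b", "finetune-experiment")
```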
@blester125
Brian Lester
2 years
Git-Theta leverages model checkpoint structure to provide meaningful diffs between model versions. During a `git merge`, Git-Theta offers a suite of interactive merge resolution strategies, such as parameter averaging, that can be applied to individual weights.
1 · 0 · 15
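As a concrete picture of the simplest of those strategies, here is a hedged sketch of parameter averaging over two checkpoints using plain PyTorch dictionaries; it illustrates the idea only and is not Git-Theta's internal merge API (the paths are placeholders, and both checkpoints are assumed to share parameter names and shapes).

```python
# Illustrative parameter-averaging merge between two versions of a model.
# Plain PyTorch, not Git-Theta's internal API; checkpoint paths are placeholders.
import torch

ours = torch.load("ours/model.ckpt", map_location="cpu")     # dict: parameter name -> tensor
theirs = torch.load("theirs/model.ckpt", map_location="cpu")

merged = {
    name: (ours[name] + theirs[name]) / 2   # average each weight from the two branches
    for name in ours
}
torch.save(merged, "merged/model.ckpt")
```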
@blester125
Brian Lester
2 years
When using Git/Git LFS to track a model checkpoint, any change to any model parameter re-saves the whole checkpoint. Git-Theta supports incremental updates to ML models, either by changing a subset of the parameters or via parameter-efficient updates like LoRA.
1 · 0 · 17
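To make the storage argument concrete, here is a hedged sketch of a LoRA-style low-rank update: only the two small factors change between versions, while the dense pre-trained weight stays untouched. The shapes are arbitrary and this is not how Git-Theta serializes updates.

```python
# Illustrative LoRA-style parameter-efficient update. Only the low-rank
# factors A and B change; the dense weight W stays fixed on disk.
# Shapes are arbitrary and this is not Git-Theta's storage format.
import torch

d_out, d_in, rank = 1024, 1024, 8
W = torch.randn(d_out, d_in)          # frozen pre-trained weight
A = torch.randn(rank, d_in) * 0.01    # small trainable factor
B = torch.zeros(d_out, rank)          # small trainable factor (zero init, so the update starts at zero)

W_effective = W + B @ A               # what the model actually uses

full = d_out * d_in                   # 1,048,576 numbers to re-save without the trick
low_rank = rank * (d_in + d_out)      # 16,384 numbers for the incremental update
print(f"full delta: {full:,} params vs low-rank update: {low_rank:,} params")
```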
@blester125
Brian Lester
2 years
Introducing Git-Theta, a Git extension that enables collaborative and continual development of ML models with merges, diffs, and parameter-efficient updates—all using the standard Git workflow! 📄 https://t.co/UejQ1WWg85 💽 https://t.co/ED5K2ZvYA6 🗣️ https://t.co/ehMFk2E5sw 🧵⬇️
5 · 83 · 409
@blester125
Brian Lester
3 years
.@MotiveStudio, I saw @PyTorch in the licenses for @deadspace. Are you using it as a GPU-accelerated linear algebra library or are there actually neural nets running during the game? #deadspace #deadspaceremake
0 · 0 · 0
@tuvllms
Tu Vu
3 years
While parameter-efficient tuning methods were originally proposed to reduce computation & storage costs, it turns out they can help overcome catastrophic forgetting and thus improve performance on zero-shot cross-lingual generation. Check out our work @GoogleAI @emnlpmeeting👇1/10
1 · 30 · 107
@blester125
Brian Lester
3 years
Am I missing something wrt the name "gradient checkpointing"? Clearing cached activations and recomputing them in the backward pass seems like the opposite of checkpointing. The name makes it sound like we are storing the activations on disk. https://t.co/C1nKvpno0B
2 · 0 · 1
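For reference, a minimal PyTorch sketch of the behaviour the name describes (layer sizes are arbitrary): activations inside the wrapped block are dropped after the forward pass and recomputed during backward, trading compute for memory rather than writing anything to disk.

```python
# Minimal gradient-checkpointing illustration in PyTorch: the block's
# intermediate activations are not stored after the forward pass; they are
# recomputed when backward() needs them. Sizes here are arbitrary.
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 512),
)
x = torch.randn(16, 512, requires_grad=True)

# y = block(x) would keep the 16x2048 intermediate activation for backward;
# checkpoint(...) discards it and re-runs the block during the backward pass.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```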
@daniel_m_cer
Daniel Cer
3 years
We are presenting SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer @aclmeeting today during the 2pm in-person ML for NLP poster session and tomorrow at the 7:30am virtual poster session (virtual session w/@tuvuumass). #acl2022 #NLProc #ACLinDublin #acl2022nlp
1 · 1 · 8
@tuvllms
Tu Vu
4 years
Happy to share our soft prompt transfer (SPoT) paper made it to #ACL2022 🎉. On the SuperGLUE leaderboard, SPoT is the first parameter-efficient approach that is competitive with methods that tune billions of parameters. w/ @blester125, @noahconst, @aboSamoor, @daniel_m_cer
@tuvllms
Tu Vu
4 years
Sharing my internship work @GoogleAI: 1) w/ Soft Prompt Transfer, Prompt Tuning matches or significantly outperforms Model Tuning across model sizes, 2) tasks can help each other via their prompts & task prompts can be used as task embeddings to formalize task similarity. 🧵 1/8
2 · 9 · 55
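A hedged sketch of the second point above: each task's learned soft prompt is treated as a task embedding, and a transfer source is picked by similarity. The shapes, the flattening, and the cosine measure are illustrative choices, not SPoT's exact recipe.

```python
# Sketch: soft prompts as task embeddings. A new task's prompt is initialized
# from the most similar source task's prompt. Shapes, the flattening, and the
# cosine-similarity measure are illustrative, not the paper's exact recipe.
import torch
import torch.nn.functional as F

prompt_len, d_model = 100, 768
source_prompts = {                                   # task name -> learned prompt
    "mnli": torch.randn(prompt_len, d_model),
    "squad": torch.randn(prompt_len, d_model),
    "record": torch.randn(prompt_len, d_model),
}
target_prompt = torch.randn(prompt_len, d_model)     # e.g. a briefly tuned prompt for the new task

def task_similarity(p, q):
    # Flatten each prompt into a single vector and compare with cosine similarity.
    return F.cosine_similarity(p.flatten(), q.flatten(), dim=0).item()

best = max(source_prompts, key=lambda t: task_similarity(source_prompts[t], target_prompt))
init = source_prompts[best].clone()                  # transfer: start the target prompt here
print(f"Initializing the new task's prompt from: {best}")
```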
@blester125
Brian Lester
4 years
The blog post for my EMNLP 2021 paper on Prompt Tuning is out! Writing for a blog is pretty different than writing for a conference, so if anything was confusing in the paper maybe this will help it click (or you could have just asked me lol)
@GoogleAI
Google AI
4 years
Fine-tuning pre-trained models is common in NLP, but forking the model for each task can be a burden. Prompt tuning adds a small set of learnable vectors to the input and can match fine-tuning quality while sharing the same frozen model across all tasks. https://t.co/NKHhMzk056
0 · 2 · 14
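A hedged sketch of that idea: a small matrix of learnable prompt vectors is prepended to the frozen model's input embeddings, and only the prompt receives gradients. The toy encoder, sizes, and initialization below are placeholders, not the T5X implementation behind the paper.

```python
# Prompt-tuning sketch: prepend learnable vectors to the input embeddings of a
# frozen model; only the prompt is trained. The toy encoder and all sizes are
# placeholders, not the paper's T5X code.
import torch

d_model, prompt_len, vocab, batch = 768, 20, 32000, 4

embed = torch.nn.Embedding(vocab, d_model)                          # frozen token embeddings
encoder = torch.nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True)
for p in list(embed.parameters()) + list(encoder.parameters()):
    p.requires_grad_(False)                                         # the shared model stays frozen

prompt = torch.nn.Parameter(torch.randn(prompt_len, d_model) * 0.5) # the only trainable weights

tokens = torch.randint(0, vocab, (batch, 128))                      # toy batch of token ids
inputs = embed(tokens)                                              # (batch, 128, d_model)
prompted = torch.cat([prompt.expand(batch, -1, -1), inputs], dim=1) # prepend the prompt
out = encoder(prompted)

out.mean().backward()
assert prompt.grad is not None        # the prompt gets gradients...
assert embed.weight.grad is None      # ...while the frozen model does not
```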
@blester125
Brian Lester
4 years
Huge thanks to my collaborators, the people who have put this library through its paces, and the T5X and Flaxformer authors! @noahconst @aboSamoor @tuvuumass @daniel_m_cer @GreenBeanDou @ada_rob @hwchung27 @anselmlevskaya and so many more.
0 · 0 · 7
@blester125
Brian Lester
4 years
It took a bit, but, like the best desserts, it needed to cool before we could bite in. Our code for Prompt Tuning has been open sourced! It enables training with all T5 sizes on TPU, reproducing our results, and is a great starting point for YOUR work. https://t.co/hxqaIOe0oG
3 · 10 · 43
@blester125
Brian Lester
4 years
In addition to the impressive performance gains, I'm incredibly excited about how this work opens new exploration of targeted transfer learning via prompt similarity. I can't wait to see what gets built on this!
@tuvllms
Tu Vu
4 years
Sharing my internship work @GoogleAI: 1) w/ Soft Prompt Transfer, Prompt Tuning matches or significantly outperforms Model Tuning across model sizes, 2) tasks can help each other via their prompts & task prompts can be used as task embeddings to formalize task similarity. 🧵 1/8
0 · 1 · 5
@blester125
Brian Lester
4 years
A huge shout out to my amazing mentors, @noahconst and @aboSamoor, who were a big part of making this project possible. (7/7)
0 · 0 · 5