
Jaehoon Lee
@hoonkp
Followers: 1K · Following: 222 · Media: 6 · Statuses: 244
Researcher in machine learning with a background in physics; Member of Technical Staff @AnthropicAI; prev. Research Scientist @GoogleDeepMind/@GoogleBrain.
San Francisco Bay Area, CA
Joined November 2009
Claude 4 models are here 🎉 From research to engineering, safety to product - this launch showcases what's possible when the entire Anthropic team comes together. Honored to be part of this journey! Claude has been transforming my daily workflow; I hope it does the same for you!
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
RT @bneyshabur: @ethansdyer and I have started a new team at @AnthropicAI, and we’re hiring! Our team is organized around the north star…
Tour de force led by @_katieeverett investigating the interplay between neural network parameterization and optimizers; the thread/paper includes a lot of gems (theory insights, extensive empirics, and cool new tricks)!
RT @peterjliu: It was a pleasure working on Gemma 2. The team is relatively small but very capable. Glad to see it get released. On the or…
RT @peterjliu: We recently open-sourced a relatively minimal implementation example of Transformer language model training in JAX, called N…
RT @noahconst: Ever wonder why we don’t train LLMs over highly compressed text? Turns out it’s hard to make it work. Check out our paper fo…
arxiv.org
In this paper, we explore the idea of training large language models (LLMs) over highly compressed text. While standard subword tokenizers compress text by a small factor, neural text compressors...
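For a sense of the compression gap discussed in the paper: subword tokenizers shrink raw text only modestly, while stronger compressors do much better. Below is a self-contained toy illustration of measuring a compression ratio, using gzip as a stand-in for a neural text compressor (the sample text and the resulting ratio are illustrative only, not numbers from the paper):

```python
import gzip

# Toy comparison: raw UTF-8 bytes vs. gzip output. The paper studies neural
# text compressors; gzip here only illustrates how a compression ratio is
# measured. Note that repeating the sample text inflates the ratio well
# beyond what typical prose achieves.
text = (
    "Language models are usually trained over subword tokens, which compress "
    "raw text only by a small factor compared with stronger compressors. "
) * 20

raw = text.encode("utf-8")
packed = gzip.compress(raw)

print(f"raw bytes:  {len(raw)}")
print(f"gzip bytes: {len(packed)}")
print(f"ratio:      {len(raw) / len(packed):.1f}x")
```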
RT @blester125: Is Kevin onto something? We found that LLMs can struggle to understand compressed text, unless you do some specific tricks…
Analyzing training instabilities in Transformers is made more accessible by awesome work from @Mitchnw during his internship at @GoogleDeepMind! We encourage you to think more about the fundamental causes and effects of training instabilities as models scale up!
Sharing some highlights from our work on small-scale proxies for large-scale Transformer training instabilities, with fantastic collaborators @peterjliu, @Locchiu, @_katieeverett, many others (see final tweet!), @hoonkp, @jmgilmer, @skornblith! (1/15)
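One concrete mitigation discussed in this line of work is an auxiliary z-loss that penalizes growth of the output log-partition function, countering the slow divergence of output logits over long training runs. Here is a minimal sketch in JAX, assuming a standard softmax cross-entropy setup; the function name and coefficient value are illustrative, not the paper's exact recipe:

```python
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def xent_with_z_loss(logits, labels, z_coeff=1e-4):
    """Softmax cross-entropy plus an auxiliary z-loss term.

    The z-loss penalizes the squared log-partition function log Z,
    discouraging output logits from drifting upward during training.
    The coefficient value here is illustrative.
    """
    log_z = logsumexp(logits, axis=-1)                 # log-partition per example
    log_probs = logits - log_z[..., None]              # log-softmax
    xent = -jnp.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    z_loss = z_coeff * jnp.square(log_z)               # auxiliary regularizer
    return jnp.mean(xent + z_loss)
```

In practice the z-loss term is simply added to the training objective alongside the usual cross-entropy, as above.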
This is an amazing opportunity to work on impactful problems in large language models with cool people! Highly recommended!
Interested in Reasoning with Large Language Models? We are hiring!
Internship:
Full-Time Research Scientist:
Full-Time Research Engineer:
Learn more about the Blueshift Team:
RT @ziwphd: Jasper @latentjasper talking about the ongoing journey towards BIG Gaussian processes! A team effort with @hoonkp, Ben Adlam, @…
Today at 11am CT, Hall J #806, we are presenting our paper on infinite-width neural network kernels! We have methods to compute NTK/NNGP kernels for an extended set of activations, plus sketched embeddings for efficient approximation (100x) of compute-intensive conv kernels! See you there!
Most infinitely wide NTK and NNGP kernels are based on the ReLU activation. We propose a method of computing neural kernels with *general* activations. For homogeneous activations, we approximate the kernel matrices by linear-time sketching algorithms.
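For context on how such infinite-width kernels are computed in practice, closed-form NNGP/NTK kernels for standard architectures are available in the open-source neural-tangents library. The sketch below is a generic illustration (the depth, widths, and Relu/Erf activations are arbitrary choices), not the paper's general-activation or sketching method:

```python
import jax.random as random
from neural_tangents import stax

# Closed-form NNGP/NTK kernels for an infinitely wide 3-layer MLP.
# The widths passed to stax.Dense only matter for the finite-width apply_fn;
# kernel_fn computes the infinite-width limit.
_, _, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Erf(),
    stax.Dense(1),
)

key1, key2 = random.split(random.PRNGKey(0))
x1 = random.normal(key1, (8, 32))    # 8 inputs of dimension 32
x2 = random.normal(key2, (16, 32))   # 16 inputs of dimension 32

kernels = kernel_fn(x1, x2, ('nngp', 'ntk'))
print(kernels.nngp.shape, kernels.ntk.shape)  # (8, 16) (8, 16)
```

The paper extends this kind of computation to general activations and approximates the resulting kernel matrices with linear-time sketching.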
RT @jmes_harrison: Tired of tuning your neural network optimizer? Wish there was an optimizer that just worked? We’re excited to release Ve…
Very interesting paper by @jamiesully2, @danintheory, and Alex Maloney investigating the theoretical origin of neural scaling laws! Happy to read the 97-page paper, learn about new tools in random matrix theory (RMT), and gain insight into how the statistics of natural datasets translate into power-law scaling.
New work on the origin of @OpenAI's neural scaling laws w/ Alex Maloney and @jamiesully2: we solve a simplified model of scaling laws to gain insight into how scaling behavior arises and to probe its behavior in regimes where scaling laws break down. 1/
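As a reminder of the functional form at stake, empirical neural scaling laws fit test loss against data or model size with a power law plus an offset, roughly L(N) ≈ a·N^(-α) + c. Below is a minimal illustrative fit on synthetic numbers (none of these values come from the paper):

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic "loss vs. dataset size" points roughly following a power law.
n = np.array([1e4, 3e4, 1e5, 3e5, 1e6, 3e6])
loss = 2.0 + 5.0 * n ** -0.35 + np.random.default_rng(0).normal(0, 0.01, n.size)

def power_law(n, a, alpha, c):
    # L(n) = a * n^(-alpha) + c : the form scaling-law curves are fit to
    return a * n ** -alpha + c

(a, alpha, c), _ = curve_fit(power_law, n, loss, p0=(1.0, 0.5, 1.0))
print(f"fitted exponent alpha ≈ {alpha:.2f}, irreducible loss c ≈ {c:.2f}")
```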
RT @lilianweng: 🧮 I finally spent some time learning what exactly Neural Tangent Kernel (NTK) is and went through some mathematical proof…
lilianweng.github.io
Neural networks are well known to be over-parameterized and can often easily fit data with near-zero training loss with decent generalization performance on test dataset. Although all these paramet...
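For reference, the NTK discussed in the post is the kernel formed from the network's parameter gradients; in the infinite-width limit it remains (nearly) constant during training, so gradient-flow training reduces to kernel regression. The standard definition and the induced function-space dynamics (generic textbook form, not taken from the linked post):

```latex
% Neural Tangent Kernel and the gradient-flow dynamics it induces
\[
  \Theta(x, x') \;=\; \nabla_\theta f_\theta(x)^{\top} \, \nabla_\theta f_\theta(x')
\]
\[
  \frac{\mathrm{d} f_\theta(x)}{\mathrm{d} t}
    \;=\; -\,\eta \sum_{i=1}^{n} \Theta(x, x_i)\,
          \frac{\partial \mathcal{L}}{\partial f_\theta(x_i)}
\]
```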
RT @ethansdyer: 1/ Super excited to introduce #Minerva 🦉. Minerva was trained on math and science found on the web…