Henry Ko

@henryHM_ko

874 Followers · 225 Following · 4 Media · 54 Statuses

performance and efficiency in ML | CS @ UC Berkeley, @BerkeleyML

Joined June 2024
@henryHM_ko
Henry Ko
1 month
I had a lot of fun writing a new blog: "Optimizing NSA for TPUs". There's an accompanying Colab notebook with it too! I hope this helps people tinker with NSA in JAX + TPU kernels with Pallas
5
44
452
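As a rough intuition for what NSA-style sparse attention does (this is an illustrative numpy sketch, not the blog's Pallas kernel — the function name and block/top-k sizes are made up for the example): score the query against compressed per-block keys, keep only the top-k blocks, and attend within them.

```python
import numpy as np

def nsa_select_attend(q, K, V, block=4, topk=2):
    """Toy sketch of NSA-style block selection: score the query against
    block-mean keys, keep the top-k blocks, attend only within them."""
    T, d = K.shape
    nb = T // block
    Kb = K[: nb * block].reshape(nb, block, d).mean(axis=1)  # compressed keys
    scores = Kb @ q                                          # block relevance
    keep = np.sort(np.argsort(scores)[-topk:])               # top-k block ids
    idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in keep])
    att = np.exp(K[idx] @ q / np.sqrt(d))                    # softmax over the
    att /= att.sum()                                         # selected tokens only
    return att @ V[idx], keep

rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 8))
out, kept = nsa_select_attend(q, K, V)
```

Only 8 of the 16 key/value rows are ever touched, which is the property a TPU kernel can exploit by skipping whole blocks.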
@michaelbzhu
Michael Zhu
5 days
I spent the last ~2 weeks recreating the @thinkymachines LoRA without Regret experiments from scratch. SFT: Qwen3-4B on the No Robots dataset. RL: Qwen3-1.7B on the MATH dataset. So cool to see rank-1 LoRA matching the performance of full finetuning! 🤯
13
39
434
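For readers unfamiliar with why rank-1 is so striking: a LoRA adapter replaces the full weight update with a low-rank factor pair, so the trainable parameter count collapses. A minimal numpy sketch (dimensions and the `lora_forward` helper are invented for illustration; this is the standard LoRA parameterization, not the thread's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 6, 4, 1, 8           # rank-1 adapter

W = rng.normal(size=(d_out, d_in))           # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01        # trained, small init
B = np.zeros((d_out, r))                     # trained, zero init

def lora_forward(x):
    # base path + scaled low-rank update; with B = 0 the adapter is a no-op
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W @ x)   # identical to base before training

lora_params = A.size + B.size                # 4 + 6 = 10 trainable params
full_params = W.size                         # 24 for full finetuning
```

Even in this toy layer the adapter trains fewer than half the parameters; at Qwen3 scale the ratio is orders of magnitude larger.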
@suryasure05
surya
1 month
I just wrapped up what will probably be the MOST memorable summer of my life. tldr: built a cool project and ended up joining @GroqInc to work on distributed systems. here's a timeline of everything that happened in the last ~6 months: > feb 24: @evanliin explains to me
30
23
400
@SonglinYang4
Songlin Yang
2 months
Excited to see Gated DeltaNet being adopted in the @Alibaba_Qwen series! It has also previously demonstrated strong effectiveness in @nvidia's Jet-Nemotron
@Alibaba_Qwen
Qwen
2 months
🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here! 🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (esp. at 32K+ context!) 🔹 Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed &
9
53
551
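The "80B params, 3B activated" trick is mixture-of-experts routing: each token's router picks a few experts, so per-token compute scales with k rather than the total expert count. A toy numpy sketch (the `moe_forward` helper and all sizes are made up for illustration — Qwen3-Next's actual architecture is far more involved):

```python
import numpy as np

def moe_forward(x, experts, router_w, k=2):
    """Minimal top-k MoE layer: each token activates only k experts,
    so compute scales with k, not with the total expert count."""
    logits = x @ router_w                       # one score per expert
    top = np.argsort(logits)[-k:]               # experts this token activates
    gates = np.exp(logits[top])
    gates /= gates.sum()                        # softmax over selected experts
    return sum(g * (experts[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router_w = rng.normal(size=(d, n_experts))
y = moe_forward(rng.normal(size=d), experts, router_w)
```

Here 16 experts exist but only 2 matmuls run per token — the same 80B-stored / 3B-active asymmetry the tweet describes, in miniature.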
@evanliin
evan
2 months
i remade tiny-tpu to support both inference and training! we successfully tested our architecture on the classic XOR problem. here's what i learned throughout the process:👇
@evanliin
evan
1 year
as the unemployed friend on a monday afternoon, i spent the past two months building a TPU without any prior knowledge of digital logic, ASIC design, or verilog. here are my coolest unlocks so far:
38
55
324
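XOR is the classic sanity check for trainable hardware because no single-layer (linear) model can fit it — it needs a hidden layer. A minimal numpy MLP trained with full-batch backprop (a software sketch of the same test, not tiny-tpu's Verilog; layer sizes and learning rate are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR truth table: not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

lr = 1.0
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)                  # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)       # backprop through MSE + sigmoid
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

preds = (out > 0.5).astype(int).ravel()
```

Both the forward and backward passes bottom out in matmuls, which is why even a tiny systolic-array TPU can run this end to end.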
@jacobaustin132
Jacob Austin
2 months
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
38
525
4K
@phillip_lippe
Phillip Lippe
2 years
Do you want to train massive deep learning models with ease? The 10 new tutorial notebooks of our popular UvA DL course show you how, implementing data, pipeline, and tensor parallelism (and more) from scratch in JAX+Flax! 🚀🚀 Check them out here: https://t.co/7AtaB7j8l9 🧵 1/11
10
122
617
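The core idea behind tensor parallelism is simple enough to show in a few lines: shard a weight matrix across devices, let each compute its slice, then gather. A numpy stand-in for what the notebooks do with real device meshes in JAX (the function name and shapes are invented; `np.split`/`np.concatenate` play the role of sharding and all-gather):

```python
import numpy as np

def tensor_parallel_matmul(x, W, n_dev=4):
    """Column-parallel sketch: shard W's output dim across n_dev 'devices',
    each computes x @ W_shard, then all-gather (here: concatenate)."""
    shards = np.split(W, n_dev, axis=1)       # one column block per device
    partials = [x @ s for s in shards]        # independent local matmuls
    return np.concatenate(partials, axis=-1)  # all-gather of the outputs

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 16))
W = rng.normal(size=(16, 32))
y_tp = tensor_parallel_matmul(x, W)
assert np.allclose(y_tp, x @ W)               # bitwise-equivalent result
```

Each "device" holds only 1/4 of W, which is the whole point: the model's memory footprint splits while the math stays identical.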
@trychroma
Chroma
4 months
Introducing our latest technical report: Context Rot: How Increasing Input Tokens Impacts LLM Performance. Our results reveal that models do not use their context uniformly. Full report in replies.
39
91
898
@Michael_J_Lutz
Michael Lutz
4 months
Context windows are huge now (1M+ tokens) but context depth remains limited. Attention can only resolve one link at a time. Our tiny 5-layer model beats GPT-4.5 on a task requiring deep recursion. How? It learned to divide & conquer. Why this matters 🧵
5
7
56
@evanliin
evan
4 months
looking for my next thing! thinking about dropping out. would love to learn more about opportunities within hardware acceleration or interpretability. dms are open. happy to chat. would love to hear what makes you excited!
25
22
138
@SonglinYang4
Songlin Yang
4 months
Recordings:
@SonglinYang4
Songlin Yang
4 months
@oswaldjoh and @ninoscherrer will present MesaNet at the ASAP seminar on Tuesday, June 24 at 2 PM ET! MesaNet is a locally optimal test-time training (TTT) layer that optimizes the key-value reconstruction objective over the entire history. If you're into TTT, don't miss it!
1
13
65
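The "key-value reconstruction objective over the entire history" has a clean closed form: a mesa-style layer solves a ridge regression from keys to values at test time and applies the resulting map to the query. An illustrative numpy sketch (the `mesa_readout` helper and λ value are assumptions for the example, not MesaNet's actual layer):

```python
import numpy as np

def mesa_readout(q, K, V, lam=1e-2):
    """Sketch of a mesa-style test-time-training readout: solve the
    ridge-regularized key->value reconstruction over the whole history,
    W* = argmin_W ||K W - V||^2 + lam ||W||^2, then apply it to the query."""
    d = K.shape[1]
    W = np.linalg.solve(K.T @ K + lam * np.eye(d), K.T @ V)  # closed form
    return q @ W

rng = np.random.default_rng(0)
K = rng.normal(size=(32, 8))   # full history of keys
V = rng.normal(size=(32, 8))   # full history of values
q = rng.normal(size=8)
out = mesa_readout(q, K, V)
```

"Locally optimal" in the tweet refers to exactly this: unlike a gradient-step TTT layer, the reconstruction objective is solved to optimality at every step.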
@SemiAnalysis_
SemiAnalysis
4 months
NVIDIA Tensor Core Evolution: From Volta to Blackwell. Amdahl's Law, strong scaling, asynchronous execution. Covers Blackwell, Hopper, Ampere, Turing, Volta. https://t.co/qZRKz2VdKw
4
30
241
@BerkeleyML
Machine Learning at Berkeley
4 months
Great deep dive into TPUs with amazing visuals by our very own @henryHM_ko!
@henryHM_ko
Henry Ko
4 months
I wrote a new blog on TPUs -- it's been fun seeing how different they are from GPUs and also drawing things on excalidraw again✏️ https://t.co/kEZXbB8vmX
0
3
22
@henryHM_ko
Henry Ko
4 months
I wrote a new blog on TPUs -- it's been fun seeing how different they are from GPUs and also drawing things on excalidraw again✏️ https://t.co/kEZXbB8vmX
37
190
1K
@Michael_J_Lutz
Michael Lutz
5 months
ksim is a JAX-based framework that makes your wackiest RL ideas simple to implement. Why use it? It's modular. Trying new architectures, updating rollout logic, and reformulating your objective is as easy as overriding a method. https://t.co/5lqlYTXv1a
RL training library for humanoid locomotion and manipulation. Built on top of MuJoCo and JAX. - kscalelabs/ksim
@kscalelabs
K-Scale Labs
5 months
In the last month, we’ve been building an open-source framework for robot learning and sim-to-real transfer, made for RL whole-body control from simple walking to complex human imitation Check out the details on HN: https://t.co/QWAVQloqPs Get started in 5 minutes ⬇️
0
1
23
@thtrkim
Terry Kim
6 months
(1/5) I’m pleased to share that my research with @seowondeog12052 has been accepted to RECOMB 2025 (Poster) and IEEE EMBC 2025 (Paper)! Preprint: https://t.co/sW1AsKkZDL We introduce a generative approach to pesticide design—optimizing small molecules to reduce toxicity.
2
3
8
@voidshapes
stacy 🌤
7 months
per the event description: ​Viren Jain is a Senior Staff Research Scientist at Google in Mountain View, California, where he leads Google's Connectomics team, responsible for tools like SegCLR and TensorStore. https://t.co/KCNeLkI63a
0
2
11
@itsclivetime
Clive Chan
7 months
Google's TPUv7 is out! ML accelerator marketing material is usually pretty inscrutable (what numbers are even comparable?), so here I'll explain concretely how this compares with Nvidia. 🧵
18
143
2K
@kellyhongsn
kelly
7 months
excited to share what I’ve been working on @trychroma! we introduce representative generative benchmarking - custom eval sets built from your own data link to technical report in replies
17
28
303