Henry Ko
@henryHM_ko
Followers 874 · Following 225 · Media 4 · Statuses 54
performance and efficiency in ML | CS @ UC Berkeley, @BerkeleyML
Joined June 2024
I had a lot of fun writing a new blog: "Optimizing NSA for TPUs"
There's an accompanying Colab notebook with it too! I hope this helps people tinker with NSA in JAX + TPU kernels with Pallas
5 · 44 · 452
I spent the last ~2 weeks recreating the @thinkymachines "LoRA Without Regret" experiments from scratch.
SFT: Qwen3-4B on the No Robots dataset
RL: Qwen3-1.7B on the MATH dataset
so cool to see rank-1 LoRA matching the performance of full finetuning! 🤯
13 · 39 · 434
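For readers who want to poke at the idea behind the experiment above: LoRA keeps the pretrained weight frozen and learns a low-rank update, W + B @ A, so a rank-1 adapter trains only a tiny sliver of the parameters. This is a minimal NumPy sketch of that mechanism (not the author's code; all shapes and names here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 16, 16, 1  # rank-1 LoRA, as in the experiments above

# Frozen pretrained weight (stand-in for a transformer projection matrix).
W = rng.normal(size=(d_in, d_out))

# LoRA factors: A gets a small random init, B starts at zero, so at step 0
# the adapted model computes exactly the same function as the base model.
A = rng.normal(scale=0.01, size=(rank, d_out))
B = np.zeros((d_in, rank))

def adapted_forward(x, W, A, B, scale=1.0):
    """Forward pass with the low-rank update folded into the frozen weight."""
    return x @ (W + scale * (B @ A))

x = rng.normal(size=(4, d_in))
# Sanity check: with B = 0, LoRA output equals the frozen model's output.
assert np.allclose(adapted_forward(x, W, A, B), x @ W)

# Trainable-parameter comparison: rank * (d_in + d_out) vs d_in * d_out.
lora_params = rank * (d_in + d_out)
full_params = d_in * d_out
print(lora_params, full_params)  # 32 vs 256 at this toy size
```

During finetuning only A and B receive gradients; at rank 1 that is 32 trainable values here versus 256 for full finetuning, which is the asymmetry the tweet's result makes surprising.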
Excited to see Gated DeltaNet being adopted in the @Alibaba_Qwen series! It has also previously demonstrated strong effectiveness in @nvidia's Jet-Nemotron.
🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here!
🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (esp. at 32K+ context!)
🔹 Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed &
9 · 53 · 551
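The "80B params, 3B activated" claim comes from mixture-of-experts routing: each token is sent to only its top-k experts, so only a small fraction of the expert weights run per token. A toy NumPy router, with invented sizes (this is not Qwen's architecture, just the sparse-activation idea):

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, n_experts, k = 6, 8, 8, 1

x = rng.normal(size=(n_tokens, d_model))
router_w = rng.normal(size=(d_model, n_experts))
# One weight matrix per expert (a stand-in for each expert's FFN).
experts = rng.normal(size=(n_experts, d_model, d_model))

def moe_forward(x, router_w, experts, k=1):
    """Each token runs only its top-k experts (sparse activation)."""
    logits = x @ router_w                            # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]       # chosen expert ids per token
    sel = np.take_along_axis(logits, topk, axis=-1)  # logits of chosen experts
    w = np.exp(sel - sel.max(-1, keepdims=True))
    w = w / w.sum(-1, keepdims=True)                 # softmax over chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j, e in enumerate(topk[t]):
            out[t] += w[t, j] * (x[t] @ experts[e])  # only these experts run
    return out, topk

out, topk = moe_forward(x, router_w, experts, k=k)
# Fraction of expert parameters touched per token:
active_fraction = k / n_experts
print(active_fraction)  # 0.125 here -- the "3B active of 80B" idea, at toy scale
```

Total parameter count grows with n_experts, but per-token compute grows only with k, which is where the claimed training and inference savings come from.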
i remade tiny-tpu to support both inference and training! we successfully tested our architecture on the classic XOR problem. here's what i learned throughout the process:👇
as the unemployed friend on a monday afternoon, i spent the past two months building a TPU without any prerequisite knowledge in digital logic, ASIC design, or verilog. here are my coolest unlocks so far:
38 · 55 · 324
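XOR is the classic sanity check for training hardware because it needs a hidden layer: no linear model can fit it. As a software reference point (this is plain NumPy, not the tiny-tpu Verilog), a minimal from-scratch MLP that learns the same problem the architecture above was tested on:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])  # XOR targets

# Tiny 2 -> 8 -> 1 MLP; sizes are arbitrary illustration choices.
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

def bce(p, y):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

_, p0 = forward(X)
loss_start = bce(p0, y)

lr = 1.0
for _ in range(5000):
    h, p = forward(X)
    dlogit = (p - y) / len(X)            # grad of BCE w.r.t. output logits
    dW2 = h.T @ dlogit; db2 = dlogit.sum(0)
    dh = (dlogit @ W2.T) * (1 - h ** 2)  # backprop through tanh
    dW1 = X.T @ dh; db1 = dh.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

_, p = forward(X)
loss_end = bce(p, y)
print(loss_start, "->", loss_end)
```

The forward pass (matmul, activation) and the backward pass (the transposed matmuls computing dW1, dW2) are exactly the operations a training-capable TPU has to support in hardware.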
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
38 · 525 · 4K
Do you want to train massive deep learning models with ease? The 10 new tutorial notebooks in our popular UvA DL course show you how, implementing data, pipeline, and tensor parallelism (and more) from scratch in JAX+Flax! 🚀🚀 Check them out here: https://t.co/7AtaB7j8l9 🧵 1/11
10 · 122 · 617
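The simplest of the three parallelism schemes those notebooks cover is data parallelism: shard the batch across devices, compute per-shard gradients, then all-reduce them. A toy NumPy sketch of that invariant (the notebooks do this with JAX+Flax collectives; names and the linear model here are invented for illustration):

```python
import numpy as np

def loss_grad(w, x, y):
    """Mean squared error of a linear model and its analytic gradient."""
    err = x @ w - y
    return (err ** 2).mean(), 2 * x.T @ err / len(x)

rng = np.random.default_rng(0)
w = np.zeros(3)
X = rng.normal(size=(8, 3))
Y = X @ np.array([1.0, -2.0, 0.5])  # targets from a known linear map

# "Devices": split the batch into equal shards.
n_devices = 4
xs = np.split(X, n_devices)
ys = np.split(Y, n_devices)

# Each device computes a gradient on its own shard...
grads = [loss_grad(w, xi, yi)[1] for xi, yi in zip(xs, ys)]
# ...then an all-reduce averages them (what jax.lax.pmean does on real hardware).
g_sync = np.mean(grads, axis=0)

# With equal shard sizes, this equals the full-batch gradient exactly.
_, g_full = loss_grad(w, X, Y)
assert np.allclose(g_sync, g_full)
```

The equality holds because the mean over shards of per-shard mean gradients equals the full-batch mean when shards are equal-sized; that is what lets each device update identical weight copies in lockstep.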
Introducing our latest technical report: Context Rot: How Increasing Input Tokens Impacts LLM Performance
Our results reveal that models do not use their context uniformly. Full report in replies.
39 · 91 · 898
Context windows are huge now (1M+ tokens) but context depth remains limited. Attention can only resolve one link at a time. Our tiny 5-layer model beats GPT-4.5 on a task requiring deep recursion. How? It learned to divide & conquer. Why this matters 🧵
5 · 7 · 56
looking for my next thing! thinking about dropping out. would love to learn more about opportunities within hardware acceleration or interpretability. dms are open. happy to chat. would love to hear what makes you excited!
25 · 22 · 138
Recordings:
@oswaldjoh and @ninoscherrer will present MesaNet at the ASAP seminar on Tuesday, June 24 at 2 PM ET! MesaNet is a locally optimal test-time training (TTT) layer that optimizes the key-value reconstruction objective over the entire history. If you're into TTT, don't miss it!
1 · 13 · 65
NVIDIA Tensor Core Evolution From Volta To Blackwell
Amdahl's Law, Strong Scaling, Asynchronous Execution, Blackwell, Hopper, Ampere, Turing, Volta, TMA
https://t.co/qZRKz2VdKw (newsletter.semianalysis.com)
4 · 30 · 241
Great deep dive into TPUs with amazing visuals by our very own @henryHM_ko!
I wrote a new blog on TPUs -- it's been fun seeing how different they are from GPUs and also drawing things on excalidraw again✏️ https://t.co/kEZXbB8vmX
0 · 3 · 22
I wrote a new blog on TPUs -- it's been fun seeing how different they are from GPUs and also drawing things on excalidraw again✏️ https://t.co/kEZXbB8vmX
37 · 190 · 1K
ksim is a JAX-based framework that makes your wackiest RL ideas simple to implement. Why use it? It's modular. Trying new architectures, updating rollout logic, and reformulating your objective is as easy as overriding a method. https://t.co/5lqlYTXv1a
github.com
RL training library for humanoid locomotion and manipulation. Built on top of MuJoCo and JAX. - kscalelabs/ksim
In the last month, we’ve been building an open-source framework for robot learning and sim-to-real transfer, made for RL whole-body control from simple walking to complex human imitation Check out the details on HN: https://t.co/QWAVQloqPs Get started in 5 minutes ⬇️
0 · 1 · 23
(1/5) I’m pleased to share that my research with @seowondeog12052 has been accepted to RECOMB 2025 (Poster) and IEEE EMBC 2025 (Paper)! Preprint: https://t.co/sW1AsKkZDL We introduce a generative approach to pesticide design—optimizing small molecules to reduce toxicity.
arxiv.org
Global climate change has reduced crop resilience and pesticide efficacy, making reliance on synthetic pesticides inevitable, even though their widespread use poses significant health and...
2 · 3 · 8
per the event description: Viren Jain is a Senior Staff Research Scientist at Google in Mountain View, California, where he leads Google's Connectomics team, responsible for tools like SegCLR and TensorStore. https://t.co/KCNeLkI63a
luma.com
0 · 2 · 11
Google's TPUv7 is out! ML accelerator marketing material is usually pretty inscrutable (what numbers are even comparable?), so here I'll explain concretely how this compares with Nvidia. 🧵
18 · 143 · 2K
excited to share what I’ve been working on @trychroma! we introduce representative generative benchmarking - custom eval sets built from your own data link to technical report in replies
17 · 28 · 303