Zach Nussbaum

@zach_nussbaum

Followers 1K · Following 5K · Media 34 · Statuses 500

https://t.co/t0VFctEaZi | prev @nomic_ai 🗺️📍

Manhattan
Joined April 2019
@zach_nussbaum
Zach Nussbaum
2 months
We trained all of the Nomic Embed models on limited compute. One trick that helped us train SoTA embeddings on 16 H100s? GradCache, a gradient checkpointing-like technique tailored for contrastive learning. I kept forgetting how it works, so I dug into the math and wrote about it
6
35
275
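The GradCache trick mentioned above can be sketched in a few lines. This is a minimal toy illustration, not Nomic's training code: the linear "encoder", the dimensions, and the InfoNCE loss are all assumptions. The key idea: embed micro-batches without grad, compute the full-batch contrastive loss on the cached embeddings, save the gradients with respect to those embeddings, then re-encode each micro-batch with grad enabled and backprop the cached gradients. The result is exactly the full-batch gradient at micro-batch memory cost.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def info_nce(q, d, temp=0.05):
    # In-batch negatives: the i-th query matches the i-th document.
    logits = q @ d.T / temp
    labels = torch.arange(q.size(0))
    return F.cross_entropy(logits, labels)

encoder = torch.nn.Linear(8, 4, bias=False)  # stand-in for a real embedding model
queries = torch.randn(16, 8)
docs = torch.randn(16, 8)
chunk = 4  # micro-batch size that fits in memory

# Step 1: forward all micro-batches with no grad, cache the embeddings.
with torch.no_grad():
    q_emb = torch.cat([encoder(c) for c in queries.split(chunk)])
    d_emb = torch.cat([encoder(c) for c in docs.split(chunk)])

# Step 2: full-batch loss on the cached embeddings; save d(loss)/d(embedding).
q_emb.requires_grad_(True)
d_emb.requires_grad_(True)
loss = info_nce(q_emb, d_emb)
loss.backward()
q_grads, d_grads = q_emb.grad.split(chunk), d_emb.grad.split(chunk)

# Step 3: re-encode each micro-batch with grad and inject the cached gradients.
pairs = list(zip(queries.split(chunk), q_grads)) + list(zip(docs.split(chunk), d_grads))
for x, g in pairs:
    emb = encoder(x)
    emb.backward(gradient=g)  # accumulates into encoder.weight.grad

# Sanity check: this matches a plain full-batch forward/backward.
ref = torch.nn.Linear(8, 4, bias=False)
ref.load_state_dict(encoder.state_dict())
info_nce(ref(queries), ref(docs)).backward()
```

Because the loss-to-embedding gradients are exact, GradCache is not an approximation like gradient accumulation across losses would be; only activation memory is traded for a second forward pass.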
@zach_nussbaum
Zach Nussbaum
5 days
RT @max_spero_: Pangram was featured in a @DebunkEU investigation identifying thousands of bots on X spreading AI-generated pro-Kremlin dis….
0
1
0
@zach_nussbaum
Zach Nussbaum
6 days
RT @orionweller: 🤔 Have you ever wondered how good ModernBERT is compared to decoders like Llama? We made an open-data version of ModernBE….
0
45
0
@zach_nussbaum
Zach Nussbaum
19 days
daniel's put a lot of what i've been thinking about into words: when and how much should we automate in the face of things like claude code? i *really* like the conscious architects framing.
@spacemanidol
Daniel Campos
19 days
I am trying out this Thought-Boi Thing. Give it a read. The Hidden Cost of Augmentation: Every Tool You Use Changes You.
0
0
1
@zach_nussbaum
Zach Nussbaum
26 days
@adityakusupati in related news:
@adityakusupati
Aditya Kusupati
26 days
📢Now open, Gemma 3n weights & it is natively flexible, first of its kind, thanks to MatFormer🪆. Any model between E4B & E2B with ZERO training near Pareto -- we found a bunch! Find a better E3B than what we released, I will send you a 🪆😉. Find the colab for extraction 🧵👇🪆
0
0
0
@zach_nussbaum
Zach Nussbaum
26 days
someone should train a large (7B) text embedding model with MatFormer and run query-side embeddings with a smaller submodule. although you may be bottlenecked by embedding dimension in the NN index at that point
2
0
4
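The asymmetric setup proposed above can be sketched with a toy MatFormer-style nested FFN (a real MatFormer nests FFNs inside every transformer block; the dimensions and single block here are hypothetical). The small query-side "submodel" uses only a prefix of the full model's FFN hidden units, so both sizes share one set of weights and produce embeddings of the same output dimension, which is why the NN-index bottleneck mentioned in the tweet remains.

```python
import torch

torch.manual_seed(0)
d_model, d_ff = 16, 64

# One shared pair of FFN weight matrices serves every nested submodel size.
w_in = torch.nn.Parameter(torch.randn(d_ff, d_model) * 0.1)
w_out = torch.nn.Parameter(torch.randn(d_model, d_ff) * 0.1)

def ffn(x, frac=1.0):
    # MatFormer-style slicing: a submodel is just the first k hidden units.
    k = int(d_ff * frac)
    h = torch.relu(x @ w_in[:k].T)
    return h @ w_out[:, :k].T

doc = torch.randn(1, d_model)
query = torch.randn(1, d_model)

doc_emb = ffn(doc)             # full-width model for documents (indexed offline)
query_emb = ffn(query, 0.5)    # half-width submodel for queries (cheap, online)
score = torch.cosine_similarity(doc_emb, query_emb)
```

Since both paths emit `d_model`-dimensional vectors, the index stores full-size embeddings regardless of which submodel encodes the query.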
@zach_nussbaum
Zach Nussbaum
28 days
RT @jxmnop: In the beginning, there was BERT. Eventually BERT gave rise to RoBERTa. Then, DeBERTa. Later, ModernBERT. And now, NeoBERT.….
0
69
0
@zach_nussbaum
Zach Nussbaum
30 days
RT @andersonbcdefg: i wrote the blog. link below.
0
18
0
@zach_nussbaum
Zach Nussbaum
1 month
RT @nomic_ai: so exciting to get a chance to collaborate with @Wikipedia & @Wikimedia on the first full multilingual wikipedia map! even mo….
enterprise.wikimedia.com
Nomic AI’s NOMAD Projection research visualizes multilingual Wikipedia, leveraging Wikimedia Enterprise datasets for powerful AI insights.
0
15
0
@zach_nussbaum
Zach Nussbaum
1 month
finally had some time to read this great blog! really helps motivate the why behind things like disaggregated serving:
@felix_red_panda
Felix
2 months
deep dive on LLM inference (read it if you haven't already!) link in the post below
0
0
2
@zach_nussbaum
Zach Nussbaum
2 months
jack is not only a 10/10 researcher, but also a 10/10 person. any org would be lucky to have him!
@jxmnop
jxmo
2 months
hello twittersphere! i am planning to graduate in a few months, so i am officially ✨ Looking For A Job ✨. if you know of a role that'd be a good fit, or just want to chat, please reach out! here are some projects i've worked on that i'm most proud of 👇
1
0
19
@zach_nussbaum
Zach Nussbaum
2 months
RT @antoine_chaffin: Modern retrievers can perform reasoning internally, yet they benefit from using reasoning traces from LLMs! So how doe….
0
34
0
@zach_nussbaum
Zach Nussbaum
2 months
✨rejoicing✨
3
1
10
@zach_nussbaum
Zach Nussbaum
2 months
also h/t @playground_ai for the article image :)
0
1
5
@zach_nussbaum
Zach Nussbaum
2 months
Using GradCache, we trained state-of-the-art text embedding models on just 16 H100s, dramatically less compute than you'd normally need for large-batch contrastive learning!
1
1
7
@zach_nussbaum
Zach Nussbaum
2 months
Full write-up:
1
1
14
@zach_nussbaum
Zach Nussbaum
2 months
how is it not possible to access desktop substack drafts on the mobile app.
0
0
0
@zach_nussbaum
Zach Nussbaum
2 months
code is open-sourced here:
github.com
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval - nomic-ai/BRIGHT
@zach_nussbaum
Zach Nussbaum
2 months
Agentic RAG is the ✨hot✨new✨thing, so I was curious how current LLMs perform, with no training, on BRIGHT, a reasoning-intensive benchmark for retrieval and reranking. Surprisingly, Qwen3 32B and Qwen QwQ set a new SoTA on BRIGHT! zero training, just reranking BM25!
0
0
5
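The zero-training pipeline described above is just two stages: lexical retrieval, then LLM reranking of the candidates. Here is a toy sketch with a handwritten BM25 and a stub standing in for the LLM reranker (no Qwen model is called; the corpus, the stub's word-overlap heuristic, and all names are assumptions for illustration).

```python
import math
from collections import Counter

corpus = [
    "gradient caching reduces memory in contrastive training",
    "bm25 is a classic lexical retrieval baseline",
    "reasoning-intensive retrieval needs more than keyword match",
]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Minimal BM25: idf * tf saturation * document-length normalization.
    toks = [d.split() for d in docs]
    avgdl = sum(len(t) for t in toks) / len(toks)
    n = len(docs)
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for w in query.split():
            df = sum(w in d for d in toks)
            if df == 0:
                continue
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            f = tf[w]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores

def llm_rerank(query, docs):
    # Placeholder: a real system would prompt an LLM (e.g. Qwen QwQ) to judge
    # each candidate's relevance; here a word-overlap heuristic stands in.
    return sorted(docs, key=lambda d: -len(set(query.split()) & set(d.split())))

query = "lexical retrieval baseline"
candidates = sorted(zip(bm25_scores(query, corpus), corpus), reverse=True)
top = [d for _, d in candidates[:2]]     # stage 1: BM25 shortlist
reranked = llm_rerank(query, top)        # stage 2: rerank the shortlist
```

The LLM only ever sees the BM25 shortlist, which is what makes "zero training, just reranking BM25" cheap: the expensive model scores a handful of candidates instead of the whole corpus.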