Leonard Tang

@leonardtang_

Followers
4K
Following
11K
Media
95
Statuses
2K

co-founder & ceo @haizelabs

nyc
Joined May 2013
@leonardtang_
Leonard Tang
6 months
You don’t need frontier lab resources for frontier lab automated LLM evaluation. To prove this, we’re open-sourcing j1-nano and j1-micro: two absurdly tiny (600M & 1.7B parameters) but mighty reward models competitive with orders-of-magnitude larger peers.
30
69
623
@leonardtang_
Leonard Tang
4 days
i have a dream that one day superintelligence will be in our pockets
@JonSaadFalcon
Jon Saad-Falcon
4 days
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands? The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW):
1
0
7
@leonardtang_
Leonard Tang
17 days
dinner activities
2
0
40
@leonardtang_
Leonard Tang
19 days
We're excited for you to try it out. We eagerly welcome feedback as we explore this new mode of scaling human supervision. https://t.co/R9SLazuWfl
github.com
Skill to annotate and create ai judges from agent logs - haizelabs/annotate
0
0
4
@leonardtang_
Leonard Tang
19 days
The Haize team has recently been exploring this question through a Claude Skill simply called Annotate. This Skill: - Analyzes arbitrary agent traces - Performs a transformation (such as summarization, decomposition, or extraction) on these traces to make them more
1
1
8
@leonardtang_
Leonard Tang
19 days
Instead of forcing a single annotation interface onto a user, why not enable the user to easily spin up a custom evaluation interface tailored to their use case?
1
0
2
@leonardtang_
Leonard Tang
19 days
The reality is that doing LLM evaluations well is inherently artisanal and bespoke. Every application has: 1. Different architectures 2. Different shapes of trace data 3. Different definitions of quality 4. Different human experts performing annotations 5. Different sources of
1
0
2
@leonardtang_
Leonard Tang
19 days
Customers building AI agents often lament the difficulty of using off-the-shelf LLM eval tools for their specific app. While there's no doubt that human supervision is required, not all supervision is the same. Why not transform the supervision problem to make it easier?
1
1
14
@leonardtang_
Leonard Tang
23 days
Stay tuned — we’ll be sharing more about this collaboration and our upcoming work soon!
0
0
4
@leonardtang_
Leonard Tang
23 days
Given the natural alignment (pun intended) in our research agendas, we were especially excited when the opportunity arose to collaborate closely with her. As an advisor, Professor He will help guide our research strategy as we continue to advance the science and practicality of
1
0
4
@leonardtang_
Leonard Tang
23 days
We are thrilled to welcome Professor He He @hhexiy as an advisor to the Haize Labs team! Professor He leads a group at NYU focused on evaluation, scalable oversight, human–AI collaboration, and reasoning.
3
10
112
@leonardtang_
Leonard Tang
24 days
Yesterday, Zuck laid off 600 brilliant researchers at FAIR and GenAI. To these scientists: You deserve a place where your brilliance can shine. You deserve to directly shape the company's future. You deserve to make bold, original research bets. contact@haizelabs.com
0
3
55
@leonardtang_
Leonard Tang
27 days
deep conviction in apples
6
0
23
@leonardtang_
Leonard Tang
2 months
new merch dm if you would like some
12
1
87
@kotekjedi_ml
Alexander Panfilov
2 months
🚨 New paper! LLMs, when asked harmful questions, sometimes produce outputs that look helpful (and harmful) — but are actually deliberately wrong. What’s bad: current LLM-based jailbreak scorers can’t tell the difference (me neither). More in 🧵👇
4
16
97
@leonardtang_
Leonard Tang
2 months
Fortune 100 companies are already using https://t.co/ep1Buvkeiz to make smarter, evidence-based decisions about the models they put into production. No more guesswork. Huge thanks to @NomaSecurity, @cloudsa, @harmonicsec, and the inimitable @csima for bringing this to life.
0
0
4
@leonardtang_
Leonard Tang
2 months
This feeds into RiskRubric’s dual-assessment framework: 1. Haize-powered Red Teaming → adversarial stress-testing at scale. 2. Open-Source Intelligence (from @NomaSecurity) → repo activity, lineage, bias analysis. Together, they generate evidence-based scores across six risk
1
0
4
@leonardtang_
Leonard Tang
2 months
The Haize red-teaming engine powers https://t.co/ep1Buvkeiz by bombarding models with hundreds of thousands of adaptive adversarial prompts across 125+ risk behaviors: > prompt injection > jailbreaks > data leakage > evasion
1
0
1
@leonardtang_
Leonard Tang
2 months
Every enterprise AI leader is asking the same question: how do we know which models are safe, reliable, and secure? We've long been obsessed with this problem at Haize Labs. That's why we're pleased to announce that our red-teaming engine now powers https://t.co/ep1Buvkeiz
4
3
33