Leonard Tang

@leonardtang_

Followers
4K
Following
11K
Media
95
Statuses
2K

co-founder & ceo @haizelabs

nyc
Joined May 2013
@leonardtang_
Leonard Tang
6 months
You don’t need frontier lab resources for frontier lab automated LLM evaluation. To prove this, we’re open-sourcing j1-nano and j1-micro: two absurdly tiny (600M & 1.7B parameters) but mighty reward models competitive with orders-of-magnitude larger peers.
30
69
623
@leonardtang_
Leonard Tang
4 days
i have a dream that one day superintelligence will be in our pockets
@JonSaadFalcon
Jon Saad-Falcon
4 days
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands? The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW):
1
0
7
@leonardtang_
Leonard Tang
17 days
dinner activities
2
0
40
@leonardtang_
Leonard Tang
19 days
We're excited for you to try it out. We eagerly welcome feedback as we explore this new mode of scaling human supervision. https://t.co/R9SLazuWfl
github.com
Skill to annotate and create ai judges from agent logs - haizelabs/annotate
0
0
4
@leonardtang_
Leonard Tang
19 days
The Haize team has recently been exploring this question through a Claude Skill simply called Annotate. This Skill: - Analyzes arbitrary agent traces - Performs a transformation (such as summarization, decomposition, or extraction) on these traces to make them more
1
1
8
@leonardtang_
Leonard Tang
19 days
Instead of forcing a single annotation interface onto a user, why not enable the user to easily spin up a custom evaluation interface tailored to their use case?
1
0
2
@leonardtang_
Leonard Tang
19 days
The reality is that doing LLM evaluations well is inherently artisanal and bespoke. Every application has: 1. Different architectures 2. Different shapes of trace data 3. Different definitions of quality 4. Different human experts performing annotations 5. Different sources of
1
0
2
@leonardtang_
Leonard Tang
19 days
Customers building AI agents often lament the difficulty of using off-the-shelf LLM eval tools for their specific app. While there's no doubt that human supervision is required, not all supervision is the same. Why not transform the supervision problem to make it easier?
1
1
14
@leonardtang_
Leonard Tang
23 days
Stay tuned — we’ll be sharing more about this collaboration and our upcoming work soon!
0
0
4
@leonardtang_
Leonard Tang
23 days
Given the natural alignment (pun intended) in our research agendas, we were especially excited when the opportunity arose to collaborate closely with her. As an advisor, Professor He will help guide our research strategy as we continue to advance the science and practicality of
1
0
4
@leonardtang_
Leonard Tang
23 days
We are thrilled to welcome Professor He He @hhexiy as an advisor to the Haize Labs team! Professor He leads a group at NYU focused on evaluation, scalable oversight, human–AI collaboration, and reasoning.
3
10
112
@leonardtang_
Leonard Tang
24 days
Yesterday, Zuck laid off 600 brilliant researchers at FAIR and GenAI. To these scientists: You deserve a place where your brilliance can shine. You deserve to directly shape the company's future. You deserve to make bold, original research bets. contact@haizelabs.com
0
3
55
@leonardtang_
Leonard Tang
27 days
deep conviction in apples
6
0
23
@leonardtang_
Leonard Tang
2 months
new merch dm if you would like some
12
1
87
@kotekjedi_ml
Alexander Panfilov
2 months
🚨 New paper! LLMs, when asked harmful questions, sometimes produce outputs that look helpful (and harmful) — but are actually deliberately wrong. What’s bad: current LLM-based jailbreak scorers can’t tell the difference (me neither). More in 🧵👇
4
16
97
@leonardtang_
Leonard Tang
2 months
Fortune 100 companies are already using https://t.co/ep1Buvkeiz to make smarter, evidence-based decisions about the models they put into production. No more guesswork. Huge thanks to @NomaSecurity, @cloudsa, @harmonicsec, and the inimitable @csima for bringing this to life.
0
0
4
@leonardtang_
Leonard Tang
2 months
This feeds into RiskRubric’s dual-assessment framework: 1. Haize-powered Red Teaming → adversarial stress-testing at scale. 2. Open-Source Intelligence (from @NomaSecurity) → repo activity, lineage, bias analysis. Together, they generate evidence-based scores across six risk
1
0
4
@leonardtang_
Leonard Tang
2 months
The Haize red-teaming engine powers https://t.co/ep1Buvkeiz by bombarding models with hundreds of thousands of adaptive adversarial prompts across 125+ risk behaviors: > prompt injection > jailbreaks > data leakage > evasion
1
0
1
@leonardtang_
Leonard Tang
2 months
Every enterprise AI leader is asking the same question: how do we know which models are safe, reliable, and secure? We've long been obsessed with this problem at Haize Labs. That's why we're pleased to announce that our red-teaming engine now powers https://t.co/ep1Buvkeiz
4
3
33