
Stanford Trustworthy AI Research (STAIR) Lab
@stai_research
611 Followers · 320 Following · 3 Media · 281 Statuses
A research group in @StanfordAILab researching AI Capabilities, Trust and Safety, Equity, and Reliability. Website: https://t.co/CgOHvNHL4x
Stanford, CA
Joined November 2023
RT @DeepIndaba: 🚨 Keynote alert! We’re thrilled to welcome @sanmikoyejo as our next speaker in #DLI2025! Catch the session "Beyond benchma….
RT @BrandoHablando: @OpenAI @RylanSchaeffer 🎯 From “looks right” ➜ mathematically verified. Visit our poster #ICML2025 West Ballroom C. Fri….
RT @BrandoHablando: @_akhaliq @_alycialee Joint work with @ObbadElyas Mario Krrish Aryan @sanmikoyejo Me Sudarsan at @stai_research! Than….
RT @BrandoHablando: @_akhaliq @_alycialee @ObbadElyas @sanmikoyejo @stai_research Preprint on arxiv: 🧵4/3.
arxiv.org
Contrary to the conventional emphasis on dataset size, we explore the role of data alignment -- an often overlooked aspect of data quality -- in training capable Large Language Models (LLMs). To...
RT @BrandoHablando: Come to Convention Center West room 208-209 2nd floor to learn about optimal data selection using compression like gzip….
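The tweet above mentions data selection with off-the-shelf compressors like gzip. As a hedged illustration of that general idea (not the paper's actual method), normalized compression distance scores how well a candidate document aligns with a target-domain sample; the example strings and variable names below are hypothetical:

```python
import gzip

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: measures how much better x and y
    compress together than apart, with gzip standing in for an ideal
    compressor. Lower values mean more shared structure."""
    cx = len(gzip.compress(x))
    cy = len(gzip.compress(y))
    cxy = len(gzip.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical use: rank candidate training documents by alignment
# with a target-domain sample (most aligned first).
target = b"Theorem: every finite group of prime order is cyclic."
candidates = [
    b"Lemma: a group whose order is a prime has no proper nontrivial subgroup.",
    b"Recipe: whisk the eggs, fold in the flour, bake at 180 C for twenty minutes.",
]
ranked = sorted(candidates, key=lambda doc: ncd(target, doc))
```

The appeal of the compression-based score is that it needs no training or embeddings, just a standard library compressor.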
RT @BrandoHablando: 🕵️‍♂️ Takeaway: report dynamic splits + step metrics or risk over-claiming your model’s reasoning skills. Putnam-AXIOM….
openreview.net
Current mathematical reasoning benchmarks for large language models (LLMs) are approaching saturation, with some achieving >90% accuracy, and are increasingly compromised by training-set...
RT @sangttruong: GitHub: HuggingFace: Come talk to us to learn more about better LM evalua….
huggingface.co
RT @sangttruong: We thank Andrew Myers and Jill Wu from @StanfordEng for bringing our research to the broader community: ….
RT @sangttruong: The adaptive testing is integrated into HELM. HELM integration blog: … You….
RT @sangttruong: Adaptive testing needs a large & diverse question bank, but manual curation is costly. We use the amortized difficulty pre….
RT @sangttruong: During calibration, IRT estimates question difficulty from LM responses, but querying LMs is costly. We introduce *amortiz….
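As a toy illustration of the amortization idea in the tweet above (a cheap predictor replaces costly LM queries when estimating the difficulty of new questions), here is a closed-form linear fit; the feature (question length) and all numbers are made up for the example, and this is not the paper's predictor:

```python
def fit_linear(xs, ys):
    """Closed-form simple linear regression (ordinary least squares)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Hypothetical calibrated items: (question length in tokens, IRT difficulty
# estimated once, offline, from LM responses).
lengths = [20, 45, 80, 120, 160]
difficulties = [-1.2, -0.4, 0.3, 0.9, 1.6]
slope, intercept = fit_linear(lengths, difficulties)

def predict_difficulty(length: float) -> float:
    """Amortized estimate: new questions get a difficulty without any
    further LM queries."""
    return slope * length + intercept
```

Once the predictor is fit, growing the question bank costs only a feature computation per item instead of a round of model responses.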
RT @sangttruong: IRT includes 2 phases: calibration (estimate question difficulty) and adaptive testing (select informative questions to ev….
RT @sangttruong: LMs are evaluated by average scores on benchmark subsets to save costs, but that’s unreliable. Item response theory (IRT)….
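The thread above describes IRT-based evaluation in two phases. A minimal Rasch-model (1PL) sketch of both phases follows: ability estimation given calibrated difficulties, then selection of the most informative next item. It illustrates textbook IRT, not the paper's implementation, and all numbers are hypothetical:

```python
import math

def p_correct(ability: float, difficulty: float) -> float:
    """Rasch (1PL) model: probability that a model of the given ability
    answers an item of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(responses, difficulties, steps=200, lr=0.1):
    """Maximum-likelihood ability estimate by gradient ascent on the
    Rasch log-likelihood. responses: 0/1 outcomes per item."""
    theta = 0.0
    for _ in range(steps):
        grad = sum(r - p_correct(theta, b)
                   for r, b in zip(responses, difficulties))
        theta += lr * grad
    return theta

def fisher_info(theta: float, difficulty: float) -> float:
    """Item information p(1-p); maximized when difficulty matches ability."""
    p = p_correct(theta, difficulty)
    return p * (1 - p)

def most_informative(theta, remaining):
    """Adaptive testing step: ask the unanswered question that is most
    informative at the current ability estimate."""
    return max(remaining, key=lambda b: fisher_info(theta, b))
```

In use, the two phases alternate: re-estimate ability after each response, then pick the next item, so far fewer questions are needed than scoring a whole benchmark subset.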
RT @sangttruong: @sanmikoyejo gives a nice talk contextualizing our paper contribution in the broader AI Measurement Sciences community in….
hai.stanford.edu
The widespread deployment of AI systems in critical domains demands more rigorous approaches to evaluating their capabilities and safety.
RT @sangttruong: Interested in LLM evaluation reliability & efficiency? Check our ICML’25 paper: Reliable and Efficient Amortized Model-ba….