Stanford Trustworthy AI Research (STAIR) Lab

@stai_research

Followers 611 · Following 320 · Media 3 · Statuses 281

A research group in @StanfordAILab researching AI Capabilities, Trust and Safety, Equity, and Reliability. Website: https://t.co/CgOHvNHL4x

Stanford, CA
Joined November 2023
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
1 day
RT @DeepIndaba: 🚨 Keynote alert! We’re thrilled to welcome @sanmikoyejo as our next speaker in #DLI2025! Catch the session "Beyond benchma….
0
7
0
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
4 days
0
1
0
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
4 days
RT @BrandoHablando: @OpenAI @RylanSchaeffer 🎯 From “looks right” ➜ mathematically verified. Visit our poster #ICML2025 West Ballroom C. Fri….
0
2
0
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
4 days
RT @BrandoHablando: @_akhaliq @_alycialee Joint work with @ObbadElyas Mario Krrish Aryan @sanmikoyejo Me Sudarsan at @stai_research! Than….
0
2
0
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
4 days
RT @BrandoHablando: Come to Convention Center West room 208-209 2nd floor to learn about optimal data selection using compression like gzip….
0
4
0
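
The compression-based selection mentioned in the tweet above can be sketched in a few lines. This is a simplified illustration under assumptions, not the poster's actual algorithm: it uses gzip compression ratio as a cheap redundancy proxy and keeps the least-compressible samples. All function names and the toy corpus are hypothetical.

import gzip

def compression_ratio(text: str) -> float:
    # Ratio of compressed size to raw size: lower means more
    # internal redundancy, higher means more varied text.
    raw = text.encode("utf-8")
    return len(gzip.compress(raw)) / max(len(raw), 1)

def select_samples(corpus: list[str], k: int) -> list[str]:
    # Keep the k least-redundant samples by compression ratio.
    return sorted(corpus, key=compression_ratio, reverse=True)[:k]

# Toy usage: the repetitive sample compresses well and is dropped.
corpus = ["the cat sat on the mat " * 20,
          "Distinct prose with varied vocabulary and sentence structure."]
print(select_samples(corpus, 1))

The real method presumably scores candidates against already-selected data (e.g., via compression distance) rather than in isolation; this sketch only shows the core idea of compression as a selection signal.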
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
6 days
RT @BrandoHablando: 🕵️‍♂️ Takeaway: report dynamic splits + step metrics or risk over-claiming your model’s reasoning skills. Putnam-AXIOM….
openreview.net
Current mathematical reasoning benchmarks for large language models (LLMs) are approaching saturation, with some achieving >90% accuracy, and are increasingly compromised by training-set...
0
3
0
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
7 days
RT @sangttruong: GitHub: HuggingFace: Come talk to us to learn more about better LM evalua….
huggingface.co
0
2
0
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
7 days
RT @sangttruong: We thank Andrew Myers and Jill Wu from @StanfordEng for bringing our research to the broader community:..
0
2
0
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
7 days
RT @sangttruong: The adaptive testing is integrated into HELM: HELM integration blog:. You….
0
3
0
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
7 days
RT @sangttruong: Adaptive testing needs a large & diverse question bank, but manual curation is costly. We use the amortized difficulty pre….
0
2
0
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
7 days
RT @sangttruong: During calibration, IRT estimates question difficulty from LM responses, but querying LMs is costly. We introduce *amortiz….
0
2
0
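
A rough sketch of how amortized difficulty prediction could work, assuming (hypothetically) a ridge regressor from question embeddings to IRT difficulties; the paper's actual amortization model may differ, and all names below are illustrative. Once fit on questions already calibrated from LM responses, the regressor estimates difficulty for new questions without any further LM queries.

import numpy as np

def fit_difficulty_regressor(X: np.ndarray, b: np.ndarray, lam: float = 1.0) -> np.ndarray:
    # Ridge regression from question features X (n x d) to IRT
    # difficulties b (n,) obtained from a standard calibration run.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ b)

def predict_difficulty(X_new: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Amortized step: difficulty estimates for unseen questions,
    # with no LM responses needed.
    return X_new @ w

# Toy usage with random vectors standing in for question embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
b = X @ rng.normal(size=16) + 0.1 * rng.normal(size=100)
w = fit_difficulty_regressor(X, b)
print(predict_difficulty(X[:3], w))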
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
7 days
RT @sangttruong: IRT includes 2 phases: calibration (estimate question difficulty) and adaptive testing (select informative questions to ev….
0
2
0
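
The two phases in the tweet above read naturally as a short algorithm. Here is a minimal sketch assuming the simplest 1PL (Rasch) IRT model, where P(correct) = sigmoid(ability − difficulty); the paper may use a richer parameterization, and everything below is illustrative.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def calibrate(R: np.ndarray, iters: int = 500, lr: float = 0.05):
    # Phase 1 (calibration): joint gradient ascent on the 1PL
    # log-likelihood. R[i, j] = 1 if model i answered question j correctly.
    n, m = R.shape
    theta, b = np.zeros(n), np.zeros(m)  # abilities, difficulties
    for _ in range(iters):
        P = sigmoid(theta[:, None] - b[None, :])
        theta += lr * (R - P).sum(axis=1)
        b -= lr * (R - P).sum(axis=0)
    return theta, b

def next_question(theta_hat: float, b: np.ndarray, asked: set) -> int:
    # Phase 2 (adaptive testing): under 1PL, Fisher information is
    # maximal for the question whose difficulty is nearest the current
    # ability estimate, so pick that one next.
    candidates = [j for j in range(len(b)) if j not in asked]
    return min(candidates, key=lambda j: abs(b[j] - theta_hat))

In use, one alternates: re-estimate theta_hat from the responses collected so far, ask next_question, and stop once the ability estimate stabilizes.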
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
7 days
RT @sangttruong: LMs are evaluated by average scores on benchmark subsets to save costs, but that’s unreliable. Item response theory (IRT)….
0
2
0
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
7 days
RT @sangttruong: @sanmikoyejo gives a nice talk contextualizing our paper contribution in the broader AI Measurement Sciences community in….
hai.stanford.edu
The widespread deployment of AI systems in critical domains demands more rigorous approaches to evaluating their capabilities and safety.
0
5
0
@stai_research
Stanford Trustworthy AI Research (STAIR) Lab
7 days
RT @sangttruong: Interested in LLM evaluation reliability & efficiency? Check our ICML’25 paper: Reliable and Efficient Amortized Model-ba….
0
14
0