Berkeley AI Research

@berkeley_ai

Followers
233K
Following
377
Media
41
Statuses
1K

We're graduate students, postdocs, faculty and scientists at the cutting edge of artificial intelligence research.

Berkeley, CA
Joined July 2017
@kvfrans
Kevin Frans
6 days
What really matters in matrix-whitening optimizers (Shampoo/SOAP/PSGD/Muon)? We ran a careful comparison, dissecting each algorithm. Interestingly, we find that proper matrix-whitening can be seen as *two* transformations, and not all optimizers implement both. Blog:
5
45
324
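The shared "whitening" target these optimizers chase can be written down directly. As a minimal illustrative sketch (my own, not taken from the blog post): full matrix-whitening replaces a gradient G = U S Vᵀ with U Vᵀ, flattening its spectrum; Muon approximates this with Newton-Schulz iterations, and Shampoo's two-sided preconditioner (G Gᵀ)^(-1/4) G (Gᵀ G)^(-1/4) reaches the same point for fresh statistics.

```python
import numpy as np

def matrix_whiten(G):
    """Replace G = U S V^T with U V^T, i.e. set every singular value to 1.
    This is the fully whitened update that Muon approximates with
    Newton-Schulz iterations and Shampoo with the two-sided
    preconditioner (G G^T)^(-1/4) G (G^T G)^(-1/4)."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
G = rng.normal(size=(8, 5))   # a gradient matrix with an uneven spectrum
W = matrix_whiten(G)
print(np.allclose(np.linalg.svd(W, compute_uv=False), 1.0))  # True: flat spectrum
```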
Autoregressive language models learn to compress data by mapping sequences to high-dimensional representations and decoding one token at a time. The quality of compression, as defined by the ability to predict the next token given a prompt, progressively improves (as measured by
4
17
52
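The compression framing above can be made concrete with a toy example (my own illustration, not from the thread): a model that predicts the next symbol better pays fewer bits per symbol, which is exactly what "better compression" means.

```python
import math
from collections import Counter

text = "abababababab"
n = len(text)

# Bits per character for a context-free (unigram) model: the entropy
# of the character distribution.
counts = Counter(text)
unigram_bits = -sum(c / n * math.log2(c / n) for c in counts.values())

# A model that conditions on the previous character predicts this text
# perfectly, so it pays ~0 bits per character: better next-token
# prediction is better compression.
pairs = Counter(zip(text, text[1:]))
ctx = Counter(text[:-1])
bigram_bits = sum(cnt / (n - 1) * -math.log2(cnt / ctx[a])
                  for (a, b), cnt in pairs.items())

print(unigram_bits, bigram_bits)  # 1.0 0.0
```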
@ameeshsh
Ameesh Shah
7 days
LLMs have shown a remarkable ability to “self-refine” and learn from their mistakes via in-context learning. But in robotics, most methods are single-shot. How can we bring inference-time adaptation to robot learning? A 🧵:
10
18
127
@svlevine
Sergey Levine
7 days
New work from @aditya_oberai & @seohong_park: instead of 1-step TD backups or n-step, can we "divide and conquer" over the trajectory, backing up finer and finer increments? Improves over bias of TD-0 and variance of MC. Principle is old, but getting it to work takes some care!
@aditya_oberai
Aditya Oberai
8 days
TD learning can suffer on long tasks: deeper Bellman recursions → poor scalability (despite big data). We introduce a new method (TRL) with a "divide-and-conquer" value update, which scales well with long horizons!
6
26
241
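As an illustration of the divide-and-conquer principle (applied here to raw rewards, not the learned value functions TRL actually backs up): the discounted return of a trajectory segment splits at its midpoint, so the recursion has O(log n) depth instead of the O(n) chain of one-step backups.

```python
def segment_return(rewards, gamma, lo, hi):
    """Discounted return of rewards[lo:hi] via divide and conquer:
    G[lo:hi] = G[lo:mid] + gamma^(mid-lo) * G[mid:hi].
    Recursion depth is O(log n) rather than an O(n) one-step chain."""
    if hi - lo == 1:
        return rewards[lo]
    mid = (lo + hi) // 2
    left = segment_return(rewards, gamma, lo, mid)
    right = segment_return(rewards, gamma, mid, hi)
    return left + gamma ** (mid - lo) * right

rewards = [1.0, 0.0, 2.0, 3.0]
result = segment_return(rewards, 0.9, 0, len(rewards))
print(result)  # equals 1 + 0.9**2 * 2 + 0.9**3 * 3
```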
@BaruaJosh
Josh Barua
9 days
🌍 LLMs can use long chain-of-thought (CoT) to reason in English, but what about other languages? New paper w/ @BerkeleyNLP: We study how scaling, pretraining, post-training & inference affect long CoT across 9 languages. Spoiler: English long CoT ≠ multilingual long CoT 🧵
3
6
18
@_ahmedmalaa
Ahmed Alaa
10 days
Our new paper with Sonali Sharma and @RoxanaDaneshjou is out in @npjDigitalMed! We examine how medical safety and disclaimer messages in public LLMs have changed over time when answering patient questions.
@npjDigitalMed
npj Digital Medicine
16 days
Generative AI models are giving fewer medical disclaimers over time. 📉 In 2022, ~26% of AI health answers had a disclaimer. By 2025? <1%. As models get smarter, they’re getting less safe. Patients may take outputs as medical advice. https://t.co/2OYQvKdezT
3
8
17
@henseoba
Wen-Han Hsieh
12 days
AI can now see, reason, and segment the Earth. 🌍 Meet LISAt, our #NeurIPS2025 Datasets & Benchmarks paper - the first foundation model that turns language queries into pixel-level satellite segmentations. 🛰️ (1/n) 🔗 https://t.co/ApVZgGF0cU @NeurIPSConf @berkeley_ai
4
3
29
@ZehanMa123
Zehan Ma
14 days
Can a robot inspect all views of an object? Today @IROS, we present Omni-Scan from @berkeley_ai, a novel method for bimanual robotic 360° object scanning & reconstruction using 3D Gaussian Splats. (1/8) 🔗 https://t.co/8emyJfUNk4
5
12
123
@akshatgupta57
Akshat Gupta
14 days
🧠 New preprint: How Do LLMs Use Their Depth? We uncover a “Guess-then-Refine” mechanism across layers - early layers predict high-frequency tokens as guesses; later layers refine them as context builds Paper - https://t.co/5PitHjmJJZ @neuranna @GopalaSpeech @berkeley_ai
15
72
516
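The "Guess-then-Refine" mechanism can be caricatured with a toy model (my own illustration, not the paper's analysis, which decodes real intermediate-layer states): early layers fall back on a frequency prior, and deeper layers mix in contextual evidence that overturns the guess.

```python
import numpy as np

vocab = ["the", "a", "cat", "sat"]
freq_prior = np.array([0.50, 0.30, 0.15, 0.05])    # unigram frequencies: the "guess"
context_post = np.array([0.02, 0.03, 0.05, 0.90])  # what the context implies: "sat"

def layer_prediction(depth, n_layers=12):
    # Early layers lean on the frequency prior; deeper layers mix in
    # contextual evidence, refining the initial guess.
    w = depth / n_layers
    p = (1 - w) * freq_prior + w * context_post
    return vocab[int(np.argmax(p))]

for d in (1, 6, 12):
    print(d, layer_prediction(d))  # layer 1 guesses "the"; layer 12 says "sat"
```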
@sfiscience
Santa Fe Institute
15 days
Catch up on our most recent Community Lecture: “Transmission Versus Truth: What Will It Take to Make an AI as Smart as a 4-Year-Old?” with Alison Gopnik. This was the last of six community lectures for 2025, and all are available to watch on SFI’s YouTube channel. Watch here:
0
12
26
@dawnsongtweets
Dawn Song
16 days
New evaluation results from @AnthropicAI's Claude Sonnet 4.5’s system card on our CyberGym benchmark reveal a striking trend: AI cybersecurity capabilities are advancing at unprecedented speed—from ~10% (Claude Sonnet 3.7) to ~30% success rates (Claude Sonnet 4.5) (with single
@dawnsongtweets
Dawn Song
5 months
1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work: 🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects 💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars 🤖
7
15
56
@Berkeley_EECS
UC Berkeley EECS
16 days
Amazing! 10 @BerkeleyEECS @SkyCompLab grad students are Amazon AI PhD Fellows! Congrats! Learn more about our fellows here: https://t.co/zuCGKlmSNe #AmazonAIFellowship @BerkeleySky
eecs.berkeley.edu
Today, Amazon announced its new AI PhD Fellowship program, offering two years of funding to over 100 PhD students across nine universities. Ten of these inaugural fellowships have been awarded to...
@AmazonScience
Amazon Science
16 days
🎓 Amazon launches AI PhD Fellowship program, providing $68 million over two years to fund PhD students at 9 universities pursuing research in machine learning, computer vision, and natural-language processing. #AmazonAIFellowship
0
14
59
@tsunghan_wu
Tsung-Han (Patrick) Wu
16 days
Humans handle dynamic situations easily; what about models? Turns out, they break in three distinct ways: ⛔ Force Stop → Reasoning leakage (won’t stop) ⚡️ Speedup → Panic (rushed answers) ❓ Info Updates → Self-doubt (reject updates) 👉Check out https://t.co/wKrnsMkiFY
5
20
66
@alescontrela
Alejandro Escontrela
16 days
Simulation drives robotics progress, but how do we close the reality gap? Introducing GaussGym: an open-source framework for learning locomotion from pixels with ultra-fast parallelized photorealistic rendering across >4,000 iPhone, GrandTour, ARKit, and Veo scenes! Thread 🧵
11
63
325
@funmilore
Simeon ADEBOLA
16 days
How can a robot provide details of plant anatomy for plant phenotyping? Today @IROS2025, we present Botany-Bot from @berkeley_ai @Siemens. Botany-Bot 1) creates segmented 3D models of plants using Gaussian splats and GarField 2) uses a robot arm to expose hidden details. (1/9)
2
4
24
@aomaru_21490
Jiaxin Ge
17 days
✨Introducing ECHO, the newest in-the-wild image generation benchmark! You’ve seen new image models and new use cases discussed on social media, but old benchmarks don’t test them! We distilled this qualitative discussion into a structured benchmark. 🔗 https://t.co/wJmmEY8TFQ
3
31
114
@2plus2make5
Emma Pierson
20 days
Do you have many models to choose from and little labeled data with which to evaluate them? Check out our #neurips2025 paper, which presents a method to estimate model performance more accurately than previous methods using both labeled + unlabeled data.
@dmshanmugam
Divya Shanmugam
20 days
New #NeurIPS2025 paper: how should we evaluate machine learning models without a large, labeled dataset? We introduce Semi-Supervised Model Evaluation (SSME), which uses labeled and unlabeled data to estimate performance! We find SSME is far more accurate than standard methods.
2
12
107
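The core idea of combining labeled and unlabeled data for evaluation can be sketched with a much simpler stand-in than SSME's actual estimator (this averaging scheme is my own hypothetical illustration, not the paper's method): blend the small labeled-set accuracy with the model's average confidence on the unlabeled pool.

```python
import numpy as np

def estimate_accuracy(probs_labeled, y_labeled, probs_unlabeled):
    # Blend the small labeled-set accuracy with the model's average
    # confidence on unlabeled data, weighted by sample counts.
    acc_l = np.mean(np.argmax(probs_labeled, axis=1) == y_labeled)
    conf_u = np.mean(np.max(probs_unlabeled, axis=1))
    n_l, n_u = len(y_labeled), len(probs_unlabeled)
    return (n_l * acc_l + n_u * conf_u) / (n_l + n_u)

rng = np.random.default_rng(1)
p1 = rng.uniform(0.5, 1.0, size=2000)  # confidence in the predicted class (1)
correct = rng.random(2000) < p1        # calibrated: P(correct) = confidence
probs = np.column_stack([1 - p1, p1])  # model always predicts class 1
y = np.where(correct, 1, 0)            # true label agrees when model is right
# Only 20 labeled points are available; the remaining 1980 are unlabeled.
est = estimate_accuracy(probs[:20], y[:20], probs[20:])
print(round(est, 3), round(correct.mean(), 3))  # estimate vs. true accuracy
```

For a well-calibrated model this confidence-based estimate tracks the true accuracy far more tightly than the 20-point labeled estimate alone; SSME's mixture-model estimator goes further by also handling miscalibration.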
@sewon__min
Sewon Min
20 days
Super excited about @wenjie_ma's work on verifying math proofs! ✅ 24 competitions, 3 SoTAs (o3, Gemini-2.5-Pro, R1) ✅ Strong evaluator -- a carefully designed evaluator with simple ensemble beats agentic ones ✅ Strong best-of-n performance Check out the paper & website!
@wenjie_ma
Wenjie Ma
20 days
LLMs solving math benchmarks with verifiable answers like AIME? ✅ LLMs solving math proofs? ❌ Still an open problem. RL works great for final-answer problems, but proofs are different: - Often no single checkable answer - Correct answers can hide flawed reasoning The key
3
15
119