
Sherry Yang
@sherryyangML
Followers 4K · Following 315 · Media 63 · Statuses 203
Research Scientist @GoogleDeepMind. Previously PhD @UCBerkeley, M.Eng. / B.S. @MIT.
Joined September 2015
Evaluating policies on a real robot can be painful. Can we use a world model to get a rough estimate of how good a policy is? Check out "Evaluating Robot Policies in a World Model". Paper: Demo: Code:
RT @SeanKirmani: We are organizing a workshop on Robotics World Modeling at @corl_conf 2025! We have an excellent group of speakers and…
RT @percyliang: Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbba…
Try to see if you can pick up a spoon in the world model, similar to playing a claw machine: Notably, this model was trained on 2 A100 GPUs. World modeling research can be done in academia too. Great collaboration led by @julianhquevedo with @percyliang.
False negative and false positive examples from evaluating in-distribution and out-of-distribution policies. These shortcomings call for further research into accurate and reliable world models for practical purposes such as evaluating robot policies.
Interestingly, we found the world model underestimates policy values when evaluating in-distribution policies (the same policy that collected the world model's training data), but overestimates policy values for out-of-distribution policies (e.g., noisy policies).
Action-conditioned video generation does not provide rewards, so we leverage a VLM to rate generated video rollouts. To evaluate the world model itself, we found MSE between generated rollouts and ground-truth videos to be a reasonable metric, since the dynamics are almost deterministic.
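As a rough sketch of that rollout-MSE metric (array shapes and function names here are my own illustration, not the paper's code):

```python
import numpy as np

def rollout_mse(generated: np.ndarray, ground_truth: np.ndarray) -> float:
    """Mean squared error between a generated rollout and the real video.

    Both arrays are (T, H, W, C) with pixel values in [0, 1]. Lower is
    better; this is only meaningful when the environment dynamics are
    near-deterministic, as noted above.
    """
    assert generated.shape == ground_truth.shape
    return float(np.mean((generated - ground_truth) ** 2))

# Toy check: identical videos give zero error.
video = np.random.rand(16, 64, 64, 3)
print(rollout_mse(video, video))  # 0.0
```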
A world model needs to precisely respond to low-level control actions. We can sanity check the world model by sweeping its action space (left-right, up-down), or by comparing the ground truth video to the generated rollouts conditioned on the same unseen action sequence.
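A minimal sketch of such an action sweep, assuming a hypothetical `world_model(frame, action)` interface with 2D (left-right, up-down) control actions — the interface and names are illustrative, not the paper's API:

```python
import numpy as np

def sweep_actions(world_model, start_frame, magnitudes=(-1.0, -0.5, 0.5, 1.0)):
    """Roll out the model under constant single-axis actions.

    `world_model(frame, action)` is assumed to return the next frame for a
    2D action (left-right, up-down). A well-behaved model should move the
    scene consistently along each swept axis.
    """
    rollouts = {}
    for axis, name in [(0, "left-right"), (1, "up-down")]:
        for m in magnitudes:
            action = np.zeros(2)
            action[axis] = m
            frame = start_frame
            frames = [frame]
            for _ in range(8):  # short constant-action rollout
                frame = world_model(frame, action)
                frames.append(frame)
            rollouts[(name, m)] = frames
    return rollouts
```

Inspecting the resulting rollouts side by side (or against a ground-truth video under the same unseen action sequence) makes gross control failures obvious at a glance.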
RT @liuziwei7: #CVPR2025 Tutorial @CVPR, June 11. Hosted by @MMLabNTU x…
RT @percyliang: What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire…
A fruitful collaboration between @UCBerkeley and @GoogleDeepMind with amazing collaborators: @JaffrelotT, A. Kaplan, Y. Lin, J. Yin, @SaberMirzaei, @MonaAbdelgaid, A. Alawadhi, R. Cho, @ZhilingZheng, @ekindogus, C. Borgs, @jenniferchayes, @KPatBerkeley, O. Yaghi.
Combinatorial generalization in generative models is essential for discovery. Text-to-image models can generate astronauts riding horses (plausible but rare); similarly, models can generate half-organic, half-inorganic materials (also rare but plausible).
The process of scientific discovery is hierarchical (from high-level ideas to low-level implementation). Our prior work, GenMS, shows that a combination of foundation models trained on different sources/modalities of data can work together at inference time.
Check out Generative Hierarchical Materials Search (GenMS), a framework for generating crystal structures from natural language. Website: Paper:
Each agent serves a different purpose: LLMs propose research directions (e.g., what materials to synthesize), diffusion models generate precise structures, and prediction models perform analysis. If we view synthesis hardware as "robots", the entire discovery pipeline is automatable.
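The division of labor above can be sketched as a simple loop; every interface here (`llm`, `diffusion_model`, `predictor`, `robot`) is a hypothetical stand-in for the corresponding agent, not the system's actual API:

```python
def propose_materials(llm, goal: str) -> list:
    """LLM agent: suggest candidate materials for a research goal."""
    return llm(f"Suggest candidate materials for: {goal}")

def generate_structure(diffusion_model, material: str):
    """Diffusion agent: turn a candidate name into a precise structure."""
    return diffusion_model(material)

def run_pipeline(llm, diffusion_model, predictor, robot, goal: str) -> list:
    """End-to-end loop: research idea -> structure -> analysis -> synthesis."""
    synthesized = []
    for material in propose_materials(llm, goal):
        structure = generate_structure(diffusion_model, material)
        if predictor(structure):                   # keep promising candidates
            synthesized.append(robot(structure))   # "robot" = synthesis hardware
    return synthesized
```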
Science agents can automate discovery from research ideas to wet labs! We show a system of agents (LLMs, diffusion models, hardware equipment) was able to discover and synthesize 5 novel metal-organic structures, going beyond human knowledge. Paper:
RT @JaffrelotT: [1/6] Generative models can dream up materials, but can we actually make them? We just released our preprint: System of…
RT @SeanKirmani: We are organizing a workshop on Building Physically Plausible World Models at @icmlconf 2025! We have a great lineup…
At #NeurIPS2024, I'll present and talk about generative simulators, world modeling, and video agents at the D3S3, SSL, and Open-World Agents workshops. I'm recruiting PhD students this application cycle.
Video generation models need grounding in the physical world to accurately simulate real-world dynamics. But where can they get feedback on physics? Our work shows VLMs can serve as effective judges of physical realism, enabling RL for video generation:
Text-to-video models can generate photorealistic scenes but still struggle to accurately depict dynamic object interactions. Our new preprint addresses this through RL finetuning with AI feedback from VLMs capable of video understanding (e.g., Gemini). 1/7
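A hedged sketch of the reward plumbing: treat the VLM's realism score as a scalar reward and weight sample log-probabilities REINFORCE-style. `sample_fn` and `score_fn` are hypothetical stand-ins (not the preprint's code), and a real implementation would backpropagate these loss terms through the video model:

```python
def vlm_reward_losses(sample_fn, score_fn, prompts, baseline=0.5):
    """One REINFORCE-style pass using a VLM judge as the reward signal.

    sample_fn(prompt) -> (video, logprob): a rollout from the video model
        together with its log-probability.
    score_fn(video, prompt) -> float in [0, 1]: a video-understanding VLM
        prompted to rate physical realism of the rollout.
    Returns per-prompt loss terms -(reward - baseline) * logprob; minimizing
    them increases the likelihood of rollouts the VLM judges as realistic.
    """
    losses = []
    for prompt in prompts:
        video, logprob = sample_fn(prompt)
        reward = score_fn(video, prompt)
        losses.append(-(reward - baseline) * logprob)
    return losses
```

The fixed `baseline` is a simple variance-reduction choice for the sketch; a learned or running-average baseline would be more typical in practice.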