Sherry Yang Profile
Sherry Yang

@sherryyangML

Followers: 4K
Following: 315
Media: 63
Statuses: 203

Research Scientist @GoogleDeepMind. Previously PhD @UCBerkeley, M.Eng. / B.S. @MIT.

Joined September 2015
@sherryyangML
Sherry Yang
3 months
Evaluating policies on a real robot can be painful. Can we use a world model to get a rough estimate of how good a policy is? Check out "Evaluating Robot Policies in a World Model" (toy sketch below). Paper: Demo: Code:
6
10
65
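Below is a minimal, self-contained sketch of the idea in this thread: estimate a policy's value by rolling it out inside a world model instead of on a real robot. Everything here is a toy stand-in (a 2D point-mass "world model" and a distance-based success score in place of the VLM judge), not the paper's actual API; in the real setting, `step` would be an action-conditioned video model and the scorer a VLM rating the generated rollout.

```python
import numpy as np

class ToyWorldModel:
    """Toy stand-in for a learned world model: deterministic 2D dynamics."""
    def step(self, state, action):
        # Move a fraction of the commanded displacement per step.
        return state + 0.1 * np.asarray(action)

def toy_success_score(trajectory, goal):
    """Stand-in for a VLM rating: 1.0 if the rollout ends near the goal."""
    return float(np.linalg.norm(trajectory[-1] - goal) < 0.05)

def greedy_policy(state, goal):
    """Toy policy: command a displacement straight toward the goal."""
    return goal - state

def evaluate_policy(model, policy, init_state, goal, horizon=100, episodes=5):
    """Estimate a policy's value by rolling it out in the world model
    and scoring each generated trajectory."""
    scores = []
    for _ in range(episodes):  # averaging matters once anything is stochastic
        state, traj = init_state.copy(), [init_state.copy()]
        for _ in range(horizon):
            state = model.step(state, greedy_policy(state, goal) if policy is None
                               else policy(state, goal))
            traj.append(state.copy())
        scores.append(toy_success_score(traj, goal))
    return float(np.mean(scores))

goal = np.array([1.0, 1.0])
value = evaluate_policy(ToyWorldModel(), greedy_policy, np.zeros(2), goal)
print(f"estimated policy value: {value:.2f}")  # 1.00 in this toy setup
```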
@sherryyangML
Sherry Yang
2 months
RT @SeanKirmani: 🤖🌎 We are organizing a workshop on Robotics World Modeling at @corl_conf 2025! We have an excellent group of speakers and…
0
37
0
@sherryyangML
Sherry Yang
2 months
RT @percyliang: Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbba…
0
585
0
@sherryyangML
Sherry Yang
3 months
Try to see if you can pick up a spoon in the world model, similar to playing a claw machine. Notably, this model was trained on just 2 A100 GPUs; world modeling research can be done in academia too. A great collaboration led by @julianhquevedo with @percyliang
0
0
2
@sherryyangML
Sherry Yang
3 months
False negative and false positive examples from evaluating in-distribution and out-of-distribution policies. These shortcomings call for further research into accurate and reliable world models for practical uses such as evaluating robot policies.
2
0
3
@sherryyangML
Sherry Yang
3 months
Interestingly, we found that the world model underestimates policy values when evaluating in-distribution policies (the same policy that collected the world model's training data), but overestimates them for out-of-distribution policies (e.g., noisy policies).
1
0
2
@sherryyangML
Sherry Yang
3 months
Action-conditioned video generation does not by itself provide a reward signal, so we leverage a VLM to rate generated video rollouts. To evaluate the world model itself, we found MSE between generated rollouts and ground-truth videos to be a reasonable metric, since the dynamics are almost deterministic (see the sketch below).
1
0
2
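As a concrete reading of the metric mentioned above, here is a small sketch computing MSE between a generated rollout and the ground-truth video. The (T, H, W, C) uint8 frame layout is an assumption, not something the thread specifies.

```python
import numpy as np

def rollout_mse(generated, ground_truth):
    """MSE between two videos of shape (T, H, W, C), uint8 pixels in [0, 255],
    normalized to [0, 1] so scores are comparable across datasets."""
    gen = generated.astype(np.float64) / 255.0
    gt = ground_truth.astype(np.float64) / 255.0
    return float(np.mean((gen - gt) ** 2))

# Usage with dummy data standing in for real rollouts:
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(16, 64, 64, 3), dtype=np.uint8)
print(rollout_mse(video, video))  # 0.0 for identical rollouts
```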
@sherryyangML
Sherry Yang
3 months
A world model needs to respond precisely to low-level control actions. We can sanity-check the world model by sweeping its action space (left-right, up-down), or by comparing the ground-truth video to generated rollouts conditioned on the same unseen action sequence (see the sketch below).
1
0
2
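A sketch of the action-sweep sanity check described above: roll out a constant action along each axis and inspect whether the generated frames drift consistently in that direction. The `rollout` method is an assumed interface; the trivial stub model exists only so the example runs.

```python
import numpy as np

# Constant actions to sweep, one per direction of the 2D action space.
ACTION_SWEEP = {
    "left":  np.array([-1.0, 0.0]),
    "right": np.array([1.0, 0.0]),
    "up":    np.array([0.0, 1.0]),
    "down":  np.array([0.0, -1.0]),
}

def sweep_actions(world_model, init_frame, horizon=20):
    """Roll out each constant action sequence and return generated videos."""
    videos = {}
    for name, action in ACTION_SWEEP.items():
        actions = np.tile(action, (horizon, 1))  # repeat the action T times
        videos[name] = world_model.rollout(init_frame, actions)
    return videos  # inspect: each video should drift consistently

class StubModel:
    """Trivial stand-in: 'frames' are just accumulated 2D positions."""
    def rollout(self, init, actions):
        return init + np.cumsum(actions, axis=0)

videos = sweep_actions(StubModel(), np.zeros(2))
print({name: v[-1].tolist() for name, v in videos.items()})  # end positions
```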
@sherryyangML
Sherry Yang
3 months
RT @liuziwei7: 🎬 #CVPR2025 Tutorial 🗺️ From Video Generation to World Model @CVPR. 🔗 📅 June 11. 🚀 Hosted by @MMLabNTU x…
0
27
0
@sherryyangML
Sherry Yang
3 months
RT @ma_nanye: Join us for a full-day tutorial on Scalable Generative Models in Computer Vision at @CVPR in Nashville, on Wednesday, June 11…
0
21
0
@sherryyangML
Sherry Yang
4 months
RT @percyliang: What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire…
0
208
0
@sherryyangML
Sherry Yang
4 months
A fruitful collaboration between @UCBerkeley and @GoogleDeepMind with amazing collaborators: @JaffrelotT, A. Kaplan, Y. Lin, J. Yin, @SaberMirzaei, @MonaAbdelgaid, A. Alawadhi, R. Cho, @ZhilingZheng, @ekindogus, C. Borgs, @jenniferchayes, @KPatBerkeley, O. Yaghi.
0
1
3
@sherryyangML
Sherry Yang
4 months
Combinatorial generalization between generative models and text is essential for discovery. Text-to-image models can generate astronauts riding horses (plausible but rare); similarly, materials models can generate half-organic, half-inorganic materials (also rare but plausible).
1
0
4
@sherryyangML
Sherry Yang
4 months
The process of scientific discovery is hierarchical (from high-level ideas down to low-level implementation). Our prior work, GenMS, also shows that a combination of foundation models trained on different data sources and modalities can work together at inference time.
@sherryyangML
Sherry Yang
1 year
Check out Generative Hierarchical Materials Search (GenMS), a framework for generating crystal structures from natural language. Website: Paper:
1
0
4
@sherryyangML
Sherry Yang
4 months
Each agent serves a different purpose: LLMs give research directions (e.g., what materials to synthesize), diffusion models generate precise structures, and prediction models perform analysis. If we view synthesis hardware as “robots”, the entire discovery pipeline is automatable (see the pipeline sketch below).
1
0
5
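A hedged sketch of the agent pipeline described in this thread: an LLM proposes candidates, a diffusion model generates precise structures, and a prediction model screens them before anything reaches synthesis hardware. All interfaces and stub classes here are hypothetical illustrations, not the paper's code.

```python
def discovery_pipeline(llm, diffusion_model, predictor, research_goal, n=8):
    """One pass from high-level research idea to screened candidate structures."""
    # 1. LLM agent: turn the research goal into concrete candidates.
    candidates = llm.propose_materials(research_goal, num_candidates=n)
    results = []
    for formula in candidates:
        # 2. Diffusion agent: generate a precise structure for the candidate.
        structure = diffusion_model.generate(formula)
        # 3. Prediction agent: score the structure for screening.
        score = predictor.score(structure)
        results.append((formula, structure, score))
    # Top-scoring structures would go on to the synthesis "robots".
    return sorted(results, key=lambda r: r[2], reverse=True)

# Toy stand-ins so the sketch runs end to end:
class StubLLM:
    def propose_materials(self, goal, num_candidates):
        return [f"candidate-{i}" for i in range(num_candidates)]

class StubDiffusion:
    def generate(self, formula):
        return {"formula": formula, "atoms": []}  # placeholder structure

class StubPredictor:
    def score(self, structure):
        # Placeholder score; a real predictor would estimate stability etc.
        return int(structure["formula"].split("-")[1]) / 10.0

ranked = discovery_pipeline(StubLLM(), StubDiffusion(), StubPredictor(),
                            "porous material for carbon capture")
print(ranked[0][0], ranked[0][2])  # best candidate and its score
```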
@sherryyangML
Sherry Yang
4 months
Science agents can automate discovery from research ideas to wet labs! We show that a system of agents (LLMs, diffusion models, hardware equipment) was able to discover and synthesize 5 novel metal-organic structures, going beyond human knowledge. Paper:
4
47
195
@sherryyangML
Sherry Yang
4 months
RT @JaffrelotT: [1/6] Generative models can dream up materials, but can we actually make them? We just released our preprint: System of…
0
3
0
@sherryyangML
Sherry Yang
5 months
RT @SeanKirmani: 🌎🌍🌏 We are organizing a workshop on Building Physically Plausible World Models at @icmlconf 2025! We have a great lineup…
0
23
0
@sherryyangML
Sherry Yang
9 months
At #NeurIPS2024, I'll present and talk about generative simulators, world modeling, and video agents at the D3S3, SSL, and Open-World Agents workshops. I'm recruiting PhD students this application cycle.
0
16
141
@sherryyangML
Sherry Yang
9 months
Video generation models need grounding in the physical world to accurately simulate real-world dynamics. But where can they get feedback on physics? Our work shows VLMs can serve as effective judges of physical realism, enabling RL for video generation (sketch below):
Linked paper (arxiv.org): "Large text-to-video models hold immense potential for a wide range of downstream applications. However, these models struggle to accurately depict dynamic object interactions, often resulting in..."
@frt03_
Hiroki Furuta
9 months
Text-to-video models can generate photorealistic scenes but still struggle to accurately depict dynamic object interactions 😢. Our new preprint addresses this through RL finetuning with AI feedback from VLMs capable of video understanding (e.g., Gemini) 🎉 1/7
0
12
77
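To make the RL-from-VLM-feedback loop concrete, here is a sketch in which a VLM judge scores the physical realism of sampled videos and the scores drive a generic REINFORCE-style update. The `rate`, `sample`, and `update` interfaces are assumptions, and the stubs exist only so the example runs; the paper's actual finetuning procedure may differ.

```python
import numpy as np

PHYSICS_PROMPT = ("Rate the physical realism of the object interactions "
                  "in this video on a scale from 0 to 1.")

def vlm_reward(vlm, video):
    """Ask the VLM judge for a scalar physical-realism score."""
    return float(vlm.rate(video, prompt=PHYSICS_PROMPT))

def reinforce_step(generator, vlm, prompts):
    """One policy-gradient step: sample videos, score them, reweight."""
    videos, logprobs = generator.sample(prompts)      # sampled rollouts
    rewards = np.array([vlm_reward(vlm, v) for v in videos])
    advantages = rewards - rewards.mean()             # simple mean baseline
    loss = -(advantages * np.array(logprobs)).mean()  # REINFORCE objective
    generator.update(loss)                            # apply the update
    return rewards.mean()

# Toy stand-ins so the loop runs; real code would use a video model and a VLM.
class StubVLM:
    def rate(self, video, prompt):
        return float(np.clip(video.mean(), 0.0, 1.0))  # placeholder judgment

class StubGenerator:
    def sample(self, prompts):
        videos = [np.random.rand(4, 8, 8) for _ in prompts]  # tiny "videos"
        logprobs = [-1.0 for _ in prompts]
        return videos, logprobs
    def update(self, loss):
        pass  # real code: loss.backward(); optimizer.step()

print(reinforce_step(StubGenerator(), StubVLM(), ["ball bounces", "cup falls"]))
```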