Sherry Yang Profile
Sherry Yang

@sherryyangML

Followers: 4K
Following: 315
Media: 63
Statuses: 203

Research Scientist @GoogleDeepMind. Previously PhD @UCBerkeley, M.Eng. / B.S. @MIT.

Joined September 2015
@sherryyangML
Sherry Yang
3 months
Evaluating policies on a real robot can be painful. Can we use a world model to get a rough estimate of how good a policy is? Check out "Evaluating Robot Policies in a World Model" (toy sketch below). Paper: Demo: Code:
6
10
65
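Below is a minimal, self-contained sketch of the idea in this thread: estimate a policy's value by rolling it out inside a world model instead of on a real robot. Everything here is a toy stand-in (a 2D point-mass "world model" and a distance-based success score in place of the VLM judge), not the paper's actual API; in the real setting, `step` would be an action-conditioned video model and the scorer a VLM rating the generated rollout.

```python
import numpy as np

class ToyWorldModel:
    """Toy stand-in for a learned world model: deterministic 2D dynamics."""
    def step(self, state, action):
        # Move a fraction of the commanded displacement per step.
        return state + 0.1 * np.asarray(action)

def toy_success_score(trajectory, goal):
    """Stand-in for a VLM rating: 1.0 if the rollout ends near the goal."""
    return float(np.linalg.norm(trajectory[-1] - goal) < 0.05)

def greedy_policy(state, goal):
    """Toy policy: command a displacement straight toward the goal."""
    return goal - state

def evaluate_policy(model, policy, init_state, goal, horizon=100, episodes=5):
    """Estimate a policy's value by rolling it out in the world model
    and scoring each generated trajectory."""
    scores = []
    for _ in range(episodes):  # averaging matters once anything is stochastic
        state, traj = init_state.copy(), [init_state.copy()]
        for _ in range(horizon):
            state = model.step(state, greedy_policy(state, goal) if policy is None
                               else policy(state, goal))
            traj.append(state.copy())
        scores.append(toy_success_score(traj, goal))
    return float(np.mean(scores))

goal = np.array([1.0, 1.0])
value = evaluate_policy(ToyWorldModel(), greedy_policy, np.zeros(2), goal)
print(f"estimated policy value: {value:.2f}")  # 1.00 in this toy setup
```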
@sherryyangML
Sherry Yang
2 months
RT @SeanKirmani: 🤖🌎 We are organizing a workshop on Robotics World Modeling at @corl_conf 2025! We have an excellent group of speakers and…
0
37
0
@sherryyangML
Sherry Yang
2 months
RT @percyliang: Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbba…
0
585
0
@sherryyangML
Sherry Yang
3 months
Try to see if you can pick up a spoon in the world model, similar to playing a claw machine. Notably, this model was trained on just 2 A100 GPUs; world modeling research can be done in academia too. A great collaboration led by @julianhquevedo with @percyliang
0
0
2
@sherryyangML
Sherry Yang
3 months
False negative and false positive examples from evaluating in-distribution and out-of-distribution policies. These shortcomings call for further research into accurate and reliable world models for practical uses such as evaluating robot policies.
2
0
3
@sherryyangML
Sherry Yang
3 months
Interestingly, we found that the world model underestimates policy values when evaluating in-distribution policies (the same policy that collected the world model's training data), but overestimates them for out-of-distribution policies (e.g., noisy policies).
1
0
2
@sherryyangML
Sherry Yang
3 months
Action-conditioned video generation does not by itself provide a reward signal, so we leverage a VLM to rate generated video rollouts. To evaluate the world model itself, we found MSE between generated rollouts and ground-truth videos to be a reasonable metric, since the dynamics are almost deterministic (see the sketch below).
1
0
2
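As a concrete reading of the metric mentioned above, here is a small sketch computing MSE between a generated rollout and the ground-truth video. The (T, H, W, C) uint8 frame layout is an assumption, not something the thread specifies.

```python
import numpy as np

def rollout_mse(generated, ground_truth):
    """MSE between two videos of shape (T, H, W, C), uint8 pixels in [0, 255],
    normalized to [0, 1] so scores are comparable across datasets."""
    gen = generated.astype(np.float64) / 255.0
    gt = ground_truth.astype(np.float64) / 255.0
    return float(np.mean((gen - gt) ** 2))

# Usage with dummy data standing in for real rollouts:
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(16, 64, 64, 3), dtype=np.uint8)
print(rollout_mse(video, video))  # 0.0 for identical rollouts
```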
@sherryyangML
Sherry Yang
3 months
A world model needs to respond precisely to low-level control actions. We can sanity-check the world model by sweeping its action space (left-right, up-down), or by comparing the ground-truth video to generated rollouts conditioned on the same unseen action sequence (see the sketch below).
1
0
2
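A sketch of the action-sweep sanity check described above: roll out a constant action along each axis and inspect whether the generated frames drift consistently in that direction. The `rollout` method is an assumed interface; the trivial stub model exists only so the example runs.

```python
import numpy as np

# Constant actions to sweep, one per direction of the 2D action space.
ACTION_SWEEP = {
    "left":  np.array([-1.0, 0.0]),
    "right": np.array([1.0, 0.0]),
    "up":    np.array([0.0, 1.0]),
    "down":  np.array([0.0, -1.0]),
}

def sweep_actions(world_model, init_frame, horizon=20):
    """Roll out each constant action sequence and return generated videos."""
    videos = {}
    for name, action in ACTION_SWEEP.items():
        actions = np.tile(action, (horizon, 1))  # repeat the action T times
        videos[name] = world_model.rollout(init_frame, actions)
    return videos  # inspect: each video should drift consistently

class StubModel:
    """Trivial stand-in: 'frames' are just accumulated 2D positions."""
    def rollout(self, init, actions):
        return init + np.cumsum(actions, axis=0)

videos = sweep_actions(StubModel(), np.zeros(2))
print({name: v[-1].tolist() for name, v in videos.items()})  # end positions
```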
@sherryyangML
Sherry Yang
3 months
RT @liuziwei7: 🎬 #CVPR2025 Tutorial 🗺️ From Video Generation to World Model @CVPR. 🔗 📅 June 11. 🚀 Hosted by @MMLabNTU x…
0
27
0
@sherryyangML
Sherry Yang
3 months
RT @ma_nanye: Join us for a full-day tutorial on Scalable Generative Models in Computer Vision at @CVPR in Nashville, on Wednesday, June 11…
0
21
0
@sherryyangML
Sherry Yang
4 months
RT @percyliang: What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire…
0
208
0
@sherryyangML
Sherry Yang
4 months
A fruitful collaboration between @UCBerkeley and @GoogleDeepMind with amazing collaborators: @JaffrelotT, A. Kaplan, Y. Lin, J. Yin, @SaberMirzaei, @MonaAbdelgaid, A. Alawadhi, R. Cho, @ZhilingZheng, @ekindogus, C. Borgs, @jenniferchayes, @KPatBerkeley, O. Yaghi.
0
1
3
@sherryyangML
Sherry Yang
4 months
Combinatorial generalization between generative models and text is essential for discovery. Text-to-image models can generate astronauts riding horses (plausible but rare); similarly, materials models can generate half-organic, half-inorganic materials (also rare but plausible).
1
0
4
@sherryyangML
Sherry Yang
4 months
The process of scientific discovery is hierarchical (from high-level ideas down to low-level implementation). Our prior work, GenMS, also shows that a combination of foundation models trained on different data sources and modalities can work together at inference time.
@sherryyangML
Sherry Yang
1 year
Check out Generative Hierarchical Materials Search (GenMS), a framework for generating crystal structures from natural language. Website: Paper:
1
0
4
@sherryyangML
Sherry Yang
4 months
Each agent serves a different purpose: LLMs give research directions (e.g., what materials to synthesize), diffusion models generate precise structures, and prediction models perform analysis. If we view synthesis hardware as “robots”, the entire discovery pipeline is automatable (see the pipeline sketch below).
1
0
5
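A hedged sketch of the agent pipeline described in this thread: an LLM proposes candidates, a diffusion model generates precise structures, and a prediction model screens them before anything reaches synthesis hardware. All interfaces and stub classes here are hypothetical illustrations, not the paper's code.

```python
def discovery_pipeline(llm, diffusion_model, predictor, research_goal, n=8):
    """One pass from high-level research idea to screened candidate structures."""
    # 1. LLM agent: turn the research goal into concrete candidates.
    candidates = llm.propose_materials(research_goal, num_candidates=n)
    results = []
    for formula in candidates:
        # 2. Diffusion agent: generate a precise structure for the candidate.
        structure = diffusion_model.generate(formula)
        # 3. Prediction agent: score the structure for screening.
        score = predictor.score(structure)
        results.append((formula, structure, score))
    # Top-scoring structures would go on to the synthesis "robots".
    return sorted(results, key=lambda r: r[2], reverse=True)

# Toy stand-ins so the sketch runs end to end:
class StubLLM:
    def propose_materials(self, goal, num_candidates):
        return [f"candidate-{i}" for i in range(num_candidates)]

class StubDiffusion:
    def generate(self, formula):
        return {"formula": formula, "atoms": []}  # placeholder structure

class StubPredictor:
    def score(self, structure):
        # Placeholder score; a real predictor would estimate stability etc.
        return int(structure["formula"].split("-")[1]) / 10.0

ranked = discovery_pipeline(StubLLM(), StubDiffusion(), StubPredictor(),
                            "porous material for carbon capture")
print(ranked[0][0], ranked[0][2])  # best candidate and its score
```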
@sherryyangML
Sherry Yang
4 months
Science agents can automate discovery from research ideas to wet labs! We show that a system of agents (LLMs, diffusion models, hardware equipment) was able to discover and synthesize 5 novel metal-organic structures, going beyond human knowledge. Paper:
4
47
195
@sherryyangML
Sherry Yang
4 months
RT @JaffrelotT: [1/6] Generative models can dream up materials, but can we actually make them? We just released our preprint: System of…
0
3
0
@sherryyangML
Sherry Yang
5 months
RT @SeanKirmani: 🌎🌍🌏 We are organizing a workshop on Building Physically Plausible World Models at @icmlconf 2025! We have a great lineup…
0
23
0
@sherryyangML
Sherry Yang
9 months
At #NeurIPS2024, I'll present and talk about generative simulators, world modeling, and video agents at the D3S3, SSL, and Open-World Agents workshops. I'm recruiting PhD students this application cycle.
0
16
141
@sherryyangML
Sherry Yang
9 months
Video generation models need grounding in the physical world to accurately simulate real-world dynamics. But where can they get feedback on physics? Our work shows VLMs can serve as effective judges of physical realism, enabling RL for video generation (sketch below):
Linked paper (arxiv.org): "Large text-to-video models hold immense potential for a wide range of downstream applications. However, these models struggle to accurately depict dynamic object interactions, often resulting in..."
@frt03_
Hiroki Furuta
9 months
Text-to-video models can generate photorealistic scenes but still struggle to accurately depict dynamic object interactions 😢. Our new preprint addresses this through RL finetuning with AI feedback from VLMs capable of video understanding (e.g., Gemini) 🎉 1/7
0
12
77
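To make the RL-from-VLM-feedback loop concrete, here is a sketch in which a VLM judge scores the physical realism of sampled videos and the scores drive a generic REINFORCE-style update. The `rate`, `sample`, and `update` interfaces are assumptions, and the stubs exist only so the example runs; the paper's actual finetuning procedure may differ.

```python
import numpy as np

PHYSICS_PROMPT = ("Rate the physical realism of the object interactions "
                  "in this video on a scale from 0 to 1.")

def vlm_reward(vlm, video):
    """Ask the VLM judge for a scalar physical-realism score."""
    return float(vlm.rate(video, prompt=PHYSICS_PROMPT))

def reinforce_step(generator, vlm, prompts):
    """One policy-gradient step: sample videos, score them, reweight."""
    videos, logprobs = generator.sample(prompts)      # sampled rollouts
    rewards = np.array([vlm_reward(vlm, v) for v in videos])
    advantages = rewards - rewards.mean()             # simple mean baseline
    loss = -(advantages * np.array(logprobs)).mean()  # REINFORCE objective
    generator.update(loss)                            # apply the update
    return rewards.mean()

# Toy stand-ins so the loop runs; real code would use a video model and a VLM.
class StubVLM:
    def rate(self, video, prompt):
        return float(np.clip(video.mean(), 0.0, 1.0))  # placeholder judgment

class StubGenerator:
    def sample(self, prompts):
        videos = [np.random.rand(4, 8, 8) for _ in prompts]  # tiny "videos"
        logprobs = [-1.0 for _ in prompts]
        return videos, logprobs
    def update(self, loss):
        pass  # real code: loss.backward(); optimizer.step()

print(reinforce_step(StubGenerator(), StubVLM(), ["ball bounces", "cup falls"]))
```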