
Violet X. (@ZiyuX)
189 Followers · 616 Following · 0 Media · 83 Statuses
PhD student @Stanford. Working on LLM-based agents
United States · Joined October 2011
RT @rm_rafailov: Missed this paper, but it’s pretty cool - it managed to scale our “Meta-CoT” proposal to 70B models by creating synthetic…
Evaluating creative writing has long been challenging and subjective - how do you standardize taste when judging stories? For details, check out our work led by @DanielFein7 and @sebbrusso.
Introducing LitBench, the first standardized benchmark for creative writing verifiers! We use Reddit’s r/WritingPrompts to label human preferences across 50k story-pairs, and see how LLM-as-a-judge, Generative RMs, and Bradley-Terry RMs stack up.
RT @RylanSchaeffer: Third #ICML2025 paper! What effect will web-scale synthetic data have on future deep generative models? Collapse or Th…
RT @synth_labs: Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training…
Check out this work, led by @tianyu_hua, benchmarking how well LLMs can turn ML research papers into code!
🚨 New benchmark alert! 🚨 Can today’s LLMs implement tomorrow’s research ideas? We put them to the test. Introducing #ResearchCodeBench: 212 tasks from 2024–25 ML papers and code, most released after any model’s training cutoff. 🔗 🧵
RT @gandhikanishk: New Paper!! We try to understand why some LMs self-improve their reasoning while others hit a wall. The key? Cognitive b…
RT @rm_rafailov: This is the dataset we curated for our own reasoning experiments. There is a lot of reasoning data coming out now, but we…
RT @synth_labs: Releasing Big-MATH—the first heavily curated & verifiable dataset designed specifically for large-scale RL training & LLM r…
RT @percyliang: 1/🧵How do we know if AI is actually ready for healthcare? We built a benchmark, MedHELM, that tests LMs on real clinical ta…
RT @Anikait_Singh_: Personalization in LLMs is crucial for meeting diverse user needs, yet collecting real-world preferences at scale remai…
RT @AndrewYNg: Introducing Agentic Object Detection! Given a text prompt like “unripe strawberries” or “Kellogg’s branded cereal” and an i…
RT @jiayi_pirate: We reproduced DeepSeek R1-Zero in the CountDown game, and it just works. Through RL, the 3B base LM develops self-verifi…
RT @synth_labs: Ever watched someone solve a hard math problem? Their first attempt is rarely perfect. They sketch ideas, cross things out…
RT @rm_rafailov: "Superintelligence isn't about discovering new things; it's about discovering new ways to discover" -> Meta RL.
RT @rm_rafailov: We have a new position paper on "inference time compute" and what we have been working on in the last few months! We prese…
RT @sunfanyun: Training RL/robot policies requires extensive experience in the target environment, which is often difficult to obtain. How…
Excited about our new paper, Hypothetical Minds! The hypothesis-search-based approach shows a lot of promise for adapting to diverse agents in multi-agent settings. Check out the full paper for more!
Very excited to release a new paper introducing Hypothetical Minds! An LLM agent for multi-agent settings that generates hypotheses about other agents' latent states in natural language, adapting to diverse agents across collaborative, competitive, and mixed-motive domains 🧵