Violet X.

@ZiyuX

Followers
189
Following
616
Media
0
Statuses
83

PhD student @Stanford. Working on LLM-based agents

United States
Joined October 2011
@ZiyuX
Violet X.
17 days
RT @rm_rafailov: Missed this paper, but it’s pretty cool - it managed to scale our “Meta-CoT” proposal to 70B models by creating synthetic…
0
9
0
@ZiyuX
Violet X.
25 days
RT @locross: Very excited to share this! Check out Concordia 2.0. Simulating human behavior with LLM agents has never been easier. Read the…
0
2
0
@ZiyuX
Violet X.
1 month
Evaluating creative writing has long been challenging and subjective - how do you standardize taste when judging stories? For details, check out our work led by @DanielFein7 and @sebbrusso.
@sebbrusso
Sebastian Russo
1 month
Introducing LitBench, the first standardized benchmark for creative writing verifiers! We use Reddit’s r/WritingPrompts to label human preferences across 50k story-pairs, and see how LLM-as-a-judge, Generative RMs, and Bradley-Terry RMs stack up.
0
2
12
@ZiyuX
Violet X.
1 month
RT @RylanSchaeffer: Third #ICML2025 paper! What effect will web-scale synthetic data have on future deep generative models? Collapse or Th…
0
24
0
@ZiyuX
Violet X.
2 months
RT @synth_labs: Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training.…
0
9
0
@ZiyuX
Violet X.
2 months
Check out this work on benchmarking how well LLMs can implement ML research papers in code, led by @tianyu_hua!
@tianyu_hua
Tianyu Hua
2 months
🚨 New benchmark alert! 🚨 Can today’s LLMs implement tomorrow’s research ideas? We put them to the test. Introducing #ResearchCodeBench: 212 tasks from 2024–25 ML papers and code, most released after any model’s training cutoff. 🔗 🧵
0
4
8
@ZiyuX
Violet X.
3 months
RT @agarwl_: Going beyond verifiable domains, we still need reward models, which will likely be generative verifiers! Recent papers along t…
0
31
0
@ZiyuX
Violet X.
5 months
RT @gandhikanishk: New Paper!! We try to understand why some LMs self-improve their reasoning while others hit a wall. The key? Cognitive b…
0
183
0
@ZiyuX
Violet X.
5 months
RT @rm_rafailov: This is the dataset we curated for our own reasoning experiments. There is a lot of reasoning data coming out now, but we…
0
11
0
@ZiyuX
Violet X.
5 months
RT @synth_labs: Releasing Big-MATH—the first heavily curated & verifiable dataset designed specifically for large-scale RL training & LLM r…
0
16
0
@ZiyuX
Violet X.
5 months
RT @percyliang: 1/🧵How do we know if AI is actually ready for healthcare? We built a benchmark, MedHELM, that tests LMs on real clinical ta…
0
70
0
@ZiyuX
Violet X.
5 months
RT @Anikait_Singh_: Personalization in LLMs is crucial for meeting diverse user needs, yet collecting real-world preferences at scale remai…
0
13
0
@ZiyuX
Violet X.
6 months
RT @AndrewYNg: Introducing Agentic Object Detection! Given a text prompt like “unripe strawberries” or “Kellogg’s branded cereal” and an i…
0
727
0
@ZiyuX
Violet X.
7 months
RT @rm_rafailov: Scaling inference-time interaction.
0
3
0
@ZiyuX
Violet X.
7 months
RT @jiayi_pirate: We reproduced DeepSeek R1-Zero in the CountDown game, and it just works. Through RL, the 3B base LM develops self-verifi…
0
1K
0
@ZiyuX
Violet X.
7 months
RT @synth_labs: Ever watched someone solve a hard math problem? Their first attempt is rarely perfect. They sketch ideas, cross things out…
0
42
0
@ZiyuX
Violet X.
7 months
RT @rm_rafailov: "Superintelligence isn't about discovering new things; it's about discovering new ways to discover" -> Meta RL.
0
32
0
@ZiyuX
Violet X.
7 months
RT @rm_rafailov: We have a new position paper on "inference time compute" and what we have been working on in the last few months! We prese…
0
228
0
@ZiyuX
Violet X.
10 months
RT @sunfanyun: Training RL/robot policies requires extensive experience in the target environment, which is often difficult to obtain. How…
0
45
0
@ZiyuX
Violet X.
1 year
Excited about our new paper - Hypothetical Minds! The hypothesis-search-based approach shows a lot of promise in adapting to diverse agents in multi-agent settings. Check out the full paper for more!
@locross
Logan Cross
1 year
Very excited to release a new paper introducing Hypothetical Minds! An LLM agent for multi-agent settings that generates hypotheses about other agents' latent states in natural language, adapting to diverse agents across collaborative, competitive, and mixed-motive domains🧵
0
0
6