Oren Sultan
@oren_sultan
Followers
1K
Following
2K
Media
79
Statuses
621
Research Scientist Intern @Meta, @AIatMeta (FAIR), CS PhD Candidate @HebrewU, @HyadataLab | Past: @Lightricks @TU_Muenchen @UniMelb
Tel Aviv, Israel
Joined August 2021
I'm excited to start a new chapter as a PhD Research Scientist Intern at Meta AI, FAIR (Fundamental AI Research) group! Grateful to be part of the CodeGen team in Tel Aviv, working on cutting-edge AI research for code reasoning, understanding and generation 💻🤖
3
2
106
[Hebrew-language quoted tweet; text corrupted beyond recovery in extraction]
99
196
2K
Looking forward to presenting our TACL paper on enhancing LLM creativity at #EMNLP2025 tomorrow (Wed, Nov 5)! Room A108, 14:30–16:00 (Linguistic Theories, Cognitive Modeling & Psycholinguistics). Details below ⬇️ #NLP #LLMs #Creativity
How can we help LLMs move beyond the obvious toward generating more creative and diverse ideas? In our new TACL paper, we propose a novel approach to enhance LLM creative generation! https://t.co/AFCpQddN6j
@ChenShani2 @GabiStanovsky @jurafsky @HyadataLab @stanfordnlp @nlphuji
4
14
60
Heading to #EMNLP2025! Two of our papers will be there – come say hi!
🖼️ Image Captioning Evaluation – Nov 5, 17:45 – https://t.co/TdMVA2iWSD
🕵️ Deceptive LLM Agents (Mafia Game) – Nov 5, 13:00
arxiv.org
LLMs are used predominantly in synchronous communication, where a human user and a model communicate in alternating turns. In contrast, many real-world settings are asynchronous. For example, in...
1
6
26
We present DyPE, a framework for ultra-high-resolution image generation. DyPE adjusts positional embeddings to evolve dynamically with the spectral progression of diffusion. This lets pre-trained DiTs create images with 16M+ pixels without retraining or extra inference cost. 🧵👇
9
32
102
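A toy sketch of the idea above: positional embeddings whose frequency content shifts as denoising progresses, so low frequencies dominate early (coarse layout) and high frequencies return late (fine detail). Everything here (function names, the linear schedule) is an assumption for illustration, not the DyPE implementation:

```python
import numpy as np

def sinusoidal_pos_emb(positions, dim, freq_scale=1.0):
    """Standard 1-D sinusoidal embedding with a tunable frequency scale."""
    half = dim // 2
    freqs = freq_scale / (10000 ** (np.arange(half) / half))
    angles = np.outer(positions, freqs)  # (num_positions, dim // 2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def dynamic_pos_emb(positions, dim, step, num_steps):
    """Hypothetical dynamic variant: ramp frequencies up as denoising
    progresses (step counts down from num_steps to 0)."""
    progress = 1.0 - step / num_steps      # 0.0 at the start, 1.0 at the end
    freq_scale = 0.25 + 0.75 * progress    # assumed linear schedule
    return sinusoidal_pos_emb(positions, dim, freq_scale)

emb_early = dynamic_pos_emb(np.arange(64), 32, step=1000, num_steps=1000)
emb_late = dynamic_pos_emb(np.arange(64), 32, step=0, num_steps=1000)
print(emb_early.shape)  # (64, 32)
```

Because only the embedding schedule changes across steps, the pre-trained model's weights are untouched, which matches the thread's "without retraining" framing.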
How can we help LLMs move beyond the obvious toward generating more creative and diverse ideas? In our new TACL paper, we propose a novel approach to enhance LLM creative generation! https://t.co/AFCpQddN6j
@ChenShani2 @GabiStanovsky @jurafsky @HyadataLab @stanfordnlp @nlphuji
6
26
84
Excited to share this has now been accepted at #NeurIPS2025 as a position paper (<6% acceptance)! We advocate for systematically studying entire model populations via weight-space learning, and argue that this requires charting them in a Model Atlas. @NeurIPSConf #NeurIPS 🧵👇
🚨 New paper alert! 🚨 Millions of neural networks now populate public repositories like Hugging Face 🤗, but most lack documentation. So, we decided to build an Atlas 🗺️ Project: https://t.co/1JpsC6dCeg Demo: https://t.co/4Xy7yLdIZY 🧵👇 Here's what we found:
0
21
64
Code World Model: producing code by imagining the effect of executing instructions and planning instructions that produce the desired effect.
(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. https://t.co/BJSUCh2vtg
72
177
2K
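The world-model framing above (imagine the effect of an instruction before committing to it) can be illustrated with a toy planner. This is purely illustrative, with `eval` standing in for a learned state predictor; nothing here reflects CWM's actual mechanism:

```python
def predict_next_state(state, instruction):
    """Toy world model: 'imagine' the state after one assignment of the
    form 'var = <expr>'. eval() stands in for a learned predictor."""
    var, expr = instruction.split("=", 1)
    next_state = dict(state)
    next_state[var.strip()] = eval(expr, {}, dict(state))
    return next_state

def plan(state, goal, candidates):
    """Keep the candidate instruction whose imagined outcome meets the goal."""
    for instruction in candidates:
        if goal(predict_next_state(state, instruction)):
            return instruction
    return None

chosen = plan({"x": 3, "y": 4},
              goal=lambda s: s.get("z") == 7,
              candidates=["z = x * y", "z = x + y"])
print(chosen)  # z = x + y
```

The planner never executes a candidate for real; it selects by predicted effect, which is the "planning instructions that produce the desired effect" idea in miniature.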
We release Code World Model (CWM)! 👩‍💻 A coding LLM designed to advance code generation research through agentic reasoning and world-model-based planning. Super excited about this release and proud of the team's work! See Gab's post for more info 👇
(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. https://t.co/BJSUCh2vtg
0
11
49
1/ We released CWM, a 32B dense LLM for coding, agentic use, and, more importantly, to further world-modeling research. To support this research, we release the pre-training, SFT, and RL model weights, along with inference code and the tech report. See:
(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. https://t.co/BJSUCh2vtg
1
7
38
🔥 CWM x BigO(Bench) 🔥 CWM 32B was just released and evaluated on BigO(Bench)! Does "world-modeling-aware" training help CWM reach higher performance on code-complexity-related tasks?
2
5
25
Our new Code World Model (CWM) is out! I learned a great deal working on the RL part, and I'm super proud of what we built. Check out the thread below for the full details.
(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. https://t.co/BJSUCh2vtg
0
1
15
New from Meta FAIR: Code World Model (CWM), a 32B-parameter research model designed to explore how world models can transform code generation and reasoning about code. We believe in advancing research in world modeling and are sharing CWM under a research license to help empower
103
225
1K
New research from Meta FAIR: Code World Model (CWM), a 32B research model. We encourage the research community to explore this open-weight model! pass@1 evals, for the curious: 65.8% on SWE-bench Verified, 68.6% on LiveCodeBench, 96.6% on Math-500, 76.0% on AIME 2024 🧵
96
164
1K
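For readers unfamiliar with the metric: pass@1 is the fraction of problems solved by a single sampled solution. The commonly used unbiased estimator (from the HumanEval evaluation methodology, not specific to the CWM report) generalizes this to pass@k given n samples of which c pass the tests:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples drawn
    from n total (c of them correct) passes, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few failing samples to fill k slots: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1, pass@k reduces to the raw success rate c / n:
print(round(pass_at_k(10, 3, 1), 6))  # 0.3
```

So a "65.8% on SWE-bench Verified" pass@1 number means roughly two of every three problems are solved by the model's first attempt.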
(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. https://t.co/BJSUCh2vtg
60
313
2K
Proud to share that "Debatable Intelligence" has now been accepted to #EMNLP2025 (Main Conference)! https://t.co/zVE73m9lVu Huge thanks to my amazing collaborators @ArielGera2, @RoyBarHaim, @Hoper_Tom, @noamslonim
noy-sternlicht.github.io
We assess the judgment capabilities and behavior of LLMs by analyzing how they rate debate speeches - long texts that argue for or against a controversial topic.
New Paper! We propose a challenging new benchmark for LLM judges: evaluating debate speeches. Are they comparable to humans? Well... it's debatable. 🤔 https://t.co/u0sd8SrGjj Here are our findings:
3
15
54
Proud to share PromptSuite! A flexible framework for generating thousands of prompt variations per instance, enabling robust multi-prompt LLM evaluation across diverse tasks. Python API & web UI included. Check it out:
eliyahabba.github.io
A flexible framework for automatic generation of prompt variations for robust LLM evaluation.
Old news: single-prompt eval is unreliable 🤯 New news: PromptSuite – an easy way to augment your benchmark with thousands of paraphrases → robust eval, zero sweat! - Works on any dataset! - Python API + web UI @EliyaHabba, @GiliLior, @GabiStanovsky
https://t.co/C4VwIvzJFX
0
2
14
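The kind of multi-prompt evaluation PromptSuite automates can be sketched by hand. This is generic illustrative code, not PromptSuite's API (see the project page above for the real interface): cross instruction phrasings with surface formats to get many prompt variations per instance.

```python
import itertools

# Hand-rolled prompt variations; a tool like PromptSuite does this at scale.
instructions = [
    "Classify the sentiment of the text.",
    "Is the sentiment positive or negative?",
]
templates = [
    "{instruction}\n\nText: {text}\nAnswer:",
    "Text: {text}\n{instruction}\nAnswer:",
]

def prompt_variations(text):
    for instruction, template in itertools.product(instructions, templates):
        yield template.format(instruction=instruction, text=text)

prompts = list(prompt_variations("I loved this movie."))
print(len(prompts))  # 4
```

Scoring a model on all variations and reporting the spread, rather than one prompt's single number, is the robustness that the thread argues single-prompt eval lacks.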
[1/6] 🎬 New paper: Story2Board. We guide diffusion models to generate consistent, expressive storyboards – no training needed. By mixing attention-aligned tokens across panels, we reinforce character identity without hurting layout diversity. https://t.co/aRG81nu5qK
5
11
30
🚨 Benchmarks tell us which model is better – but not why it fails. For developers, this means tedious, manual error analysis. We're bridging that gap. Meet CLEAR: an open-source tool for actionable error analysis of LLMs. 🧵👇
1
14
44
Presenting my poster: DOVE – a large-scale multi-dimensional predictions dataset towards meaningful LLM evaluation. Monday 18:00, Vienna, #ACL2025. Come chat about LLM evaluation, prompt sensitivity, and our collection of 250M model outputs!
2
11
47