Or Tal (@Or__Tal)
Followers: 156 · Following: 197 · Media: 11 · Statuses: 81
PhD candidate @HebrewU; Research Assistant @MetaAI (FAIR)
Joined March 2022
Which modeling paradigm should you choose for text-to-music generation? We run a head-to-head comparison to figure it out: same data, same architecture, AR vs FM. If you care about fidelity, speed, control, or editing, see this thread. https://t.co/FBWu7ThspC https://t.co/Dp1co1esvd 1/6
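For context, here is a minimal sketch of the two training objectives being compared, assuming a shared transformer backbone `model`; the function names and signatures are illustrative, not the paper's code:

```python
import torch
import torch.nn.functional as F

def ar_loss(model, tokens, text_cond):
    """Autoregressive objective: next-token cross-entropy over audio tokens."""
    logits = model(tokens[:, :-1], text_cond)        # (B, T-1, vocab)
    return F.cross_entropy(logits.transpose(1, 2), tokens[:, 1:])

def fm_loss(model, latents, text_cond):
    """Flow-matching objective: regress the velocity of a noise-to-data path."""
    b = latents.shape[0]
    t = torch.rand(b, 1, 1, device=latents.device)   # random time in [0, 1]
    noise = torch.randn_like(latents)
    x_t = (1 - t) * noise + t * latents              # straight interpolation path
    target_v = latents - noise                       # its constant velocity
    pred_v = model(x_t, text_cond, t=t.flatten())
    return F.mse_loss(pred_v, target_v)
```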
Excited to share our new paper from my internship at IBM Research! We train Speech-Aware LLMs (SALLMs) with Group Relative Policy Optimization (GRPO) on open-ended tasks (Spoken QA & Speech Translation). We find that GRPO beats SFT!
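As a quick refresher on the method (a hedged sketch, not the paper's training code): GRPO samples a group of responses per prompt and normalizes rewards within the group, so no learned value function is needed.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar reward per sampled response."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)  # each response scored against its group

# The policy is then updated with a clipped, PPO-style objective weighted by
# these advantages, typically with a KL penalty toward the reference model.
```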
New paper alert! Millions of neural networks now populate public repositories like Hugging Face, but most lack documentation. So, we decided to build an Atlas. Project: https://t.co/1JpsC6dCeg Demo: https://t.co/4Xy7yLdIZY Here's what we found:
Excited to share this has now been accepted at #NeurIPS2025 as a position paper (<6% acceptance)! We advocate for systematically studying entire model populations via weight-space learning, and argue that this requires charting them in a Model Atlas. @NeurIPSConf #NeurIPS
Excited to share our work, Set Block Decoding! A new paradigm combining next-token prediction and masked (or discrete diffusion) models, allowing parallel decoding without any architectural changes and with an exact KV cache. Arguably one of the simplest ways to accelerate LLMs!
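A toy, greedy accept-all sketch of the decoding loop this enables (illustrative names, not the paper's API; the actual method accepts tokens adaptively rather than committing every slot at once): append a block of mask tokens, predict them in a single forward pass, and commit the set.

```python
import torch

@torch.no_grad()
def set_block_decode(model, ids, block_size=4, max_new=64, mask_id=0):
    """Assumes model(ids) -> (B, T, vocab) logits; mask_id marks slots to fill."""
    for _ in range(max_new // block_size):
        masks = torch.full((ids.shape[0], block_size), mask_id, dtype=ids.dtype)
        logits = model(torch.cat([ids, masks], dim=1))
        block = logits[:, -block_size:].argmax(dim=-1)  # fill all slots in parallel
        ids = torch.cat([ids, block], dim=1)            # commit the whole set
    return ids
```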
Many modern SpeechLMs are trained with speech-text interleaving. How does this impact scaling trends? In our new paper, we train several dozen SLMs and show: quite a lot! So there is room for optimism. Key insights, code, models, and the full paper below.
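For readers unfamiliar with the setup, a toy sketch of speech-text interleaving, under the simplifying assumption of pre-aligned segments (the tag values and helper name are hypothetical):

```python
def interleave(text_spans, speech_spans, text_tag=1, speech_tag=2):
    """Each *_spans[i] is the token-id list for the i-th aligned segment."""
    seq = []
    for txt, sp in zip(text_spans, speech_spans):
        seq += [text_tag] + txt    # special token marks a switch to text
        seq += [speech_tag] + sp   # ...and back to speech tokens
    return seq

# interleave([[5, 6], [7]], [[100, 101], [102]])
# -> [1, 5, 6, 2, 100, 101, 1, 7, 2, 102]
```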
Thrilled that our paper on "scaling analysis of interleaved speech-text LMs" was accepted to #CoLM2025! It gives room for optimism when scaling SpeechLMs *right*: with large TextLMs (in place of more data), interleaving, and synthetic training data.
Happy to announce that our paper "EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits" was accepted to #ACL2025! https://t.co/mwugXz1H5q
Introducing PAST: a speech tokenizer that jointly models phonetics and acoustics (no SSL involved). PAST demonstrates strong reconstruction as well as semantic capabilities, as measured by ABX and sWUGGY. https://t.co/teQQ9s5whr Check out Nadav's post @NadavHarTuv @adiyossLC
New paper alert! PAST: a phonetic-acoustic speech tokenizer, just accepted to Interspeech 2025! It learns phonetic + acoustic tokens jointly, with no SSL babysitter or external vocoder. https://t.co/yGypWO6YpM If you're into speech LMs, keep reading!
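One plausible reading of the joint objective, as a hedged sketch (the `codec` and `phone_head` interfaces are assumptions, and the paper's actual phonetic supervision may differ, e.g. CTC-style): train codec reconstruction together with a frame-level phoneme classifier on the quantized representation.

```python
import torch.nn.functional as F

def joint_tokenizer_loss(codec, phone_head, audio, phoneme_targets):
    """codec: an RVQ codec with encode/decode; phone_head: frame classifier."""
    quantized, codes, commit_loss = codec.encode(audio)  # acoustic tokens via RVQ
    recon = codec.decode(quantized)
    recon_loss = F.l1_loss(recon, audio)                 # acoustic objective
    phone_logits = phone_head(quantized)                 # (B, T, n_phones)
    phone_loss = F.cross_entropy(phone_logits.transpose(1, 2), phoneme_targets)
    return recon_loss + commit_loss + phone_loss         # joint objective
```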
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation.
arxiv.org: "Recent progress in text-to-music generation has enabled models to synthesize high-quality musical segments, full compositions, and even respond to fine-grained control signals, e.g. chord..."
If you are interested in audio tokenisers, you should check out our new work! We empirically analysed existing tokenisers from every angle: reconstruction, downstream tasks, LMs, and more. Grab yourself a coffee and sit down for a read!
New paper: "Time to Talk"! We built an LLM agent that doesn't just decide WHAT to say, but also WHEN to say it. Introducing "Time to Talk": LLM agents for asynchronous group communication, tested in real Mafia games with human players. https://t.co/HdNUwlvF2F 1/7
Read the full paper! https://t.co/FBWu7TgUA4 https://t.co/Dp1co1dUFF
@FelixKreuk @adiyossLC
We've been exploring the trade-offs between autoregressive and flow-matching models for music generation. We share our findings in this latest paper led by @Or__Tal. Many interesting takeaways and practical advice on training generative models for music!
What if training steps are capped at 500k? FM reaches near-topline quality with small batches; it's compute-efficient and forgiving. AR needs larger batch sizes to recover performance, and benefits more from large-scale training. See the results below, broken down by model duration and batch size. 6/6
Speed and quality: FM can be faster, but at the cost of a quality reduction. AR scales better with batch size (thanks to KV caching). 5/6
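Why AR batches well: a sketch of KV-cached decoding, assuming a model that returns and accepts a `past_kv` cache (the signature is illustrative):

```python
import torch

@torch.no_grad()
def ar_generate(model, ids, max_new=256):
    """Assumes model(inp, past_kv=...) -> (logits, past_kv); names illustrative."""
    past = None
    for _ in range(max_new):
        inp = ids if past is None else ids[:, -1:]  # cached: feed only the new token
        logits, past = model(inp, past_kv=past)
        nxt = logits[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, nxt], dim=1)
    return ids
```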
Editing music via inpainting (filling gaps): supervised FM came out on top. Listeners rated FM best for transition smoothness & audio match; AR scored the lowest FAD, but had audible transitions. Zero-shot FM is fast and flexible... still unstable. https://t.co/FBWu7ThspC 4/6
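A hedged sketch of the zero-shot variant, under the same flow-matching convention as the sketch above (t=0 noise, t=1 data): integrate the ODE while clamping the known region to its noise-to-data path at every step.

```python
import torch

@torch.no_grad()
def zero_shot_inpaint(model, reference, mask, cond, steps=50):
    """mask == 1 marks the gap to fill; elsewhere the reference is kept."""
    noise = torch.randn_like(reference)
    x = noise.clone()                                  # t = 0: pure noise
    for i in range(steps):
        t = torch.full((x.shape[0],), i / steps, device=x.device)
        tb = t.view(-1, 1, 1)
        known = (1 - tb) * noise + tb * reference      # path of the known region
        x = torch.where(mask.bool(), x, known)         # clamp outside the gap
        x = x + model(x, cond, t=t) / steps            # Euler step toward t = 1
    return torch.where(mask.bool(), x, reference)
```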
Temporal control (chords, melody, drums)? AR tracked them more accurately. But both AR & FM lost fidelity under strict constraints. https://t.co/FBWu7TgUA4 3/6
Here's the side-by-side: AR vs FM across 5 axes: fidelity, control, editing, speed, and training. No clear winner; every strength is a trade-off. A snapshot of where each modeling paradigm shines (and struggles). Full paper: https://t.co/Dp1co1dUFF 2/6
Happy to share our #Interspeech2025 paper! "WhiStress: Enriching Transcriptions with Sentence Stress Detection". Sentence stress is a word-level prosodic cue that marks contrast or intent. WhiStress detects it alongside transcription, with no alignment needed. Paper, code, and demo below.
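One plausible way to wire such a detector, as an assumption rather than the paper's exact architecture: a light per-token classification head over the transcription decoder's hidden states, so stress is predicted alongside each emitted token with no separate alignment step.

```python
import torch
import torch.nn as nn

class StressHead(nn.Module):
    """Per-token binary stress classifier on top of decoder hidden states."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, 2)   # stressed vs. not stressed

    def forward(self, decoder_states: torch.Tensor) -> torch.Tensor:
        return self.proj(decoder_states)    # (B, T, 2): one decision per token
```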