Or Tal (@Or__Tal)
Followers: 156 · Following: 197 · Media: 11 · Statuses: 81
PhD candidate @HebrewU; Research Assistant @MetaAI (FAIR)
Joined March 2022
Which modeling paradigm should you choose for text-to-music generation? We run a head-to-head comparison to figure it out: same data, same architecture, AR vs FM. If you care about fidelity, speed, control, or editing, see this thread. https://t.co/FBWu7ThspC https://t.co/Dp1co1esvd 1/6
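For context, here is a minimal sketch of the two training objectives being compared, assuming a shared transformer backbone `model`; the function names and signatures are illustrative, not the paper's code:

```python
import torch
import torch.nn.functional as F

def ar_loss(model, tokens, text_cond):
    """Autoregressive objective: next-token cross-entropy over audio tokens."""
    logits = model(tokens[:, :-1], text_cond)        # (B, T-1, vocab)
    return F.cross_entropy(logits.transpose(1, 2), tokens[:, 1:])

def fm_loss(model, latents, text_cond):
    """Flow-matching objective: regress the velocity of a noise-to-data path."""
    b = latents.shape[0]
    t = torch.rand(b, 1, 1, device=latents.device)   # random time in [0, 1]
    noise = torch.randn_like(latents)
    x_t = (1 - t) * noise + t * latents              # straight interpolation path
    target_v = latents - noise                       # its constant velocity
    pred_v = model(x_t, text_cond, t=t.flatten())
    return F.mse_loss(pred_v, target_v)
```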
Excited to share our new paper from my internship at IBM Research! We train Speech-Aware LLMs (SALLMs) with Group Relative Policy Optimization (GRPO) on open-ended tasks (Spoken QA & Speech Translation). We find that GRPO beats SFT!
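As a quick refresher on the method (a hedged sketch, not the paper's training code): GRPO samples a group of responses per prompt and normalizes rewards within the group, so no learned value function is needed.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar reward per sampled response."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)  # each response scored against its group

# The policy is then updated with a clipped, PPO-style objective weighted by
# these advantages, typically with a KL penalty toward the reference model.
```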
New paper alert! Millions of neural networks now populate public repositories like Hugging Face, but most lack documentation. So, we decided to build an Atlas. Project: https://t.co/1JpsC6dCeg Demo: https://t.co/4Xy7yLdIZY Here's what we found:
Excited to share this has now been accepted at #NeurIPS2025 as a position paper (<6% acceptance)! We advocate for systematically studying entire model populations via weight-space learning, and argue that this requires charting them in a Model Atlas. @NeurIPSConf #NeurIPS
Excited to share our work, Set Block Decoding! A new paradigm combining next-token prediction and masked (or discrete diffusion) models, allowing parallel decoding without any architectural changes and with an exact KV cache. Arguably one of the simplest ways to accelerate LLMs!
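A toy, greedy accept-all sketch of the decoding loop this enables (illustrative names, not the paper's API; the actual method accepts tokens adaptively rather than committing every slot at once): append a block of mask tokens, predict them in a single forward pass, and commit the set.

```python
import torch

@torch.no_grad()
def set_block_decode(model, ids, block_size=4, max_new=64, mask_id=0):
    """Assumes model(ids) -> (B, T, vocab) logits; mask_id marks slots to fill."""
    for _ in range(max_new // block_size):
        masks = torch.full((ids.shape[0], block_size), mask_id, dtype=ids.dtype)
        logits = model(torch.cat([ids, masks], dim=1))
        block = logits[:, -block_size:].argmax(dim=-1)  # fill all slots in parallel
        ids = torch.cat([ids, block], dim=1)            # commit the whole set
    return ids
```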
Many modern SpeechLMs are trained with speech-text interleaving. How does this impact scaling trends? In our new paper, we train several dozen SLMs and show: quite a lot! So there is room for optimism. Key insights, code, models, and the full paper below.
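For readers unfamiliar with the setup, a toy sketch of speech-text interleaving, under the simplifying assumption of pre-aligned segments (the tag values and helper name are hypothetical):

```python
def interleave(text_spans, speech_spans, text_tag=1, speech_tag=2):
    """Each *_spans[i] is the token-id list for the i-th aligned segment."""
    seq = []
    for txt, sp in zip(text_spans, speech_spans):
        seq += [text_tag] + txt    # special token marks a switch to text
        seq += [speech_tag] + sp   # ...and back to speech tokens
    return seq

# interleave([[5, 6], [7]], [[100, 101], [102]])
# -> [1, 5, 6, 2, 100, 101, 1, 7, 2, 102]
```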
Thrilled that our paper on "scaling analysis of interleaved speech-text LMs" was accepted to #CoLM2025! It gives room for optimism when scaling SpeechLMs *right*: with large TextLMs (in place of more data), interleaving, and synthetic training data.
Happy to announce that our paper "EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits" was accepted to #ACL2025! https://t.co/mwugXz1H5q
Introducing PAST: a speech tokenizer that jointly models phonetics and acoustics (no SSL involved). PAST demonstrates strong reconstruction as well as semantic capabilities, as measured by ABX and sWUGGY. https://t.co/teQQ9s5whr Check out Nadav's post @NadavHarTuv @adiyossLC
New paper alert! PAST: a phonetic-acoustic speech tokenizer, just accepted to Interspeech 2025! It learns phonetic + acoustic tokens jointly, with no SSL babysitter or external vocoder. https://t.co/yGypWO6YpM If you're into speech LMs, keep reading!
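One plausible reading of the joint objective, as a hedged sketch (the `codec` and `phone_head` interfaces are assumptions, and the paper's actual phonetic supervision may differ, e.g. CTC-style): train codec reconstruction together with a frame-level phoneme classifier on the quantized representation.

```python
import torch.nn.functional as F

def joint_tokenizer_loss(codec, phone_head, audio, phoneme_targets):
    """codec: an RVQ codec with encode/decode; phone_head: frame classifier."""
    quantized, codes, commit_loss = codec.encode(audio)  # acoustic tokens via RVQ
    recon = codec.decode(quantized)
    recon_loss = F.l1_loss(recon, audio)                 # acoustic objective
    phone_logits = phone_head(quantized)                 # (B, T, n_phones)
    phone_loss = F.cross_entropy(phone_logits.transpose(1, 2), phoneme_targets)
    return recon_loss + commit_loss + phone_loss         # joint objective
```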
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation.
arxiv.org: "Recent progress in text-to-music generation has enabled models to synthesize high-quality musical segments, full compositions, and even respond to fine-grained control signals, e.g. chord..."
If you are interested in audio tokenisers, you should check out our new work! We empirically analysed existing tokenisers from every angle: reconstruction, downstream tasks, LMs, and more. Grab yourself a coffee and sit down for a read!
New paper: "Time to Talk"! We built an LLM agent that doesn't just decide WHAT to say, but also WHEN to say it. Introducing "Time to Talk": LLM agents for asynchronous group communication, tested in real Mafia games with human players. https://t.co/HdNUwlvF2F 1/7
Read the full paper! https://t.co/FBWu7TgUA4 https://t.co/Dp1co1dUFF
@FelixKreuk @adiyossLC
We've been exploring the trade-offs between autoregressive and flow-matching models for music generation. We share our findings in this latest paper led by @Or__Tal. Many interesting takeaways and practical advice on training generative models for music!
What if training steps are capped at 500k? FM reaches near-topline quality with small batches; it's compute-efficient and forgiving. AR needs larger batch sizes to recover performance, and benefits more from large-scale training. See the results below, broken down by model duration and batch size. 6/6
Speed and quality: FM can be faster, but at the cost of a quality reduction. AR scales better with batch size (thanks to KV caching). 5/6
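Why AR batches well: a sketch of KV-cached decoding, assuming a model that returns and accepts a `past_kv` cache (the signature is illustrative):

```python
import torch

@torch.no_grad()
def ar_generate(model, ids, max_new=256):
    """Assumes model(inp, past_kv=...) -> (logits, past_kv); names illustrative."""
    past = None
    for _ in range(max_new):
        inp = ids if past is None else ids[:, -1:]  # cached: feed only the new token
        logits, past = model(inp, past_kv=past)
        nxt = logits[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, nxt], dim=1)
    return ids
```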
Editing music via inpainting (filling gaps): supervised FM came out on top. Listeners rated FM best for transition smoothness & audio match; AR scored the lowest FAD, but had audible transitions. Zero-shot FM is fast and flexible... still unstable. https://t.co/FBWu7ThspC 4/6
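A hedged sketch of the zero-shot variant, under the same flow-matching convention as the sketch above (t=0 noise, t=1 data): integrate the ODE while clamping the known region to its noise-to-data path at every step.

```python
import torch

@torch.no_grad()
def zero_shot_inpaint(model, reference, mask, cond, steps=50):
    """mask == 1 marks the gap to fill; elsewhere the reference is kept."""
    noise = torch.randn_like(reference)
    x = noise.clone()                                  # t = 0: pure noise
    for i in range(steps):
        t = torch.full((x.shape[0],), i / steps, device=x.device)
        tb = t.view(-1, 1, 1)
        known = (1 - tb) * noise + tb * reference      # path of the known region
        x = torch.where(mask.bool(), x, known)         # clamp outside the gap
        x = x + model(x, cond, t=t) / steps            # Euler step toward t = 1
    return torch.where(mask.bool(), x, reference)
```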
Temporal control (chords, melody, drums)? AR tracked them more accurately. But both AR & FM lost fidelity under strict constraints. https://t.co/FBWu7TgUA4 3/6
Here's the side-by-side: AR vs FM across 5 axes: fidelity, control, editing, speed, and training. No clear winner; every strength is a trade-off. A snapshot of where each modeling paradigm shines (and struggles). Full paper: https://t.co/Dp1co1dUFF 2/6
Happy to share our #Interspeech2025 paper! "WhiStress: Enriching Transcriptions with Sentence Stress Detection". Sentence stress is a word-level prosodic cue that marks contrast or intent. WhiStress detects it alongside transcription, with no alignment needed. Paper, code, and demo below.
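One plausible way to wire such a detector, as an assumption rather than the paper's exact architecture: a light per-token classification head over the transcription decoder's hidden states, so stress is predicted alongside each emitted token with no separate alignment step.

```python
import torch
import torch.nn as nn

class StressHead(nn.Module):
    """Per-token binary stress classifier on top of decoder hidden states."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, 2)   # stressed vs. not stressed

    def forward(self, decoder_states: torch.Tensor) -> torch.Tensor:
        return self.proj(decoder_states)    # (B, T, 2): one decision per token
```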