Or Tal Profile
Or Tal

@Or__Tal

Followers
156
Following
197
Media
11
Statuses
81

PhD candidate @HebrewU; Research Assistant @MetaAI (FAIR)

Joined March 2022
@Or__Tal
Or Tal
6 months
Which modeling paradigm to choose for text-to-music generation? We run a head-to-head comparison to figure it out. Same data, same architecture - AR vs FM. 👇 If you care about fidelity, speed, control, or editing, see this thread. 🔗 https://t.co/FBWu7ThspC 📄 https://t.co/Dp1co1esvd 1/6
1
11
41
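For context on the thread above: the two paradigms differ mainly in their training objectives, an autoregressive cross-entropy over discrete tokens versus a flow-matching regression over continuous latents. A minimal sketch of the two losses (PyTorch-style; the model interfaces and shapes are illustrative assumptions, not the paper's code):

import torch
import torch.nn.functional as F

def ar_loss(model, tokens):
    # Autoregressive: predict token t from tokens < t, cross-entropy over the codebook.
    logits = model(tokens[:, :-1])               # assumed shape (B, T-1, vocab)
    return F.cross_entropy(logits.transpose(1, 2), tokens[:, 1:])

def fm_loss(model, latents):
    # Conditional flow matching: regress the velocity that carries noise to the data
    # along the straight path x_t = (1 - t) * noise + t * latents.
    noise = torch.randn_like(latents)
    t = torch.rand(latents.shape[0], 1, 1, device=latents.device)
    x_t = (1 - t) * noise + t * latents
    pred_velocity = model(x_t, t.squeeze())      # assumed shape (B, T, D)
    return F.mse_loss(pred_velocity, latents - noise)

Everything else in the comparison (data, backbone, conditioning) is held fixed, so the observed trade-offs come from the objective and the sampling procedure it implies.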
@AvishaiElm37946
Avishai Elmakies
2 months
🚀 Excited to share our new paper from my internship at IBM Research! We train Speech-Aware LLMs (SALLMs) with Group Relative Policy Optimization (GRPO) on open-ended tasks (Spoken QA & Speech Translation). We find that GRPO beats SFT! 🧵
1
7
29
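For readers unfamiliar with GRPO: it replaces PPO's learned value function with a group-relative baseline, scoring each sampled response against the other responses for the same prompt. A minimal sketch of that advantage and the clipped policy loss (PyTorch; variable names are assumptions, not the paper's code):

import torch

def grpo_advantages(rewards):
    # rewards: (num_prompts, group_size) - one scalar reward per sampled response.
    # Normalizing within each group removes the need for a learned critic.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def grpo_policy_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Standard clipped surrogate applied with the group-relative advantages above.
    ratio = (logp_new - logp_old).exp()
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(ratio * advantages, clipped).mean()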
@EliahuHorwitz
Eliahu Horwitz @ NeurIPS
9 months
🚨 New paper alert! 🚨 Millions of neural networks now populate public repositories like Hugging Face 🤗, but most lack documentation. So, we decided to build an Atlas 🗺️ Project: https://t.co/1JpsC6dCeg Demo: https://t.co/4Xy7yLdIZY 🧵👇🏻 Here's what we found:
@_akhaliq
AK
9 months
Charting and Navigating Hugging Face's Model Atlas
5
17
81
@EliahuHorwitz
Eliahu Horwitz @ NeurIPS
2 months
Excited to share this has now been accepted at #NeurIPS2025 as a position paper (<6% acceptance)! 🎉 We advocate for systematically studying entire model populations via weight-space learning, and argue that this requires charting them in a Model Atlas. @NeurIPSConf #NeurIPS 🧵👇
@EliahuHorwitz
Eliahu Horwitz @ NeurIPS
9 months
🚨 New paper alert! 🚨 Millions of neural networks now populate public repositories like Hugging Face 🤗, but most lack documentation. So, we decided to build an Atlas 🗺️ Project: https://t.co/1JpsC6dCeg Demo: https://t.co/4Xy7yLdIZY 🧵👇🏻 Here's what we found:
0
21
64
@helibenhamu
Heli Ben-Hamu
3 months
Excited to share our work Set Block Decoding! A new paradigm combining next-token-prediction and masked (or discrete diffusion) models, allowing parallel decoding without any architectural changes and with exact KV cache. Arguably one of the simplest ways to accelerate LLMs!
3
25
115
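The tweet above describes decoding several future tokens in parallel by mixing next-token prediction with masked prediction. As a rough illustration of that general idea (confidence-based parallel infilling, not the Set Block Decoding algorithm itself; MASK_ID and the commit schedule are assumptions):

import torch

MASK_ID = 0  # hypothetical mask-token id

@torch.no_grad()
def decode_block(model, prefix, block_size=8, rounds=4):
    # Append a block of masked positions, predict them all in one forward pass,
    # and over a few rounds commit the most confident predictions per row.
    bsz = prefix.shape[0]
    block = torch.full((bsz, block_size), MASK_ID, device=prefix.device)
    per_round = block_size // rounds
    for _ in range(rounds):
        logits = model(torch.cat([prefix, block], dim=1))[:, -block_size:]  # (B, block, vocab)
        conf, preds = logits.softmax(-1).max(-1)
        conf = conf.masked_fill(block != MASK_ID, -1.0)   # only fill still-masked slots
        top = conf.topk(per_round, dim=1).indices
        block.scatter_(1, top, preds.gather(1, top))
    return torch.cat([prefix, block], dim=1)

Each round costs one forward pass yet emits several tokens, which is where the speedup over one-token-at-a-time decoding comes from.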
@GallilMaimon
Gallil Maimon
8 months
Many modern SpeechLMs are trained with Speech-Text interleaving. How does this impact scaling trends? In our new paper, we train several dozen SLMs, and show - quite a lot! So there is room for optimism 😊 Key insights, code, models, full paper 👇🏻
4
20
73
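Speech-text interleaving alternates aligned spans of the two modalities inside a single training sequence, so the LM conditions on mixed speech/text context. A toy sketch of the data construction (the alignment source and the modality tags are assumptions):

def interleave(aligned_spans):
    # aligned_spans: list of (text_tokens, speech_tokens) pairs covering consecutive
    # word spans of one utterance.
    sequence, use_text = [], True
    for text_toks, speech_toks in aligned_spans:
        sequence += (["<text>"] + text_toks) if use_text else (["<speech>"] + speech_toks)
        use_text = not use_text
    return sequence

# interleave([(["the", "cat"], ["s12", "s7"]), (["sat"], ["s3", "s44"])])
# -> ['<text>', 'the', 'cat', '<speech>', 's3', 's44']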
@GallilMaimon
Gallil Maimon
5 months
🎉 Thrilled that our paper on "scaling analysis of interleaved speech-text LMs" was accepted to #CoLM2025 It gives room for optimism when scaling SpeechLMs *right* - with large TextLMs (in place of more data), interleaving, and synth training data 💪
1
5
29
@ron_yosef
Ron Yosef
5 months
Happy to announce that our paper "EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits" was accepted to #ACL2025 🎉 📄 https://t.co/mwugXz1H5q 🌐
2
5
22
@Or__Tal
Or Tal
5 months
💣 Introducing PAST: a speech tokenizer that jointly models phonetics and acoustics (no SSL involved). PAST demonstrates strong reconstruction as well as semantic capabilities, as measured by ABX and sWUGGY. 🤗 https://t.co/teQQ9s5whr Check out Nadav's post 👇 @NadavHarTuv @adiyossLC
@NadavHarTuv
Nadav Har-Tuv
5 months
🚨 New paper alert! PAST: phonetic-acoustic speech tokenizer – just got accepted to Interspeech 2025 🎉 It learns phonetic + acoustic tokens jointly, with no SSL babysitter or external vocoder. 🔗 https://t.co/yGypWO6YpM 👇 If you're into speech LMs, keep reading!
0
0
9
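One generic way to read "phonetic + acoustic tokens jointly, no SSL" is a codec trained with a reconstruction loss plus direct phonetic supervision on its codes. The sketch below is only that generic recipe under assumed module names and a CTC head; it is not PAST's actual architecture:

import torch.nn.functional as F

def joint_tokenizer_loss(codec, phone_head, wav, phones, in_lens, tgt_lens, alpha=1.0):
    # codec: returns quantized codes and reconstructed audio (assumed interface).
    codes, recon = codec(wav)
    recon_loss = F.l1_loss(recon, wav)
    # phone_head: maps codes to per-frame phoneme logits, supervised with CTC (assumption).
    log_probs = phone_head(codes).log_softmax(-1).transpose(0, 1)  # (T, B, num_phones)
    ctc_loss = F.ctc_loss(log_probs, phones, in_lens, tgt_lens)
    return recon_loss + alpha * ctc_loss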
@AudioAndSpeech
Audio and Speech Processing Papers
6 months
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation.
arxiv.org
Recent progress in text-to-music generation has enabled models to synthesize high-quality musical segments, full compositions, and even respond to fine-grained control signals, e.g. chord...
0
2
11
@GallilMaimon
Gallil Maimon
6 months
🎵💬 If you are interested in Audio Tokenisers, you should check out our new work! We empirically analysed existing tokenisers from every angle - reconstruction, downstream, LMs and more. Grab yourself a ☕/🍺 and sit down for a read!
1
26
103
@niveckhaus
Niv Eckhaus
6 months
🚨 New Paper: "Time to Talk"! 🕵️ We built an LLM agent that doesn't just decide WHAT to say, but also WHEN to say it! Introducing "Time to Talk" - LLM agents for asynchronous group communication, tested in real Mafia games with human players. 🌐 https://t.co/HdNUwlvF2F 🧵 1/7
3
13
55
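The core design point in the tweet above is separating the timing decision from content generation. A toy loop in that spirit (the llm.classify / llm.generate interface and the polling strategy are hypothetical, not the paper's implementation):

import time

def format_history(history):
    return "\n".join(f"{speaker}: {msg}" for speaker, msg in history)

def group_chat_agent(llm, history, poll_interval=2.0, max_turns=20):
    # First decide WHETHER to speak now; only then generate WHAT to say.
    for _ in range(max_turns):
        prompt = format_history(history)
        if llm.classify(prompt + "\nShould I speak now? yes/no") == "yes":
            history.append(("me", llm.generate(prompt)))
        time.sleep(poll_interval)  # a real system would wake on incoming messages instead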
@FelixKreuk
Felix Kreuk
6 months
We've been exploring the trade-offs between Autoregressive and Flow-Matching models for music generation. We share our findings in this latest paper led by @Or__Tal. Many interesting take-aways and practical advice on training generative models for music! 🎶🧠
@Or__Tal
Or Tal
6 months
Which modeling paradigm to choose for text-to-music generation? We run a head-to-head comparison to figure it out. Same data, same architecture - AR vs FM. 👇 If you care about fidelity, speed, control, or editing, see this thread. 🔗 https://t.co/FBWu7ThspC 📄 https://t.co/Dp1co1esvd 1/6
1
1
11
@Or__Tal
Or Tal
6 months
What if training steps are capped at 500k? FM reaches near-topline quality with small batches. It's compute-efficient and forgiving. AR needs larger batch sizes to recover performance. It benefits more from large-scale training. See 📉 below by model duration + batch size: 6/6
1
0
2
@Or__Tal
Or Tal
6 months
⚡ Speed and Quality: FM can be faster - but at the cost of reduced quality. AR scales better with batch size (thanks to KV caching). 5/6
1
0
1
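The KV-caching point above is what makes AR decoding batch-friendly: after the prompt is processed once, every later step feeds a single new token per sequence and reuses the cached keys/values. A minimal greedy loop assuming a Hugging Face-style causal LM interface (an assumption about the model object, not code from the paper):

import torch

@torch.no_grad()
def ar_decode(model, prompt_ids, steps=256):
    out = model(prompt_ids, use_cache=True)                 # process the full prompt once
    past = out.past_key_values
    next_id = out.logits[:, -1].argmax(-1, keepdim=True)
    generated = [next_id]
    for _ in range(steps - 1):
        out = model(next_id, past_key_values=past, use_cache=True)  # one new token per step
        past = out.past_key_values
        next_id = out.logits[:, -1].argmax(-1, keepdim=True)
        generated.append(next_id)
    return torch.cat(generated, dim=1)

FM, by contrast, pays a fixed number of full-sequence function evaluations per sample, so fewer steps mean faster sampling but, as the tweet notes, lower quality.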
@Or__Tal
Or Tal
6 months
๐ŸŽ›๏ธ Editing music via inpainting (filling gaps) Supervised FM came out on top ๐Ÿง  listeners rated FM best for transition smoothness & audio match AR scored the lowest FAD , but had audible transitions. Zero-shot FM is fast, flexibleโ€ฆ still unstable. ๐ŸŽง https://t.co/FBWu7ThspC 4/6
1
0
1
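For the "zero-shot FM" variant mentioned above, a common recipe (RePaint-style resampling adapted here to a flow model; an illustration of that generic idea, not the paper's procedure) is to keep the known audio on the noise-to-data path at every integration step and let the model generate only inside the gap:

import torch

@torch.no_grad()
def zero_shot_inpaint(velocity_model, reference, mask, steps=50):
    # reference: clean latents (B, T, D); mask == 1 marks the gap to fill.
    x = torch.randn_like(reference)
    for i in range(steps):
        t = torch.full((x.shape[0], 1, 1), i / steps, device=x.device)
        # keep the known context at the matching point of the straight noise->data path
        known = (1 - t) * torch.randn_like(reference) + t * reference
        x = mask * x + (1 - mask) * known
        x = x + (1.0 / steps) * velocity_model(x, t.view(-1))  # Euler step
    return mask * x + (1 - mask) * reference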
@Or__Tal
Or Tal
6 months
🧩 Temporal control (chords, melody, drums)? AR tracked them more accurately. But both AR & FM lost fidelity under strict constraints. 🎧 https://t.co/FBWu7TgUA4 3/6
1
0
1
@Or__Tal
Or Tal
6 months
Here's the side-by-side: 🎼 AR vs FM across 5 axes - fidelity, control, editing, speed, and training. No clear winner. Every strength is a trade-off. 👇 A snapshot of where each modeling paradigm shines (and struggles): 📄 Full paper: https://t.co/Dp1co1dUFF 2/6
1
0
1
@iddoyosha
Iddo Yosha
6 months
🚨 Happy to share our #Interspeech2025 paper! "WhiStress: Enriching Transcriptions with Sentence Stress Detection" Sentence stress is a word-level prosodic cue that marks contrast or intent. WhiStress detects it alongside transcription; no alignment needed. Paper, code, demo 👇
2
10
33