Desh Raj

@rdesh26

Followers: 3K
Following: 7K
Media: 146
Statuses: 2K

Speech + LLMs @Meta MSL | Previously: @jhuclsp, @IITGuwahati

New York, NY
Joined September 2009
@rdesh26
Desh Raj
2 years
**Dissertation now available** 📜: https://t.co/wpXl7D3rF7 📽️: https://t.co/tJkC9j7YiY ⏯️: https://t.co/2oDan50NyB It's a 332-page tome, but I have summarized it in this thread 👇 1/n
@ArxivSound
arXiv Sound
2 years
"Listening to Multi-talker Conversations: Modular and End-to-end Perspectives," Desh Raj,
3
9
104
@andrewgwils
Andrew Gordon Wilson
2 days
Get off the hamster wheel if you can... life is too short.
8
7
167
@rdesh26
Desh Raj
5 days
My favorite talk of the day was a masterclass by @fakufakurevenge on "Generative models for universal speech enhancement". Working on ASR during my PhD and now on speech LLMs, I often tend to equate "audio" with "spoken language". So it was really nice to get out of this bubble
1
2
10
@rdesh26
Desh Raj
5 days
I am attending the SANE 2025 workshop today at Google NYC!
3
0
13
@rdesh26
Desh Raj
14 days
A few days ago, I conducted the following poll: For the same parameter size, network architecture, and training data, which of the following models do you think would perform best at streaming ASR? With 57% of the votes, RNN-T was the winner. This is not surprising, since most
@rdesh26
Desh Raj
18 days
For the same parameter size, network architecture, and training data, which of the following models do you think would perform best at streaming ASR?
1
0
13
@rdesh26
Desh Raj
15 days
While I enjoyed the Thinky post, I would also like to read blogs that are not quite so polished, by RL researchers who are not quite so well-funded. Who are your favorite PhD student bloggers out there?
0
0
2
@rdesh26
Desh Raj
17 days
"Yet, after all, why not? Why shouldn't I keep it?" - Bilbo Baggins (late Third Age, Middle-earth)
0
0
2
@rdesh26
Desh Raj
17 days
Sunday thoughts 🤔 Minimum word error rate (WER) training is RLVR with reward = negative WER, and posterior renormalization for importance sampling. Main difference: MWER uses deterministic decoding for lower variance, while RLVR uses stochastic sampling to select candidates.
0
0
6
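A minimal sketch of the correspondence described above, assuming a fixed N-best list of hypotheses with sequence log-probabilities (as MWER would get from deterministic beam search; an RLVR setup would instead draw stochastic samples). The helper names, the toy N-best list, and the scores are illustrative assumptions, not taken from any toolkit. The mean-error baseline is a common variance-reduction choice in MWER formulations. The sketch computes only the scalar loss; in training, gradients would flow through the log-probabilities via autodiff.

```python
import math
from typing import List, Sequence


def word_errors(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(ref) + 1))  # distances from an empty hypothesis prefix
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,             # extra hypothesis word (insertion)
                cur[j - 1] + 1,          # missing reference word (deletion)
                prev[j - 1] + (h != r),  # substitution, or a match at no cost
            )
        prev = cur
    return prev[-1]


def renormalized_posteriors(log_probs: Sequence[float]) -> List[float]:
    """Softmax restricted to the N-best list: P(y_i|x) / sum_j P(y_j|x)."""
    m = max(log_probs)  # subtract the max for numerical stability
    exps = [math.exp(lp - m) for lp in log_probs]
    z = sum(exps)
    return [e / z for e in exps]


def mwer_loss(log_probs, nbest, ref) -> float:
    """Expected word errors under renormalized posteriors, minus a mean-error baseline."""
    errs = [word_errors(h, ref) for h in nbest]
    post = renormalized_posteriors(log_probs)
    baseline = sum(errs) / len(errs)
    return sum(p * (e - baseline) for p, e in zip(post, errs))


def rlvr_style_loss(log_probs, nbest, ref) -> float:
    """Same candidates, framed as reward maximization with reward = -word errors."""
    rewards = [-word_errors(h, ref) for h in nbest]
    post = renormalized_posteriors(log_probs)
    baseline = sum(rewards) / len(rewards)
    # Negated because we minimize a loss while the policy maximizes reward.
    return -sum(p * (r - baseline) for p, r in zip(post, rewards))


if __name__ == "__main__":
    ref = "the cat sat".split()
    nbest = [["the", "cat", "sat"], ["the", "cat", "sad"], ["a", "cat"]]
    log_probs = [-1.0, -1.5, -3.0]  # made-up scores for illustration
    print(mwer_loss(log_probs, nbest, ref))
    print(rlvr_style_loss(log_probs, nbest, ref))
```

The two functions are algebraically identical, since negating both the reward and the baseline flips the sign inside the sum, which the outer minus sign undoes. The only modeling difference the sketch keeps from the post is where the candidates come from.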
@rdesh26
Desh Raj
18 days
For the same parameter size, network architecture, and training data, which of the following models do you think would perform best at streaming ASR?
2
0
2
@rdesh26
Desh Raj
18 days
Made some notes 📝 while reading the RLHF book by @natolambert: > https://t.co/sWfLmqgKX5 The most fun part was to derive several policy gradient methods from the basic principle of expected reward maximization.
0
0
6
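For reference, the basic principle mentioned above, expected reward maximization, leads to the standard score-function (REINFORCE-style) policy gradient via the log-derivative trick. This is the textbook derivation, not a reproduction of the linked notes; the symbols (prompt x, response y, policy \pi_\theta, reward r, baseline b) are notation chosen here for illustration.

```latex
\begin{aligned}
J(\theta) &= \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\big[r(x, y)\big] = \sum_{y} \pi_\theta(y \mid x)\, r(x, y) \\
\nabla_\theta J(\theta) &= \sum_{y} \nabla_\theta \pi_\theta(y \mid x)\, r(x, y)
  = \sum_{y} \pi_\theta(y \mid x)\, \nabla_\theta \log \pi_\theta(y \mid x)\, r(x, y) \\
&= \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\big[r(x, y)\, \nabla_\theta \log \pi_\theta(y \mid x)\big] \\
&= \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\big[\big(r(x, y) - b(x)\big)\, \nabla_\theta \log \pi_\theta(y \mid x)\big]
\end{aligned}
```

The last line holds because \mathbb{E}_{y \sim \pi_\theta}[\nabla_\theta \log \pi_\theta(y \mid x)] = \nabla_\theta \sum_y \pi_\theta(y \mid x) = \nabla_\theta 1 = 0, so any baseline that does not depend on y leaves the gradient unbiased while reducing its variance.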
@rdesh26
Desh Raj
19 days
My brilliant friend @esalesk already did this a few years ago in this paper:
@karpathy
Andrej Karpathy
22 days
@thawani_avijit Haha. I am afraid people interpreted my “delete tokenizer” as “use bytes directly without BPE”, the issue is you *still* need bytes encoding arbitrariness even for that! Pixels is the only way. Just like humans. It is written. If GPT-10 uses utf8 at the input I will eat a shoe.
0
0
11
@HirofumiInaguma
Hirofumi Inaguma
20 days
I was impacted by FAIR layoff this time. I'm looking for a new position on speech, multimodal, 3D human motion, and social behavior modeling. Happy to chat more details:)
17
44
250
@awnihannun
Awni Hannun
21 days
I always thought the decline in fundamental AI research funding would happen because AI didn’t generate enough value to be worth the cost. But it seems like it’s happening because it generated too much value. And the race to capture that value is taking priority. Just
34
37
374
@MimansaJ
Mimansa Jaiswal
21 days
I was impacted by Meta layoffs today. As a Research Scientist working on LLM posttraining (reward models, DPO/GRPO) & automated evaluation pipelines, I’ve focused on understanding why/where models fail & how to make them better. I’m looking for opportunities; please reach out!
@suchenzang
Susan Zhang
21 days
👀
123
235
3K
@rdesh26
Desh Raj
21 days
I often muse that if I were to do college all over again, I would study physics.
@notankitsinghal
Ankit Singhal
23 days
We’ve found a ton of value hiring folks with strong theory backgrounds with little to no production ML experience. One of our members of technical staff got his phd in pure math/the geometry of black holes and had no prior ML experience. Within days of hiring him we released our
0
0
2
@rdesh26
Desh Raj
21 days
I was really curious to try the new Atlas browser, but it felt --- meh? I asked the integrated ChatGPT to change some browser settings but it just gave me instructions for how to do it. Maybe I am missing something, but why could this not have been a simple Chrome extension?
0
0
4
@rdesh26
Desh Raj
26 days
Remember when papers were "published" and models were "released"? Why is everything just "dropped" nowadays?
1
0
6
@rdesh26
Desh Raj
1 month
New work on reasoning with SpeechLLMs, led by @yijenshih. We explore the paradigm of "thinking while listening" in the context of full-duplex speech models!
@yijenshih
Ian (Yi-Jen) Shih
1 month
Can SpeechLLMs Think while Listening? tl;dr We enhance SpeechLLM reasoning using a text-based CoT guided by an entropy metric. This approach, combined with preference tuning, enables "thinking while listening" with an improved accuracy-latency trade-off and greater control.
0
1
12
@shinjiw_at_cmu
Shinji Watanabe
7 months
📢 Introducing VERSA: our new open-source toolkit for speech & audio evaluation! - 80+ metrics in one unified interface - Flexible input support - Distributed evaluation with Slurm - ESPnet compatible Check out the details https://t.co/lKpIaJE6Er https://t.co/pSzWd5C5YM
2
32
140
@Ahmad_Al_Dahle
Ahmad Al-Dahle
1 year
📣 You can now have a conversation with Meta AI using voice. It’s super fast, connected to the web, natural and conversational and even comes with celebrity voice options from Awkwafina, Kristen Bell, John Cena, and more. What voice speaks to you? (pun intended 😆)
94
136
2K