Desh Raj
@rdesh26
3K followers · 7K following · 146 media · 2K statuses
Speech + LLMs @Meta MSL | Previously: @jhuclsp, @IITGuwahati
New York, NY
Joined September 2009
**Dissertation now available** 📜: https://t.co/wpXl7D3rF7 📽️: https://t.co/tJkC9j7YiY ⏯️: https://t.co/2oDan50NyB It's a 332-page tome, but I have summarized it in this thread 👇 1/n
3 replies · 9 reposts · 104 likes
Get off the hamster wheel if you can... life is too short.
8 replies · 7 reposts · 167 likes
My favorite talk of the day was a masterclass by @fakufakurevenge on "Generative models for universal speech enhancement". Having worked on ASR during my PhD and now on speech LLMs, I tend to equate "audio" with "spoken language", so it was really nice to get out of this bubble.
1 reply · 2 reposts · 10 likes
A few days ago, I conducted the following poll: For the same parameter size, network architecture, and training data, which of the following models do you think would perform best at streaming ASR? With 57% of the votes, RNN-T was the winner. This is not surprising, since most…
1 reply · 0 reposts · 13 likes
While I enjoyed the Thinky post, I would also like to read blogs that are not quite so polished, by RL researchers who are not quite so well-funded. Who are your favorite PhD student bloggers out there?
0 replies · 0 reposts · 2 likes
"Yet, after all, why not? Why shouldn't I keep it?" - Bilbo Baggins (late Third Age, Middle-earth)
0 replies · 0 reposts · 2 likes
Sunday thoughts 🤔 Minimum word error rate (MWER) training is RLVR with reward = negative word error rate (WER), and posterior renormalization for importance sampling. Main difference: MWER uses deterministic decoding for lower variance, while RLVR uses stochastic sampling to select candidates.
0 replies · 0 reposts · 6 likes
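To make the claimed equivalence concrete, here is a minimal sketch in LaTeX; the notation (N-best list \(\mathcal{B}(x)\), reference \(y^{*}\), posterior \(P_\theta\)) is introduced here for illustration and is not taken from the tweet itself.

```latex
% Minimal sketch of the MWER <-> RLVR correspondence (requires amsmath, amssymb).
% Notation assumed here: B(x) = deterministic N-best list, y* = reference transcript.
\[
  \mathcal{L}_{\mathrm{MWER}}(\theta)
    = \sum_{y \in \mathcal{B}(x)} \hat{P}_\theta(y \mid x)\,\mathrm{WER}(y, y^{*}),
  \qquad
  \hat{P}_\theta(y \mid x)
    = \frac{P_\theta(y \mid x)}{\sum_{y' \in \mathcal{B}(x)} P_\theta(y' \mid x)}.
\]
% With verifiable reward R(y) = -WER(y, y*), RLVR maximizes
\[
  J(\theta) = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\big[ R(y) \big],
\]
% so minimizing L_MWER targets the same objective, with the expectation
% estimated on the renormalized N-best set (deterministic, lower variance)
% rather than on stochastic samples from the policy.
```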
For the same parameter size, network architecture, and training data, which of the following models do you think would perform best at streaming ASR?
2 replies · 0 reposts · 2 likes
Made some notes 📝 while reading the RLHF book by @natolambert: > https://t.co/sWfLmqgKX5 The most fun part was deriving several policy gradient methods from the basic principle of expected reward maximization.
0 replies · 0 reposts · 6 likes
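As an illustration of the kind of derivation the tweet describes, here is the standard log-derivative trick sketched from first principles; the notation and the baseline term are mine, not necessarily the book's.

```latex
% Policy gradient from expected reward maximization (log-derivative trick).
% Goal: maximize J(theta) = E_{y ~ pi_theta}[ R(y) ].
\[
  \nabla_\theta J(\theta)
    = \nabla_\theta \sum_{y} \pi_\theta(y)\, R(y)
    = \sum_{y} \pi_\theta(y)\, \nabla_\theta \log \pi_\theta(y)\, R(y)
    = \mathbb{E}_{y \sim \pi_\theta}\big[ R(y)\, \nabla_\theta \log \pi_\theta(y) \big].
\]
% A baseline b does not bias the gradient, since
% E[ b * grad log pi ] = b * grad( sum_y pi(y) ) = b * grad(1) = 0,
% but it reduces variance:
\[
  \nabla_\theta J(\theta)
    = \mathbb{E}_{y \sim \pi_\theta}\big[ (R(y) - b)\, \nabla_\theta \log \pi_\theta(y) \big].
\]
% REINFORCE follows directly; PPO/GRPO-style methods replace R(y) - b with
% clipped, (group-)normalized advantage estimates.
```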
My brilliant friend @esalesk already did this a few years ago in this paper:
@thawani_avijit Haha. I am afraid people interpreted my “delete tokenizer” as “use bytes directly without BPE”; the issue is that you *still* inherit the arbitrariness of byte encoding even for that! Pixels are the only way. Just like humans. It is written. If GPT-10 uses utf8 at the input I will eat a shoe.
0 replies · 0 reposts · 11 likes
I was impacted by the FAIR layoff this time. I'm looking for a new position in speech, multimodal, 3D human motion, and social behavior modeling. Happy to chat in more detail :)
17 replies · 44 reposts · 250 likes
I always thought the decline in fundamental AI research funding would happen because AI didn’t generate enough value to be worth the cost. But it seems like it’s happening because it generated too much value. And the race to capture that value is taking priority. Just…
34 replies · 37 reposts · 374 likes
I was impacted by Meta layoffs today. As a Research Scientist working on LLM post-training (reward models, DPO/GRPO) & automated evaluation pipelines, I’ve focused on understanding why/where models fail & how to make them better. I’m looking for opportunities; please reach out!
123 replies · 235 reposts · 3K likes
I often muse that if I were to do college all over again, I would study physics.
We’ve found a ton of value hiring folks with strong theory backgrounds and little to no production ML experience. One of our members of technical staff got his PhD in pure math/the geometry of black holes and had no prior ML experience. Within days of hiring him we released our…
0 replies · 0 reposts · 2 likes
I was really curious to try the new Atlas browser, but it felt... meh? I asked the integrated ChatGPT to change some browser settings, but it just gave me instructions for how to do it myself. Maybe I am missing something, but why could this not have been a simple Chrome extension?
0 replies · 0 reposts · 4 likes
Remember when papers were "published" and models were "released"? Why is everything just "dropped" nowadays?
1 reply · 0 reposts · 6 likes
New work on reasoning with SpeechLLMs, led by @yijenshih. We explore the paradigm of "thinking while listening" in the context of full-duplex speech models!
Can SpeechLLMs Think while Listening? tl;dr We enhance SpeechLLM reasoning using a text-based CoT guided by an entropy metric. This approach, combined with preference tuning, enables "thinking while listening" with an improved accuracy-latency trade-off and greater control.
0 replies · 1 repost · 12 likes
📢 Introducing VERSA: our new open-source toolkit for speech & audio evaluation!
- 80+ metrics in one unified interface
- Flexible input support
- Distributed evaluation with Slurm
- ESPnet compatible
Check out the details: https://t.co/lKpIaJE6Er
https://t.co/pSzWd5C5YM
2 replies · 32 reposts · 140 likes
📣 You can now have a conversation with Meta AI using voice. It’s super fast, connected to the web, natural and conversational, and it even comes with celebrity voice options from Awkwafina, Kristen Bell, John Cena, and more. What voice speaks to you? (pun intended 😆)
94 replies · 136 reposts · 2K likes