
Anuj Diwan
@anuj_diwan
Followers: 781 · Following: 2K · Media: 21 · Statuses: 242
PhD Student @UTCompSci. Prev. Student Researcher @GoogleDeepmind, FAIR (@metaai), @AdobeResearch. 2021 BTech CSE @iitbombay. Interests: NLP, ASR, ML. 🇮🇳🇺🇸
Austin + Mumbai
Joined May 2014
Introducing ParaSpeechCaps, our large-scale style captions dataset that enables rich, expressive control for text-to-speech models! Beyond basic pitch or speed controls, our models can generate speech that sounds "guttural", "scared", "whispered" and more; 59 style tags in total.
ParaSpeechCaps has been accepted to the EMNLP 2025 Main Conference!
RT @ZEYULIU10: LLMs trained to memorize new facts can’t use those facts well. 🤔 We apply a hypernetwork to ✏️edit✏️ the gradients for fact…
RT @ForbesIndia: A pioneer in machine learning, Sunita Sarawagi has transformed how computers process unstructured data through innovations….
RT @ForbesIndia: Preethi Jyothi is advancing speech and language technologies to make AI more inclusive for low-resource Indian languages.….
RT @EliasEskin: Extremely excited to announce that I will be joining @UTAustin @UTCompSci in August 2025 as an Assistant Professor! 🎉 I’m…
RT @ManyaWadhwa1: Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies n…
0
39
0
RT @ramya_namuduri: Have that eerie feeling of déjà vu when reading model-generated text 👀, but can’t pinpoint the specific words or phrase….
RT @Jess_Riedel: Scott Aaronson announces he's building an Open-Phil backed AI alignment group at UT Austin (🔗 below). Prospective postd…
RT @PuyuanPeng: Announcing the new SotA voice-cloning TTS model: 𝗩𝗼𝗶𝗰𝗲𝗦𝘁𝗮𝗿 ⭐️ VoiceStar is: autoregressive, voice-cloning, robu…
RT @mina1004h: Recent AI models can suggest endless video edits, offering many alternatives to video creators. But how can we easily sift t….
If you'd like an open-source text-to-speech model that follows your style instructions, consider using our ParaSpeechCaps-based model! Model: Paper:
arxiv.org
We introduce Paralinguistic Speech Captions (ParaSpeechCaps), a large-scale dataset that annotates speech utterances with rich style captions. While rich abstract tags (e.g. guttural, nasal,...
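Since the thread describes prompting the model with free-form style captions built from tags like "guttural", "scared", or "whispered", here is a minimal, purely illustrative sketch of composing such a caption. The function name and the caption template are assumptions for illustration, not the actual ParaSpeechCaps format or model API.

```python
# Hypothetical sketch: turning a handful of ParaSpeechCaps-style tags
# into a free-form style caption for a style-prompted TTS model.
# The template below is an assumption, not the dataset's real schema.

def build_style_caption(intrinsic_tags, situational_tags):
    """Combine speaker-level and utterance-level style tags into one caption."""
    tags = list(intrinsic_tags) + list(situational_tags)
    return "A speaker with a " + ", ".join(tags) + " voice."

caption = build_style_caption(["guttural"], ["scared", "whispered"])
print(caption)  # A speaker with a guttural, scared, whispered voice.
```

In the real pipeline, a caption like this would be passed as the style description alongside the transcript to the finetuned Parler-TTS model mentioned later in the thread.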
Three new state-of-the-art audio models in the API: 🗣️ Two speech-to-text models outperforming Whisper. 💬 A new TTS model you can instruct *how* to speak. 🤖 And the Agents SDK now supports audio, making it easy to build voice agents. Try TTS now at
RT @ai4bharat: 🚀 AI4Bharat: Advancing Indian Language AI - Open & Scalable! 🇮🇳✨. Over the past 4 years, we at AI4Bharat have been on a miss….
RT @berraksismann: Exciting News! 😊 INTERSPEECH 2028 will take place at the River Walk in San Antonio, Texas! ✨ I’m honored to serve as one o…
RT @ArxivSound: "Scaling Rich Style-Prompted Text-to-Speech Datasets," Anuj Diwan, Zhisheng Zheng, David Harwath, Eunsol Choi, https://t.…
Thanks to my amazing collaborators @zszheng147, @eunsolc and David Harwath! Paper: Code: Dataset: Model: Demo: HF Space:
We finetune Parler-TTS-Mini-v1 on ParaSpeechCaps and achieve significant improvements in both speech-style consistency and naturalness over our best-performing baseline (which combines existing smaller-scale style datasets)!
ParaSpeechCaps contains 282 hours of human-labelled data and 2427 hours of automatically-labelled data. Human evaluators rate our scaled, automatically-labelled data to be on par with the human-labelled data! We carefully ablate our dataset design choices.
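A quick sanity check of the dataset sizes quoted in the tweet above (282 human-labelled hours, 2427 automatically-labelled hours): the scaled portion is roughly 8.6× the human-labelled portion, for about 2709 hours in total.

```python
# Arithmetic on the dataset sizes stated in the thread.
human_hours = 282
automatic_hours = 2427

total_hours = human_hours + automatic_hours
scale_factor = round(automatic_hours / human_hours, 1)

print(total_hours)   # 2709
print(scale_factor)  # 8.6
```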
ParaSpeechCaps is the first large-scale dataset that supports both speaker-level intrinsic tags and utterance-level situational tags. Our key contribution is a novel pipeline for scalable, automatic style annotations over such a wide variety of rich styles for the first time.
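The distinction above between speaker-level intrinsic tags and utterance-level situational tags could be represented with a small record type like the following. This is a hypothetical sketch: the field names and structure are assumptions for illustration, not the actual ParaSpeechCaps schema.

```python
# Hypothetical record pairing a transcript with the two tag families
# described in the thread. Field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class StyleAnnotation:
    transcript: str
    intrinsic_tags: list    # speaker-level, persist across utterances (e.g. "guttural")
    situational_tags: list  # utterance-level, vary per utterance (e.g. "scared")

    def all_tags(self):
        """All style tags that apply to this utterance."""
        return self.intrinsic_tags + self.situational_tags

ex = StyleAnnotation("Stay close to me.", ["guttural"], ["scared", "whispered"])
print(ex.all_tags())  # ['guttural', 'scared', 'whispered']
```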