Anuj Diwan

@anuj_diwan

Followers
761
Following
2K
Media
21
Statuses
240

PhD Student @UTCompSci. Prev. Student Researcher @GoogleDeepmind, FAIR (@metaai), @AdobeResearch. 2021 BTech CSE @iitbombay. Interests: NLP, ASR, ML. 🇮🇳🇺🇸

Austin + Mumbai
Joined May 2014
@anuj_diwan
Anuj Diwan
4 months
Introducing ParaSpeechCaps, our large-scale style captions dataset that enables rich, expressive control for text-to-speech models! Beyond basic pitch or speed controls, our models can generate speech that sounds "guttural", "scared", "whispered" and more; 59 style tags in total.
3
17
72
@anuj_diwan
Anuj Diwan
21 days
RT @ZEYULIU10: LLMs trained to memorize new facts can’t use those facts well. 🤔 We apply a hypernetwork to ✏️edit✏️ the gradients for fact…
0
61
0
@anuj_diwan
Anuj Diwan
1 month
RT @ForbesIndia: A pioneer in machine learning, Sunita Sarawagi has transformed how computers process unstructured data through innovations…
0
5
0
@anuj_diwan
Anuj Diwan
1 month
RT @ForbesIndia: Preethi Jyothi is advancing speech and language technologies to make AI more inclusive for low-resource Indian languages.…
0
5
0
@anuj_diwan
Anuj Diwan
2 months
RT @EliasEskin: Extremely excited to announce that I will be joining @UTAustin @UTCompSci in August 2025 as an Assistant Professor! 🎉 I’m…
0
66
0
@anuj_diwan
Anuj Diwan
2 months
RT @mjqzhang: Can your LLM ask clarifying questions for ambiguous prompts? Come see our #ICLR25 poster this afternoon where I’ll chat abou…
0
11
0
@anuj_diwan
Anuj Diwan
3 months
RT @ManyaWadhwa1: Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies n…
0
35
0
@anuj_diwan
Anuj Diwan
3 months
RT @ramya_namuduri: Have that eerie feeling of déjà vu when reading model-generated text 👀, but can’t pinpoint the specific words or phrase…
0
17
0
@anuj_diwan
Anuj Diwan
3 months
RT @Jess_Riedel: Scott Aaronson announces he's building an Open-Phil backed AI alignment group at UT Austin. (🔗 below.) Prospective postd…
0
44
0
@anuj_diwan
Anuj Diwan
3 months
RT @PuyuanPeng: Announcing the new SotA voice-cloning TTS model: 𝗩𝗼𝗶𝗰𝗲𝗦𝘁𝗮𝗿 ⭐️ VoiceStar is - autoregressive, - voice-cloning, - robu…
0
61
0
@anuj_diwan
Anuj Diwan
3 months
RT @mina1004h: Recent AI models can suggest endless video edits, offering many alternatives to video creators. But how can we easily sift t…
0
20
0
@anuj_diwan
Anuj Diwan
4 months
If you'd like an open-source text-to-speech model that follows your style instructions, consider using our ParaSpeechCaps-based model! Model: Paper:
@OpenAIDevs
OpenAI Developers
4 months
Three new state-of-the-art audio models in the API: 🗣️ Two speech-to-text models, outperforming Whisper. 💬 A new TTS model: you can instruct it *how* to speak. 🤖 And the Agents SDK now supports audio, making it easy to build voice agents. Try TTS now at
1
5
42
@anuj_diwan
Anuj Diwan
4 months
RT @ai4bharat: 🚀 AI4Bharat: Advancing Indian Language AI - Open & Scalable! 🇮🇳✨ Over the past 4 years, we at AI4Bharat have been on a miss…
0
90
0
@anuj_diwan
Anuj Diwan
4 months
RT @berraksismann: Exciting News! 😊 INTERSPEECH 2028 will take place at the River Walk in San Antonio, Texas! ✨ I’m honored to serve as one o…
0
10
0
@anuj_diwan
Anuj Diwan
4 months
RT @ArxivSound: "Scaling Rich Style-Prompted Text-to-Speech Datasets," Anuj Diwan, Zhisheng Zheng, David Harwath, Eunsol Choi, https://t.…
0
3
0
@anuj_diwan
Anuj Diwan
4 months
Thanks to my amazing collaborators @zszheng147, @eunsolc and David Harwath! Paper: Code: Dataset: Model: Demo: HF Space:
0
0
6
@anuj_diwan
Anuj Diwan
4 months
We finetune Parler-TTS-Mini-v1 on ParaSpeechCaps and achieve significant improvements in both speech-style consistency and naturalness over our best-performing baseline (which combines existing smaller-scale style datasets)!
1
0
4
@anuj_diwan
Anuj Diwan
4 months
ParaSpeechCaps contains 282 hours of human-labelled data and 2427 hours of automatically-labelled data. Human evaluators rate our scaled data to be on par with human-labelled data! We carefully ablate our dataset design choices.
1
0
4
@anuj_diwan
Anuj Diwan
4 months
ParaSpeechCaps is the first large-scale dataset that supports both speaker-level intrinsic tags and utterance-level situational tags. Our key contribution is a novel pipeline that, for the first time, produces scalable, automatic style annotations over such a wide variety of rich styles.
1
0
3
@anuj_diwan
Anuj Diwan
4 months
RT @brunchavecmoi: Can we generate long text from compressed KV cache? We find existing KV cache compression methods (e.g., SnapKV) degrade…
0
27
0