Puyuan Peng

@PuyuanPeng

Followers: 2K
Following: 819
Media: 19
Statuses: 307

Research Scientist @Meta Superintelligence Lab. Speech & Audio. Previously @utaustin @uchicago @bnu_1902

New York, USA
Joined December 2019
@PuyuanPeng
Puyuan Peng
5 months
Announcing the new SotA voice-cloning TTS model: VoiceStar ⭐️
VoiceStar is
- autoregressive,
- voice-cloning,
- robust,
- duration controllable,
- *test-time extrapolation*: generates speech longer than the training duration!
Code & Model: https://t.co/7vxDpnayks
10
61
389
@PuyuanPeng
Puyuan Peng
2 months
International Mathematical Olympiad is a civil war
1
0
5
@Tae_Hyun_Oh
Tae-Hyun Oh
2 months
Joint work with my student Sungbin Kim and the UT Austin team will be presented at ICCV 2025.
@PuyuanPeng
Puyuan Peng
2 months
The work is led by the amazing Sungbin Kim https://t.co/IMN9ZyYhPs, in collaboration with Jeongsoo Choi, Joon Son Chung, @Tae_Hyun_Oh, and David Harwath. Check out https://t.co/LEBteX4JSz for more samples and the forthcoming code and model!
0
1
8
@PuyuanPeng
Puyuan Peng
2 months
The work is led by the amazing Sungbin Kim https://t.co/IMN9ZyYhPs, in collaboration with Jeongsoo Choi, Joon Son Chung, @Tae_Hyun_Oh, and David Harwath. Check out https://t.co/LEBteX4JSz for more samples and the forthcoming code and model!
sites.google.com
Kim Sung-Bin
0
1
5
@PuyuanPeng
Puyuan Peng
2 months
Announcing ๐•๐จ๐ข๐œ๐ž๐‚๐ซ๐š๐Ÿ๐ญ-๐ƒ๐ฎ๐›๐ŸŒŸ SotA Voice-Cloning Dubbing Model! Given video of any speaker, and seconds of any reference speech, VoiceCraft-Dub produces lip-synchronized expressive speech for the video, using the reference voice. ICCV 2025: https://t.co/mrWySonzfw
Tweet media one
Tweet media two
1
1
12
@PuyuanPeng
Puyuan Peng
3 months
Thanks for featuring VoiceStar, our latest, most powerful TTS (and an upgrade from VoiceCraft last year). Fully open, permissively licensed at
github.com
VoiceStar: Robust, Duration-controllable TTS that can Extrapolate - jasonppy/VoiceStar
@github
GitHub
3 months
The AI landscape is evolving fast, and staying on top of the latest open-source projects is crucial for every developer. 🚀 Swipe to see our list of the top new open-source AI projects on GitHub, from multi-agent systems to composable tools and cutting-edge speech synthesis.
0
0
11
@reach_vb
Vaibhav (VB) Srivastav
3 months
There will be a DeepSeek R1 0528 Qwen 3 8B too, matching Qwen 3 235B Thinking in performance 🤯 Whale COOKED!
19
66
691
@PuyuanPeng
Puyuan Peng
3 months
The paper is out! https://t.co/GikR01dy5S
@PuyuanPeng
Puyuan Peng
5 months
Announcing the new SotA voice-cloning TTS model: VoiceStar ⭐️
VoiceStar is
- autoregressive,
- voice-cloning,
- robust,
- duration controllable,
- *test-time extrapolation*: generates speech longer than the training duration!
Code & Model: https://t.co/7vxDpnayks
0
11
60
@LiyanTang4
Liyan Tang
4 months
Introducing ChartMuseum 🖼️, testing visual reasoning with diverse real-world charts!
✍🏻 Entirely human-written questions by 13 CS researchers
👀 Emphasis on visual reasoning – hard to verbalize via text CoTs
📉 Humans reach 93%, but Gemini-2.5-Pro reaches only 63% and Qwen2.5-72B 38%
2
30
76
@EliasEskin
Elias Stengel-Eskin
4 months
Extremely excited to announce that I will be joining @UTAustin @UTCompSci in August 2025 as an Assistant Professor! 🎉 I'm looking forward to continuing to develop AI agents that interact/communicate with people, each other, and the multimodal world. I'll be recruiting PhD
92
65
453
@yuwen_lu_
yuwen lu
5 months
i'm at #chi2025 and i'll be on the industry job market later this year! i work in human-ai interaction. my prev projects focused on design tools. i love design. i love user interfaces. i trained myself to become an ai engineer to push our tools further. i believe ai is on
0
7
83
@ReillyOPatrick
Patrick O'Reilly
5 months
You can try yourself using this HuggingFace space, which applies the VoiceCraft codec trained by @PuyuanPeng et al. (5/8) https://t.co/jX7M9Eh0YJ
huggingface.co
1
1
3
@harshit_sikchi
Harshit Sikchi
5 months
As I near the end of my PhD journey, I am excited to share that I will be joining the research efforts @OpenAI, working with @hadisalmanX @aleks_madry and the great team to unlock new capabilities with frontier models. Austin has been one of the best places I have lived in and I
28
6
364
@jasonbaldridge
Jason Baldridge
5 months
Our incredible team built many models announced here, including image, voice, music and video generation! And: I'm moving to London this summer, and I'm hiring for research scientist and engineering roles! Our focus is on speech & music in Zurich, Paris & London. DM/email me.
@googlecloud
Google Cloud
5 months
Day 1 of #GoogleCloudNext ✅ Here's a taste of all the things that we announced today across infrastructure, research and models, Vertex AI, and agents → https://t.co/p6EHb0t7D8 Hint: Ironwood TPUs, Gemini on Google Distributed Cloud, Gemini 2.5 Flash, Lyria, and more.
5
6
111
@fredahshi
Freda Shi
5 months
I received a review like this five years ago. It's probably the right time now to share it with everyone who wrote or got random discouraging reviews from ICML/ACL.
5
38
418
@vipul_1011
Vipul Gupta
10 months
🚨 New paper alert 🚨 Ever struggled with quick saturation or unreliability in benchmark datasets? Introducing SMART Filtering to select high-quality examples, reducing dataset size by 48% on average (up to 68% for ARC!) and improving correlation with ChatBot Arena scores! 📈✨ (1/N)
3
16
95
@PuyuanPeng
Puyuan Peng
6 months
This project is well timed! Check it out if you are interested in replicating OpenAI's audio agent.
@anuj_diwan
Anuj Diwan
6 months
If you'd like an open-source text-to-speech model that follows your style instructions, consider using our ParaSpeechCaps-based model! Model: https://t.co/HCm71MW0aR Paper:
0
0
10
@berraksismann
Berrak Sisman
6 months
Exciting News! 😊 INTERSPEECH 2028 will take place at the River Walk in San Antonio, Texas! ✨ I'm honored to serve as one of the General Chairs alongside John Hansen and Carlos Busso @BussoCarlos. We hope you'll love this city as much as we do! https://t.co/k2dVo7nqdc
0
10
40
@anuj_diwan
Anuj Diwan
6 months
Introducing ParaSpeechCaps, our large-scale style captions dataset that enables rich, expressive control for text-to-speech models! Beyond basic pitch or speed controls, our models can generate speech that sounds "guttural", "scared", "whispered" and more; 59 style tags in total.
3
17
75