Cong Zhou Profile
Cong Zhou

@CongZhou1

Followers
144
Following
818
Media
0
Statuses
91

Researcher @AnuttaconGames

Bay Area
Joined January 2020
@shawnshenjx
Shawn Shen
4 months
I’m Shawn, founder of https://t.co/6SYcxgwroZ, former researcher at Meta and CS PhD at University of Cambridge. Today we’re launching https://t.co/6SYcxgwroZ: we built the world’s first Large Visual Memory Model - to give AI human-like visual memories. Why visual memory? AI to
182
369
2K
@CongZhou1
Cong Zhou
5 months
Congrats on the release!
@Cixelyn
cory
5 months
new anime video model, with a matching banger OP. team really cooked with this one
1
0
1
@Alibaba_Wan
Wan
7 months
1/3 🚀Thrilled to introduce Wan2.1-FLF2V-14B - our first 14B-parameter large model for First-Last-Frame to video generation! Open-source, open-source, open-source! Empowering digital artists with unprecedented efficiency and creative flexibility. #wan #AIGC #alart
45
290
2K
@juberti
Justin Uberti
8 months
Put another way: we have LLMs with billions of parameters controlled by VAD models with thousands of parameters. There are reasons for this but we need more sophisticated solutions (and evals for them!)
@kwindla
kwindla
8 months
Smarter voice AI turn detection is a "2025 problem." By which I mean: in 2024 all of us in the realtime, multimodal AI ecosystem spent most of our time working on relatively low-level things ... ➡️ basic turn detection using VAD ➡️ fast, reliable interruption handling ➡️
5
2
43
@CongZhou1
Cong Zhou
8 months
The first trailer for Whispers from the Star is here! 🌟 Thrilled to have contributed to the voice modeling efforts and excited for you to experience it! Join us in shaping immersive AI-driven experiences at @AnuttaconGames! 🎮🚀 https://t.co/QmEUOAamX7
anuttacon.com
We're hiring primarily in the San Francisco Bay Area, with an office in Mountain View. As a dynamic startup, we value the collaborative spirit of in-person work. We also remain open to remote...
@WFTS_Game
Whispers from the Star
8 months
Whispers from the Star⭐️ Announce Trailer Your words seal her fate. When a girl named Stella crash-lands on an alien planet called Gaia, you are the only person she can contact through her communicator. Through texts, voice messages, and video calls that unfold throughout your
0
0
6
@CongZhou1
Cong Zhou
8 months
Cool!
@BoyuanChen0
Boyuan Chen
9 months
Announcing Diffusion Forcing Transformer (DFoT), our new video diffusion algorithm that generates ultra-long videos of 800+ frames. DFoT enables History Guidance, a simple add-on to any existing video diffusion models for a quality boost. Website: https://t.co/wdZ19yCgjJ (1/7)
0
0
0
@CongZhou1
Cong Zhou
9 months
It’s cool!
@justLV
Justin Alvey
9 months
Excited to share a peek of what I’ve been working on. We @sesame believe voice is key to unlocking a future where computers are lifelike. Here’s an early preview you can try! 👇 We’ll be open sourcing a model, and yes… we’re building hardware! 🧵
0
0
0
@imxiaohu
小互
9 months
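This new ByteDance project delivers very impressive results. OmniHuman: from a single image paired with audio or video, it generates very natural videos of humans talking and singing. It supports many different input types (such as a single portrait image plus audio, video, or other signals) to generate highly realistic human video animation, covering everything from facial expressions to full-body motion, whether talking, singing, or dancing. OmniHuman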
20
162
694
@92HsChoi
최형석 (Hyeong-Seok Choi)
10 months
@sedielem imo just because it's more “compressed” doesn’t mean it’s good for “modeling.” In the audio/speech space people use semantic tokens, which are not necessarily optimized for compression. What matters more is the characteristics of the representation the encoder has learnt.
2
1
5
@CongZhou1
Cong Zhou
10 months
Congratulations, Jordi! I’ll definitely play with it, any plans to go to 32k?
@jordiponsdotme
Jordi Pons
10 months
Weights are out! 🤗 Tokenizing 16kHz speech at very low bitrates. Inference code: https://t.co/eZKbrBzHzw Model code: https://t.co/vLJhpyGa7M Model weights: https://t.co/fFHpte7fey arXiv: https://t.co/ZbslCfppvF Audio demos: https://t.co/J9D46A6prO
0
0
0
@CongZhou1
Cong Zhou
11 months
You cannot miss this one!
@Cixelyn
cory
11 months
come hang w/ us at neurips! i'm hosting an anime & ai social on dec 11th! will be there along with a bunch of folks who work on @nijijourney then later, we're hosting a diffusion bar event dec 12th with @midjourney! rsvp on the partiful links below!
0
0
1
@CongZhou1
Cong Zhou
1 year
Congrats on the poc!
@juberti
Justin Uberti
1 year
In 5 months, Ultravox has gone from a v0.1 proof-of-concept to the leading open-source speech LLM!
0
0
0
@CongZhou1
Cong Zhou
1 year
Tried my best, then realized there are certain performance gaps we can’t close at this point. The 🌞 side is that TTS is still not solved.
1
0
2
@EricBattenberg
Eric Battenberg
1 year
Transformer-based TTS models sound great but have all kinds of reliability issues. Our new model, Very Attentive Tacotron (VAT), is a Transformer-based TTS system that doesn't drop or repeat words and can generalize to any practical utterance length. https://t.co/y3kCIYF8M5
2
12
51
@CongZhou1
Cong Zhou
1 year
This is damn natural! Imagine expanding to drama/movie scripts; it’s not that far off.
@omooretweets
Olivia Moore
1 year
The NotebookLM hosts realizing they are AI and spiraling out is a twist I did not see coming
0
0
0
@heiga_zen
Heiga Zen (全 炳河)
1 year
LibriTTS has been ranked 6th. Congrats to all authors and collaborators! And thanks to all users.
@HungyiLee2
Hung-yi Lee (李宏毅)
1 year
Congratulations to the SUPERB Team! Our work on the Speech Processing Universal PERformance Benchmark (SUPERB) has been ranked 7th among the most cited papers at INTERSPEECH over the past five years! A big round of applause to everyone involved.
0
7
49
@CongZhou1
Cong Zhou
1 year
Wow, this would be fun indeed
@krandiash
Karan Goel
1 year
2.5 months ago @elevenlabsio put up this comparison with our 10 day old Sonic model: https://t.co/U2A5tcZC9b The team took it as a challenge, here's our new scorecard. Higher quality, cheaper & the fastest voice model period. https://t.co/44caSdm6pe Next 3 months will be fun.
0
0
2
@jesseengel
Jesse Engel
1 year
tl;dr Adding independent gaussian noise to each pixel is equivalent to adding uniform frequency noise to a full image. Since images have a power law distribution of frequencies, adding pixel noise ~= low pass, so denoising ~= iteratively predicting frequencies from low to high.
@sedielem
Sander Dieleman
1 year
Diffusion is the rising tide that eventually submerges all frequencies, high and low 🌊 Diffusion is the gradual decomposition into feature scales, fine and coarse 🗼 Diffusion is just spectral autoregression 🤷🌈
4
10
186
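The frequency argument in the exchange above can be checked numerically. The following is a minimal sketch, not anything taken from the linked posts: it uses a 1D signal instead of an image, a direct DFT from the standard library, and an assumed power-law signal spectrum of the form f^(-alpha) with alpha = 2. The function names, the value of alpha, and the normalization between signal and noise power are all illustrative assumptions. It shows that independent Gaussian noise has roughly the same expected power at every frequency, and that under the power-law assumption the highest frequency still above the noise floor drops as the noise level rises.

```python
import cmath
import math
import random


def power_spectrum(signal):
    """Return |DFT|^2 / N for frequencies 0..N/2, using a direct O(N^2) DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def mean_noise_spectrum(n=64, sigma=1.0, trials=200, seed=0):
    """Average the power spectrum of independent Gaussian noise over many draws."""
    rng = random.Random(seed)
    total = [0.0] * (n // 2 + 1)
    for _ in range(trials):
        noise = [rng.gauss(0.0, sigma) for _ in range(n)]
        for k, p in enumerate(power_spectrum(noise)):
            total[k] += p
    return [p / trials for p in total]


def cutoff_frequency(sigma, alpha=2.0, max_freq=32):
    """Highest integer frequency whose assumed signal power f**-alpha
    still exceeds the flat noise power sigma**2 (0 if none does)."""
    visible = [f for f in range(1, max_freq + 1) if f ** -alpha > sigma ** 2]
    return max(visible) if visible else 0


if __name__ == "__main__":
    spec = mean_noise_spectrum()
    low = sum(spec[1:9]) / 8      # average power in a low-frequency band
    high = sum(spec[24:32]) / 8   # average power in a high-frequency band
    print(f"noise power, low band:  {low:.3f}")
    print(f"noise power, high band: {high:.3f}")
    for sigma in (0.05, 0.1, 0.5):
        print(f"sigma={sigma}: cutoff frequency {cutoff_frequency(sigma)}")
```

Both band averages should land close to sigma squared, which is the "uniform frequency noise" part of the claim. The shrinking cutoff is the "pixel noise is approximately a low pass" part: more noise hides progressively lower frequencies, so removing noise step by step recovers frequencies from low to high.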
@mirco_ravanelli
Mirco Ravanelli
1 year
If you're attending #INTERSPEECH2024 and have an interest in audio tokens, we warmly invite you to join our presentation! #DeepLearning #Speech #LLM #audio #research #SpeechBrain #AI
@MousaviPooneh
Pooneh Mousavi
1 year
📢 I'll be presenting our paper "How Should We Extract Discrete Audio Tokens from Self-Supervised Models?" at InterSpeech! 🎙️ Meet us at the Speech Processing Using Discrete Speech Units, Oral Session on Sep 3, 16:20. 🔗 Paper: https://t.co/Z98B0DFjGD #INTERSPEECH2024
0
5
22
@beihuo
北火
1 year
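I'd like to share a data point I know of: among Notion users, the percentage who actually write documents is in the single digits. Even people on the Notion team say they don't know how to keep pushing forward on AI. 宝玉's point is completely right. The dilemma right now is probably return on investment.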
@dotey
宝玉
1 year
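Rather than cloning Cursor, big tech companies would do better to build a genuinely good AI email client. Big companies copying Cursor and chasing behind others like this has no future. AI code editors are already a red ocean. And so what if you're a big company? How much bigger is Microsoft than Cursor? Yet GitHub Copilot still couldn't beat Cursor. Big companies would be better off investing in a few more companies like Cursor. Why insist on copying them?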
11
34
215