
Zachary Novack
@zacknovack
Followers
649
Following
3K
Media
40
Statuses
282
efficient + controllable music gen | phd @ucsd_cse | research intern @sonyai_global | prev @adoberesearch @stabilityai @mldcmu | teaching drums @pulsepercussion
Joined June 2022
RT @xunhuang1995: We should have called it "scaling up rollout", not RL. RL is a necessary evil for the discrete nature of language. My int….
0
16
0
RT @thepatch_kev: made a @huggingface space for custom sample generation using stable-audio-open-small. already had an api in my backend,….
0
3
0
We're organizing the AI for Music workshop at @NeurIPSConf in San Diego! We'll be accepting both papers + demos w/ an initial deadline of August 22, well timed for early visibility on your ICASSP/ICLR drafts 👀. Check out the website for more:
aiformusicworkshop.github.io
NeurIPS 2025 Workshop on AI for Music
🔥Happy to announce that the AI for Music Workshop is coming to #NeurIPS2025! We have an amazing lineup of speakers! We call for papers & demos (due on August 22)! See you in San Diego! 🏖️ @chrisdonahuey @Ilaria__Manco @zawazaw @huangcza @McAuleyLabUCSD @zacknovack @NeurIPSConf
3
12
53
Stable Audio Open Small is accepted at #WASPAA2025 @IEEE_WASPAA! Can't wait to share the latest in blazingly fast, on-device text-to-audio in Lake Tahoe 🏞️.
Stability AI just dropped Stable Audio Open Small on Hugging Face. Fast Text-to-Audio Generation with Adversarial Post-Training
4
10
66
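For anyone who wants to try the model, here is a minimal generation sketch following the usual stable-audio-tools pattern (the Hugging Face model ID and the 8-step ping-pong sampler settings mirror the model card, but treat the exact arguments as indicative rather than authoritative):

```python
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pull weights + config from Hugging Face (accept the model license first)
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-small")
model = model.to(device)

# Text + duration conditioning; the small model targets short stereo clips
conditioning = [{"prompt": "128 BPM tech house drum loop", "seconds_total": 11}]

# Few-step sampling is the whole point: 8 steps with the ping-pong sampler
output = generate_diffusion_cond(
    model,
    steps=8,
    sampler_type="pingpong",
    conditioning=conditioning,
    sample_size=model_config["sample_size"],
    device=device,
)

# Collapse the batch dim, peak-normalize, and write a 16-bit stereo WAV
output = rearrange(output, "b d n -> d (b n)").to(torch.float32)
output = output.div(output.abs().max()).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, model_config["sample_rate"])
```

The few-step sampling is what the adversarial post-training buys: dropping from the usual ~100 diffusion steps to 8 is how the model reaches on-device speeds.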
RT @yupenghou97: Did you know tokenization for generative recommendation today looks a lot like LLM tokenization did *10 years* ago? Meet….
0
29
0
RT @MardaniMorteza: 📢📢 Elucidated Rolling Diffusion Models (ERDM). How can we stably roll out diffusion models for sequence generation in d….
0
23
0
RT @hermanhwdong: Happy to share that our paper led by @havenpersona has been accepted to #ISMIR2025! 🎉 🎬We presented the OSSL dataset of….
www.youtube.com
Demo for "Video-Guided Text-to-Music Generation Using Public Domain Movie Collections" (ISMIR 2025)Haven Kim, Zachary Novack, Weihan Xu, Julian McAuley, and ...
0
5
0
I always like those paper/author visualizations for other conferences, so I ~vibe coded~ up an interactive one for #ISMIR2025 @ISMIRConf! Go check it out! Will hopefully add paper links and other metadata in the coming weeks :)
0
5
29
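(Not the actual visualization, but a sketch of the approach it describes: embed paper titles, project to 2D, and let plotly supply the interactivity. The papers.csv input with title/authors columns is a hypothetical stand-in for the real ISMIR metadata.)

```python
# Hypothetical sketch of an interactive paper/author map: embed titles with
# TF-IDF, project to 2D, and render a hoverable scatter plot. "papers.csv"
# with "title" and "authors" columns is an assumed input, not real ISMIR data.
import pandas as pd
import plotly.express as px
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

papers = pd.read_csv("papers.csv")  # assumed columns: title, authors

# TF-IDF title embeddings -> 2D coordinates via truncated SVD
X = TfidfVectorizer(stop_words="english").fit_transform(papers["title"])
papers[["x", "y"]] = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# Hovering shows title + authors; similar titles land near each other
fig = px.scatter(papers, x="x", y="y", hover_data=["title", "authors"])
fig.write_html("ismir2025_papers.html")  # open in a browser to explore
```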
RT @wuyusongwys: It’s been a thrilling journey building FLAM! 🚀 Super proud of what we achieved: open‑vocabulary audio event detection using….
arxiv.org
0
11
0
RT @thepatch_kev: stable audio open small is great for stacking multiple generations. @zacknovack @_lyraaaa_ the ux speriments continue. c….
0
3
0
RT @thepatch_kev: live coding with stable audio open small? let the vibes begin lol. i love having a bunch of endpoints already functioning….
0
1
0
RT @ArxivSound: ``Video-Guided Text-to-Music Generation Using Public Domain Movie Collections,'' Haven Kim, Zachary Novack, Weihan Xu, Juli….
arxiv.org
Despite recent advancements in music generation systems, their application in film production remains limited, as they struggle to capture the nuances of real-world filmmaking, where filmmakers...
0
4
0
OSSL Dataset is out and accepted at #ISMIR2025 🇰🇷! High quality soundtrack+movie paired data, all public domain, perfect for your V2M tasks 📽️🎶. Led by the titan @havenpersona, check out the full thread below for more info!
🎼 Open Screen Sound Library Version 1 Released 🎥. Hi folks, we've just released a music-video dataset, sourced from public domain films, introduced in our paper "Video-guided text-to-music generation using public domain movie collections" accepted at #ISMIR2025.
0
1
10
Presenting RUListening! We edit Music-QA benchmarks to *actually* assess audio perception, using text-only LLMs to generate unimodally-hard distractors. Been super excited about this one (led by the beast @yongyi_zang), check out the full thread below! And at ISMIR 2025! 🇰🇷
🚨New Audio Benchmark 🚨 We find standard LLMs can solve Music-QA benchmarks by just guessing from text only, + LALMs can still answer well when given noise instead of music! Presenting RUListening: A fully automated pipeline for making Audio-QA benchmarks *actually* assess….
0
6
26
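The thread gives the full pipeline; as a hedged sketch of the core filtering idea only (not the authors' code; ask_text_llm is a hypothetical stub for any chat-completion API), the test is whether a text-only model beats chance without ever receiving the audio:

```python
from typing import List

def ask_text_llm(question: str, choices: List[str]) -> int:
    """Hypothetical stub: return the choice index a *text-only* LLM picks.
    Wire this to whatever chat-completion API you use."""
    raise NotImplementedError

def requires_listening(question: str, choices: List[str],
                       answer_idx: int, n_trials: int = 5) -> bool:
    # If the text-only model finds the correct answer more often than random
    # guessing would, the item leaks its answer through text alone and does
    # not measure audio perception.
    hits = sum(ask_text_llm(question, choices) == answer_idx
               for _ in range(n_trials))
    expected_by_chance = n_trials / len(choices)
    return hits <= expected_by_chance  # keep items that actually need audio
```

The actual RUListening pipeline goes further than this filter, regenerating distractors so each question stays hard from text alone; see the paper for the full procedure.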
RT @ArxivSound: ``A Review on Score-based Generative Models for Audio Applications,'' Ge Zhu, Yutong Wen, Zhiyao Duan….
arxiv.org
Diffusion models have emerged as powerful deep generative techniques, producing high-quality and diverse samples in applications in various domains including audio. These models have many...
0
2
0
RT @dadabots: yup, just compiled it & tested. Stable Audio Open Small runs faster than realtime on a mac **CPU**. on a m1 chip you have thr….
0
8
0
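"Faster than realtime" means the wall-clock generation time is shorter than the duration of the audio produced. A tiny measurement sketch (generate_fn is a placeholder for any text-to-audio inference call, not an API from the model):

```python
import time
from typing import Callable

def realtime_factor(generate_fn: Callable[[], None], audio_seconds: float) -> float:
    """Return audio duration / wall-clock time; > 1.0 is faster than realtime."""
    start = time.perf_counter()
    generate_fn()  # e.g. an 11 s stable-audio-open-small generation on CPU
    elapsed = time.perf_counter() - start
    return audio_seconds / elapsed
```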
RT @niloofar_mire: We (w @zacknovack @JaechulRoh et al.) are working on #memorization in #audio models & are conducting a human study on ge….
0
7
0
RT @StabilityAI: Today we’re open-sourcing Stable Audio Open Small, a 341M-parameter text-to-audio model optimized to run entirely on @Arm….
0
196
0
RT @ArxivSound: ``Fast Text-to-Audio Generation with Adversarial Post-Training,'' Zachary Novack, Zach Evans, Zack Zukowski, Josiah Taylor,….
arxiv.org
Text-to-audio systems, while increasingly performant, are slow at inference time, thus making their latency impractical for many creative applications. We present Adversarial...
0
4
0