
Ivan Lee
@ivn1e
Followers 12 · Following 11 · Media 6 · Statuses 14
Joined January 2018
We're organizing the AI for Music workshop at @NeurIPSConf in San Diego! We'll be accepting both papers + demos, with an initial deadline of August 22, well timed for early visibility on your ICASSP/ICLR drafts. Check out the website for more:
aiformusicworkshop.github.io
NeurIPS 2025 Workshop on AI for Music
Happy to announce that the AI for Music Workshop is coming to #NeurIPS2025! We have an amazing lineup of speakers! We call for papers & demos (due on August 22)! See you in San Diego! @chrisdonahuey @Ilaria__Manco @zawazaw @huangcza @McAuleyLabUCSD @zacknovack @NeurIPSConf
New Audio Benchmark: We find standard LLMs can solve Music-QA benchmarks by just guessing from the text alone, + LALMs can still answer well when given noise instead of music! Presenting RUListening: a fully automated pipeline for making Audio-QA benchmarks *actually* assess …
Ultra-fast text-to-music generation w/o degrading quality? Introducing Presto! Distilling Steps and Layers for Accelerating Music Generation. https://t.co/kTTAYKKtTU https://t.co/Newhxe6lI6 w/ @__gzhu__ @CasebeerJonah @BergKirkpatrick @McAuleyLabUCSD @NicholasJBryan
@nanjiangwill @BergKirkpatrick Since most of our chosen architectures are attention-free, what mechanism plays a role analogous to that of induction heads in their ICL? We hope to explore such questions in future work. Thanks for reading! Code: https://t.co/8L8GSZpn5L Paper: https://t.co/WE3gnWSjM1 10/10
@nanjiangwill @BergKirkpatrick Finally, we evaluate architectures on language modeling. Mamba was the only one to reach parity with transformers, with Hyena and RWKV the next closest. Most exhibit an abrupt improvement in ICL score, a behavior associated with the formation of induction heads (Olsson et al., 2022). 9/10
@nanjiangwill @BergKirkpatrick We also study a setting where models are given the option to either memorize or perform ICL. We find that transformers with rotary embeddings and Hyena strongly prefer ICL over memorization. Surprisingly, RetNet almost always chooses to memorize. 8/10
@nanjiangwill @BergKirkpatrick All architectures are capable of multiclass classification, and all except RNNs and CNNs perform better than logistic regression (black) in the most difficult setting. However, performance degrades quickly as we extrapolate beyond training lengths. 7/10
@nanjiangwill @BergKirkpatrick Again, all are capable of linear regression. Specifically, Mamba, RetNet, and transformers achieve performance comparable to ridge regression (black) for context lengths seen during training. While no architecture extrapolates well beyond those lengths, RetNet proves the most stable. 6/10
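
For context, a ridge-regression reference curve of this kind can be computed by refitting ridge on the first t in-context pairs at every position t. The sketch below is a minimal illustration of that comparison; the function name, shapes, and regularization strength are assumptions for this example, not the paper's code.

```python
# Sketch of a per-context-length ridge baseline (illustrative assumptions only).
import numpy as np

def ridge_predictions(xs, ys, lam=0.1):
    """xs: (n_points, dim), ys: (n_points,). For each position t, fit ridge on the
    first t pairs and predict the label of the next point, mirroring what an
    in-context learner sees at that context length."""
    n, d = xs.shape
    preds = []
    for t in range(1, n):
        X, y = xs[:t], ys[:t]
        w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        preds.append(xs[t] @ w)
    return np.array(preds)

# Toy usage: a single random linear task.
xs = np.random.randn(40, 8)
ys = xs @ np.random.randn(8)
print(ridge_predictions(xs, ys)[:3])
```
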
@nanjiangwill @BergKirkpatrick All are capable of associative recall, with transformers, RetNet, Hyena, Mamba, and RWKV performing best as the difficulty increases. The latter three, in particular, excel when extrapolating beyond the number of examples seen during training (right of vertical line). 5/10
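
For readers unfamiliar with the task: associative recall presents a sequence of key-value pairs followed by a query key, and the model must emit the value paired with that key earlier in the sequence. Below is a minimal, hypothetical generator; the vocabulary size, token layout, and names are illustrative assumptions, not the paper's exact format.

```python
# Illustrative associative-recall instance generator (layout is an assumption).
import torch

def sample_associative_recall(batch=64, n_pairs=16, vocab=32):
    # Distinct keys per sequence, random values.
    keys = torch.stack([torch.randperm(vocab)[:n_pairs] for _ in range(batch)])
    values = torch.randint(0, vocab, (batch, n_pairs))
    # Sequence layout: k_1 v_1 k_2 v_2 ... k_n v_n q, where q repeats one earlier key.
    pairs = torch.stack([keys, values], dim=2).reshape(batch, 2 * n_pairs)
    idx = torch.randint(0, n_pairs, (batch,))
    query = keys[torch.arange(batch), idx]
    target = values[torch.arange(batch), idx]  # value the model must recall
    return torch.cat([pairs, query[:, None]], dim=1), target

x, y = sample_associative_recall()
print(x.shape, y.shape)  # torch.Size([64, 33]) torch.Size([64])
```
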
@nanjiangwill @BergKirkpatrick In short, we find that all architectures are capable of ICL, even RNNs and CNNs. Not surprisingly, transformers are strong in-context learners. However, a number of alternatives such as RWKV, RetNet, Mamba, and Hyena prove to be equally, and sometimes more, capable. 4/10
@nanjiangwill @BergKirkpatrick To address this, we study ICL in controlled, synthetic environments that eliminate the possibility of memorization: we train models from scratch to take a labeled dataset as input and predict the result of learning from this data in the forward-pass. 3/10
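
To make that setup concrete: one common way to build such a task is to sample a fresh regression problem per sequence and interleave its (x, y) pairs into a single prompt, so the only way to predict each y is to learn the task from the preceding pairs. The sketch below is a minimal illustration under that assumption; the shapes, padding scheme, and function name are made up for this example and are not the paper's code.

```python
# Minimal synthetic in-context regression batch (shapes/layout are assumptions).
import torch

def sample_icl_regression_batch(batch=64, n_points=40, dim=8):
    w = torch.randn(batch, dim, 1)         # a fresh task per sequence
    x = torch.randn(batch, n_points, dim)  # in-context inputs
    y = x @ w                              # in-context labels, (batch, n_points, 1)
    # Interleave x_1, y_1, x_2, y_2, ...; y is zero-padded to the input width
    # so every position in the sequence has the same dimensionality.
    y_tok = torch.cat([y, torch.zeros(batch, n_points, dim - 1)], dim=-1)
    seq = torch.stack([x, y_tok], dim=2).reshape(batch, 2 * n_points, dim)
    # Training target: at each x position, predict the y that follows it.
    return seq, y.squeeze(-1)

seq, targets = sample_icl_regression_batch()
print(seq.shape, targets.shape)  # torch.Size([64, 80, 8]) torch.Size([64, 40])
```
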
@nanjiangwill @BergKirkpatrick Studying ICL in LLMs is challenging. Are these models truly learning new predictors during the forward-pass (ICL), or do in-context examples simply focus the model on aspects of knowledge already acquired during gradient-based pretraining (memorization)? 2/10
Is attention required for ICL? We explore this question in our #ICLR2024 paper Exploring the Relationship Between Model Architecture and In-Context Learning Ability. Code: https://t.co/8L8GSZpn5L Paper: https://t.co/WE3gnWSjM1 with @nanjiangwill and @BergKirkpatrick 1/10