
Ivan Lee
@ivn1e
Followers 12 · Following 11 · Media 6 · Statuses 14
Joined January 2018
We're organizing the AI for Music workshop at @NeurIPSConf in San Diego! We'll be accepting both papers + demos, with an initial deadline of August 22, well timed for early visibility on your ICASSP/ICLR drafts. Check out the website for more:
aiformusicworkshop.github.io
NeurIPS 2025 Workshop on AI for Music
Happy to announce that the AI for Music Workshop is coming to #NeurIPS2025! We have an amazing lineup of speakers! We call for papers & demos (due on August 22)! See you in San Diego! @chrisdonahuey @Ilaria__Manco @zawazaw @huangcza @McAuleyLabUCSD @zacknovack @NeurIPSConf
New Audio Benchmark: We find standard LLMs can solve Music-QA benchmarks by just guessing from the text alone, + LALMs can still answer well when given noise instead of music! Presenting RUListening: a fully automated pipeline for making Audio-QA benchmarks *actually* assess …
Ultra-fast text-to-music generation w/o degrading quality? Introducing Presto! Distilling Steps and Layers for Accelerating Music Generation. https://t.co/kTTAYKKtTU https://t.co/Newhxe6lI6 w/ @__gzhu__ @CasebeerJonah @BergKirkpatrick @McAuleyLabUCSD @NicholasJBryan
@nanjiangwill @BergKirkpatrick Since most of our chosen architectures are attention-free, what mechanism plays a role analogous to that of induction heads in their ICL? We hope to explore such questions in future work. Thanks for reading! Code: https://t.co/8L8GSZpn5L Paper: https://t.co/WE3gnWSjM1 10/10
@nanjiangwill @BergKirkpatrick Finally, we evaluate architectures on language modeling. Mamba was the only one to reach parity with transformers, with Hyena and RWKV the next closest. Most exhibit an abrupt improvement in ICL score, a behavior associated with the formation of induction heads (Olsson et al., 2022). 9/10
@nanjiangwill @BergKirkpatrick We also study a setting where models are given the option to either memorize or perform ICL. We find that transformers with rotary embeddings and Hyena strongly prefer ICL over memorization. Surprisingly, RetNet almost always chooses to memorize. 8/10
@nanjiangwill @BergKirkpatrick All architectures are capable of multiclass classification, and all except RNNs and CNNs perform better than logistic regression (black) in the most difficult setting. However, performance degrades quickly as we extrapolate beyond training lengths. 7/10
@nanjiangwill @BergKirkpatrick Again, all are capable of linear regression. Specifically, Mamba, RetNet, and transformers achieve performance comparable to ridge regression (black) for context lengths seen during training. While no architecture extrapolates well beyond those lengths, RetNet proves the most stable. 6/10
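
For context, a ridge-regression reference curve of this kind can be computed by refitting ridge on the first t in-context pairs at every position t. The sketch below is a minimal illustration of that comparison; the function name, shapes, and regularization strength are assumptions for this example, not the paper's code.

```python
# Sketch of a per-context-length ridge baseline (illustrative assumptions only).
import numpy as np

def ridge_predictions(xs, ys, lam=0.1):
    """xs: (n_points, dim), ys: (n_points,). For each position t, fit ridge on the
    first t pairs and predict the label of the next point, mirroring what an
    in-context learner sees at that context length."""
    n, d = xs.shape
    preds = []
    for t in range(1, n):
        X, y = xs[:t], ys[:t]
        w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        preds.append(xs[t] @ w)
    return np.array(preds)

# Toy usage: a single random linear task.
xs = np.random.randn(40, 8)
ys = xs @ np.random.randn(8)
print(ridge_predictions(xs, ys)[:3])
```
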
@nanjiangwill @BergKirkpatrick All are capable of associative recall, with transformers, RetNet, Hyena, Mamba, and RWKV performing best as the difficulty increases. The latter three, in particular, excel when extrapolating beyond the number of examples seen during training (right of vertical line). 5/10
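
For readers unfamiliar with the task: associative recall presents a sequence of key-value pairs followed by a query key, and the model must emit the value paired with that key earlier in the sequence. Below is a minimal, hypothetical generator; the vocabulary size, token layout, and names are illustrative assumptions, not the paper's exact format.

```python
# Illustrative associative-recall instance generator (layout is an assumption).
import torch

def sample_associative_recall(batch=64, n_pairs=16, vocab=32):
    # Distinct keys per sequence, random values.
    keys = torch.stack([torch.randperm(vocab)[:n_pairs] for _ in range(batch)])
    values = torch.randint(0, vocab, (batch, n_pairs))
    # Sequence layout: k_1 v_1 k_2 v_2 ... k_n v_n q, where q repeats one earlier key.
    pairs = torch.stack([keys, values], dim=2).reshape(batch, 2 * n_pairs)
    idx = torch.randint(0, n_pairs, (batch,))
    query = keys[torch.arange(batch), idx]
    target = values[torch.arange(batch), idx]  # value the model must recall
    return torch.cat([pairs, query[:, None]], dim=1), target

x, y = sample_associative_recall()
print(x.shape, y.shape)  # torch.Size([64, 33]) torch.Size([64])
```
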
@nanjiangwill @BergKirkpatrick In short, we find that all architectures are capable of ICL, even RNNs and CNNs. Not surprisingly, transformers are strong in-context learners. However, a number of alternatives such as RWKV, RetNet, Mamba, and Hyena prove to be equally, and sometimes more, capable. 4/10
@nanjiangwill @BergKirkpatrick To address this, we study ICL in controlled, synthetic environments that eliminate the possibility of memorization: we train models from scratch to take a labeled dataset as input and predict the result of learning from this data in the forward-pass. 3/10
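
To make that setup concrete: one common way to build such a task is to sample a fresh regression problem per sequence and interleave its (x, y) pairs into a single prompt, so the only way to predict each y is to learn the task from the preceding pairs. The sketch below is a minimal illustration under that assumption; the shapes, padding scheme, and function name are made up for this example and are not the paper's code.

```python
# Minimal synthetic in-context regression batch (shapes/layout are assumptions).
import torch

def sample_icl_regression_batch(batch=64, n_points=40, dim=8):
    w = torch.randn(batch, dim, 1)         # a fresh task per sequence
    x = torch.randn(batch, n_points, dim)  # in-context inputs
    y = x @ w                              # in-context labels, (batch, n_points, 1)
    # Interleave x_1, y_1, x_2, y_2, ...; y is zero-padded to the input width
    # so every position in the sequence has the same dimensionality.
    y_tok = torch.cat([y, torch.zeros(batch, n_points, dim - 1)], dim=-1)
    seq = torch.stack([x, y_tok], dim=2).reshape(batch, 2 * n_points, dim)
    # Training target: at each x position, predict the y that follows it.
    return seq, y.squeeze(-1)

seq, targets = sample_icl_regression_batch()
print(seq.shape, targets.shape)  # torch.Size([64, 80, 8]) torch.Size([64, 40])
```
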
@nanjiangwill @BergKirkpatrick Studying ICL in LLMs is challenging. Are these models truly learning new predictors during the forward-pass (ICL), or do in-context examples simply focus the model on aspects of knowledge already acquired during gradient-based pretraining (memorization)? 2/10
Is attention required for ICL? We explore this question in our #ICLR2024 paper Exploring the Relationship Between Model Architecture and In-Context Learning Ability. Code: https://t.co/8L8GSZpn5L Paper: https://t.co/WE3gnWSjM1 with @nanjiangwill and @BergKirkpatrick 1/10