Julian Asilis
@julian_asilis
68 Followers · 74 Following · 2 Media · 18 Statuses
Computer Science Ph.D. student @USC. NSF Graduate Research Fellow @NSFGRFP. Symbol pusher, counterexample searcher.
Los Angeles, CA
Joined October 2016
Textual steering vectors can improve visual understanding in multimodal LLMs! You can extract steering vectors via any interpretability toolkit you like -- SAEs, MeanShift, Probes -- and apply them to image or text tokens (or both) of Multimodal LLMs. And They Steer!
1 reply · 14 reposts · 47 likes
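A minimal sketch of the recipe described above, assuming mean-shift-style extraction and a PyTorch forward hook; the helper names, the layer choice, and the scale alpha are illustrative placeholders rather than the paper's actual code:

```python
import torch

def mean_shift_vector(acts_with_concept, acts_without_concept):
    # acts_*: [num_examples, hidden_dim] activations collected at one layer
    # for prompts with / without the target concept.
    return acts_with_concept.mean(dim=0) - acts_without_concept.mean(dim=0)

def add_steering_hook(layer_module, steer_vec, alpha=4.0):
    # Adds alpha * steer_vec to every token position's hidden state at this
    # layer; restricting to image or text positions is a matter of masking.
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * steer_vec.to(hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return layer_module.register_forward_hook(hook)
```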
Thanks for sharing our work!
Tina: Tiny Reasoning Models via LoRA. "The best Tina model achieves a >20% reasoning performance increase and 43.33% Pass@1 accuracy on AIME24, at only $9 USD post-training and evaluation cost (i.e., an estimated 260x cost reduction). Our work reveals the surprising effectiveness [...]"
0 replies · 0 reposts · 5 likes
😋 Want strong LLM reasoning without breaking the bank? We explored just how cost-effectively RL can enhance reasoning using LoRA! [1/9] Introducing Tina: A family of tiny reasoning models with strong performance at low cost, providing an accessible testbed for RL reasoning. 🧵
2 replies · 67 reposts · 371 likes
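A minimal sketch of the LoRA setup such a pipeline rests on, using Hugging Face PEFT; the base model name and hyperparameters below are placeholders, not Tina's exact configuration:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model name, not necessarily the one Tina uses.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

lora_cfg = LoraConfig(
    r=16,                    # low-rank adapter dimension
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of parameters

# An RL trainer (e.g., GRPO/PPO from TRL) then updates only these adapters,
# which is what keeps post-training cost so low.
```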
I'll be presenting our work on transductive learning at 4:30pm today in West Ballroom A-D, poster #5708! Swing by to hear me and @sid_devic talk about a wide-ranging compactness property in supervised learning :)
0 replies · 0 reposts · 12 likes
✨NEW PREPRINT on understanding the mechanisms Transformers use to perform in-context learning (ICL). In this paper, we demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL, rather than gradient descent.
3 replies · 27 reposts · 131 likes
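To make "higher-order optimization" concrete, here is an illustrative toy comparison on in-context-style linear regression (my own sketch, not the paper's experiments): a Newton step uses curvature and converges in far fewer iterations than gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 32, 4
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def grad(w):  # gradient of the least-squares loss
    return X.T @ (X @ w - y) / n

# First-order: many small gradient-descent steps.
w_gd = np.zeros(d)
for _ in range(100):
    w_gd -= 0.1 * grad(w_gd)

# Higher-order: Newton steps using the Hessian X^T X / n.
H = X.T @ X / n
w_newton = np.zeros(d)
for _ in range(2):  # essentially exact after one step on a quadratic loss
    w_newton -= np.linalg.solve(H, grad(w_newton))

print(np.linalg.norm(w_gd - w_true), np.linalg.norm(w_newton - w_true))
```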
Some great related work: the seminal https://t.co/Asn6OlCtGA, and more recently https://t.co/YnehTBNL6t and https://t.co/rJXvqQlTFK.
0 replies · 0 reposts · 3 likes
Joint work with the wonderful @sid_devic, Shaddin Dughmi, Vatsal Sharan, and Shang-Hua Teng! Providing algorithmic characterizations of optimal learners has been fascinating and challenging, and I am happy to discuss anything related!
1 reply · 0 reposts · 3 likes
Our proofs use properties of the beautiful one-inclusion graph (OIG) algorithm, which turns learning into a purely combinatorial problem. Along the way, we also extend the OIG algorithm to the agnostic setting, which may be of broader interest.
1 reply · 0 reposts · 3 likes
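For the curious, a toy sketch of the one-inclusion graph on a finite class (my own illustration, not the paper's construction): vertices are realizable labelings, edges connect labelings differing on one point, and a learner corresponds to an orientation of the edges.

```python
from itertools import combinations

# Hypothetical tiny class: each hypothesis is its labeling of 3 points.
H = [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 1, 1)]

vertices = sorted(set(H))
edges = [
    (u, v)
    for u, v in combinations(vertices, 2)
    if sum(a != b for a, b in zip(u, v)) == 1  # differ in exactly one coordinate
]

# Greedy orientation keeping out-degrees small; the OIG learner uses an
# orientation minimizing the maximum out-degree, which bounds its error.
out_degree = {v: 0 for v in vertices}
orientation = []
for u, v in edges:
    src, dst = (u, v) if out_degree[u] <= out_degree[v] else (v, u)
    orientation.append((src, dst))
    out_degree[src] += 1

print(orientation)
```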
We also provide a randomized variant of this scheme based on the maximum entropy principle, which has connections to Bayesian inference. This randomized learner is also optimal in the agnostic setting, with slight modifications.
1 reply · 0 reposts · 3 likes
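To illustrate the maximum entropy principle itself (a generic toy example, not the construction in the paper): among all distributions satisfying given constraints, pick the one with the largest entropy.

```python
import numpy as np
from scipy.optimize import minimize

scores = np.array([0.0, 1.0, 2.0])  # hypothetical statistic of 3 candidate labels
target_mean = 1.3                   # constraint the distribution must satisfy

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return float(np.sum(p * np.log(p)))

constraints = (
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},           # valid distribution
    {"type": "eq", "fun": lambda p: scores @ p - target_mean},  # moment constraint
)
res = minimize(neg_entropy, x0=np.ones(3) / 3, bounds=[(0, 1)] * 3, constraints=constraints)
print(res.x)  # maximum-entropy distribution consistent with the constraint
```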
Interestingly, this regularizer takes different values depending on the test point the learner is evaluated on! Prior work shows that this is necessary [https://t.co/YnehTBNL6t].
1 reply · 0 reposts · 3 likes
In new work with @sid_devic, Shaddin Dughmi, Vatsal Sharan, and Shang-Hua Teng, we show that there exists an optimal learner that (1) first constructs a regularizer through unsupervised pre-training, and (2) performs structural risk minimization with this regularization term.
1 reply · 0 reposts · 3 likes
New paper on optimal learning for multiclass classification: https://t.co/XUO3uk4Sg3. ERM is known to fail even for simple multiclass problems. So which algorithms should one use? We show that a generalization of structural risk minimization (SRM) characterizes optimal learning.
2 replies · 6 reposts · 20 likes
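Schematically, the learner this result describes looks like the sketch below (a toy simplification under my own assumptions; in the paper the regularizer can also depend on the test point, which this version omits):

```python
import numpy as np

def srm_learner(hypotheses, labeled_sample, psi):
    # hypotheses: list of callables x -> label
    # labeled_sample: list of (x, y) pairs
    # psi: regularization penalties, e.g. constructed from unlabeled data
    #      during an unsupervised pre-training phase
    def objective(i):
        h = hypotheses[i]
        emp_risk = np.mean([h(x) != y for x, y in labeled_sample])
        return emp_risk + psi[i]
    return hypotheses[min(range(len(hypotheses)), key=objective)]
```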