Julian Asilis Profile
Julian Asilis

@julian_asilis

Followers: 68 · Following: 74 · Media: 2 · Statuses: 18

Computer Science Ph.D. student @USC. NSF Graduate Research Fellow @NSFGRFP. Symbol pusher, counterexample searcher.

Los Angeles, CA
Joined October 2016
@UpupWang
Shangshang Wang
5 months
Sparse autoencoders (SAEs) can be used to elicit strong reasoning abilities with remarkable efficiency. Using only 1 hour of training at $2 cost without any reasoning traces, we find a way to train 1.5B models via SAEs to score 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23.
10
58
504
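A minimal sketch of the sparse-autoencoder objective the tweet above relies on (reconstruction plus an L1 sparsity penalty on the latent code), assuming PyTorch. The class and function names, latent width, and l1_coef value are illustrative, not the authors' actual training setup:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder on model activations with a sparse latent code."""
    def __init__(self, d_model, d_latent):
        super().__init__()
        self.enc = nn.Linear(d_model, d_latent)
        self.dec = nn.Linear(d_latent, d_model)

    def forward(self, acts):
        z = torch.relu(self.enc(acts))    # sparse feature activations
        return self.dec(z), z

def sae_loss(recon, acts, z, l1_coef=1e-3):
    # Reconstruction error plus an L1 penalty encouraging sparse features.
    return torch.mean((recon - acts) ** 2) + l1_coef * z.abs().mean()

# Hypothetical usage on residual-stream activations of shape (batch, d_model):
# sae = SparseAutoencoder(d_model=1536, d_latent=8 * 1536)
# recon, z = sae(acts); loss = sae_loss(recon, acts, z)
```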
@DeqingFu
Deqing Fu
6 months
Textual steering vectors can improve visual understanding in multimodal LLMs! You can extract steering vectors via any interpretability toolkit you like -- SAEs, MeanShift, Probes -- and apply them to image or text tokens (or both) of Multimodal LLMs. And They Steer!
1
14
47
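One plausible reading of the MeanShift option mentioned above, as a hedged PyTorch sketch: take the steering vector to be the difference of mean activations between concept-positive and baseline prompts, then add it to selected hidden states. The function names, the scale alpha, and the masking scheme are assumptions, not the paper's interface:

```python
import torch

def mean_shift_steering_vector(pos_acts, neg_acts):
    """Difference of mean activations between concept-positive and baseline prompts."""
    return pos_acts.mean(dim=0) - neg_acts.mean(dim=0)

def apply_steering(hidden_states, vector, alpha=4.0, token_mask=None):
    """Add a scaled steering vector to all (or selected) token positions."""
    steered = hidden_states.clone()
    if token_mask is None:
        steered = steered + alpha * vector          # broadcast over all tokens
    else:
        steered[token_mask] = steered[token_mask] + alpha * vector
    return steered

# Hypothetical usage: pos_acts/neg_acts are (num_prompts, d_model),
# hidden_states is (batch, seq, d_model).
# vec = mean_shift_steering_vector(pos_acts, neg_acts)
# steered = apply_steering(hidden_states, vec, alpha=6.0)
```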
@julian_asilis
Julian Asilis
7 months
Thanks for sharing our work!
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
7 months
Tina: Tiny Reasoning Models via LoRA. "the best Tina model achieves a >20% reasoning performance increase and 43.33% Pass@1 accuracy on AIME24, at only $9 USD post-training and evaluation cost (i.e., an estimated 260x cost reduction). Our work reveals the surprising effectiveness ..."
0
0
5
@UpupWang
Shangshang Wang
7 months
😋 Want strong LLM reasoning without breaking the bank? We explored just how cost-effectively RL can enhance reasoning using LoRA! [1/9] Introducing Tina: A family of tiny reasoning models with strong performance at low cost, providing an accessible testbed for RL reasoning. 🧵
2
67
371
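For readers unfamiliar with LoRA, a generic sketch of a low-rank adapter on a frozen linear layer, assuming PyTorch. This is standard LoRA, not Tina's specific configuration; the rank, scaling, and class name are illustrative:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # only the adapter trains
        self.A = nn.Parameter(0.01 * torch.randn(r, base.in_features))
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)

# Hypothetical usage: wrap one projection of a transformer block.
layer = LoRALinear(nn.Linear(1536, 1536), r=16)
```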
@julian_asilis
Julian Asilis
1 year
I'll be presenting our work on transductive learning at 4:30pm today in West Ballroom A-D, poster #5708! Swing by to hear me and @sid_devic talk about a wide-ranging compactness property in supervised learning :)
0
0
12
@DeqingFu
Deqing Fu
2 years
✨NEW PREPRINT on understanding the mechanisms Transformers use to perform in-context learning (ICL). In this paper, we demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL, rather than Gradient Descent.
3
27
131
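A toy contrast illustrating the first-order vs. higher-order distinction in the tweet above, assuming the in-context task is linear regression and using NumPy. This sketches the optimization methods being compared, not the paper's probing methodology:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                 # toy in-context examples
w_true = rng.normal(size=4)
y = X @ w_true

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

# First-order: many small gradient-descent steps.
w_gd = np.zeros(4)
for _ in range(20):
    grad = X.T @ (X @ w_gd - y) / len(y)
    w_gd -= 0.1 * grad

# Higher-order: a single Newton step solves this quadratic objective exactly.
H = X.T @ X / len(y)                          # Hessian of the loss
g = -X.T @ y / len(y)                         # gradient at w = 0
w_newton = -np.linalg.solve(H, g)

print(f"GD loss after 20 steps: {loss(w_gd):.2e}, Newton loss after 1 step: {loss(w_newton):.2e}")
```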
@julian_asilis
Julian Asilis
2 years
Some great related work: the seminal https://t.co/Asn6OlCtGA, and more recently https://t.co/YnehTBNL6t and https://t.co/rJXvqQlTFK.
0
0
3
@julian_asilis
Julian Asilis
2 years
Joint work with the wonderful @sid_devic, Shaddin Dughmi, Vatsal Sharan, and Shang-Hua Teng! Providing algorithmic characterizations of optimal learners has been fascinating and challenging, and I am happy to discuss anything related!
1
0
3
@julian_asilis
Julian Asilis
2 years
Our proofs utilize properties of the beautiful one-inclusion graph (OIG) algorithm for learning, which turns learning into a purely combinatorial problem. Along the way, we are also able to extend the OIG algorithm to the agnostic setting, which may be of broader interest.
1
0
3
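A toy sketch of the one-inclusion graph idea for binary labels: build the graph over a finite hypothesis class, orient edges to minimize the maximum out-degree, and predict the head's label at the test coordinate. The hypothesis class, the brute-force orientation, and the prediction convention below are illustrative assumptions, not the paper's construction:

```python
import itertools

# Toy binary hypothesis class over 3 unlabeled points, written as label vectors.
H = [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 1, 1)]

# One-inclusion graph: vertices are label vectors; edges join vectors
# that differ in exactly one coordinate.
E = []
for u, v in itertools.combinations(H, 2):
    diff = [i for i in range(len(u)) if u[i] != v[i]]
    if len(diff) == 1:
        E.append((u, v, diff[0]))

# Brute-force an orientation of the edges minimizing the maximum out-degree.
best_heads, best_deg = None, float("inf")
for choice in itertools.product([0, 1], repeat=len(E)):
    out = {h: 0 for h in H}
    heads = {}
    for (u, v, i), c in zip(E, choice):
        tail, head = (u, v) if c == 0 else (v, u)
        out[tail] += 1                      # the edge leaves its tail
        heads[(u, v, i)] = head
    if max(out.values()) < best_deg:
        best_heads, best_deg = heads, max(out.values())

def predict(observed, test_idx):
    """observed: {coordinate: label} for every coordinate except test_idx."""
    consistent = [h for h in H if all(h[i] == y for i, y in observed.items())]
    if len(consistent) == 1:                # the test label is forced
        return consistent[0][test_idx]
    u, v = consistent                       # exactly two -> an edge at test_idx
    key = (u, v, test_idx) if (u, v, test_idx) in best_heads else (v, u, test_idx)
    return best_heads[key][test_idx]        # predict the label of the edge's head

print(predict({1: 1, 2: 0}, test_idx=0))    # labels of x1, x2 observed; predict x0
```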
@julian_asilis
Julian Asilis
2 years
We also provide a randomized variant of this scheme based on the maximum entropy principle, which has connections to Bayesian inference. This randomized learner is also optimal in the agnostic setting, with slight modifications.
1
0
3
@julian_asilis
Julian Asilis
2 years
Interestingly, this regularizer takes different values depending on the test point on which the learner is evaluated! This is necessary due to prior work [https://t.co/YnehTBNL6t].
1
0
3
@julian_asilis
Julian Asilis
2 years
In new work with @sid_devic, Shaddin Dughmi, Vatsal Sharan, and Shang-Hua Teng, we show that there exists an optimal learner that (1) first constructs a regularizer through unsupervised pre-training, and (2) performs structural risk minimization with this regularization term.
1
0
3
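A minimal sketch of what step (2) could look like: structural risk minimization over a finite hypothesis class with a data-dependent penalty. The regularizer here is a stand-in for illustration; the paper's construction of the regularizer from unlabeled data is not reproduced:

```python
def srm(labeled_sample, hypotheses, regularizer):
    """Pick the hypothesis minimizing empirical error plus a penalty term."""
    def empirical_error(h):
        return sum(h(x) != y for x, y in labeled_sample) / len(labeled_sample)
    return min(hypotheses, key=lambda h: empirical_error(h) + regularizer(h))

# Hypothetical usage: threshold classifiers on the line with a stand-in penalty.
sample = [(0.2, 0), (0.4, 0), (0.7, 1)]
hyps = [lambda x, t=t: int(x >= t) for t in (0.1, 0.3, 0.5, 0.9)]
penalty = {h: 0.01 * i for i, h in enumerate(hyps)}
best = srm(sample, hyps, lambda h: penalty[h])
```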
@julian_asilis
Julian Asilis
2 years
New paper on optimal learning for multiclass classification: https://t.co/XUO3uk4Sg3. ERM is known to fail for even simple multiclass problems. So which algorithms should one use? We show that a generalization of structural risk minimization (SRM) characterizes optimal learning.
2
6
20