Julian Asilis Profile
Julian Asilis

@julian_asilis

Followers: 68 · Following: 74 · Media: 2 · Statuses: 18

Computer Science Ph.D. student @USC. NSF Graduate Research Fellow @NSFGRFP. Symbol pusher, counterexample searcher.

Los Angeles, CA
Joined October 2016
@UpupWang
Shangshang Wang
5 months
Sparse autoencoders (SAEs) can be used to elicit strong reasoning abilities with remarkable efficiency. Using only 1 hour of training at $2 cost without any reasoning traces, we find a way to train 1.5B models via SAEs to score 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23.
10
58
504
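A minimal sketch of the sparse-autoencoder objective the tweet above relies on (reconstruction plus an L1 sparsity penalty on the latent code), assuming PyTorch. The class and function names, latent width, and l1_coef value are illustrative, not the authors' actual training setup:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder on model activations with a sparse latent code."""
    def __init__(self, d_model, d_latent):
        super().__init__()
        self.enc = nn.Linear(d_model, d_latent)
        self.dec = nn.Linear(d_latent, d_model)

    def forward(self, acts):
        z = torch.relu(self.enc(acts))    # sparse feature activations
        return self.dec(z), z

def sae_loss(recon, acts, z, l1_coef=1e-3):
    # Reconstruction error plus an L1 penalty encouraging sparse features.
    return torch.mean((recon - acts) ** 2) + l1_coef * z.abs().mean()

# Hypothetical usage on residual-stream activations of shape (batch, d_model):
# sae = SparseAutoencoder(d_model=1536, d_latent=8 * 1536)
# recon, z = sae(acts); loss = sae_loss(recon, acts, z)
```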
@DeqingFu
Deqing Fu
6 months
Textual steering vectors can improve visual understanding in multimodal LLMs! You can extract steering vectors via any interpretability toolkit you like -- SAEs, MeanShift, Probes -- and apply them to image or text tokens (or both) of Multimodal LLMs. And They Steer!
1
14
47
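One plausible reading of the MeanShift option mentioned above, as a hedged PyTorch sketch: take the steering vector to be the difference of mean activations between concept-positive and baseline prompts, then add it to selected hidden states. The function names, the scale alpha, and the masking scheme are assumptions, not the paper's interface:

```python
import torch

def mean_shift_steering_vector(pos_acts, neg_acts):
    """Difference of mean activations between concept-positive and baseline prompts."""
    return pos_acts.mean(dim=0) - neg_acts.mean(dim=0)

def apply_steering(hidden_states, vector, alpha=4.0, token_mask=None):
    """Add a scaled steering vector to all (or selected) token positions."""
    steered = hidden_states.clone()
    if token_mask is None:
        steered = steered + alpha * vector          # broadcast over all tokens
    else:
        steered[token_mask] = steered[token_mask] + alpha * vector
    return steered

# Hypothetical usage: pos_acts/neg_acts are (num_prompts, d_model),
# hidden_states is (batch, seq, d_model).
# vec = mean_shift_steering_vector(pos_acts, neg_acts)
# steered = apply_steering(hidden_states, vec, alpha=6.0)
```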
@julian_asilis
Julian Asilis
7 months
Thanks for sharing our work!
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
7 months
Tina: Tiny Reasoning Models via LoRA. "the best Tina model achieves a >20% reasoning performance increase and 43.33% Pass@1 accuracy on AIME24, at only $9 USD post-training and evaluation cost (i.e., an estimated 260x cost reduction). Our work reveals the surprising effectiveness ..."
0
0
5
@UpupWang
Shangshang Wang
7 months
😋 Want strong LLM reasoning without breaking the bank? We explored just how cost-effectively RL can enhance reasoning using LoRA! [1/9] Introducing Tina: A family of tiny reasoning models with strong performance at low cost, providing an accessible testbed for RL reasoning. 🧵
2
67
371
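For readers unfamiliar with LoRA, a generic sketch of a low-rank adapter on a frozen linear layer, assuming PyTorch. This is standard LoRA, not Tina's specific configuration; the rank, scaling, and class name are illustrative:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # only the adapter trains
        self.A = nn.Parameter(0.01 * torch.randn(r, base.in_features))
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)

# Hypothetical usage: wrap one projection of a transformer block.
layer = LoRALinear(nn.Linear(1536, 1536), r=16)
```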
@julian_asilis
Julian Asilis
1 year
I'll be presenting our work on transductive learning at 4:30pm today in West Ballroom A-D, poster #5708! Swing by to hear me and @sid_devic talk about a wide-ranging compactness property in supervised learning :)
0
0
12
@DeqingFu
Deqing Fu
2 years
✨NEW PREPRINT on understanding the mechanisms Transformers use to perform in-context learning (ICL). In this paper, we demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL, rather than Gradient Descent.
3
27
131
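A toy contrast illustrating the first-order vs. higher-order distinction in the tweet above, assuming the in-context task is linear regression and using NumPy. This sketches the optimization methods being compared, not the paper's probing methodology:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                 # toy in-context examples
w_true = rng.normal(size=4)
y = X @ w_true

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

# First-order: many small gradient-descent steps.
w_gd = np.zeros(4)
for _ in range(20):
    grad = X.T @ (X @ w_gd - y) / len(y)
    w_gd -= 0.1 * grad

# Higher-order: a single Newton step solves this quadratic objective exactly.
H = X.T @ X / len(y)                          # Hessian of the loss
g = -X.T @ y / len(y)                         # gradient at w = 0
w_newton = -np.linalg.solve(H, g)

print(f"GD loss after 20 steps: {loss(w_gd):.2e}, Newton loss after 1 step: {loss(w_newton):.2e}")
```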
@julian_asilis
Julian Asilis
2 years
Some great related work: the seminal https://t.co/Asn6OlCtGA, and more recently https://t.co/YnehTBNL6t and https://t.co/rJXvqQlTFK.
0
0
3
@julian_asilis
Julian Asilis
2 years
Joint work with the wonderful @sid_devic, Shaddin Dughmi, Vatsal Sharan, and Shang-Hua Teng! Providing algorithmic characterizations of optimal learners has been fascinating and challenging, and I am happy to discuss anything related!
1
0
3
@julian_asilis
Julian Asilis
2 years
Our proofs utilize properties of the beautiful one-inclusion graph (OIG) algorithm for learning, which turns learning into a purely combinatorial problem. Along the way, we are also able to extend the OIG algorithm to the agnostic setting, which may be of broader interest.
1
0
3
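A toy sketch of the one-inclusion graph idea for binary labels: build the graph over a finite hypothesis class, orient edges to minimize the maximum out-degree, and predict the head's label at the test coordinate. The hypothesis class, the brute-force orientation, and the prediction convention below are illustrative assumptions, not the paper's construction:

```python
import itertools

# Toy binary hypothesis class over 3 unlabeled points, written as label vectors.
H = [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 1, 1)]

# One-inclusion graph: vertices are label vectors; edges join vectors
# that differ in exactly one coordinate.
E = []
for u, v in itertools.combinations(H, 2):
    diff = [i for i in range(len(u)) if u[i] != v[i]]
    if len(diff) == 1:
        E.append((u, v, diff[0]))

# Brute-force an orientation of the edges minimizing the maximum out-degree.
best_heads, best_deg = None, float("inf")
for choice in itertools.product([0, 1], repeat=len(E)):
    out = {h: 0 for h in H}
    heads = {}
    for (u, v, i), c in zip(E, choice):
        tail, head = (u, v) if c == 0 else (v, u)
        out[tail] += 1                      # the edge leaves its tail
        heads[(u, v, i)] = head
    if max(out.values()) < best_deg:
        best_heads, best_deg = heads, max(out.values())

def predict(observed, test_idx):
    """observed: {coordinate: label} for every coordinate except test_idx."""
    consistent = [h for h in H if all(h[i] == y for i, y in observed.items())]
    if len(consistent) == 1:                # the test label is forced
        return consistent[0][test_idx]
    u, v = consistent                       # exactly two -> an edge at test_idx
    key = (u, v, test_idx) if (u, v, test_idx) in best_heads else (v, u, test_idx)
    return best_heads[key][test_idx]        # predict the label of the edge's head

print(predict({1: 1, 2: 0}, test_idx=0))    # labels of x1, x2 observed; predict x0
```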
@julian_asilis
Julian Asilis
2 years
We also provide a randomized variant of this scheme based on the maximum entropy principle, which has connections to Bayesian inference. This randomized learner is also optimal in the agnostic setting, with slight modifications.
1
0
3
@julian_asilis
Julian Asilis
2 years
Interestingly, this regularizer takes different values depending on the test point on which the learner is evaluated! This is necessary due to prior work [https://t.co/YnehTBNL6t].
1
0
3
@julian_asilis
Julian Asilis
2 years
In new work with @sid_devic, Shaddin Dughmi, Vatsal Sharan, and Shang-Hua Teng, we show that there exists an optimal learner that (1) first constructs a regularizer through unsupervised pre-training, and (2) performs structural risk minimization with this regularization term.
1
0
3
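A minimal sketch of what step (2) could look like: structural risk minimization over a finite hypothesis class with a data-dependent penalty. The regularizer here is a stand-in for illustration; the paper's construction of the regularizer from unlabeled data is not reproduced:

```python
def srm(labeled_sample, hypotheses, regularizer):
    """Pick the hypothesis minimizing empirical error plus a penalty term."""
    def empirical_error(h):
        return sum(h(x) != y for x, y in labeled_sample) / len(labeled_sample)
    return min(hypotheses, key=lambda h: empirical_error(h) + regularizer(h))

# Hypothetical usage: threshold classifiers on the line with a stand-in penalty.
sample = [(0.2, 0), (0.4, 0), (0.7, 1)]
hyps = [lambda x, t=t: int(x >= t) for t in (0.1, 0.3, 0.5, 0.9)]
penalty = {h: 0.01 * i for i, h in enumerate(hyps)}
best = srm(sample, hyps, lambda h: penalty[h])
```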
@julian_asilis
Julian Asilis
2 years
New paper on optimal learning for multiclass classification: https://t.co/XUO3uk4Sg3. ERM is known to fail for even simple multiclass problems. So which algorithms should one use? We show that a generalization of structural risk minimization (SRM) characterizes optimal learning.
2
6
20