Julia Kempe

@KempeLab

Followers 2K · Following 185 · Media 50 · Statuses 128

Silver Professor at NYU Courant and CDS; Research Scientist at FAIR. Research in Machine Learning; previously Quantum Computing & Finance. Posts my own.

Joined April 2024
@KempeLab
Julia Kempe
3 days
RT @SimonsFdn: Our new Simons Collaboration on the Physics of Learning and Neural Computation will employ and develop powerful tools from #…
0
30
0
@KempeLab
Julia Kempe
1 month
What if someone tries to extract your training data? Then you can wish them luck! Since BBoxER relies on the data only through the compressed optimization trace, recovering the training data is extremely unlikely. (7/8)
1
0
2
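For scale, here is a back-of-the-envelope information count (illustrative numbers of my own, not figures from the thread): a trace of \(T\) binary comparisons carries at most

\[
I(\mathrm{trace};\,\mathrm{data}) \;\le\; T \log_2 2 \;=\; T \ \text{bits},
\]

so even \(T = 10^4\) iterations expose at most \(10^4\) bits (about 1.25 KB) of information about a training set that may span gigabytes, far too little to reconstruct any individual example.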
@KempeLab
Julia Kempe
1 month
Are you worried that someone might have poisoned the training data to influence the outcome? Our bound shows that the probability of changing the optimization trace shrinks as the dataset grows. More data = more robustness! (6/8)
[Tweet media]
1
0
3
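One illustrative way such a bound can arise (my sketch under simplifying assumptions, not necessarily the paper's argument): suppose each comparison is decided by the sign of an empirical mean of per-example preference scores in \([-B, B]\) over \(n\) examples, with margin \(\gamma = |\hat\mu|\). An adversary controlling \(m\) examples can shift the mean by at most

\[
\bigl|\hat\mu_{\mathrm{poisoned}} - \hat\mu\bigr| \;\le\; \frac{2mB}{n},
\]

so every comparison, and hence the whole trace and the final model, is unchanged whenever \(m < \gamma n / (2B)\): the number of examples an attacker must corrupt grows linearly with the dataset size.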
@KempeLab
Julia Kempe
1 month
You want to learn directly on user preferences while protecting the privacy of their prompts and outputs? No problem, we've got you covered! Unlike gradient-based approaches, BBoxER relies only on the optimization trace, providing privacy by design. (5/8)
1
0
2
@KempeLab
Julia Kempe
1 month
This comparison-based approach yields non-vacuous generalization bounds for LLMs that depend on the algorithm rather than on model capacity. Using concentration inequalities, we obtain bounds on the number of optimization iterations allowed for generalization, as a function of dataset size. (4/8)
1
0
2
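A sketch of the standard compression-style argument behind bounds of this shape (my reconstruction, assuming binary comparisons and losses in \([0,1]\); the paper's constants may differ): given the algorithm and its random seed, the final model is a deterministic function of the trace, and a trace of \(T\) binary comparisons takes at most \(2^T\) values, so the output lies in a data-independent set of at most \(2^T\) models. Hoeffding's inequality plus a union bound over that set gives, with probability \(1-\delta\),

\[
\mathcal{L}(\hat\theta) - \hat{\mathcal{L}}_n(\hat\theta) \;\le\; \sqrt{\frac{T \ln 2 + \ln(1/\delta)}{2n}},
\]

which stays below a target gap \(\varepsilon\) as long as \(T \lesssim 2 n \varepsilon^2 / \ln 2\): the allowed number of iterations grows linearly with the dataset size \(n\).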
@KempeLab
Julia Kempe
1 month
BBoxER depends on the data only through a compression bottleneck: the optimization trace of model comparisons. This is what allows us to derive strong privacy and robustness guarantees. (3/8)
[Tweet media]
1
0
2
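To make the "optimization trace" concrete, here is a minimal comparison-based search loop in Python (a generic (1+1)-style scheme of my own for illustration, not BBoxER's actual algorithm; `evaluate` stands in for any data-dependent scoring function):

```python
import numpy as np

def comparison_search(theta0, evaluate, budget=100, sigma=0.01, seed=0):
    """Generic (1+1)-style comparison-based black-box search.

    evaluate(theta) -> float scores a parameter vector on the data.
    The data influences the run only through the boolean outcomes
    recorded below -- the optimization trace -- so the entire run
    reveals at most `budget` bits about the dataset.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    best_score = evaluate(theta)
    trace = []  # one bit per iteration: did the candidate win?
    for _ in range(budget):
        candidate = theta + sigma * rng.standard_normal(theta.shape)
        cand_score = evaluate(candidate)
        accepted = cand_score > best_score  # only this bit is kept
        trace.append(accepted)
        if accepted:
            theta, best_score = candidate, cand_score
    return theta, trace
```

Given the same seed and starting point, replaying `trace` reproduces the final `theta` without ever touching the data, which is the compression bottleneck in miniature.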
@KempeLab
Julia Kempe
1 month
We introduce BBoxER, a comparison-based black-box retrofitting method applicable after pretraining, fine-tuning, or reinforcement learning loops. BBoxER requires no gradient access and integrates seamlessly with existing black-box libraries and algorithms. (2/8)
1
0
2
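Since only scalar feedback flows back, the ask/tell interface of an existing black-box library suffices. A hedged sketch using Nevergrad, named here only as an example of such a library (the retrofit dimension and the `score` function are hypothetical placeholders, not the thread's setup):

```python
import numpy as np
import nevergrad as ng

def score(v):
    # Hypothetical stand-in for evaluating the retrofitted model on data,
    # e.g. a benchmark accuracy or preference score; replace as needed.
    return float(np.sum(v ** 2))

d = 16  # assumed dimension of the retrofit parameters
optimizer = ng.optimizers.OnePlusOne(
    parametrization=ng.p.Array(shape=(d,)), budget=200
)

for _ in range(optimizer.budget):
    cand = optimizer.ask()                   # propose retrofit parameters
    optimizer.tell(cand, score(cand.value))  # scalar feedback, no gradients

best = optimizer.provide_recommendation().value
```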
@KempeLab
Julia Kempe
1 month
Black-box Optimization for LLM Post-Training 💪
Strong non-vacuous generalization bounds ✔️
Privacy by design ✔️
Robustness to poisoning and data extraction ✔️
Improvement on reasoning benchmarks ✔️
@AIatMeta @NYUDataScience
(1/8)
[Tweet media]
1
11
20
@KempeLab
Julia Kempe
1 month
RT @karen_ullrich: How would you make an LLM "forget" the concept of dog — or any other arbitrary concept? 🐶❓ We introduce SAMD & SAMI — a…
0
12
0
@KempeLab
Julia Kempe
2 months
RT @arnal_charles: ❓How to balance negative and positive rewards in off-policy RL❓ In Asymmetric REINFORCE for off-Policy RL, we show that…
0
28
0
@KempeLab
Julia Kempe
4 months
RT @NYUDataScience: Congrats to 37 CDS researchers — faculty, postdocs, and PhD students — who had papers accepted to ICLR 2025, including…
nyudatascience.medium.com
Thirty-seven CDS researchers had papers accepted to ICLR 2025, with several receiving Spotlight recognition.
0
2
0
@KempeLab
Julia Kempe
4 months
RT @feeelix_feng: Check out our poster tmr at 10am at the ICLR Bidirectional Human-AI Alignment workshop! We cover how on-policy preference…
0
8
0
@KempeLab
Julia Kempe
4 months
Here's to the next generation of AI-literate kids! International AI Olympiad. ML researchers, you might appreciate the impressive syllabus. Do we have all the chops our kids are expected to have :)?
ioai-official.org
0
1
8
@KempeLab
Julia Kempe
4 months
RT @arvysogorets: If in Singapore next week, come by our #ICLR2025 Spotlight poster for our recent study at @KempeLab unveiling how data pr…
0
2
0
@KempeLab
Julia Kempe
4 months
RT @dajmeyer: @KempeLab 😆
0
1
0
@KempeLab
Julia Kempe
4 months
Thanks to wonderful coauthors: @dohmatobelvis @feeelix_feng @arvysogorets @KartikAhuja1 @arjunsubgraph @f_charton @yangpuPKU @galvardi @AIatMeta @NYUDataScience, and the ICLR PC @iclr_conf for unanimously upholding standards of rigor and ethical conduct!
1
0
5
@KempeLab
Julia Kempe
4 months
Our ICLR25 papers:
🎉 ICLR Spotlight: Strong Model Collapse
🎉 ICLR Spotlight: DRoP: Distributionally Robust Data Pruning
Beyond Model Collapse
Flavors of Margin
More details here soon!
arxiv.org
We study the implicit bias of the general family of steepest descent algorithms with infinitesimal learning rate in deep homogeneous neural networks. We show that: (a) an algorithm-dependent...
3
17
154
@KempeLab
Julia Kempe
4 months
RT @dohmatobelvis: We refused to cite the paper due to severe misconduct of the authors of that paper: plagiarism of our own prior work,…
0
29
0
@KempeLab
Julia Kempe
5 months
It is a real delight to work with @dohmatobelvis, and I encourage every student in search of excellent and rigorous mentorship to apply to his group!
@dohmatobelvis
Elvis Dohmatob
5 months
Papers accepted at @iclr_conf 2025:
- An Effective Theory of Bias Amplification
- Pitfalls of Memorization
- Strong Model Collapse
- Beyond Model Collapse
With @KempeLab,
0
2
12