Dimitris Bertsimas Profile
Dimitris Bertsimas

@dbertsim

Followers: 3K
Following: 18
Media: 1
Statuses: 53

MIT professor, analytics, optimizer, Machine Learner, entrepreneur, philatelist

Belmont, MA
Joined August 2017
@JAMASurgery
JAMA Surgery
2 years
Interpretable machine learning methodologies are powerful tools to diagnose and remedy system-related bias in care, such as disparities in access to postinjury rehabilitation care. https://t.co/X7U8mJQtzc @hayfarani @dbertsim @AnthonyGebran @LMaurerMD
Replies: 0 · Retweets: 8 · Likes: 18
@wesg52
Wes Gurnee
3 years
This paper would not have been possible without my coauthors @NeelNanda5, Matthew Pauly, Katherine Harvey, @mitroitskii, and @dbertsim or all the foundational and inspirational work from @ch402, @boknilev, and many others! Read the full paper:
arxiv.org
Despite rapid adoption and deployment of large language models (LLMs), the internal computations of these models remain opaque and poorly understood. In this work, we seek to understand how...
Replies: 2 · Retweets: 3 · Likes: 44
@wesg52
Wes Gurnee
3 years
Precision and recall can also be helpful guides, and remind us that it should not be assumed a model will learn to represent features in an ontology convenient or familiar to humans.
Replies: 2 · Retweets: 1 · Likes: 23
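A minimal sketch of the kind of check this suggests, treating one neuron as a binary feature detector and scoring it with precision and recall (the data and threshold here are synthetic, not from the paper):

```python
# Toy sketch: score a single neuron as a feature detector with precision/recall.
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)               # 1 = token has the feature
activations = labels * 2.0 + rng.normal(size=1000)   # toy neuron activations
preds = (activations > 1.0).astype(int)              # threshold the neuron

print("precision:", precision_score(labels, preds))
print("recall:", recall_score(labels, preds))
```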
@wesg52
Wes Gurnee
3 years
While we found tons of interesting neurons with sparse probing, it requires careful follow-up analysis to draw more rigorous conclusions. E.g., athlete neurons turn out to be more general sport neurons when analyzing max average activating tokens.
Replies: 1 · Retweets: 2 · Likes: 22
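A sketch of that follow-up analysis under an assumed data layout (a map from token to the activations recorded for one neuron), ranking tokens by mean activation:

```python
# Hypothetical sketch: rank vocabulary tokens by a neuron's mean activation.
def top_mean_activating_tokens(token_acts, k=10):
    """token_acts: dict mapping token -> list of recorded activations for one neuron."""
    means = {tok: sum(a) / len(a) for tok, a in token_acts.items() if a}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Toy data: an "athlete" neuron that turns out to fire for sport words generally.
acts = {"Jordan": [4.1, 3.8], "basketball": [3.9, 4.2], "stadium": [3.5], "the": [0.1, 0.0]}
print(top_mean_activating_tokens(acts, k=3))
```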
@wesg52
Wes Gurnee
3 years
What happens with scale? We find representational sparsity increases on average, but different features obey different scaling dynamics. In particular, quantization and neuron splitting: features both emerge and split into finer-grained features.
Replies: 1 · Retweets: 3 · Likes: 19
@wesg52
Wes Gurnee
3 years
Results in toy models from @AnthropicAI and @ch402 suggest a potential mechanistic fingerprint of superposition: large MLP weight norms and negative biases. We find a striking drop in early layers in the Pythia models from @AiEleuther and @BlancheMinerva.
Replies: 1 · Retweets: 3 · Likes: 30
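A sketch of how one might look for that fingerprint, assuming the `transformer_lens` `HookedTransformer` layout for a Pythia checkpoint (the model ID and attribute names are assumptions to verify against your install):

```python
# Hypothetical sketch: per-layer MLP input-weight norms and neuron biases,
# the quantities suggested as a mechanistic fingerprint of superposition.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("pythia-70m")
for i, block in enumerate(model.blocks):
    w_norm = block.mlp.W_in.norm(dim=0).mean().item()   # mean per-neuron input-weight norm
    bias = block.mlp.b_in.mean().item()                 # mean neuron bias
    print(f"layer {i}: mean |W_in| = {w_norm:.3f}, mean b_in = {bias:.3f}")
```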
@wesg52
Wes Gurnee
3 years
Early layers seem to use sparse combinations of neurons to represent many features in superposition. That is, using the activations of multiple polysemantic neurons to boost the signal of the true feature over all interfering features (here “social security” vs. adjacent bigrams).
Replies: 1 · Retweets: 3 · Likes: 37
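A toy numerical illustration of that signal-boosting argument (not from the paper): averaging k neurons that each carry the same feature plus independent interference grows the signal-to-noise ratio roughly like sqrt(k):

```python
# Toy illustration: combining k polysemantic readouts suppresses interference.
import numpy as np

rng = np.random.default_rng(0)
signal, n_samples = 1.0, 10_000
for k in (1, 4, 16):
    # each neuron = true feature signal + unit-variance interference
    neurons = signal + rng.normal(size=(n_samples, k))
    readout = neurons.mean(axis=1)          # sparse combination of k neurons
    snr = readout.mean() / readout.std()
    print(f"k={k:2d} neurons: SNR ~ {snr:.2f}")
```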
@wesg52
Wes Gurnee
3 years
But what if there are more features than there are neurons? This results in polysemantic neurons which fire for a large set of unrelated features. Here we show a single early layer neuron which activates for a large collection of unrelated n-grams.
Replies: 1 · Retweets: 3 · Likes: 38
@wesg52
Wes Gurnee
3 years
Neural nets are often thought of as feature extractors. But what features are neurons in LLMs actually extracting? In our new paper, we leverage sparse probing to find out https://t.co/hZkFK6aI38. A 🧵:
Replies: 10 · Retweets: 126 · Likes: 696
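The core idea of a sparse probe can be sketched as a heavily L1-regularized linear classifier over neuron activations; this toy version with synthetic activations and scikit-learn is a simplified stand-in for the paper's probes, not their exact method:

```python
# Hypothetical sketch of a sparse probe: L1-regularized logistic regression
# over neuron activations, keeping only a handful of nonzero weights.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 2000, 512                                  # tokens x neurons (synthetic)
X = rng.normal(size=(n, d))
y = (X[:, 7] + 0.5 * X[:, 42] > 0).astype(int)    # feature carried by 2 neurons

probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
selected = np.flatnonzero(probe.coef_[0])
print("neurons selected by the sparse probe:", selected)
```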
@wesg52
Wes Gurnee
3 years
One large family of neurons we find are “context” neurons, which activate only for tokens in a particular context (French, Python code, US patent documents, etc.). Deleting these neurons increases the loss in the relevant context while leaving other contexts unaffected!
Replies: 3 · Retweets: 12 · Likes: 112
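The deletion experiment reads as a zero-ablation: clamp the candidate neuron's activation to zero and compare loss inside vs. outside the context. A sketch with `transformer_lens` hooks (model, layer, and neuron indices are placeholders; the hook name assumes the standard `hook_post` MLP activation):

```python
# Hypothetical sketch: zero-ablate one MLP neuron and compare loss by context.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("pythia-70m")
LAYER, NEURON = 3, 123                        # placeholder indices

def ablate(value, hook):
    value[..., NEURON] = 0.0                  # clamp the candidate neuron to zero
    return value

for name, text in [("french", "Le chat est sur la table."),
                   ("english", "The cat is on the table.")]:
    base = model(text, return_type="loss").item()
    with model.hooks([(f"blocks.{LAYER}.mlp.hook_post", ablate)]):
        ablated = model(text, return_type="loss").item()
    print(f"{name}: loss {base:.3f} -> {ablated:.3f}")
```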
@dbertsim
Dimitris Bertsimas
3 years
As part of HIAS, and together with Professor Georgios Stamou from NTUA, Greece, we are offering a course on Universal AI (in English, free of charge) https://t.co/oeGZ6b1zck on July 3-5, 2023, in Athens, Greece. Prospective participants can declare their interest on the website.
Replies: 0 · Retweets: 3 · Likes: 19
@RyanCoryWright
Ryan Cory-Wright
3 years
Delighted to share that our paper "A new perspective on low-rank optimization" has just been accepted for publication by Math Programming! Valid & often strong lower bounds on low-rank problems via a generalization of the perspective reformulation from mixed-integer optimization
@RyanCoryWright
Ryan Cory-Wright
5 years
Excited to share a new paper with @dbertsim and Jean Pauphilet on a matrix perspective reformulation technique for strong relaxations of low-rank problems. Applications in reduced-rank regression and D-optimal experimental design:
Replies: 1 · Retweets: 1 · Likes: 16
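For context, the scalar perspective trick that the matrix version generalizes can be sketched as follows (textbook-style notation, not the papers' exact statements):

```latex
% Scalar perspective reformulation (sketch): an indicator z \in \{0,1\}
% forces x = 0 when z = 0; the quadratic term is tightened to its perspective.
\min_{x \in \mathbb{R},\, z \in \{0,1\}} \; x^2 + c\,z \ \text{ s.t. } \ x(1 - z) = 0
\quad \rightsquigarrow \quad
\min_{x,\, z \in [0,1]} \; \frac{x^2}{z} + c\,z,
% since x^2/z is jointly convex, equals x^2 at z = 1, and forces x = 0 as z -> 0.
% Roughly, the matrix analogue replaces z by a projection matrix Y
% (Y^2 = Y, tr(Y) <= k) and x^2/z by a term of the form tr(X^\top Y^{\dagger} X).
```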
@RyanCoryWright
Ryan Cory-Wright
3 years
📢New preprint alert! https://t.co/1o48udkqwk We use sampling schemes and clustering to improve the scalability of deterministic Benders decomposition on data-driven network design problems, while maintaining optimality. w/ @dbertsim, Jean Pauphilet, and Periklis Petridis
arxiv.org
Network design problems involve constructing edges in a transportation or supply chain network to minimize construction and daily operational costs. We study a stochastic version where operational...
Replies: 1 · Retweets: 2 · Likes: 19
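For readers new to the method, a generic Benders master problem for such two-stage network design looks roughly like this (standard notation, not the paper's formulation):

```latex
% Two-stage network design via Benders decomposition (generic sketch).
% Master problem: pick edges y to build; \theta under-approximates the
% operational cost and is tightened by cuts from subproblem duals.
\min_{y \in \{0,1\}^{|E|},\; \theta \ge 0} \; c^{\top} y + \theta
\quad \text{s.t.} \quad \theta \ge \beta_j - \alpha_j^{\top} y \quad \forall j \in \text{cuts},
% where each (\alpha_j, \beta_j) comes from an optimal dual solution of the
% operational subproblem at a candidate design; the preprint's sampling and
% clustering schemes reduce how many scenario subproblems must be solved.
```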
@dbertsim
Dimitris Bertsimas
3 years
The paper presents a novel holistic deep learning framework that improves accuracy, robustness, sparsity, and stability over standard deep learning models, as demonstrated by extensive experiments on both tabular and image data sets.
Replies: 0 · Retweets: 1 · Likes: 35
@dbertsim
Dimitris Bertsimas
3 years
My book with David Gamarnik “Queueing Theory: Classical and Modern Methods” was published. It was a long journey that lasted two decades, but both of us are delighted with the journey's completion. For more details see https://t.co/xfpqQ3itJU
Replies: 5 · Retweets: 22 · Likes: 182
@npjDigitalMed
npj Digital Medicine
3 years
The Holistic AI in Medicine (HAIM) framework from @dbertsim et al. in @AIHealthMIT is a pipeline to receive multimodal patient data + use generalizable pre-processing + #machinelearning modelling stages adaptable to multiple health related tasks. https://t.co/uyeq6yV3rp
Replies: 0 · Retweets: 4 · Likes: 9
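A hypothetical skeleton of that kind of multimodal pipeline, i.e., embed each modality, concatenate into one feature vector, and fit one model per clinical task (names and models here are illustrative, not HAIM's actual code):

```python
# Hypothetical skeleton of a HAIM-style pipeline: per-modality embedding,
# concatenation, then a per-task classifier over the fused features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def embed(modality_data):
    """Placeholder for modality-specific pre-processing/embedding."""
    return np.asarray(modality_data, dtype=float)

def build_features(patient):
    # patient: dict of modality name -> raw inputs (tabular, notes, imaging, ...)
    return np.concatenate([embed(v) for v in patient.values()])

patients = [
    {"tabular": [72, 1.0], "notes_emb": [0.2, -0.1], "img_emb": [0.9]},
    {"tabular": [55, 0.0], "notes_emb": [0.0, 0.3], "img_emb": [0.1]},
    {"tabular": [68, 1.0], "notes_emb": [0.4, -0.2], "img_emb": [0.8]},
    {"tabular": [43, 0.0], "notes_emb": [-0.1, 0.2], "img_emb": [0.2]},
]
X = np.stack([build_features(p) for p in patients])
y = np.array([1, 0, 1, 0])                    # labels for one of several tasks
model = LogisticRegression().fit(X, y)
print(model.predict(X))
```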
@StefanosKe
Stefanos Kechagias
3 years
If you are into #MachineLearning and #Statistics check this out. I would also highly recommend the Machine Learning under a Modern Optimization Lens book by @dbertsim and Dunn. Here are two must-watch (imo) teaser YouTube videos https://t.co/uZkrt1XxLo
@ChristophMolnar
Christoph Molnar 🦋 christophmolnar.bsky.social
3 years
One of the best arguments for supervised learning was made by a statistician: “Statistical Modeling: The Two Cultures.” Every modeler should read it. The paper is written by Leo Breiman, the inventor of Random Forests. https://t.co/x0jOAKxpDF
Replies: 0 · Retweets: 2 · Likes: 2
@AIHealthMIT
MIT Jameel Clinic for AI & Health
3 years
[Developing] pipelines that can consistently be applied to train multimodal AI/ML systems & outperform their single-modality counterparts has remained challenging. #JameelClinic faculty lead @dbertsim, executive director @ifuentes3, postdoc @lrsoenksen, Yu Ma, @CynthiaZeng1,... (2/4)
Replies: 1 · Retweets: 1 · Likes: 3