Dimitris Bertsimas Profile
Dimitris Bertsimas

@dbertsim

Followers: 3K
Following: 18
Media: 1
Statuses: 53

MIT professor, analytics, optimizer, Machine Learner, entrepreneur, philatelist

Belmont, MA
Joined August 2017
@JAMASurgery
JAMA Surgery
2 years
Interpretable machine learning methodologies are powerful tools to diagnose and remedy system-related bias in care, such as disparities in access to postinjury rehabilitation care. https://t.co/X7U8mJQtzc @hayfarani @dbertsim @AnthonyGebran @LMaurerMD
Replies: 0 · Retweets: 8 · Likes: 18
@wesg52
Wes Gurnee
3 years
This paper would not have been possible without my coauthors @NeelNanda5, Matthew Pauly, Katherine Harvey, @mitroitskii, and @dbertsim or all the foundational and inspirational work from @ch402, @boknilev, and many others! Read the full paper:
arxiv.org
Despite rapid adoption and deployment of large language models (LLMs), the internal computations of these models remain opaque and poorly understood. In this work, we seek to understand how...
Replies: 2 · Retweets: 3 · Likes: 44
@wesg52
Wes Gurnee
3 years
Precision and recall can also be helpful guides, and remind us that it should not be assumed a model will learn to represent features in an ontology convenient or familiar to humans.
Replies: 2 · Retweets: 1 · Likes: 23
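A minimal sketch of the kind of check this suggests, treating one neuron as a binary feature detector and scoring it with precision and recall (the data and threshold here are synthetic, not from the paper):

```python
# Toy sketch: score a single neuron as a feature detector with precision/recall.
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)               # 1 = token has the feature
activations = labels * 2.0 + rng.normal(size=1000)   # toy neuron activations
preds = (activations > 1.0).astype(int)              # threshold the neuron

print("precision:", precision_score(labels, preds))
print("recall:", recall_score(labels, preds))
```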
@wesg52
Wes Gurnee
3 years
While we found tons of interesting neurons with sparse probing, it requires careful follow-up analysis to draw more rigorous conclusions. E.g., athlete neurons turn out to be more general sport neurons when analyzing max average activating tokens.
Replies: 1 · Retweets: 2 · Likes: 22
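A sketch of that follow-up analysis under an assumed data layout (a map from token to the activations recorded for one neuron), ranking tokens by mean activation:

```python
# Hypothetical sketch: rank vocabulary tokens by a neuron's mean activation.
def top_mean_activating_tokens(token_acts, k=10):
    """token_acts: dict mapping token -> list of recorded activations for one neuron."""
    means = {tok: sum(a) / len(a) for tok, a in token_acts.items() if a}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Toy data: an "athlete" neuron that turns out to fire for sport words generally.
acts = {"Jordan": [4.1, 3.8], "basketball": [3.9, 4.2], "stadium": [3.5], "the": [0.1, 0.0]}
print(top_mean_activating_tokens(acts, k=3))
```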
@wesg52
Wes Gurnee
3 years
What happens with scale? We find representational sparsity increases on average, but different features obey different scaling dynamics. In particular, quantization and neuron splitting: features both emerge and split into finer-grained features.
Replies: 1 · Retweets: 3 · Likes: 19
@wesg52
Wes Gurnee
3 years
Results in toy models from @AnthropicAI and @ch402 suggest a potential mechanistic fingerprint of superposition: large MLP weight norms and negative biases. We find a striking drop in early layers in the Pythia models from @AiEleuther and @BlancheMinerva.
Replies: 1 · Retweets: 3 · Likes: 30
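A sketch of how one might look for that fingerprint, assuming the `transformer_lens` `HookedTransformer` layout for a Pythia checkpoint (the model ID and attribute names are assumptions to verify against your install):

```python
# Hypothetical sketch: per-layer MLP input-weight norms and neuron biases,
# the quantities suggested as a mechanistic fingerprint of superposition.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("pythia-70m")
for i, block in enumerate(model.blocks):
    w_norm = block.mlp.W_in.norm(dim=0).mean().item()   # mean per-neuron input-weight norm
    bias = block.mlp.b_in.mean().item()                 # mean neuron bias
    print(f"layer {i}: mean |W_in| = {w_norm:.3f}, mean b_in = {bias:.3f}")
```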
@wesg52
Wes Gurnee
3 years
Early layers seem to use sparse combinations of neurons to represent many features in superposition. That is, using the activations of multiple polysemantic neurons to boost the signal of the true feature over all interfering features (here “social security” vs. adjacent bigrams).
Replies: 1 · Retweets: 3 · Likes: 37
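A toy numerical illustration of that signal-boosting argument (not from the paper): averaging k neurons that each carry the same feature plus independent interference grows the signal-to-noise ratio roughly like sqrt(k):

```python
# Toy illustration: combining k polysemantic readouts suppresses interference.
import numpy as np

rng = np.random.default_rng(0)
signal, n_samples = 1.0, 10_000
for k in (1, 4, 16):
    # each neuron = true feature signal + unit-variance interference
    neurons = signal + rng.normal(size=(n_samples, k))
    readout = neurons.mean(axis=1)          # sparse combination of k neurons
    snr = readout.mean() / readout.std()
    print(f"k={k:2d} neurons: SNR ~ {snr:.2f}")
```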
@wesg52
Wes Gurnee
3 years
But what if there are more features than there are neurons? This results in polysemantic neurons which fire for a large set of unrelated features. Here we show a single early layer neuron which activates for a large collection of unrelated n-grams.
Replies: 1 · Retweets: 3 · Likes: 38
@wesg52
Wes Gurnee
3 years
Neural nets are often thought of as feature extractors. But what features are neurons in LLMs actually extracting? In our new paper, we leverage sparse probing to find out https://t.co/hZkFK6aI38. A 🧵:
Replies: 10 · Retweets: 126 · Likes: 696
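The core idea of a sparse probe can be sketched as a heavily L1-regularized linear classifier over neuron activations; this toy version with synthetic activations and scikit-learn is a simplified stand-in for the paper's probes, not their exact method:

```python
# Hypothetical sketch of a sparse probe: L1-regularized logistic regression
# over neuron activations, keeping only a handful of nonzero weights.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 2000, 512                                  # tokens x neurons (synthetic)
X = rng.normal(size=(n, d))
y = (X[:, 7] + 0.5 * X[:, 42] > 0).astype(int)    # feature carried by 2 neurons

probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
selected = np.flatnonzero(probe.coef_[0])
print("neurons selected by the sparse probe:", selected)
```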
@wesg52
Wes Gurnee
3 years
One large family of neurons we find are “context” neurons, which activate only for tokens in a particular context (French, Python code, US patent documents, etc.). Deleting these neurons increases the loss in the relevant context while leaving other contexts unaffected!
Replies: 3 · Retweets: 12 · Likes: 112
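The deletion experiment reads as a zero-ablation: clamp the candidate neuron's activation to zero and compare loss inside vs. outside the context. A sketch with `transformer_lens` hooks (model, layer, and neuron indices are placeholders; the hook name assumes the standard `hook_post` MLP activation):

```python
# Hypothetical sketch: zero-ablate one MLP neuron and compare loss by context.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("pythia-70m")
LAYER, NEURON = 3, 123                        # placeholder indices

def ablate(value, hook):
    value[..., NEURON] = 0.0                  # clamp the candidate neuron to zero
    return value

for name, text in [("french", "Le chat est sur la table."),
                   ("english", "The cat is on the table.")]:
    base = model(text, return_type="loss").item()
    with model.hooks([(f"blocks.{LAYER}.mlp.hook_post", ablate)]):
        ablated = model(text, return_type="loss").item()
    print(f"{name}: loss {base:.3f} -> {ablated:.3f}")
```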
@dbertsim
Dimitris Bertsimas
3 years
As part of HIAS, and together with Professor Georgios Stamou from NTUA, Greece, we are offering a course on Universal AI (in English, free of charge) https://t.co/oeGZ6b1zck on July 3-5, 2023, in Athens, Greece. Prospective participants can declare their interest on the website.
Replies: 0 · Retweets: 3 · Likes: 19
@RyanCoryWright
Ryan Cory-Wright
3 years
Delighted to share that our paper "A new perspective on low-rank optimization" has just been accepted for publication by Math Programming! Valid & often strong lower bounds on low-rank problems via a generalization of the perspective reformulation from mixed-integer optimization
@RyanCoryWright
Ryan Cory-Wright
5 years
Excited to share a new paper with @dbertsim and Jean Pauphilet on a matrix perspective reformulation technique for strong relaxations of low-rank problems. Applications in reduced-rank regression and D-optimal experimental design:
Replies: 1 · Retweets: 1 · Likes: 16
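For context, the scalar perspective trick that the matrix version generalizes can be sketched as follows (textbook-style notation, not the papers' exact statements):

```latex
% Scalar perspective reformulation (sketch): an indicator z \in \{0,1\}
% forces x = 0 when z = 0; the quadratic term is tightened to its perspective.
\min_{x \in \mathbb{R},\, z \in \{0,1\}} \; x^2 + c\,z \ \text{ s.t. } \ x(1 - z) = 0
\quad \rightsquigarrow \quad
\min_{x,\, z \in [0,1]} \; \frac{x^2}{z} + c\,z,
% since x^2/z is jointly convex, equals x^2 at z = 1, and forces x = 0 as z -> 0.
% Roughly, the matrix analogue replaces z by a projection matrix Y
% (Y^2 = Y, tr(Y) <= k) and x^2/z by a term of the form tr(X^\top Y^{\dagger} X).
```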
@RyanCoryWright
Ryan Cory-Wright
3 years
📢New preprint alert! https://t.co/1o48udkqwk We use sampling schemes and clustering to improve the scalability of deterministic Benders decomposition on data-driven network design problems, while maintaining optimality. w/ @dbertsim, Jean Pauphilet, and Periklis Petridis
arxiv.org
Network design problems involve constructing edges in a transportation or supply chain network to minimize construction and daily operational costs. We study a stochastic version where operational...
Replies: 1 · Retweets: 2 · Likes: 19
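For readers new to the method, a generic Benders master problem for such two-stage network design looks roughly like this (standard notation, not the paper's formulation):

```latex
% Two-stage network design via Benders decomposition (generic sketch).
% Master problem: pick edges y to build; \theta under-approximates the
% operational cost and is tightened by cuts from subproblem duals.
\min_{y \in \{0,1\}^{|E|},\; \theta \ge 0} \; c^{\top} y + \theta
\quad \text{s.t.} \quad \theta \ge \beta_j - \alpha_j^{\top} y \quad \forall j \in \text{cuts},
% where each (\alpha_j, \beta_j) comes from an optimal dual solution of the
% operational subproblem at a candidate design; the preprint's sampling and
% clustering schemes reduce how many scenario subproblems must be solved.
```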
@dbertsim
Dimitris Bertsimas
3 years
The paper presents a novel holistic deep learning framework that improves accuracy, robustness, sparsity, and stability over standard deep learning models, as demonstrated by extensive experiments on both tabular and image data sets.
Replies: 0 · Retweets: 1 · Likes: 35
@dbertsim
Dimitris Bertsimas
3 years
My book with David Gamarnik “Queueing Theory: Classical and Modern Methods” was published. It was a long journey that lasted two decades, but both of us are delighted with the journey's completion. For more details see https://t.co/xfpqQ3itJU
Replies: 5 · Retweets: 22 · Likes: 182
@npjDigitalMed
npj Digital Medicine
3 years
The Holistic AI in Medicine (HAIM) framework from @dbertsim et al. in @AIHealthMIT is a pipeline to receive multimodal patient data + use generalizable pre-processing + #machinelearning modelling stages adaptable to multiple health related tasks. https://t.co/uyeq6yV3rp
Replies: 0 · Retweets: 4 · Likes: 9
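A hypothetical skeleton of that kind of multimodal pipeline, i.e., embed each modality, concatenate into one feature vector, and fit one model per clinical task (names and models here are illustrative, not HAIM's actual code):

```python
# Hypothetical skeleton of a HAIM-style pipeline: per-modality embedding,
# concatenation, then a per-task classifier over the fused features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def embed(modality_data):
    """Placeholder for modality-specific pre-processing/embedding."""
    return np.asarray(modality_data, dtype=float)

def build_features(patient):
    # patient: dict of modality name -> raw inputs (tabular, notes, imaging, ...)
    return np.concatenate([embed(v) for v in patient.values()])

patients = [
    {"tabular": [72, 1.0], "notes_emb": [0.2, -0.1], "img_emb": [0.9]},
    {"tabular": [55, 0.0], "notes_emb": [0.0, 0.3], "img_emb": [0.1]},
    {"tabular": [68, 1.0], "notes_emb": [0.4, -0.2], "img_emb": [0.8]},
    {"tabular": [43, 0.0], "notes_emb": [-0.1, 0.2], "img_emb": [0.2]},
]
X = np.stack([build_features(p) for p in patients])
y = np.array([1, 0, 1, 0])                    # labels for one of several tasks
model = LogisticRegression().fit(X, y)
print(model.predict(X))
```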
@StefanosKe
Stefanos Kechagias
3 years
If you are into #MachineLearning and #Statistics check this out. I would also highly recommend the Machine Learning under a Modern Optimization Lens book by @dbertsim and Dunn. Here are two must-watch (imo) teaser YouTube videos https://t.co/uZkrt1XxLo
@ChristophMolnar
Christoph Molnar 🦋 christophmolnar.bsky.social
3 years
One of the best arguments for supervised learning was made by a statistician: “Statistical Modeling: The Two Cultures.” Every modeler should read it. The paper is written by Leo Breiman, the inventor of Random Forests. https://t.co/x0jOAKxpDF
Replies: 0 · Retweets: 2 · Likes: 2
@AIHealthMIT
MIT Jameel Clinic for AI & Health
3 years
[Developing] pipelines that can consistently be applied to train multimodal AI/ML systems & outperform their single-modality counterparts has remained challenging. #JameelClinic faculty lead @dbertsim, executive director @ifuentes3, postdoc @lrsoenksen, Yu Ma, @CynthiaZeng1,... (2/4)
Replies: 1 · Retweets: 1 · Likes: 3