Dimitris Bertsimas
@dbertsim
Followers
3K
Following
18
Media
1
Statuses
53
MIT professor, analytics, optimizer, Machine Learner, entrepreneur, philatelist
Belmont, MA
Joined August 2017
Dimitris Bertsimas, Georgios Margaritis: Global Optimization: A Machine Learning Approach https://t.co/rxYzJBkhaQ
arxiv.org
Many approaches for addressing Global Optimization problems typically rely on relaxations of nonlinear constraints over specific mathematical primitives. This is restricting in applications with...
0
15
60
Interpretable machine learning methodologies are powerful tools to diagnose and remedy system-related bias in care, such as disparities in access to postinjury rehabilitation care. https://t.co/X7U8mJQtzc
@hayfarani @dbertsim @AnthonyGebran @LMaurerMD
0
8
18
This paper would not have been possible without my coauthors @NeelNanda5, Matthew Pauly, Katherine Harvey, @mitroitskii, and @dbertsim or all the foundational and inspirational work from @ch402, @boknilev, and many others! Read the full paper:
arxiv.org
Despite rapid adoption and deployment of large language models (LLMs), the internal computations of these models remain opaque and poorly understood. In this work, we seek to understand how...
2
3
44
Precision and recall can also be helpful guides, and remind us not to assume a model will learn to represent features in an ontology convenient or familiar to humans.
2
1
23
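A minimal sketch of using precision and recall to score a single neuron as a feature detector (illustrative only; `acts`, `labels`, and the threshold are placeholders, not the paper's setup):

```python
import numpy as np

def neuron_precision_recall(acts, labels, threshold=0.0):
    """Score one neuron's activations as a detector for a binary feature."""
    fires = acts > threshold                      # neuron "fires" on a token
    tp = np.sum(fires & (labels == 1))
    precision = tp / max(fires.sum(), 1)          # of the firings, how many are the feature?
    recall = tp / max((labels == 1).sum(), 1)     # of the feature tokens, how many fire?
    return precision, recall
```

High precision with low recall suggests the neuron tracks a narrower concept than the human label, which is exactly the ontology mismatch the tweet warns about.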
While we found tons of interesting neurons with sparse probing, it requires careful follow-up analysis to draw more rigorous conclusions. E.g., athlete neurons turn out to be more general sport neurons when analyzing max average activating tokens.
1
2
22
What happens with scale? We find representational sparsity increases on average, but different features obey different scaling dynamics. In particular, quantization and neuron splitting: features both emerge and split into finer-grained features.
1
3
19
Results in toy models from @AnthropicAI and @ch402 suggest a potential mechanistic fingerprint of superposition: large MLP weight norms and negative biases. We find a striking drop in early layers in the Pythia models from @AiEleuther and @BlancheMinerva.
1
3
30
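A hedged sketch of how one might measure that fingerprint on Pythia (assuming the HuggingFace GPT-NeoX module layout; the statistics, not this code, come from the thread):

```python
from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m")
for i, layer in enumerate(model.gpt_neox.layers):
    w = layer.mlp.dense_h_to_4h.weight   # rows = MLP neuron input weights
    b = layer.mlp.dense_h_to_4h.bias
    print(f"layer {i:2d}: mean ||w|| = {w.norm(dim=1).mean().item():.3f}, "
          f"mean bias = {b.mean().item():+.3f}")
```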
Early layers seem to use sparse combinations of neurons to represent many features in superposition. That is, using the activations of multiple polysemantic neurons to boost the signal of the true feature over all interfering features (here “social security” vs. adjacent bigrams).
1
3
37
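A toy illustration of that mechanism (mine, not the paper's): with more features than neurons, reading a feature through its full direction suppresses the interference that any single neuron sees.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_features = 64, 512
W = rng.normal(size=(n_features, n_neurons)) / np.sqrt(n_neurons)  # feature directions

x = W[0] + 0.3 * W[1:50].sum(axis=0)   # feature 0 active, 49 interfering features
best = np.argmax(np.abs(W[0]))         # the single neuron most aligned with feature 0
print(f"single-neuron readout:      {x[best]:+.2f}")
print(f"sparse-combination readout: {W[0] @ x:+.2f}  (true signal ≈ {W[0] @ W[0]:.2f})")
```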
But what if there are more features than there are neurons? This results in polysemantic neurons which fire for a large set of unrelated features. Here we show a single early-layer neuron which activates for a large collection of unrelated n-grams.
1
3
38
Neural nets are often thought of as feature extractors. But what features are neurons in LLMs actually extracting? In our new paper, we leverage sparse probing to find out https://t.co/hZkFK6aI38. A 🧵:
10
126
696
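A minimal sketch of the sparse-probing idea (a stand-in on synthetic data, not the paper's pipeline): fit an L1-regularized probe on neuron activations so the few surviving coefficients localize the feature.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_tokens, n_neurons = 2000, 512
activations = rng.normal(size=(n_tokens, n_neurons))  # placeholder MLP activations
labels = rng.integers(0, 2, size=n_tokens)            # placeholder feature labels
activations[:, 42] += 3.0 * labels                    # plant a "feature neuron" for the demo

probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
probe.fit(activations, labels)
top = np.argsort(-np.abs(probe.coef_[0]))[:5]
print("candidate feature neurons:", top)              # neuron 42 should lead
```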
One large family of neurons we find are “context” neurons, which activate only for tokens in a particular context (French, Python code, US patent documents, etc). When these neurons are deleted, the loss increases in the relevant context while other contexts are left unaffected!
3
12
112
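A hedged sketch of that deletion experiment (layer and neuron indices are arbitrary placeholders; assumes the HuggingFace GPT-NeoX layout for Pythia): zero-ablate one MLP neuron with a forward hook and compare the loss change in and out of the context.

```python
import torch
from transformers import AutoTokenizer, GPTNeoXForCausalLM

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m").eval()
LAYER, NEURON = 3, 1234   # placeholder "French context" neuron

def loss_on(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

french, english = "Le chat est sur la table.", "The cat is on the table."
base = loss_on(french), loss_on(english)

def ablate(module, inputs, output):
    output[..., NEURON] = 0.0   # pre-activation 0 -> GELU(0) = 0 contribution
    return output

h = model.gpt_neox.layers[LAYER].mlp.dense_h_to_4h.register_forward_hook(ablate)
print("Δloss French: ", loss_on(french) - base[0])
print("Δloss English:", loss_on(english) - base[1])
h.remove()
```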
Reducing overall deaths and increasing access for patients waiting for lung transplants. https://t.co/IoiB2ooJQq
prnewswire.com
/PRNewswire/ -- United Network for Organ Sharing (UNOS) has rolled out a new organ allocation system designed to be more equitable and effective for patients...
0
0
14
As part of HIAS, and together with Professor Georgios Stamou from NTUA, Greece, we are offering a course on Universal AI (in English, free of charge) https://t.co/oeGZ6b1zck on July 3-5, 2023, in Athens, Greece. Prospective participants can declare their interest on the website.
0
3
19
Delighted to share that our paper "A new perspective on low-rank optimization" has just been accepted for publication by Math Programming! Valid & often strong lower bounds on low-rank problems via a generalization of the perspective reformulation from mixed-integer optimization
Excited to share a new paper with @dbertsim and Jean Pauphilet on a matrix perspective reformulation technique for strong relaxations of low-rank problems. Applications in reduced-rank regression and D-optimal experimental design:
1
1
16
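A schematic of the perspective idea in LaTeX (my paraphrase of the standard constructions, not the paper's exact statement): the scalar perspective from mixed-integer optimization, and the matrix analogue behind the SDP lower bound.

```latex
% Scalar MIO case: indicator z \in \{0,1\}, with x = 0 forced when z = 0;
% the epigraph of x^2 is strengthened to its perspective:
\theta \ge \frac{x^2}{z}
\;\Longleftrightarrow\;
\begin{pmatrix} z & x \\ x & \theta \end{pmatrix} \succeq 0.
% Matrix analogue (schematic): encode rank(X) \le k via a projection matrix Y
% with Y^2 = Y, \operatorname{tr}(Y) \le k, X = YX; strengthen the quadratic
% term, then relax Y^2 = Y to 0 \preceq Y \preceq I for a valid SDP lower bound:
\Theta \succeq X^{\top} Y^{\dagger} X
\;\Longleftrightarrow\;
\begin{pmatrix} Y & X \\ X^{\top} & \Theta \end{pmatrix} \succeq 0.
```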
📢New preprint alert! https://t.co/1o48udkqwk We use sampling schemes and clustering to improve the scalability of deterministic Benders decomposition on data-driven network design problems, while maintaining optimality. w/ @dbertsim, Jean Pauphilet, and Periklis Petridis
arxiv.org
Network design problems involve constructing edges in a transportation or supply chain network to minimize construction and daily operational costs. We study a stochastic version where operational...
1
2
19
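A hedged sketch of the scenario-reduction step as I read the abstract (not the paper's algorithm): cluster the sampled demand scenarios, then let Benders solve one weighted subproblem per representative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
scenarios = rng.lognormal(size=(10_000, 30))   # placeholder demand samples

k = 50
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scenarios)
representatives = km.cluster_centers_          # one Benders subproblem per representative
weights = np.bincount(km.labels_, minlength=k) / len(scenarios)
# The master problem then aggregates k weighted cuts instead of 10,000.
```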
The paper presents a novel holistic deep learning framework that improves accuracy, robustness, sparsity, and stability over standard deep learning models, as demonstrated by extensive experiments on both tabular and image data sets.
0
1
35
My book with David Gamarnik, “Queueing Theory: Classical and Modern Methods,” was published. It was a long journey that lasted two decades, but both of us are delighted with its completion. For more details see https://t.co/xfpqQ3itJU
5
22
182
The Holistic AI in Medicine (HAIM) framework from @dbertsim et al. in @AIHealthMIT is a pipeline to receive multimodal patient data + use generalizable pre-processing + #machinelearning modelling stages adaptable to multiple health related tasks. https://t.co/uyeq6yV3rp
0
4
9
If you are into #MachineLearning and #Statistics check this out. I would also highly recommend the Machine Learning under a Modern Optimization Lens book by @dbertsim and Dunn. Here are two teaser, must-watch (imo) YouTube videos https://t.co/uZkrt1XxLo
One of the best arguments for supervised learning was made by a statistician. Statistical Modeling: The Two Cultures Every modeler should read it. The paper is written by Leo Breiman, the inventor of Random Forests. https://t.co/x0jOAKxpDF
0
2
2
pipelines that can consistently be applied to train multimodal AI/ML systems & outperform their single-modality counterparts has remained challenging. #JameelClinic faculty lead @dbertsim, executive director @ifuentes3, postdoc @lrsoenksen, Yu Ma, @CynthiaZeng1,... (2/4)
1
1
3