Andrew Drozdov

@mrdrozdov

Followers
3K
Following
13K
Media
629
Statuses
13K

Senior Research Scientist @ Databricks

NYC
Joined August 2010
@audreyccheng
Audrey Cheng
1 day
What is AI not good at? Database transactions! In our latest blog post, we dive into how AI could not find a better scheduling algorithm than our VLDB '24 paper (in collaboration with @pbailis, @siobhcroo, @istoica05, and many others).
@ai4research_ucb
AI-Driven Research Systems
1 day
🎯 AI discovers an algorithm that makes database transaction schedules 34% faster [ADRS Blog #4] We revisit a classic database problem: dealing with transactional contention. Starting with a state-of-the-art algorithm from our VLDB '24 paper, we use OpenEvolve to automatically
2
4
19
@srush_nlp
Sasha Rush
4 days
Pretty cool event next Wed (11/19) at South Park Commons NYC: @jefrankle and @soumithchintala. https://t.co/jtouZRpQMj
4
6
95
@matei_zaharia
Matei Zaharia
4 days
Great new capability in Databricks powered by our AI research team! We trained a document parsing system that delivers leading quality at 3-5x lower cost and outperforms leading VLMs like GPT-5 and Claude. This is critical to connect AI to so many kinds of data.
@databricks
Databricks
4 days
80% of enterprise data is unstructured, locked in PDFs, reports, and diagrams that traditional tools can’t parse or govern. Introducing ai_parse_document, state-of-the-art document intelligence on Databricks. With a single SQL command, teams can now turn any document into
11
29
210
@mrdrozdov
Andrew Drozdov
4 days
Nandan is incredible at everything he does — he brings fresh energy into every new project and is an inspirational collaborator. If you haven't checked out FreshStack yet, you're missing out!
@beirmug
Nandan Thakur
4 days
Had fun designing the FreshStack #NeurIPS2025 D&B poster! ❤️ FreshStack will be presented in San Diego by @DbrxMosaicAI! ☀️🇺🇸 Thanks to all my co-authors: @lateinteraction @mrdrozdov @sam_havens @mcarbin @lintool!
1
0
14
@beirmug
Nandan Thakur
4 days
Had fun designing the FreshStack #NeurIPS2025 D&B poster! ❤️ FreshStack will be presented in San Diego by @DbrxMosaicAI! ☀️🇺🇸 Thanks to all my co-authors: @lateinteraction @mrdrozdov @sam_havens @mcarbin @lintool!
@beirmug
Nandan Thakur
7 months
Existing IR/RAG benchmarks are unrealistic: they’re often derived from easily retrievable topics, rather than grounded in solving real user problems. 🧵Introducing 𝐅𝐫𝐞𝐬𝐡𝐒𝐭𝐚𝐜𝐤, a challenging RAG benchmark on niche, recent topics. Work done during intern @databricks 🧱
3
5
18
@srush_nlp
Sasha Rush
5 days
Talk at Ray Summit on "Building Cursor Composer." Overview of the work from our research team. https://t.co/9a5yeC3IT8
8
43
336
@emnlpmeeting
EMNLP 2025
7 days
🎉 Congratulations to all #EMNLP2025 award winners 🎉 Starting with the ✨Best Paper award ✨: "Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index" by Hao Xu, Jiacheng Liu, Yejin Choi, Noah A. Smith, and Hannaneh Hajishirzi https://t.co/DKYhylaopF 1/n
2
32
219
@andrewgwils
Andrew Gordon Wilson
7 days
In many industry frontier labs, there’s a perceived tension between breadth and depth. It’s often missed that breadth *enables* meaningful depth. It may seem like you are advancing a frontier, but you are in fact in a myopic echo chamber. ML theory suffers badly from this effect.
8
8
157
@beirmug
Nandan Thakur
7 days
Evaluated ModernBERT variants on the FreshStack leaderboard! (i) GTE (ModernBERT) (ii) IBM Granite (and small) english R2. Outperforms Embedding Gemma 300M despite being 149M params. Poster and other updates coming soon!
0
4
7
@DBahdanau
🇺🇦 Dzmitry Bahdanau
8 days
i've been waiting for this moment since our initial PipelineRL blog post in May :) 🕺🕺🕺
@hamishivi
Hamish Ivison
9 days
to continue the PipelineRL glazing, @finbarrtimbers implemented PipelineRL for open-instruct a little bit ago and it ended up being probably the single biggest speedup to our overall pipeline. We went from 2-week long RL runs to 5-day runs, without sacrificing performance
2
7
100
@khoomeik
Rohan Pandey
6 days
people often take deep learning as synonymous with backprop, but deep networks were originally trained with probabilistic energy-based methods! found this great talk by hinton from 2012 about EBMs, boltzmann machines, and deep belief nets at the start of the deep learning era
11
21
201
@mrdrozdov
Andrew Drozdov
6 days
Free name for anyone looking to start a new vector DB company: Chunk E. Cheese
0
0
6
@mrdrozdov
Andrew Drozdov
6 days
Vector search is comically good.
1
0
4
@gneubig
Graham Neubig
8 days
It's rare nowadays to find something that is intuitively important and not yet done well by any major language models. But *precisely aggregating lots of information over long contexts* is one of those things. Our new benchmark Oolong tests this ability, see the 🧵 for more!
@abertsch72
Amanda Bertsch
8 days
Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
3
17
175
@mrdrozdov
Andrew Drozdov
9 days
The cool kids have been saying…
@cursor_ai
Cursor
10 days
Semantic search improves our agent's accuracy across all frontier models, especially in large codebases where grep alone falls short. Learn more about our results and how we trained an embedding model for retrieving code.
0
0
5
@cursor_ai
Cursor
10 days
Semantic search improves our agent's accuracy across all frontier models, especially in large codebases where grep alone falls short. Learn more about our results and how we trained an embedding model for retrieving code.
70
109
1K
@jxmnop
dr. jack morris
11 days
defending today 🥲
250
112
3K
@mrdrozdov
Andrew Drozdov
11 days
Model gets lethargic when it’s running a fever. Try again after providing Tylenol, and make sure model is getting enough sleep and drinking liquids.
@deliprao
Delip Rao e/σ
11 days
Interview question: I notice the latency of my LLM call increases, on average, at higher temperatures. Why?
0
0
1
@milo_cress
Milo Cress
19 days
@cremieuxrecueil
Crémieux
19 days
We shouldn't let this sort of thing happen. Paper straws were lower quality, worse for the environment, and worse in terms of chemical exposures. Random statistics shouldn't become cudgels for activists to enact harmful policy.
6
4
90
@mrdrozdov
Andrew Drozdov
13 days
system distillation should be a thing, although i am thinking more system -> system rather than system -> model
0
0
0