Andrew Drozdov @mrdrozdov X Profile

Andrew Drozdov

@mrdrozdov

Followers

3K

Following

13K

Media

629

Statuses

13K

Senior Research Scientist @ Databricks

https://t.co/pNKB8XyuN7

NYC

Joined August 2010

Don't wanna be here? Send us removal request.

Audrey Cheng

@audreyccheng

1 day

What is AI not good at? Database transactions! In our latest blog post, we dive into how AI could not find better scheduling algorithm than our VLDB '24 paper (in collaboration with @pbailis, @siobhcroo, @istoica05, and many others).

AI-Driven Research Systems

@ai4research_ucb

1 day

🎯 AI discovers an algorithm that makes database transaction schedules 34% faster [ADRS Blog #4] We revisit a classic database problem: dealing with transactional contention. Starting with a state-of-the-art algorithm from our VLDB '24 paper, we use OpenEvolve to automatically

2

4

19

Sasha Rush

@srush_nlp

4 days

Pretty cool event next Wed (11/19) at South Park Commons NYC: @jefrankle and @soumithchintala . https://t.co/jtouZRpQMj

4

6

95

Matei Zaharia

@matei_zaharia

4 days

Great new capability in Databricks powered by our AI research team! We trained a document parsing system that delivers leading quality at 3-5x lower cost and outperforms leading VLMs like GPT-5 and Claude. This is critical to connect AI to so many kinds of data.

Databricks

@databricks

4 days

80% of enterprise data is unstructured, locked in PDFs, reports, and diagrams that traditional tools can’t parse or govern. Introducing ai_parse_document, state-of-the-art document intelligence on Databricks. With a single SQL command, teams can now turn any document into

11

29

210

Andrew Drozdov

@mrdrozdov

4 days

Nandan is incredible at everything he does — he brings fresh energy into every new project and is an inspirational collaborator. If you haven't checked out FreshStack yet, you're missing out!

Nandan Thakur

@beirmug

4 days

Had fun designing the FreshStack #NeurIPS2025 D&B poster! ❤️ FreshStack will be presented in San Diego by @DbrxMosaicAI! ☀️🇺🇸 Thanks to all my co-authors: @lateinteraction @mrdrozdov @sam_havens @mcarbin @lintool!

1

0

14

Nandan Thakur

@beirmug

4 days

Had fun designing the FreshStack #NeurIPS2025 D&B poster! ❤️ FreshStack will be presented in San Diego by @DbrxMosaicAI! ☀️🇺🇸 Thanks to all my co-authors: @lateinteraction @mrdrozdov @sam_havens @mcarbin @lintool!

Nandan Thakur

@beirmug

7 months

Existing IR/RAG benchmarks are unrealistic: they’re often derived from easily retrievable topics, rather than grounded in solving real user problems. 🧵Introducing 𝐅𝐫𝐞𝐬𝐡𝐒𝐭𝐚𝐜𝐤, a challenging RAG benchmark on niche, recent topics. Work done during intern @databricks 🧱

3

5

18

Sasha Rush

@srush_nlp

5 days

Talk at Ray Summit on "Building Cursor Composer." Overview of the work from our research team. https://t.co/9a5yeC3IT8

8

43

336

EMNLP 2025

@emnlpmeeting

7 days

🎉 Congratulations to all #EMNLP2025 award winners 🎉 Starting with the ✨Best Paper award ✨: "Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index" by Hao Xu, Jiacheng Liu, Yejin Choi, Noah A. Smith, and Hannaneh Hajishirzi https://t.co/DKYhylaopF 1/n

2

32

219

Andrew Gordon Wilson

@andrewgwils

7 days

In many industry frontier labs, there’s a perceived tension between breadth and depth. It’s often missed that breadth *enables* meaningful depth. It may seem like you are advancing a frontier, but you are in fact in a myopic echo chamber. ML theory suffers badly from this effect.

8

157

Nandan Thakur

@beirmug

7 days

Evaluated ModernBERT variants on the FreshStack leaderboard! (i) GTE (ModernBERT) (ii) IBM Granite (and small) english R2 Outperforms Embedding Gemma 300M despite being 149M params. Poster and other updates coming soon!

0

4

7

🇺🇦 Dzmitry Bahdanau

@DBahdanau

8 days

i've been waiting for this moment since our initial PipelineRL blog post in May :) 🕺🕺🕺

Hamish Ivison

@hamishivi

9 days

to continue the PipelineRL glazing, @finbarrtimbers implemented PipelineRL for open-instruct a little bit ago and it ended up being probably the single biggest speedup to our overall pipeline. We went from 2-week long RL runs to 5-day runs, without sacrificing performance

2

7

100

Rohan Pandey

@khoomeik

6 days

people often take deep learning as synonymous with backprop, but deep networks were originally trained with probabilistic energy-based methods! found this great talk by hinton from 2012 about EBMs, boltzmann machines, and deep belief nets at the start of the deep learning era

11

21

201

Andrew Drozdov

@mrdrozdov

6 days

Free name for anyone looking to start a new vector DB company: Chunk E. Cheese

0

6

Andrew Drozdov

@mrdrozdov

6 days

Vector search is comically good.

1

0

4

Graham Neubig

@gneubig

8 days

It's rare nowadays to find something that is intuitively important and not yet done well by any major language models. But *precisely aggregating lots of information over long contexts* is one of those things. Our new benchmark Oolong tests this ability, see the 🧵 for more!

Amanda Bertsch

@abertsch72

8 days

Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!

3

17

175

Andrew Drozdov

@mrdrozdov

9 days

The cool kids have been saying…

Cursor

@cursor_ai

10 days

Semantic search improves our agent's accuracy across all frontier models, especially in large codebases where grep alone falls short. Learn more about our results and how we trained an embedding model for retrieving code.

0

5

Cursor

@cursor_ai

10 days

Semantic search improves our agent's accuracy across all frontier models, especially in large codebases where grep alone falls short. Learn more about our results and how we trained an embedding model for retrieving code.

70

109

1K

dr. jack morris

@jxmnop

11 days

defending today 🥲

250

112

3K

Andrew Drozdov

@mrdrozdov

11 days

Model gets lethargic when it’s running a fever. Try again after providing Tylenol, and make sure model is getting enough sleep and drinking liquids.

Delip Rao e/σ

@deliprao

11 days

Interview question: I notice the latency of my LLM call increases, on average, at higher temperatures. Why?

0

1

Milo Cress

@milo_cress

19 days

https://t.co/kaAt50OoEw

Crémieux

@cremieuxrecueil

19 days

We shouldn't let this sort of thing happen. Paper straws were lower quality, worse for the environment, and worse in terms of chemical exposures. Rnadom statistics shouldn't become cudgels for activists to enact harmful policy.

6

4

90

Andrew Drozdov

@mrdrozdov

13 days

system distillation should be a thing, although i am thinking more system -> system rather than system -> model

0