Aditya Parameswaran @adityagp X Profile

Aditya Parameswaran

@adityagp

Followers

5K

Following

3K

Media

107

Statuses

2K

associate prof @ucberkeley, co-director @ucbepic, cofounder @ponderdata (acq. @snowflakeDB) | on a mission to make data science effortless at scale | he/him

https://t.co/pdpFKQtZLg

Berkeley, CA

Joined April 2008

Jelani Nelson

@minilek

2 days

One of our @Berkeley_EECS grad students doing some public service!

Shreya Shankar

@sh_reya

3 days

No one's built an interactive way to dig through the ~3k newly released Epstein-related emails—so we did! Here's a free, searchable DocETL-powered interface that lets journalists, researchers, and anyone else explore the material without wading through raw data dumps 🔎

0

5

87

Diogo Neves 📹 / ☕️

@DiogoSnows

3 days

@jbendery check this out too

Shreya Shankar

@sh_reya

3 days

No one's built an interactive way to dig through the ~3k newly released Epstein-related emails—so we did! Here's a free, searchable DocETL-powered interface that lets journalists, researchers, and anyone else explore the material without wading through raw data dumps 🔎

1

17

47

Shreya Shankar

@sh_reya

7 days

DocETL is a system we’ve been building at Berkeley for the past two years to make large-scale unstructured data analysis reliable and efficient. It powers our broader stack—used by journalists, public defenders, and researchers—to extract, transform, and reason over messy

4

29

189

Shreya Shankar

@sh_reya

3 days

No one's built an interactive way to dig through the ~3k newly released Epstein-related emails—so we did! Here's a free, searchable DocETL-powered interface that lets journalists, researchers, and anyone else explore the material without wading through raw data dumps 🔎

22

74

470

UC Berkeley EECS

@Berkeley_EECS

1 month

UC Berkeley EECS is hiring! We're seeking exceptional faculty candidates at all ranks for our "Engineering + AI" search and up to 7 tenure-track Asst. Professors in EECS. EECS Focused Searches Include: Quantum Computing ⚛️ AI, Inequality, & Society ⚖️ https://t.co/S8KFqbx1yF

eecs.berkeley.edu

Explore open faculty positions at UC Berkeley's Electrical Engineering & Computer Sciences (EECS) Department. Join a world-class research and teaching community. Apply today.

1

37

164

Ian Arawjo

@IanArawjo

2 months

“[S]ystem building efforts were a lot more exciting and had greater impact—in no small part because we thought long and hard about how the systems ended up getting adopted and used!” 👏

Aditya Parameswaran

@adityagp

2 months

Blog Post: Looking back on the first decade as faculty (2014-2024). I list my favorite papers from the decade, why I enjoyed working on them, and provide backstory and reflection. https://t.co/TdzQenNt8V

0

2

9

Dylan Wootton

@WoottonDylan

2 months

Such a cool system!

Bhavya Chopra

@BhavyaChopra1

2 months

Ever struggled to find the right dataset? Excited to present our work “Rethinking Dataset Discovery with DataScout” at #UIST2025! DataScout combines expressive search with contextual assistance to make dataset discovery tractable 🧵👇🏽

0

3

4

Aditya Parameswaran

@adityagp

2 months

Lots of people who led the projects mentioned, including @dorisjlee @me_dorx @mangeshbendre @subZero_saj @DixinTang @DataCereal @tariquesdd @twattanawaroon @Silu__Huang @stephen_macke

0

2

Aditya Parameswaran

@adityagp

2 months

Blog Post: Looking back on the first decade as faculty (2014-2024). I list my favorite papers from the decade, why I enjoyed working on them, and provide backstory and reflection. https://t.co/TdzQenNt8V

data-people-group.github.io

Aditya reflects on his greatest hits from the first decade of facultyhood.

1

12

46

Aditya Parameswaran

@adityagp

2 months

DataScout leverages LLM assistance during data discovery to proactively narrow the semantic gap between user needs and the datasets at hand. Go see @BhavyaChopra1's talk at #UIST2025 !

Bhavya Chopra

@BhavyaChopra1

2 months

Ever struggled to find the right dataset? Excited to present our work “Rethinking Dataset Discovery with DataScout” at #UIST2025! DataScout combines expressive search with contextual assistance to make dataset discovery tractable 🧵👇🏽

1

2

11

Aditya Parameswaran

@adityagp

2 months

Excited about DocWrangler receiving a #uist2025 Best Paper Honorable mention! DocWrangler brings principles from structured data wrangling interfaces to bear on document wrangling - and introduces new features to target three new "gulfs" we identify - between users, their

Shreya Shankar

@sh_reya

2 months

Our paper "Steering Semantic Data Processing with DocWrangler" will be receiving a Best Paper Honorable Mention in a couple of weeks at UIST 2025! DocWrangler is a mixed-initiative IDE we built at Berkeley for semantic data processing, where users work with AI to analyze

1

3

25

Melissa Pan

@melissapan

2 months

Excited to share: MAST has been accepted as 🌟 NeurIPS D&B Spotlight🌟 Updates for the community: - NEW: We open-source 1,000+ multi-agent traces (link in 🧵). - lots of exciting use cases are emerging, we’ll be releasing blogs & tutorials to help you get started - And … more

10

36

159

Murat Demirbas (Distributolog)

@muratdemirbas

2 months

[new blog post] Supporting our AI overlords: Redesigning data systems to be Agent-first https://t.co/DcRENoirz8

0

5

18

Aditya Parameswaran

@adityagp

2 months

A nice summary of our agent-first data systems agenda, with some commentary and praise: "... this new Agent-First Data Systems paper is more technical, grounded, and focused." I'll take it!

1

2

6

Shreya Shankar

@sh_reya

2 months

Our work on cheap & accurate LLM-powered data processing will be at SIGMOD next year! Results are really awesome: BARGAIN reduces use of expensive models by more than 86% compared to prior approaches, while guaranteeing 90% accuracy with respect to the expensive LLMs

Sepanta Zeighami

@SepantaZeighami

2 months

Do you have a text dataset you need to process with an LLM but want to minimize cost? 🤷 Try BARGAIN (SIGMOD'26): 💰 It reduces costs by using cheaper LLMs (e.g., gpt5mini) on data records where they are accurate 🎯 *statistically guarantees* output matches expensive LLM's (e.g., gpt5)

2

7

62

Aditya Parameswaran

@adityagp

2 months

Sep's work on reducing cost with large-scale LLM processing is very cool and very useful! Strong theoretical guarantees on accuracy plus huge cost savings relative to state of the art. Take a look!

Sepanta Zeighami

@SepantaZeighami

2 months

Do you have a text dataset you need to process with an LLM but want to minimize cost? 🤷 Try BARGAIN (SIGMOD'26): 💰 It reduces costs by using cheaper LLMs (e.g., gpt5mini) on data records where they are accurate 🎯 *statistically guarantees* output matches expensive LLM's (e.g., gpt5)

0

1

5

Aditya Parameswaran

@adityagp

2 months

Do you think text2sql benchmarks don't capture the real world? Help us (@UCBEPIC w/ @PromptQL) build a comprehensive enterprise data agent benchmark! We want to understand typical failure modes for data agents in production, incl. but not limited to those below. Submit an

Shreya Shankar

@sh_reya

2 months

At @UCBEPIC, we’re working with @PromptQL to build a benchmark that documents how and why AI systems fail on enterprise data queries. Enterprise data agents are not *that* good: e.g., they generate SQL in the wrong dialect, forget that the join key needs to be cleaned before

0

12

25

Aditya Parameswaran

@adityagp

3 months

New research agenda we're kickstarting at Berkeley: redesigning data systems to serve the dominant workload of the future: agents! Agentic speculation is massive, heterogeneous, steerable, and redundant: properties data systems can better support and take advantage of. Take a

6

49

265

Aditya Parameswaran

@adityagp

3 months

Coming out of twitter hibernation to say that DocETL is exciting work from my group and you all should read it. Crucially, it's the only semantic data processing system that can automatically (🌟agentically🌟) rewrite user pipelines into more accurate ones. Not that Github

Shreya Shankar

@sh_reya

3 months

Our first DocETL paper has been accepted to VLDB 2025! DocETL is a system we’ve been building at Berkeley for reliable LLM-powered data pipelines, where the optimizer logically rewrites pipelines because even experts cannot author one that is accurate enough to begin with. I'll

1

6

41