Aditya Parameswaran
@adityagp
Followers
5K
Following
3K
Media
107
Statuses
2K
associate prof @ucberkeley, co-director @ucbepic, cofounder @ponderdata (acq. @snowflakeDB) | on a mission to make data science effortless at scale | he/him
Berkeley, CA
Joined April 2008
One of our @Berkeley_EECS grad students doing some public service!
No one's built an interactive way to dig through the ~3k newly released Epstein-related emails—so we did! Here's a free, searchable DocETL-powered interface that lets journalists, researchers, and anyone else explore the material without wading through raw data dumps 🔎
0
5
87
@jbendery check this out too
No one's built an interactive way to dig through the ~3k newly released Epstein-related emails—so we did! Here's a free, searchable DocETL-powered interface that lets journalists, researchers, and anyone else explore the material without wading through raw data dumps 🔎
1
17
47
DocETL is a system we’ve been building at Berkeley for the past two years to make large-scale unstructured data analysis reliable and efficient. It powers our broader stack—used by journalists, public defenders, and researchers—to extract, transform, and reason over messy
4
29
189
No one's built an interactive way to dig through the ~3k newly released Epstein-related emails—so we did! Here's a free, searchable DocETL-powered interface that lets journalists, researchers, and anyone else explore the material without wading through raw data dumps 🔎
22
74
470
UC Berkeley EECS is hiring! We're seeking exceptional faculty candidates at all ranks for our "Engineering + AI" search and up to 7 tenure-track Asst. Professors in EECS. EECS Focused Searches Include: Quantum Computing ⚛️ AI, Inequality, & Society ⚖️ https://t.co/S8KFqbx1yF
eecs.berkeley.edu
Explore open faculty positions at UC Berkeley's Electrical Engineering & Computer Sciences (EECS) Department. Join a world-class research and teaching community. Apply today.
1
37
164
“[S]ystem building efforts were a lot more exciting and had greater impact—in no small part because we thought long and hard about how the systems ended up getting adopted and used!” 👏
Blog Post: Looking back on the first decade as faculty (2014-2024). I list my favorite papers from the decade, why I enjoyed working on them, and provide backstory and reflection. https://t.co/TdzQenNt8V
0
2
9
Such a cool system!
Ever struggled to find the right dataset? Excited to present our work “Rethinking Dataset Discovery with DataScout” at #UIST2025! DataScout combines expressive search with contextual assistance to make dataset discovery tractable 🧵👇🏽
0
3
4
Lots of people who led the projects mentioned, including @dorisjlee @me_dorx @mangeshbendre @subZero_saj @DixinTang @DataCereal @tariquesdd @twattanawaroon @Silu__Huang @stephen_macke
0
0
2
Blog Post: Looking back on the first decade as faculty (2014-2024). I list my favorite papers from the decade, why I enjoyed working on them, and provide backstory and reflection. https://t.co/TdzQenNt8V
data-people-group.github.io
Aditya reflects on his greatest hits from the first decade of facultyhood.
1
12
46
DataScout leverages LLM assistance during data discovery to proactively narrow the semantic gap between user needs and the datasets at hand. Go see @BhavyaChopra1's talk at #UIST2025!
Ever struggled to find the right dataset? Excited to present our work “Rethinking Dataset Discovery with DataScout” at #UIST2025! DataScout combines expressive search with contextual assistance to make dataset discovery tractable 🧵👇🏽
1
2
11
Excited about DocWrangler receiving a #uist2025 Best Paper Honorable Mention! DocWrangler brings principles from structured data wrangling interfaces to bear on document wrangling - and introduces new features to target three new "gulfs" we identify - between users, their
Our paper "Steering Semantic Data Processing with DocWrangler" will be receiving a Best Paper Honorable Mention in a couple of weeks at UIST 2025! DocWrangler is a mixed-initiative IDE we built at Berkeley for semantic data processing, where users work with AI to analyze
1
3
25
Excited to share: MAST has been accepted as 🌟 NeurIPS D&B Spotlight🌟 Updates for the community: - NEW: We open-source 1,000+ multi-agent traces (link in 🧵). - lots of exciting use cases are emerging, we’ll be releasing blogs & tutorials to help you get started - And … more
10
36
159
[new blog post] Supporting our AI overlords: Redesigning data systems to be Agent-first https://t.co/DcRENoirz8
0
5
18
A nice summary of our agent-first data systems agenda, with some commentary and praise: "... this new Agent-First Data Systems paper is more technical, grounded, and focused." I'll take it!
1
2
6
Our work on cheap & accurate LLM-powered data processing will be at SIGMOD next year! Results are really awesome: BARGAIN reduces use of expensive models by more than 86% compared to prior approaches, while guaranteeing 90% accuracy with respect to the expensive LLMs
Do you have a text dataset you need to process with an LLM but want to minimize cost? 🤷 Try BARGAIN (SIGMOD'26): 💰 It reduces costs by using cheaper LLMs (e.g., gpt5mini) on data records where they are accurate 🎯 *statistically guarantees* output matches expensive LLM's (e.g., gpt5)
2
7
62
Sep's work on reducing cost with large-scale LLM processing is very cool and very useful! Strong theoretical guarantees on accuracy plus huge cost savings relative to state of the art. Take a look!
Do you have a text dataset you need to process with an LLM but want to minimize cost? 🤷 Try BARGAIN (SIGMOD'26): 💰 It reduces costs by using cheaper LLMs (e.g., gpt5mini) on data records where they are accurate 🎯 *statistically guarantees* output matches expensive LLM's (e.g., gpt5)
0
1
5
New research agenda we're kickstarting at Berkeley: redesigning data systems to serve the dominant workload of the future: agents! Agentic speculation is massive, heterogeneous, steerable, and redundant: properties data systems can better support and take advantage of. Take a
6
49
265
Coming out of twitter hibernation to say that DocETL is exciting work from my group and you all should read it. Crucially, it's the only semantic data processing system that can automatically (🌟agentically🌟) rewrite user pipelines into more accurate ones. Not that Github
Our first DocETL paper has been accepted to VLDB 2025! DocETL is a system we’ve been building at Berkeley for reliable LLM-powered data pipelines, where the optimizer logically rewrites pipelines because even experts cannot author one that is accurate enough to begin with. I'll
1
6
41