Aditya Parameswaran Profile
Aditya Parameswaran

@adityagp

Followers: 5K
Following: 3K
Media: 107
Statuses: 2K

associate prof @ucberkeley, co-director @ucbepic, cofounder @ponderdata (acq. @snowflakeDB) | on a mission to make data science effortless at scale | he/him

Berkeley, CA
Joined April 2008
@minilek
Jelani Nelson
2 days
One of our @Berkeley_EECS grad students doing some public service!
@sh_reya
Shreya Shankar
3 days
No one's built an interactive way to dig through the ~3k newly released Epstein-related emails—so we did! Here's a free, searchable DocETL-powered interface that lets journalists, researchers, and anyone else explore the material without wading through raw data dumps 🔎
0
5
87
@DiogoSnows
Diogo Neves 📹 / ☕️
3 days
@jbendery check this out too
@sh_reya
Shreya Shankar
3 days
No one's built an interactive way to dig through the ~3k newly released Epstein-related emails—so we did! Here's a free, searchable DocETL-powered interface that lets journalists, researchers, and anyone else explore the material without wading through raw data dumps 🔎
1
17
47
@sh_reya
Shreya Shankar
7 days
DocETL is a system we’ve been building at Berkeley for the past two years to make large-scale unstructured data analysis reliable and efficient. It powers our broader stack—used by journalists, public defenders, and researchers—to extract, transform, and reason over messy
4
29
189
@sh_reya
Shreya Shankar
3 days
No one's built an interactive way to dig through the ~3k newly released Epstein-related emails—so we did! Here's a free, searchable DocETL-powered interface that lets journalists, researchers, and anyone else explore the material without wading through raw data dumps 🔎
22
74
470
@Berkeley_EECS
UC Berkeley EECS
1 month
UC Berkeley EECS is hiring! We're seeking exceptional faculty candidates at all ranks for our "Engineering + AI" search and up to 7 tenure-track Asst. Professors in EECS. EECS Focused Searches Include: Quantum Computing ⚛️ AI, Inequality, & Society ⚖️ https://t.co/S8KFqbx1yF
eecs.berkeley.edu
Explore open faculty positions at UC Berkeley's Electrical Engineering & Computer Sciences (EECS) Department. Join a world-class research and teaching community. Apply today.
1
37
164
@IanArawjo
Ian Arawjo
2 months
“[S]ystem building efforts were a lot more exciting and had greater impact—in no small part because we thought long and hard about how the systems ended up getting adopted and used!” 👏
@adityagp
Aditya Parameswaran
2 months
Blog Post: Looking back on the first decade as faculty (2014-2024). I list my favorite papers from the decade, why I enjoyed working on them, and provide backstory and reflection. https://t.co/TdzQenNt8V
0
2
9
@WoottonDylan
Dylan Wootton
2 months
Such a cool system!
@BhavyaChopra1
Bhavya Chopra
2 months
Ever struggled to find the right dataset? Excited to present our work “Rethinking Dataset Discovery with DataScout” at #UIST2025! DataScout combines expressive search with contextual assistance to make dataset discovery tractable 🧵👇🏽
0
3
4
@adityagp
Aditya Parameswaran
2 months
0
0
2
@adityagp
Aditya Parameswaran
2 months
Blog Post: Looking back on the first decade as faculty (2014-2024). I list my favorite papers from the decade, why I enjoyed working on them, and provide backstory and reflection. https://t.co/TdzQenNt8V
data-people-group.github.io
Aditya reflects on his greatest hits from the first decade of facultyhood.
1
12
46
@adityagp
Aditya Parameswaran
2 months
DataScout leverages LLM assistance during data discovery to proactively narrow the semantic gap between user needs and the datasets at hand. Go see @BhavyaChopra1's talk at #UIST2025 !
@BhavyaChopra1
Bhavya Chopra
2 months
Ever struggled to find the right dataset? Excited to present our work “Rethinking Dataset Discovery with DataScout” at #UIST2025! DataScout combines expressive search with contextual assistance to make dataset discovery tractable 🧵👇🏽
1
2
11
@adityagp
Aditya Parameswaran
2 months
Excited about DocWrangler receiving a #uist2025 Best Paper Honorable mention! DocWrangler brings principles from structured data wrangling interfaces to bear on document wrangling - and introduces new features to target three new "gulfs" we identify - between users, their
@sh_reya
Shreya Shankar
2 months
Our paper "Steering Semantic Data Processing with DocWrangler" will be receiving a Best Paper Honorable Mention in a couple of weeks at UIST 2025! DocWrangler is a mixed-initiative IDE we built at Berkeley for semantic data processing, where users work with AI to analyze
1
3
25
@melissapan
Melissa Pan
2 months
Excited to share: MAST has been accepted as 🌟 NeurIPS D&B Spotlight🌟 Updates for the community: - NEW: We open-source 1,000+ multi-agent traces (link in 🧵). - lots of exciting use cases are emerging, we’ll be releasing blogs & tutorials to help you get started - And … more
10
36
159
@muratdemirbas
Murat Demirbas (Distributolog)
2 months
[new blog post] Supporting our AI overlords: Redesigning data systems to be Agent-first https://t.co/DcRENoirz8
0
5
18
@adityagp
Aditya Parameswaran
2 months
A nice summary of our agent-first data systems agenda, with some commentary and praise: "... this new Agent-First Data Systems paper is more technical, grounded, and focused." I'll take it!
1
2
6
@sh_reya
Shreya Shankar
2 months
Our work on cheap & accurate LLM-powered data processing will be at SIGMOD next year! Results are really awesome: BARGAIN reduces use of expensive models by more than 86% compared to prior approaches, while guaranteeing 90% accuracy with respect to the expensive LLMs
@SepantaZeighami
Sepanta Zeighami
2 months
Do you have a text dataset you need to process with an LLM but want to minimize cost? 🤷 Try BARGAIN (SIGMOD'26): 💰 It reduces costs by using cheaper LLMs (e.g., gpt5mini) on data records where they are accurate 🎯 *statistically guarantees* output matches expensive LLM's (e.g., gpt5)
2
7
62
@adityagp
Aditya Parameswaran
2 months
Sep's work on reducing cost with large-scale LLM processing is very cool and very useful! Strong theoretical guarantees on accuracy plus huge cost savings relative to state of the art. Take a look!
@SepantaZeighami
Sepanta Zeighami
2 months
Do you have a text dataset you need to process with an LLM but want to minimize cost? 🤷 Try BARGAIN (SIGMOD'26): 💰 It reduces costs by using cheaper LLMs (e.g., gpt5mini) on data records where they are accurate 🎯 *statistically guarantees* output matches expensive LLM's (e.g., gpt5)
0
1
5
@adityagp
Aditya Parameswaran
2 months
Do you think text2sql benchmarks don't capture the real world? Help us (@UCBEPIC w/ @PromptQL) build a comprehensive enterprise data agent benchmark! We want to understand typical failure modes for data agents in production, incl. but not limited to those below. Submit an
@sh_reya
Shreya Shankar
2 months
At @UCBEPIC, we’re working with @PromptQL to build a benchmark that documents how and why AI systems fail on enterprise data queries. Enterprise data agents are not *that* good: e.g., they generate SQL in the wrong dialect, forget that the join key needs to be cleaned before
0
12
25
@adityagp
Aditya Parameswaran
3 months
New research agenda we're kickstarting at Berkeley: redesigning data systems to serve the dominant workload of the future: agents! Agentic speculation is massive, heterogeneous, steerable, and redundant: properties data systems can better support and take advantage of. Take a
6
49
265
@adityagp
Aditya Parameswaran
3 months
Coming out of twitter hibernation to say that DocETL is exciting work from my group and you all should read it. Crucially, it's the only semantic data processing system that can automatically (🌟agentically🌟) rewrite user pipelines into more accurate ones. Not that Github
@sh_reya
Shreya Shankar
3 months
Our first DocETL paper has been accepted to VLDB 2025! DocETL is a system we’ve been building at Berkeley for reliable LLM-powered data pipelines, where the optimizer logically rewrites pipelines because even experts cannot author one that is accurate enough to begin with. I'll
1
6
41