
UC Berkeley EPIC Lab
@UCBEPIC
Followers: 467
Following: 9
Media: 70
Statuses: 88
Effective Programming, Interaction, and Computation with Data Lab @UCBerkeley
Berkeley CA
Joined November 2021
This was a great retreat put together by the @UCBEPIC team. As expected, applications of LLMs were central to many talks and discussions. Identifying the core problems invariant to the particularities of the next big LLM is key for research ROI in these applications.
Can you explore the space using LLMs - but do it in a way that is efficient? How do we find the high error regions?
It’s hard to robustly test edge cases in a model and to make user-defined concepts explicit.
Fereshte Khani from Microsoft describes how to collaboratively develop NLP models, ensuring alignment and safety
A new system they are working on is Humboldt for data discovery. You shouldn’t have to ask experts about what data you should explore!
Alex Bauerle from Sigma Computing tells us about what’s hard when building a spreadsheet for cloud data warehouses
Can you fuse structural understanding of API programs with LLM techniques? Naman provides a way! Parametric templates for the win!
LLMs by themselves are insufficient for this task: they are brittle and hard to control.
Naman Jain explores how to summarize data transformation scripts using a template-based approach, informed by LLMs
Flor allows users to travel back in time to help debug ML training. You can also inspect and “jump into” another user’s training history. Time travel and shapeshifting!
Rolando Garcia @rogarcia_sanz describes the next generation of Flor, a tool for rapid iteration during ML training via a live notebook demo!
Haotian leverages large language models (variants of BERT) to identify visualization intent, and builds on prior work that automatically translates visualization intent into actual visualizations (e.g., Lux).
Haotian Li describes how to support conversation with data via visualization. Why write code when you can just talk to your data?
Can we check extensional equality (i.e., two programs produce the same outputs for the same inputs) for constrained domains like biology? That way we could automatically rewrite code to make it more performant, component by component.
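To make the idea of checking extensional equality concrete, here is a minimal sketch, not the approach presented in the talk. It compares a readable loop with a candidate rewrite on randomly generated DNA strings. The task (GC content) and every name in it (`gc_content_loop`, `gc_content_fast`, `agree_on_random_inputs`) are hypothetical choices made for illustration. Agreement on sampled inputs is only evidence of equivalence, not a proof.

```python
import random


def gc_content_loop(seq: str) -> float:
    """Readable version: walk the sequence one base at a time."""
    if not seq:
        return 0.0
    gc = 0
    for base in seq:
        if base == "G" or base == "C":
            gc += 1
    return gc / len(seq)


def gc_content_fast(seq: str) -> float:
    """Candidate rewrite: let the built-in str.count do the scanning."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def agree_on_random_inputs(f, g, trials=1000, max_len=200, seed=0) -> bool:
    """Return True if f and g give identical results on every sampled sequence.

    This is a randomized check (assumed input domain: strings over ACGT),
    so a True result suggests equivalence but does not prove it.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, max_len)
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        if f(seq) != g(seq):
            return False
    return True
```

Calling `agree_on_random_inputs(gc_content_loop, gc_content_fast)` exercises both versions on the same inputs. Restricting the domain to short strings over `ACGT` is what keeps the check cheap, which is the appeal of constrained domains mentioned above.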
There is a trade-off between easy-to-understand code (e.g., code that loops through arrays) and performant code (e.g., code that manipulates arrays in NumPy).
Biologists, like many other non-computer scientists, struggle to write performant code, especially for large datasets such as genome sequences.
Yet more challenges in Machine Learning - operationalizing, explaining and trusting it.
More open challenges in helping novice users through the data science workflow - so that one can go from “zero to hero”
Open challenges in data prep: even with sophisticated GUI tools, users often want to inspect and tweak the underlying scripts in tandem. Current tools don’t support seamless transitions between the two, or sensemaking.