paddy_mullen Profile
paddy_mullen

@paddy_mullen

Followers
306
Following
735
Media
134
Statuses
2K

Boston/Newport. Python/PyData/Jupyter dev. Interested in biking, urbanism, and zoning.

Newport RI
Joined November 2008
@paddy_mullen
paddy_mullen
2 years
Buckaroo continues adding features, including: histograms, Polars support, Colab/VSCode support, a pluggable analysis framework (customize summary stats), auto-cleaning, and many other refinements. pip install buckaroo
0
0
0
@paddy_mullen
paddy_mullen
2 months
All of this will be part of my article "so you want to serialize a dataframe to JS". Which is a subsection (probably the largest) of "so you want to write a table viewer".
0
0
0
@paddy_mullen
paddy_mullen
2 months
This should be much more modular.
1
0
0
@paddy_mullen
paddy_mullen
2 months
The final cool idea though is to leverage multi indexes and polars structs. It would work like this. Say you want to color a column based on the diff with an original column. Put them into a struct, then render, coloring based on the struct column you don't display.
1
0
0
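The struct idea above can be sketched without any dataframe library. This is only an illustration under stated assumptions: each cell is modelled as a plain dict standing in for a polars struct, and the field names `value` and `orig`, the helper `color_for`, and the color names are all hypothetical, not Buckaroo's API.

```python
# Hypothetical sketch: a "struct" cell carries the displayed value plus a
# hidden field (the original value) that is used only to pick a color.

def color_for(cell):
    """Pick a color from the diff between the shown value and the hidden one."""
    diff = cell["value"] - cell["orig"]
    if diff > 0:
        return "green"
    if diff < 0:
        return "red"
    return None  # unchanged: no highlight


def render_column(cells):
    """Render only `value`; `orig` influences the color but is never displayed."""
    return [{"text": str(c["value"]), "color": color_for(c)} for c in cells]


cells = [
    {"value": 5, "orig": 3},
    {"value": 2, "orig": 2},
    {"value": 1, "orig": 4},
]
rendered = render_column(cells)
```

The point of the design is that the comparison column travels inside the cell, so the renderer never needs to look it up by name.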
@paddy_mullen
paddy_mullen
2 months
I am currently kicking around two ideas. No, maybe three; writing helps. The first would be a conversion step in Python, but plumbing this in is a mess. The second is different renderers that do the same thing in the frontend. This is actually a bit cleaner.
1
0
0
@paddy_mullen
paddy_mullen
2 months
But how do I deal with custom-coded apps that build color_maps via column comparison? By the time this gets to the frontend, those column names won't exist in the data.
1
0
0
@paddy_mullen
paddy_mullen
2 months
The other key change is converting every column name from the original to a letter-based sequential encoding (a-z, aa, ab, ...), and then having each config for columns explicitly state header_name and field.
1
0
0
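A minimal sketch of the letter-based encoding described above, assuming spreadsheet-style naming (a..z, then aa, ab, and so on). The helper names `letter_name` and `encode_columns` are hypothetical; only `header_name` and `field` come from the thread, and the real Buckaroo implementation may differ.

```python
def letter_name(i: int) -> str:
    """Map 0 -> 'a', 25 -> 'z', 26 -> 'aa', 27 -> 'ab' (bijective base 26)."""
    name = ""
    i += 1  # shift to 1-based so the loop below has no zero digit
    while i > 0:
        i, rem = divmod(i - 1, 26)
        name = chr(ord("a") + rem) + name
    return name


def encode_columns(original_names):
    """Pair each original column with an opaque field and a display header."""
    column_config = []
    for i, orig in enumerate(original_names):
        column_config.append({
            "field": letter_name(i),   # key used in the serialized data
            "header_name": str(orig),  # label shown to the user
        })
    return column_config


# Works even for awkward names, such as a column literally called "index"
# or a tuple coming from multi-index columns.
config = encode_columns(["index", ("index", ""), "price"])
```

Because the serialized data only ever uses the generated fields, an original column named "index" can no longer collide with anything the serializer adds.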
@paddy_mullen
paddy_mullen
2 months
Rather than special-casing pandas parquet, I am doing the following: an explicit `first_col_config` for table configuration, and no more including "index" in summary stats (I don't currently have a way to display it either).
1
0
0
@paddy_mullen
paddy_mullen
2 months
This works until you want a column named "index", or you serialize a dataframe to parquet with multi-index columns. Parquet changes "index" to the string "('index', '')".
1
0
0
@paddy_mullen
paddy_mullen
2 months
For JSON, Buckaroo serializes dataframes as a list of dicts, `to_json(orient='records')` or `polars.with_row_index()`. This always adds a column of "index" to the records. Parquet behaves similarly, but column oriented.
1
0
0
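For illustration, here is one way an "index" key can end up in every record with pandas. This is a sketch, not necessarily how Buckaroo does it: the `reset_index()` step is an assumption, and the toy data is made up.

```python
import json

import pandas as pd

# Toy frame with the default, unnamed RangeIndex.
df = pd.DataFrame({"a": [10, 20], "b": ["x", "y"]})

# orient="records" omits the index, so it is promoted to a regular column
# first. For an unnamed index, pandas names that new column "index".
records = json.loads(df.reset_index().to_json(orient="records"))
```

Each element of `records` is a dict with the keys "index", "a" and "b", which is where a real column named "index" would start to cause trouble.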
@paddy_mullen
paddy_mullen
2 months
Hit a snag with the multi-index refactoring. The short of it is that dataframe serialization is really tricky, and relying on assumptions and default behaviors will bite you, always.
1
0
0
@paddy_mullen
paddy_mullen
2 months
Got to 600 today. I hadn't realized it took that long to get to 100. Still a vanity metric.
@paddy_mullen
paddy_mullen
1 year
Buckaroo got its 100th star today. I know it's a vanity metric, but it still feels good to get some recognition for something I have been building for a year.
0
0
0
@paddy_mullen
paddy_mullen
2 months
I also built an integration between #pandera and Buckaroo so that you can see validation errors from a Pandera schema in notebooks. @union_ai @cosmicbboy
1
0
0
@paddy_mullen
paddy_mullen
2 months
All of this reminds me to finish and publish my article "So you want to write a dataframe table". It details most of the little pitfalls that I remember in building Buckaroo. It's 2k words right now.
0
0
0
@paddy_mullen
paddy_mullen
2 months
Finally, I remembered while looking this up that pandas can have multi-indexes for rows too. I can do that too, but down the line. The next major bit of assumption jank to pull out of Buckaroo after multi-indexes is pandas indexes. Lots of implicit stuff around it.
1
0
0
@paddy_mullen
paddy_mullen
2 months
Also, this should work really well with polars tuple types.
0
0
0
@paddy_mullen
paddy_mullen
2 months
I needed to use multi-indexes to deal with extracting data from Pandera's validations last week. The UX of passing a dataframe to Buckaroo and seeing only a stacktrace was really bad. A little reading of AG-Grid docs and here I am.
2
0
0
@paddy_mullen
paddy_mullen
2 months
I'm knee-deep in adding proper MultiIndex column support to Buckaroo. MIs are kind of a forgotten corner case of pandas, super powerful. Many tables don't display them properly because they are tricky. They don't serialize to JSON natively. #python #pandas #datascience #pydata
1
0
0
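One common workaround for MultiIndex columns, shown here as a sketch and not as Buckaroo's approach, is to flatten each column tuple into a single string before building records. The "/" separator and the toy column names are arbitrary assumptions.

```python
import pandas as pd

# Toy frame with two-level column labels.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    columns=pd.MultiIndex.from_tuples(
        [("price", "min"), ("price", "max"), ("qty", "sum")]
    ),
)

# to_flat_index() turns the MultiIndex into an Index of plain tuples;
# joining each tuple gives string keys that are valid JSON object keys.
flat = df.copy()
flat.columns = ["/".join(map(str, levels)) for levels in df.columns.to_flat_index()]

records = flat.to_dict(orient="records")
```

Flattening loses the grouping information, so a table viewer that wants grouped headers still has to carry the original tuples separately.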
@paddy_mullen
paddy_mullen
2 months
I take it back, this might be more of an alumni donation pitch. That would make more sense, because the development guy doesn't know tech/VC well.
0
0
0
@paddy_mullen
paddy_mullen
2 months
I'm in a neighborhood hotel getting coffee, listening to an in-house college VC pitch a position to an experienced VC who's an alum. I hear business meetings at this hotel regularly. The notable ones go poorly; you can tell when the sales guy talks too much.
1
0
0