Thomas J. Fan
@thomasjpfan
Followers: 713
Following: 70
Media: 45
Statuses: 255
Working on machine learning and open source, scikit-learn maintainer @[email protected]
New York
Joined April 2009
With @huggingface's smolagents v1.22.0 release, you can now use @modal Sandboxes for secure code execution. Just set `executor_type="modal"`! ☺️
1
1
8
Quick comparison between PyTorch's TorchScript, FX Graph tracing, and torch.compile for handling data dependent control flow: https://t.co/fOEcyEqpmP
thomasjpfan.com
Over the past few years, PyTorch went through a few iterations for turning Python code into a graph to improve performance: TorchScript can trace or parse …
1
1
3
I developed rustimport_jupyter to compile Rust code in Jupyter and have the compiled code available in Python! In this post, I showcase a simple function, @numpy_team function, and @DataPolars expression plugin:
thomasjpfan.com
The Rust programming language has gotten more prominent for writing compiled Python extensions. Currently, there is a bunch of boilerplate for wrapping up a Rust …
2
10
41
I wrote a quick blog post about generating NumPy UFuncs with Cython 3.0. The feature is quite nice 😊!
thomasjpfan.com
Cython 3.0 was finally released on July 17, 2023 and it comes with many features. There are many exciting features such as a Pure Python …
1
1
8
I wrote a quick post about accessing data from #Python's DataFrame Interchange Protocol:
thomasjpfan.com
Python's DataFrame interchange protocol specifies a zero-copy data interchange between Python DataFrame libraries, such as Pandas, Vaex, and Polars. This blog post explores how to read …
1
0
5
Pandas DataFrame output is now available for all sklearn transformers (in dev)! https://t.co/o4qdahQpZC This will make running pipelines on dataframes soo much easier, and provides better ways to track feature names! thanks to @thomasjpfan @glemaitre58 and Christian Lorentzen!
scikit-learn.org
This example will demonstrate the set_output API to configure transformers to output pandas DataFrames. set_output can be configured per estimator by calling the set_output method or globally by se...
13
143
616
git bisect can also help find the commit that fixes a bug. (Then you can back-port the commit to a release branch.)
0
0
1
📣 #PyDataNYC CFP EXTENDED We’ve had an amazing level of interest and submissions, and want to make sure everyone has a chance to submit. Submit by EoD Aug 28 https://t.co/9tx24OE7XG
Two can't miss @PyData events coming up, and the CFPs are NOW OPEN! #PyDataNYC 2022 (Nov 9-11) returns in-person after two-year hiatus 🎉 CFP closes Aug 24 https://t.co/tWO6jcP9nZ Virtual-first #PyDataGlobal 2022 (Dec 1-3) is BACK 🎉 CFP closes Sept 12 https://t.co/rvoyAP1PtT
1
8
6
What started out as a simple refactor led to a 15% runtime performance improvement for trees 😅
github.com
This PR replaces the use of sort in the tree splitter with simultaneous_sort. Running the following benchmark script: Benchmark import argparse from time import perf_counter import json from statis...
2
4
14
Now that PEP 646 has been accepted: https://t.co/TBvo60l0LN We will get to attach semantic meaning to an array's axes in the type signature! For example:
11
79
539
Just put together a small webpage to display the number of open/closed issues at @scikit_learn over the last year. The site updates every evening with stats for the current month: https://t.co/cZ5yx4B1qH
4
15
78
Doing some research on mgcv's API. The way R's formula can express constraints is quite nice. (drive is a categorical feature, sp is a smoothing parameter, k is the number of basis functions)
0
0
2
My most used terminal commands: history | awk 'BEGIN {FS="[ \t]+|\\|"} {print $3}' | sort | uniq -c | sort -nr | head -n 10
0
1
10
@thomasjpfan updating us on @scikit_learn ! - great work on community sprints from @reshamas @DataUmbrella et al. !!! 3/n
1
2
3