Jason Kim @jason_z_kim X Profile

Jason Kim

@jason_z_kim

Followers

2K

Following

245

Media

68

Statuses

300

Postdoctoral researcher at Cornell interested in representation and computation in latent spaces of biological and artificial neural networks.

Joined July 2010

Jason Kim

@jason_z_kim

1 year

Ever wanted a low-dimensional model of your data that you could be confident would explain data structure and accurately re-embed out-of-distribution data, all with minimal distortion of the geometry? Now you can with Γ-VAE! Demonstrated on gene expression data.

1

18

66

Jason Kim

@jason_z_kim

4 months

RT @LindenParkes: Excited to share the first major piece of work and preprint from my lab! Led by @jason_z_kim! 🥳🎉🤘

0

6

0

Jason Kim

@jason_z_kim

10 months

Working to understand the physics of how the brain works? Interested in understanding how brain function and collective dynamics emerge from neural interactions? Submit an abstract to our focus session "Statistical and Dynamical Physics of the Brain", APS2025 w/@ChrisWLynn!

0

6

30

Jason Kim

@jason_z_kim

1 year

RT @NatureProtocols: #FeaturedProtocol this week is for a Python-based #software package to apply network control theory to the #humanconne….

0

16

0

Jason Kim

@jason_z_kim

1 year

RT @LindenParkes: Our protocol paper for NCT is now online at @NatureProtocols!! Check it out here: @jason_z_kim @….

0

25

0

Jason Kim

@jason_z_kim

1 year

In summary, by accurately preserving the manifold tangent spaces in low-dimensional embeddings, we better preserve the geometry of the data, make our embeddings more interpretable, build trustworthy models with great out-of-distribution generalization, and uncover biology.

1

0

2

Jason Kim

@jason_z_kim

1 year

On this same dataset, Γ-VAE can also accurately re-embed cell gene expression on days 4 and 6 while only being trained on cells from day 2.

1

0

1

Jason Kim

@jason_z_kim

1 year

Γ-VAE works on single-cell RNAseq too. We look at a lineage tracing experiment in hematopoietic stem cells. Γ-VAE can separate undifferentiated cells along their eventual fates at 60% accuracy using 3 dimensions: the same as the original authors using hundreds of dimensions.

1

0

1

Jason Kim

@jason_z_kim

1 year

If we do this re-embedding across our 33 cancer tissues, we get incredible re-embedding consistency, meaning that the model you build is a model you can trust.

1

0

1

Jason Kim

@jason_z_kim

1 year

Zooming in, the re-embedding preserves crucial cancer phenotypes, including the separation of normal breast cancer tissues from triple-negative breast cancers, which are highly resistant to hormone therapy.

1

0

1

Jason Kim

@jason_z_kim

1 year

But the real test of a model is whether it can make predictions. And here we put Γ-VAE to the test. First we train a Γ-VAE on all of our data. Then, we completely remove all breast cancer samples, train a second Γ-VAE, and see where the points re-embed. It's the same picture!

1

0

1
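The hold-out check described in the post above can be sketched in a few lines. This is only an illustration under loud assumptions: plain PCA stands in for Γ-VAE (it is not the authors' model), the data are synthetic clusters on a linear 3-dimensional subspace, and the helper names `fit_pca`, `embed`, and `aligned_error` are hypothetical. It fits one model on all data and a second with one cluster removed, then compares where the held-out cluster lands in each, after aligning the two embeddings with an orthogonal Procrustes rotation.

```python
import numpy as np

def fit_pca(X, k):
    """Fit a k-component PCA (a stand-in for the real model)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T  # (feature mean, feature-by-k basis)

def embed(model, X):
    """Project data into the latent coordinates of a fitted model."""
    mean, comps = model
    return (X - mean) @ comps

def aligned_error(A, B):
    """Relative mismatch between two embeddings of the same points,
    after removing translation and the best orthogonal transform."""
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    u, _, vt = np.linalg.svd(B0.T @ A0)
    R = u @ vt  # orthogonal Procrustes solution mapping B0 onto A0
    return np.linalg.norm(B0 @ R - A0) / np.linalg.norm(A0)

# Synthetic data: 4 clusters on a 3-D subspace of a 20-D space.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 100)
centers = rng.normal(scale=3.0, size=(4, 3))
Z = centers[labels] + 0.3 * rng.normal(size=(400, 3))
W = rng.normal(size=(3, 20))
X = Z @ W + 0.01 * rng.normal(size=(400, 20))

held_out = labels == 0                      # cluster playing the removed tissue
model_all = fit_pca(X, k=3)                 # trained on everything
model_partial = fit_pca(X[~held_out], k=3)  # trained without the held-out cluster

emb_all = embed(model_all, X[held_out])
emb_partial = embed(model_partial, X[held_out])
err = aligned_error(emb_all, emb_partial)
print(f"relative re-embedding mismatch: {err:.4f}")
```

A small mismatch means the second model places unseen points where the first model put them. Here that is nearly guaranteed because the toy data are linear; the thread's point is that this consistency is much harder to obtain on curved, clustered data.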

Jason Kim

@jason_z_kim

1 year

We also capture meso-scale structure in carcinomas, namely the separation of squamous-cell and adenocarcinomas, and uncover a common axis from 9 healthy tissues from GTEx, and their corresponding adenocarcinomas from TCGA.

1

0

1

Jason Kim

@jason_z_kim

1 year

We uncover lots of beautiful biology, including the blood-brain barrier for the adaptive immune response, the p53 pathway that is often called the "guardian of the genome," and the epithelial to mesenchymal transition that is hijacked by cancer to metastasize.

1

0

1

Jason Kim

@jason_z_kim

1 year

Using this method, we construct a gently curved, 3-dimensional model of human gene expression for healthy tissues from the Genotype-Tissue Expression (GTEx) project and The Cancer Genome Atlas (TCGA), with nonlinear, slowly-varying axes.

1

0

1

Jason Kim

@jason_z_kim

1 year

We regularize this curvature to obtain Γ-VAE. This gives us a control knob on what are called the parameter-effects curvature and the extrinsic curvature, yielding nice, smooth manifolds with long correlation lengths in the tangent spaces.

1

0

1

Jason Kim

@jason_z_kim

1 year

Variational autoencoders (VAEs) excel at constructing statistical latent-variable models as generative manifolds through data. But when your data are in clusters, these manifolds are highly curved between clusters, so you don't know where you're going after one data cluster.

1

0

1

Jason Kim

@jason_z_kim

1 year

This is the case in human gene expression, where each healthy or cancer tissue has a distinct genetic signature, but they also have global trends that span across many clusters. UMAP excels at clustering the data, but fails to capture this meso-scale organization.

1

0

2

Jason Kim

@jason_z_kim

1 year

There are 101 ways to embed data, like PCA, UMAP and VAEs, and they are all excellent at different things. Something that's hard for every method is multiscale data. What happens when you have highly clustered data, but you want to understand the organization across clusters?

1

0

Jason Kim

@jason_z_kim

2 years

RT @apd_flynn: 🚀 Exciting news alert! 🚀 . Christoph Räth (@DLR_de) and I are thrilled to announce that our minisymposium proposal, 'Dynamic….

0

3

0

Jason Kim

@jason_z_kim

2 years

So if you're interested in more interpretable, translatable, and computationally more effective RNN models of how the brain uses dynamics to compute and run algorithms, give it a read!

0

9