New open source release from my team at Google: Dinosaur, a differentiable dynamical core for global atmospheric modeling, written in JAX:
Dinosaur is a core component of NeuralGCM and we hope it is useful for the weather/climate research community.
Are you a PhD student interested in machine learning and numerical modeling for weather & climate?
My team at Google Research is looking to hire a student researcher for this summer (and likely beyond) in either Mountain View, CA or Cambridge, MA.
I'm really excited to share this project that my team at
@GoogleAI
has been working on for the past year.
We show that ML + TPUs can accelerate fluid simulations by up to two orders of magnitude without compromising accuracy or generalization.
1/2
Excited to share "Machine learning accelerated computational fluid dynamics"
We use ML inside a CFD simulator to advance the accuracy/speed Pareto frontier
with
Jamie A. Smith
Ayya Alieva
Qing Wang
Michael P. Brenner
@shoyer
Academics often ask me if we have "academic freedom" in industrial research.
Of course we don't, but generally that isn't the real concern -- they want to know if I have control over my own destiny.
I tell them this is my favorite part about industry. 🧵
I love when ML researchers get excited about science, but seriously the reviewing process for scientific applications at ML conferences (e.g., ICLR) is entirely broken.
Papers with glaring errors are sailing through, without a single review from somebody with domain expertise.
Gradient checkpointing (aka rematerialization) is an easy trick that can save massive amounts of memory for calculating gradients.
If you differentiate through computation involving long iterative processes (like ODE solving), learn it and make it part of your toolkit!
👇🧵
We're developing a new protocol (__array_function__) that allows for alternate implementations of NumPy functions (e.g., for GPU, autodiff or units support):
It's now implemented on the master branch of NumPy -- please give it a try and report back!
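A minimal sketch of what the protocol enables (the `TaggedArray` class here is a made-up example; assumes a NumPy version where `__array_function__` is enabled, which it is by default since 1.17):

```python
import numpy as np

class TaggedArray:
    """Hypothetical duck array: a NumPy array plus a string label."""

    def __init__(self, data, tag):
        self.data = np.asarray(data)
        self.tag = tag

    def __array_function__(self, func, types, args, kwargs):
        # Called when a NumPy API function receives a TaggedArray argument.
        if func is np.concatenate:
            arrays = [a.data if isinstance(a, TaggedArray) else a
                      for a in args[0]]
            return TaggedArray(np.concatenate(arrays, **kwargs), self.tag)
        return NotImplemented  # unhandled functions raise TypeError

result = np.concatenate([TaggedArray([1, 2], "t"), TaggedArray([3], "t")])
# result is a TaggedArray wrapping array([1, 2, 3])
```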
Can machine learning improve physics constrained optimization tasks, like those at the heart of numerical weather forecasting?
Happy to share some our work on "Variational Data Assimilation with a Learned Inverse Observation Operator", tomorrow/today at ICML.
🧵
JAX now supports Google Cloud TPUs!
I contributed this example, solving a 2D wave equation with a spatially partitioned grid. The code is remarkably simple and all in pure Python!
Does anyone know a good intuitive explanation for the central limit theorem? I realized the other day that even though I use it all the time I can't really justify *why* it's true.
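Not an explanation of *why*, but the effect itself is easy to check empirically with a quick NumPy simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sum 30 independent uniforms, 10,000 times, then standardize.
sums = rng.uniform(size=(10_000, 30)).sum(axis=1)
z = (sums - sums.mean()) / sums.std()

# For a standard normal: skewness ~ 0 and excess kurtosis ~ 0.
skew = np.mean(z**3)
excess_kurtosis = np.mean(z**4) - 3
```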
My team at Google is looking to hire a PhD student intern for research on AI-based coupled Earth system modeling. This would be a full-time ~3 month position in summer or fall 2024 working in-person in Cambridge, MA with
@dkochkov1
and
@janniyuval
.
Something that I think is under-appreciated in the current AI mania is that more compute does not always result in better models. Sometimes, even with perfect knowledge, you can hit a wall.
A good example of this is weather prediction.
One lesson for writing NumPy/JAX code that took me a surprisingly long time to learn is to preserve array dimensionality whenever possible (i.e., always use keepdims=True).
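A small example of why: with keepdims=True, the reduced result broadcasts straight back against the original array.

```python
import numpy as np

x = np.arange(12, dtype=float).reshape(3, 4)

# x.sum(axis=1) has shape (3,), which does NOT broadcast against (3, 4):
#   x / x.sum(axis=1)  # ValueError
# With keepdims=True the result has shape (3, 1), which broadcasts cleanly:
row_norm = x / x.sum(axis=1, keepdims=True)  # each row now sums to 1
```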
The world's best weather forecast model is switching its numerics from double to single precision:
They reinvested the 40% runtime savings in an increase in vertical resolution from 91 -> 137 levels 😍
Honest question: why do ML researchers publish papers filled with giant tables?
Yes, it's nice to have as a reference for comparing to prior results, but couldn't we put nice plots in the paper and the uninterpretable raw numbers in an appendix?
I'm happy to share a new paper, with
@jaschasd
and
@samgreydanus
: "Neural reparameterization improves structural optimization"
We use neural nets to parameterize the inputs of a finite element method, and differentiate through the whole thing:
We are hiring PhD interns for my team at Google Research:
We use computational methods (especially ML) to advance research in a variety of scientific fields. For full consideration, please apply by January 15:
This paper "Optimal control of PDEs using physics informed neural networks" looks really nice:
Finally, an assessment of PINNs that includes a runtime comparison to classical adjoint methods!
I'm happy to share a new project on using machine learning for computational fluid dynamics, led by
@gideoknite
:
"Learning to correct spectral methods for simulating turbulent flows"
In this project led by
@leeley18
, we show that end-to-end training with differentiable physics results in extremely effective hybrid physics/ML models for density functional theory.
We discover that the prior knowledge embedded in the physics computation itself acts as an implicit regularization that greatly improves generalization of machine learning models for physics.
Please check out our recent paper:
I'm pleased to announce xarray v0.11:
This release includes:
- file-storage refactor for performance with
@dask_dev
- better support for calendars used in climate science
- lots of other miscellaneous API clean-ups, bug-fixes and enhancements
This year, Google's Research Scholar program for early-career professors is specifically soliciting proposals on large-language and multi-modal machine learning models for science:
Applications will open next week and are due by the end of November.
It is hard for me to imagine a better feeling than passing off a project to a total stranger from halfway around the world, who builds on it and takes it to new heights you never imagined 🥰
This is the true magic of open source software.
I hope I never read another paper claiming faster simulation with ML that only compares to sims used for training data.
Congrats, your method is faster and less accurate. So what? That costly reference simulation could almost certainly be made faster and less accurate, too.
We are in the midst of perhaps the greatest computing power buildup ever. Sure, the vast majority of it is going towards AI products, but at some point, between two LLM trainings, someone will decide to use at least a fraction of it for other purposes. Hey, let’s simulate
In industry, you do not have complete freedom within a job, but you have freedom to pick an organization to work for that aligns with your values.
Within a (good) job, nobody wants to tell you what to do. You succeed by showing that you can deliver on the organization's mission.
Today's PSA: don't use continuous colormaps if quantitative comparisons are important. Our eyes aren't good at comparing colors!
Example: left & right plots have same data & color scale, but colors on the right are binned. Despite strictly less info, the shape is easier to see.
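If you use matplotlib, `BoundaryNorm` is one way to bin a continuous colormap (illustrative levels; a sketch, not the code behind the plots above):

```python
import numpy as np
from matplotlib.colors import BoundaryNorm

# Bin a continuous scale into 8 discrete levels over [0, 1].
levels = np.linspace(0.0, 1.0, 9)   # 9 boundaries -> 8 bins
norm = BoundaryNorm(levels, ncolors=8)

# Then pass both to plotting, e.g.:
#   plt.imshow(data, cmap="viridis", norm=norm)
```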
If only it were this easy! Every time I've thought this was true, domain scientists have proved me wrong. Machine learning is no shortcut for domain knowledge.
I'm seeing a lot of skepticism from Physicists around my "Learn Physics in 2 Months" curriculum. Machine Learning enables people to make scientific discoveries without needing as much domain knowledge. See my lecture at CERN last year for an example
My first PR to CPython got merged. Yes, it's only a three line doc fix -- but it's exciting to have finally worked my way down to the bottom of the stack!
The secret is a differentiable CFD simulator written in JAX (soon to be open sourced!), which lets us do end-to-end optimization with hard physics priors.
And if you do that, you will soon find yourself accumulating more freedom in your job than you know what to do with.
It's not the same as the freedom of academia. It doesn't come all at once with a grant or tenure. But you earn it all the same, and it can be even more powerful.
xarray v0.10 has been released!
Highlights include:
- indexing with broadcasting over dimensions
- easier wrapping of functions written for NumPy (+ auto-parallelization with dask)
The arguments about symbolic vs deep learning for AI reminds me a lot of arguments about numerical methods vs ML for solving physics problems. Seems like hybrid methods are the way to go in both cases.
Deep Learning Is Hitting a Wall. What would it take for artificial intelligence to make real progress?
#longread
in
@NautilusMag
on one of the key technical questions in AI.
The weather forecast is improving… literally! Introducing WeatherBench 2, a benchmark for the next generation of data-driven, global weather forecast models, providing data, tools, & an evaluation platform. Learn how to use it and check out the website →
The hardest part of research is distinguishing proofs of concept that will actually scale vs perpetual toy examples.
Sadly every attempt I've seen at "intelligence" with neural nets seems to fall in the latter category.
Finally, we've open sourced our 2D spectral Navier-Stokes solver with high-order time-stepping in JAX-CFD:
It's differentiable and fast on GPU/TPU, so we hope it may be a useful reference for future ML for PDE efforts, e.g., like
@shoyer
1. A set of standard benchmarks. Compare against high-accuracy baselines (DNS/LES, flow fields should be shared publicly). The spectral code used to compute the flow field should be shared, as well as the CPU/GPU time needed to create the baselines. We DO need the equivalent 2/n
New
#xarray
release v0.10.1 is out, with new IO (Iris and Zarr support) and plotting options, among many other options. Thanks to everyone who contributed!
We've just released v2022.06.0!
This release brings a major internal refactor of the indexing functionality (thanks
@benbovy
,
@cziscience
), the use of flox in groupby operations, and experimental support for the new Python Array API standard.
xarray 0.12.2 was released over the weekend:
My favorite new features are the new N-D combine functions by
@TomTomnicholas1
and the ability to append to existing
#Zarr
stores by a team of four (!) contributors, including
@davidbrochart
&
@rabernat
.
I can't quite believe what showed up in my inbox today...
I'm not sure whether it's more exploitative to ask students to do unpaid internships (instead, they're charged £500), or open source projects to mentor them without compensation.
This is a nice example from EarthMover of how to tune Zarr data pipelines for loading ML training data into
@xarray_dev
.
Processing Zarr data on the fly is such a better paradigm than trying to anticipate access patterns with a preshuffled dataset on disk.
Slides for my talk on xarray at the ECMWF Python workshop
#Py4ESS
:
It was exciting to hear from so many in the weather data community (
@ECMWF
@bopensolutions
@PyTrollOrg
) about how they're using Xarray & Dask.
@docmilanfar
I don't think you need any of these to be a researcher! Being a researcher means you work on problems that you don't know are solvable by anyone.
Learning the laws of physics: 🥱😴
Learning how to solve the laws of physics: 🤔
Learning how to solve the laws of physics more efficiently than SOTA methods from scientific computing: 😍
For the record: Gemini's insistence on producing diverse images of people is a slightly glitchy feature, not a bug.
If ahistorical images bother you much more than the myriad other generative AI issues, maybe this would be a good opportunity for self-reflection...
To elaborate: I spent most of my 20s in grad school and feeling really dumb, surrounded by people who were better at math & computers than I ever could be.
Maybe this is the natural experience of doing a PhD but it sucked.
Industry is not a perfect system -- I've only experienced a few corporate cultures, and have undoubtedly benefited from a tremendous amount of privilege. And yes, there are absolutely pathological cases where it fails entirely.
Slightly surreal experience with this article. I'm quoted liberally, but I never spoke with this journalist! AFAICT all the quotes are cribbed from a recorded talk that was posted on YouTube.
I'm regularly floored by all the amazing things people are doing with JAX, and I'm so glad we can do our little bit to help researchers push the boundaries of humanity's knowledge and capabilities. Nice work
@shoyer
et al.!
@Thomas_ensc
@OpenAI
XLA (which JAX uses under the hood) is great, but it's solving a much higher level optimization problem to generate CUDA kernels from NumPy like code. Triton is lower level and thus offers more control to the programmer, for better or worse.
This is nice work, but improved RMSE on 10+ day deterministic forecasts doesn't mean your model is "substantially better." It means your model is blurrier.
The right baseline is ECMWF's probabilistic ensemble.
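A toy sketch of why RMSE rewards blur (synthetic made-up "forecasts", not real weather data): averaging two independent sharp forecasts lowers RMSE even though the average is smoother and less realistic.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = rng.normal(size=100_000)

# Two "sharp" forecasts: truth plus independent errors of the same size.
f1 = truth + rng.normal(size=100_000)
f2 = truth + rng.normal(size=100_000)
blurry = 0.5 * (f1 + f2)  # averaging smooths away realistic detail

def rmse(forecast):
    return np.sqrt(np.mean((forecast - truth) ** 2))

# rmse(blurry) ~ 1/sqrt(2), beating rmse(f1) ~ 1, despite being blurrier
```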
Introducing ClimaX, the first foundation model for weather and climate. A fast and accurate one-stop AI solution for a range of atmospheric science tasks.
Paper:
Blog:
Thread🧵
#ML
#Climate
#Weather
#FoundationModel
Revisiting ResNets: Improved Training and Scaling Strategies
- The original ResNets w/ better training + scaling strategies + minor arch change achieve SotA perf w/ faster speed.
- Training and scaling strategies matter more than architectural changes.
Here's an interactive visualization of 728 GB of
@ECMWF
ERA5 temperature data curated by
@pangeo_data
stored in
@zarr_dev
using Neuroglancer.
Be patient: each chunk is 64MB, so it's a little slow to load!
@stardazed0
@zarr_dev
We've been using it to read datasets created with
@xarray_dev
, and it works well! Neuroglancer even knows Xarray's convention for saving dimension names.
Would love to get a
@pangeo_data
demo going on a big climate dataset
I'm excited to announce the first beta of a new Python package that I've been working on:
It's (yet another) Gaussian process library in Python, this time built on JAX. It's meant to be both performant & pedagogical with ~4x as many lines of docs as code.
@xarray_dev
@ProjectJupyter
@dask_dev
Also: a special shout-out to
@benbovy
and
@JSignell
for xarray's new HTML repr for notebooks!
It makes it really easy to explore complex datasets (you can click all over to expand/collapse sections). It even wraps
@dask_dev
's HTML repr for showing off nested dask arrays!
The best part is that you may be only one short function decorator away from solving your memory problems for good!
If you use Python, take a look at jax.checkpoint, torch.utils.checkpoint or tf.recompute_grad.
I'm sure it's easy in Julia, too -- please reply if you know how!
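The idea itself is library-agnostic. Here's a minimal pure-Python/NumPy sketch (a toy chain of sin steps with hand-written derivatives, hypothetical function names) of the memory/recompute trade-off:

```python
import numpy as np

def step(x):
    return np.sin(x)

def dstep(x):
    return np.cos(x)

def grad_stored(x0, n):
    # Standard backprop: store all n intermediates (O(n) memory).
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1]))
    g = 1.0
    for i in reversed(range(n)):
        g *= dstep(xs[i])  # chain rule: product of local derivatives
    return g

def grad_checkpointed(x0, n, k):
    # Store only every k-th intermediate (O(n/k) memory) and recompute
    # the rest from the nearest checkpoint during the backward pass.
    ckpts = {0: x0}
    x = x0
    for i in range(1, n + 1):
        x = step(x)
        if i % k == 0:
            ckpts[i] = x
    g = 1.0
    for i in reversed(range(n)):
        base = (i // k) * k            # nearest stored checkpoint <= i
        x = ckpts[base]
        for _ in range(i - base):      # recompute forward to reach x_i
            x = step(x)
        g *= dstep(x)
    return g
```

Real implementations (like jax.checkpoint) recompute whole segments at once rather than one value at a time, but the trade is the same: less memory for extra forward work.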
@jeremyphoward
NumPy uses sorting for set operations, because it doesn't have any hashtable data structures. So np.isin(a, b) is OK for a large a and small b, but not the other way around.
I agree that it would be good to have some performance warnings in the docs.
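For example, the fast direction of that asymmetry:

```python
import numpy as np

a = np.arange(1_000_000)          # large array to test
b = np.array([3, 17, 999_999])    # small set of candidate values

# Cheap when b is small (per the note above); the reverse case,
# with a huge second argument, can be much slower.
mask = np.isin(a, b)              # boolean mask over a
```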
New research shows how
#machinelearning
can improve high-performance computing for solving partial differential equations, with potential applications that range from modeling
#climatechange
to simulating fusion reactions. Learn all about it here ↓
I will be at AGU next week. If you want to talk about ML-based weather/climate modeling (NeuralGCM, WeatherBench2, etc) or
@xarray_dev
please reach out!
I'll be presenting this work with
@samgreydanus
and
@jaschasd
tomorrow morning (Friday Dec 13) at 11:30am at the NeurIPS Deep Inverse Problems workshop ()
This program is for PhD students planning to graduate in 2024 or later:
Please apply on the linked page *and* let me know by email at shoyer@google.com
@chrmanning
There is undoubtedly loads of compute wasted on supercomputers, but simple AI shortcuts (like in this paper) are far from ready to replace physics based simulations. They don’t generalize in any meaningful way.
Interesting note: by far the most expensive part of training these sorts of models is *validation*. We have to run reference solvers on 32x higher resolution grids in space + time in order to rigorously measure the accuracy of our ML solver.
@raymondh
I think this is ill-advised in most cases - it's better to pick a stable serialization format for persistent data, and there's no shame in explicitly writing to disk. That said, joblib does a reasonable implementation of this: .
But as I learned from Tim Palmer's delightful book "The Primacy of Doubt", the situation for weather forecasting is actually far more dire: as Lorenz showed in 1969, each doubling of resolution only gives half the previous gains in accuracy.
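With illustrative numbers: if the first doubling of resolution buys one extra day of predictability and each subsequent doubling buys half as much, no amount of compute gets you past two extra days.

```python
# Illustrative numbers: the first doubling of resolution buys 1 extra
# day of predictability; each further doubling buys half as much.
gains = [1.0 * 0.5**k for k in range(20)]   # 20 doublings of resolution
total_extra_days = sum(gains)               # converges to 2.0 -- a wall
```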
@jaschasd
@samgreydanus
I'm happy to share:
(1) this paper was accepted for an oral presentation at the NeurIPS 2019 Deep Inverse workshop!
(2) we've released the source code, so you can now run it yourself!
@bilalmahmood
Bilal, I supported your campaign, but this is not a good look!
You should never have led a campaign ad by calling yourself a "neuroscientist" based on a year or two of part time lab experience as part of your undergrad degree. It diminishes the credentials of real scientists.
New paper: when to use gradients
DL researchers often compute derivatives through just about everything (physics simulators, optimization procedures, renderers). Sometimes these gradients are useful, other times they are not.
We explore why.
1/7
Hitting refresh in TensorBoard is the ML equivalent of pulling the lever on a slot machine. It's every bit as addictive, but possibly even more expensive!
interviewer: can you explain this gap in your CV
me: yeah I was trying to make a complicated figure in TikZ and lost track of time and—
interviewer: say no more
One of my regrets with
@xarray_dev
is that I copied end-inclusive slicing rules from pandas. At this point there's basically no way to fix it without silently breaking lots of code.
I should say that Pandas is a great resource for the Python community and enables my research. I just wish it didn't have such a strong "personality." Its own slicing rules (end of range is included), its own membership function (isin), etc.
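For example, pandas label-based slicing includes both endpoints, unlike Python's usual end-exclusive convention:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Label-based slicing in pandas includes BOTH endpoints:
assert len(s.loc["a":"b"]) == 2   # rows "a" AND "b"

# Standard Python slicing excludes the stop:
assert [10, 20, 30][0:1] == [10]
```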
If I were rewriting NumPy's API from scratch today, I would be tempted to make both array indexing and reductions over specific axes preserve rank, i.e.,
1. `x[i]` would be equivalent to `x[i:i+1]`
2. `x.sum(axis=0)` would be equivalent to `x.sum(axis=0, keepdims=True)`
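A quick illustration of today's rank-dropping defaults next to their rank-preserving equivalents:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# Today's defaults drop the indexed/reduced axis:
assert x[1].shape == (4,)
assert x.sum(axis=0).shape == (4,)

# The rank-preserving equivalents keep it:
assert x[1:2].shape == (1, 4)
assert x.sum(axis=0, keepdims=True).shape == (1, 4)
```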
Maybe I'm the problem: I want my work to influence domain scientists (and get sensible reviews), so I rarely submit to ML conferences (and thus don't get asked to review). 🤷
Fun bonus fact: data assimilation was one of the first use-cases motivating the development of auto-diff software.
E.g., see this 1993 paper
So using modern libraries like JAX for data assimilation is really going back to the roots of the field :)