Tung Nguyen Profile
Tung Nguyen

@tungnd_13

Followers: 1,097
Following: 849
Media: 44
Statuses: 249

Ph.D. Student in CS at UCLA. Working on Sequence Modeling and Decision Making.

Los Angeles, CA
Joined April 2020
Pinned Tweet
@tungnd_13
Tung Nguyen
1 year
Introducing ClimaX, the first foundation model for weather and climate. A fast and accurate one-stop AI solution for a range of atmospheric science tasks. Paper: Blog: Thread🧵 #ML #Climate #Weather #FoundationModel
Tweet media one
35
179
849
@tungnd_13
Tung Nguyen
7 months
Introducing Stormer, a scalable transformer model for skillful and reliable medium-range weather forecasting. Stormer achieves competitive accuracy for short-range, 1–7 day forecasts, while outperforming Pangu-Weather and GraphCast by a large margin for longer lead times. Paper:
Tweet media one
10
84
472
@tungnd_13
Tung Nguyen
2 years
Transformers show excellent capabilities in few-shot/meta learning, but have been mostly evaluated on accuracy-based metrics. How can we represent uncertainty in meta learning with transformers? We address this question in our new work at #ICML2022 !
3
24
162
@tungnd_13
Tung Nguyen
1 year
ClimateLearn, our PyTorch-based ML library for accessing climate datasets, state-of-the-art models, diverse evaluation metrics, and high-quality visualizations, just reached version 1.0.0! arXiv: Quickstart:
1
32
136
@tungnd_13
Tung Nguyen
7 months
I'm attending #NeurIPS2023 in New Orleans Dec 11-16 ✈️✈️✈️. I'll present two papers: 1. ExPT: Synthetic Pretraining for Few-Shot Experimental Design () 2. ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling
Tweet media one
Tweet media two
2
19
103
@tungnd_13
Tung Nguyen
8 months
Introducing ExPT, a general-purpose model for few-shot experimental design (ED) that combines unsupervised pretraining and in-context learning. ExPT solves challenging ED problems with only a handful of samples. To appear at #NeurIPS2023 ! Paper:
Tweet media one
3
19
66
@tungnd_13
Tung Nguyen
1 year
We just released the first version of ClimaX, which supports finetuning for the global forecasting task: Pretraining and other tasks are coming soon!
@tungnd_13
Tung Nguyen
1 year
Introducing ClimaX, the first foundation model for weather and climate. A fast and accurate one-stop AI solution for a range of atmospheric science tasks. Paper: Blog: Thread🧵 #ML #Climate #Weather #FoundationModel
Tweet media one
35
179
849
1
12
57
@tungnd_13
Tung Nguyen
2 years
Very honored to be selected as one of the Amazon fellows at UCLA! Thanks also to my PhD advisor @adityagrover_ for his constant support. #AmazonScience #MINT
@AmazonScience
Amazon Science
2 years
Congrats to the second cohort of Amazon-UCLA fellows, who are pursuing PhDs at @UCLAengineering . The fellowship assists the students in their pursuit of independent research projects in a variety of topics. Meet the newest group. 👋 #ArtificialIntelligence
0
9
28
6
2
38
@tungnd_13
Tung Nguyen
7 months
Super proud to be advised by one of the #ForbesUnder30 !
@adityagrover_
Aditya Grover
7 months
Incredibly honored to be in the #ForbesUnder30 list for 2024! Thanks to my incredible support team over the years - students, mentors, colleagues, friends, and family.
Tweet media one
51
14
418
1
0
36
@tungnd_13
Tung Nguyen
2 years
The first in-person conference ever! It was amazing to learn from and discuss with the best minds in ML research, and more importantly, to see my friends after such a long time. Really looking forward to the next one! #ICML2022 #Baltimore
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
0
35
@tungnd_13
Tung Nguyen
11 months
I’m at ICML and will present ClimaX at Poster #613 at 2pm. Excited to discuss AI research for weather and climate!
@tungnd_13
Tung Nguyen
1 year
Introducing ClimaX, the first foundation model for weather and climate. A fast and accurate one-stop AI solution for a range of atmospheric science tasks. Paper: Blog: Thread🧵 #ML #Climate #Weather #FoundationModel
Tweet media one
35
179
849
0
4
32
@tungnd_13
Tung Nguyen
2 years
@tsiprasd @shivamg_13 @percyliang Congrats! Just wanted to point out our work at ICML that studied the same problem: We considered more complicated functions, including functions sampled from a Gaussian Process, images, etc. We also applied the model to downstream decision-making tasks.
Tweet media one
2
4
27
@tungnd_13
Tung Nguyen
2 years
New paper out! We propose ConserWeightive BC, a simple but effective method for improving the performance and reliability of behavioral cloning methods such as DT and RvS in offline RL. @qqyuzu @adityagrover_
5
8
23
@tungnd_13
Tung Nguyen
1 year
This paper was the result of a fruitful year-long collaboration between UCLA & Microsoft. Very excited to see both the ML & climate community build on these results for next-generation climate science! w/ @jo_brandstetter , @akapoor_av8r , @rejuvyesh and @adityagrover_
4
1
21
@tungnd_13
Tung Nguyen
7 months
At the core of Stormer is a novel randomized forecasting objective, which trains the model to forecast the weather dynamics over varying time intervals of 6, 12, and 24 hours. The 6 and 12-hour values help to encourage the model to learn and resolve the diurnal cycle, while the
Tweet media one
1
1
16
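As a concrete illustration of this objective, here is a minimal training-step sketch, assuming an interval-conditioned model; the interface, interval set, and loss handling below are illustrative placeholders, not Stormer's actual code.

```python
import random

import torch

INTERVALS_H = (6, 12, 24)  # lead-time intervals sampled during training

def randomized_forecast_step(model, x_t, targets, loss_fn):
    """One step of the randomized objective (sketch): sample an interval
    and train the model to forecast the state that many hours ahead.
    `targets` maps an interval in hours to the ground truth at t + dt."""
    dt = random.choice(INTERVALS_H)
    # Conditioning on dt lets a single network learn 6-, 12-, and
    # 24-hour dynamics, which can later be composed to longer horizons.
    pred = model(x_t, lead_time=torch.tensor([dt]))
    return loss_fn(pred, targets[dt])
```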
@tungnd_13
Tung Nguyen
1 year
Architecture: ClimaX extends ViT with novel tokenization and aggregation modules that allow learning from heterogeneous data sources while remaining computationally efficient.
Tweet media one
1
1
16
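To make the tokenization and aggregation idea concrete, here is a minimal sketch, assuming per-variable patch embeddings followed by a learned-query cross-attention that merges the variables at each spatial location; all names and sizes are illustrative, not ClimaX's actual implementation.

```python
import torch
import torch.nn as nn

class VariableTokenizerSketch(nn.Module):
    """Per-variable patch embedding + cross-attention aggregation
    (illustrative sketch of the idea, not ClimaX's real modules)."""

    def __init__(self, n_vars, patch=4, dim=128):
        super().__init__()
        # One patch embedding per variable handles heterogeneous inputs.
        self.embeds = nn.ModuleList(
            nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
            for _ in range(n_vars)
        )
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.agg = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, x):  # x: (batch, n_vars, H, W)
        toks = torch.stack(
            [e(x[:, i:i + 1]).flatten(2).transpose(1, 2)  # (B, L, D) each
             for i, e in enumerate(self.embeds)], dim=2)  # (B, L, V, D)
        B, L, V, D = toks.shape
        kv = toks.reshape(B * L, V, D)
        q = self.query.expand(B * L, 1, D)
        out, _ = self.agg(q, kv, kv)      # V variable tokens -> 1 token
        return out.reshape(B, L, D)       # token sequence for the ViT
```

Aggregating across variables before the backbone keeps the sequence length at the number of patches rather than patches × variables, which is where the computational saving comes from.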
@tungnd_13
Tung Nguyen
1 year
Proud to be part of this great project. Looking forward to adding more features!
@adityagrover_
Aditya Grover
1 year
📢Introducing ClimateLearn, a new PyTorch library for accessing climate datasets, state-of-the-art ML models, and high quality training and visualization pipelines. Blog: Docs: Quickstart Colab: 🧵 (1/n)
17
275
1K
0
0
16
@tungnd_13
Tung Nguyen
7 months
Existing deep learning models for weather forecasting, while achieving impressive accuracy, often employ complex, customized architectures. In this work, we show that a simple transformer network can achieve state-of-the-art performance with a carefully designed training recipe.
1
0
16
@tungnd_13
Tung Nguyen
1 year
Forecasting: ClimaX can produce global and regional forecasts all the way from a few hours to days and weeks into the future. Better than or competitive with IFS as lead time grows!
Tweet media one
2
0
15
@tungnd_13
Tung Nguyen
4 months
A promising alternative architecture to transformers for modeling multi-dimensional data!
@li78658171
Shufan (Jack) Li
4 months
1/7 Introducing Mamba-ND: the latest advancement in the Mamba family. Mamba-ND extends Mamba to multi-dimensional data such as images and videos, outperforming transformer baselines with far fewer parameters and achieving linear complexity.
Tweet media one
2
35
166
0
0
15
@tungnd_13
Tung Nguyen
1 year
Climate Projections: ClimaX can also be used to project future climates under different greenhouse forcings. State-of-the-art performance on ClimateBench!
Tweet media one
1
0
15
@tungnd_13
Tung Nguyen
2 years
This is the first paper in my PhD with @adityagrover_ ! The paper is on arXiv and the source code is publicly available. Feel free to check them out! Paper: Code:
2
1
14
@tungnd_13
Tung Nguyen
1 year
Downscaling: ClimaX can downscale low-resolution outputs of climate models and fix biases. Outperforms all other CNN-style architectures.
Tweet media one
2
0
13
@tungnd_13
Tung Nguyen
1 year
Big update: the pretrained checkpoints are now publicly available!
0
2
12
@tungnd_13
Tung Nguyen
1 year
The current approach to numerical weather and climate modeling is to simulate a system of differential equations relating the flow of energy and matter in different Earth systems. The science is sound, but the simulations are often computationally expensive and imperfect at longer time scales.
2
1
11
@tungnd_13
Tung Nguyen
7 months
We evaluate the performance of Stormer on WeatherBench 2 (WB2) in forecasting 9 key climate variables at lead times from 1 to 14 days. Stormer is on par with the baselines for short-range, 1-7 day forecasts. At longer lead times, Stormer consistently outperforms the baselines by
Tweet media one
1
0
10
@tungnd_13
Tung Nguyen
5 months
Awesome work as always!
@khainb_ml
Khai Nguyen @ CVPR
5 months
I'm happy to share that our paper 'Quasi-Monte Carlo for 3D Sliced Wasserstein' (), where we empirically discuss the usage of spherical low-discrepancy point sets to approximate Sliced Wasserstein, has been accepted as a spotlight at #ICLR2024 (1/11).
Tweet media one
1
7
47
1
0
10
@tungnd_13
Tung Nguyen
1 year
Fortunately, weather and climate science is also a data-rich field, courtesy of satellites, radar, and other sensors. While numerical models do not scale with data, ML models benefit from both data and compute. What if we could distill knowledge of the Earth’s atmosphere into a large neural net?
1
1
10
@tungnd_13
Tung Nguyen
1 year
Unlike recent ML attempts aimed at specific tasks like weather forecasting, ClimaX is a foundation model that allows quick and easy adaptation to any spatiotemporal predictive task in the atmospheric sciences.
1
1
10
@tungnd_13
Tung Nguyen
1 year
Update: the code now supports regional forecasting
1
1
9
@tungnd_13
Tung Nguyen
7 months
We observe similar results in terms of the Anomaly Correlation Coefficient (ACC).
Tweet media one
1
0
9
@tungnd_13
Tung Nguyen
7 months
The Stormer architecture consists of a weather-specific embedding and a transformer backbone. The weather-specific embedding module embeds the input to a sequence of tokens, while modeling the non-linear interactions between climate variables in the input. The transformer
Tweet media one
1
0
8
@tungnd_13
Tung Nguyen
1 year
To enable this adaptation, we follow a pretrain-finetune regime. We propose to use climate simulation datasets (CMIP6) for pretraining, which enables finetuning on reanalysis datasets!
2
0
8
@tungnd_13
Tung Nguyen
7 months
Stormer achieves this competitive performance with much less compute and lower-resolution data. We train Stormer on 1.40625 degree data, while other models train on 0.25 degree data. Stormer requires less than 1 day of training on 128 A100s, while Pangu-Weather and GraphCast take
1
0
8
@tungnd_13
Tung Nguyen
2 years
I will present this work in the Transfer/Multitask/Meta Learning session tomorrow, and will also have a poster from 6.30pm to 8.30pm. Happy to discuss the paper with everyone! #ICML2022
@tungnd_13
Tung Nguyen
2 years
Transformers show excellent capabilities in few-shot/meta learning, but have been mostly evaluated on accuracy-based metrics. How can we represent uncertainty in meta learning with transformers? We address this question in our new work at #ICML2022 !
3
24
162
1
0
8
@tungnd_13
Tung Nguyen
1 year
@bhskrdtt @jo_brandstetter @akapoor_av8r @rejuvyesh @adityagrover_ The source code and checkpoints are coming out soon. Stay tuned!
0
1
7
@tungnd_13
Tung Nguyen
7 months
The randomized forecasting objective enables a single model, once trained, to generate various forecasts for a specified lead time T by considering different combinations of the intervals seen during training. As target lead times extend beyond 5–7 days and individual forecasts
1
0
7
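The composition idea can be sketched in a few lines: enumerate the ordered sequences of 6/12/24-hour steps that sum to a target lead time T, each sequence giving one forecast that can be averaged into an ensemble. This helper is illustrative, not Stormer's code.

```python
from functools import lru_cache

INTERVALS_H = (6, 12, 24)

@lru_cache(maxsize=None)
def interval_paths(T):
    """All ordered sequences of 6/12/24-hour steps summing to T hours."""
    if T == 0:
        return [()]
    paths = []
    for dt in INTERVALS_H:
        if dt <= T:
            paths += [(dt,) + rest for rest in interval_paths(T - dt)]
    return paths

# interval_paths(24) -> [(6,6,6,6), (6,6,12), ..., (12,12), (24,)]
# Running the model once per path and averaging yields the ensemble.
```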
@tungnd_13
Tung Nguyen
7 months
Finally, we show the favorable scaling properties of Stormer with respect to model size and number of training tokens.
Tweet media one
1
0
7
@tungnd_13
Tung Nguyen
7 months
We conducted extensive ablation studies to verify the importance of each component in Stormer. These span architecture design choices (weather-specific embedding and adaptive layer normalization) and the training objective (pressure-weighted loss and dynamics forecasting).
Tweet media one
Tweet media two
1
0
6
@tungnd_13
Tung Nguyen
2 years
Amazing work as always!
@khainb_ml
Khai Nguyen @ CVPR
2 years
In our new #NeurIPS2022 paper, we show that using multiple convolution layers with random kernels to map a probability measure over images to one dimension is better for the sliced Wasserstein than doing vectorization and then taking inner-product with a random direction. (1/7)
2
16
96
1
0
6
@tungnd_13
Tung Nguyen
1 year
@catshouldnt It’s in Vietnam 🇻🇳
0
0
4
@tungnd_13
Tung Nguyen
8 months
ExPT is accepted to #NeurIPS2023 . The paper is on arXiv and the source code is publicly available. Feel free to check them out! @adityagrover_ @TheSudhanshuAgr Paper: Code: Blog:
0
1
5
@tungnd_13
Tung Nguyen
2 years
Great work from our labmates!
@hbXNov
Hritik Bansal@CVPR
2 years
New paper📢 w/ @_shashankgoel_ @sbhatia_ R.Rossi, V. Vinay & @adityagrover_ ! We revisit the contrastive loss optimized by CLIP & identify a key shortcoming: image and text embeddings can lead to different predictions for downstream classification, which is fixed in CyCLIP. 🧵
1
9
35
0
0
4
@tungnd_13
Tung Nguyen
7 months
Joint work with my amazing collaborators @TheSudhanshuAgr @jasonjewik @hbXNov @ Prakhar Sharma @adityagrover_
0
0
4
@tungnd_13
Tung Nguyen
7 months
@nalkalc @hbXNov @TroyArcomano @Sand3e3p @ianfoster @adityagrover_ Thanks for reading our work! And our apologies for the oversight. We’ll make sure to cite MetNet in the next revision.
0
0
4
@tungnd_13
Tung Nguyen
1 year
Pre-processed data as used in our paper can also be loaded from HuggingFace: .
1
0
4
@tungnd_13
Tung Nguyen
2 years
While most neural processes parameterize this objective by a latent variable model, we propose a novel autoregressive factorization:
Tweet media one
1
0
4
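In the thread's notation, with context points (x_{1:m}, y_{1:m}) and target inputs x_{m+1:N}, the autoregressive factorization reads:

```latex
p\left(y_{m+1:N} \mid x_{1:N},\, y_{1:m}\right)
  = \prod_{i=m+1}^{N} p\left(y_i \mid x_{1:i},\, y_{1:i-1}\right)
```

Each target is predicted conditioned on all context points plus the previously observed targets, rather than through a global latent variable.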
@tungnd_13
Tung Nguyen
1 year
@shoyer You are right, deterministic forecasting at 10+ days might not be the best evaluation to do, but we did this to suggest the generality of ClimaX in working with multiple time scales. Using ClimaX to perform probabilistic forecasting is a potential future direction we’re exploring
1
0
4
@tungnd_13
Tung Nguyen
6 months
@FerranAlet @GoogleDeepMind Thank you for the great opportunity. I just applied and sent you an email!
0
0
2
@tungnd_13
Tung Nguyen
1 year
ClimateLearn reimagines the entire ML stack for data-driven climate science. Here are snapshots of some of the cool features in ClimateLearn.
Tweet media one
1
0
4
@tungnd_13
Tung Nguyen
1 year
@mattlungrenMD @MSFTResearch @pranavrajpurkar Thank you. Indeed, we hope the ClimaX architecture can be used in or extended to other domains.
0
0
3
@tungnd_13
Tung Nguyen
8 months
or conserving energy!
1
0
3
@tungnd_13
Tung Nguyen
1 year
Models: Load SOTA deep learning models for forecasting or downscaling, and customize them as you want.
Tweet media one
1
0
3
@tungnd_13
Tung Nguyen
2 years
Interestingly, TNPs are comparable to the baselines in parameter counts, training time, and prediction time. This indicates that scaling is not always necessary, and the transformer architecture itself is important for good performance.
Tweet media one
1
0
3
@tungnd_13
Tung Nguyen
2 years
TNPs outperform the current state-of-the-art NP-based methods on a wide range of tasks, especially on Bayesian optimization and contextual bandits, two decision-making problems that critically require uncertainty quantification:
Tweet media one
Tweet media two
1
0
3
@tungnd_13
Tung Nguyen
1 year
Data: Load your dataset in just a few lines of code, and slice it as you want.
Tweet media one
Tweet media two
Tweet media three
1
0
3
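ClimateLearn's exact calls are not reproduced here; as a rough sketch of what "load and slice in a few lines" means in practice, the raw equivalent with xarray looks like this (the file path, variable name, and region box are hypothetical):

```python
import xarray as xr

# Hypothetical ERA5-style NetCDF file; ClimateLearn wraps steps like
# these behind its own API, which is not reproduced here.
ds = xr.open_dataset("era5_t2m_5.625deg.nc")

subset = (
    ds.sel(time=slice("2015-01-01", "2016-12-31"))    # slice by time
      .sel(lat=slice(20, 60), lon=slice(230, 300))    # rough region box
)
t2m = subset["t2m"]  # 2-metre temperature as an xarray DataArray
```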
@tungnd_13
Tung Nguyen
1 year
In recent years, we have seen many promising works in data-driven climate and weather modeling. But, problems abound: difficult access to datasets, outdated model architectures, non-standardized evaluation protocols, and a general lack of reproducibility.
1
0
3
@tungnd_13
Tung Nguyen
8 months
Sample efficiency in ED is crucial due to high time, money, and safety costs. However, existing ML approaches require active data collection or access to large labeled datasets. This is impractical in many real-world scenarios such as materials or protein design.
1
0
2
@tungnd_13
Tung Nguyen
8 months
ExPT follows a pretraining-adaptation approach for hyper-efficient ED. We first pretrain the model on unlabeled data, i.e., input designs x’s. During adaptation, the model adapts to the downstream task using a few labeled examples (x,y) from past experiments.
Tweet media one
1
0
2
@tungnd_13
Tung Nguyen
2 years
The first attempt is to apply a vanilla transformer architecture. As this model treats x and y separately, it needs a positional embedding to associate them as a pair. This breaks both conditions, as the model depends on the positions of input tokens.
Tweet media one
1
0
2
@tungnd_13
Tung Nguyen
8 months
To facilitate efficient adaptation, we use the unlabeled inputs to generate pretraining data from other synthetic functions. By few-shot learning from a diverse set of functions, the model is able to generalize quickly to any target objective during the adaptation phase.
1
0
2
@tungnd_13
Tung Nguyen
3 years
@VinAI_Research is one of the top research companies at ICML 2021 @icmlconf , despite being very young (2 years old)! And I'm happy to have contributed one paper to this amazing result 😄
@SergeyI49013776
Sergey Ivanov
3 years
Top companies: Google, Microsoft, DeepMind (UK), FB, Amazon, IBM, Huawei (China), Tencent (China), Apple, Alibaba (China). Huawei (7 in 2020 -> 14 in 2021) Tencent (3 -> 10)
Tweet media one
2
2
12
0
0
2
@tungnd_13
Tung Nguyen
8 months
Data generation: We generate synthetic data from Gaussian Processes with an RBF kernel. They are a natural choice as they represent distributions over functions, are easy and cheap to sample from, and are universal function approximators.
Tweet media one
1
0
2
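A minimal, self-contained version of this sampling step in NumPy; the lengthscale, jitter, and input grid are placeholder choices rather than the paper's exact settings.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel matrix between two 1-D input sets."""
    sq = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def sample_gp_functions(x, n_functions=4, seed=0):
    """Draw function values at inputs `x` from a zero-mean GP prior;
    each draw serves as one synthetic objective for pretraining."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(x, x) + 1e-6 * np.eye(len(x))  # jitter for stability
    L = np.linalg.cholesky(K)
    return (L @ rng.standard_normal((len(x), n_functions))).T

x = np.linspace(-1, 1, 64)    # stand-in for the unlabeled input designs
fs = sample_gp_functions(x)   # shape (4, 64): four synthetic functions
```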
@tungnd_13
Tung Nguyen
2 years
In principle, any sequence model (e.g., GPT) can optimize this objective; however, an architecture designed for meta learning has to satisfy two desiderata: 1) permutation-invariant to the context points, and 2) permutation-equivariant to the target points.
1
1
2
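Written out, for a permutation \pi of the context pairs and a permutation \sigma applied jointly to the target inputs and outputs, the two desiderata are:

```latex
\text{(1)}\quad
p\bigl(y_{m+1:N} \mid \pi\!\left(x_{1:m}, y_{1:m}\right),\, x_{m+1:N}\bigr)
  = p\bigl(y_{m+1:N} \mid x_{1:m},\, y_{1:m},\, x_{m+1:N}\bigr)

\text{(2)}\quad
p\bigl(\sigma\!\left(y_{m+1:N}\right) \mid x_{1:m},\, y_{1:m},\, \sigma\!\left(x_{m+1:N}\right)\bigr)
  = p\bigl(y_{m+1:N} \mid x_{1:m},\, y_{1:m},\, x_{m+1:N}\bigr)
```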
@tungnd_13
Tung Nguyen
1 year
@miniapeur Yes. It definitely saved me lots of time
0
0
2
@tungnd_13
Tung Nguyen
8 months
@kklmmr @MarcCoru @rolf_comma_e @devistuia Great work. Congrats! We've been wondering what the best positional embedding for geospatial data should be, and this seems like a nice answer
1
0
2
@tungnd_13
Tung Nguyen
2 years
For each task in meta learning, we provide a small set of labeled (context) points, and the model makes predictions for a set of unlabeled (target) points. As the supervision is limited, the model outputs a joint distribution of the target points to account for uncertainty.
1
0
2
@tungnd_13
Tung Nguyen
8 months
@MarcCoru @kklmmr @rolf_comma_e @devistuia Yes, the data is for the entire globe. Thanks for the insights!
0
0
0
@tungnd_13
Tung Nguyen
1 year
Tasks: Downscaling, Climate Projections, and Weather Forecasting, all in one.
Tweet media one
1
0
2
@tungnd_13
Tung Nguyen
7 months
@miniapeur So the lower it is, the better?
0
0
2
@tungnd_13
Tung Nguyen
1 year
We welcome community contributions: new datasets, model baselines, bug reports, and broader feedback of any kind! Our code is open-sourced at:
1
0
2
@tungnd_13
Tung Nguyen
8 months
Model architecture: ExPT employs an encoder-decoder architecture. The Transformer encoder encodes the information of the context points and the target value y, and the VAE decoder generates the input design x conditioned on the encoder’s output.
Tweet media one
1
0
2
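An illustrative PyTorch skeleton of this layout; for brevity the VAE decoder is stood in by a deterministic MLP, and every name and size here is a placeholder rather than ExPT's actual implementation.

```python
import torch
import torch.nn as nn

class ExPTSketch(nn.Module):
    """Encoder-decoder sketch: a transformer encoder reads the few-shot
    context (x, y) pairs plus the desired target value y*, and a decoder
    generates a candidate design x conditioned on the encoder output."""

    def __init__(self, x_dim, d=128):
        super().__init__()
        self.embed_xy = nn.Linear(x_dim + 1, d)   # context pair -> token
        self.embed_y = nn.Linear(1, d)            # target value -> token
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Stand-in for ExPT's VAE decoder (deterministic MLP here).
        self.decoder = nn.Sequential(
            nn.Linear(d, d), nn.GELU(), nn.Linear(d, x_dim))

    def forward(self, ctx_x, ctx_y, target_y):
        # ctx_x: (B, m, x_dim), ctx_y: (B, m, 1), target_y: (B, 1, 1)
        toks = torch.cat(
            [self.embed_xy(torch.cat([ctx_x, ctx_y], dim=-1)),
             self.embed_y(target_y)], dim=1)      # (B, m + 1, d)
        h = self.encoder(toks)
        return self.decoder(h[:, -1])             # design x for target y*
```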
@tungnd_13
Tung Nguyen
8 months
jumping
1
0
2
@tungnd_13
Tung Nguyen
8 months
We evaluate ExPT on four challenging tasks, constructed using only 1% of the data available in Design-Bench. We consider two scenarios where we sample the 1% data randomly (random) or use the worst 1% data (poor).
1
0
2
@tungnd_13
Tung Nguyen
7 months
@raspstephan @hbXNov @TroyArcomano @Sand3e3p @ianfoster @adityagrover_ We gave it some thought. I think it'll definitely help for long lead times, but may hurt performance at shorter horizons, as I've seen with ClimaX.
0
0
2
@tungnd_13
Tung Nguyen
2 years
@rahiment haven’t got a single reply despite sending daily reminders…
0
0
2
@tungnd_13
Tung Nguyen
2 years
Neural Processes formalize this problem as learning a conditional distribution of the target labels y_{m+1:N}, given the context points (x_{1:m}, y_{1:m}) and the target inputs x_{m+1:N}.
Tweet media one
1
0
2
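In symbols, with m context points and N - m targets, the modeled conditional is

```latex
p_\theta\left(y_{m+1:N} \mid x_{1:m},\, y_{1:m},\, x_{m+1:N}\right)
```

and meta-training typically maximizes its log-likelihood over many sampled tasks.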
@tungnd_13
Tung Nguyen
8 months
In both scenarios, ExPT significantly outperforms all baselines. In particular, in the more challenging poor setting, ExPT beats the second-best method by 70% on average across the four tasks in terms of mean performance.
Tweet media one
Tweet media two
1
0
2
@tungnd_13
Tung Nguyen
2 years
TNP-A is expressive but can be slow during inference due to sequential prediction. We introduce two simplified variants, TNP-D and TNP-ND to trade off expressivity and simplicity. TNP-D predicts the target points independently, while TNP-ND predicts them jointly.
1
0
2
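In the same notation, the two variants factorize the prediction differently; the joint form for TNP-ND is written here as a multivariate Gaussian with mean \mu and full covariance \Sigma produced in a single pass, which is one natural way to realize a joint prediction:

```latex
\text{TNP-D:}\quad
p\left(y_{m+1:N} \mid x_{1:N},\, y_{1:m}\right)
  = \prod_{i=m+1}^{N} p\left(y_i \mid x_{1:m},\, y_{1:m},\, x_i\right)

\text{TNP-ND:}\quad
p\left(y_{m+1:N} \mid x_{1:N},\, y_{1:m}\right)
  = \mathcal{N}\left(y_{m+1:N};\, \mu,\, \Sigma\right)
```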
@tungnd_13
Tung Nguyen
1 year
Easily produce visualizations for common metrics to gain insights into model performance.
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
0
2
@tungnd_13
Tung Nguyen
4 months
@suvarna_ashima @hbXNov They must hate Harry Potter :)
0
0
2
@tungnd_13
Tung Nguyen
8 months
For each synthetic function, we sample a set of points using the unlabeled data as inputs. We divide these points into a context set and a target set, and train the model to perform in-context generation of the target input x given the context points and the target y.
Tweet media one
1
0
2
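A small sketch of this context/target split for one synthetic function; the sizes and sampling are placeholder choices, and it pairs with the GP sampler sketched earlier in the thread.

```python
import numpy as np

def make_incontext_task(x_pool, f_vals, m=10, k=5, rng=None):
    """Sample m context and k target points from one synthetic function.
    The model is shown (x_ctx, y_ctx) plus a target value y and trained
    to generate the corresponding input x (in-context generation)."""
    if rng is None:
        rng = np.random.default_rng()
    idx = rng.permutation(len(x_pool))[:m + k]
    xs, ys = x_pool[idx], f_vals[idx]
    return (xs[:m], ys[:m]), (xs[m:], ys[m:])   # (context, target)
```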
@tungnd_13
Tung Nguyen
8 months
Moreover, unsupervised pretraining allows ExPT to adapt to any objective in the same domain purely via in-context learning. In other words, a single pretrained ExPT can generate robot morphologies that are optimal for running
1
0
2
@tungnd_13
Tung Nguyen
2 years
For further discussion and results, please check out our paper: . We open-source our implementation at .
0
0
2
@tungnd_13
Tung Nguyen
7 months
@adityagrover_ Huge congrats!
1
0
1
@tungnd_13
Tung Nguyen
2 years
We propose TNPs, a novel transformer-based model that satisfies these conditions. We concatenate x and y to form an input token, which removes the need for positional encoding. TNPs employ a custom masking mechanism to preserve autoregressive ordering.
Tweet media one
1
0
2
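A rough sketch of such a mask, assuming context tokens are visible to everyone while target tokens follow a causal order among themselves; the real TNP mask also distinguishes query tokens, so treat this as a simplification.

```python
import torch

def tnp_attention_mask(m, n):
    """Boolean mask (True = blocked) for m context tokens followed by
    n - m target tokens: context attends only to context, and target
    token i additionally attends to targets j <= i."""
    mask = torch.ones(n, n, dtype=torch.bool)
    mask[:, :m] = False                      # all tokens see the context
    idx = torch.arange(m, n)
    # causal structure among the target tokens
    mask[idx.unsqueeze(1), idx.unsqueeze(0)] = idx.unsqueeze(0) > idx.unsqueeze(1)
    return mask

# tnp_attention_mask(2, 5): rows 2-4 see columns 0-1 and earlier targets.
```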
@tungnd_13
Tung Nguyen
1 year
@RyanHuynh1108 Congrats anh and the team!
0
0
2
@tungnd_13
Tung Nguyen
8 months
The analysis shows that the model’s performance improves consistently throughout training, demonstrating the effectiveness of synthetic pretraining.
Tweet media one
1
0
2
@tungnd_13
Tung Nguyen
5 months
@DuongLe54055716 Have been following your work for a while. Congrats Duong!
0
0
1
@tungnd_13
Tung Nguyen
2 years
@hbXNov Congrats Hritik!!!
0
0
1
@tungnd_13
Tung Nguyen
1 year
@vitusbenson @jo_brandstetter @akapoor_av8r @rejuvyesh @adityagrover_ And we pretrained on 5 datasets so it’s 128x5 in total
0
0
1
@tungnd_13
Tung Nguyen
1 year
@pentagoniac @jo_brandstetter @akapoor_av8r @rejuvyesh @adityagrover_ ClimaX is applicable to any gridded prediction task, whether the input variables are seen or unseen during training.
0
0
1
@tungnd_13
Tung Nguyen
2 years
@huynm99 Congrats! Very well-deserved!
0
0
1