Tom Young (@tomyoung903)

Followers 57 · Following 722 · Media 2 · Statuses 124

Research fellow working on large language models @NUS w/ folks such as @XueFz and @YangYou1991

Singapore · Joined April 2017
Tom Young (@tomyoung903) · 2 years
Do Masked Language Models (e.g., PaLM2) inherently learn self-consistent token distributions? 🤔🤔 Our experiments show: No.❗And we can ensemble multiple inconsistent distributions to improve the model’s accuracy. 📈 Paper: https://t.co/pgbP3Nf05w Code: https://t.co/kCiDOEeGU0
Replies 0 · Reposts 1 · Likes 9
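The ensembling trick in this tweet lends itself to a short sketch. Below is a toy illustration, not the paper's actual method or data: a masked LM can assign different, mutually inconsistent distributions to the same blank depending on how the context is presented, and averaging those distributions before taking the argmax can beat any single one. The vocabulary and probabilities are invented.

```python
import numpy as np

# Three hypothetical distributions the model assigns to the SAME masked
# position under three different conditionings of the context.
vocab = ["paris", "london", "rome"]
p1 = np.array([0.48, 0.42, 0.10])
p2 = np.array([0.30, 0.55, 0.15])
p3 = np.array([0.50, 0.20, 0.30])

# The individual argmaxes disagree between "paris" and "london"; a uniform
# mixture of the inconsistent distributions settles on "paris".
ensemble = (p1 + p2 + p3) / 3
print(vocab[int(np.argmax(ensemble))])  # -> paris
```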
Andrew Ng (@AndrewYNg) · 28 days
AI agents are getting better at looking at different types of data in businesses to spot patterns and create value. This is making data silos increasingly painful. This is why I increasingly try to select software that lets me control my own data, so I can make it available to my…
Card: deeplearning.ai · The Batch AI News and Insights: AI agents are getting better at looking at different types of data in businesses to spot patterns and create value...
Replies 129 · Reposts 355 · Likes 2K
Zonglin Yang (@Yang_zy223) · 9 months
We are at the dawn of a "Cambrian Explosion" in scientific discovery. A super AI science copilot (or even pilot) is on the horizon. As AI frees humanity from routine labor and enhances education, we can fully harness our intelligence—investing our efforts (with AI’s aid) into…
Replies 0 · Reposts 1 · Likes 1
Jinjie Ni (@NiJinjie) · 10 months
Thrilled to share that MixEval-X has been selected for a spotlight at #ICLR2025! Our real-world-aligned, efficient evaluation method pushes the frontier of multi-modal model development. It would be useful to you whether you’re working on MMLM, AIGC, or any2any models. 🪐Brace…
Quoting Jinjie Ni (@NiJinjie) · 1 year
🏇 Frontier players are racing to solve modality puzzles in the quest for AGI. But to get there, we need consistent, high-standard evaluations across all modalities! 🚀 Introducing MixEval-X, the first real-world, any-to-any benchmark. Inheriting the philosophy from MixEval, …
Replies 9 · Reposts 8 · Likes 20
Zonglin Yang (@Yang_zy223) · 11 months
Excited to share that our paper on automated scientific discovery has been accepted to #ICLR2025! In brief, 1. It shows that LLMs can rediscover the main innovations of many research hypotheses published in Nature or Science. 2. It provides a mathematically proven theoretical…
Quoting Zonglin Yang (@Yang_zy223) · 1 year
Given only a chemistry research question, can an LLM system output novel and valid chemistry research hypotheses? Our answer is YES!!! 🚀 It can even rediscover hypotheses published in Nature, Science, or venues of a similar level. Preprint: https://t.co/xJQr0JrIzO Code: …
Replies 3 · Reposts 11 · Likes 30
Jinjie Ni (@NiJinjie) · 1 year
That’s why we built MixEval-X. Test your multi-modal models in real-world tasks here: https://t.co/Jz2THe24I3 It’s easy to run!
Quoting Google DeepMind (@GoogleDeepMind) · 1 year
Gemini 2.0 Flash Experimental has the ability to produce native audio in a variety of styles and languages - all from scratch. 🗣️ Here’s how this is different to traditional text-to-speech systems ↓ https://t.co/FRWb3q3KHe
Replies 0 · Reposts 4 · Likes 11
Jinjie Ni (@NiJinjie) · 1 year
🔥 Congrats to GDM on landing Fuzhao! Needless to say how challenging it is to get recruited by the GDM Gemini team, especially for an SG-based PhD student. What I want to mention here are the three most important merits I have observed in Fuzhao during the four years…
Quoting Fuzhao Xue (Frio) (@XueFz) · 1 year
Life Update: I’m joining Google DeepMind as a Senior Research Scientist after three incredible years of PhD (aka "Pretrain myself harD" 😄). I’ll be contributing to Gemini pretraining and multi-modality research. I feel incredibly fortunate to report to @m__dehghani again, and…
Replies 1 · Reposts 2 · Likes 16
Jinjie Ni (@NiJinjie) · 2 years
How to get ⚔️Chatbot Arena⚔️ model rankings with 2000× less time (5 minutes) and 5000× less cost ($0.6)? Maybe simply mix the classic benchmarks. 🚀 Introducing MixEval, a new 🥇gold-standard🥇 LLM evaluation paradigm standing on the shoulders of giants (classic benchmarks).
Replies 31 · Reposts 66 · Likes 245
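One way to read "simply mix the classic benchmarks" is as learning a cheap mapping from static-benchmark scores to Arena-style ratings, then scoring new models without any human voting. The sketch below is my own toy interpretation, with invented benchmark names, invented numbers, and a plain least-squares fit; it is not MixEval's actual recipe.

```python
import numpy as np

# Rows: anchor models with known Arena ratings; columns: accuracies on
# three classic benchmarks (all numbers are made up for illustration).
X = np.array([[0.86, 0.92, 0.80],
              [0.79, 0.81, 0.74],
              [0.70, 0.55, 0.68],
              [0.63, 0.40, 0.60]])
elo = np.array([1250.0, 1180.0, 1090.0, 1020.0])  # known Arena ratings

# Least-squares fit of elo ~ X @ w + b on the anchor models.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, elo, rcond=None)

# Rate a new model from its cheap static-benchmark scores alone.
new_model = np.array([0.82, 0.85, 0.77, 1.0])  # trailing 1.0 is the bias term
print(f"predicted rating: {new_model @ coef:.0f}")
```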
Yang You (@YangYou1991) · 2 years
Say hello to Grok-1's new PyTorch+HuggingFace edition! 🚀 314 billion parameters, 3.8x faster inference. Easy to use, open-source, and optimized by Colossal-AI. 🤖 Dive in: #Grok1 #ColossalAI🌟 https://t.co/D1P1XM1rEs Download Now: https://t.co/Fg3BmiNRzT
Card: huggingface.co
Replies 33 · Reposts 108 · Likes 679
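The tweet's links are shortened, so the snippet below uses a placeholder repository id; it is only a guess at what "easy to use" might look like via the standard transformers loading API, and a 314B-parameter MoE would realistically need multi-GPU sharding behind device_map="auto".

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "some-org/grok-1-pytorch"  # placeholder id, not the real repository

tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,  # custom modeling code shipped with the repo
    device_map="auto",       # shard across available GPUs via accelerate
    torch_dtype="auto",      # use the checkpoint's native precision
)

inputs = tok("Hello, Grok!", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```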
Jinjie Ni (@NiJinjie) · 2 years
I was impressed by the idea the first time I came across it. DMC demonstrates the possibility of compressing the KV cache in a smarter way, and KV cache compression is just sooooo important nowadays! Glad to see the huge leap in LLM efficiency! Kudos to the team! 🔥🔥🔥
Quoting Piotr Nawrot (@p_nawrot) · 2 years
The memory in Transformers grows linearly with the sequence length at inference time. In SSMs it is constant, but often at the expense of performance. We introduce Dynamic Memory Compression (DMC), where we retrofit LLMs to compress their KV cache while preserving performance…
Replies 0 · Reposts 3 · Likes 4
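As intuition for the quoted thread, here is a rough toy of the DMC decision: at each decoding step a learned gate either appends the new key/value pair to the cache or merges it into the most recent slot by weighted averaging, so the cache grows sublinearly. The gate value and merge weight below are placeholders, not DMC's trained predictors.

```python
import torch

def dmc_step(keys, values, k_new, v_new, append_prob, alpha=0.5):
    """keys/values: (slots, dim). append_prob: learned gate in [0, 1]."""
    if append_prob > 0.5:
        # APPEND: the cache grows by one slot, as in a vanilla KV cache.
        keys = torch.cat([keys, k_new[None]], dim=0)
        values = torch.cat([values, v_new[None]], dim=0)
    else:
        # MERGE: accumulate into the last slot; memory stays constant.
        keys[-1] = alpha * keys[-1] + (1 - alpha) * k_new
        values[-1] = alpha * values[-1] + (1 - alpha) * v_new
    return keys, values

keys, values = torch.randn(4, 64), torch.randn(4, 64)
keys, values = dmc_step(keys, values, torch.randn(64), torch.randn(64),
                        append_prob=torch.rand(1).item())
print(keys.shape[0])  # 4 or 5 slots, depending on the gate
```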
Fuzhao Xue (Frio) (@XueFz) · 2 years
Big congrats to @DrJimFan and @yukez!!! I’m super lucky to be part of the GEAR team! I do believe that Generalist Embodied Agent is the next step towards AGI. Join GEAR and let’s make some exciting things together!
Quoting Jim Fan (@DrJimFan) · 2 years
Career update: I am co-founding a new research group called "GEAR" at NVIDIA, with my long-time friend and collaborator Prof. @yukez. GEAR stands for Generalist Embodied Agent Research. We believe in a future where every machine that moves will be autonomous, and robots and…
Replies 0 · Reposts 1 · Likes 32
Tom Young (@tomyoung903) · 2 years
You don't see a paper like this every day!
Quoting AK (@_akhaliq) · 2 years
Neural Network Diffusion: Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an autoencoder and a…
Replies 0 · Reposts 0 · Likes 3
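The quoted abstract names two ingredients, an autoencoder and a diffusion model over parameters. Below is a self-contained toy of that pipeline under my own simplifying assumptions: random vectors stand in for trained parameters, a linear autoencoder compresses them, and a tiny DDPM runs in the latent space. It shows the shape of the method, not the paper's setup.

```python
import torch
import torch.nn as nn

P, D, Z, T = 256, 512, 16, 100      # bank size, param dim, latent dim, steps
bank = torch.randn(P, D)            # stand-in for flattened trained networks

enc, dec = nn.Linear(D, Z), nn.Linear(Z, D)                  # toy autoencoder
denoiser = nn.Sequential(nn.Linear(Z + 1, 128), nn.ReLU(), nn.Linear(128, Z))
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters(),
                        *denoiser.parameters()], lr=1e-3)
betas = torch.linspace(1e-4, 0.02, T)
abar = torch.cumprod(1 - betas, dim=0)   # DDPM cumulative noise schedule

for _ in range(200):                     # joint toy training loop
    z = enc(bank)
    recon = (dec(z) - bank).pow(2).mean()           # autoencoder loss
    t = torch.randint(0, T, (P,))
    eps = torch.randn_like(z)
    zt = abar[t, None].sqrt() * z.detach() + (1 - abar[t, None]).sqrt() * eps
    pred = denoiser(torch.cat([zt, t[:, None] / T], dim=1))
    loss = recon + (pred - eps).pow(2).mean()       # + denoising loss
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                    # reverse process: sample new params
    z = torch.randn(1, Z)
    for t in reversed(range(T)):
        eps = denoiser(torch.cat([z, torch.tensor([[t / T]])], dim=1))
        z = (z - betas[t] / (1 - abar[t]).sqrt() * eps) / (1 - betas[t]).sqrt()
        if t:
            z = z + betas[t].sqrt() * torch.randn_like(z)
print(dec(z).shape)                      # one generated parameter vector
```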
Fuzhao Xue (Frio) (@XueFz) · 2 years
(1/5)🚀 Our OpenMoE Paper is out! 📄 Including: 🔍ALL Checkpoints 📊 In-depth MoE routing analysis 🤯Learning from mistakes & solutions Three important findings: (1) Context-Independent Specialization; (2) Early Routing Learning; (3) Drop-towards-the-End. Paper Link: …
Replies 5 · Reposts 101 · Likes 512
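The "Drop-towards-the-End" finding lends itself to a tiny illustration: with a fixed per-expert capacity and tokens admitted in sequence order, overflow necessarily falls on the later positions. Everything below (router logits, expert count, capacity) is made up for illustration.

```python
import torch

seq_len, n_experts, capacity = 12, 2, 4
logits = torch.randn(seq_len, n_experts)   # stand-in router scores
choice = logits.argmax(dim=-1)             # top-1 expert per token

used, dropped = [0] * n_experts, []
for pos in range(seq_len):                 # tokens arrive in sequence order
    e = int(choice[pos])
    if used[e] < capacity:
        used[e] += 1                       # admitted to expert e
    else:
        dropped.append(pos)                # expert full: token is dropped

print("dropped positions:", dropped)       # skews toward late positions
```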
Tom Young (@tomyoung903) · 2 years
Fuzhao shared the development details of OpenMoE every time I asked. A lot of detail goes into training LLMs.
Quoting Jim Fan (@DrJimFan) · 2 years
Mistral magnet link is awesome, but let’s get the timelines straight. My student and NVIDIA intern Fuzhao open-sourced a decoder-only MoE 4 months ago. Google Switch Transformer, a T5-based MoE, was open > a year ago. MoE isn’t new. It just didn’t get as much attention routed…
Replies 0 · Reposts 0 · Likes 4
Tom Young (@tomyoung903) · 2 years
As a postdoc working on AI, I still spend a lot of time trying to improve my coding skills 😂. Would love to see more tweets about that.
Quoting Greg Brockman (@gdb) · 2 years
People often ask if ML or software skills are more the bottleneck to AI progress. It’s the wrong question—both are invaluable, and people with both sets of skills can have outsized impact. We find it easier, however, to teach people ML skills as needed than software engineering.
Replies 0 · Reposts 0 · Likes 2
Tom Young (@tomyoung903) · 2 years
This pipeline suits many companies, I imagine 🤔🤔
Quoting Jim Fan (@DrJimFan) · 2 years
NVIDIA basically compressed 30 years of its corporate memory into 13B parameters. Our greatest creations add up to 24B tokens, including chip designs, internal codebases, and engineering logs like bug reports. Let that sink in. The model "ChipNeMo" is deployed internally, like a…
Replies 1 · Reposts 0 · Likes 4
Tom Young (@tomyoung903) · 2 years
🥳🥳🥳 I feel fortunate to chat with and learn from him from time to time 😄😄
Quoting Fuzhao Xue (Frio) (@XueFz) · 2 years
Super thrilled to announce that I've been awarded the 2023 Google PhD Fellowship! Enormous gratitude to my wonderful mentors/advisors who championed my application: @m__dehghani, @YangYou1991, @AixinSG, and to all my incredible collaborators. A heartfelt thanks to @GoogleAI and…
Replies 1 · Reposts 0 · Likes 5
Brian Roemmele (@BrianRoemmele) · 2 years
“Levels of AGI: Operationalizing Progress on the Path to AGI” Google defines AGI levels in this new paper. It is interesting. Paper: https://t.co/DtjTK9sG7w
Replies 5 · Reposts 12 · Likes 53
Aran Komatsuzaki (@arankomatsuzaki) · 2 years
What's In My Big Data? Proposes a platform and a set of sixteen analyses that reveal and compare the contents of large text corpora. https://t.co/rYdZErJFld
Replies 1 · Reposts 35 · Likes 203
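For a flavor of what one such corpus analysis looks like in practice, here is a toy version of exact-duplicate counting via document hashing; the paper's sixteen analyses go much further (n-gram statistics, benchmark contamination, and more). The normalization rule below is my own assumption, not the paper's.

```python
import hashlib
from collections import Counter

corpus = ["The cat sat.", "the cat  sat.", "A dog ran.", "The cat sat."]

def doc_hash(text: str) -> str:
    # Normalize case and whitespace, then hash the document.
    norm = " ".join(text.lower().split())
    return hashlib.sha256(norm.encode()).hexdigest()

counts = Counter(doc_hash(doc) for doc in corpus)
n_dupes = sum(c - 1 for c in counts.values() if c > 1)
print(f"{n_dupes} duplicate documents out of {len(corpus)}")  # -> 2 of 4
```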