Tom Young (@tomyoung903)

Followers 57 · Following 722 · Media 2 · Statuses 124

Research fellow working on large language models @NUS w/ folks such as @XueFz and @YangYou1991

Singapore · Joined April 2017
Tom Young (@tomyoung903) · 2 years
Do Masked Language Models (e.g., PaLM2) inherently learn self-consistent token distributions? 🤔🤔 Our experiments show: No.❗And we can ensemble multiple inconsistent distributions to improve the model’s accuracy. 📈 Paper: https://t.co/pgbP3Nf05w Code: https://t.co/kCiDOEeGU0
Replies 0 · Reposts 1 · Likes 9
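The ensembling trick in this tweet lends itself to a short sketch. Below is a toy illustration, not the paper's actual method or data: a masked LM can assign different, mutually inconsistent distributions to the same blank depending on how the context is presented, and averaging those distributions before taking the argmax can beat any single one. The vocabulary and probabilities are invented.

```python
import numpy as np

# Three hypothetical distributions the model assigns to the SAME masked
# position under three different conditionings of the context.
vocab = ["paris", "london", "rome"]
p1 = np.array([0.48, 0.42, 0.10])
p2 = np.array([0.30, 0.55, 0.15])
p3 = np.array([0.50, 0.20, 0.30])

# The individual argmaxes disagree between "paris" and "london"; a uniform
# mixture of the inconsistent distributions settles on "paris".
ensemble = (p1 + p2 + p3) / 3
print(vocab[int(np.argmax(ensemble))])  # -> paris
```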
Andrew Ng (@AndrewYNg) · 28 days
AI agents are getting better at looking at different types of data in businesses to spot patterns and create value. This is making data silos increasingly painful. This is why I increasingly try to select software that lets me control my own data, so I can make it available to my…
Card: deeplearning.ai · The Batch AI News and Insights: AI agents are getting better at looking at different types of data in businesses to spot patterns and create value...
Replies 129 · Reposts 355 · Likes 2K
Zonglin Yang (@Yang_zy223) · 9 months
We are at the dawn of a "Cambrian Explosion" in scientific discovery. A super AI science copilot (or even pilot) is on the horizon. As AI frees humanity from routine labor and enhances education, we can fully harness our intelligence—investing our efforts (with AI’s aid) into…
Replies 0 · Reposts 1 · Likes 1
Jinjie Ni (@NiJinjie) · 10 months
Thrilled to share that MixEval-X has been selected for a spotlight at #ICLR2025! Our real-world-aligned, efficient evaluation method pushes the frontier of multi-modal model development. It would be useful to you whether you’re working on MMLM, AIGC, or any2any models. 🪐Brace…
Quoting Jinjie Ni (@NiJinjie) · 1 year
🏇 Frontier players are racing to solve modality puzzles in the quest for AGI. But to get there, we need consistent, high-standard evaluations across all modalities! 🚀 Introducing MixEval-X, the first real-world, any-to-any benchmark. Inheriting the philosophy from MixEval, …
Replies 9 · Reposts 8 · Likes 20
Zonglin Yang (@Yang_zy223) · 11 months
Excited to share that our paper on automated scientific discovery has been accepted to #ICLR2025! In brief, 1. It shows that LLMs can rediscover the main innovations of many research hypotheses published in Nature or Science. 2. It provides a mathematically proven theoretical…
Quoting Zonglin Yang (@Yang_zy223) · 1 year
Given only a chemistry research question, can an LLM system output novel and valid chemistry research hypotheses? Our answer is YES!!! 🚀 It can even rediscover hypotheses published in Nature, Science, or venues of a similar level. Preprint: https://t.co/xJQr0JrIzO Code: …
Replies 3 · Reposts 11 · Likes 30
Jinjie Ni (@NiJinjie) · 1 year
That’s why we built MixEval-X. Test your multi-modal models in real-world tasks here: https://t.co/Jz2THe24I3 It’s easy to run!
Quoting Google DeepMind (@GoogleDeepMind) · 1 year
Gemini 2.0 Flash Experimental has the ability to produce native audio in a variety of styles and languages - all from scratch. 🗣️ Here’s how this is different to traditional text-to-speech systems ↓ https://t.co/FRWb3q3KHe
Replies 0 · Reposts 4 · Likes 11
Jinjie Ni (@NiJinjie) · 1 year
🔥 Congrats to GDM on landing Fuzhao! Needless to say how challenging it is to get recruited by the GDM Gemini team, especially for an SG-based PhD student. What I want to mention here are the three most important merits I have observed in Fuzhao during the four years…
Quoting Fuzhao Xue (Frio) (@XueFz) · 1 year
Life Update: I’m joining Google DeepMind as a Senior Research Scientist after three incredible years of PhD (aka "Pretrain myself harD" 😄). I’ll be contributing to Gemini pretraining and multi-modality research. I feel incredibly fortunate to report to @m__dehghani again, and…
Replies 1 · Reposts 2 · Likes 16
Jinjie Ni (@NiJinjie) · 2 years
How to get ⚔️Chatbot Arena⚔️ model rankings with 2000× less time (5 minutes) and 5000× less cost ($0.6)? Maybe simply mix the classic benchmarks. 🚀 Introducing MixEval, a new 🥇gold-standard🥇 LLM evaluation paradigm standing on the shoulders of giants (classic benchmarks).
Replies 31 · Reposts 66 · Likes 245
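One way to read "simply mix the classic benchmarks" is as learning a cheap mapping from static-benchmark scores to Arena-style ratings, then scoring new models without any human voting. The sketch below is my own toy interpretation, with invented benchmark names, invented numbers, and a plain least-squares fit; it is not MixEval's actual recipe.

```python
import numpy as np

# Rows: anchor models with known Arena ratings; columns: accuracies on
# three classic benchmarks (all numbers are made up for illustration).
X = np.array([[0.86, 0.92, 0.80],
              [0.79, 0.81, 0.74],
              [0.70, 0.55, 0.68],
              [0.63, 0.40, 0.60]])
elo = np.array([1250.0, 1180.0, 1090.0, 1020.0])  # known Arena ratings

# Least-squares fit of elo ~ X @ w + b on the anchor models.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, elo, rcond=None)

# Rate a new model from its cheap static-benchmark scores alone.
new_model = np.array([0.82, 0.85, 0.77, 1.0])  # trailing 1.0 is the bias term
print(f"predicted rating: {new_model @ coef:.0f}")
```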
Yang You (@YangYou1991) · 2 years
Say hello to Grok-1's new PyTorch+HuggingFace edition! 🚀 314 billion parameters, 3.8x faster inference. Easy to use, open-source, and optimized by Colossal-AI. 🤖 Dive in: #Grok1 #ColossalAI🌟 https://t.co/D1P1XM1rEs Download Now: https://t.co/Fg3BmiNRzT
Card: huggingface.co
Replies 33 · Reposts 108 · Likes 679
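The tweet's links are shortened, so the snippet below uses a placeholder repository id; it is only a guess at what "easy to use" might look like via the standard transformers loading API, and a 314B-parameter MoE would realistically need multi-GPU sharding behind device_map="auto".

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "some-org/grok-1-pytorch"  # placeholder id, not the real repository

tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,  # custom modeling code shipped with the repo
    device_map="auto",       # shard across available GPUs via accelerate
    torch_dtype="auto",      # use the checkpoint's native precision
)

inputs = tok("Hello, Grok!", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```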
Jinjie Ni (@NiJinjie) · 2 years
I was impressed by the idea the first time I came across it. DMC demonstrates the possibility of compressing the KV cache in a smarter way, and KV cache compression is just sooooo important nowadays! Glad to see the huge leap in LLM efficiency! Kudos to the team! 🔥🔥🔥
Quoting Piotr Nawrot (@p_nawrot) · 2 years
The memory in Transformers grows linearly with the sequence length at inference time. In SSMs it is constant, but often at the expense of performance. We introduce Dynamic Memory Compression (DMC), where we retrofit LLMs to compress their KV cache while preserving performance…
Replies 0 · Reposts 3 · Likes 4
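As intuition for the quoted thread, here is a rough toy of the DMC decision: at each decoding step a learned gate either appends the new key/value pair to the cache or merges it into the most recent slot by weighted averaging, so the cache grows sublinearly. The gate value and merge weight below are placeholders, not DMC's trained predictors.

```python
import torch

def dmc_step(keys, values, k_new, v_new, append_prob, alpha=0.5):
    """keys/values: (slots, dim). append_prob: learned gate in [0, 1]."""
    if append_prob > 0.5:
        # APPEND: the cache grows by one slot, as in a vanilla KV cache.
        keys = torch.cat([keys, k_new[None]], dim=0)
        values = torch.cat([values, v_new[None]], dim=0)
    else:
        # MERGE: accumulate into the last slot; memory stays constant.
        keys[-1] = alpha * keys[-1] + (1 - alpha) * k_new
        values[-1] = alpha * values[-1] + (1 - alpha) * v_new
    return keys, values

keys, values = torch.randn(4, 64), torch.randn(4, 64)
keys, values = dmc_step(keys, values, torch.randn(64), torch.randn(64),
                        append_prob=torch.rand(1).item())
print(keys.shape[0])  # 4 or 5 slots, depending on the gate
```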
Fuzhao Xue (Frio) (@XueFz) · 2 years
Big congrats to @DrJimFan and @yukez!!! I’m super lucky to be part of the GEAR team! I do believe that Generalist Embodied Agent is the next step towards AGI. Join GEAR and let’s make some exciting things together!
Quoting Jim Fan (@DrJimFan) · 2 years
Career update: I am co-founding a new research group called "GEAR" at NVIDIA, with my long-time friend and collaborator Prof. @yukez. GEAR stands for Generalist Embodied Agent Research. We believe in a future where every machine that moves will be autonomous, and robots and…
Replies 0 · Reposts 1 · Likes 32
Tom Young (@tomyoung903) · 2 years
You don't see a paper like this every day!
Quoting AK (@_akhaliq) · 2 years
Neural Network Diffusion: Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an autoencoder and a…
Replies 0 · Reposts 0 · Likes 3
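The quoted abstract names two ingredients, an autoencoder and a diffusion model over parameters. Below is a self-contained toy of that pipeline under my own simplifying assumptions: random vectors stand in for trained parameters, a linear autoencoder compresses them, and a tiny DDPM runs in the latent space. It shows the shape of the method, not the paper's setup.

```python
import torch
import torch.nn as nn

P, D, Z, T = 256, 512, 16, 100      # bank size, param dim, latent dim, steps
bank = torch.randn(P, D)            # stand-in for flattened trained networks

enc, dec = nn.Linear(D, Z), nn.Linear(Z, D)                  # toy autoencoder
denoiser = nn.Sequential(nn.Linear(Z + 1, 128), nn.ReLU(), nn.Linear(128, Z))
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters(),
                        *denoiser.parameters()], lr=1e-3)
betas = torch.linspace(1e-4, 0.02, T)
abar = torch.cumprod(1 - betas, dim=0)   # DDPM cumulative noise schedule

for _ in range(200):                     # joint toy training loop
    z = enc(bank)
    recon = (dec(z) - bank).pow(2).mean()           # autoencoder loss
    t = torch.randint(0, T, (P,))
    eps = torch.randn_like(z)
    zt = abar[t, None].sqrt() * z.detach() + (1 - abar[t, None]).sqrt() * eps
    pred = denoiser(torch.cat([zt, t[:, None] / T], dim=1))
    loss = recon + (pred - eps).pow(2).mean()       # + denoising loss
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                    # reverse process: sample new params
    z = torch.randn(1, Z)
    for t in reversed(range(T)):
        eps = denoiser(torch.cat([z, torch.tensor([[t / T]])], dim=1))
        z = (z - betas[t] / (1 - abar[t]).sqrt() * eps) / (1 - betas[t]).sqrt()
        if t:
            z = z + betas[t].sqrt() * torch.randn_like(z)
print(dec(z).shape)                      # one generated parameter vector
```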
Fuzhao Xue (Frio) (@XueFz) · 2 years
(1/5)🚀 Our OpenMoE Paper is out! 📄 Including: 🔍ALL Checkpoints 📊 In-depth MoE routing analysis 🤯Learning from mistakes & solutions Three important findings: (1) Context-Independent Specialization; (2) Early Routing Learning; (3) Drop-towards-the-End. Paper Link: …
Replies 5 · Reposts 101 · Likes 512
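The "Drop-towards-the-End" finding lends itself to a tiny illustration: with a fixed per-expert capacity and tokens admitted in sequence order, overflow necessarily falls on the later positions. Everything below (router logits, expert count, capacity) is made up for illustration.

```python
import torch

seq_len, n_experts, capacity = 12, 2, 4
logits = torch.randn(seq_len, n_experts)   # stand-in router scores
choice = logits.argmax(dim=-1)             # top-1 expert per token

used, dropped = [0] * n_experts, []
for pos in range(seq_len):                 # tokens arrive in sequence order
    e = int(choice[pos])
    if used[e] < capacity:
        used[e] += 1                       # admitted to expert e
    else:
        dropped.append(pos)                # expert full: token is dropped

print("dropped positions:", dropped)       # skews toward late positions
```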
Tom Young (@tomyoung903) · 2 years
Fuzhao shared the development details of OpenMoE every time I asked. A lot of detail goes into training LLMs.
Quoting Jim Fan (@DrJimFan) · 2 years
Mistral magnet link is awesome, but let’s get the timelines straight. My student and NVIDIA intern Fuzhao open-sourced a decoder-only MoE 4 months ago. Google Switch Transformer, a T5-based MoE, was open > a year ago. MoE isn’t new. It just didn’t get as much attention routed…
Replies 0 · Reposts 0 · Likes 4
Tom Young (@tomyoung903) · 2 years
As a postdoc working on AI, I still spend a lot of time trying to improve my coding skills 😂. Would love to see more tweets about that.
Quoting Greg Brockman (@gdb) · 2 years
People often ask if ML or software skills are more the bottleneck to AI progress. It’s the wrong question—both are invaluable, and people with both sets of skills can have outsized impact. We find it easier, however, to teach people ML skills as needed than software engineering.
Replies 0 · Reposts 0 · Likes 2
Tom Young (@tomyoung903) · 2 years
This pipeline suits many companies, I imagine 🤔🤔
Quoting Jim Fan (@DrJimFan) · 2 years
NVIDIA basically compressed 30 years of its corporate memory into 13B parameters. Our greatest creations add up to 24B tokens, including chip designs, internal codebases, and engineering logs like bug reports. Let that sink in. The model "ChipNeMo" is deployed internally, like a…
Replies 1 · Reposts 0 · Likes 4
Tom Young (@tomyoung903) · 2 years
🥳🥳🥳 I feel fortunate to chat with and learn from him from time to time 😄😄
Quoting Fuzhao Xue (Frio) (@XueFz) · 2 years
Super thrilled to announce that I've been awarded the 2023 Google PhD Fellowship! Enormous gratitude to my wonderful mentors/advisors who championed my application: @m__dehghani, @YangYou1991, @AixinSG, and to all my incredible collaborators. A heartfelt thanks to @GoogleAI and…
Replies 1 · Reposts 0 · Likes 5
Brian Roemmele (@BrianRoemmele) · 2 years
“Levels of AGI: Operationalizing Progress on the Path to AGI” Google defines AGI levels in this new paper. It is interesting. Paper: https://t.co/DtjTK9sG7w
Replies 5 · Reposts 12 · Likes 53
Aran Komatsuzaki (@arankomatsuzaki) · 2 years
What's In My Big Data? Proposes a platform and a set of sixteen analyses that reveal and compare the contents of large text corpora. https://t.co/rYdZErJFld
Replies 1 · Reposts 35 · Likes 203
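For a flavor of what one such corpus analysis looks like in practice, here is a toy version of exact-duplicate counting via document hashing; the paper's sixteen analyses go much further (n-gram statistics, benchmark contamination, and more). The normalization rule below is my own assumption, not the paper's.

```python
import hashlib
from collections import Counter

corpus = ["The cat sat.", "the cat  sat.", "A dog ran.", "The cat sat."]

def doc_hash(text: str) -> str:
    # Normalize case and whitespace, then hash the document.
    norm = " ".join(text.lower().split())
    return hashlib.sha256(norm.encode()).hexdigest()

counts = Counter(doc_hash(doc) for doc in corpus)
n_dupes = sum(c - 1 for c in counts.values() if c > 1)
print(f"{n_dupes} duplicate documents out of {len(corpus)}")  # -> 2 of 4
```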