Sukjun (June) Hwang

@sukjun_hwang

Followers: 3K · Following: 572 · Media: 14 · Statuses: 88

ML PhD student @mldcmu advised by @_albertgu

Pittsburgh, PA
Joined April 2023
@sukjun_hwang
Sukjun (June) Hwang
4 months
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
98
750
5K
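For readers unfamiliar with the idea, here is a minimal conceptual sketch of dynamic chunking (my own toy illustration, not the released H-Net code; the class and method names are made up): a learned boundary scorer decides where one chunk ends, and the bytes of each chunk are pooled into a single vector for the outer model, so the "tokenization" is data-dependent and learned end-to-end.

```python
# Conceptual sketch only -- NOT the actual H-Net implementation. It illustrates
# the dynamic-chunking idea: a learned boundary scorer splits a raw byte
# sequence into variable-length chunks, which are pooled into higher-level
# vectors for the outer model to process.
import torch
import torch.nn as nn

class ToyDynamicChunker(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.byte_embed = nn.Embedding(256, d_model)   # operate on raw bytes, no tokenizer
        self.boundary_scorer = nn.Linear(d_model, 1)   # score: does a chunk end here?

    def forward(self, byte_ids, threshold=0.5):
        x = self.byte_embed(byte_ids)                        # (seq_len, d_model)
        p_boundary = torch.sigmoid(self.boundary_scorer(x))  # (seq_len, 1)
        chunks, current = [], []
        for t in range(byte_ids.shape[0]):
            current.append(x[t])
            if p_boundary[t] > threshold or t == byte_ids.shape[0] - 1:
                chunks.append(torch.stack(current).mean(dim=0))  # pool chunk -> one vector
                current = []
        return torch.stack(chunks)  # (num_chunks, d_model); chunk count is data-dependent

byte_ids = torch.tensor(list(b"hello world, no tokenizer here"))
chunk_vectors = ToyDynamicChunker()(byte_ids)
print(chunk_vectors.shape)
```

In the sketch the boundary decisions are hard thresholds for readability; the point is only that the number and extent of chunks are decided by the model from the data rather than fixed by a tokenizer.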
@seohong_park
Seohong Park
2 days
We scaled up an "alternative" paradigm in RL: *divide and conquer*. Compared to Q-learning (TD learning), divide and conquer can naturally scale to much longer horizons. Blog post: https://t.co/xtXBzya0bI Paper: https://t.co/nqYkLucsWu
11
72
423
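The contrast with TD can be made concrete on a toy shortest-path problem. This is a generic illustration of the divide-and-conquer principle, not the algorithm from the linked paper: TD-style bootstrapping extends its effective horizon by one step per sweep, while composing two half-horizon estimates through a midpoint state doubles the horizon per sweep.

```python
# Generic illustration (not the paper's method): distances on a 64-state cycle.
#   * TD-style: bootstrap one step at a time  -> ~O(horizon) sweeps to converge.
#   * Divide and conquer: D(s,g) = min_m D(s,m) + D(m,g) -> horizon doubles
#     each sweep, so ~O(log horizon) sweeps.
import numpy as np

n = 64
step = np.full((n, n), np.inf)
for s in range(n):                       # one-step edges of cost 1, plus self-loops
    step[s, (s + 1) % n] = 1.0
    step[s, s] = 0.0

# TD-style sweeps: V(s,g) <- min(V(s,g), 1 + V(next(s), g))
td = step.copy()
for _ in range(n):
    td = np.minimum(td, 1.0 + td[(np.arange(n) + 1) % n, :])

# Divide-and-conquer sweeps: compose two half-horizon estimates via a midpoint m
dc = step.copy()
for _ in range(int(np.ceil(np.log2(n)))):
    dc = np.minimum(dc, (dc[:, :, None] + dc[None, :, :]).min(axis=1))

assert np.allclose(td, dc)               # same answer, far fewer sweeps for D&C
print("max distance:", dc.max())
```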
@krandiash
Karan Goel
3 days
We've raised $100M from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA. Today we're introducing Sonic-3 - the state-of-the-art model for realtime conversation. What makes Sonic-3 great:
- Breakthrough naturalness - laughter and full emotional range
- Lightning fast -
1K
1K
8K
@risteski_a
Andrej Risteski
17 days
I have been thinking a lot recently about framing a variety of inference-time tasks as doing algorithm design with access to strong oracles (e.g. generators, different types of verifiers, convolved scores, ...) --- as an alternative to "end-to-end" analyses.
@canondetortugas
Dylan Foster 🐢
20 days
New paper we're excited to get online! Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking. A totally new framework based on ~backtracking~ for using process verifiers to guide inference, w/ connections to approximate counting/sampling in theoretical CS.
3
11
55
@sainingxie
Saining Xie
17 days
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
56
331
2K
@seohong_park
Seohong Park
22 days
Introducing *dual representations*! tl;dr: We represent a state by the "set of similarities" to all other states. This dual perspective has lots of nice properties and practical benefits in RL. Blog post: https://t.co/lw1PortD9E Paper: https://t.co/zYKFjyOy7C
14
96
787
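A minimal sketch of the "set of similarities" idea (my own illustration of the stated intuition, not the paper's exact construction; the function name and RBF similarity choice are assumptions): each state is re-described by its vector of similarities to every other state, and that similarity profile becomes its new representation.

```python
# Toy sketch: represent each state by its similarities to all other states.
import numpy as np

rng = np.random.default_rng(0)
states = rng.normal(size=(100, 8))            # 100 states with 8-d base features

def dual_representation(states, length_scale=1.0):
    # pairwise squared distances -> RBF similarities (one of many possible kernels)
    sq_dists = ((states[:, None, :] - states[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * length_scale ** 2))

phi = dual_representation(states)
print(phi.shape)   # (100, 100): row i is state i described by its similarity profile
# Downstream, e.g. a value function could be fit on these similarity features.
```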
@yewonbyun_
Emily Byun
22 days
💡Can we trust synthetic data for statistical inference? We show that synthetic data (e.g. LLM simulations) can significantly improve the performance of inference tasks. The key intuition lies in the interactions between the moments of synthetic data and those of real data
2
36
135
@AdtRaghunathan
Aditi Raghunathan
1 month
There’s been a lot of work on unlearning in LLMs, trying to erase memorization without hurting capabilities — but we haven’t seen much success. ❓What if unlearning is actually doomed from the start? 👇This thread explains why and how *memorization sinks* offer a new way forward.
6
39
174
@yus167
Yuda Song
2 months
LLMs lose diversity after RL post-training, and this hurts test-time scaling & creativity. Why does this collapse happen, and how can we fix it? Our new work introduces: 🔍 RL as Sampling (analysis) 🗺️ Outcome-based Exploration (intervention) [1/n]
9
87
468
@sukjun_hwang
Sukjun (June) Hwang
2 months
Coming from a computer vision background and now in sequence modeling, I’m often struck by how disconnected LLMs and vision feel. Our work, AUSM, treats video as language -- and it reveals a few blind spots we’ve overlooked.
@miran_heo
Miran Heo
2 months
We connect the autoregressive pipeline of LLMs with streaming video perception. Introducing AUSM: Autoregressive Universal Video Segmentation Model. A step toward unified, scalable video perception — inspired by how LLMs unified NLP. 📝
4
8
134
@main_horse
main
2 months
μtransfer for Mamba2 & Muon
4
24
198
@pratyushmaini
Pratyush Maini
2 months
1/Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today @datologyai shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens🧑🏼‍🍳 - 3B LLMs beat 8B models🚀 - Pareto frontier for performance
23
124
713
@lchen915
Lili
3 months
Self-Questioning Language Models: LLMs that learn to generate their own questions and answers via asymmetric self-play RL. There is no external training data – the only input is a single prompt specifying the topic.
27
183
1K
@mihirp98
Mihir Prabhudesai
3 months
🚨 The era of infinite internet data is ending, So we ask: 👉 What’s the right generative modelling objective when data—not compute—is the bottleneck? TL;DR: ▶️Compute-constrained? Train Autoregressive models ▶️Data-constrained? Train Diffusion models Get ready for 🤿 1/n
127
196
1K
@_albertgu
Albert Gu
3 months
I'll be giving the first H-Net talk this afternoon at 4:30-5 PT at the ES-FoMo workshop! come support the fight against Big Token 🙏
@ESFoMo
ES-FoMo@ICML2025
3 months
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you tomorrow. Let's meet some of our great speakers! 1/
4
11
139
@gaurav_ghosal
Gaurav Ghosal
4 months
1/So much of privacy research is designing post-hoc methods to make models memorization-free. It’s time we turn that around with architectural changes. Excited to add Memorization Sinks to the transformer architecture this #ICML2025 to isolate memorization during LLM training🧵
1
24
61
@sukjun_hwang
Sukjun (June) Hwang
4 months
Just realized we forgot to link the code, check it out! Model checkpoints are included so you can play with it yourself and see what boundaries it's learning Code: https://t.co/BtQaU383xJ Paper: https://t.co/AVW1Rtzpqw 12/10
arxiv.org
Major progress on language models (LMs) in recent years has largely resulted from moving away from specialized models designed for specific tasks, to general models based on powerful architectures...
2
16
99
@sukjun_hwang
Sukjun (June) Hwang
4 months
Albert has written amazing blog posts full of behind-the-scenes stories and wonderful insights about H-Net. You should check them out! https://t.co/NL9Eus1YBa
@_albertgu
Albert Gu
4 months
This was an incredibly important project to me - I’ve wanted to solve it for years, but had no idea how. This was all @sukjun_hwang and @fluorane's amazing work! I wrote about the story of its development, and what might be coming next. The H-Net:
5
5
106
@sukjun_hwang
Sukjun (June) Hwang
4 months
We’re incredibly excited to see how H-Nets will allow models to learn more efficiently, with fewer priors and less pre-processing, across all sorts of modalities! This work was a collaboration with @cartesia_ai 10/10
7
4
153
@sukjun_hwang
Sukjun (June) Hwang
4 months
Finally, a key ingredient of H-Net is using state space models (SSMs) such as Mamba layers in the outer stages. SSMs naturally compress data into their recurrent states, which is not only more efficient, but turns out to be crucial for building higher-level abstractions. 9/
1
7
117
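To see why an SSM "compresses" in this sense, here is a toy diagonal linear state-space recurrence (a sketch of the general mechanism, not the Mamba implementation): the whole prefix is folded into a fixed-size recurrent state, so memory stays constant in sequence length.

```python
# Toy diagonal linear SSM: h_t = A*h_{t-1} + B*x_t, y_t = C*h_t.
# The prefix is summarized by a fixed-size state h, unlike an attention cache
# that grows with sequence length.
import numpy as np

def ssm_scan(x, d_state=16, seed=0):
    rng = np.random.default_rng(seed)
    seq_len, d_model = x.shape
    A = np.exp(-rng.uniform(0.01, 1.0, size=d_state))   # stable per-channel decay
    B = rng.normal(size=(d_state, d_model)) / np.sqrt(d_model)
    C = rng.normal(size=(d_model, d_state)) / np.sqrt(d_state)
    h = np.zeros(d_state)                                # fixed-size summary of the prefix
    ys = []
    for t in range(seq_len):
        h = A * h + B @ x[t]                             # fold the new input into the state
        ys.append(C @ h)                                 # read out from the compressed state
    return np.stack(ys)

x = np.random.default_rng(1).normal(size=(1000, 32))     # long sequence, constant memory
print(ssm_scan(x).shape)   # (1000, 32); the recurrent state stayed 16-dimensional
```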
@sukjun_hwang
Sukjun (June) Hwang
4 months
DNA is an unusual “language”, and previous architectures showed different modeling power on DNA sequences (e.g., Mamba > Transformer). But any of them can be wrapped inside an H-Net for much stronger scaling, learning nearly 4 times as efficiently with data! 8/
2
11
149