Steven Kolawole
@_stevenkolawole
Followers
2K
Following
7K
Media
64
Statuses
2K
My people! ❤️ Low-budget philosopher. ML Efficiency. PhD-ing @SCSatCMU. @ml_collective's poster child. Big brother to 3 amazing sisters.
Pittsburgh, PA
Joined December 2018
I'm recruiting PhD students for 2026! If you are interested in robustness, training dynamics, interpretability for scientific understanding, or the science of LLM analysis, you should apply. BU is building a huge LLM analysis/interp group and you'll be joining on the ground floor.
Life update: I'm starting as faculty at Boston University in 2026! BU has SCHEMES for LM interpretability & analysis, so I couldn't be more pumped to join a burgeoning supergroup w/ @najoungkim @amuuueller. Looking for my first students, so apply and reach out!
17
126
668
🚀 Federated Learning (FL) promises collaboration without data sharing. While Cross-Device FL is a success and deployed widely in industry, we don’t see Cross-Silo FL (collaboration between organizations) taking off despite huge demand and interest. Why could this be the case? 🤔
1
12
23
It's a beautiful thing looking back to when I joined MLC last year. All I had was a scrambled mind and an interest in research. One step at a time; not an expert yet, but now I have direction and can confidently design and execute research projects thanks to @ml_collective.
0
2
4
Thanks to this wonderful community, I got selected for the Google DeepMind scholarship for a Master's in Mathematical Science (AI for Science) at the African Institute for Mathematical Sciences, South Africa. Cheers to the next chapter🎉
11
20
142
On today's MLC-Ng Sunday Specials call, an old-timer returned after a research role at Max Planck (undergrad at UNILAG); another casually mentioned his new role at Google. Joining these calls weekly is humbling, seeing so many talented folks doing big things from small places.
8
17
109
A general phenomenon that can sneak up on you when you’re at, say, the 99th percentile of a skill: At first, you’re exceptional enough that you receive praise from virtually everyone, and you may never go head-to-head with someone who can beat you. That is, until you join some
Last year I had a conversation with someone who majored in physics at UChicago. He initially started in math & thought he was prepared having taken AP Calculus BC, but he got smacked in the face by the level of abstraction and proof-writing ability that was assumed. He couldn't
3
13
242
Sleep hits harder when you’re exhausted. Food tastes better when you’ve gone hungry. Water tastes sweeter when you’ve been grinding. Music hits deeper when you’ve been in silence. Deprivation sharpens pleasure. Don’t run from suffering. That’s what gives life its color.
117
2K
13K
Initially you think you're training hard & smart enough to max out your potential. Then you get to a high enough level to realize that, while a few people there do have more inborn talent than you, many of them have no inborn edge over you -- but they're way ahead because they
9
23
433
Crazy how the MIT License started as an open source project and popped off so hard they built a university about it 🤯
66
184
8K
arXiv:
arxiv.org
Cascade systems route computational requests to smaller models when possible and defer to larger models only when necessary, offering a promising approach to balance cost and quality in LLM...
Excited to announce that "Semantic Agreement Enables Efficient Open-Ended LLM Cascades" got accepted to EMNLP 2025 Industry Track! Thread 🧵
0
0
5
Our paper (my first 😌) “Benchmarking and Improving LLM Robustness for Personalized Generation” was accepted to #EMNLP2025 findings🎉 🧑💻Data and Code: https://t.co/ULz29xqxz0 Paper: https://t.co/q22FPPpOaY 🧵
arxiv.org
Recent years have witnessed a growing interest in personalizing the responses of large language models (LLMs). While existing evaluations primarily focus on whether a response aligns with a user's...
1
6
45
Grateful for this collaboration with @duncansoiffer and @gingsmith, extending model routing/cascading to open-ended generation and using meaning-level consensus to further bridge the theory-practice gap! 📄 Paper: TBA (arXiv is still processing)
1
0
2
This work started from a simple observation from my previous work on ABC ( https://t.co/r0NE5INY0Q): deployment reality != academic assumptions. Most cascade research assumes white-box access & homogeneous models, but that's not how LLMs are actually deployed in practice.
arxiv.org
Adaptive inference schemes reduce the cost of machine learning inference by assigning smaller models to easier examples, attempting to avoid invocation of larger models when possible. In this work...
1
0
1
Why this matters for production: black-box APIs, constant model updates, heterogeneous serving, & cost pressure are realities that existing cascade methods struggle to handle. Semantic cascades address all of these constraints simultaneously: no retraining when models update.
1
0
2
This breaks conventional ensemble wisdom. Results across translation, summarization, and QA (500M-70B parameters):
- Match 70B quality at 40% computational cost
- Up to 60% latency reduction
- Works across any model family
- Deployable today
1
0
3
But here's the counterintuitive twist: *weaker models actually improve ensemble decisions* through disagreement signals. A 500M model disagreeing with 8B models provides valuable uncertainty info. Even when the weak model is wrong, it helps identify when to defer to larger models.
1
0
2
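A minimal sketch of the disagreement signal described above, under stated assumptions: the crude token-overlap `agreement` function is a stand-in for a real semantic scorer, the thresholds are illustrative, and the model outputs are made up.

```python
# Sketch: disagreement among heterogeneous models as an uncertainty signal.
# `agreement` is a crude lexical stand-in for a semantic similarity scorer;
# the 0.4 / 0.5 thresholds and the example answers are illustrative only.
import re
from itertools import combinations

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))

def agreement(a: str, b: str) -> float:
    """Jaccard overlap of word sets, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / max(len(ta | tb), 1)

def disagreement_rate(answers: list[str], agree_threshold: float = 0.4) -> float:
    """Fraction of model pairs whose answers fail to agree."""
    pairs = list(combinations(answers, 2))
    disagreements = sum(1 for a, b in pairs if agreement(a, b) < agree_threshold)
    return disagreements / max(len(pairs), 1)

# Hypothetical outputs from a 500M model and two larger models.
answers = [
    "It will rain tomorrow across most of the region.",    # small model, off-base
    "Expect sunny skies and warm temperatures tomorrow.",   # larger model
    "Tomorrow looks sunny and warm.",                       # larger model
]

# The small model's outlier answer raises the disagreement rate, which is
# treated here as an uncertainty signal for deferring to a bigger model.
if disagreement_rate(answers) > 0.5:
    print("High disagreement -> defer to a larger model")
else:
    print("Broad agreement -> accept the cheap answer")
```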
Our solution: When models say "It's sunny today" vs "The weather is bright," they agree semantically despite different words. That meaning-level consensus predicts reliability better than individual confidence scores, with no training required.
1
0
1
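As a rough illustration of that meaning-level consensus, here is a sketch that scores agreement with sentence embeddings; the embedding model (all-MiniLM-L6-v2 from sentence-transformers) and the acceptance threshold are illustrative choices, not necessarily the paper's recipe.

```python
# Sketch: meaning-level agreement between answers via sentence embeddings.
# The embedding model and the 0.6 threshold are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_agreement(answer_a: str, answer_b: str) -> float:
    """Similarity of two answers in embedding space, ignoring surface wording."""
    emb_a, emb_b = encoder.encode([answer_a, answer_b])
    return cosine(emb_a, emb_b)

# Differently worded answers (the example from the post above).
score = semantic_agreement("It's sunny today.", "The weather is bright.")
print(f"agreement score: {score:.2f}")

# A cascade would accept the cheap answer when the score clears a tuned
# threshold and defer to a larger model otherwise.
THRESHOLD = 0.6  # illustrative; would be tuned on validation data
print("accept cheap answer" if score >= THRESHOLD else "defer to larger model")
```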
Quick primer: LLM cascading means routing queries to small, cheap models when possible, and only using large, expensive models when necessary. Think "escalation system" - start small, escalate when needed. Goal: maintain quality while cutting costs.
1
0
1
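To make the escalation idea concrete, a bare-bones sketch of the control flow; `call_model`, `is_good_enough`, and the model names are hypothetical placeholders (in this thread's approach, the check would be semantic agreement rather than a token-level score).

```python
# Sketch of a cascade: try models from cheapest to most expensive and stop
# at the first answer that passes the deferral check. All names below are
# hypothetical placeholders, not a real API.
from typing import Callable

def cascade(query: str,
            models: list[str],
            call_model: Callable[[str, str], str],
            is_good_enough: Callable[[str, str], bool]) -> str:
    """Escalate through `models`, ordered smallest to largest."""
    answer = ""
    for name in models:
        answer = call_model(name, query)
        # Accept as soon as the check passes; otherwise escalate.
        if is_good_enough(query, answer):
            return answer
    # Fall back to the largest model's answer.
    return answer

# Toy usage with stub functions, just to show the control flow.
def fake_call(name: str, query: str) -> str:
    return f"[{name}] answer to: {query}"

def fake_check(query: str, answer: str) -> bool:
    return answer.startswith("[llama-8b]")  # pretend only the 8B answer passes

print(cascade("Summarize this report.",
              ["qwen-0.5b", "llama-8b", "llama-70b"],
              fake_call, fake_check))
```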
Open-ended generation breaks traditional cascading. Multiple valid answers exist, so "correct vs incorrect" deferral rules don't work. Token-level confidence optimizes for next-token prediction, not semantic quality, and GPT/Claude APIs hide those scores anyway. We needed something different.
1
0
1