Baris Kasikci
@bariskasikci
Followers: 5K · Following: 11K · Media: 152 · Statuses: 4K
Professor @uwcse; previously Morris Wellman Professor of EECS @UMichCSE, @Google, @MSFTResearch
Seattle, WA
Joined March 2010
Overload control is usually built around a bad assumption. Most systems watch global signals like queue length or tail latency and react at the front door by throttling new arrivals or dropping random requests. This works when CPU or network is the bottleneck. It fails when the
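The front-door pattern the post describes can be sketched as a simple queue-length gate (a toy illustration of the approach being critiqued; the class name and threshold policy here are assumptions, not any real system's API):

```python
from collections import deque

class FrontDoorThrottle:
    """Admission control driven by one global signal (queue length):
    reject new arrivals once the queue crosses a fixed threshold.
    This is the 'react at the front door' pattern the post critiques."""

    def __init__(self, max_queue_len: int):
        self.max_queue_len = max_queue_len
        self.queue = deque()

    def admit(self, request) -> bool:
        # Global signal: current queue length.
        if len(self.queue) >= self.max_queue_len:
            return False  # shed load at the front door
        self.queue.append(request)
        return True

    def complete_one(self) -> None:
        # A worker finished a request; free one queue slot.
        if self.queue:
            self.queue.popleft()
```

As the post notes, this works when CPU or network is the bottleneck, because queue length then tracks the actual scarce resource.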
🚀 Join us at the Paul G. Allen School of Computer Science & Engineering at The University of Washington! We’re hiring tenure-track faculty at all ranks across computer science and computer engineering. 📢 You’ll engage with outstanding, motivated students and colleagues in one
I will be attending #EMNLP2025 this week to present LiteASR, a compression method for speech encoders (a collaborative work with @kotoba_tech). Catch our poster at the first poster session on Wednesday morning. Happy to chat about efficiency, speech, or both!
🚀 Presenting LiteASR: a method that halves the compute cost of speech encoders by leveraging low-rank approximation of activations. LiteASR is accepted to #EMNLP2025 (main) @emnlpmeeting
(please reshare) I'm recruiting multiple PhD students and Postdocs @uwcse @uwnlp ( https://t.co/I5wQsFnCLL). Focus areas include psychosocial AI simulation and safety, and human-AI collaboration. PhD: https://t.co/ku40wCrpYh Postdocs: https://t.co/K9HUIPJ5h6
UTCS is hiring in all areas, including PL! Please DM me if you are on the job market this year and interested in joining our wonderful department :)
Can we please build a printer that works reliably and doesn’t jam before we get to Skynet?
A stunningly broad coalition has come out against Skynet: AI researchers, faith leaders, business pioneers, policymakers, NatSec folks and actors stand together, from Bannon & Beck to Hinton, Wozniak & Prince Harry. We stand together because we want a human future.
🤔 Can AI optimize the systems it runs on? 🚀 Introducing FlashInfer-Bench, a workflow that makes AI systems self-improving with agents: - Standardized signature for LLM serving kernels - Implement kernels with your preferred language - Benchmark them against real-world serving
My team is hiring AI research interns for summer 2026 at Databricks! Join us to learn about AI use cases at thousands of companies, and contribute to making it easier for anyone to build specialized AI agents and models for difficult tasks.
LLMc is open-source ( https://t.co/OSqM6Q2mMX)! We’re excited to see the community build on it. Try it out and let us know what you think! (4/4) P.S. props to @cHHillee and other great folks at @thinkymachines for working on deterministic LLM inference, which made LLMc possible
github.com: A language-model–powered compressor for natural language text (uw-syfi/LLMc)
Benchmarks show LLMc achieves state-of-the-art compression ratios, outperforming Gzip and LZMA on natural language text. To manage the quadratic complexity of LLM inference, it processes text in chunks, improving performance and GPU utilization. (3/4)
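The chunking step can be sketched in a few lines (a toy illustration, not LLMc's actual pipeline; the chunk length is an arbitrary parameter here):

```python
def chunked(tokens, chunk_len):
    """Split a long token stream into fixed-length chunks so each model
    call attends over at most chunk_len tokens, bounding the quadratic
    attention cost; independent chunks can also be batched on the GPU."""
    return [tokens[i:i + chunk_len] for i in range(0, len(tokens), chunk_len)]
```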
The connection between LLMs and compression is strong: a model that accurately predicts the next token is an optimal compressor. LLMc uses this principle with rank-based encoding, storing a token’s rank in the LLM’s output distribution instead of the token itself for a compact representation. (2/4)
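A minimal sketch of rank-based encoding, with a stand-in `predict` function in place of an LLM (the function name and interface are assumptions for illustration, not LLMc's implementation):

```python
def rank_encode(tokens, predict):
    """For each position, store the rank of the true next token in the
    model's predicted ordering instead of the token itself. A good
    model puts the true token at rank 0 most of the time, so the rank
    stream is highly skewed and compresses well with an entropy coder.
    `predict(prefix)` returns candidate tokens ordered most- to
    least-likely (a stand-in for an LLM's output distribution)."""
    ranks = []
    for i, tok in enumerate(tokens):
        ordering = predict(tokens[:i])
        ranks.append(ordering.index(tok))
    return ranks

def rank_decode(ranks, predict):
    """Decoding reruns the same model: each stored rank selects one
    token from the predicted ordering, so the roundtrip is lossless."""
    tokens = []
    for r in ranks:
        ordering = predict(tokens)
        tokens.append(ordering[r])
    return tokens
```

Because encoder and decoder must see identical model outputs, deterministic LLM inference (mentioned in the last tweet of this thread) is what makes the roundtrip reliable.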
How can LLMs beat traditional compression? ⚙️ Introducing LLMc — a lossless compressor built with LLMs. LLMc leverages the predictive power of LLMs to beat traditional compressors like Gzip and LZMA on natural language text. (1/4) 🔗 Blog Post: https://t.co/5ppAqBSTTh 💻 Code:
VoxServe is open source and already supports models including CSM, Orpheus, Zonos, GLM-Voice, and Step-Audio-2, with more coming. Try it via `pip install vox-serve`, and we’d love to hear your feedback! (4/4) https://t.co/3uBRHE9K64
github.com: Serving System for SpeechLMs (vox-serve/vox-serve)
VoxServe also introduces a scheduling algorithm for various scenarios, optimizing for the performance metrics that matter: in online settings, it minimizes Time-To-First-Audio latency while satisfying streaming needs; in offline settings, it optimizes end-to-end throughput.
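The two objectives can be illustrated with a toy scheduling policy (the field names and request structure are assumptions for illustration, not VoxServe's API):

```python
def pick_next(requests, mode):
    """Toy scheduler illustrating the two objectives: in online mode,
    prefer the oldest request that has not yet produced its first audio
    chunk (driving down time-to-first-audio); in offline mode, pick the
    request with the most pending work to keep the GPU saturated for
    end-to-end throughput. Each request is a dict with 'arrival',
    'emitted_first_audio', and 'pending' keys."""
    if mode == "online":
        waiting = [r for r in requests if not r["emitted_first_audio"]]
        if waiting:
            return min(waiting, key=lambda r: r["arrival"])
        return min(requests, key=lambda r: r["arrival"])
    # Offline: no latency deadline, so maximize work per model call.
    return max(requests, key=lambda r: r["pending"])
```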
SpeechLMs pose unique deployment challenges: you need to run a language model + audio detokenizer in concert with careful scheduling, stream audio in real time, and support very different model architectures. VoxServe unifies all these under a consistent abstraction while
🎙️ Introducing VoxServe — a high-throughput, low-latency serving system built for Speech Language Models (TTS, STS, etc.), natively handling audio detokenization + streaming with performance as the core goal. (1/4) 🔗 blog post: https://t.co/vPEwN8Q5XQ 💻 code:
We are opening a new blog series at @ACMSIGOPS Blog to discuss Systems Research in the era of disruptive AI. If you'd like to share thoughts, viewpoints, and stories, please consider contributing an article! My hope is that, through the exposure and discussion, we can help
Read the full post and join the conversation! 👉 https://t.co/ckLum4Pr8O Together with Mike Liang, @FrancisYan_, @tianyin_xu, Lidong Zhou
Hayroll wraps C2Rust so that C macros are translated into Rust `macro_rules` or functions. See the image for an example. Hayroll is designed to wrap any C-to-Rust translation tool, but we have not yet tested that capability. You can find Hayroll at https://t.co/pYH6QKiWgC. Please
C to Rust translators like C2Rust do not handle C macros. @hrhrpeng built a tool called Hayroll that tackles this problem.
Excited to share the CFP for the inaugural MLSys industry track. Timeline and format are the same as the main track, but industry-track papers focus on the design and/or evaluation of real-world systems. Novelty is not a requirement. The deadline is October 30, 2025: