Kirill Solodskikh

@GarchFather

Followers 370 · Following 342 · Media 34 · Statuses 178

Almost PhD, Almost Founder, Almost Team Lead, Almost Successful, married. @TheStageAI Co-founder, CEO, ex Huawei P50 AI cameras

Joined October 2022
@GarchFather
Kirill Solodskikh
1 day
Can LLMs recognize ASCII art? Our tests show accelerated Elastic Models analyze line-by-line features and combine them using statistical patterns. Try it yourself with DeepSeek-Qwen-14B – 120 tok/s on H100, 40 tok/s on L40S, up to 3× faster. Free API token!
@GarchFather
Kirill Solodskikh
4 days
Replace static scale estimation with dynamic activation quantization. Accuracy jumps to 66.5% – almost original performance. This progress is only possible thanks to open source. Special thanks to SmoothQuant authors (@Guangxuan_Xiao, @jilin_14, @songhan_mit), their method is
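The switch described above, from a scale fixed at calibration time to one computed per token at runtime, can be sketched in NumPy. This is an illustration of the general static-vs-dynamic trade-off, not QLIP's actual implementation; the shapes and the calibration scale are made up for the example.

```python
import numpy as np

def quant_static(X, scale):
    # static: one activation scale, fixed at calibration time for all inputs
    q = np.clip(np.round(X / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

def quant_dynamic(X):
    # dynamic: recompute a scale per token (row) from the live activations
    scale = np.abs(X).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(X / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 64)).astype(np.float32)
X[5] *= 40                    # an outlier token the calibration set missed
calib_scale = 3.0 / 127.0     # static scale estimated from "typical" data
err_static = np.abs(quant_static(X, calib_scale) - X).max()
err_dynamic = np.abs(quant_dynamic(X) - X).max()
assert err_dynamic < err_static   # dynamic scales absorb the outlier token
```

The static path clips the outlier token to the calibration range and loses it entirely; the dynamic path pays a small runtime cost to compute per-token scales but keeps every token representable, which is why accuracy recovers.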
@GarchFather
Kirill Solodskikh
4 days
After static quantization, MMLU accuracy dropped from 67% to 25%. Use SmoothQuant to shift error into weights – quality recovers to 60%.
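The SmoothQuant trick mentioned above can be sketched in a few lines of NumPy (an illustration of the published idea, not the QLIP or SmoothQuant source): per input channel, activations are divided by a smoothing factor and weights are multiplied by the same factor, so the matmul output is unchanged while activation outliers shrink into the weights.

```python
import numpy as np

def smooth(X, W, alpha=0.5):
    # SmoothQuant: migrate activation outliers into the weights.
    # Per input channel j: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)
    act_max = np.abs(X).max(axis=0)   # per-channel activation range
    w_max = np.abs(W).max(axis=0)     # per-channel weight range
    s = act_max**alpha / w_max**(1 - alpha)
    return X / s, W * s               # (X/s) @ (W*s).T == X @ W.T

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8))
X[:, 3] *= 50                         # one outlier activation channel
W = rng.normal(size=(4, 8))           # weight matrix, (out, in) layout
Xs, Ws = smooth(X, W)
assert np.allclose(Xs @ Ws.T, X @ W.T)   # output is mathematically unchanged
```

After smoothing, the per-channel activation ranges are far more uniform, so a coarse int8 activation scale loses much less information, which is the mechanism behind the 25% → 60% recovery.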
@GarchFather
Kirill Solodskikh
4 days
Calibrate on your dataset. Run MMLU eval to record baseline accuracy.
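What calibration records can be pictured as a tiny observer (a hypothetical helper for illustration, not QLIP's API): stream calibration batches through the model, keep the running absolute max of each tensor, and turn that statistic into the static int8 scale. The MMLU baseline run mentioned above then gives the reference accuracy to compare against.

```python
import numpy as np

class MaxObserver:
    # records the running absolute max seen on calibration data;
    # that statistic later becomes the static int8 quantization scale
    def __init__(self):
        self.amax = 0.0

    def update(self, x):
        self.amax = max(self.amax, float(np.abs(x).max()))

    def scale(self, n_levels=127):
        return self.amax / n_levels

obs = MaxObserver()
rng = np.random.default_rng(0)
for _ in range(8):                      # stream calibration batches through
    obs.update(rng.normal(size=(4, 16)))
int8_scale = obs.scale()
assert int8_scale > 0
```

If the calibration set misses the activation ranges seen at eval time, this static scale is exactly what clips them, which sets up the accuracy drop discussed later in the thread.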
@GarchFather
Kirill Solodskikh
4 days
Import QLIP + pre-tuned NVIDIA config. Set quantization for all linear layers.
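QLIP's config API isn't shown in the thread, so as a hedged sketch: applying int8 to every linear layer conceptually boils down to per-output-channel symmetric weight quantization, shown here with NumPy stand-ins for the real weight matrices (the layer names are invented for the example).

```python
import numpy as np

def quantize_linear_weight(W):
    # per-output-channel symmetric int8 weight quantization
    scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
    Wq = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return Wq, scale

rng = np.random.default_rng(0)
layers = {"q_proj": rng.normal(size=(8, 8)),   # stand-ins for the model's
          "k_proj": rng.normal(size=(8, 8))}   # linear-layer weights
quantized = {name: quantize_linear_weight(W) for name, W in layers.items()}
for name, (Wq, scale) in quantized.items():
    W_hat = Wq.astype(np.float32) * scale      # dequantize to check error
    assert np.abs(W_hat - layers[name]).max() < scale.max()
```

In a real framework the same loop runs over every linear module in the model, with the pre-tuned config deciding per-layer settings such as granularity and which layers to skip.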
@GarchFather
Kirill Solodskikh
4 days
We made this to demonstrate quantization in our QLIP framework. Let's start – load the LLaMA-8B model, ready for quantization.
@GarchFather
Kirill Solodskikh
4 days
Our research team took @AIatMeta LLaMA-8B, quantized it with QLIP using post-training int8, applied SmoothQuant, and used pre-defined compiler-compatible NVIDIA configs. Why do this? Up to 2× fewer weights and 3.6× faster on one GPU. Try it with our simple Jupyter Notebook.
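Rough arithmetic behind the "2× fewer weights" figure: going from 16-bit to 8-bit weights halves the bytes per parameter (the 3.6× speedup is a separate matter of kernels and hardware and isn't derivable from storage alone).

```python
# int8 vs fp16 weight storage for an 8B-parameter model
params = 8e9
fp16_gb = params * 2 / 2**30   # 2 bytes per weight
int8_gb = params * 1 / 2**30   # 1 byte per weight
assert fp16_gb / int8_gb == 2.0
# roughly 14.9 GB of fp16 weights vs 7.5 GB of int8 weights
```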
@GarchFather
Kirill Solodskikh
5 days
Our @TheStageAI team was happy to gain early access to the @nvidia B200 from @nebiusai and establish benchmarking for our optimized diffusion models. We now fully support inference of optimized models on B200 across various AI applications – LLMs, VLMs, Text-to-Image…
@nebiusai
Nebius
5 days
NVIDIA HGX B200 instances are now available as self-service AI clusters on Nebius AI Cloud 🔥 This means anyone can access @NVIDIA Blackwell — the latest generation of NVIDIA's accelerated computing platform — with just a few clicks and a credit card.
@GarchFather
Kirill Solodskikh
7 days
RT @TheStageAI: AI engineers and researchers can now use our Quantization API to run accelerated LLMs, VLMs, and diffusion on NVIDIA and ed….
@GarchFather
Kirill Solodskikh
21 days
Been cooking up some audio tools. Made a quick playground on Hugging Face Spaces for easy testing. It's Elastic MusicGen, our fork of Meta's MusicGen Large by @TheStageAI. Drop prompts, get tracks — in seconds, right in your browser. 🚀 11× faster than…
huggingface.co
@GarchFather
Kirill Solodskikh
2 months
Meet Elastic MusicGen Large — our optimized fork of @metaai's MusicGen, powered by ANNA (@TheStageAI's Automated Neural Network Accelerator). Ye @kanyewest used AI for vocals on "Bully," calling it the "next Auto-Tune." He switched up later, but tracks…
huggingface.co
@GarchFather
Kirill Solodskikh
2 months
⌁ EUROPE SIGNAL: ACTIVE ⌁
↳ Want to accelerate your model's inference?
↳ These guys sure do.
✦ Berlin: mapped next steps with our investors Christophe Maire and Lukas Erbguth of Atlantic Labs.
✦ Paris: @NVIDIAGTC showed us what's possible.
✦ Germany: more investor talks
@GarchFather
Kirill Solodskikh
2 months
▚▞▚▞ DATA LOG: AI EUROPE ▚▞▚▞
For years, AI talk was all Silicon Valley. After @NVIDIA #GTCParis, one thing became clear: Europe's AI ecosystem has already kicked into high gear.
🇫🇷 @MistralAI's dropping open weights that actually run.
🇩🇪 @Aleph__Alpha building native…
@GarchFather
Kirill Solodskikh
2 months
RT @TheStageAI: Bonjour, Paris 🇫🇷. Just wrapped 2 amazing days at @NVIDIA #GTCParis at @VivaTech — AI infra, agentic systems, and robots wa….
@GarchFather
Kirill Solodskikh
2 months
RT @TheStageAI: 🥐 Bon appétit, developers. New @MistralAI models for self-hosting accelerated by TheStage AI:. - New LLM: Mistral Small 24B….
@GarchFather
Kirill Solodskikh
3 months
RT @TheStageAI: Wrong model = slow app. We help you pick the right one for your GPU. You can now explore a new Models section on our platf….
@GarchFather
Kirill Solodskikh
3 months
RT @TheStageAI: 💻Hey devs - real-time speech transcription inference, zero cost to run! Starting from MacOS 14, M1-M4. Check the crash test….
@GarchFather
Kirill Solodskikh
3 months
Hello! We are releasing the fastest LLMs every week! They come with a simple Hugging Face library interface – please check them out!
- All models come in 4 tiers: S, M, L, XL.
- For instance, Llama 8B runs at ~200 tok/s.
- Models ship with quality and speed benchmarks.
@GarchFather
Kirill Solodskikh
3 months
Yes! We already support @NVIDIAAI B200! And it's giving us the fastest performance in the world!
@TheStageAI
TheStage AI
3 months
🚀 TheStage AI x Nebius = fastest diffusion model inference on NVIDIA Blackwell – and it's already live. Huge thanks to @TFNBreakingNews for the sharp story. We don't chase trends. We set benchmarks. Big thanks to the @Nebius team! #AI #NVIDIA