Kirill Solodskikh

@GarchFather

Followers 370 · Following 342 · Media 34 · Statuses 178

Almost PhD, Almost Founder, Almost Team Lead, Almost Successful, married. @TheStageAI Co-founder, CEO, ex Huawei P50 AI cameras

Joined October 2022
@GarchFather
Kirill Solodskikh
1 day
Can LLMs recognize ASCII art? Our tests show accelerated Elastic Models analyze line-by-line features and combine them using statistical patterns. Try it yourself with DeepSeek-Qwen-14B – 120 tok/s on H100, 40 tok/s on L40S, up to 3× faster. Free API token!
@GarchFather
Kirill Solodskikh
4 days
Replace static scale estimation with dynamic activation quantization. Accuracy jumps to 66.5% – almost original performance. This progress is only possible thanks to open source. Special thanks to SmoothQuant authors (@Guangxuan_Xiao, @jilin_14, @songhan_mit), their method is
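The switch described above, from a scale fixed at calibration time to one computed per token at runtime, can be sketched in NumPy. This is an illustration of the general static-vs-dynamic trade-off, not QLIP's actual implementation; the shapes and the calibration scale are made up for the example.

```python
import numpy as np

def quant_static(X, scale):
    # static: one activation scale, fixed at calibration time for all inputs
    q = np.clip(np.round(X / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

def quant_dynamic(X):
    # dynamic: recompute a scale per token (row) from the live activations
    scale = np.abs(X).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(X / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 64)).astype(np.float32)
X[5] *= 40                    # an outlier token the calibration set missed
calib_scale = 3.0 / 127.0     # static scale estimated from "typical" data
err_static = np.abs(quant_static(X, calib_scale) - X).max()
err_dynamic = np.abs(quant_dynamic(X) - X).max()
assert err_dynamic < err_static   # dynamic scales absorb the outlier token
```

The static path clips the outlier token to the calibration range and loses it entirely; the dynamic path pays a small runtime cost to compute per-token scales but keeps every token representable, which is why accuracy recovers.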
@GarchFather
Kirill Solodskikh
4 days
After static quantization, MMLU accuracy dropped from 67% to 25%. Use SmoothQuant to shift error into weights – quality recovers to 60%.
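The SmoothQuant trick mentioned above can be sketched in a few lines of NumPy (an illustration of the published idea, not the QLIP or SmoothQuant source): per input channel, activations are divided by a smoothing factor and weights are multiplied by the same factor, so the matmul output is unchanged while activation outliers shrink into the weights.

```python
import numpy as np

def smooth(X, W, alpha=0.5):
    # SmoothQuant: migrate activation outliers into the weights.
    # Per input channel j: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)
    act_max = np.abs(X).max(axis=0)   # per-channel activation range
    w_max = np.abs(W).max(axis=0)     # per-channel weight range
    s = act_max**alpha / w_max**(1 - alpha)
    return X / s, W * s               # (X/s) @ (W*s).T == X @ W.T

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8))
X[:, 3] *= 50                         # one outlier activation channel
W = rng.normal(size=(4, 8))           # weight matrix, (out, in) layout
Xs, Ws = smooth(X, W)
assert np.allclose(Xs @ Ws.T, X @ W.T)   # output is mathematically unchanged
```

After smoothing, the per-channel activation ranges are far more uniform, so a coarse int8 activation scale loses much less information, which is the mechanism behind the 25% → 60% recovery.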
@GarchFather
Kirill Solodskikh
4 days
Calibrate on your dataset. Run MMLU eval to record baseline accuracy.
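What calibration records can be pictured as a tiny observer (a hypothetical helper for illustration, not QLIP's API): stream calibration batches through the model, keep the running absolute max of each tensor, and turn that statistic into the static int8 scale. The MMLU baseline run mentioned above then gives the reference accuracy to compare against.

```python
import numpy as np

class MaxObserver:
    # records the running absolute max seen on calibration data;
    # that statistic later becomes the static int8 quantization scale
    def __init__(self):
        self.amax = 0.0

    def update(self, x):
        self.amax = max(self.amax, float(np.abs(x).max()))

    def scale(self, n_levels=127):
        return self.amax / n_levels

obs = MaxObserver()
rng = np.random.default_rng(0)
for _ in range(8):                      # stream calibration batches through
    obs.update(rng.normal(size=(4, 16)))
int8_scale = obs.scale()
assert int8_scale > 0
```

If the calibration set misses the activation ranges seen at eval time, this static scale is exactly what clips them, which sets up the accuracy drop discussed later in the thread.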
@GarchFather
Kirill Solodskikh
4 days
Import QLIP + pre-tuned NVIDIA config. Set quantization for all linear layers.
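QLIP's config API isn't shown in the thread, so as a hedged sketch: applying int8 to every linear layer conceptually boils down to per-output-channel symmetric weight quantization, shown here with NumPy stand-ins for the real weight matrices (the layer names are invented for the example).

```python
import numpy as np

def quantize_linear_weight(W):
    # per-output-channel symmetric int8 weight quantization
    scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
    Wq = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return Wq, scale

rng = np.random.default_rng(0)
layers = {"q_proj": rng.normal(size=(8, 8)),   # stand-ins for the model's
          "k_proj": rng.normal(size=(8, 8))}   # linear-layer weights
quantized = {name: quantize_linear_weight(W) for name, W in layers.items()}
for name, (Wq, scale) in quantized.items():
    W_hat = Wq.astype(np.float32) * scale      # dequantize to check error
    assert np.abs(W_hat - layers[name]).max() < scale.max()
```

In a real framework the same loop runs over every linear module in the model, with the pre-tuned config deciding per-layer settings such as granularity and which layers to skip.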
@GarchFather
Kirill Solodskikh
4 days
We made this to demonstrate quantization in our QLIP framework. Let's start – load the LLaMA-8B model, ready for quantization.
@GarchFather
Kirill Solodskikh
4 days
Our research team took @AIatMeta LLaMA-8B, quantized it with QLIP using post-training int8, applied SmoothQuant, and used pre-defined compiler-compatible NVIDIA configs. Why do this? Up to 2× fewer weights and 3.6× faster on one GPU. Try it with our simple Jupyter Notebook.
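Rough arithmetic behind the "2× fewer weights" figure: going from 16-bit to 8-bit weights halves the bytes per parameter (the 3.6× speedup is a separate matter of kernels and hardware and isn't derivable from storage alone).

```python
# int8 vs fp16 weight storage for an 8B-parameter model
params = 8e9
fp16_gb = params * 2 / 2**30   # 2 bytes per weight
int8_gb = params * 1 / 2**30   # 1 byte per weight
assert fp16_gb / int8_gb == 2.0
# roughly 14.9 GB of fp16 weights vs 7.5 GB of int8 weights
```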
@GarchFather
Kirill Solodskikh
5 days
Our @TheStageAI team was happy to gain early access to the @nvidia B200 from @nebiusai and establish benchmarking for our optimized diffusion models. We now fully support inference of optimized models on B200 across various AI applications – LLMs, VLMs, Text-to-Image…
@nebiusai
Nebius
5 days
NVIDIA HGX B200 instances are now available as self-service AI clusters on Nebius AI Cloud 🔥 This means anyone can access @NVIDIA Blackwell — the latest generation of NVIDIA's accelerated computing platform — with just a few clicks and a credit card.
@GarchFather
Kirill Solodskikh
7 days
RT @TheStageAI: AI engineers and researchers can now use our Quantization API to run accelerated LLMs, VLMs, and diffusion on NVIDIA and ed….
@GarchFather
Kirill Solodskikh
21 days
Been cooking up some audio tools. Made a quick playground on Hugging Face Spaces for easy testing. It's Elastic MusicGen, our fork of Meta's MusicGen Large by @TheStageAI. Drop prompts, get tracks — in seconds, right in your browser. 🚀 11× faster than…
huggingface.co
@GarchFather
Kirill Solodskikh
2 months
Meet Elastic MusicGen Large — our optimized fork of @metaai's MusicGen, powered by ANNA (@TheStageAI's Automated Neural Network Accelerator). Ye @kanyewest used AI for vocals on "Bully," calling it the "next Auto-Tune." He switched up later, but tracks…
huggingface.co
@GarchFather
Kirill Solodskikh
2 months
⌁ EUROPE SIGNAL: ACTIVE ⌁
↳ Want to accelerate your model's inference?
↳ These guys sure do.
✦ Berlin: mapped next steps with our investors Christophe Maire and Lukas Erbguth of Atlantic Labs.
✦ Paris: @NVIDIAGTC showed us what's possible.
✦ Germany: more investor talks
@GarchFather
Kirill Solodskikh
2 months
▚▞▚▞ DATA LOG: AI EUROPE ▚▞▚▞
For years, AI talk was all Silicon Valley. After @NVIDIA #GTCParis, one thing became clear: Europe's AI ecosystem has already kicked into high gear.
🇫🇷 @MistralAI's dropping open weights that actually run.
🇩🇪 @Aleph__Alpha building native…
@GarchFather
Kirill Solodskikh
2 months
RT @TheStageAI: Bonjour, Paris 🇫🇷. Just wrapped 2 amazing days at @NVIDIA #GTCParis at @VivaTech — AI infra, agentic systems, and robots wa….
@GarchFather
Kirill Solodskikh
2 months
RT @TheStageAI: 🥐 Bon appétit, developers. New @MistralAI models for self-hosting accelerated by TheStage AI:. - New LLM: Mistral Small 24B….
@GarchFather
Kirill Solodskikh
3 months
RT @TheStageAI: Wrong model = slow app. We help you pick the right one for your GPU. You can now explore a new Models section on our platf….
@GarchFather
Kirill Solodskikh
3 months
RT @TheStageAI: 💻Hey devs - real-time speech transcription inference, zero cost to run! Starting from MacOS 14, M1-M4. Check the crash test….
@GarchFather
Kirill Solodskikh
3 months
Hello! We are releasing the fastest LLMs every week! They come with a simple Hugging Face library interface – please check them out!
- All models come in 4 tiers: S, M, L, XL.
- For instance, Llama 8B runs at ~200 tok/s.
- Models ship with quality and speed benchmarks.
@GarchFather
Kirill Solodskikh
3 months
Yes! We already support @NVIDIAAI B200! And it's giving us the fastest performance in the world!
@TheStageAI
TheStage AI
3 months
🚀 TheStage AI x Nebius = fastest diffusion model inference on NVIDIA Blackwell – and it's already live. Huge thanks to @TFNBreakingNews for the sharp story. We don't chase trends. We set benchmarks. Big thanks to the @Nebius team! #AI #NVIDIA