Stew Tong

@stewtong

Followers: 309 · Following: 2K · Media: 73 · Statuses: 943

Big Papa. Music and muscle-ups. Tech sales and Tequila. AI @ AWS. Building self-hosted LLMs and apps.

Los Angeles, CA
Joined April 2009
@stewtong
Stew Tong
18 days
First time I've pushed an LLM deploy this far with a full benchmark and writeup. Sharing a gotcha so nobody else burns a Saturday: SM120 (RTX PRO 6000 Blackwell Server Edition) trips SGLang's default FP8 GEMM backends (DeepGEMM + CUTLASS) on MiniMax-M2.5, since those kernels are gated to SM90/SM100.
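Not from the thread itself, but a quick pre-flight check along these lines can save that Saturday. A minimal sketch, assuming PyTorch on the serving host; SM120 reports compute capability (12, 0), and the SM90/SM100 gating is taken from the tweet above, not from an authoritative SGLang support matrix:

```python
import torch

def fp8_gemm_kernels_gated() -> bool:
    """True if this GPU falls outside the SM90/SM100 window the default
    FP8 GEMM backends (DeepGEMM/CUTLASS) are described as gated to."""
    major, minor = torch.cuda.get_device_capability(0)
    return (major, minor) not in {(9, 0), (10, 0)}

if fp8_gemm_kernels_gated():
    print("Compute capability", torch.cuda.get_device_capability(0),
          "- expect the default FP8 GEMM backends to assert; "
          "plan on a different FP8 GEMM backend before serving.")
```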
@stewtong
Stew Tong
6 days
Head-to-head vs M2.5 on the same hardware:
Qwen3.5-122B: 1,985 burst tok/s, 310 online tok/s @ 4 rps, 6.99/10 Arena-Hard
M2.5: 1,818 burst tok/s, 404 online tok/s @ 4 rps, 4.94/10 Arena-Hard
Qwen3.5-122B wins on burst throughput and quality. M2.5 still wins on sustained serving, largely
github.com
Benchmarks of MiniMax-M2.5 (228B MoE) running FP8 inference on SM120 (RTX PRO 6000 Blackwell) via SGLang. I haven't seen prior published SM120 numbers for this model. Includes: SM120 ba...
@stewtong
Stew Tong
6 days
FP8 KV cache on Qwen3.5-122B doesn't crash, but it silently produces corrupt output. No error, no warning. Just exclamation marks and repetition instead of answers. bf16 KV fixes it. I benchmarked this on 8x RTX PRO 6000 Blackwell Server Edition (SM120, AWS g7e.48xlarge) with
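For anyone who wants to poke at the KV-cache comparison, here is a rough offline-engine sketch. It assumes sglang is installed, that Engine accepts the same kv_cache_dtype values as the server CLI ("auto", "fp8_e5m2", "fp8_e4m3"), and the model path is illustrative:

```python
import sglang as sgl

# FP8 KV cache: loads and runs, but per the test above it silently produced
# exclamation marks and repetition instead of answers.
# llm = sgl.Engine(model_path="Qwen/Qwen3.5-122B", tp_size=8,
#                  kv_cache_dtype="fp8_e5m2")

# bf16 KV cache ("auto" follows the model dtype): correct output.
llm = sgl.Engine(model_path="Qwen/Qwen3.5-122B", tp_size=8,
                 kv_cache_dtype="auto")
print(llm.generate("Name three FP8 pitfalls.", {"max_new_tokens": 64}))
```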
@stewtong
Stew Tong
17 days
SW stocks beaten down 🤬 Agents exploding into the hands of non-tech humans means more open backdoors and more front-door attacks. Today it's OpenClaw and we're just getting started. This pattern only sharpens the need for cyber solutions like $CRWD $PANW $ZS $NET $S to
@stewtong
Stew Tong
18 days
M2.5 had a lot of us in the lab this week! Dope work as always on H200 + vLLM. Just posted a complementary datapoint from the SM120 (Blackwell Server Edition) lane: in SGLang, the default FP8 GEMM backends (DeepGEMM/CUTLASS) can assert/crash due to SM90/SM100 gating. Switching FP8 GEMM
@SemiAnalysis_
SemiAnalysis
19 days
How efficient is MiniMax M2.5? We benchmarked on 8xH200 TP8 with @vllm_project. At a reasonable 10-25s TTFT, M2.5 is able to sustain ~2,500 tok/s/GPU throughput. For decode, it's still possible to reach ~20 tok/s/GPU throughput at a strict 20 tok/s/user interactivity with 10K+
@stewtong
Stew Tong
20 days
@AnthropicAI @OpenAI @cursor_ai Someone told me @bcherny might find this interesting re skills
@stewtong
Stew Tong
21 days
@AnthropicAI @OpenAI @cursor_ai Full implementation guide here:
@stewtong
Stew Tong
21 days
@AnthropicAI @OpenAI @cursor_ai This builds on my AI Native Knowledge Work article https://t.co/fRFmbjnNb3
@stewtong
Stew Tong
21 days
Honest question: how long until you make this whole thread irrelevant lol @AnthropicAI @OpenAI @cursor_ai
@stewtong
Stew Tong
21 days
I run enterprise AI GTM. I coded this entire system in a terminal. A year ago those two sentences didn't belong in the same paragraph. This pattern works with any AI CLI tool with local file access, like Codex, OpenCode, or Cursor. Full implementation guide with pseudo-code, file
@stewtong
Stew Tong
21 days
What doesn't work: the system can load the wrong context at times. You'll need to tune your digital twin as your role changes. And critical thinking? Still on you. The AI is infrastructure, not judgment. But for knowledge work that requires juggling multiple contexts across many
@stewtong
Stew Tong
21 days
Real impact after 8 months:
• Customer prep time: 1hr → 5min
• Context switching time: 20min → 30sec
The ROI isn't in the tool. It's in the compounding context.
@stewtong
Stew Tong
21 days
When I start the next session: /load-work
The system reads my identity, recent memory, and current priorities. It knows what I was working on Friday, what's due this week, and which customers need attention. 30 seconds. No Slack archaeology. No "where was I?"
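If you want the shape of this step outside Claude Code, here's a hand-rolled sketch of what a /load-work command could assemble. The file names (twin.md, memory.md, priorities.md) are illustrative, not the actual layout from the thread:

```python
from pathlib import Path

def load_work(root: str = ".") -> str:
    """Stitch identity, recent memory, and priorities into one context blob."""
    parts = []
    for name in ("twin.md", "memory.md", "priorities.md"):
        p = Path(root) / name
        if not p.exists():
            continue
        text = p.read_text()
        if name == "memory.md":
            # Only the tail of the log, so the loaded context stays small.
            text = "\n".join(text.splitlines()[-200:])
        parts.append(f"## {name}\n{text}")
    return "\n\n".join(parts)

print(load_work())
```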
@stewtong
Stew Tong
21 days
Once your digital twin exists, the memory system kicks in. After every work session: /save-work "session summary"
Behind the scenes:
1. Append to project/memory.md with timestamp
2. If https://t.co/1tIXW1RWBJ > 50KB, rotate to archive
3. Extract key insights to weekly file
4.
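Steps 1 and 2 are simple enough to sketch in a few lines of Python. Treat this as a shape under assumptions, not the thread's exact implementation; the archive naming is invented here, and the weekly-insights step (3) is left out:

```python
from datetime import datetime
from pathlib import Path

MEMORY = Path("project/memory.md")
LIMIT = 50 * 1024  # the 50KB rotation threshold from step 2

def save_work(summary: str) -> None:
    MEMORY.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    with MEMORY.open("a") as f:                # step 1: timestamped append
        f.write(f"\n## {stamp}\n{summary}\n")
    if MEMORY.stat().st_size > LIMIT:          # step 2: rotate to archive
        archive = MEMORY.with_name(f"memory-{datetime.now():%Y%m%d}.md")
        MEMORY.rename(archive)
        MEMORY.write_text(f"Rotated to {archive.name} on {stamp}\n")

save_work("Demoed SM120 benchmarks to a FinServ customer; follow up Tuesday.")
```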
@stewtong
Stew Tong
21 days
Here's what a digital twin looks like for a GTM specialist:
Role & Products
- Enterprise Sales, AI Platform
- Products: Name, key differentiators, roadmap
- Target: Fortune 500, Digital Natives
Current Focus
- Q1 pipeline: Enterprise deals in FinServ
- Product launch: Extended
@stewtong
Stew Tong
21 days
The digital twin is a structured markdown file. It answers:
- Who are you?
- What do you do?
- What are you working on?
- How should the AI help you?
In Claude Code, https://t.co/W91OEe43O3 is your lightweight system config (auto-loaded, <1000 tokens). Your digital twin lives
code.claude.com
Claude Code is an agentic coding tool that reads your codebase, edits files, runs commands, and integrates with your development tools. Available in your terminal, IDE, desktop app, and browser.
@stewtong
Stew Tong
21 days
The system has two layers:
Layer 1: Define your digital twin once (1-2 hours of focused work)
Layer 2: Create a memory system to let context compound session by session
Most people skip Layer 1 and wonder why their AI assistant feels generic. The AI doesn't know what you do,
@stewtong
Stew Tong
21 days
After 8 months building my GTM operating system on Claude Code, the same question kept coming back: "How does someone start without 8 months of accumulated context?" Here's the truth: managing multiple enterprise customers, product lines, and strategic initiatives used to mean
@stewtong
Stew Tong
27 days
J Cole one-shotted The Fall-Off using OpenClaw. Unreal