𝚐π”ͺ𝟾𝚑𝚑𝟾 Profile
𝚐π”ͺ𝟾𝚑𝚑𝟾

@gm8xx8

Followers: 8K · Following: 74K · Media: 2K · Statuses: 17K

π‡πˆπ†π‡-π’πˆπ†ππ€π‹, π‡πˆπ†π‡-𝐓𝐀𝐒𝐓𝐄 π€πˆ | πŽππ„π πŒπŽπƒπ„π‹π’

Joined March 2010
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
8 months
THIS WEEK SHOULD BE EXCITING
me: every week, forever.
1
2
59
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 days
GREAT RELEASE FROM DEEPSEEK, will share my thoughts on this later
@deepseek_ai
DeepSeek
3 days
πŸš€ Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale β€” Reasoning-first models built for agents!
πŸ”Ή DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API.
πŸ”Ή DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now.
πŸ“„ Tech…
1
1
120
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
7 days
Were you paying attention anon… The system was already mapped. DeepSeekMath-V2 closes the loop. Not just an LLM judge, but a meta-judge that audits the judge itself. Verification now enforces behavior, not just assessment.
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
4 months
Turns out I was pretty spot on.
0
4
38
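A minimal control-flow sketch of the closed loop described above, based only on my reading of the thread; `generator`, `judge`, and `meta_judge` are hypothetical callables standing in for model calls, not anything from DeepSeek's code:

```python
# Sketch only: the point is the control flow, not the models.
def verified_solve(problem, generator, judge, meta_judge, max_rounds=4):
    """Generate -> judge -> meta-judge loop: a solution is accepted only
    when the judge approves it AND the meta-judge approves the judging."""
    feedback = None
    for _ in range(max_rounds):
        proof = generator(problem, feedback)           # propose a solution
        verdict = judge(problem, proof)                # grade it, with critique
        audit = meta_judge(problem, proof, verdict)    # audit the grading itself
        if audit["judge_was_sound"] and verdict["accepted"]:
            return proof                               # only verified work passes
        feedback = (verdict, audit)                    # critique feeds the next attempt
    return None
```

In this reading, "verification enforces behavior" simply means the generator never gets credit unless both layers of checking sign off, so the feedback loop shapes what it produces rather than merely scoring it afterwards.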
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
7 days
YOU HAVE NO IDEA HOW HAPPY THIS MAKES ME
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
7 days
11
106
3K
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
7 days
18
194
2K
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
9 days
ZYPHRA ZAYA1 COLLECTION: https://t.co/a7uvVa1aV2
AMD FULL-STACK PRETRAINING CASE STUDY (ZAYA1 SYSTEM PAPER): https://t.co/VZaysdlsdO
CCA/CCGQA ATTENTION PAPER: …
huggingface.co
0
0
5
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
10 days
AMD MI300X is no longer just capable. It is actively being optimized for by serious model architectures.
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
29 days
Granite-4.0 hybrids are likely NUMA-aware. IBM tuned them for AMD’s MI300X, which exposes eight XCDs with private L2 caches and uneven memory latency. The 70%+ memory savings and stable scaling suggest NUMA-aware scheduling and placement. Mamba’s sequential state reuse keeps…
1
0
3
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
10 days
GPUs with Pollara high-speed interconnect, but actually measured: this paper is a full case study of frontier-scale pretraining on an all-AMD stack (MI300X with Pollara), introducing Zyphra’s ZAYA1-base and ZAYA1-reasoning-base, MoE models with CCA attention and a redesigned…
@ZyphraAI
Zyphra
10 days
In collaboration with @AMD and @IBM, we @ZyphraAI are sharing ZAYA1-base! The first large-scale model on an integrated AMD hardware, software, and networking stack. ZAYA1 uses Zyphra’s novel MoE architecture with 760M active and 8.3B total params. Tech paper and more belowπŸ‘‡
2
0
19
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
10 days
DAMO’s embodied stack now spans perception (RynnEC), action (RynnVLA-001), and unified world modeling (RynnVLA-002)
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
RynnEC: Bringing MLLMs into the Embodied World (Alibaba DAMO Academy)
- Introduces a region encoder and mask decoder for precise region-level interaction in video-based reasoning
- Trained on 20,832 egocentric videos from over 200 houses, producing 1.14M instance masks through…
0
0
3
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
10 days
huggingface.co
1
0
4
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
10 days
πšπš’πš—πš—πš…π™»π™°-𝟎𝟎𝟐 RynnVLA-002: a unified action world model that evolves RynnVLA-001 from β€œVLA and generative priors” into a fully joint Chameleon-based action–world framework, merging VLA policy and world model in one autoregressive transformer with shared token space for
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
4 months
RynnVLA-001 is a 7B open VLA model from Alibaba DAMO, built on large-scale ego-centric video generative pretraining
- Generative pretrain (Stage 1): Autoregressive I2V Transformer on ~12M ego-centric human + 244K robot manipulation videos; no action labels
- Continuous actions…
1
5
42
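For readers unfamiliar with the β€œone autoregressive transformer, shared token space” framing, here is a shape-level sketch. It is purely my own illustration, not the RynnVLA-002 implementation; the vocabulary sizes and the `SharedTokenAR` module are invented for the example.

```python
import torch
import torch.nn as nn

# Assumed, illustrative vocabulary sizes: image, text, and discretized
# action tokens are packed into one shared token space.
IMG_VOCAB, TXT_VOCAB, ACT_BINS = 8192, 32000, 256
VOCAB = IMG_VOCAB + TXT_VOCAB + ACT_BINS

class SharedTokenAR(nn.Module):
    """One causal transformer over mixed-modality token ids: the same
    next-token head can roll the world forward (image tokens) or emit
    a policy (action tokens)."""
    def __init__(self, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)   # same head predicts any modality

    def forward(self, tokens):                  # tokens: (B, T) mixed-modality ids
        T = tokens.shape[1]
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.head(h)                     # next-token logits over VOCAB

def action_to_tokens(action_bins):
    # Offset discretized action bins into the shared vocabulary.
    return [IMG_VOCAB + TXT_VOCAB + b for b in action_bins]

model = SharedTokenAR()
tokens = torch.randint(0, VOCAB, (2, 16))       # a mixed image/text/action sequence
logits = model(tokens)                          # (2, 16, VOCAB)
```

The design point the tweet is gesturing at: because policy and world model share the token space and the weights, predicting the next observation and predicting the next action are the same training objective.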
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
23 days
The phase-transition curve from β€˜Data Mixing’ has been there all along. You’re welcome.
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Meta just ran one of the largest synthetic-data studies (over 1000 LLMs, more than 100k GPU hours). Result: mixing synthetic and natural data only helps once you cross the right scale and ratio (~30%). Small models learn nothing; larger ones suddenly gain a sharp threshold…
0
4
33
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
15 days
DeepSeek is clearly tightening the whole sparse/MoE stack.
The indexer RoPE fix cleaned up long-context retrieval geometry. Now LPLB shows up with LP-based load balancing, validated across Cube, Hypercube, Ring, and Torus topologies, showing the runtime is being hardened for real…
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
17 days
Current V3.2-Exp numbers are a lower bound. The cap just got lifted.
1
7
116
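To make β€œLP-based load balancing” concrete, here is a toy linear program in the same spirit. This is my own sketch, not LPLB's actual formulation: split each expert's token load across the devices that hold its replicas so the busiest device carries as little as possible.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data: per-expert token loads and which device holds a
# replica of each expert (both made up for the example).
loads = np.array([120.0, 80.0, 60.0, 40.0])
placement = np.array([[1, 1],    # expert 0 replicated on devices 0 and 1
                      [1, 0],    # expert 1 only on device 0
                      [0, 1],    # expert 2 only on device 1
                      [1, 1]])   # expert 3 on both
E, D = placement.shape
n = E * D                        # variables x[e, d], flattened; plus t = max device load

c = np.zeros(n + 1)
c[-1] = 1.0                      # minimize t

# Each expert's load must be fully assigned: sum_d x[e, d] = loads[e]
A_eq = np.zeros((E, n + 1))
for e in range(E):
    A_eq[e, e * D:(e + 1) * D] = 1.0
b_eq = loads

# Every device stays under the bound: sum_e x[e, d] - t <= 0
A_ub = np.zeros((D, n + 1))
for d in range(D):
    A_ub[d, d:n:D] = 1.0
    A_ub[d, -1] = -1.0
b_ub = np.zeros(D)

# x[e, d] is forced to 0 wherever expert e has no replica on device d
bounds = [(0, 0) if placement[e, d] == 0 else (0, None)
          for e in range(E) for d in range(D)] + [(0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:-1].reshape(E, D))            # per-expert split across devices
print("max device load:", res.x[-1])       # 150.0 for this toy instance
```

The real system has to solve something like this online, per step, under a topology (cube, hypercube, ring, torus) that constrains which replicas can absorb overflow; the toy above only shows the min-max structure of the objective.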
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
16 days
UPDATE
@deepseek_ai
DeepSeek
16 days
⚠️ Heads-up to anyone using the DeepSeek-V3.2-Exp inference demo: earlier versions had a RoPE implementation mismatch in the indexer module that could degrade performance. Indexer RoPE expects non-interleaved input, MLA RoPE expects interleaved. Fixed in https://t.co/2BDzSyt1cW.
0
0
5
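The interleaved vs. non-interleaved distinction is only about how RoPE pairs dimensions before rotating them. A minimal NumPy sketch of the two layouts, my own illustration rather than DeepSeek's code:

```python
import numpy as np

def rope_interleaved(x, theta):
    # Pairs adjacent dims (x0, x1), (x2, x3), ... and rotates each pair.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(theta), np.sin(theta)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_non_interleaved(x, theta):
    # Pairs the first half with the second half: (x0, x_{d/2}), (x1, x_{d/2+1}), ...
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    cos, sin = np.cos(theta), np.sin(theta)
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# Same angles, different pairing: feeding one layout into a module that
# expects the other rotates the wrong dimension pairs.
pos, dim = 7, 8
theta = pos / 10000 ** (np.arange(dim // 2) / (dim // 2))
x = np.random.randn(dim)
print(np.allclose(rope_interleaved(x, theta), rope_non_interleaved(x, theta)))  # False in general
```

That mismatch is exactly the kind of silent degradation the advisory describes: nothing crashes, the attention geometry is just quietly wrong.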
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
17 days
Current V3.2-Exp numbers are a lower bound. The cap just got lifted.
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
17 days
(Indexer RoPE fix) It corrects the indexer’s RoPE layout by switching it to the proper non-interleaved path, fixing the phase mismatch that hurt long-context top-k retrieval. The commit also stabilizes the fp8 KV simulation and moves weights_proj to fp32. The indexer now scores…
3
2
126
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
17 days
(Indexer RoPE fix) It corrects the indexer’s RoPE layout by switching it to the proper non-interleaved path, fixing the phase mismatch that hurt long-context top-k retrieval. The commit also stabilizes the fp8 KV simulation and moves weights_proj to fp32. The indexer now scores…
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
DeepSeek-V3.2-Exp This is an experimental drop built on top of V3.1-Terminus (128K) that introduces DeepSeek Sparse Attention (DSA) to cut long-context cost without hurting scores. The weights are on HF, the API is live with prices cut by more than 50%, and V3.1 remains online
4
9
156
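For intuition on what DSA-style top-k sparse attention does, here is a small NumPy sketch. It is my own simplification, not DeepSeek's kernel: a cheap low-dimensional indexer scores every key for each query, and only the top-k keys enter the full softmax attention. The `k_idx_q`/`k_idx_k` projections are assumed inputs for the example.

```python
import numpy as np

def sparse_attention_topk(q, k, v, k_idx_q, k_idx_k, top_k):
    """Toy indexer-then-attend loop: score cheaply, attend over top-k keys only."""
    T, d = q.shape
    # 1) Lightweight indexer: low-dim projections score query-key pairs.
    idx_scores = k_idx_q @ k_idx_k.T                      # (T, T), cheap: small dim
    causal = np.triu(np.ones((T, T), dtype=bool), k=1)
    idx_scores[causal] = -np.inf                          # causal mask
    # 2) Keep only the top-k keys per query.
    keep = np.argsort(-idx_scores, axis=-1)[:, :top_k]    # (T, top_k)
    out = np.zeros_like(q)
    for t in range(T):
        sel = keep[t][idx_scores[t, keep[t]] > -np.inf]   # drop masked slots
        logits = (q[t] @ k[sel].T) / np.sqrt(d)           # full attention, sparse keys
        w = np.exp(logits - logits.max())
        w /= w.sum()
        out[t] = w @ v[sel]
    return out

T, d, d_idx, top_k = 16, 32, 8, 4
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
k_idx_q, k_idx_k = (rng.standard_normal((T, d_idx)) for _ in range(2))
print(sparse_attention_topk(q, k, v, k_idx_q, k_idx_k, top_k).shape)  # (16, 32)
```

The long-context saving comes from step 2: the expensive d-dimensional attention only ever touches top_k keys per query, while the indexer runs in a much smaller dimension, which is also why a phase bug in the indexer's RoPE directly degrades retrieval.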
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
21 days
hmmm
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
29 days
Optimizing Attention on GPUs by Exploiting GPU Architectural NUMA Effects
Swizzled Head-first Mapping cuts attention latency on chiplet GPUs by making scheduling NUMA-aware. It maps all row-blocks of a head (or KV-group in GQA) to the same XCD, so K/V first-touch stays hot in…
0
2
12
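A toy model of the head-first swizzle, my own sketch of the scheduling idea rather than the paper's kernel code; it assumes the hardware dispatches workgroup i to XCD i mod NUM_XCDS.

```python
NUM_XCDS = 8  # assumed chiplet count, e.g. MI300X-style

def default_mapping(head, row_block, num_row_blocks):
    # Naive row-major flattening: consecutive row-blocks of one head
    # round-robin across XCDs, so that head's K/V gets pulled into many L2s.
    wg = head * num_row_blocks + row_block
    return wg % NUM_XCDS

def swizzled_mapping(head, row_block, num_row_blocks):
    # Head-first swizzle: pin head h to XCD (h % NUM_XCDS) and renumber its
    # row-blocks into workgroup ids that all land on that XCD.
    xcd = head % NUM_XCDS
    slot = (head // NUM_XCDS) * num_row_blocks + row_block
    wg = slot * NUM_XCDS + xcd
    return wg % NUM_XCDS          # == xcd by construction

num_heads, num_row_blocks = 32, 4
for h in range(4):
    naive = {default_mapping(h, r, num_row_blocks) for r in range(num_row_blocks)}
    swiz = {swizzled_mapping(h, r, num_row_blocks) for r in range(num_row_blocks)}
    print(f"head {h}: naive XCDs {sorted(naive)}  swizzled XCDs {sorted(swiz)}")
```

Running it shows each head's row-blocks spread across several XCDs under the naive mapping but collapse onto a single XCD under the swizzle, which is the locality effect the paper exploits.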