𝚐π”ͺ𝟾𝚑𝚑𝟾 Profile
𝚐π”ͺ𝟾𝚑𝚑𝟾

@gm8xx8

Followers: 8K · Following: 74K · Media: 2K · Statuses: 17K

π‡πˆπ†π‡-π’πˆπ†ππ€π‹, π‡πˆπ†π‡-𝐓𝐀𝐒𝐓𝐄 π€πˆ | πŽππ„π πŒπŽπƒπ„π‹π’

Joined March 2010
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
8 months
THIS WEEK SHOULD BE EXCITING
me: every week, forever.
1
2
59
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 days
GREAT RELEASE FROM DEEPSEEK, will share my thoughts on this later
@deepseek_ai
DeepSeek
3 days
πŸš€ Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale β€” Reasoning-first models built for agents!
πŸ”Ή DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API.
πŸ”Ή DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now.
πŸ“„ Tech…
1
1
120
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
7 days
Were you paying attention anon… The system was already mapped. DeepSeekMath-V2 closes the loop. Not just an LLM judge, but a meta-judge that audits the judge itself. Verification now enforces behavior, not just assessment.
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
4 months
Turns out I was pretty spot on.
0
4
38
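A minimal control-flow sketch of the closed loop described above, based only on my reading of the thread; `generator`, `judge`, and `meta_judge` are hypothetical callables standing in for model calls, not anything from DeepSeek's code:

```python
# Sketch only: the point is the control flow, not the models.
def verified_solve(problem, generator, judge, meta_judge, max_rounds=4):
    """Generate -> judge -> meta-judge loop: a solution is accepted only
    when the judge approves it AND the meta-judge approves the judging."""
    feedback = None
    for _ in range(max_rounds):
        proof = generator(problem, feedback)           # propose a solution
        verdict = judge(problem, proof)                # grade it, with critique
        audit = meta_judge(problem, proof, verdict)    # audit the grading itself
        if audit["judge_was_sound"] and verdict["accepted"]:
            return proof                               # only verified work passes
        feedback = (verdict, audit)                    # critique feeds the next attempt
    return None
```

In this reading, "verification enforces behavior" simply means the generator never gets credit unless both layers of checking sign off, so the feedback loop shapes what it produces rather than merely scoring it afterwards.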
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
7 days
YOU HAVE NO IDEA HOW HAPPY THIS MAKES ME
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
7 days
11
106
3K
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
7 days
18
194
2K
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
9 days
ZYPHRA ZAYA1 COLLECTION: https://t.co/a7uvVa1aV2
AMD FULL-STACK PRETRAINING CASE STUDY (ZAYA1 SYSTEM PAPER): https://t.co/VZaysdlsdO
CCA/CCGQA ATTENTION PAPER: …
huggingface.co
0
0
5
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
10 days
AMD MI300X is no longer just capable. It is actively being optimized for by serious model architectures.
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
29 days
Granite-4.0 hybrids are likely NUMA-aware. IBM tuned them for AMD’s MI300X, which exposes eight XCDs with private L2 caches and uneven memory latency. The 70%+ memory savings and stable scaling suggest NUMA-aware scheduling and placement. Mamba’s sequential state reuse keeps…
1
0
3
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
10 days
GPUs with Pollara high-speed interconnect, but actually measured: this paper is a full case study of frontier-scale pretraining on an all-AMD stack (MI300X with Pollara), introducing Zyphra’s ZAYA1-base and ZAYA1-reasoning-base, MoE models with CCA attention and a redesigned…
@ZyphraAI
Zyphra
10 days
In collaboration with @AMD and @IBM, we @ZyphraAI are sharing ZAYA1-base! The first large-scale model on an integrated AMD hardware, software, and networking stack. ZAYA1 uses Zyphra’s novel MoE architecture with 760M active and 8.3B total params. Tech paper and more belowπŸ‘‡
2
0
19
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
10 days
DAMO’s embodied stack now spans perception (RynnEC), action (RynnVLA-001), and unified world modeling (RynnVLA-002)
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
RynnEC: Bringing MLLMs into the Embodied World (Alibaba DAMO Academy)
- Introduces a region encoder and mask decoder for precise region-level interaction in video-based reasoning
- Trained on 20,832 egocentric videos from over 200 houses, producing 1.14M instance masks through…
0
0
3
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
10 days
huggingface.co
1
0
4
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
10 days
πšπš’πš—πš—πš…π™»π™°-𝟎𝟎𝟐 RynnVLA-002: a unified action world model that evolves RynnVLA-001 from β€œVLA and generative priors” into a fully joint Chameleon-based action–world framework, merging VLA policy and world model in one autoregressive transformer with shared token space for
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
4 months
RynnVLA-001 is a 7B open VLA model from Alibaba DAMO, built on large-scale ego-centric video generative pretraining
- Generative pretrain (Stage 1): Autoregressive I2V Transformer on ~12M ego-centric human + 244K robot manipulation videos; no action labels
- Continuous actions…
1
5
42
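For readers unfamiliar with the β€œone autoregressive transformer, shared token space” framing, here is a shape-level sketch. It is purely my own illustration, not the RynnVLA-002 implementation; the vocabulary sizes and the `SharedTokenAR` module are invented for the example.

```python
import torch
import torch.nn as nn

# Assumed, illustrative vocabulary sizes: image, text, and discretized
# action tokens are packed into one shared token space.
IMG_VOCAB, TXT_VOCAB, ACT_BINS = 8192, 32000, 256
VOCAB = IMG_VOCAB + TXT_VOCAB + ACT_BINS

class SharedTokenAR(nn.Module):
    """One causal transformer over mixed-modality token ids: the same
    next-token head can roll the world forward (image tokens) or emit
    a policy (action tokens)."""
    def __init__(self, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)   # same head predicts any modality

    def forward(self, tokens):                  # tokens: (B, T) mixed-modality ids
        T = tokens.shape[1]
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.head(h)                     # next-token logits over VOCAB

def action_to_tokens(action_bins):
    # Offset discretized action bins into the shared vocabulary.
    return [IMG_VOCAB + TXT_VOCAB + b for b in action_bins]

model = SharedTokenAR()
tokens = torch.randint(0, VOCAB, (2, 16))       # a mixed image/text/action sequence
logits = model(tokens)                          # (2, 16, VOCAB)
```

The design point the tweet is gesturing at: because policy and world model share the token space and the weights, predicting the next observation and predicting the next action are the same training objective.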
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
23 days
The phase-transition curve from β€˜Data Mixing’ has been there all along. You’re welcome.
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Meta just ran one of the largest synthetic-data studies (over 1000 LLMs, more than 100k GPU hours). Result: mixing synthetic and natural data only helps once you cross the right scale and ratio (~30%). Small models learn nothing; larger ones suddenly gain a sharp threshold…
0
4
33
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
15 days
DeepSeek is clearly tightening the whole sparse/MoE stack.
The indexer RoPE fix cleaned up long-context retrieval geometry. Now LPLB shows up with LP-based load balancing, validated across Cube, Hypercube, Ring, and Torus topologies, showing the runtime is being hardened for real…
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
17 days
Current V3.2-Exp numbers are a lower bound. The cap just got lifted.
1
7
116
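To make β€œLP-based load balancing” concrete, here is a toy linear program in the same spirit. This is my own sketch, not LPLB's actual formulation: split each expert's token load across the devices that hold its replicas so the busiest device carries as little as possible.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data: per-expert token loads and which device holds a
# replica of each expert (both made up for the example).
loads = np.array([120.0, 80.0, 60.0, 40.0])
placement = np.array([[1, 1],    # expert 0 replicated on devices 0 and 1
                      [1, 0],    # expert 1 only on device 0
                      [0, 1],    # expert 2 only on device 1
                      [1, 1]])   # expert 3 on both
E, D = placement.shape
n = E * D                        # variables x[e, d], flattened; plus t = max device load

c = np.zeros(n + 1)
c[-1] = 1.0                      # minimize t

# Each expert's load must be fully assigned: sum_d x[e, d] = loads[e]
A_eq = np.zeros((E, n + 1))
for e in range(E):
    A_eq[e, e * D:(e + 1) * D] = 1.0
b_eq = loads

# Every device stays under the bound: sum_e x[e, d] - t <= 0
A_ub = np.zeros((D, n + 1))
for d in range(D):
    A_ub[d, d:n:D] = 1.0
    A_ub[d, -1] = -1.0
b_ub = np.zeros(D)

# x[e, d] is forced to 0 wherever expert e has no replica on device d
bounds = [(0, 0) if placement[e, d] == 0 else (0, None)
          for e in range(E) for d in range(D)] + [(0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:-1].reshape(E, D))            # per-expert split across devices
print("max device load:", res.x[-1])       # 150.0 for this toy instance
```

The real system has to solve something like this online, per step, under a topology (cube, hypercube, ring, torus) that constrains which replicas can absorb overflow; the toy above only shows the min-max structure of the objective.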
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
16 days
UPDATE
@deepseek_ai
DeepSeek
16 days
⚠️ Heads-up to anyone using the DeepSeek-V3.2-Exp inference demo: earlier versions had a RoPE implementation mismatch in the indexer module that could degrade performance. Indexer RoPE expects non-interleaved input, MLA RoPE expects interleaved. Fixed in https://t.co/2BDzSyt1cW.
0
0
5
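The interleaved vs. non-interleaved distinction is only about how RoPE pairs dimensions before rotating them. A minimal NumPy sketch of the two layouts, my own illustration rather than DeepSeek's code:

```python
import numpy as np

def rope_interleaved(x, theta):
    # Pairs adjacent dims (x0, x1), (x2, x3), ... and rotates each pair.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(theta), np.sin(theta)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_non_interleaved(x, theta):
    # Pairs the first half with the second half: (x0, x_{d/2}), (x1, x_{d/2+1}), ...
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    cos, sin = np.cos(theta), np.sin(theta)
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# Same angles, different pairing: feeding one layout into a module that
# expects the other rotates the wrong dimension pairs.
pos, dim = 7, 8
theta = pos / 10000 ** (np.arange(dim // 2) / (dim // 2))
x = np.random.randn(dim)
print(np.allclose(rope_interleaved(x, theta), rope_non_interleaved(x, theta)))  # False in general
```

That mismatch is exactly the kind of silent degradation the advisory describes: nothing crashes, the attention geometry is just quietly wrong.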
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
17 days
Current V3.2-Exp numbers are a lower bound. The cap just got lifted.
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
17 days
(Indexer RoPE fix) It corrects the indexer’s RoPE layout by switching it to the proper non-interleaved path, fixing the phase mismatch that hurt long-context top-k retrieval. The commit also stabilizes the fp8 KV simulation and moves weights_proj to fp32. The indexer now scores…
3
2
126
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
17 days
(Indexer RoPE fix) It corrects the indexer’s RoPE layout by switching it to the proper non-interleaved path, fixing the phase mismatch that hurt long-context top-k retrieval. The commit also stabilizes the fp8 KV simulation and moves weights_proj to fp32. The indexer now scores…
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
DeepSeek-V3.2-Exp This is an experimental drop built on top of V3.1-Terminus (128K) that introduces DeepSeek Sparse Attention (DSA) to cut long-context cost without hurting scores. The weights are on HF, the API is live with prices cut by more than 50%, and V3.1 remains online
4
9
156
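For intuition on what DSA-style top-k sparse attention does, here is a small NumPy sketch. It is my own simplification, not DeepSeek's kernel: a cheap low-dimensional indexer scores every key for each query, and only the top-k keys enter the full softmax attention. The `k_idx_q`/`k_idx_k` projections are assumed inputs for the example.

```python
import numpy as np

def sparse_attention_topk(q, k, v, k_idx_q, k_idx_k, top_k):
    """Toy indexer-then-attend loop: score cheaply, attend over top-k keys only."""
    T, d = q.shape
    # 1) Lightweight indexer: low-dim projections score query-key pairs.
    idx_scores = k_idx_q @ k_idx_k.T                      # (T, T), cheap: small dim
    causal = np.triu(np.ones((T, T), dtype=bool), k=1)
    idx_scores[causal] = -np.inf                          # causal mask
    # 2) Keep only the top-k keys per query.
    keep = np.argsort(-idx_scores, axis=-1)[:, :top_k]    # (T, top_k)
    out = np.zeros_like(q)
    for t in range(T):
        sel = keep[t][idx_scores[t, keep[t]] > -np.inf]   # drop masked slots
        logits = (q[t] @ k[sel].T) / np.sqrt(d)           # full attention, sparse keys
        w = np.exp(logits - logits.max())
        w /= w.sum()
        out[t] = w @ v[sel]
    return out

T, d, d_idx, top_k = 16, 32, 8, 4
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
k_idx_q, k_idx_k = (rng.standard_normal((T, d_idx)) for _ in range(2))
print(sparse_attention_topk(q, k, v, k_idx_q, k_idx_k, top_k).shape)  # (16, 32)
```

The long-context saving comes from step 2: the expensive d-dimensional attention only ever touches top_k keys per query, while the indexer runs in a much smaller dimension, which is also why a phase bug in the indexer's RoPE directly degrades retrieval.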
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
21 days
hmmm
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
29 days
Optimizing Attention on GPUs by Exploiting GPU Architectural NUMA Effects
Swizzled Head-first Mapping cuts attention latency on chiplet GPUs by making scheduling NUMA-aware. It maps all row-blocks of a head (or KV-group in GQA) to the same XCD, so K/V first-touch stays hot in…
0
2
12
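A toy model of the head-first swizzle, my own sketch of the scheduling idea rather than the paper's kernel code; it assumes the hardware dispatches workgroup i to XCD i mod NUM_XCDS.

```python
NUM_XCDS = 8  # assumed chiplet count, e.g. MI300X-style

def default_mapping(head, row_block, num_row_blocks):
    # Naive row-major flattening: consecutive row-blocks of one head
    # round-robin across XCDs, so that head's K/V gets pulled into many L2s.
    wg = head * num_row_blocks + row_block
    return wg % NUM_XCDS

def swizzled_mapping(head, row_block, num_row_blocks):
    # Head-first swizzle: pin head h to XCD (h % NUM_XCDS) and renumber its
    # row-blocks into workgroup ids that all land on that XCD.
    xcd = head % NUM_XCDS
    slot = (head // NUM_XCDS) * num_row_blocks + row_block
    wg = slot * NUM_XCDS + xcd
    return wg % NUM_XCDS          # == xcd by construction

num_heads, num_row_blocks = 32, 4
for h in range(4):
    naive = {default_mapping(h, r, num_row_blocks) for r in range(num_row_blocks)}
    swiz = {swizzled_mapping(h, r, num_row_blocks) for r in range(num_row_blocks)}
    print(f"head {h}: naive XCDs {sorted(naive)}  swizzled XCDs {sorted(swiz)}")
```

Running it shows each head's row-blocks spread across several XCDs under the naive mapping but collapse onto a single XCD under the swizzle, which is the locality effect the paper exploits.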