gm8xx8
@gm8xx8
Followers
8K
Following
74K
Media
2K
Statuses
17K
ππππ-ππππππ, ππππ-πππππ ππ | ππππ ππππππ
Joined March 2010
THIS WEEK SHOULD BE EXCITING
me: every week, forever.
1
2
59
GREAT RELEASE FROM DEEPSEEK, will share my thoughts on this later
Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale: reasoning-first models built for agents! DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API. DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now. Tech
1
1
120
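Since both models are reachable over the API, a minimal calling sketch helps ground the announcement. This assumes DeepSeek's existing OpenAI-compatible endpoint, and `deepseek-reasoner` is only a placeholder model id for the new reasoning model; check the release notes for the actual identifiers.

```python
# Minimal sketch: calling the DeepSeek API through an OpenAI-compatible client.
# ASSUMPTIONS: the V3.2 models sit behind DeepSeek's existing OpenAI-compatible
# endpoint, and "deepseek-reasoner" routes to the new reasoning model.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],   # your DeepSeek key
    base_url="https://api.deepseek.com",       # OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="deepseek-reasoner",                 # assumed alias, not confirmed by the announcement
    messages=[{"role": "user", "content": "Plan a 3-step agent workflow for web research."}],
)
print(resp.choices[0].message.content)
```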
Were you paying attention, anon… The system was already mapped. DeepSeekMath-V2 closes the loop. Not just an LLM judge, but a meta-judge that audits the judge itself. Verification now enforces behavior, not just assessment.
0
4
38
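The judge/meta-judge loop described above can be sketched generically. This is not DeepSeekMath-V2's actual pipeline; `generate`, `judge`, and `meta_judge` are hypothetical stand-ins for model calls, and the point is only the control flow: the critique itself gets audited before a proof is accepted.

```python
# Generic sketch of a judge + meta-judge verification loop.
# NOT DeepSeekMath-V2's pipeline: generate(), judge(), and meta_judge() are
# hypothetical stand-ins for model calls; only the control flow is the point.
from typing import Callable

def verified_solve(
    problem: str,
    generate: Callable[[str], str],               # proposes a candidate proof
    judge: Callable[[str, str], str],             # writes a critique of the proof
    meta_judge: Callable[[str, str, str], bool],  # audits the judge's critique
    max_rounds: int = 4,
) -> str | None:
    for _ in range(max_rounds):
        proof = generate(problem)
        critique = judge(problem, proof)
        # The meta-judge checks the *evaluation*, not just the proof: a sloppy
        # critique sends the whole round back, which is what "verification
        # enforces behavior" looks like in practice.
        if meta_judge(problem, proof, critique) and "ACCEPT" in critique:
            return proof
    return None
```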
YOU HAVE NO IDEA HOW HAPPY THIS MAKES ME
11
106
3K
AI that handles compliance for you. 25+ leading compliance frameworks. All-In-One Governance Hub. 1:1 Slack support.
0
1
2
ZYPHRA ZAYA1 COLLECTION: https://t.co/a7uvVa1aV2
AMD FULL-STACK PRETRAINING CASE STUDY (ZAYA1 SYSTEM PAPER): https://t.co/VZaysdlsdO
CCA/CCGQA ATTENTION PAPER:
0
0
5
AMD MI300X is no longer just capable. It is actively being optimized for by serious model architectures.
Granite-4.0 hybrids are likely NUMA-aware. IBM tuned them for AMD's MI300X, which exposes eight XCDs with private L2 caches and uneven memory latency. The 70%+ memory savings and stable scaling suggest NUMA-aware scheduling and placement. Mamba's sequential state reuse keeps
1
0
3
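To make "NUMA-aware scheduling and placement" concrete, here is a minimal Linux-only sketch of the general idea: one worker per NUMA node, pinned to that node's CPUs and restricted to the nearest GPU partition via ROCm's HIP_VISIBLE_DEVICES. The node-to-GPU mapping and `worker.py` are illustrative assumptions, not IBM's actual tuning.

```python
# Illustrative sketch of NUMA-aware worker placement (Linux only).
# ASSUMPTIONS: one inference worker per NUMA node, GPUs enumerated so that
# GPU i is closest to NUMA node i, and ROCm's HIP_VISIBLE_DEVICES restricts
# each worker to its local device. This is not IBM's scheduler.
import os
import subprocess

NUM_NODES = 8  # e.g. one logical domain per XCD-adjacent NUMA node

def cpus_for_node(node: int) -> set[int]:
    """Read the CPU list belonging to a NUMA node from sysfs."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        cpus: set[int] = set()
        for part in f.read().strip().split(","):
            lo, _, hi = part.partition("-")
            cpus.update(range(int(lo), int(hi or lo) + 1))
        return cpus

def launch_worker(node: int) -> subprocess.Popen:
    env = dict(os.environ, HIP_VISIBLE_DEVICES=str(node))   # keep HBM traffic local
    proc = subprocess.Popen(["python", "worker.py"], env=env)  # worker.py is hypothetical
    os.sched_setaffinity(proc.pid, cpus_for_node(node))      # pin to the node's cores
    return proc

if __name__ == "__main__":
    workers = [launch_worker(n) for n in range(NUM_NODES)]
    for w in workers:
        w.wait()
```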
During COVID, free speech was at risk. They banned us from their apps. They deplatformed alternate views and silenced any form of opposition. That's why we built Unplugged. So when the chips are down, we know we have an independent platform that will have our back.
0
234
896
GPUs with Pollara high-speed interconnect, but actually measured: this paper is a full case study of frontier-scale pretraining on an all-AMD stack (MI300X with Pollara), introducing Zyphra's ZAYA1-base and ZAYA1-reasoning-base, MoE models with CCA attention and a redesigned
In collaboration with @AMD and @IBM, we @ZyphraAI are sharing ZAYA1-base! The first large-scale model on an integrated AMD hardware, software, and networking stack. ZAYA1 uses Zyphra's novel MoE architecture with 760M active and 8.3B total params. Tech paper and more below
2
0
19
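As a reminder of what "760M active / 8.3B total" means mechanically, here is a textbook top-k MoE layer in PyTorch: the router activates only k experts per token, so active parameters stay a small fraction of the total. This is a generic sketch, not Zyphra's redesigned MoE or its CCA attention.

```python
# Generic top-k MoE layer: illustrates why "active params" << "total params".
# Textbook sketch, not the ZAYA1 architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:        # x: [tokens, d_model]
        gates = F.softmax(self.router(x), dim=-1)               # routing probabilities
        weights, idx = gates.topk(self.k, dim=-1)                # top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)    # renormalize gate weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            rows, slots = (idx == e).nonzero(as_tuple=True)      # tokens routed to expert e
            if rows.numel():
                out[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out

x = torch.randn(16, 64)
print(TopKMoE(d_model=64, d_ff=256, n_experts=8, k=2)(x).shape)  # torch.Size([16, 64])
```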
DAMO's embodied stack now spans perception (RynnEC), action (RynnVLA-001), and unified world modeling (RynnVLA-002)
RynnEC: Bringing MLLMs into the Embodied World (Alibaba DAMO Academy)
- Introduces a region encoder and mask decoder for precise region-level interaction in video-based reasoning
- Trained on 20,832 egocentric videos from over 200 houses, producing 1.14M instance masks through
0
0
3
RynnVLA-002: a unified action world model that evolves RynnVLA-001 from "VLA and generative priors" into a fully joint Chameleon-based action-world framework, merging VLA policy and world model in one autoregressive transformer with shared token space for
RynnVLA-001 is a 7B open VLA model from Alibaba DAMO, built on large-scale ego-centric video generative pretraining
- Generative pretrain (Stage 1): Autoregressive I2V Transformer on ~12M ego-centric human + 244K robot manipulation videos; no action labels
- Continuous actions
1
5
42
The phase-transition curve from "Data Mixing" has been there all along. You're welcome.
Meta just ran one of the largest synthetic-data studies (over 1000 LLMs, more than 100k GPU hours). Result: mixing synthetic and natural data only helps once you cross the right scale and ratio (~30%). Small models learn nothing; larger ones suddenly gain a sharp threshold
0
4
33
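If the ~30% figure holds, the practical knob is just a weighted sampler over the two pools. A minimal sketch follows, with placeholder datasets rather than Meta's actual corpora.

```python
# Minimal sketch: interleave natural and synthetic examples at a fixed ratio.
# The ~0.3 synthetic share echoes the reported threshold region; the datasets
# here are placeholders, not Meta's corpora.
import random

def mixed_stream(natural, synthetic, synth_ratio=0.3, seed=0):
    """Yield examples where roughly synth_ratio of draws come from the synthetic pool."""
    rng = random.Random(seed)
    nat, syn = iter(natural), iter(synthetic)
    while True:
        try:
            yield next(syn) if rng.random() < synth_ratio else next(nat)
        except StopIteration:   # stop once either pool runs dry
            return

natural = [f"web_doc_{i}" for i in range(100)]
synthetic = [f"llm_doc_{i}" for i in range(100)]
sample = list(mixed_stream(natural, synthetic))
print(len(sample), sum(s.startswith("llm") for s in sample) / len(sample))
```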
DeepSeek is clearly tightening the whole sparse/MoE stack. The indexer RoPE fix cleaned up long-context retrieval geometry. Now LPLB shows up with LP-based load balancing, validated across Cube, Hypercube, Ring, and Torus topologies, showing the runtime is being hardened for real
1
7
116
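For intuition about "LP-based load balancing": the core decision can be posed as a small linear program that splits each expert's tokens across its replicas so the most loaded device carries as little as possible. The toy below uses scipy and ignores topology (Cube, Hypercube, Ring, Torus) and everything else LPLB actually handles.

```python
# Toy LP for MoE expert load balancing: split each expert's tokens across its
# replicas to minimize the maximum per-device load. The expert loads and
# placement are made up; interconnect topology is ignored entirely.
import numpy as np
from scipy.optimize import linprog

tokens = np.array([900, 150, 620, 80])               # tokens routed to each expert
placement = {0: [0, 1], 1: [1], 2: [2, 3], 3: [3]}   # expert -> devices holding a replica
n_dev = 4

# Variables: one split per (expert, device) replica pair, plus t = max device load.
pairs = [(e, d) for e, devs in placement.items() for d in devs]
n = len(pairs) + 1
c = np.zeros(n); c[-1] = 1.0                          # objective: minimize t

# Device rows: (sum of splits landing on device d) - t <= 0
A_ub = np.zeros((n_dev, n)); A_ub[:, -1] = -1.0
for j, (e, d) in enumerate(pairs):
    A_ub[d, j] = 1.0
b_ub = np.zeros(n_dev)

# Conservation rows: splits of expert e must sum to its token count
A_eq = np.zeros((len(tokens), n))
for j, (e, d) in enumerate(pairs):
    A_eq[e, j] = 1.0
b_eq = tokens.astype(float)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * n, method="highs")
print("max device load:", res.x[-1])
print("splits:", dict(zip(pairs, np.round(res.x[:-1], 1))))
```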
UPDATE
⚠️ Heads-up to anyone using the DeepSeek-V3.2-Exp inference demo: earlier versions had a RoPE implementation mismatch in the indexer module that could degrade performance. Indexer RoPE expects non-interleaved input, MLA RoPE expects interleaved. Fixed in https://t.co/2BDzSyt1cW.
0
0
5
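The interleaved vs non-interleaved distinction is easiest to see in code: one convention rotates adjacent pairs (x0, x1), (x2, x3), ..., the other pairs x_i with x_{i+d/2}. The numpy sketch below is a generic illustration of the two layouts and the mismatch they create, not the demo's actual implementation.

```python
# Interleaved vs non-interleaved RoPE layouts. Feeding a vector prepared for
# one convention into the other scrambles which dimensions share a rotation
# frequency, which is the kind of mismatch the fix addresses.
# Generic illustration, not the DeepSeek demo code.
import numpy as np

def rope_angles(pos, dim, base=10000.0):
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # one frequency per pair
    return pos * inv_freq                               # shape [dim // 2]

def rope_interleaved(x, pos):
    """Rotate adjacent pairs (x0, x1), (x2, x3), ... (GPT-J style)."""
    theta = rope_angles(pos, x.shape[-1])
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * np.cos(theta) - x2 * np.sin(theta)
    out[1::2] = x1 * np.sin(theta) + x2 * np.cos(theta)
    return out

def rope_half_split(x, pos):
    """Rotate (x_i, x_{i+d/2}) pairs (GPT-NeoX style, 'non-interleaved')."""
    theta = rope_angles(pos, x.shape[-1])
    half = x.shape[-1] // 2
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * np.cos(theta) - x2 * np.sin(theta),
                           x1 * np.sin(theta) + x2 * np.cos(theta)])

x = np.random.randn(8)
print(np.allclose(rope_interleaved(x, pos=5), rope_half_split(x, pos=5)))  # False: layouts differ
```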
Current V3.2-Exp numbers are a lower bound. The cap just got lifted.
(Indexer RoPE fix) It corrects the indexer's RoPE layout by switching it to the proper non-interleaved path, fixing the phase mismatch that hurt long-context top-k retrieval. The commit also stabilizes the fp8 KV simulation and moves weights_proj to fp32. The indexer now scores
3
2
126
(Indexer RoPE fix) It corrects the indexer's RoPE layout by switching it to the proper non-interleaved path, fixing the phase mismatch that hurt long-context top-k retrieval. The commit also stabilizes the fp8 KV simulation and moves weights_proj to fp32. The indexer now scores
DeepSeek-V3.2-Exp: an experimental drop built on top of V3.1-Terminus (128K) that introduces DeepSeek Sparse Attention (DSA) to cut long-context cost without hurting scores. The weights are on HF, the API is live with prices cut by more than 50%, and V3.1 remains online
4
9
156
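Mechanically, the win comes from a cheap indexer scoring keys and keeping only the top-k per query before the expensive attention math runs. The numpy sketch below shows that select-then-attend pattern in a generic single-head form; it is not DSA's actual scoring network or kernels.

```python
# Generic top-k sparse attention for one head: a cheap indexer scores keys per
# query, and only the top-k keys enter the softmax. Illustrates the pattern
# behind "sparse attention cuts long-context cost"; not DeepSeek's DSA kernels.
import numpy as np

def topk_sparse_attention(q, k, v, idx_q, idx_k, top_k=64):
    """q, k, v: [T, d]; idx_q, idx_k: [T, d_idx] cheap indexer projections."""
    T, d = q.shape
    scores = idx_q @ idx_k.T                              # cheap relevance scores
    scores += np.triu(np.full((T, T), -np.inf), k=1)      # causal mask on the indexer
    keep = np.argsort(scores, axis=-1)[:, -top_k:]        # top-k candidate keys per query

    out = np.zeros_like(v)
    for t in range(T):
        sel = keep[t][keep[t] <= t]                       # drop masked (future) slots
        logits = q[t] @ k[sel].T / np.sqrt(d)             # full attention, selected keys only
        w = np.exp(logits - logits.max())
        out[t] = (w / w.sum()) @ v[sel]
    return out

T, d, d_idx = 256, 64, 16
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
idx_q, idx_k = rng.standard_normal((T, d_idx)), rng.standard_normal((T, d_idx))
print(topk_sparse_attention(q, k, v, idx_q, idx_k).shape)  # (256, 64)
```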