DataVoid @DataPlusEngine X Profile

DataVoid

@DataPlusEngine

Followers

2K

Following

8K

Media

3K

Statuses

7K

Independent ML researcher. The first step in knowing is admitting you don't

https://discord.gg/KkKSVqU4Gs

Joined June 2023


DataVoid

@DataPlusEngine

1 year

AI visionaries tend to be dreamers who cannot dream. They are so utterly engulfed within their own doctrine that their daring stabs at the truth amount to moving numbers on a plot.

2

8

Skyler Miao

@SkylerMiao7

2 days

M2.7 open weights coming in ~2 weeks. Still actively iterating; just updated to a new version yesterday, noticeably better on OpenClaw.

155

132

2K

シャポコ🌵

@shapoco

4 days

Implemented Gouraud shading and a depth buffer #shapolab

69

875

7K

am.will

@LLMJunky

4 days

Someone posted this and I'm all the way ☠️☠️

Fynn

@fynnso

5 days

Was messing with the OpenAI base URL in Cursor and caught this: accounts/anysphere/models/kimi-k2p5-rl-0317-s515-fast. So Composer 2 is just Kimi K2.5 with RL. At least rename the model ID.

33

88

2K

NIK

@ns123abc

4 days

🚨NEWS: Cursor’s $50B “in-house model” is literally Kimi K2.5 with RL on top. Got caught in 24 hours.
>be Moonshot AI
>spend hundreds of millions training Kimi K2.5
>1 trillion parameters, 15 trillion tokens, agent swarm architecture
>beat GPT-5.2 and Opus 4.5 on real benchmarks

Elon Musk

@elonmusk

4 days

@fynnso Yeah, it’s Kimi 2.5

280

573

8K

Lex

@xw33bttv

4 days

Cursor AI may be in material breach of contract with their new Composer model, which is generating buzz online for reportedly reaching Opus-level performance. It’s alleged that the new model is a fine-tuned checkpoint of Moonshot’s Kimi K2.5. If true, the original model is

42

23

465

catid

@MrCatid

5 days

I was the first person to order the GB300 computer, just found out

6

1

44

MiniMax (official)

@MiniMax_AI

6 days

During the iteration process, we also realized that the model's ability to recursively evolve its harness is equally critical. Our internal harness autonomously collects feedback, builds evaluation sets for internal tasks, and based on this continuously iterates on its own

14

59

713

DataVoid

@DataPlusEngine

6 days

I reapplied thermal paste 3 different times. Isolated the cause as not being from the cooling block. Reseated the CPU 4 times. The only possible explanation is the CPU is toast. It did smell slightly burnt suddenly under low load during a test.

1

0

3

DataVoid

@DataPlusEngine

6 days

I pushed my tests too hard for Hermes-Agent and trashed my CPU somehow. Literally burnt it. It's now 62 °C under no load and then overheats. Oops lol I got wayyyy too into Hermes dev. Props to @Teknium @NousResearch for making a project that has gotten me so enveloped.

3

0

16

Haocheng Xi

@HaochengXiUCB

8 days

𝗞-𝗺𝗲𝗮𝗻𝘀 𝗶𝘀 𝘀𝗶𝗺𝗽𝗹𝗲. 𝗠𝗮𝗸𝗶𝗻𝗴 𝗶𝘁 𝗳𝗮𝘀𝘁 𝗼𝗻 𝗚𝗣𝗨𝘀 𝗶𝘀𝗻’𝘁. That’s why we built Flash-KMeans — an IO-aware implementation of exact k-means that rethinks the algorithm around modern GPU bottlenecks. By attacking the memory bottlenecks directly,

36

197

2K

Rosinality

@rosinality

7 days

ByteDance also implemented attention over depth. They literally combined it with sequence attention.

9

127

885

Lisan al Gaib

@scaling01

8 days

Leanstral is part of the Mistral Small 4 family

Lisan al Gaib

@scaling01

8 days

Some math prover model by Mistral? link is dead again, just got the notif

5

86

jtydhr88

@jtydhr88

9 days

Tried to recreate PS’s image rotation feature inside ComfyUI - 2

9

19

144

Andrew Carr 🤸

@andrew_n_carr

8 days

If you can internalize what this plot means, how to make it, and why it's important you can get a job at any top lab

Kimi.ai

@Kimi_Moonshot

8 days

Scaling law experiments reveal a consistent 1.25× compute advantage across varying model sizes.

19

9

550

Chubby♨️

@kimmonismus

8 days

Holy: Kimi did amazing work because it changes one of the most basic parts of how deep AI models pass information from layer to layer. Instead of blindly mixing in everything from earlier layers equally, the model can now choose which past information is actually useful for

Kimi.ai

@Kimi_Moonshot

8 days

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with

28

34

446

Yu Zhang 🐙🌘

@yzhang_cs

8 days

The idea of rotating attention by 90° is sooooooo cool (credits to @Jianlin_S's insights), and it surprisingly works. We (w/ the amazing @nathan) are so excited about this; been working on the paper for months and couldn't stop. Go give it a try. It's a drop-in replacement for

Kimi.ai

@Kimi_Moonshot

8 days

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with

19

70

961

Kimi.ai

@Kimi_Moonshot

8 days

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with

330

2K

14K

DataVoid

@DataPlusEngine

10 days

This @Teknium @NousResearch Hackathon for Hermes-Agent seems like fun! Gonna put my local hardware behind it and see if I can win! Hermes-Agent LLM LoRA? 🤔