
Marco Mascorro
@Mascobot
Followers 14K · Following 12K · Media 731 · Statuses 6K
Partner @a16z (investor in @cursor_ai, @bfl_ml, @WaveFormsAI & more) | Roboticist | Cofounder @Fellow_AI | prev @BMW | @MIT 35 under 35 | Opinions my own.
San Francisco, CA
Joined October 2009
Super thrilled to back @miramurati and the amazing team @thinkymachines - a GOAT team that has made major contributions to RL, pre-training/post-training, reasoning, multimodal, and of course ChatGPT! No one is better positioned to advance the frontier. @martin_casado @pmarca.
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're…
9
2
175
Can’t wait for Kimi K2 reasoning - K2 (base) seems pretty good even on creative writing. Hopefully we can get better training/sampling efficiencies with RL, and I can't wait to see how these models perform when RL compute is the vast majority of training (and hopefully when it's…
Remember: K2 is *not* a reasoning model. And it has very few active parameters in the MoE. So it's using fewer tokens, *and* each token is cheaper and faster.
1
0
2
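A back-of-the-envelope sketch of the "each token is cheaper" point, using the common ~2 FLOPs per active parameter per generated token rule of thumb. The 1T-total / 32B-active figures come from the K2 announcement below; the dense 1T comparison model is hypothetical.

```python
# Rough sketch (assumptions noted above): per-token decode cost scales with
# *active* parameters, not total parameters, so a sparse MoE is much cheaper
# per token than a dense model of the same total size.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass cost per generated token (~2 FLOPs per active weight)."""
    return 2 * active_params

dense_1t = flops_per_token(1.0e12)  # hypothetical dense 1T-param model
k2_moe = flops_per_token(32e9)      # Kimi K2: 1T total, but only 32B active

print(f"dense 1T: {dense_1t:.2e} FLOPs/token")
print(f"K2 (MoE): {k2_moe:.2e} FLOPs/token")
print(f"ratio:    ~{dense_1t / k2_moe:.0f}x cheaper per token")
```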
Neat! @danielhanchen you guys work fast! Dynamic 1.8-bit (245GB from 1.1TB) for Kimi K2:
You can now run Kimi K2 locally with our Dynamic 1.8-bit GGUFs! We shrank the full 1.1TB model to just 245GB (an ~80% size reduction). The 2-bit XL GGUF performs exceptionally well on coding & passes all our code tests. Guide: GGUFs:
1
2
14
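For anyone wanting to try this, a minimal sketch of loading a quantized GGUF with llama-cpp-python; the shard filename below is a placeholder, and you still need enough combined RAM/VRAM for the ~245GB of weights. See the guide linked in the quoted post for the actual recommended quant and offloading settings.

```python
# Hypothetical sketch of running a quantized GGUF locally with llama-cpp-python.
# The model_path is a placeholder, not the real Unsloth shard name.
from llama_cpp import Llama

llm = Llama(
    model_path="kimi-k2-1.8bit-00001-of-00005.gguf",  # placeholder: point at the first shard
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload as many layers as fit on the GPU
)

out = llm("Write a Python function that reverses a linked list.", max_tokens=256)
print(out["choices"][0]["text"])
```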
RT @TrungTPhan: Lee Kuan Yew: "Air conditioning was a most important invention for us, perhaps one of the signal inventions of history. I…
0
338
0
This is quite interesting. The tokens per parameter in LLaMA 4 seem off (compared to what LLMs are trained on today): we are well over the Chinchilla-optimal ratio (20 tokens per param), but Llama 4 Behemoth had (only) 104 Tokens Per Param (TPP), similar to Mistral 7B (dense), while Llama 4…
This explains why LLaMA 4 failed. The tokens per parameter (TPP) is way off. You can’t defy scaling laws and expect miracles.
Llama 4 Maverick was 400B (17B active) and >30T tokens, TPP = 1764. Llama 4 Behemoth was 2T (288B active) and >30T tokens, TPP = 104. DeepSeek v3 is…
0
1
2
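The arithmetic behind the quoted TPP numbers, as a quick sketch (counting active parameters for the MoE models, as the post does):

```python
# Tokens-per-parameter (TPP) = training tokens / active parameters.

def tpp(train_tokens: float, active_params: float) -> float:
    return train_tokens / active_params

print("Chinchilla-optimal rule of thumb: ~20 TPP")
print(f"Llama 4 Maverick: {tpp(30e12, 17e9):.0f} TPP")   # ~1765 (the post rounds to 1764)
print(f"Llama 4 Behemoth: {tpp(30e12, 288e9):.0f} TPP")  # ~104
```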
The long-held dream of not using tokenizers at all might finally be here:
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data.
3
0
8
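A toy illustration of the dynamic-chunking idea (my own simplified sketch, not the actual H-Net architecture): predict chunk boundaries over raw bytes and pool each span into one vector, so no external tokenizer is needed. The real model makes the boundary decision differentiable; here it is a hard threshold for brevity.

```python
# Toy sketch only: boundary prediction + span pooling over raw bytes.
import torch
import torch.nn as nn

class ToyDynamicChunker(nn.Module):
    def __init__(self, d_model: int = 256):
        super().__init__()
        self.byte_embed = nn.Embedding(256, d_model)    # raw bytes, no tokenizer
        self.boundary_head = nn.Linear(d_model, 1)      # score: "start a new chunk here?"

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        # byte_ids: (seq_len,) ints in [0, 255]
        x = self.byte_embed(byte_ids)                                    # (seq_len, d_model)
        boundary = torch.sigmoid(self.boundary_head(x)).squeeze(-1) > 0.5
        boundary[0] = True                                               # first byte opens a chunk

        chunks, start = [], 0
        for i in range(1, len(byte_ids)):
            if boundary[i]:
                chunks.append(x[start:i].mean(dim=0))                    # pool the span into one vector
                start = i
        chunks.append(x[start:].mean(dim=0))
        return torch.stack(chunks)                                       # (num_chunks, d_model) fed to the main model

text = "tokenizer-free models operate on raw bytes".encode("utf-8")
chunk_vecs = ToyDynamicChunker()(torch.tensor(list(text)))
print(chunk_vecs.shape)  # (num_chunks, 256)
```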
A 1T-param MoE model, open source, 32B active:
🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models
🔹 Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now.
With Kimi K2, advanced agentic intelligence…
0
0
2
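A minimal sketch of why "1T total / 32B active" is possible: in a mixture-of-experts layer, the router picks only the top-k experts per token, so most expert weights sit idle on any given forward pass. Sizes below are illustrative, not K2's real config.

```python
# Illustrative top-k MoE routing, not Kimi K2's actual implementation.
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, d_model)
        logits = self.router(x)                            # (batch, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)     # keep only the top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.shape[0]):
            for k in range(self.top_k):                    # only these experts run for this token
                e = idx[b, k]
                out[b] += weights[b, k] * self.experts[e](x[b])
        return out

layer = TinyMoELayer()
y = layer(torch.randn(4, 64))
print(y.shape)  # (4, 64): each token touched 2 of 8 experts, ~1/4 of the expert params
```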
Seeing these results on ARC-AGI from Grok 4, I am so tempted to spin up different RL environments in the form of similar adjacent games and see how far the (new) small <7B models can get with ARC-AGI. Last November I got to the top 1% of ARC-AGI submissions with no RL and very…
On ARC-AGI-1, Grok 4 (Thinking) achieves 66.7%, in line with the Pareto frontier for AI reasoning systems we reported last month.
0
0
2
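A hypothetical sketch of the kind of "adjacent game" environment described above: an agent edits a small grid cell-by-cell and is rewarded for matching a target grid, ARC-style. The class name, reward shaping, and gymnasium wrapper are assumptions, not any real ARC-AGI harness.

```python
# Toy ARC-style grid-editing RL environment (illustrative only).
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class GridEditEnv(gym.Env):
    def __init__(self, size: int = 5, n_colors: int = 4):
        super().__init__()
        self.size, self.n_colors = size, n_colors
        # One discrete action encodes (row, col, color): paint that cell with that color.
        self.action_space = spaces.Discrete(size * size * n_colors)
        # Observation stacks the current grid and the target grid.
        self.observation_space = spaces.Box(0, n_colors - 1, (2, size, size), dtype=np.int64)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.target = self.np_random.integers(0, self.n_colors, (self.size, self.size))
        self.grid = np.zeros((self.size, self.size), dtype=np.int64)
        self.steps = 0
        return np.stack([self.grid, self.target]), {}

    def step(self, action):
        r, c, color = np.unravel_index(int(action), (self.size, self.size, self.n_colors))
        self.grid[r, c] = color
        self.steps += 1
        match = float((self.grid == self.target).mean())   # fraction of cells correct
        terminated = match == 1.0
        truncated = self.steps >= 100
        return np.stack([self.grid, self.target]), match, terminated, truncated, {}
```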
Super cool! A hackable little robot from @huggingface. Congrats @Thom_Wolf, @ClementDelangue and team!
Thrilled to finally share what we've been working on for months at @huggingface 🤝 @pollenrobotics. Our first robot: Reachy Mini. A dream come true: cute and low priced, hackable yet easy to use, powered by open-source and the infinite community. Tiny price, small size, huge…
1
6
31