Marco Mascorro

@Mascobot

Followers: 14K
Following: 12K
Media: 731
Statuses: 6K

Partner @a16z (investor in @cursor_ai, @bfl_ml, @WaveFormsAI & more) | Roboticist | Cofounder @Fellow_AI | prev @BMW | @MIT 35 under 35 | Opinions my own.

San Francisco, CA
Joined October 2009
@Mascobot
Marco Mascorro
9 hours
Super thrilled to back @miramurati and the amazing team @thinkymachines - a GOAT team that has made major contributions to RL, pre-training/post-training, reasoning, multimodal, and of course ChatGPT! No one is better positioned to advance the frontier. @martin_casado @pmarca
@miramurati
Mira Murati
9 hours
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're…
9
2
175
@Mascobot
Marco Mascorro
1 day
Can’t wait for Kimi K2 reasoning - K2 (base) seems pretty good even on creative writing. Hopefully we can get better training/sampling efficiencies with RL, and can't wait to see how these models perform when RL compute is the vast majority of training (and hopefully when it's…
@jeremyphoward
Jeremy Howard
1 day
Remember: K2 is *not* a reasoning model. And very few active params in the MoE. So it's using fewer tokens, *and* each token is cheaper and faster.
1
0
2
@Mascobot
Marco Mascorro
1 day
Neat! @danielhanchen you guys work fast! Dynamic 1.8-bit (245GB from 1.1TB) for Kimi K2:
@UnslothAI
Unsloth AI
1 day
You can now run Kimi K2 locally with our Dynamic 1.8-bit GGUFs! We shrank the full 1.1TB model to just 245GB (-80% size reduction). The 2-bit XL GGUF performs exceptionally well on coding & passes all our code tests. Guide: GGUFs:
Tweet media one
1
2
14
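A quick back-of-the-envelope check of the numbers above (a sketch; the ~1.03T total parameter count and decimal GB/TB units are assumptions, not Unsloth's figures):

```python
# Rough check of the compression numbers in the quoted tweet.
total_params = 1.03e12          # ~1T-parameter MoE (assumption)
full_bytes = 1.1e12             # 1.1 TB original release
quant_bytes = 245e9             # 245 GB dynamic GGUF

bits_per_param_full = full_bytes * 8 / total_params
bits_per_param_quant = quant_bytes * 8 / total_params
reduction = 1 - quant_bytes / full_bytes

print(f"original:  ~{bits_per_param_full:.1f} bits/param")   # ~8.5 bits/param
print(f"quantized: ~{bits_per_param_quant:.1f} bits/param")  # ~1.9 bits/param on average
print(f"size reduction: ~{reduction:.0%}")                   # ~78%, i.e. the quoted "-80%"
```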
@Mascobot
Marco Mascorro
1 day
It would be super neat if the labs producing the top open-source LLMs like Kimi K2 could also release smaller distilled versions of them (like the smaller distills DeepSeek released alongside R1), so that we can all run the most optimized speculative decoding with a matched draft model.
0
0
2
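For context on why the distills matter here: speculative decoding pairs a small draft model with the large target model. Below is a minimal sketch of the greedy-verification variant, with toy next-token functions standing in for real models (illustrative only, not any lab's actual implementation):

```python
# Greedy speculative decoding sketch: a small "distilled" draft model proposes
# K tokens cheaply, the big target model verifies them, and we keep the longest
# agreeing prefix. Toy next-token functions stand in for real models.
from typing import Callable, List

def speculative_decode(draft_next: Callable[[List[int]], int],
                       target_next: Callable[[List[int]], int],
                       prompt: List[int], max_new: int = 32, k: int = 4) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1) Draft model proposes k tokens autoregressively (cheap).
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2) Target model checks each proposed position (conceptually one
        #    batched forward pass in a real system).
        accepted = 0
        for i in range(k):
            if target_next(tokens + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        tokens += draft[:accepted]
        # 3) On disagreement (or full acceptance), take one token from the target.
        tokens.append(target_next(tokens))
    return tokens

# Toy models: the draft mostly agrees with the target, so most proposals are kept.
target = lambda t: (t[-1] + 1) % 50                          # "big" model: toy rule
draft  = lambda t: (t[-1] + 1) % 50 if len(t) % 7 else 0     # occasionally wrong
print(speculative_decode(draft, target, [1], max_new=16, k=4))
```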
@Mascobot
Marco Mascorro
2 days
The improvement rate of Cursor is amazing; it keeps getting better and better.
@wey_gu
Wey Gu 古思为
3 days
Cursor is getting even better, and better.
0
0
4
@Mascobot
Marco Mascorro
3 days
RT @TrungTPhan: Lee Kuan Yew: “Air conditioning was a most important invention for us, perhaps one of the signal inventions of history. I…
0
338
0
@Mascobot
Marco Mascorro
3 days
This is from the DeepSeek LLM paper from Jan 2024, fwiw.
0
0
3
@Mascobot
Marco Mascorro
3 days
Here's a nice plot showing multi-step vs. cosine learning rate decay:
Tweet media one
@zxytim
Xinyu Zhou
3 days
For the record, we use WSD learning schedule. The sudden drop in loss at around 11T tokens is just learning rate starting to decay.
1
3
27
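For readers unfamiliar with the WSD (warmup-stable-decay) schedule mentioned above: unlike cosine decay, it holds the learning rate flat and only decays near the end, which is why the loss can drop suddenly late in training. A rough sketch with illustrative hyperparameters (not Moonshot's actual config):

```python
# Cosine decays continuously; WSD (warmup-stable-decay) stays at the peak LR
# for most of training and only decays in the final stretch.
import math

def cosine_lr(step, total, peak=3e-4, warmup=2000, floor=3e-5):
    if step < warmup:
        return peak * step / warmup
    t = (step - warmup) / max(1, total - warmup)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * t))

def wsd_lr(step, total, peak=3e-4, warmup=2000, decay_frac=0.1, floor=3e-5):
    decay_start = int(total * (1 - decay_frac))
    if step < warmup:
        return peak * step / warmup
    if step < decay_start:
        return peak                       # long stable phase at peak LR
    t = (step - decay_start) / max(1, total - decay_start)
    return peak + (floor - peak) * t      # linear decay only at the end

total = 100_000
for s in (1_000, 50_000, 89_000, 95_000, 100_000):
    print(s, f"cosine={cosine_lr(s, total):.2e}", f"wsd={wsd_lr(s, total):.2e}")
```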
@Mascobot
Marco Mascorro
4 days
This is quite interesting. The tokens per parameter in LLaMA 4 seem off relative to what LLMs are trained on today: we are well over the Chinchilla optimum (20 tokens per param) these days, but Llama 4 Behemoth had (only) 104 tokens per param (TPP), similar to Mistral 7B (dense), while Llama 4…
@QuanquanGu
Quanquan Gu
4 days
This explains why LLaMA 4 failed. The tokens per parameter (TPP) is way off. You can’t defy scaling laws and expect miracles.
===
Llama 4 Maverick was 400B (17B active) and >30T tokens, TPP = 1764.
Llama 4 Behemoth was 2T (288B active) and >30T tokens, TPP = 104.
DeepSeek v3 is…
0
1
2
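The TPP figures in the quote are easy to reproduce; a quick sketch using active parameters and the ~30T-token counts cited above (assumed, not official disclosures):

```python
# Tokens-per-parameter (TPP) computed against *active* params, as in the quote.
chinchilla_tpp = 20  # the classic compute-optimal ~20 tokens per parameter

models = {
    "Llama 4 Maverick": (17e9, 30e12),   # 17B active, >30T tokens
    "Llama 4 Behemoth": (288e9, 30e12),  # 288B active, >30T tokens
}
for name, (active_params, tokens) in models.items():
    tpp = tokens / active_params
    print(f"{name}: TPP ≈ {tpp:.0f} ({tpp / chinchilla_tpp:.0f}x Chinchilla-optimal)")
# -> Maverick ≈ 1765, Behemoth ≈ 104, matching the 1764 / 104 in the quoted tweet.
```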
@Mascobot
Marco Mascorro
4 days
The long-time dream of not using tokenizers at all might be here:
@sukjun_hwang
Sukjun (June) Hwang
4 days
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
Tweet media one
Tweet media two
3
0
8
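This isn't H-Net's dynamic chunking (see the paper for that); just a tiny illustration of the baseline "no tokenizer" idea of operating directly on raw UTF-8 bytes and letting the model learn its own units:

```python
# Baseline tokenizer-free input: raw UTF-8 bytes (vocabulary of just 256 symbols)
# instead of a fixed BPE vocabulary. H-Net then learns how to chunk these.
text = "Tokenization has been the final barrier."
byte_ids = list(text.encode("utf-8"))
print(byte_ids[:10])   # [84, 111, 107, 101, 110, ...] -- one ID per byte
print(len(byte_ids), "byte-level 'tokens' vs. far fewer BPE tokens for the same text")
```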
@Mascobot
Marco Mascorro
4 days
A 1T-param MoE model, open source, 32B active:
@Kimi_Moonshot
Kimi.ai
4 days
🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE-Bench Verified, Tau2 & AceBench among open models
🔹 Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now
With Kimi K2, advanced agentic intelligence…
Tweet media one
0
0
2
@Mascobot
Marco Mascorro
5 days
RT @elonmusk: @ns123abc A Cursor engineering team is at xAI today, so integration issues are being solved in real-time.
0
199
0
@Mascobot
Marco Mascorro
5 days
AI now writes 50% of the code at Google:
@ai_for_success
AshutoshShrivastava
5 days
AI now writes 50% of the code at Google.
Tweet media one
1
0
5
@Mascobot
Marco Mascorro
5 days
This is a good benchmark, TritonBench. Most LLMs aren’t great at writing Triton code for GPU kernels (there isn’t much Triton code publicly available yet, but it’s definitely increasing):
Tweet media one
Tweet media two
1
0
3
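For a sense of what TritonBench asks models to write, here is a minimal Triton vector-add kernel in the style of the official tutorial (a representative example, not taken from the benchmark itself; needs a CUDA GPU):

```python
# Minimal Triton kernel: elementwise vector add with a masked tail.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                         # which block am I?
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                         # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

if torch.cuda.is_available():
    a = torch.randn(10_000, device="cuda")
    b = torch.randn(10_000, device="cuda")
    assert torch.allclose(add(a, b), a + b)
```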
@Mascobot
Marco Mascorro
5 days
RT @cursor_ai: Grok 4 is available in Cursor! We're curious to hear what you think.
0
606
0
@Mascobot
Marco Mascorro
6 days
Seeing these results on ARC-AGI from Grok 4, I am so tempted to spin up different RL environments in the form of similar adjacent games and see how far the (new) small <7B models can get with ARC-AGI. Last November I got to the top 1% of the ARC-AGI submissions with no RL and very…
Tweet media one
@arcprize
ARC Prize
6 days
On ARC-AGI-1, Grok 4 (Thinking) achieves 66.7%, in line with the Pareto frontier for AI reasoning systems we reported last month
Tweet media one
0
0
2
@Mascobot
Marco Mascorro
6 days
Grok 4 achieves 66.6% on ARC-AGI-1. So many folks ignored this benchmark for a long time, especially in the early days. Congrats to the @xai team.
@arcprize
ARC Prize
6 days
Thank you to the @xai team for working with us to validate Grok 4's score and inviting us to watch the live stream
Tweet media one
0
1
13
@Mascobot
Marco Mascorro
7 days
Super cool! A hackable little robot from @huggingface. Congrats @Thom_Wolf, @ClementDelangue and team!
@Thom_Wolf
Thomas Wolf
7 days
Thrilled to finally share what we've been working on for months at @huggingface 🤝@pollenrobotics. Our first robot: Reachy Mini. A dream come true: cute and low priced, hackable yet easy to use, powered by open-source and the infinite community. Tiny price, small size, huge
1
6
31
@Mascobot
Marco Mascorro
8 days
OpenAI Gym, a single-agent open-source RL environment released by OpenAI in 2016 (now maintained as Gymnasium), was way ahead of its time:
Tweet media one
Tweet media two
0
0
4
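The reset/step loop is what made Gym's single-agent API the de facto standard; a minimal example using the maintained Gymnasium fork with a random policy (illustrative, not from the original 2016 release):

```python
# Classic Gym-style control loop on CartPole, via the Gymnasium fork.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
episode_return = 0.0
for _ in range(500):
    action = env.action_space.sample()                  # random policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    if terminated or truncated:                         # episode ended: start over
        obs, info = env.reset()
        episode_return = 0.0
env.close()
```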