Drishan Arora Profile
Drishan Arora

@drishanarora

Followers: 4K · Following: 32 · Media: 2 · Statuses: 19

AI Researcher @DeepCogito

San Francisco, CA
Joined February 2013
@drishanarora
Drishan Arora
17 days
More recently, our preview models were also added to @OpenRouterAI: https://t.co/XgyfvMTRLW As well as @huggingface Endpoints: https://t.co/1cv0zMD3gW (All of them powered by the brilliant team and infrastructure of @togethercompute.) Try them out!
huggingface.co
0
2
8
@drishanarora
Drishan Arora
17 days
It is great to see more of these ideas coming to the forefront. This is a nascent field, and we have made quite a bit of headway here. I am personally very optimistic about this field progressing further. Our approach to self-play for LLMs is called Iterated Distillation and Amplification (IDA).
deepcogito.com
Building general superintelligence
1
0
4
@drishanarora
Drishan Arora
17 days
It is intuitively easy to understand why self-play *can* work for LLMs, if we are able to provide a value function at intermediate steps (although not as clearly guaranteed as in two-player zero-sum games). In chess / go / poker, we have a reward associated with every next move…
@polynoamial
Noam Brown
17 days
Self play works so well in chess, go, and poker because those games are two-player zero-sum. That simplifies a lot of problems. The real world is messier, which is why we haven’t seen many successes from self play in LLMs yet. Btw @karpathy did great and I mostly agree with him!
12
10
37
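A toy illustration of the claim in the tweet above: if a value function can score intermediate reasoning states, a model can improve by searching against it, even without the exact win/loss signal that two-player zero-sum games provide. The sketch below is hypothetical; `propose_steps` and `value_fn` stand in for an LLM step generator and a learned value model, neither of which is specified in the thread.

```python
# Hypothetical sketch: value-guided step selection as LLM "self-play".
# In chess/go/poker the game outcome gives an exact training signal; for
# LLMs, a learned value function over partial reasoning states can play a
# similar role, with weaker guarantees.
from typing import Callable, List

def value_guided_search(
    prompt: str,
    propose_steps: Callable[[str], List[str]],  # LLM proposes candidate next steps
    value_fn: Callable[[str], float],           # scores a partial reasoning state
    max_steps: int = 8,
) -> str:
    """Greedily extend a reasoning trace, keeping the highest-value step."""
    state = prompt
    for _ in range(max_steps):
        candidates = propose_steps(state)
        if not candidates:
            break
        # Keep the continuation the value function scores highest.
        best = max(candidates, key=lambda step: value_fn(state + "\n" + step))
        state = state + "\n" + best
    return state
```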
@drishanarora
Drishan Arora
3 months
A small update - we had more traffic than anticipated. However, the endpoints are now scalable on Together AI for all models, including the 671B MoE. Test out the model here: https://t.co/Od1NXYVBxU (A huge thanks to the folks at @togethercompute for making this happen so…)
together.ai
671B mixture-of-experts model matching Deepseek R1 performance, 60% shorter reasoning chains, approaching o3 and Claude 4 capabilities
@drishanarora
Drishan Arora
3 months
Today, we are releasing 4 hybrid reasoning models of sizes 70B, 109B MoE, 405B, 671B MoE under open license. These are some of the strongest LLMs in the world, and serve as a proof of concept for a novel AI paradigm - iterative self-improvement (AI systems improving themselves).
4
14
83
@UnslothAI
Unsloth AI
3 months
You can now run the world’s most powerful Western open models locally! The hybrid reasoning 671B model matches o3 & Claude-4-Opus in performance. Trained on Llama 4 & DeepSeek-R1, Cogito-v2 has 4 variants—each setting new benchmarks. Guide + GGUFs: https://t.co/rHBD0mZGZH
@drishanarora
Drishan Arora
3 months
Today, we are releasing 4 hybrid reasoning models of sizes 70B, 109B MoE, 405B, 671B MoE under open license. These are some of the strongest LLMs in the world, and serve as a proof of concept for a novel AI paradigm - iterative self-improvement (AI systems improving themselves).
15
56
378
@drishanarora
Drishan Arora
3 months
More details in the blog post:
deepcogito.com
Building general superintelligence
5
8
108
@drishanarora
Drishan Arora
3 months
The Cogito v2 models can be downloaded on @Huggingface, and can be accessed via API through @TogetherAI, @runpod_io, or @basetenco. You can also run them locally using @UnslothAI. Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models).
3
3
106
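As a concrete sketch of the two modes described above (answer directly, or self-reflect first), here is one way to call a Cogito v2 model through Together AI's OpenAI-compatible endpoint. Both the model id and the "Enable deep thinking subroutine." system prompt are assumptions drawn from the publicly posted Cogito model cards; verify them before use.

```python
# Sketch: calling a Cogito v2 model via Together AI's OpenAI-compatible API.
# Assumptions: the model id and the thinking-toggle system prompt below
# follow the Cogito model cards; confirm both against the current docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key="YOUR_TOGETHER_API_KEY",
)

def ask(question: str, think: bool) -> str:
    messages = []
    if think:
        # Reported toggle for self-reflection mode in the Cogito model cards.
        messages.append({"role": "system", "content": "Enable deep thinking subroutine."})
    messages.append({"role": "user", "content": question})
    resp = client.chat.completions.create(
        model="deepcogito/cogito-v2-preview-llama-70B",  # assumed id; see model card
        messages=messages,
    )
    return resp.choices[0].message.content

print(ask("What is 17 * 24?", think=True))
```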
@drishanarora
Drishan Arora
3 months
We plan to scale up and expect significant gains from self-improvement over the next few months using this approach. From experiments so far, this technique is far more efficient than simply “searching more” with longer reasoning chains. All models we create will be open sourced.
1
1
93
@drishanarora
Drishan Arora
3 months
This seems to be a novel scaling paradigm where the models develop more “intuition”, and it serves as a strong proof of concept for self-improvement. Since the Cogito models develop a better intuition of the trajectory to take while searching at inference time, they have 60% shorter reasoning chains.
1
0
101
@drishanarora
Drishan Arora
3 months
The models are built on our work on building superintelligence using Iterated Distillation and Amplification (IDA). In particular, we scale the model’s intelligence prior by having the model internalize the reasoning process through iterative policy improvement, rather than simply searching longer at inference time.
1
2
128
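Read as an update rule, the iterative policy improvement described above amounts to repeatedly distilling an amplified policy back into the base policy. A hedged formalization (the notation is mine, not from the thread):

```latex
% One IDA round: amplify the current policy \pi_t with extra inference-time
% computation, then distill the amplified behavior back into the weights.
\pi_{t+1} = \arg\min_{\pi} \; \mathbb{E}_{x \sim \mathcal{D}} \,
  \mathrm{KL}\!\big( \mathrm{Amplify}(\pi_t)(\cdot \mid x) \,\big\|\, \pi(\cdot \mid x) \big)
```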
@drishanarora
Drishan Arora
3 months
Today, we are releasing 4 hybrid reasoning models of sizes 70B, 109B MoE, 405B, 671B MoE under open license. These are some of the strongest LLMs in the world, and serve as a proof of concept for a novel AI paradigm - iterative self-improvement (AI systems improving themselves).
45
262
2K
@drishanarora
Drishan Arora
5 months
Data quality has the biggest impact on LLM performance - far more than most algorithmic improvements. As we build models that move towards superintelligence, the paradigm for LLM evaluation will need to evolve to increase the strength of the overseer. Very excited to see this -
@mannatsan
Mannat Sandhu
5 months
Today, we are launching Anthromind, where we are building scalable oversight for AI systems. As LLMs and AI systems grow more intelligent, the data needed to evaluate, supervise and align these models requires higher intelligence, often surpassing human expertise. Traditional…
0
2
8
@drishanarora
Drishan Arora
7 months
All models we create will be open sourced. More details in the blog post:
deepcogito.com
Building general superintelligence
3
24
274
@drishanarora
Drishan Arora
7 months
The Cogito v1 models can be downloaded on @huggingface or @ollama, and can be accessed via API through @FireworksAI_HQ or @togethercompute. Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models).
4
4
150
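For the download-and-run path mentioned above, a minimal local sketch with Hugging Face transformers might look like the following. The repo id and the deep-thinking system prompt are assumptions based on the published Cogito v1 model cards; check them before relying on this.

```python
# Sketch: running a small Cogito v1 model locally with transformers.
# Assumptions: the repo id and the "Enable deep thinking subroutine."
# system prompt match the public model cards; verify both.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepcogito/cogito-v1-preview-llama-3B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    # Reported toggle for self-reflection (reasoning) mode.
    {"role": "system", "content": "Enable deep thinking subroutine."},
    {"role": "user", "content": "How many prime numbers are below 30?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```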
@drishanarora
Drishan Arora
7 months
From what we can tell, we're still in the early stages of this scaling curve - IDA is incredibly powerful and generalizes across domains. Most notably, our 70B model also outperforms Llama 4 Scout (109B MoE) distilled from a 2T model. As we improve and iterate on our…
1
2
140
@drishanarora
Drishan Arora
7 months
We use IDA to remove the intelligence ceiling. Simply put: use more computation to let the model arrive at a better solution, and then distill the expensive thinking process into the model's own parameters. As the LLM improves in intelligence, the thinking process itself becomes more powerful…
5
15
218
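The amplify-then-distill loop described in the tweet above, sketched at a toy level. Everything here is schematic: `generate`, `self_evaluate`, and `finetune` are hypothetical placeholders for unspecified components of the actual pipeline, not Deep Cogito's code.

```python
# Toy sketch of one reading of Iterated Distillation and Amplification (IDA):
# 1) Amplification: spend extra inference compute (e.g., sample N candidate
#    solutions and keep the best by self-evaluation).
# 2) Distillation: fine-tune the model on those improved solutions so the
#    next round starts from a stronger prior.

def amplify(model, prompt, n_samples=16):
    """Best-of-N with self-evaluation as a stand-in for 'more computation'."""
    candidates = [model.generate(prompt) for _ in range(n_samples)]
    return max(candidates, key=lambda c: model.self_evaluate(prompt, c))

def ida_iteration(model, prompts):
    """One amplify-then-distill round."""
    improved = [(p, amplify(model, p)) for p in prompts]
    # Distill: push the expensive search results into the weights, so the
    # model's direct answers approach its amplified answers.
    return model.finetune(improved)

def ida(model, prompts, rounds=3):
    for _ in range(rounds):
        model = ida_iteration(model, prompts)
    return model
```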
@drishanarora
Drishan Arora
7 months
Traditional LLMs are upper-bounded in intelligence by their overseers (larger teacher models or human capabilities). Building superintelligence requires not only matching human-level abilities but also uncovering entirely new capabilities we have yet to imagine.
2
5
149
@drishanarora
Drishan Arora
7 months
Today, we are launching @DeepCogito, where we are building general superintelligence. We are also releasing open models of 3B, 8B, 14B, 32B, and 70B sizes trained using our research on iterated distillation and amplification (IDA). From evals so far, each model outperforms the best available open models of the same size…
86
321
3K
@southpkcommons
South Park Commons
1 year
4/ Deep Cogito - You can build a more powerful language model with more data and more compute. @drishanarora showed how he & @drvdhruv are pioneering a third way and building frontier intelligence LLMs. (off-the-record-ish, ping @drishanarora & @drvdhruv)
2
2
19