
Tom Hosking
@tomhosking
Followers: 926 · Following: 4K · Media: 171 · Statuses: 1K
Model merging lead for Command A @cohere. Prev: PhD student in NLP @EdinburghNLP @Edin_CDT_NLP, @BloomsburyAI @UCL @DRWTrading
Edinburgh, Scotland
Joined April 2009
RT @Cohere_Labs: How does sparse attention reshape LLM scaling? 🔍 We’re excited to share this work by former @Cohere intern @p_nawrot, “Th…
Replies: 0 · Retweets: 8 · Likes: 0
RT @maximevoisin_ai: TIL @cohere's best LLM (Command A) ranks higher than Anthropic's best LLM on the Arena.
Replies: 0 · Retweets: 3 · Likes: 0
RT @arduinfindeis: How exactly was the initial Chatbot Arena version of Llama 4 Maverick different from the public HuggingFace version? 🕵️…
Replies: 0 · Retweets: 6 · Likes: 0
RT @mgalle: You like Code, you like LLMs, you are looking for a leadership position? We are searching for somebody who can support our ama…
Replies: 0 · Retweets: 11 · Likes: 0
RT @douwekiela: When we came up with RAG five years ago, we weren't creating a workaround for small context windows—we were designing a pri…
Replies: 0 · Retweets: 4 · Likes: 0
Now feels like a good time to plug @cohere Command A:
- model evaled on @lmarena_ai is the same as hosted on @huggingface
- claimed performance is reproducible
- not trained on the test set
- uses the @cohere hybrid attention architecture for long context
- fits on 2xH100, not 8x
We’re excited to introduce our newest state-of-the-art model: Command A! Command A provides enterprises with maximum performance across agentic tasks with minimal compute requirements.
Replies: 1 · Retweets: 7 · Likes: 66
RT @nrehiew_: The next section on Merging is the most interesting imo. As a summary of what we discussed earlier, they used expert mergin…
Replies: 0 · Retweets: 4 · Likes: 0
RT @nrehiew_: Some thoughts: First, the paper is pretty well-written and easy to follow. They have so many benchmarks and results. I think…
Replies: 0 · Retweets: 9 · Likes: 0
RT @nrehiew_: They find that linear merging is pretty interpretable, i.e. upweighting an expert leads to better performance in that domain. Howev…
Replies: 0 · Retweets: 4 · Likes: 0
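The linear merging behaviour described in the tweet above (upweighting an expert shifts the merged model toward that expert's domain) can be sketched as a weighted average in parameter space. This is a minimal illustrative sketch, not the actual Cohere recipe: the `linear_merge` function, the toy experts, and the mixture weights are all assumed for demonstration.

```python
# Minimal sketch of linear model merging (weight-space averaging),
# assuming each "expert" checkpoint is a dict of named parameter arrays.
# All names and numbers here are illustrative, not from the Cohere report.
import numpy as np

def linear_merge(experts, weights):
    """Merge expert checkpoints by a weighted average of their parameters."""
    assert abs(sum(weights) - 1.0) < 1e-9, "mixture weights should sum to 1"
    merged = {}
    for name in experts[0]:
        merged[name] = sum(w * e[name] for w, e in zip(weights, experts))
    return merged

# Two toy "experts": upweighting expert_b pulls the merge toward it.
expert_a = {"layer.w": np.array([1.0, 0.0])}
expert_b = {"layer.w": np.array([0.0, 1.0])}
merged = linear_merge([expert_a, expert_b], [0.25, 0.75])
print(merged["layer.w"])  # → [0.25 0.75]
```

The interpretability claim falls out directly: because the merge is linear, raising one expert's weight moves every merged parameter proportionally closer to that expert's values.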
RT @nrehiew_: The most interesting part of their post training is just how much they use model merging both in SFT and RL. Their process is…
Replies: 0 · Retweets: 15 · Likes: 0
RT @____aakanksha: the complete cooking guide with all the ingredients, seasonings and garnishes for this soup of a model is here! 🍲🧂🌶️🔥 c…
Replies: 0 · Retweets: 3 · Likes: 0
RT @cohere: We’re redefining what’s possible with AI. With the release of our latest model, Command A, optimized for real-world agentic a…
Replies: 0 · Retweets: 26 · Likes: 0
RT @CohereForAI: Following the open-weight release of Command A and Command R7B models, we're excited to have collaborated with @Cohere col…
Replies: 0 · Retweets: 16 · Likes: 0
I'm really proud to have led the model merging work that went into @cohere Command A and R7B, all made possible by an amazing group of collaborators. Check out the report for loads of details on how we trained a GPT-4o level model that fits on 2xH100!
I'm excited to share the tech report for our @Cohere @CohereForAI Command A and Command R7B models. We highlight our novel approach to model training, including the use of self-refinement algorithms and model merging techniques at scale. Command A is an efficient, agent-optimised…
Replies: 0 · Retweets: 3 · Likes: 57
RT @max_nlp: I'm excited to share the tech report for our @Cohere @CohereForAI Command A and Command R7B models. We highlight our novel approach…
Replies: 0 · Retweets: 75 · Likes: 0
RT @max_nlp: I really enjoyed my @MLStreetTalk chat with Tim at #NeurIPS2024 about some of the research we've been doing on reasoning, robu…
Replies: 0 · Retweets: 18 · Likes: 0
RT @nickfrosst: UPDATE: my numbers were off, external benchmarking actually shows we are faster and better. GPQA-diamond: 53%. millisecond…
Replies: 0 · Retweets: 12 · Likes: 0
RT @nickfrosst: I added @cohere command A to this chart, I had to extend the axis a bit though…
Replies: 0 · Retweets: 47 · Likes: 0