Tom Hosking

@tomhosking

Followers
926
Following
4K
Media
171
Statuses
1K

Model merging lead for Command A @cohere. Prev: PhD student in NLP @EdinburghNLP @Edin_CDT_NLP, @BloomsburyAI @UCL @DRWTrading

Edinburgh, Scotland
Joined April 2009
@tomhosking
Tom Hosking
2 months
RT @Cohere_Labs: How does sparse attention reshape LLM scaling? 🔍 We’re excited to share this work by former @Cohere intern @p_nawrot, “Th….
0
8
0
@tomhosking
Tom Hosking
2 months
RT @maximevoisin_ai: TIL @cohere's best LLM (Command A) ranks higher than Anthropic's best LLM on the Arena.
0
3
0
@tomhosking
Tom Hosking
3 months
RT @arduinfindeis: How exactly was the initial Chatbot Arena version of Llama 4 Maverick different from the public HuggingFace version?🕵️….
0
6
0
@tomhosking
Tom Hosking
3 months
RT @mgalle: You like Code, you like LLMs, you are looking for a leadership position? We are searching for somebody who can support our ama….
0
11
0
@tomhosking
Tom Hosking
3 months
RT @douwekiela: When we came up with RAG five years ago, we weren't creating a workaround for small context windows—we were designing a pri….
0
4
0
@tomhosking
Tom Hosking
3 months
Now feels like a good time to plug @cohere Command A:
- model evaled on @lmarena_ai is same as hosted on @huggingface
- claimed performance is reproducible
- not trained on the test set
- uses the @cohere hybrid attention architecture for long context
- fits on 2xH100 not 8x
@cohere
cohere
4 months
We’re excited to introduce our newest state-of-the-art model: Command A! Command A provides enterprises with maximum performance across agentic tasks with minimal compute requirements.
1
7
66
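The "fits on 2xH100 not 8x" claim above can be sanity-checked with back-of-the-envelope memory arithmetic. A minimal sketch, assuming the publicly reported 111B parameter count and 8-bit weights (both assumptions not stated in this thread; KV cache and activations need extra headroom):

```python
# Rough weight-memory estimate for serving a dense LLM.
params = 111e9          # assumed parameter count for Command A
bytes_per_param = 1     # assumed 8-bit quantized weights
h100_hbm_gb = 80        # HBM capacity of one H100

weights_gb = params * bytes_per_param / 1e9
gpus_needed = weights_gb / h100_hbm_gb

print(f"{weights_gb:.0f} GB of weights -> more than {gpus_needed:.2f} GPUs")
# ~111 GB of weights fits in 2 x 80 GB, with room left for KV cache.
```

Under these assumptions the weights alone need just under 1.4 H100s' worth of HBM, so two cards suffice; at 16-bit precision the same model would push toward a larger node.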
@tomhosking
Tom Hosking
3 months
RT @nrehiew_: The next section on Merging is the most interesting imo. As a summary of what we discussed earlier, they used expert mergin….
0
4
0
@tomhosking
Tom Hosking
3 months
RT @nrehiew_: Some thoughts: .First, the paper is pretty well-written and easy to follow. They have so many benchmarks and results. I think….
0
9
0
@tomhosking
Tom Hosking
3 months
RT @nrehiew_: They find that linear merging is pretty interpretable ie upweight an expert leads to better performance in that domain. Howev….
0
4
0
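The linear merging described in the tweet above (upweighting an expert improves that expert's domain) is just a weighted average of parameters. A minimal sketch with toy NumPy "models"; the dict-of-arrays representation and `linear_merge` helper are illustrative, not Cohere's implementation:

```python
import numpy as np

# Toy "expert models": dicts of parameter arrays with identical shapes.
rng = np.random.default_rng(0)
expert_a = {"w": rng.normal(size=(4, 4)), "b": rng.normal(size=4)}
expert_b = {"w": rng.normal(size=(4, 4)), "b": rng.normal(size=4)}

def linear_merge(models, weights):
    """Weighted average of parameter dicts (linear model merging)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "merge weights should sum to 1"
    merged = {}
    for name in models[0]:
        merged[name] = sum(w * m[name] for w, m in zip(weights, models))
    return merged

# Upweighting expert_a (0.7 vs 0.3) pulls the merged parameters toward it,
# which is what makes linear merging interpretable per-domain.
merged = linear_merge([expert_a, expert_b], [0.7, 0.3])
```

Because the merge is linear in each expert's parameters, the coefficient on an expert acts as a direct knob on how much of that expert's behavior survives in the merge.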
@tomhosking
Tom Hosking
3 months
RT @nrehiew_: The most interesting part of their post training is just how much they use model merging both in SFT and RL. Their process is….
0
15
0
@tomhosking
Tom Hosking
3 months
RT @____aakanksha: the complete cooking guide with all the ingredients, seasonings and garnishes for this soup of a model is here! 🍲🧂🌶️🔥. c….
0
3
0
@tomhosking
Tom Hosking
3 months
RT @cohere: We’re redefining what’s possible with AI. With the release of our latest model, Command A, optimized for real-world agentic a….
0
26
0
@tomhosking
Tom Hosking
3 months
RT @CohereForAI: Following the open-weight release of Command A and Command R7B models, we're excited to have collaborated with @Cohere col….
0
16
0
@tomhosking
Tom Hosking
3 months
I'm really proud to have led the model merging work that went into @cohere Command A and R7B, all made possible by an amazing group of collaborators. Check out the report for loads of details on how we trained a GPT-4o level model that fits on 2xH100!
@max_nlp
Max Bartolo
3 months
I'm excited to share the tech report for our @Cohere @CohereForAI Command A and Command R7B models. We highlight our novel approach to model training, including the use of self-refinement algorithms and model merging techniques at scale. Command A is an efficient, agent-optimised…
Tweet media one
0
3
57
@tomhosking
Tom Hosking
3 months
RT @viraataryabumi: Merging 🍇 + polishing 🧽 = ⌘🧑🏼‍🍳.
0
4
0
@tomhosking
Tom Hosking
3 months
RT @max_nlp: I'm excited to share the tech report for our @Cohere @CohereForAI Command A and Command R7B models. We highlight our novel approach….
0
75
0
@tomhosking
Tom Hosking
4 months
RT @max_nlp: I really enjoyed my @MLStreetTalk chat with Tim at #NeurIPS2024 about some of the research we've been doing on reasoning, robu….
0
18
0
@tomhosking
Tom Hosking
4 months
RT @nickfrosst: UPDATE: my numbers were off, external benchmarking actually shows we are faster and better. GPQA-diamond: 53%. millisecond….
0
12
0
@tomhosking
Tom Hosking
4 months
RT @nickfrosst: I added @cohere command A to this chart, I had to extend the axis a bit though….
Tweet media one
0
47
0
@tomhosking
Tom Hosking
4 months
RT @mgalle: outperforming deepseek in 6/10 categories. while being x6 smaller.
0
7
0