Collinear AI
@CollinearAI
Followers 280 · Following 39 · Media 13 · Statuses 75
High signal data for Frontier AI
Joined October 2023
Thrilled to introduce spider 🕷️, a system for crafting data recipes that you can use directly with @thinkymachines's Tinker for model training. Supports both off-policy and on-policy distillation (tokenizer-agnostic). You can also run various data filters and verifiers for curation.
Tinker for training exists, but Tinker for data doesn't. Yet researchers spend most of their time on data preprocessing/generation and training integration. This Halloween, we introduce spider, i.e. Tinker for data. It spins up a client for users to define a production-grade
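The posts above pitch spider as a "Tinker for data": a client where users define recipes that chain filters and verifiers over a data source. As a rough illustration only (the `DataRecipe` class, its field names, and the whole API below are hypothetical, not spider's real interface), such a recipe might look like:

```python
# Hypothetical sketch of a "data recipe" client in the spirit of the post.
# Nothing here is spider's actual API; names and structure are made up.
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class DataRecipe:
    """Declarative recipe: source -> filters/verifiers -> kept examples."""
    source: Iterable[dict]
    filters: list[Callable[[dict], bool]] = field(default_factory=list)
    mode: str = "off_policy"  # the post mentions off- and on-policy distillation

    def run(self) -> list[dict]:
        # Keep only examples that pass every filter/verifier.
        return [ex for ex in self.source
                if all(f(ex) for f in self.filters)]


# Usage: drop examples with an empty completion.
recipe = DataRecipe(
    source=[{"prompt": "2+2?", "completion": "4"},
            {"prompt": "bad", "completion": ""}],
    filters=[lambda ex: bool(ex["completion"])],
)
print(len(recipe.run()))  # 1
```

The point of the sketch is the shape of the workflow (declare once, run over any source), not any specific implementation.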
We've partnered with @togethercompute! Collinear Simulations are now live inside Together Evals, bringing real-world, multi-turn testing to model evaluation. Simulate messy user behavior with our TraitMix engine and see how your models perform under real conditions!
Together AI @CollinearAI Introducing TraitMix, Collinear's simulation product empowering teams to generate persona-driven AI agent interactions. Plug these interactions into your workflows and evaluate their effectiveness with Together Evals. Details:
Would love to see more effort put into building systems that enable people to create end artifacts, instead of just open-sourcing the end artifacts. Datasets and models are great, but imagine if many people worked on putting systems out that anyone could use to craft training
I interviewed 103 candidates for the MTS role in the last 6 months; we ended up making only single-digit offers. This is what I am looking for:
- How do they approach a problem: do they jump to the solution, or first think about *how to measure* and how to set up evals
- How
We're live on @awsmarketplace! @CollinearAI's simulation and post-training data platform is now available directly through AWS. Enterprises can recreate real-world user journeys, stress-test their AI models, and generate high-signal datasets for fine-tuning. #AWSMarketplace
Excited to attend #ICCV2025 this week! I will be presenting our work "IntroStyle: Training-free Introspective Style Attribution using Diffusion Features": https://t.co/snz3kTN4O4 Hit me up if you want to talk about VLMs and get some ICCV goodies from @CollinearAI :)
Results: strong gains on WikiArt and DomainNet, outperforming existing methods and staying robust to evolving artistic styles. If you're at ICCV this week, stop by his poster, and grab one of our custom cat stickers while you're at it 🐱
IntroStyle tackles one of the toughest challenges in generative vision: attributing artistic style in text-to-image models without training new networks or collecting massive custom datasets. The framework is training-free and leverages diffusion model features directly.
Exciting news from #ICCV 2025! Our researcher Anand Kumar will be presenting IntroStyle: Introspective Style Attribution for Diffusion Models #ICCV2025 #DiffusionModels #ComputerVision #AIResearch
Our research was featured in the 2025 State of AI Report by Air Street Capital, alongside @OpenAI, @GoogleDeepMind, @Apple, and @Meta. The report spotlights Collinear's work on adversarial testing and reasoning brittleness, advancing how we evaluate and improve reasoning
Frontier research + top-notch office vibes. Come join us. https://t.co/lbbCosi9As
We're growing and hiring! I'm looking for Research Scientists and Research Engineers passionate about pushing the boundaries of post-training AI technologies. We've shipped 100B+ tokens of high-quality data in a very short time and enabled enterprises to save serious $$
Some very fun moments I have had while grinding at @CollinearAI below. If these sound interesting to you, come and join us:
- Using mech interp tools to drastically alter the personas of LLMs for agentic evals / RL
- Generating 25B+ post-training tokens in 5 days on limited
Deloitte refunded part of a $290K contract after their review of Australia's welfare system included AI-generated hallucinations. Even trusted workflows need guardrails, post-training, and red-teaming checks, so that models don't start writing their own facts.
At @Collinear, we're studying these post-training dynamics to build smarter improvement loops, where every dataset, reward, and evaluation helps models climb out of their own valley. Our paper has been accepted to @NeurIPSConf '25 (DL4C)
As we scaled reasoning data from 1K → 10K → 30K examples, model performance on competitive coding first dropped by half before climbing back to surpass the baseline by over 100%! Small models need to unlearn surface-level patterning before internalizing structured reasoning.
What if learning to reason requires unlearning first? We often imagine fine-tuning as a straight climb: add more data, get better results. But when we studied how small models learn to reason through code distillation… we found what we call a valley of reasoning. #AIResearch
Check out our work on simulations
A very fun API endpoint we have been working on alongside the paper: using activation steering techniques to mix a gazillion personality traits into a single LLM to simulate real people. How to use it: once you define a custom mix of traits, demographics, and any intent you
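The trait-mixing endpoint described above rests on activation steering: add direction vectors to a model's hidden state to nudge its behavior. A minimal, generic sketch of that idea (the dimensions, trait names, and random vectors below are all made up for illustration; this is not Collinear's API or its actual steering method):

```python
# Generic activation-steering sketch: a "persona" is a weighted sum of
# per-trait steering vectors, added to a hidden state at some layer.
import numpy as np

hidden_dim = 8  # toy size; real models have thousands of dimensions
rng = np.random.default_rng(0)

# In practice these directions would be derived from model activations;
# here they are random stand-ins.
trait_vectors = {
    "curious": rng.normal(size=hidden_dim),
    "skeptical": rng.normal(size=hidden_dim),
}


def persona_vector(mix: dict[str, float]) -> np.ndarray:
    """Blend trait directions by their weights into one steering vector."""
    v = np.zeros(hidden_dim)
    for trait, weight in mix.items():
        v += weight * trait_vectors[trait]
    return v


def steer(hidden_state: np.ndarray, mix: dict[str, float],
          alpha: float = 1.0) -> np.ndarray:
    """Shift the hidden state along the blended persona direction."""
    return hidden_state + alpha * persona_vector(mix)


h = np.zeros(hidden_dim)
steered = steer(h, {"curious": 0.7, "skeptical": 0.3})
print(steered.shape)  # (8,)
```

The appeal of this formulation is that trait mixes compose linearly, so "a gazillion" personas can be generated from a small bank of trait directions without retraining anything.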
We will be presenting the viral CatAttack paper tomorrow at 11 am. If you are at @COLM_conf, make sure to stop by and learn why this work got featured in Science.
At @CollinearAI, we're building the frontier data that makes those better mistakes possible. High-signal, evaluative, and built for models that keep learning!
We spoke about how reward models are evolving into much more than static judges, and how mid-training is starting to gain prominence. The conversation pointed to a shift: the next generation of AI won't just avoid mistakes, it will use them. That's where real learning begins.