Sathwik Tejaswi
@SathwikTejaswi
Followers
66
Following
34
Media
0
Statuses
17
Technical Co-Lead of Apriel Mid-Training and Post-Training https://t.co/7CEEfb5Fib
SF Bay Area
Joined April 2016
Thank you @mervenoyann for the shout-out!
the new Apriel-1.5 reasoning vision language model by @ServiceNowRSRCH is so good! here's a small vibe test across languages: > ask it to identify drug interactions on a French label, in English > it compares the minerals > finally it comes up with a look-up table with the correct list!
0
0
2
ServiceNow-AI/Apriel-1.5-15b-Thinker running on a single GPU using `transformers serve`. Great to have some very nice reasoning models that can run locally! next step, trying it on MPS
0
9
55
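The post above mentions serving the model locally with `transformers serve`. Below is a minimal client-side sketch for querying such a server, assuming it is already running (for example via `transformers serve`) and exposes an OpenAI-compatible chat completions endpoint on localhost port 8000; the port, endpoint path, and generation parameters are illustrative assumptions, not details from the post.

```python
# Minimal sketch: query a model served locally with `transformers serve`.
# Assumes the server is already running and exposes an OpenAI-compatible
# /v1/chat/completions endpoint; the port, endpoint path, and request
# parameters below are assumptions for illustration.
import requests

payload = {
    "model": "ServiceNow-AI/Apriel-1.5-15b-Thinker",
    "messages": [
        {"role": "user", "content": "Explain chain-of-thought reasoning in two sentences."}
    ],
    "max_tokens": 512,
}

response = requests.post(
    "http://localhost:8000/v1/chat/completions", json=payload, timeout=300
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```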
Congratulations to @ServiceNowRSRCH on introducing Apriel-1.5-15B-Thinker, a powerful new AI model that delivers frontier-level reasoning with a fraction of the compute. We're proud that our Nemotron collection helped power its training.
Congratulations to @ServiceNowRSRCH on introducing Apriel-1.5-15B-Thinker, their 15B-parameter model that matches DeepSeek-R1-0528, Mistral-medium-1.2 and Gemini Flash 2.5 on the Artificial Analysis Index (AAI 52), delivering comparable results at a fraction of the size (at
5
9
78
@ServiceNow released a 15B parameter AI model today. The model is the product of a partnership with Turing, which provided the training data. Breakdown below.
5
20
142
ServiceNow has released Apriel-v1.5-15B-Thinker, a 15B open weights reasoning model that leads our Small Models category (<40B parameters). Overview: Apriel-v1.5-15B-Thinker is a dense, 15B parameter open weights reasoning model. This is not the first model ServiceNow has
19
61
502
Congratulations to @ServiceNowRSRCH on introducing Apriel-1.5-15B-Thinker, their 15B-parameter model that matches DeepSeek-R1-0528, Mistral-medium-1.2 and Gemini Flash 2.5 on the Artificial Analysis Index (AAI 52), delivering comparable results at a fraction of the size (at
12
23
180
SLAM Labs presents Apriel-1.5-15B-Thinker: an open-weights multimodal reasoning model that hits frontier-level performance with just a fraction of the compute.
15
77
337
This is an interesting technical LLM report. This 15B model beats QwQ-32B while using considerably fewer tokens. Most interestingly, the authors heavily use model merging to combine the strengths of different checkpoints. https://t.co/thoIqNEeBd
5
48
346
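The post above highlights that the report relies heavily on model merging across checkpoints. As a rough illustration only, here is a minimal sketch of the simplest form of merging, element-wise weighted averaging of checkpoint state dicts; the Apriel report's actual recipe may differ, and the checkpoint paths are placeholders.

```python
# Minimal sketch of merging checkpoints by averaging weights ("model soup" style).
# This illustrates the general idea of combining checkpoint strengths; it is not
# the recipe from the Apriel report, and the checkpoint paths are placeholders.
import torch

def merge_state_dicts(paths, weights=None):
    """Load several checkpoints and return the weighted average of their tensors."""
    weights = weights or [1.0 / len(paths)] * len(paths)
    merged = None
    for path, w in zip(paths, weights):
        state = torch.load(path, map_location="cpu")
        if merged is None:
            merged = {k: v.float() * w for k, v in state.items()}
        else:
            for k, v in state.items():
                merged[k] += v.float() * w
    return merged

merged = merge_state_dicts(["ckpt_sft.pt", "ckpt_rl.pt"], weights=[0.5, 0.5])
torch.save(merged, "ckpt_merged.pt")
```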
Our work "Variable Layerwise Quantization: A Simple and Effective Approach to Quantize LLMs" is accepted at #ACLFindings2025! https://t.co/7fKAnZQIBr
• Keep key layers high-precision, push others lower
• Compact LLMs w/ ~no accuracy loss
• Simple LIM & ZD scores rank layers
arxiv.org
We present a simple meta quantization approach that quantizes different layers of a large language model (LLM) at different bit levels, and is independent of the underlying quantization technique....
1
3
6
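The paper above assigns different bit-widths to different layers, keeping the important ones at higher precision. The sketch below illustrates only that allocation step, using a placeholder importance score; the paper's LIM and ZD scoring functions are defined in the linked preprint and are not reproduced here.

```python
# Minimal sketch of variable layerwise bit allocation: rank layers by an
# importance score and keep the most important ones at higher precision.
# The scoring values here are placeholders; the paper's LIM and ZD scores
# are not reproduced.
import numpy as np

def assign_bits(importance_scores, high_bits=8, low_bits=4, high_fraction=0.25):
    """Give the top `high_fraction` most important layers `high_bits`, the rest `low_bits`."""
    order = np.argsort(importance_scores)[::-1]  # most important first
    n_high = max(1, int(len(order) * high_fraction))
    bits = {int(layer): low_bits for layer in order}
    for layer in order[:n_high]:
        bits[int(layer)] = high_bits
    return bits

# Hypothetical per-layer importance scores for a 12-layer model.
scores = np.random.rand(12)
print(assign_bits(scores))
```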
Today Jensen Huang announced SLAM Lab's newest model on the @HelloKnowledge stage: Apriel-Nemotron-15B-Thinker. A lean, mean reasoning machine punching way above its weight class. Built by SLAM × NVIDIA. Smaller models, bigger impact.
2
22
47
SLAM Labs presents Apriel-5B! And it lands right in the green zone: Speed + Accuracy + Efficiency. This model punches above its weight, beating bigger LLMs while training on a fraction of the compute. Built with Fast-LLM, our in-house training stack.
5
49
134
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks abs: https://t.co/l6wHdrGAt5 project page: https://t.co/55UGlS3FLQ BigDocs-7.5M is a high-quality, open-access dataset comprising 7.5 million multimodal documents across
2
28
142
We just released BigDocs: An Open Multimodal Dataset, our latest work on scaling document understanding across diverse data types! Dive into the details: https://t.co/KfOKZKARDS or come see us at the #NeurIPS2024 RBFM workshop! #AI @ServiceNowRSRCH #bigdocs
0
15
17
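For readers who want to try BigDocs, a natural starting point would be loading it through the Hugging Face `datasets` library, as sketched below; the repository id and split are placeholders, so check the project page linked in the posts for the actual Hub location.

```python
# Minimal sketch of loading a BigDocs split with the `datasets` library.
# The repository id below is a placeholder, not a confirmed identifier;
# see the project page linked in the post for the actual Hub location.
from datasets import load_dataset

dataset = load_dataset("ServiceNow/BigDocs-7.5M", split="train", streaming=True)  # hypothetical repo id
for example in dataset.take(3):
    print(sorted(example.keys()))
```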
We introduce LLM2Vec, a simple approach to transform any decoder-only LLM into a text encoder. We achieve SOTA performance on MTEB in both the unsupervised and supervised categories (among models trained only on publicly available data). 1/N Paper: https://t.co/1ARXK1SWwR
13
165
874
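LLM2Vec's full recipe (enabling bidirectional attention, masked next-token prediction, and contrastive training) is described in the paper. Purely as orientation, the sketch below shows the most basic ingredient, pooling a decoder-only LM's hidden states into a sentence embedding; it is not the LLM2Vec method itself, and the model name is a small placeholder.

```python
# Minimal sketch of pooling a decoder-only LM's hidden states into a sentence
# embedding. LLM2Vec additionally enables bidirectional attention and applies
# masked next-token prediction plus contrastive training; none of that is shown
# here, and the model name is just a small placeholder.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # placeholder; LLM2Vec targets larger decoder-only LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained(model_name).eval()

def embed(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state   # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)    # ignore padding positions
    return (hidden * mask).sum(1) / mask.sum(1)     # mean pooling over tokens

print(embed(["model merging", "layerwise quantization"]).shape)
```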
Excited to share our new work: CurryDPO (1/2)
• Systematically curates multiple preference pairs and trains on them in a curriculum learning setup with the DPO framework
• Achieves notable performance gains over the vanilla DPO method on MT-Bench, Vicuna, WizardLM, and UltraFeedback
1
12
19
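CurryDPO builds a curriculum over multiple preference pairs and trains them with the standard DPO objective. Below is a minimal sketch of that objective applied to pairs sorted from easy to hard; the per-pair difficulty score and log-probability values are placeholders, since the paper's exact curation and ordering criteria are not reproduced here.

```python
# Minimal sketch of a DPO loss applied to preference pairs in a curriculum
# (easier pairs first). The difficulty scores and log-probabilities are
# placeholders; CurryDPO's actual curation and ordering are in the paper.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective: -log sigmoid(beta * (policy margin - reference margin))."""
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Hypothetical preference pairs, each with a difficulty score and
# (policy_chosen, policy_rejected, ref_chosen, ref_rejected) log-probs.
pairs = [
    {"difficulty": 0.9, "logps": torch.tensor([-4.0, -9.0, -5.0, -8.0])},
    {"difficulty": 0.2, "logps": torch.tensor([-3.0, -7.0, -4.0, -6.5])},
]
for pair in sorted(pairs, key=lambda p: p["difficulty"]):  # curriculum: easy first
    loss = dpo_loss(*pair["logps"])
    print(float(loss))
```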