Aman Arora
@amaarora
Followers: 6K | Following: 8K | Media: 292 | Statuses: 4K
Lead AI Engineer | prev: W&B
Sydney, New South Wales
Joined June 2014
🧵 Most modern LLMs like Qwen, DeepSeek & gpt-oss use YaRN to extend context from 4K→128K tokens. But what led to YaRN? Today I'm proud and excited to share a comprehensive resource on the evolution of positional embeddings such as APE, RoPE, YaRN & variants👇 1/n
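For readers new to the topic, here is a minimal pure-Python sketch of RoPE and of the "NTK-by-parts" frequency interpolation described in the YaRN paper. It is not taken from the linked resource. The defaults are assumptions: base 10000, ramp bounds alpha = 1 and beta = 32 (the values the paper reports for Llama-family models), and a 4096-token original context scaled by 32 to 131072 to mirror the 4K→128K example. The attention factor 0.1·ln(s) + 1 follows the paper's suggested temperature; check it against the paper before relying on it.

```python
import math


def rope_inv_freqs(head_dim: int, base: float = 10000.0) -> list[float]:
    """Standard RoPE angular frequencies theta_i = base^(-2i / head_dim)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(x: list[float], pos: int, inv_freqs: list[float]) -> list[float]:
    """Rotate each consecutive pair (x[2i], x[2i+1]) by the angle pos * theta_i."""
    out = []
    for i, theta in enumerate(inv_freqs):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        out += [a * c - b * s, a * s + b * c]
    return out


def yarn_inv_freqs(head_dim: int, scale: float, orig_ctx: int,
                   base: float = 10000.0, alpha: float = 1.0,
                   beta: float = 32.0) -> list[float]:
    """Sketch of YaRN-style "NTK-by-parts" interpolation of RoPE frequencies.

    r = number of full rotations a dimension completes over the original context.
    r >= beta  -> high-frequency dimension, left unchanged (gamma = 1)
    r <= alpha -> low-frequency dimension, divided by `scale` as in plain
                  position interpolation (gamma = 0)
    otherwise  -> linear blend of the two.
    """
    out = []
    for theta in rope_inv_freqs(head_dim, base):
        wavelength = 2 * math.pi / theta
        r = orig_ctx / wavelength
        gamma = min(1.0, max(0.0, (r - alpha) / (beta - alpha)))
        out.append((1 - gamma) * theta / scale + gamma * theta)
    return out


def yarn_attention_factor(scale: float) -> float:
    """Assumed logit scaling sqrt(1/t) = 0.1 * ln(s) + 1, applied to q and k."""
    return 0.1 * math.log(scale) + 1.0


if __name__ == "__main__":
    orig_ctx, target_ctx = 4096, 131072
    s = target_ctx / orig_ctx  # 32.0
    base_f = rope_inv_freqs(128)
    yarn_f = yarn_inv_freqs(128, s, orig_ctx)
    print(f"fastest dim: {base_f[0]:.4g} -> {yarn_f[0]:.4g}")
    print(f"slowest dim: {base_f[-1]:.4g} -> {yarn_f[-1]:.4g}")
    print(f"attention factor: {yarn_attention_factor(s):.4f}")
```

The design point the sketch tries to show: high-frequency dimensions, which encode fine-grained local order, keep their original rotation speed, while only the slow dimensions are stretched to cover the longer context.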
Gemini 3 Flash is insane at OCR. It parses this extremely hard-to-read handwritten letter by Richard Feynman perfectly. It can do ~300 of these for $1. What's crazy is Feynman addresses General Donald J. Kutyna as "Katyna" which Gemini gets. There is no "Meeting Katyna", the
Memory ≠ likability! Just because an LLM can remember facts about a user, does not necessarily make it likeable. A new benchmark from Amazon measures LLM likeability across 7 different dimensions: emotional adaptation, formality matching, knowledge adaptation, reference
Interesting finding from our PostTrainBench: Sonnet 4.5 released ~3 months ago can barely improve the performance of base LLMs. But there's been _a lot_ of progress since then: - Opus 4.5 does perform much better - GPT-5.1 Codex Max outperforms the rest by a wide margin!
Alrighty. The Toad is out of the bag. 👜🐸 Install toad to work with a variety of #AI coding agents with one beautiful terminal interface. Check out the blog post for more information... https://t.co/hMcnfyuMa9 I've been told I'm very authentic on camera. You just can't fake
skills, commands, subagents are HIGH LEVERAGE, which means you should probably WRITE THEM BY HAND at least for a while. If you let claude slop-out your instructions into agents/claude.md/skills etc, and you don't read them, it's going to vomit information from the training set,
So post-training went from domain-specific finetuning (a few years ago) to: General instruction SFT (chat, multi-turn, tool use, code, summarization) → Reasoning-focused SFT/RL (with <think> tokens for CoT) → RLVR (verifiable rewards for math/code/reasoning boosts) → Preference
Using the extension, Claude Code can test code directly in the browser to validate its work. Claude can also see client-side errors via console logs. Try it out by running /chrome in the latest version of Claude Code.
In the latest benchmarks released for Nemotron3, it's interesting that AIME-25 gets close to a 100% score with tool calling, whereas GPQA shows little difference (only about 2-3%).
Nanbeige4-3b: A family of small but high-performing language models. All models are open weights & released on Hugging Face. Pretrained on 23T tokens, and fine-tuned on 30M instructions followed by knowledge distillation using proposed Dual Preference Distillation (DPD) method &
Introducing FunctionGemma 🤏270m model for function calling 📱can run in your phone, browser or other devices 🤖designed to be specialized for your own tasks https://t.co/vU0YAeWWmH
Gemini 3 flash is as good at reading handwriting as the average human (pro is expert human level). It is much better than both GPT-5.2 and Opus 4.5, with a character-level error rate of 1.43% and a word-level error rate of 2.74%. This is a 47-63% improvement over 2.5 Flash, the
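Character- and word-level error rates like the ones quoted above are commonly computed as an edit (Levenshtein) distance between a transcription and a reference, divided by the reference length. The sketch below shows that common definition only; it assumes no text normalization (case, punctuation, whitespace), which real handwriting benchmarks may handle differently, so it is not necessarily how the quoted numbers were produced.

```python
from collections.abc import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions
    turning `ref` into `hyp`. Works on strings (characters) or lists (words)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate = character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    ref = "the quick brown fox"
    hyp = "the quick brown fax"
    print(f"CER = {cer(ref, hyp):.2%}, WER = {wer(ref, hyp):.2%}")
```

One wrong letter costs 1 edit out of 19 characters for CER but 1 edit out of 4 words for WER, which is why word-level rates are typically higher than character-level rates on the same transcription.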
You always think you're safe until your job becomes a benchmark.
We release PostTrainBench: a benchmark measuring how well AI agents like Claude Code can post-train base LLMs. We expect this to be an important indicator for AI R&D automation as it unfolds over the next few years. 🔗 https://t.co/dVSSHkpAE1 📂 https://t.co/vqZNrQw66z 1/n
No signs yet of an end to rapid gains in AI ability at ever-decreasing costs (note the log scale). I have to update this monthly or more frequently at this point. All AI benchmarks are flawed, but GPQA Diamond has been a pretty good one, though likely close to being maxed out.
Slide Decks are officially our second most popular studio output! To celebrate, here are a few of our favorite ways to make the most of this feature: 1. Refine your existing slides: Upload any presentation to @NotebookLM along with your logo, brand guidelines, etc. Then, prompt
Gemini 3 Flash across different test-time compute levels (green line below) represents a new score/cost Pareto frontier on ARC-AGI-2. Congrats to @demishassabis and @sundarpichai on the launch!
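A score/cost Pareto frontier is the set of configurations that no other configuration beats on both axes at once: nothing else is at least as cheap and at least as accurate while being strictly better on one of the two. The sketch below computes it with a sort-and-sweep. The `Run` type and the (cost, score) points are made up for illustration and are not real ARC-AGI-2 results.

```python
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    cost: float   # e.g. dollars per task; lower is better
    score: float  # e.g. benchmark accuracy; higher is better


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Return the non-dominated runs, ordered from cheapest to most expensive.

    After sorting by cost (ties: higher score first), a run is on the frontier
    only if it scores strictly higher than every run already seen, i.e. every
    run that is at least as cheap.
    """
    frontier: list[Run] = []
    best_score = float("-inf")
    for run in sorted(runs, key=lambda r: (r.cost, -r.score)):
        if run.score > best_score:
            frontier.append(run)
            best_score = run.score
    return frontier


if __name__ == "__main__":
    # Hypothetical points, e.g. one model at several test-time compute levels.
    runs = [
        Run("A", 0.10, 0.20),
        Run("B", 0.50, 0.15),
        Run("C", 0.40, 0.30),
        Run("D", 1.00, 0.35),
        Run("E", 1.20, 0.33),
    ]
    print([r.name for r in pareto_frontier(runs)])
```

Here B is dominated by C (cheaper and higher-scoring) and E by D, so only A, C and D remain. A "new frontier" in this sense means the new points dominate some previously non-dominated ones.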
Announcing the Beta release of ty: an extremely fast type checker and language server for Python, written in Rust. We now use ty exclusively in our own projects and are ready to recommend it to motivated users. 10x, 50x, even 100x faster than existing type checkers and LSPs.