amiruci Profile Banner
Amir Haghighat Profile
Amir Haghighat

@amiruci

Followers
2K
Following
2K
Media
80
Statuses
660

Co-founder @basetenco

San Francisco, CA
Joined May 2009
@amiruci
Amir Haghighat
2 months
We closed our series D at $2.1b. It happened 8 months after our series C, which seems too fast until you consider the facts: 2 years' worth of growth in 8 months, virtually 0 customer churn, healthy margins, and QoQ NDR numbers that are considered top-tier YoY. The market demand
31
20
319
@amiruci
Amir Haghighat
5 days
Amazing work by @drfeifei and the @worldlabs team to push the boundaries of multimodal AI.
@theworldlabs
World Labs
5 days
Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at https://t.co/V267VJu1H9
3
0
18
@amiruci
Amir Haghighat
6 days
with the lowest time-to-first-token:
0
0
15
@amiruci
Amir Haghighat
6 days
A few days ago Kimi K2 Thinking significantly narrowed the capability gap between open and closed LLMs. Today Baseten is the only provider to deliver over 100 tok/sec on this massive 1T-parameter model.
13
46
636
@cline
Cline
7 days
The fastest provider for kimi-k2-thinking <now in Cline>
@basetenco
Baseten
7 days
It’s Monday, and we could all use a little help thinking. Thankfully we have the new Kimi K2 Thinking to do it for us. Kimi K2 Thinking is now live in our Model APIs with the most performant TTFT (0.3 sec) and TPS (140) on @OpenRouterAI & @ArtificialAnlys . If you’re looking
22
31
438
@tuhinone
Tuhin Srivastava
18 days
Cursor 2.0 feels game changing - fast agentic workflows unlock new levels of creativity and productivity. Congrats to the team!
@cursor_ai
Cursor
19 days
Introducing Cursor 2.0. Our first coding model and the best way to code with agents.
7
4
117
@amiruci
Amir Haghighat
24 days
And we have the highest tok/sec using nvidia GPUs:
0
0
5
@amiruci
Amir Haghighat
24 days
There's an obsession with tok/sec as *the* metric in LLM inference. But in latency-sensitive use cases the metric that matters more is time-to-first-token:
- Code edit use cases have short outputs and overall latency is heavily determined by ttft
- Voice AI use cases care about
5
5
35
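The tradeoff in that thread can be sketched with a toy latency model (illustrative numbers only, not Baseten benchmarks):

```python
# Toy model: end-to-end latency = time to first token + generation time.
# All numbers below are hypothetical, chosen only to show when each
# metric dominates.

def total_latency(ttft_s: float, tps: float, output_tokens: int) -> float:
    """End-to-end request latency in seconds."""
    return ttft_s + output_tokens / tps

# Short output (e.g. a code edit, ~50 tokens): TTFT dominates.
low_ttft  = total_latency(ttft_s=0.2, tps=100, output_tokens=50)  # 0.7 s
high_ttft = total_latency(ttft_s=1.0, tps=150, output_tokens=50)  # ~1.33 s

# Long output (~2000 tokens): tok/sec dominates instead.
high_tps = total_latency(ttft_s=1.0, tps=150, output_tokens=2000)  # ~14.3 s
low_tps  = total_latency(ttft_s=0.2, tps=100, output_tokens=2000)  # 20.2 s

print(low_ttft, high_ttft, high_tps, low_tps)
```

For short completions the provider with the lower TTFT wins even at a lower tok/sec, while for long generations the ranking flips.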
@ArtificialAnlys
Artificial Analysis
27 days
GLM-4.6 providers overview: we are benchmarking API endpoints offered by Baseten, GMI, Parasail, Novita, and Deepinfra. GLM-4.6 (Reasoning) from @Zai_org is one of the most intelligent open-weights models, with intelligence close to GPT-OSS-120b (high), DeepSeek V3.2 Exp (Reasoning)
16
23
272
@basetenco
Baseten
28 days
We're aware of the massive AWS outage. The Baseten web app is down, but inference, new deploys, training jobs, and the model management APIs are unaffected.
3
8
31
@cline
Cline
1 month
GLM 4.6 fans! @basetenco just soared to the top as the fastest provider on Artificial Analysis for the model: 114 TPS and <0.18s TTFT. That's 2x faster than the next best option on both metrics. Available now in Cline.
15
36
439
@amiruci
Amir Haghighat
1 month
Go team @FactoryAI!
@FactoryAI
Factory
1 month
Deploy and serve custom models with enterprise-grade infrastructure on @basetenco. Special promo for Factory users: receive $500 Model API credits when you fill out this form.
0
0
33
@basetenco
Baseten
2 months
What do Superhuman, Baseten, and Ricky Bobby all have in common? An obsession with speed. If you’re a Superhuman user, you know their email app lives and dies by how fast their users can get through all things email.
1
1
13
@basetenco
Baseten
2 months
If you see a doctor today, chances are they're using OpenEvidence for trustworthy, up-to-date medical information at their fingertips. We're thrilled to support OpenEvidence's mission with the speed (<160 ms latency) and reliability physicians require at the point of care.
3
9
29
@Madisonkanna
Madison Kanna
2 months
@rtfeldman teaching me zed live now!! https://t.co/cRPiwkD0sT
3
7
57
@NVIDIAAI
NVIDIA AI
2 months
📈 @basetenco users are scaling smarter with us: ✅ 5× throughput on high-traffic endpoints ✅ 50% lower cost per token ✅ Up to 38% lower latency on the largest LLMs Built on NVIDIA Blackwell + TensorRT-LLM + Dynamo on @googlecloud—driving efficiency, speed & adoption at scale.
8
21
109
@Madisonkanna
Madison Kanna
2 months
Next live stream is tomorrow at 10:30AM pst. Sharing some fun announcements, guests joining, and we're giving away a bunch of our shirts!
20
3
100
@lilyjclifford
lily clifford
3 months
🚀 Arcana v2 is here. Rime’s next-gen TTS makes voice AI sound truly human. More languages. More realism. More deployment options. 🧵👇
7
10
51
@amiruci
Amir Haghighat
3 months
It's important to support newly released open-weight models on day 1. But it's not noteworthy. What's noteworthy is having the inference optimization muscle to immediately blow the competition out of the water on latency and throughput. As measured by OpenRouter:
12
14
87
@tuhinone
Tuhin Srivastava
3 months
We're very excited to be an @OpenAI launch partner for GPT OSS. Today's a big day for open models, and we have day 0 support for GPT OSS 120b via our Model APIs: https://t.co/hMLzTy2dek We'll be rolling out more performance optimizations and benchmarks over the coming hours and
baseten.co
120B MoE open model by OpenAI
12
19
91