Letta (@Letta_AI)
Stateful agents that remember and learn https://t.co/4upATozfXj
San Francisco, CA · Joined August 2024
4K Followers · 575 Following · 157 Media · 452 Statuses
What if we evaluated agents less like isolated code snippets, and more like humans - where behavior depends on the environment and lived experiences? Introducing Letta Evals: a fully open source evaluation framework for stateful agents
The next Stateful Agents Meetup is about self-improving coding agents! Come learn about Letta Code and how you can use it to build infinitely long-lived agents that learn your codebase as they improve it. November 20th, 2025 in San Francisco! Register: https://t.co/Y2mG6Lclb2
Can every AI model learn to use Skills? @Letta_AI has released the Context-Bench Skills evaluation benchmark to test whether AI models can "pick up and learn skills" the way humans do. The core question: AI …
Last week we launched Context-Bench, a new leaderboard that measures how good AI models are at Agentic Context Engineering. This week, we're expanding Context-Bench with a new addition: Context-Bench Skills.
Claude Skills might be the new MCP - but does it work outside of @AnthropicAI? Find out with the "Skills Suite" in Context-Bench, our benchmark for Agentic Context Engineering. GPT-5 and GLM 4.6 excel at skill use, but smaller models (e.g. GPT-5-mini) struggle
As part of our evaluation, we built skills into Letta Code, a model-agnostic harness that lets any LLM leverage skills, so you can experiment with context mounting using any model. For more, read our full write-up on evaluating skills:
letta.com
Today we're releasing Skill Use, a new evaluation suite inside of Context-Bench that measures how well models discover and load relevant skills from a library to complete tasks.
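A rough sketch of the discovery pattern this suite measures, in Python. The skills/ directory layout, the SKILL.md frontmatter fields, and the choose() call are illustrative assumptions rather than the Context-Bench harness itself; the point is that only short skill descriptions sit in context until the model decides to load a full skill body.

```python
# Hypothetical sketch of skill discovery: keep only each skill's name and
# description in context, then load the full SKILL.md body for the one skill
# the model selects for the task.
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict:
    """Parse a minimal '---'-delimited frontmatter block (name, description)."""
    text = skill_md.read_text()
    meta = {}
    if text.startswith("---"):
        header = text.split("---", 2)[1]
        for line in header.strip().splitlines():
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta


def list_skills(library: Path) -> list[dict]:
    """One cheap entry per skill: name + one-line description, not the full body."""
    entries = []
    for skill_md in sorted(library.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        entries.append({
            "name": meta.get("name", skill_md.parent.name),
            "description": meta.get("description", ""),
            "path": skill_md,
        })
    return entries


def load_skill(entry: dict) -> str:
    """Pull the full instructions into context only once the skill is chosen."""
    return entry["path"].read_text()


# Usage, assuming a skills/ directory and some choose() policy (e.g. an LLM call):
# skills = list_skills(Path("skills"))
# chosen = choose(task="extract tables from a PDF", options=skills)  # hypothetical
# mounted_instructions = load_skill(chosen)
```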
Anthropic recently released a set of open source skills to teach Claude to do various tasks, but they never released a quantitative evaluation of skill use. It turns out: many frontier models (not just Claude) are capable of effective skill acquisition.
Context-Bench Skills measures whether or not an agent is capable of acquiring and utilizing specialized knowledge. We call this concept "context mounting" - similar to how you mount a storage volume or USB drive to a computer.
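To make the mounting analogy concrete, here is a toy illustration (not Letta's implementation) of attaching and detaching a block of specialized knowledge under a fixed context budget, the way you would mount and unmount a volume.

```python
# Toy "context mounting": attach and detach knowledge blocks the way you would
# mount and unmount a volume. The 4-chars-per-token estimate and the default
# budget are illustrative assumptions, not a real tokenizer or Letta's internals.
class MountedContext:
    def __init__(self, token_budget: int = 8_000):
        self.token_budget = token_budget
        self.mounts: dict[str, str] = {}  # label -> mounted text

    @staticmethod
    def estimate_tokens(text: str) -> int:
        return max(1, len(text) // 4)  # crude heuristic

    def used(self) -> int:
        return sum(self.estimate_tokens(t) for t in self.mounts.values())

    def mount(self, label: str, text: str) -> None:
        if self.used() + self.estimate_tokens(text) > self.token_budget:
            raise RuntimeError(f"mounting '{label}' would exceed the context budget")
        self.mounts[label] = text

    def unmount(self, label: str) -> None:
        self.mounts.pop(label, None)

    def render(self) -> str:
        """The block of text that would be prepended to the agent's prompt."""
        return "\n\n".join(
            f"<{label}>\n{text}\n</{label}>" for label, text in self.mounts.items()
        )


# ctx = MountedContext()
# ctx.mount("pdf-skill", open("skills/pdf/SKILL.md").read())
# ... run the task ...
# ctx.unmount("pdf-skill")  # free the budget once the skill is no longer needed
```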
The Letta Office Hours recording is now available. It covers:
- V1 SDK breaking changes (snake case, pagination, shared archives, project scoping)
- The AI Memory SDK v0.2
- Our agent Ezra
- The Letta Code link/unlink feature
- Agent scheduling
https://t.co/3oabYZgoDp
"You obviously cannot learn if you have no memory." @sarahwooders from Letta cuts to the core of why current LLM agents struggle to evolve beyond workflows. It's a fundamental limitation many builders are grappling with.
Last week we announced Letta Evals. Here's a video on how to use it. You'll learn simple Q&A testing, rubric-based grading, and multi-turn memory verification. https://t.co/eQgoeGVEhE
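For readers who prefer code to video, below is a minimal sketch of the multi-turn memory verification described there. It is written against generic send and judge callables rather than the actual Letta Evals API, so those names are assumptions.

```python
# Minimal sketch of a multi-turn memory check: teach the agent a fact in turn 1,
# ask about it in turn 2, grade the answer. `send` and `judge` are hypothetical
# callables standing in for your agent harness and an LLM (or string) grader.
from typing import Callable


def memory_verification(
    send: Callable[[str], str],         # sends one user turn, returns the agent's reply
    judge: Callable[[str, str], bool],  # (rubric, answer) -> pass/fail
) -> bool:
    # Turn 1: give the agent something to remember.
    send("My favorite database is DuckDB. Please remember that.")
    # Turn 2: check the fact survived into a later turn, i.e. it was carried
    # in the agent's state rather than just echoed back immediately.
    answer = send("Quick check: which database did I say was my favorite?")
    rubric = "The answer names DuckDB as the user's favorite database."
    return judge(rubric, answer)


# Example wiring with trivial stand-ins:
# passed = memory_verification(
#     send=lambda text: my_agent.step(text),               # hypothetical agent API
#     judge=lambda rubric, ans: "duckdb" in ans.lower(),   # exact-match fallback grader
# )
```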
Context-Bench shows promising results for the open source community: the gap between frontier open-weights models and closed-weights models appears to be closing. Read our breakdown of the benchmark at https://t.co/n7FaK4TLhh See the live leaderboard at https://t.co/GC8jS41nCf
Context-Bench also measures total cost to complete the benchmark. Surprisingly, raw token costs ($/million tokens) do not map directly to total cost. GPT-5 has lower per-token cost than Sonnet 4.5, but costs more in the benchmark because GPT-5 agents are more "token hungry".
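A quick worked example of why per-token price and total benchmark cost can diverge; the prices and token counts below are made up for illustration, not the actual Context-Bench or model figures.

```python
# Hypothetical numbers: model A is cheaper per token but much more "token hungry",
# so it ends up costing more across the whole benchmark.
def run_cost(price_per_m_tokens: float, tokens_per_task: int, tasks: int) -> float:
    return price_per_m_tokens * tokens_per_task * tasks / 1_000_000


cheap_but_hungry = run_cost(price_per_m_tokens=1.25, tokens_per_task=400_000, tasks=100)
pricier_but_lean = run_cost(price_per_m_tokens=3.00, tokens_per_task=120_000, tasks=100)

print(f"lower $/Mtok model:  ${cheap_but_hungry:,.2f}")   # $50.00 total
print(f"higher $/Mtok model: ${pricier_but_lean:,.2f}")   # $36.00 total
```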
Our goal in creating Context-Bench is to construct a benchmark that (1) is contamination-proof, (2) measures "deep" multi-turn tool calling, and (3) has controllable difficulty. In its present state, the benchmark is far from saturated - the top model (Sonnet 4.5) scores 74%.
Agentic context engineering is the new frontier in AI agent capabilities. Models that are post-trained specifically for context engineering excel at long-horizon tasks where the task length far exceeds the native context window of the LLMs themselves. So which models do it best?
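As a simplified picture of what context engineering involves when a task outlives the context window: something has to decide what stays in the window verbatim and what gets folded into a summary. The sketch below is a generic illustration, not how any particular model or the Letta runtime handles it.

```python
# Toy context-engineering loop: keep recent turns verbatim and fold older turns
# into a running summary, so a task can run far past the raw context window.
# `summarize` is a hypothetical LLM call; the token estimate is a crude heuristic.
from collections import deque


def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)


class WorkingContext:
    def __init__(self, summarize, window_tokens: int = 6_000, keep_recent: int = 6):
        self.summarize = summarize          # (old_summary, evicted_turn) -> new summary
        self.window_tokens = window_tokens
        self.keep_recent = keep_recent
        self.summary = ""
        self.turns = deque()                # recent turns kept verbatim

    def add_turn(self, turn: str) -> None:
        self.turns.append(turn)
        # Evict oldest turns into the summary until the window fits again.
        while (approx_tokens(self.summary) + sum(map(approx_tokens, self.turns))
               > self.window_tokens and len(self.turns) > self.keep_recent):
            evicted = self.turns.popleft()
            self.summary = self.summarize(self.summary, evicted)

    def render(self) -> str:
        parts = [f"[summary of earlier work]\n{self.summary}"] if self.summary else []
        return "\n\n".join(parts + list(self.turns))
```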
Today we're releasing Context-Bench, an open benchmark for agentic context engineering. Context-Bench evaluates how well language models can chain file operations, trace entity relationships, and manage long-horizon multi-step tool calling.
We are changing the format of our weekly office hours. Now you'll be able to join us on YouTube. 11:30am PST on Thursdays, every week. Link below.
Here's our team checking out the new office. We're looking forward to hosting you all for our meetups -- we finally have room!
We're hiring researchers & engineers at @Letta_AI to work on AI's hardest problem: memory. Join us to work on finding the right memory representations & learning methods (both in-context and in-weights) required to create self-improving AI systems with LLMs. We're an open AI …
jobs.ashbyhq.com
Research Engineer / Research Scientist at Letta
Super excited about this release: Letta Evals is the first evals platform *purpose-built* for stateful agents. What does that actually mean? When you eval agents w/ Letta Evals, you can literally pull an agent out of production (by cloning a replica of its active state), …
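A rough sketch of what "evaluate a replica of a production agent" could look like. The export_state, import_state, delete_agent, and run_suite callables are placeholders for whatever snapshot and eval hooks your setup exposes; they are not a documented Letta Evals API.

```python
# Hypothetical clone-then-eval flow for a stateful agent: snapshot the live
# agent's state, rebuild a disposable replica from it, and run the eval suite
# against the replica so production state is never touched.
def eval_production_agent(export_state, import_state, delete_agent, run_suite,
                          agent_id: str) -> dict:
    """All four callables are placeholders for your own server / harness hooks."""
    snapshot = export_state(agent_id)     # memory blocks, message history, tool config...
    replica_id = import_state(snapshot)   # fresh agent seeded with the cloned state
    try:
        return run_suite(replica_id)      # e.g. Q&A, rubric grading, memory checks
    finally:
        delete_agent(replica_id)          # throw the replica away afterwards


# report = eval_production_agent(my_export, my_import, my_delete, my_suite,
#                                agent_id="agent-123")
```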