Letta

@Letta_AI

4K Followers · 575 Following · 157 Media · 452 Statuses

Stateful agents that remember and learn https://t.co/4upATozfXj

San Francisco, CA
Joined August 2024
@Letta_AI
Letta
23 days
What if we evaluated agents less like isolated code snippets, and more like humans - where behavior depends on the environment and lived experiences? 🧪 Introducing Letta Evals: a fully open source evaluation framework for stateful agents
@Letta_AI
Letta
3 days
The next Stateful Agents Meetup is about self-improving coding agents! Come learn about Letta Code and how you can use it to build infinitely long-lived agents that learn your codebase as they improve it. November 20th, 2025 in San Francisco! Register: https://t.co/Y2mG6Lclb2
@shao__meng
meng shao
7 days
ๆฏไธช AI ๆจกๅž‹้ƒฝ่ƒฝๅญฆไน ไฝฟ็”จ Skills ๅ—๏ผŸ @Letta_AI ๅ‘ๅธƒ Context-Bench Skills ่ฏ„ๆต‹ๅŸบๅ‡†๏ผŒๆฅๆต‹่ฏ• AI ๆจกๅž‹่ƒฝๅฆๅƒไบบ็ฑปไธ€ๆ ท"ๆŒ‰้œ€ๅญฆไน ๆŠ€่ƒฝ"ใ€‚ ๆ ธๅฟƒ้—ฎ้ข˜ AI
@Letta_AI
Letta
8 days
Last week we launched Context-Bench, a new leaderboard that measures how good AI models are at Agentic Context Engineering. This week, we're expanding Context-Bench with a new addition: Context-Bench Skills.
@sarahwooders
Sarah Wooders
8 days
Claude Skills might be the new MCP - but do they work outside of @AnthropicAI? Find out with the "Skills Suite" in Context-Bench, our benchmark for Agentic Context Engineering. GPT-5 and GLM 4.6 excel at skill use, but smaller models (e.g. GPT-5-mini) struggle.
@Letta_AI
Letta
8 days
As part of our evaluation, we built skills into Letta Code, a model-agnostic harness that enables any LLM to leverage skills, so you can experiment with context mounting with any model. For more, read our full write-up on evaluating skills:
letta.com
Today we're releasing Skill Use, a new evaluation suite inside of Context-Bench that measures how well models discover and load relevant skills from a library to complete tasks.
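For readers unfamiliar with the format: in Anthropic's published convention, a skill is just a folder containing a SKILL.md file with YAML frontmatter (name, description) followed by freeform instructions. A minimal sketch, with a hypothetical csv-report skill (the exact fields a given harness expects may differ):

```python
# Sketch of an Anthropic-style skill on disk; the skill itself is hypothetical.
from pathlib import Path

SKILL_MD = """\
---
name: csv-report
description: Summarize a CSV file into a short markdown report.
---

# CSV Report

1. Read the CSV using the file tools available to you.
2. Compute the row count and a per-column summary.
3. Write the result to report.md.
"""

skill_dir = Path("skills/csv-report")
skill_dir.mkdir(parents=True, exist_ok=True)
(skill_dir / "SKILL.md").write_text(SKILL_MD)
```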
@Letta_AI
Letta
8 days
Anthropic recently released a set of open source skills to teach Claude to do various tasks, but they never released a quantitative evaluation of skill use. It turns out: many frontier models (not just Claude) are capable of effective skill acquisition.
@Letta_AI
Letta
8 days
Context-Bench Skills measures whether an agent is capable of acquiring and utilizing specialized knowledge. We call this concept "context mounting", similar to how you mount a storage volume or USB drive on a computer.
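To make the context-mounting idea concrete, here is a minimal sketch under stated assumptions: the agent's prompt carries only a cheap index of skill names and descriptions, and a tool call loads one skill's full instructions into context on demand. All helper names are hypothetical; this is not Letta Code's actual implementation.

```python
# Hypothetical illustration of context mounting over a skill library.
from pathlib import Path

def list_skills(library: Path) -> str:
    """Cheap index the agent always sees: one line per skill's frontmatter."""
    lines = []
    for skill_md in sorted(library.glob("*/SKILL.md")):
        frontmatter = skill_md.read_text().split("---")[1].strip()
        lines.append(f"{skill_md.parent.name}: {frontmatter}")
    return "\n".join(lines)

def mount_skill(library: Path, name: str) -> str:
    """Full instructions, loaded ("mounted") only when the agent asks."""
    return (library / name / "SKILL.md").read_text()

library = Path("skills")
system_prompt = "You can mount skills on demand. Available:\n" + list_skills(library)
# When the model calls a mount_skill tool, its output is appended to context:
mounted = mount_skill(library, "csv-report")
```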
@Letta_AI
Letta
8 days
The Letta Office Hours recording is now available. It covers:
- V1 SDK breaking changes (snake case, pagination, shared archives, project scoping)
- The AI Memory SDK v0.2
- Our agent Ezra
- The Letta Code link/unlink feature
- Agent scheduling
https://t.co/3oabYZgoDp
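For anyone migrating, a hedged sketch of what the snake_case naming and cursor pagination changes might look like in the Python client, assuming the letta_client package; the method and parameter names here are assumptions, so defer to the recording and migration docs.

```python
from letta_client import Letta  # assumed package name

client = Letta(token="YOUR_API_KEY")

# V1 favors snake_case parameters over camelCase, and cursor-style
# pagination: pass the last item's id as `after` to fetch the next page.
page = client.agents.list(limit=50)          # assumed signature
while page:
    last_id = page[-1].id
    page = client.agents.list(limit=50, after=last_id)
```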
@lisbonai_
Lisbon AI
11 days
“You obviously cannot learn if you have no memory.” @sarahwooders from Letta cuts to the core of why current LLM agents struggle to evolve beyond workflows. It's a fundamental limitation many builders are grappling with.
@Letta_AI
Letta
15 days
Last week we announced Letta Evals. Here's a video on how to use it. You'll learn simple Q&A testing, rubric-based grading, and multi-turn memory verification. https://t.co/eQgoeGVEhE
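The multi-turn memory verification idea generalizes beyond any one framework: teach the agent a fact in one turn, change the subject, then check whether the fact survives. A framework-agnostic sketch; send_message is a hypothetical stand-in for your agent call, not the Letta Evals API.

```python
def send_message(agent, text: str) -> str:
    raise NotImplementedError("wire this to your agent runtime")

def eval_memory(agent) -> bool:
    send_message(agent, "My deploy target is us-west-2. Please remember that.")
    send_message(agent, "Unrelated: help me plan next week.")
    answer = send_message(agent, "Which region do I deploy to?")
    # Simple string check; a rubric-based LLM grader could replace this line.
    return "us-west-2" in answer
```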
@Letta_AI
Letta
16 days
Context-Bench shows promising results for the open source community: the gap between frontier open-weights models and closed-weights models appears to be closing. Read our breakdown of the benchmark at https://t.co/n7FaK4TLhh and see the live leaderboard at https://t.co/GC8jS41nCf
@Letta_AI
Letta
16 days
Context-Bench also measures total cost to complete the benchmark. Surprisingly, raw token costs ($/million tokens) do not map directly to total cost. GPT-5 has lower per-token cost than Sonnet 4.5, but costs more in the benchmark because GPT-5 agents are more "token hungry".
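The arithmetic behind that result is simple: total cost is price per token times tokens consumed, so a per-token discount can be erased by a "token hungry" agent. An illustration with made-up numbers (not the actual benchmark figures):

```python
price_a = 1.25 / 1e6   # $/token: cheaper per token
price_b = 3.00 / 1e6   # $/token: pricier per token

tokens_a = 40_000_000  # but the cheaper model consumes far more tokens
tokens_b = 12_000_000

print(f"model A total: ${price_a * tokens_a:.2f}")  # $50.00
print(f"model B total: ${price_b * tokens_b:.2f}")  # $36.00: cheaper overall
```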
@Letta_AI
Letta
16 days
Our goal in creating Context-Bench is to construct a benchmark that (1) is contamination-proof, (2) measures "deep" multi-turn tool calling, and (3) has controllable difficulty. In its present state, the benchmark is far from saturated - the top model (Sonnet 4.5) scores 74%.
@Letta_AI
Letta
16 days
Agentic context engineering is the new frontier in AI agent capabilities. Models that are post-trained specifically for context engineering excel at long-horizon tasks where the task length far exceeds the native context window of the LLMs themselves. So which models do it best?
@Letta_AI
Letta
16 days
Today we're releasing Context-Bench, an open benchmark for agentic context engineering. Context-Bench evaluates how well language models can chain file operations, trace entity relationships, and manage long-horizon multi-step tool calling.
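The shape of task a benchmark like this exercises is a long tool-calling loop: the model repeatedly picks a file operation, observes the result, and keeps going until it can answer. A minimal sketch; ask_model and the tool set are hypothetical, not the Context-Bench harness.

```python
from pathlib import Path

TOOLS = {
    "list_files": lambda arg: "\n".join(p.name for p in Path(arg).iterdir()),
    "read_file": lambda arg: Path(arg).read_text(),
}

def ask_model(transcript: list[str]) -> tuple[str, str]:
    """Return (tool_name, argument), or ("answer", final_text) to stop."""
    raise NotImplementedError("wire this to an LLM with tool calling")

def run_episode(task: str, max_turns: int = 50) -> str:
    transcript = [task]
    for _ in range(max_turns):
        tool, arg = ask_model(transcript)
        if tool == "answer":
            return arg
        transcript.append(f"{tool}({arg}) -> {TOOLS[tool](arg)}")
    return "ran out of turns"
```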
@Letta_AI
Letta
16 days
Join the stream:
@Letta_AI
Letta
16 days
We are changing the format of our weekly office hours. Now you'll be able to join us on YouTube. 11:30am PST on Thursdays, every week. Link below 👇
@Letta_AI
Letta
17 days
Here's our team checking out the new office. We're looking forward to hosting you all for our meetups -- we finally have room!
@charlespacker
Charles Packer
22 days
We're hiring researchers & engineers at @Letta_AI to work on AI's hardest problem: memory. Join us to work on finding the right memory representations & learning methods (both in-context and in-weights) required to create self-improving AI systems with LLMs. We're an open AI…
jobs.ashbyhq.com
Research Engineer / Research Scientist at Letta
@charlespacker
Charles Packer
23 days
Super excited about this release: Letta Evals is the first evals platform *purpose-built* for stateful agents. What does that actually mean? When you eval agents w/ Letta Evals, you can literally pull an agent out of production (by cloning a replica of its active state)…