Peter Albert
@peter_albert_
Followers
139
Following
152
Media
10
Statuses
43
Enabling LLMs to take action in the digital world @ZetaLabsAI. Previously worked on Llama models @MetaAI
San Francisco
Joined February 2014
Didn’t expect to say this so soon, but o4-mini just dramatically surpassed Claude Sonnet in real-world agentic performance. Our toughest email-agent benchmark tasks (high information load, large number of constraints, complex situations) are finally solved. Quite insane.
3
2
21
Just tested GPT-4.1 on our internal email-agent benchmark: slight improvement over GPT-4o, but only after prompt adaptations. Claude Sonnet 3.7 is still far ahead.
1
1
7
This is the prompt if you want to reproduce this for your account:
0
0
0
The summaries seem like they were created offline by smaller models and appear to span across chats. This could mean they first group chats by time or topic and then summarize across their boundaries.
1
0
0
Moonshine (ChatGPTs new memory feature) is quite interesting. Under the hood it provides the models both with an assortment of random interaction data, as well as short summaries (generated offline) of relevant previous chat conversations.
1
1
4
Here is the list of tools and their definitions of OpenAI Deep Research: - search (search query to bing) - open (opens up link ids returned from the search and line ranges) - find (regex over a page) - python (similiar to a jupyter notebook)
1
0
4
Had a look today again at claude code, to check out its agent design, tools and system prompts: when asked for its list of tools it provides: 1. dispatch_agent - Launches an agent with search tools - prompt (required): Task description 2. Bash - Executes bash commands
0
1
4
Regular markdown rules don't align well with text produced by LLMs, causing significant formatting loss—particularly whitespace and newlines—when viewed in ChatGPT. This becomes even more of an issue with workflows like Canvas. We probably need an "LLM-flavored markdown" spec
0
1
2
Good news for @AnthropicAI devs: We shipped a more token-efficient tool use implementation for 3.7 Sonnet that uses on average 14% less tokens under-the-hood and shows marked improvement in tool use performance. Use this beta header: "token-efficient-tools-2025-02-19"
96
74
2K
The one thing holding back MCP from being used by every agent startup is proper OAuth support
0
0
3
Just found a way to use full o3 (not mini) for coding: If you submit a deep research task, it will use the large o3 under the hood. So just paste in your files and provide a detailed prompt, then tell the "manager" (4o) to start a deep research task and pass on as much context
1
2
10
Introducing Jace. Your AI Email Agent. Who we are? Engineers from Meta AI, Google, Amazon, and Tesla. Serial founders. Backed by Tier 1 investors including Nat Friedman and Daniel Gross. What Jace does? Uses tools and past emails to draft responses and schedule calendar
6
9
27
Thrilled to finally launch Jace! Our autonomous email agent uses LLMs + tools (email search, calendar, web, editing) to handle your inbox, craft replies and schedule meetings. We discovered something interesting: users of our earlier web agent liked its email features most. So
Today, we’re introducing Jace, your AI Email Agent. Jace uses your past responses, checks your calendar, and pulls in context from attachments or the web to draft replies in your voice and schedule your meetings. Imagine an executive assistant that knows exactly what to reply,
0
0
5
The table in this paper from Nvidia references scores from the Vision version of Llama 3 (405B). Seems like it will be released soon!
Introducing NVLM 1.0, a family of frontier-class multimodal LLMs that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., InternVL 2). Remarkably, NVLM 1.0 shows improved text-only
0
2
5
Significant breakthrough in AI web autonomy: Our AWA 1.5 system has achieved a score of 57.14% on the WebArena benchmark, substantially surpassing the previous state-of-the-art of 35.8%. This marks a notable step towards human-level performance (78%). Details below 🧵
4
10
50
We're hiring senior frontend engineers and product designers. Details below. At @ZetaLabsAI we are pioneering the autonomous web agent space with our flagship agent, Jace. Founded by ex-engineers from Google, Meta, Amazon, and Tesla, we have built a state-of-the-art action
Today we're thrilled to introduce Jace, your AI employee. Jace goes beyond AI chatbots by being able to handle longer-running tasks and taking actions in the digital world. By using our new AWA-1 (Autonomous Web Agent) model, Jace can use a browser to interact with websites
1
3
7
I'm really excited to share what we worked on in the last few months. We built AWA-1, a web agent model that is able to use a browser similar to a human, and that is able to act over long horizons of actions (100s).
Today we're thrilled to introduce Jace, your AI employee. Jace goes beyond AI chatbots by being able to handle longer-running tasks and taking actions in the digital world. By using our new AWA-1 (Autonomous Web Agent) model, Jace can use a browser to interact with websites
2
2
8
Today we're thrilled to introduce Jace, your AI employee. Jace goes beyond AI chatbots by being able to handle longer-running tasks and taking actions in the digital world. By using our new AWA-1 (Autonomous Web Agent) model, Jace can use a browser to interact with websites
41
112
526