Peter Albert @peter_albert_ X Profile

Peter Albert

@peter_albert_

Followers

139

Following

152

Media

10

Statuses

43

Enabling LLMs to take action in the digital world @ZetaLabsAI. Previously worked on Llama models @MetaAI

https://t.co/ALW2fAVivX

San Francisco

Joined February 2014

Don't wanna be here? Send us removal request.

Peter Albert

@peter_albert_

8 months

Didn’t expect to say this so soon, but o4-mini just dramatically surpassed Claude Sonnet in real-world agentic performance. Our toughest email-agent benchmark tasks (high information load, large number of constraints, complex situations) are finally solved. Quite insane.

3

2

21

Peter Albert

@peter_albert_

8 months

Just tested GPT-4.1 on our internal email-agent benchmark: slight improvement over GPT-4o, but only after prompt adaptations. Claude Sonnet 3.7 is still far ahead.

1

7

Peter Albert

@peter_albert_

9 months

This is the prompt if you want to reproduce this for your account:

0

Peter Albert

@peter_albert_

9 months

The summaries seem like they were created offline by smaller models and appear to span across chats. This could mean they first group chats by time or topic and then summarize across their boundaries.

1

0

Peter Albert

@peter_albert_

9 months

Moonshine (ChatGPTs new memory feature) is quite interesting. Under the hood it provides the models both with an assortment of random interaction data, as well as short summaries (generated offline) of relevant previous chat conversations.

1

4

Peter Albert

@peter_albert_

9 months

Full text: https://t.co/Dp27oO4mXj Source:

pastebin.com

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

0

Peter Albert

@peter_albert_

9 months

Here is the list of tools and their definitions of OpenAI Deep Research: - search (search query to bing) - open (opens up link ids returned from the search and line ranges) - find (regex over a page) - python (similiar to a jupyter notebook)

1

0

4

Peter Albert

@peter_albert_

9 months

Had a look today again at claude code, to check out its agent design, tools and system prompts: when asked for its list of tools it provides: 1. dispatch_agent - Launches an agent with search tools - prompt (required): Task description 2. Bash - Executes bash commands

0

1

4

Peter Albert

@peter_albert_

10 months

Regular markdown rules don't align well with text produced by LLMs, causing significant formatting loss—particularly whitespace and newlines—when viewed in ChatGPT. This becomes even more of an issue with workflows like Canvas. We probably need an "LLM-flavored markdown" spec

0

1

2

Alex Albert

@alexalbert__

10 months

Good news for @AnthropicAI devs: We shipped a more token-efficient tool use implementation for 3.7 Sonnet that uses on average 14% less tokens under-the-hood and shows marked improvement in tool use performance. Use this beta header: "token-efficient-tools-2025-02-19"

96

74

2K

Peter Albert

@peter_albert_

10 months

The one thing holding back MCP from being used by every agent startup is proper OAuth support

0

3

Peter Albert

@peter_albert_

11 months

Just found a way to use full o3 (not mini) for coding: If you submit a deep research task, it will use the large o3 under the hood. So just paste in your files and provide a detailed prompt, then tell the "manager" (4o) to start a deep research task and pass on as much context

1

2

10

FW

@fawiatrowski

11 months

Introducing Jace. Your AI Email Agent. Who we are? Engineers from Meta AI, Google, Amazon, and Tesla. Serial founders. Backed by Tier 1 investors including Nat Friedman and Daniel Gross. What Jace does? Uses tools and past emails to draft responses and schedule calendar

6

9

27

Peter Albert

@peter_albert_

11 months

Thrilled to finally launch Jace! Our autonomous email agent uses LLMs + tools (email search, calendar, web, editing) to handle your inbox, craft replies and schedule meetings. We discovered something interesting: users of our earlier web agent liked its email features most. So

Jace AI

@jace_ai

11 months

Today, we’re introducing Jace, your AI Email Agent. Jace uses your past responses, checks your calendar, and pulls in context from attachments or the web to draft replies in your voice and schedule your meetings. Imagine an executive assistant that knows exactly what to reply,

0

5

Peter Albert

@peter_albert_

1 year

The table in this paper from Nvidia references scores from the Vision version of Llama 3 (405B). Seems like it will be released soon!

Wei Ping

@_weiping

1 year

Introducing NVLM 1.0, a family of frontier-class multimodal LLMs that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., InternVL 2). Remarkably, NVLM 1.0 shows improved text-only

0

2

5

Jace AI

@jace_ai

1 year

Significant breakthrough in AI web autonomy: Our AWA 1.5 system has achieved a score of 57.14% on the WebArena benchmark, substantially surpassing the previous state-of-the-art of 35.8%. This marks a notable step towards human-level performance (78%). Details below 🧵

4

10

50

FW

@fawiatrowski

1 year

We're hiring senior frontend engineers and product designers. Details below. At @ZetaLabsAI we are pioneering the autonomous web agent space with our flagship agent, Jace. Founded by ex-engineers from Google, Meta, Amazon, and Tesla, we have built a state-of-the-art action

Jace AI

@jace_ai

2 years

Today we're thrilled to introduce Jace, your AI employee. Jace goes beyond AI chatbots by being able to handle longer-running tasks and taking actions in the digital world. By using our new AWA-1 (Autonomous Web Agent) model, Jace can use a browser to interact with websites

1

3

7

Peter Albert

@peter_albert_

2 years

I'm really excited to share what we worked on in the last few months. We built AWA-1, a web agent model that is able to use a browser similar to a human, and that is able to act over long horizons of actions (100s).

Jace AI

@jace_ai

2 years

Today we're thrilled to introduce Jace, your AI employee. Jace goes beyond AI chatbots by being able to handle longer-running tasks and taking actions in the digital world. By using our new AWA-1 (Autonomous Web Agent) model, Jace can use a browser to interact with websites

2

8

Jace AI

@jace_ai

2 years

Today we're thrilled to introduce Jace, your AI employee. Jace goes beyond AI chatbots by being able to handle longer-running tasks and taking actions in the digital world. By using our new AWA-1 (Autonomous Web Agent) model, Jace can use a browser to interact with websites

41

112

526