Peter Bakkum
@pbbakkum
Followers: 4K
Following: 5K
Media: 108
Statuses: 1K
API Capabilities @openai, for fun: https://t.co/FabgtjdaT1 — Previously internal ledger @stripe, platform @quizlet
San Francisco
Joined November 2011
A small PSA if you use the Codex CLI through Homebrew -- it changed from a formula to a cask so you have to uninstall and reinstall to get updates
2
1
12
A small audio model launch -- gpt-4o-transcribe-diarize. This is a diarization-focused ASR model. It's big and slow, so we recommend running it offline, but it excels at differentiating speakers, and you can provide voice samples for known speakers up front.
39
99
1K
Very good drone flyover of Stargate, gives a sense of the scale https://t.co/vg27LSWiMU
4
0
12
Applied AI problems like this are probably in a regime where the most useful data is examples of the model attempting the task rather than a huge data corpus; the bottleneck is trying (even if it’s bad) and RLing. The footprint of what the model can do will then grow steadily.
Not convinced LLMs will be good at systems architecture for quite some time. There are no good training sets, requires non-local reasoning (unexpected relationships between systems), and good design doesn't show up for several years of changes. Use a platform with good
0
0
12
Don't sleep on gpt-realtime-mini – a 70% cheaper version of the full-size model. This should make many voice workflows cheap enough to deploy widely. But the real surprise: our internal qualitative testing has it scoring higher than even gpt-realtime (!) for voice quality.
32
53
1K
Excited to announce gpt-realtime-mini, a small but capable speech2speech model that is ~70% cheaper than the big model. We think this will unlock new voice applications where the economics didn’t previously make sense.
15
7
114
I’ll be at OpenAI DevDay tomorrow, come find me at the multimodal booth if you want to talk audio models and Realtime API
27
12
357
After growing up in the 90s/00s I was well into my 30s before I clocked that they’re called Fuji Apples and not Fugee Apples
1
0
5
I made a voice agent that I can call. It has access to the GitHub MCP server. I can now create issues and ASSIGN THEM TO CLAUDE from a phone call
6
4
45
There are dozens of us. I find Dvorak significantly more comfortable, hard to say if I type faster. When I switched I unexpectedly destroyed my ability to type qwerty. It’s hard to recommend but if you’re interested in the exciting hobby of vim configuration it might be for you.
just found out the QWERTY layout was designed to slow us down to prevent mechanical jams in typewriters.
> Dvorak was then created to be 100x more efficient but nobody uses it lol.
0
0
4
With apologies to other parents and children, my 2 yo daughter is the cutest to ever do it. The COAT.
0
0
4
Updated the Realtime API docs with several things that were under-specified in the recent API update.
- simpler "unified" WebRTC API (with samples): https://t.co/tAWyk8VYvh
- enhanced SIP docs: https://t.co/aLf6UTAqWm
- deltas between GA and beta:
platform.openai.com
5
6
39
A very useful new feature in the Realtime API -- idle timeouts. If we don't hear from the user for a while the model will ask something like "are you still there?". Useful for phone call applications.
2
0
3
Full post here --
developers.openai.com
Details worth noticing in recent realtime speech-to-speech updates
0
0
3