Josh McGrath Profile
Josh McGrath

@j_mcgraph

Followers: 2K
Following: 17K
Media: 105
Statuses: 2K

Running horrific bash commands, staring at plots @openai

San Francisco, CA
Joined November 2012
@j_mcgraph
Josh McGrath
1 day
RT @yanndubs: 🔥 So excited to share GPT-5! For thinking mode and API models, we’ve improved performance across key: - Axes: factuality, st…
@j_mcgraph
Josh McGrath
2 days
Pareto dominance has been achieved externally
[image]
@j_mcgraph
Josh McGrath
2 days
This is real and cope at the same time btw.
@j_mcgraph
Josh McGrath
2 days
One interesting thing about the LLM era is that we’ve gone from a few model releases a year to several every quarter. This does weird things to people’s “number go up” expectations.
@j_mcgraph
Josh McGrath
3 days
It has vim keybindings, we're so back.
@cursor_ai
Cursor
3 days
Cursor is now in your terminal! It’s an early beta. Access all models. Move easily between your CLI and editor.
[image]
@j_mcgraph
Josh McGrath
3 days
[image]
@j_mcgraph
Josh McGrath
3 days
RT @sama: not just good at software, good at agentic tasks across the board. also great at long context performance
[2 images]
@j_mcgraph
Josh McGrath
3 days
we only made so many chart crimes because we let 4o make the charts, duh.
@j_mcgraph
Josh McGrath
3 days
RT @jxmnop: most impressive part of GPT-5 is the jump in long-context. how do you even do this? produce some strange long range synthetic…
@j_mcgraph
Josh McGrath
3 days
RT @julieswangg: GPT-5 is here! it's been a wild ride, and i've been honored to work with the dream team (s/o @j_mcgraph @strongduality) to…
@j_mcgraph
Josh McGrath
3 days
RT @SuvanshSanjeev: GPT-5 is what you’ve been waiting for – it defines and extends the cost-intelligence frontier across model sizes today.…
@j_mcgraph
Josh McGrath
3 days
We've open-sourced the input links, questions, and answers that let you construct this same eval at different context lengths, up to ~1M tokens. We hope you enjoy it!
[link card: huggingface.co]
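The tweet above says the released links, questions, and answers let you rebuild the eval at different context lengths. As a rough illustration only, here is a minimal Python sketch of that assembly step. Everything in it is an assumption rather than the released tooling: `build_context` and `build_at_lengths` are hypothetical names, token counts are approximated by whitespace splitting, and distractor pages are assumed to be already ranked hardest-first.

```python
import random


def count_tokens(text: str) -> int:
    # Crude proxy for token count (assumption); a real pipeline would use the model's tokenizer.
    return len(text.split())


def build_context(gold: list[str], ranked_distractors: list[str], budget: int, seed: int = 0) -> list[str]:
    """Hypothetical helper: gold pages plus the hardest distractors that fit in `budget` tokens."""
    docs = list(gold)  # gold pages are always included, even if they alone exceed the budget
    used = sum(count_tokens(d) for d in docs)
    for page in ranked_distractors:
        cost = count_tokens(page)
        if used + cost > budget:
            # Stop at the first page that does not fit, so each shorter context's
            # distractors are a prefix of every longer context's distractors.
            break
        docs.append(page)
        used += cost
    # Shuffle so the gold page's position gives nothing away; seeded for reproducibility.
    random.Random(seed).shuffle(docs)
    return docs


def build_at_lengths(gold: list[str], ranked_distractors: list[str], budgets: list[int], seed: int = 0) -> dict[int, list[str]]:
    """Hypothetical helper: one context per target length, keyed by token budget."""
    return {b: build_context(gold, ranked_distractors, b, seed) for b in budgets}


# Toy usage with made-up pages (illustrative data, not from the released dataset).
gold = ["gold " * 10]
distractors = ["d1 " * 20, "d2 " * 20, "d3 " * 20]
contexts = build_at_lengths(gold, distractors, [30, 50, 100])
for budget, docs in contexts.items():
    print(budget, len(docs), sum(count_tokens(d) for d in docs))
```

Stopping at the first distractor that does not fit, rather than skipping it and trying smaller pages, keeps the shorter contexts nested inside the longer ones, which arguably makes scores at different lengths easier to compare.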
@j_mcgraph
Josh McGrath
3 days
Luckily, we already have the OpenAI BrowseComp eval. It's a great QA dataset with meticulously human-graded examples. We turn it into a long-context eval by applying hard negative mining to the questions, finding difficult distractors to stuff into the context.
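Hard negative mining, in the sense used above, means picking distractor pages that look relevant to a question without answering it. A minimal Python sketch of the idea follows; it is not the BrowseComp Long Context pipeline. It assumes a bag-of-words cosine score as a stand-in retriever and a simple "page contains the answer string" leak filter, and `mine_hard_negatives` is a hypothetical name.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric word tokens; a stand-in for a real retriever's tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mine_hard_negatives(question: str, answer: str, candidates: list[str], k: int) -> list[str]:
    """Hypothetical helper: the k pages most similar to the question that do not contain the answer."""
    q_vec = Counter(tokenize(question))
    scored = []
    for i, page in enumerate(candidates):
        if answer.lower() in page.lower():
            continue  # skip pages that would leak the gold answer
        scored.append((cosine(q_vec, Counter(tokenize(page))), i, page))
    # Most similar (hardest) first; the original index breaks ties deterministically.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [page for _, _, page in scored[:k]]


# Toy usage with made-up pages (illustrative data only).
question = "Which bridge opened in 1937 in San Francisco?"
pages = [
    "The Bay Bridge opened in 1936 in San Francisco.",
    "The Golden Gate Bridge opened in 1937.",
    "A recipe for sourdough bread.",
    "San Francisco bridge history and opening dates.",
]
hard_negatives = mine_hard_negatives(question, "Golden Gate", pages, k=2)
print(hard_negatives)
```

The near-miss Bay Bridge page ranks above the generic history page, while the page containing the answer is filtered out. A production version would more plausibly use a dense or BM25 retriever and stronger leak checks.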
@j_mcgraph
Josh McGrath
3 days
Additionally, these IR datasets have a fair amount of label noise! Manually inspecting the data was a frustrating experience when trying to improve our own models.
@j_mcgraph
Josh McGrath
3 days
Existing long-context QA datasets are mostly compiled from old IR datasets. This means you're filling a prompt with lots of random webpages rather than a corpus focused on the question, which makes it much easier for the model to ignore a large share of the context.
@j_mcgraph
Josh McGrath
3 days
Along with GPT-5, we're open-sourcing a new eval, BrowseComp Long Context! It improves on existing long-context QA evals in data quality and input difficulty. Work with @LK112358, @julieswangg, and our mascot, the longham. A bit more below.
[image]
@j_mcgraph
Josh McGrath
3 days
We are technologically on track for a spectacular century; politics, on the other hand…
@Acyn
Acyn
5 days
RFK JR: After reviewing the science… HHS has determined that mRNA technology poses more risk than benefits for these respiratory viruses. That’s why BARDA has begun the process of terminating these 22 contracts.
@j_mcgraph
Josh McGrath
4 days
RT @aidan_mclau: it's pronounced. gee pea toss. enjoy.
@j_mcgraph
Josh McGrath
4 days
everyone's into numerology or something today, talking about 5s. anyway, it has me thinking of one of the coolest cars of all time, which came with a flat 5 for some reason
[image]
@j_mcgraph
Josh McGrath
5 days
RT @AdrienLE: Gotta hand it to Anthropic, they got to that number more smoothly than we did. (but also check out gpt-oss!)