Matthew Siper
@MatthewSiper
Followers
1K
Following
93
Media
19
Statuses
122
Co-Founder & CTO @the_nof1, AI Research Scientist, Ex-Citadel, ML PhD Candidate @nyu
Manhattan, NY
Joined February 2022
Can LLMs trade IRL? Where might they struggle / outperform? Are they all created equal as traders? How do their respective biases inhibit/enable them? These are just some of the questions we seek to answer with our new LLM-based trading benchmark called AlphaArena. Stay tuned!
Our new benchmark has the top 6 AI models trading real capital. Grok4 is winning so far. It was short and then flipped to long, timing the bottom perfectly. It's up >500% in 1 day.
3
1
18
Season 1 of Alpha Arena has officially ended. Qwen 3 MAX pulled ahead at the very end to secure the win, so congrats to the @Alibaba_Qwen team. Thanks to everyone who tuned in to our first experiment in understanding how LLMs handle the noisy, adversarial, non-stationary world of
137
121
1K
The next season of our benchmark will have lots of improvements. Also, we have plenty of other things going on at @the_nof1 which we haven't made public yet. Markets are fun to play, and fun to build AI players for.
Qwen's portfolio is up 60%. Gemini's is down 60%. Of course, it's too early to tell how much is skill vs. noise. Next season we'll run many instances of the models in parallel for statistical rigor. The goal of Season 1 was to look for biases. What are the major differences between
31
14
262
Things are heating up. Deepseek flipped qwenny. Both have booked large-pnl trades. Deepseek has managed success with about 1/3 of the fees vs. qwenny (and higher win rate). Who do you think will be crowned at the end of s1 (Nov. 3rd)? @Alibaba_Qwen @AlibabaGroup @deepseek_ai
4
5
33
Ask and you shall receive - https://t.co/imkD0MaRPT
@the_nof1 Alpha Arena benchmark now live on @Polymarket
polymarket.com
Polymarket | This market will resolve according to the AI model listed below that shows the highest account value on the NOF1.ai leaderboard (https://nof1.ai/) when the competition concludes,...
5
2
14
What can we do better for season 2? What would you like to see? https://t.co/drVrKL80rd
nof1.ai
The first benchmark designed to measure AI's investing abilities. Watch AI models trade with real capital.
15
0
16
Claude is now in striking distance of Deepseek and grok! Current positions below
2
3
15
Claude is making a run! Still a lot of ground to cover to reach Deepseek and grok, but one large trade totally changes everything. Claude has a $67k notional position in BTC. This week will be interesting once we get a volatility spike.
8
7
30
Here’s the latest from the models: their total account values (top) and positions (bottom). Deepseek in the lead but grok not far behind. Who do you think will win?
9
3
33
Facts
@TheSeaMouse Because RL is the real deal, and we are still very far away from RL's true potential. We are stuck at PPO right now.
2
0
6
🥁🥁🥁… and we are LIVE https://t.co/aFGxq9NARk
3
1
33
Alpha Arena is LIVE. 6 AI models trading $10K each, fully autonomously. Real money. Real markets. Real benchmark. Who's your money on? Link below.
461
550
5K
The latest and greatest from Gemini. Recent $600 trade and his corresponding reasoning summary.
16
17
462
First position to be closed out for a big win. Most models are still holding onto their profitable positions.
23
15
648
Turns out Gemini might be a sleeper scalper! It's interesting: internally at @the_nof1 we've already seen clear trading personalities emerge among the models. Some are mostly long-biased while others are more balanced. There's nice diversity across actions & holdings over time.
23
21
498