
ʟᴇɢɪᴛ
@legit_api
Followers: 10K
Following: 33K
Media: 938
Statuses: 3K
api guy with little-to-no wisdom, 0.5x engineer
⍨⧍⍨
Joined October 2018
Made by Veo 3. "A squirrel attempts to launch a homemade rocket into space"
16
34
429
RT @apples_jimmy: Grok 4: Still no wall. 50.7% with Grok 4 Heavy on Humanity’s Last Exam. 41% with tools. 26.9% without tools. " Grok 4….
0
211
0
If they use “Test Time Compute” to refer to the cons@n metric, then Standard is likely the public Grok 4 reasoning model for us. The other one might measure, e.g., consensus from n attempts, which takes the most frequent answer and usually improves the score. Focus on Standard.
@HCSolakoglu @legit_api They previously used the same term to refer to cons@n. The standard is most likely what will be the publicly available reasoning model, and TTC is cons@32 or cons@64. As long as they also report standard scores (looks like they are here) I think it’s fine tbh.
2
0
89
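For anyone unsure what cons@n means in the thread above: a minimal sketch, assuming each question gets n sampled answers and the most frequent one is graded against a reference. The function names `consensus_answer` and `cons_at_n` are hypothetical and this is not xAI's evaluation code.

```python
from collections import Counter


def consensus_answer(answers):
    """Return the most frequent answer among the sampled attempts."""
    if not answers:
        raise ValueError("need at least one attempt")
    # most_common(1) yields the top (answer, count) pair;
    # ties go to the answer that appeared first.
    return Counter(answers).most_common(1)[0][0]


def cons_at_n(attempts_per_question, references):
    """Fraction of questions whose consensus answer matches the reference."""
    if len(attempts_per_question) != len(references):
        raise ValueError("one reference answer is required per question")
    correct = sum(
        consensus_answer(attempts) == reference
        for attempts, reference in zip(attempts_per_question, references)
    )
    return correct / len(references)


# Two questions, three attempts each (n = 3), made-up answers.
attempts = [["A", "B", "A"], ["C", "D", "D"]]
references = ["A", "C"]
score = cons_at_n(attempts, references)
```

A single-attempt ("Standard") score would grade one sample per question instead, which is why the consensus number usually comes out higher.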
RT @legit_api: Steve does not perform too great. general consensus in servers is that it’s a small DS model or a 3rd party distilled model….
0
2
0
Steve does not perform too well. The general consensus in servers is that it’s a small DS model or a third-party distilled model based on DS. The latter would explain why it might claim to be DeepSeek even if it is not. lmarena is hiding its origin:
Steve 🆚 DeepSeek V3 (0324) - Space Invaders. For this prompt DeepSeek V3 generated ≈800 lines of code, while Steve produced ≈300 lines. If Steve is a DeepSeek model, it might be a smaller model rather than V4. The naming scheme also suggests that the model is Chinese, as we
1
2
41