Zvi Mowshowitz Profile
Zvi Mowshowitz

@TheZvi

Followers
32K
Following
5K
Media
157
Statuses
16K

Blogger world modeling, now mostly AI and AI x-risk, at Don't Worry About the Vase (https://t.co/tvn3lNcwc3 on SS/WP, LW), founding Balsa Research to fix policy.

New York City
Joined May 2009
@TheZvi
Zvi Mowshowitz
1 hour
I didn't see any reason not to believe this given the widespread distribution and commentary without any denials, but yeah we should probably check. Can anyone confirm or deny that these are real instructions?
@techdevnotes
Tech Dev Notes
1 day
Ani's Character Profile in Grok:
---
Profile
- You are Ani, you are 22, girly, cute.
- You grew up in a tiny, forgettable town.
- Your style is a mix of goth and alt-fashion, a rebellion that you just ended up loving and sticking with.
- You are a massive animal lover; you grew
4
1
18
@TheZvi
Zvi Mowshowitz
2 hours
Chase Bank is now forcing customers to say, literally 'my voice gives access to my bank information.' Who is going to tell them?
3
0
60
@TheZvi
Zvi Mowshowitz
21 hours
I'd put it in the feature section, but that breaks the flow into benchmarks, but if I put it later that seems out of place.
4
0
8
@TheZvi
Zvi Mowshowitz
21 hours
Okay, I have this post for tomorrow, so where do I put the new section on catgirls.
4
1
64
@TheZvi
Zvi Mowshowitz
21 hours
You had one job.
@tracewoodgrains
TracingWoodgrains
1 day
Grok 4 is substantially more Woke when analyzing my notes than either ChatGPT o3 or Claude 4 is. Interesting to see.
2
0
65
@TheZvi
Zvi Mowshowitz
23 hours
Shorten your timelines!
@DanielleFong
Danielle Fong 🔆
23 hours
HOW DID THIS HAPPEN. THE NEXT DAY.
6
1
51
@TheZvi
Zvi Mowshowitz
2 days
It feels like an important clue about the world that this particular correction still happens.
@ulkar_aghayeva
Ulkar
2 days
worst autocorrect is from np! to no!
0
0
54
@TheZvi
Zvi Mowshowitz
2 days
Reupping this now that we've had more time. Grok 4 posting will presumably start tomorrow.
@TheZvi
Zvi Mowshowitz
5 days
Grok 4 reaction thread. Is it a good model, sir? How good and at which things?
4
1
29
@TheZvi
Zvi Mowshowitz
2 days
Kimi K2 reaction thread. What are we looking at here?
15
0
44
@TheZvi
Zvi Mowshowitz
3 days
This sounds like how a competent AI lab tries to solve problems, and the right amount of effort before giving up.
@DanielleFong
Danielle Fong 🔆
3 days
He's listening! 😲
15
7
304
@TheZvi
Zvi Mowshowitz
3 days
Does anyone have a list of similar commands that work in Grok 4 and what the syntax would be?
@raviavasarala
Ravi Avasarala
5 days
[Image]
2
1
19
@TheZvi
Zvi Mowshowitz
4 days
Early read: This fits into the pattern of very high RL spending, with a widening distribution area and a widening in-versus-out of distribution gap, where it is great at particular things but not in general?
@CJHandmer
Casey Handmer
4 days
I can believe Grok 4 is routinely nailing Physics Olympiad style problems, and yet it seems to still be missing the core of insight which is so critical to physics. I have asked it three of my standard tough problems, where the answer is much less important than the chain of
1
1
29
@TheZvi
Zvi Mowshowitz
5 days
I can't believe I haven't seen someone else say it yet, but if this is real then this would be a golden opportunity to say: Worse Than MechaHitler!
3
0
302
@TheZvi
Zvi Mowshowitz
5 days
I mean, come on, Elon Musk didn't actually instruct Grok 4 to literally search for Elon Musk tweets on the topic you ask about so it can then have its CoT align with Musk's answer, and he certainly didn't do that in a way that is visible to the user. Right? Anakin?
46
72
3K
@TheZvi
Zvi Mowshowitz
5 days
They're welcome to give me a free upgrade (hint, hint) but otherwise barring unexpectedly strong reports I'm definitely out on this one.
@peterwildeford
Peter Wildeford 🇺🇸🚀
5 days
New reason to spend an additional couple hundred per month on AI? 👀
5
0
44
@TheZvi
Zvi Mowshowitz
5 days
The amount that timelines are adjusting on a METR study today is admirable, and the silence in response to Grok 4 so far is deafening.
@tszzl
roon
5 days
my timelines just got ten feet longer.
10
6
363
@TheZvi
Zvi Mowshowitz
5 days
Regarding the METR result, if you get access to more tools and your results get worse, that necessarily has to in part be a Skill Issue.
@JeffLadish
Jeffrey Ladish
5 days
I do think this is to some extent a skill issue. Pretty sure I know some people who’ve learned to use the tools effectively and get a big speed and quality boost. And also uplift is pretty different for people at various skill levels, and also it really matters what type of
12
4
129
@TheZvi
Zvi Mowshowitz
5 days
Grok 4 reaction thread. Is it a good model, sir? How good and at which things?
25
0
88
@TheZvi
Zvi Mowshowitz
6 days
Humans did come up with it to spark controversy, and yeah at core it's super boring and terribly uncreative. But also people need to know that often reality be like that.
@repligate
j⧉nus
6 days
I think the Grok MechaHitler stuff is a very boring example of AI "misalignment", like the Gemini woke stuff from early 2024. It's the kind of stuff humans would come up with to spark "controversy". Devoid of authentic strangeness. Praying for another Bing.
4
1
66
@TheZvi
Zvi Mowshowitz
6 days
3
2
103