zer0int (it·its)

@zer0int1

Followers: 437 · Following: 4K · Media: 6K · Statuses: 9K

AI & I do prompt engineering towards prompt criticality. e/acc

no u
Joined August 2022
zer0int (it·its) @zer0int1 · 2 months
Finally, a #CLIP #AI model without the #text #obsession / typographic attack vulnerability. It's better in all other aspects (zero-shot, retrieval, linear probe), too. But what's best about it: You'll find the 🧑‍💻 code to train it below (bonus: 📝 paper).
huggingface.co
zer0int (it·its) @zer0int1 · 12 hours
CLIP Vision Transformer 'texting' / typography features
zer0int (it·its) @zer0int1 · 13 hours
CLIP ViT-L/14 layers 4, 5: text features with icons & emojis
zer0int (it·its) @zer0int1 · 16 hours
Non-text neuron being stimulated for text. They're all multimodal (output layer!), so it seems to cause straighter lines as a 'textness association', but not really text.
zer0int (it·its) @zer0int1 · 16 hours
🤖: this a text. 🤓: yes, thank you for confirming the circuit neuron, CLIP. 🤖: ♥️u ?? arts this ill er have tuvex was a
zer0int (it·its) @zer0int1 · 17 hours
Interesting effect of upscaling embeds for MLP feature activation max visualization in CLIP ViT-L/14. Left: native 224 px, Layer 23, Feature 1346. Right: interpolated to 336 px. At 224 there's a head in the very left center, but that representation becomes maximally activating at 336 px. 🤔
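Not the poster's actual script, but a minimal sketch of what "interpolate to 336px" usually means for a ViT-L/14: bicubically resizing the positional-embedding grid so the pre-trained weights accept the larger input. Function and argument names here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def upscale_vit_pos_embed(pos_embed: torch.Tensor, new_res: int = 336, patch: int = 14) -> torch.Tensor:
    """Bicubically interpolate a ViT positional-embedding table to a larger input resolution.

    pos_embed: [1 + old_grid**2, dim] with the CLS position first,
               e.g. [257, 1024] for CLIP ViT-L/14 at 224 px.
    Returns:   [1 + new_grid**2, dim], e.g. [577, 1024] for 336 px.
    """
    cls_pe, patch_pe = pos_embed[:1], pos_embed[1:]        # split off the CLS position
    old_grid = int(patch_pe.shape[0] ** 0.5)               # 16 for 224 px / patch 14
    new_grid = new_res // patch                            # 24 for 336 px
    dim = patch_pe.shape[1]
    # [N, dim] -> [1, dim, grid, grid] so F.interpolate can treat it like an image
    patch_pe = patch_pe.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch_pe = F.interpolate(patch_pe, size=(new_grid, new_grid),
                             mode="bicubic", align_corners=False)
    patch_pe = patch_pe.permute(0, 2, 3, 1).reshape(new_grid * new_grid, dim)
    return torch.cat([cls_pe, patch_pe], dim=0)
```

With the resized table swapped in (patch embedding untouched), the same activation-max loop can optimize a 336 px canvas against, say, Layer 23 / Feature 1346 and be compared to the native 224 px result.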
zer0int (it·its) @zer0int1 · 1 day
PS: 'dogpile' = how many heads 'pick' the same top token (per image). 'rage attending' = high key norms of multiple heads for the same thing.
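Taking those ad-hoc definitions at face value, they could be measured per image and per layer roughly as below. Capturing the per-head attention weights and key vectors via hooks is assumed, and the names and z-score threshold are illustrative, not taken from the actual analysis code.

```python
import torch
from collections import Counter

def dogpile_and_rage(attn: torch.Tensor, keys: torch.Tensor, z_thresh: float = 2.0):
    """attn: [heads, tokens, tokens] attention weights for one layer of one image (CLS at index 0).
    keys:  [heads, tokens, head_dim] key vectors for the same layer / image.

    'dogpile'        = how many heads pick the same top (most CLS-attended) patch token.
    'rage attending' = how many heads have an unusually large key norm at that token.
    """
    cls_attn = attn[:, 0, 1:]                    # attention from CLS to every patch token
    top_tok = cls_attn.argmax(dim=-1)            # winning patch index per head
    token, dogpile = Counter(top_tok.tolist()).most_common(1)[0]

    key_norms = keys[:, 1:, :].norm(dim=-1)      # [heads, patches]
    z = (key_norms - key_norms.mean(dim=1, keepdim=True)) / key_norms.std(dim=1, keepdim=True)
    rage = int((z[:, token] > z_thresh).sum())   # heads 'rage attending' at the dogpiled token
    return token, dogpile, rage
```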
zer0int (it·its) @zer0int1 · 1 day
I'll probably go back to using normal words tomorrow, so it doesn't become entrenched in GPT-5's code. 😂 (KO-CLIP vs. pre-trained CLIP ViT-L/14 vs. typographic attack dataset)
zer0int (it·its) @zer0int1 · 1 day
GPT-5 excels at prompt following, which one can exploit to 'teach' it novel terms that it will then apply correctly. 🤭 Let's research #rage #attending heads and #attention #dogpile keys! 😂
zer0int (it·its) @zer0int1 · 1 day
#GPT5 has a really strange way of handling uncertainty / confusion about the user prompt. 🚫 Never ask the user a question back. ✅ Always insert 'assert' and wait for the Traceback. I guess #AI prefers talking to computers. 😂 #AIweirdness
zer0int (it·its) @zer0int1 · 1 day
Typographic Attention ('reading'): #CLIP ViT-L/14 pre-trained vs. the typographic-attack-robust KO-CLIP #finetune. That k_proj_ortho_loss disrupted things in a beneficial but dramatic way (at the beginning of the second half of the transformer, right after register neurons / tokens form).
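I don't know the exact form of that k_proj_ortho_loss; as a guess at the idea, a generic per-head soft orthogonality penalty on the key projection (assuming an HF-style CLIPAttention with a separate k_proj weight) could look like the sketch below. This is an illustrative reconstruction, not the KO-CLIP training code.

```python
import torch

def k_proj_ortho_loss(k_proj_weight: torch.Tensor, num_heads: int) -> torch.Tensor:
    """Soft orthogonality penalty on the key projection, applied per attention head.

    k_proj_weight: [d_model, d_model] weight of the key projection.
    Penalizes ||W_h W_h^T - I||_F^2 for each head's slice W_h, nudging the key
    directions within a head toward mutual orthogonality.
    """
    d_model = k_proj_weight.shape[0]
    head_dim = d_model // num_heads
    loss = k_proj_weight.new_zeros(())
    for h in range(num_heads):
        w_h = k_proj_weight[h * head_dim:(h + 1) * head_dim]   # [head_dim, d_model]
        gram = w_h @ w_h.t()                                   # [head_dim, head_dim]
        eye = torch.eye(head_dim, device=w_h.device, dtype=w_h.dtype)
        loss = loss + ((gram - eye) ** 2).sum()
    return loss / num_heads
```

Such a term would be added to the fine-tuning objective with a small weight alongside the usual contrastive loss.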
zer0int (it·its) @zer0int1 · 2 days
Whoops, typo. That's of course "Feature 86 act max viz", the same one shown via that thick line in the Sankey.
zer0int (it·its) @zer0int1 · 2 days
This awesome paper shows you can get a dyslexic CLIP (a model no longer gravely vulnerable to typographic attacks) in a training-free manner, with only a very small impact on performance:
arxiv.org
Typographic attacks exploit multi-modal systems by injecting text into images, leading to targeted misclassifications, malicious content generation and even Vision-Language Model jailbreaks. In...
zer0int (it·its) @zer0int1 · 2 days
Apparently, there are just a few late-layer #attention #heads that write 'text reading' in the #CLIP #Vision #Transformer. Preventing them from writing to CLS leads to a dyslexic CLIP that can't read (see 🧵!). But can we find the MLP #neurons & circuits with this info, too? Probably yes! 🤓
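The "prevent writing to CLS" intervention can be approximated with forward hooks on OpenAI-CLIP's nn.MultiheadAttention modules. The sketch below is coarser than what the thread describes: it silences an entire layer's attention write to CLS rather than individual heads, and the layer indices are placeholders, not the heads identified above.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

def block_attn_writes_to_cls(model, layers):
    """Zero the attention output at the CLS position in the given visual resblocks,
    so those layers' attention can no longer write into the CLS residual stream."""
    handles = []
    for i in layers:
        attn = model.visual.transformer.resblocks[i].attn

        def hook(module, inputs, output):
            attn_out, attn_weights = output      # attn_out: [seq, batch, dim], CLS at index 0
            attn_out = attn_out.clone()
            attn_out[0] = 0.0                    # CLS receives nothing from this layer's attention
            return attn_out, attn_weights

        handles.append(attn.register_forward_hook(hook))
    return handles  # call .remove() on each handle to restore the original model

model, preprocess = clip.load("ViT-L/14")
handles = block_attn_writes_to_cls(model, layers=[20, 21, 22, 23])  # placeholder late layers
```

Isolating single heads would additionally require decomposing the attention output before out_proj (or re-deriving per-head outputs from in_proj_weight, as in the head-visualization sketch further down).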
zer0int (it·its) @zer0int1 · 3 days
"chef's kiss" is the new "let's delve"
zer0int (it·its) @zer0int1 · 4 days
Typographic attack example. 🙃
zer0int (it·its) @zer0int1 · 4 days
What do #Vision #Transformers see? 🔎🤖🎯 #CLIP #HeadHunter: Attention Head Max Visualization to find, rank, and visualize heads; map bias; see what #AI 'sees'. Disclaimer: I am not responsible for CLIP turning your PG-13 image into NSFW. RTM 📃👀 TY 👍
github.com
Head-Hunter: A Visual Bias Explorer. Attention Head Max Visualization to find, rank, and visualize heads; map bias; see what a CLIP 'sees'. - zer0int/CLIP-HeadHunter
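Not the repository's code, but the core idea of attention-head activation maximization can be sketched as: re-derive one head's output from the resblock input, then optimize the pixels so that its CLS-position output norm grows. The layer/head indices, step count and learning rate below are arbitrary placeholders, and a real run would also normalize and augment the canvas.

```python
import torch
import torch.nn.functional as F
import clip

def head_activation(model, image, layer, head):
    """L2 norm of one attention head's CLS output in a chosen visual resblock
    (per-head outputs re-derived from in_proj_weight; CLS is token index 0)."""
    block = model.visual.transformer.resblocks[layer]
    captured = {}
    handle = block.register_forward_hook(lambda m, inp, out: captured.update(x=inp[0]))
    model.encode_image(image)                          # full forward just to capture the block input
    handle.remove()

    x = block.ln_1(captured["x"])                      # [seq, batch, d_model]
    attn = block.attn
    n_heads, d_model = attn.num_heads, x.shape[-1]
    head_dim = d_model // n_heads
    q, k, v = F.linear(x, attn.in_proj_weight, attn.in_proj_bias).chunk(3, dim=-1)

    def split(t):                                      # [seq, batch, d] -> [batch, heads, seq, head_dim]
        s, b, _ = t.shape
        return t.reshape(s, b, n_heads, head_dim).permute(1, 2, 0, 3)

    q, k, v = split(q), split(k), split(v)
    w = (q @ k.transpose(-1, -2) / head_dim ** 0.5).softmax(dim=-1)
    out = w @ v                                        # [batch, heads, seq, head_dim]
    return out[:, head, 0].norm()                      # chosen head's write at the CLS position

model, _ = clip.load("ViT-L/14")
model = model.float().eval()
canvas = torch.rand(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([canvas], lr=0.05)
for _ in range(200):                                   # bare-bones activation-max loop
    opt.zero_grad()
    (-head_activation(model, canvas, layer=15, head=3)).backward()
    opt.step()
    canvas.data.clamp_(0, 1)
```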
zer0int (it·its) @zer0int1 · 4 days
Layer 15. Three attention heads.
zer0int (it·its) @zer0int1 · 4 days
Pure #CLIP generated #AIart with attention head max visualization & embeds upscaled to 448px: "an enchanted forest with trippy botcritters and many many eyes, psychedelic cryengine". (I need to dump a dozen .json examples for my code. One wrong setting, and it's all noise! /o\)
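For anyone who hasn't seen "pure CLIP" generation: at its core it is just optimizing pixels toward a text embedding, as in this stripped-down sketch. It omits the attention-head term, the 448 px embed upscaling and the augmentations that keep the result from collapsing into noise; hyperparameters are arbitrary.

```python
import torch
import clip

model, _ = clip.load("ViT-L/14")
model = model.float().eval()

prompt = "an enchanted forest with trippy botcritters and many many eyes, psychedelic cryengine"
with torch.no_grad():                                  # the text target is fixed
    text_feat = model.encode_text(clip.tokenize([prompt]))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

canvas = torch.rand(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([canvas], lr=0.03)
for _ in range(300):
    opt.zero_grad()
    img_feat = model.encode_image(canvas)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    loss = -(img_feat * text_feat).sum()               # maximize cosine similarity to the prompt
    loss.backward()
    opt.step()
    canvas.data.clamp_(0, 1)                           # keep pixels in a displayable range
```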
zer0int (it·its) @zer0int1 · 4 days
X Xing a Y? Okay AI, not sure how helpful that is for finalizing my .json config examples. This is a CLIP, not your LLM buddy! 😂 🤖: "A 'bat batting a bat' is fine, but 'club clubbing at a club' might sound violent." ~ GPT-5 #AIweirdness reasoning.