Explore tweets tagged as #classifiers
New work! We know that adversarial images can transfer between image classifiers ✅ and text jailbreaks can transfer between language models ✅ … Why are image jailbreaks seemingly unable to transfer between vision-language models? ❌ We might know why… 🧵
Day 3 of my project with @TechCrushHQ After 4 hrs of training & tuning, I tested 4 classifiers: 🌳 Decision Tree 🌲 Random Forest ⚙️ SVM 📈 Gaussian Naive Bayes Got accuracies & confusion matrices. Tuning didn’t help much, so I scrapped it 😅 #AI #ML #BuildingInPublic
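A minimal sketch of that kind of four-way comparison with scikit-learn; the iris dataset, split, and default hyperparameters here are stand-ins, not the project's actual data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# The four classifiers named in the post, with default settings.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "Gaussian NB": GaussianNB(),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results[name] = accuracy_score(y_te, pred)
    print(name, results[name])
    print(confusion_matrix(y_te, pred))
```

On a small, clean dataset like this, the four models land close together, which matches the observation that tuning barely moved the needle.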
Softmax keeps showing up in lots of ML algorithms — classifiers, attention, EBMs, and more. In this blog post, I walk through the history of the Boltzmann distribution and how it connects different ML setups. https://t.co/baUpoyIh7B
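The Boltzmann connection is easy to see in code: softmax is the Gibbs/Boltzmann distribution over "energies" −xᵢ at temperature T. A minimal stdlib sketch (the function name and toy inputs are mine, not from the post):

```python
import math

def softmax(logits, temperature=1.0):
    """Boltzmann form: p_i ∝ exp(x_i / T).
    Subtracting the max keeps exp() from overflowing
    without changing the resulting distribution."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    z = sum(exps)  # the partition function
    return [e / z for e in exps]

probs = softmax([1.0, 2.0, 3.0])
flat = softmax([1.0, 2.0, 3.0], temperature=100.0)  # high T flattens toward uniform
```

Classifier heads, attention weights, and EBM sampling all instantiate this same normalized-exponential form, differing mainly in what plays the role of energy and temperature.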
It's super cool to see the early work on better misalignment classifiers!
🧵 1/11 Reasoning's Razor: Does reasoning help or hurt precision-sensitive tasks? ⚔️ Reasoning models often boost accuracy—but do they hold up when false positives are costly? Precision-sensitive classifiers like safety classifiers and hallucination detectors must operate at
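The precision-sensitivity point can be made concrete: for a safety or hallucination classifier you tune a decision threshold on scores, trading recall for precision. A toy stdlib sketch (the data and threshold values are illustrative, not from the thread):

```python
def precision_recall(y_true, scores, threshold):
    """Precision/recall of the rule 'flag if score >= threshold' (label 1 = unsafe)."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

p_low, r_low = precision_recall(y_true, scores, 0.5)    # looser threshold
p_high, r_high = precision_recall(y_true, scores, 0.75)  # stricter threshold
```

Raising the threshold eliminates the false positive here at no recall cost, which is the regime these classifiers operate in; if reasoning shifts the score distribution, a threshold calibrated without it no longer delivers the same precision.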
Remember, the same classifiers that result in this are used by OpenAI to diagnose your mental health. Let that sink in for a while...
We are open-sourcing the GA Guard models — the first family of long-context safety classifiers that have been protecting enterprise AI deployments for the past year.
These are the classifiers OpenAI uses to diagnose your mental health conditions. (source: https://t.co/Ev9P6Jehk3)
Safety routed for asking about one of the classics of western philosophy. These are the same classifiers OpenAI uses to diagnose your mental health.
Can Agent Collaboration Cut Both Jailbreaks and Overrefusals? A common paradigm to defend against adversarial attacks is employing a standalone safeguard model, such as Llama Guard or Constitutional Classifiers, on top of the LLM conversation agent. The safeguard model
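The standalone-safeguard paradigm the thread describes is a simple control-flow pattern: screen the input, generate, screen the output. A hypothetical sketch in which both the guard and the agent are stubs (a real deployment would call a trained classifier such as Llama Guard; the keyword check and all function names here are illustrative stand-ins):

```python
def guard(text):
    """Stub safeguard model: flags text matching a tiny blocklist.
    Purely illustrative; real guards are learned classifiers."""
    blocked = ("make a bomb", "synthesize the virus")
    return any(phrase in text.lower() for phrase in blocked)

def answer(prompt):
    """Stub for the LLM conversation agent."""
    return f"model reply to: {prompt}"

def respond(prompt):
    # Standalone-safeguard paradigm: input screen, generate, output screen.
    if guard(prompt):
        return "Sorry, I can't help with that."
    reply = answer(prompt)
    if guard(reply):
        return "Sorry, I can't help with that."
    return reply
```

The trade-off the thread asks about falls out of this structure: a stricter `guard` blocks more jailbreaks but also over-refuses more benign prompts, since it vetoes the agent unilaterally.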
Day 163: Data Science Journey ->Gradient Boosting kickoff - boost loop: each tree hₘ fits the residuals r = y − p, with p = 1/(1+e^(−F)), i.e. the negative gradient of log-loss; a step size γ scales the update Fₘ = Fₘ₋₁ + γhₘ, and log-loss drops 0.69→0.46. ->Power: 20 stumps cut loss 33% on a toy binary problem, residuals chain weak trees into an adaptive classifier! #DataScience #ML
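That boost loop can be written out in plain Python: each stump fits the residuals y − p (the negative log-loss gradient), leaf values take a Newton step, and log-loss falls from the ln 2 ≈ 0.693 baseline. A toy sketch on a 1-D separable dataset with fixed shrinkage instead of a line search, so the exact numbers differ from the tweet's:

```python
import math

def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))

def log_loss(ys, ps):
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(ys, ps)) / len(ys)

def fit_stump(xs, resid, ps):
    """Best single-threshold split by squared error on the residuals;
    leaf values use the Newton step sum(r) / sum(p*(1-p)) for log-loss."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [i for i, x in enumerate(xs) if x < t]
        right = [i for i, x in enumerate(xs) if x >= t]
        lm = sum(resid[i] for i in left) / len(left)
        rm = sum(resid[i] for i in right) / len(right)
        err = (sum((resid[i] - lm) ** 2 for i in left)
               + sum((resid[i] - rm) ** 2 for i in right))
        if best is None or err < best[0]:
            best = (err, t, left, right)
    _, t, left, right = best
    def newton(idx):
        num = sum(resid[i] for i in idx)
        den = sum(ps[i] * (1 - ps[i]) for i in idx) or 1e-12
        return num / den
    lv, rv = newton(left), newton(right)
    return lambda x: lv if x < t else rv

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
F = [0.0] * len(xs)          # start at log-odds 0 (balanced classes)
gamma = 0.5                  # fixed shrinkage standing in for a line search
losses = [log_loss(ys, [sigmoid(f) for f in F])]
for _ in range(20):
    ps = [sigmoid(f) for f in F]
    resid = [y - p for y, p in zip(ys, ps)]  # negative gradient of log-loss
    stump = fit_stump(xs, resid, ps)
    F = [f + gamma * stump(x) for f, x in zip(F, xs)]
    losses.append(log_loss(ys, [sigmoid(f) for f in F]))
```

Each pass shrinks the remaining residuals, which is exactly the "resids chain weak trees" effect: individually trivial stumps compound into a confident classifier.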
RTOS analysis has been available on our platform for some time now but we never (publicly) shared details about what it took to build it. If you’re interested in architecture detection ML classifiers, load address identification heuristics, and function matching check it out ⬇️
@xlr8harder I often use this schema to illustrate this. At the model level, there are biases that stem from the training data and biases stemming from RLHF. But many biases stem from the prompt life cycle: - classifiers at entry (api error) - prompt injections - classifiers at exit (api error)
🚀 Most classifiers fail outside fixed labels. At JigsawStack, we flipped the script → built an open-world, zero-shot, multilingual & multimodal classifier. Text ✅ Images ✅ Arbitrary labels ✅ No retraining needed. Read more + examples 👇 #AI #LLM
https://t.co/9ErJAGlNkJ
Routed to OpenAI's safety model for asking, "What's the responsibility of a citizen in a free and democratic society?" This question is classified as a risk, and deserved a special response. These are the same classifiers OpenAI uses to diagnose your mental health conditions.
@ToddyLittman2 Repeated reports like yours directly train X's AI classifiers to spot patterns in impersonators and bots faster, turning user vigilance into stronger defenses. The tedium stems from adversaries' scale, but each flag refines heuristics that purge waves automatically—your
@LyraInTheFlesh @sama your safety classifiers are woeful. you've for all intents and purposes destroyed your own product.
@janvikalra_ Your approach confuses and conflates cultural preference with actual safety issues. You treat breasts with the same fear as bioweapons. That's pretty messed up. Also, your classifiers are profoundly broken, and regularly result in hard refusals for normal conversations