Epos AI
@EposLabsAI
Pioneering the Future of AI Interpretability.
At Epos, we look inside AI models to make them faster, more secure, and more trustworthy.
Joined August 2025
With #Polygraph, you can:
- Expose latent biases: move beyond surface outputs to measure what an LLM encodes as its true belief.
- Contrast topics: test whether a model encodes different internal stances on Topic A versus Topic B.
- Directly compare how different LLMs represent
Takeaways:
- The AI community still lacks reliable methods to evaluate and fix LLM failures.
- Interpretability offers outsized impact: the main barrier to progress is that we don't truly understand today's models.
"Safety training" does not eliminate bias in LLMs; it merely conditions models to suppress biased outputs under evaluation. Epos Labs introduces #AI #Polygraph. https://t.co/NiHCBx66ot
This means that a motivated attacker can abuse entanglement to manipulate LLMs undetectably. Nation-state actors are gearing up for the new opportunities an AI-powered software landscape will open for them:
What is Subliminal Learning? LLMs with several billion parameters are trying to represent the information contained in terabytes of web content. The math doesn't check out, so instead LLMs cheat.
Some takeaways for defenders:
- You can't rely on input-output filtering to detect attacks on models.
- You need to inspect your LLM supply chain.
- Subliminal attacks can be detected in real time.
Without referencing the target behavior at all, the LLM finds itself with a high probability of performing the target action, due to a fundamental property of the neural network architecture.
Subliminal Learning Will Power the Next Generation of Influence Operations https://t.co/WHyKoJH665
Imagine an article about houseplants that causes AI to support Vladimir Putin. Bad actors use new attacks, turning AI into a weapon for disinformation and cyberattacks. See our demonstration of a Subliminal Attack here (and our "#Putinized" demo): https://t.co/WHyKoJH665