Josmy Faure @JosmyFaure1 X Profile

Josmy Faure

@JosmyFaure1

Followers

15

Following

6

Media

4

Statuses

9

Software Engineer at @Google | PhD Student at National Taiwan University

https://t.co/CskjX8oAt8

Taipei, Taiwan

Joined July 2017

Josmy Faure

@JosmyFaure1

20 days

🚀 New at #ICCV2025: HERMES — a Video Understanding framework that’s both ⚡ efficient and 🎯 accurate. No more trade-off between speed and performance.

2

9

Min-Hung (Steve) Chen

@CMHungSteven

20 days

Super excited to share HERMES @ICCVConference: our video understanding framework that boosts accuracy and speed while reducing memory use! Please check the original post from @JosmyFaure1 for more details! website: https://t.co/LFQgh9lDq4 #ICCV2025 #NVIDIA #LLM #Video #multimodal

0

10

57

Josmy Faure

@JosmyFaure1

20 days

Huge thanks to my collaborators: @CMHungSteven, Jia-Fong Yeh, Hung-Ting Su, Shang-Hong Lai, and Winston H. Hsu. Excited to see how the community builds on this 🙌

0

Josmy Faure

@JosmyFaure1

20 days

And it’s open-source! You can plug HERMES into your own VLM today: 📄 Paper: https://t.co/N43RxfSaTk 💻 Code: https://t.co/CLNx7XXjTX 🌐 Project page:

github.com

[ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics - joslefaure/HERMES

1

0

Josmy Faure

@JosmyFaure1

20 days

With ECO + SeTR, HERMES: ✅ 43% faster inference ✅ 46% less GPU memory ✅ +3.8% accuracy boost on top VLMs ✅ New SOTA on multiple benchmarks

1

0

Josmy Faure

@JosmyFaure1

20 days

(2) 💡 Semantics reTRiever (SeTR): captures overarching themes (e.g. an “80s rock party vibe”) scattered across the whole video.

1

0

Josmy Faure

@JosmyFaure1

20 days

Our approach consists of two cognitive-inspired modules: (1) 🧠 Episodic Compressor (ECO): processes long videos like humans do, bundling frames into meaningful episodes (“arriving,” “singing,” “cake-cutting”). Dense → efficient memory.

1

0

1
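
The post above describes ECO only at a high level (bundling frames into episodes to get a compact memory). As a rough illustration of that idea, here is a minimal Python sketch that assumes episodes are formed by greedily merging the most similar pair of adjacent frame features until a fixed budget is reached. The merge rule, the cosine similarity measure, and the names `cosine` and `compress_to_episodes` are assumptions made for illustration; they are not taken from the posts or from the HERMES code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compress_to_episodes(frames, budget):
    """Hypothetical episodic compression (an assumption, not the HERMES API).

    Repeatedly merges the most similar pair of *adjacent* entries, so
    temporal order is preserved, until at most `budget` episodes remain.
    Each episode is (mean feature vector, number of frames it covers).
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    episodes = [(list(f), 1) for f in frames]
    while len(episodes) > budget:
        # Index of the adjacent pair with the highest similarity.
        i = max(
            range(len(episodes) - 1),
            key=lambda k: cosine(episodes[k][0], episodes[k + 1][0]),
        )
        (feat_a, n_a), (feat_b, n_b) = episodes[i], episodes[i + 1]
        # Count-weighted average keeps the merged vector equal to the
        # mean of all original frames inside the episode.
        merged = [(x * n_a + y * n_b) / (n_a + n_b) for x, y in zip(feat_a, feat_b)]
        episodes[i:i + 2] = [(merged, n_a + n_b)]
    return episodes


# Toy example: four 2-D "frame features" compressed into two episodes.
toy_frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
toy_episodes = compress_to_episodes(toy_frames, budget=2)
```

In this toy setup the memory shrinks from four frame vectors to two episode vectors, which is the "dense → efficient memory" effect the post refers to, under the stated assumptions.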

Josmy Faure

@JosmyFaure1

20 days

Until now, better video models were slower and more resource-intensive. We asked: can we break this trade-off?

1

0