Alexander Martin @alexdmartin314 X Profile

Alexander Martin

@alexdmartin314

Followers

117

Following

272

Media

12

Statuses

63

PhD student at @jhuclsp; @NSF GRF;

Joined October 2022

Alexander Martin

@alexdmartin314

4 months

Wish you could get a Wikipedia-style article for unfolding events? Introducing WikiVideo: a new multimodal task and benchmark for Wikipedia-style article generation from multiple videos!

2

13

23

Alexander Martin

@alexdmartin314

26 days

RT @ariG23498: "Why is the training so slow?" We figure out that starving the model from data, or providing it with padding tokens leads t…

0

21

0

Alexander Martin

@alexdmartin314

1 month

RT @akshitwt: this is a great read by jack morris. gives such a fresh perspective on what really matters. TLDR: with every "new" architect….

0

231

0

Alexander Martin

@alexdmartin314

2 months

Also ask me about article generation from multiple videos (WikiVideo) or our ACL Workshop on Multimodal RAG (@MAGMaR_workshop).

0

Alexander Martin

@alexdmartin314

2 months

Video-ColBERT: Saturday, June 14, 5pm-7pm, Poster Session 4 (Hall D). MultiVENT 2.0: Sunday, June 15, 10:30am-12:30pm, Poster Session 5 (ExHall D).

1

0

Alexander Martin

@alexdmartin314

2 months

Talk to me at #CVPR2025 about Multimodal RAG topics! I'll be presenting two papers on video retrieval: Video-ColBERT (late interaction for video retrieval) and MultiVENT 2.0 (a challenging, multimodal event-centric IR benchmark!).

1

7

15

Alexander Martin

@alexdmartin314

2 months

RT @EYangTW: 🚨Wouldn’t it be nice if your agentic search system could reason over all your docs? ✨Introducing Rank-K, a listwise reranker…

0

28

0

Alexander Martin

@alexdmartin314

3 months

RT @MAGMaR_workshop: 24 hours left to submit a paper to the Multimodal RAG workshop at #acl2025! We welcome all papers on related topics. A….

openreview.net

Welcome to the OpenReview homepage for ACL 2025 Workshop MAGMaR

0

3

0

Alexander Martin

@alexdmartin314

3 months

RT @MAGMaR_workshop: 🚨 IT'S HERE! 🚨 The Eval Leaderboard is now LIVE! 🏆💻 Our video retrieval collection stumps most pre-trained models. Se…

eval.ai

EvalAI is an open-source web platform for organizing and participating in challenges to push the state of the art on AI tasks.

0

5

0

Alexander Martin

@alexdmartin314

4 months

RT @kesnet50: 🚨 New preprint on transparent, tree-adaptive grounded reasoning! We introduce Bonsai, a versatile reasoning system that gene…

0

8

0

Alexander Martin

@alexdmartin314

4 months

RT @willcfleshman: 🚨 Our latest paper is now on ArXiv! 👻 (w/ @ben_vandurme). SpectR: Dynamically Composing LM Experts with Spectral Routing…

0

12

0

Alexander Martin

@alexdmartin314

4 months

This work was done in collaboration w/ colleagues at Johns Hopkins University: Reno Kriz, William Walden, @kesnet50, Hannah Recknor, @EYangTW, Francis Ferraro, and @ben_vandurme.

0

2

Alexander Martin

@alexdmartin314

4 months

If you’re interested in article generation from videos and other tasks that require understanding events in videos, check out our ACL Workshop MAGMaR and our related work! MAGMaR: MultiVENT:

nlp.jhu.edu

A Massive Collection of Multilingual Videos of Events

1

0

1

Alexander Martin

@alexdmartin314

4 months

We find that CAG performs better than existing methods across all metrics, but still has a long way to go! There is plenty of future work in efficient and multi-video inference, high-level understanding, and improving video retrieval performance!

1

3

Alexander Martin

@alexdmartin314

4 months

To tackle this challenge, we present a collaborative, test-time scalable method: Collaborative Article Generation (CAG). CAG involves collaboration between a VideoLLM and a reasoning model to iterate through video content and synthesize it into an article.

1

0

1

Alexander Martin

@alexdmartin314

4 months

WikiVideo is a challenging task that VideoLLMs can’t do! It requires inference across multiple videos (avg. 8 per topic) and requires models to recognize low-level semantic features, like entities, and draw higher-level inferences about the unfolding event.

1

0

1

Alexander Martin

@alexdmartin314

4 months

WikiVideo was annotated by experts in a multistep annotation process to provide multimodal grounding of articles in our video corpus! Paper: Dataset: Repo:

1

0

1

Alexander Martin

@alexdmartin314

4 months

This work was done in collaboration with Arun Reddy, @EYangTW, @andrewyates, @kesnet50, Kenton Murray, Reno Kriz, @Celso_M_de_Melo, @ben_vandurme, and Rama Chellappa.

1

0

6

Alexander Martin

@alexdmartin314

4 months

Our method adopts a dual sigmoid loss and involves tokenwise interactions with both spatial and spatio-temporal features. Check out the paper here:
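
For readers unfamiliar with the terms in the post above, here is a rough sketch of what tokenwise (late) interaction and a sigmoid loss might look like. It is not the paper's implementation. The function names, the ColBERT-style sum-of-max scoring, the temperature and bias parameters, and the reading of "dual" as one sigmoid loss per feature type (spatial and spatio-temporal) are all assumptions made for illustration.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim(query_tokens, video_tokens):
    """Tokenwise late interaction (ColBERT-style, assumed here):
    each query token keeps its best-matching video token,
    and the per-token maxima are summed into one score."""
    return sum(max(dot(q, v) for v in video_tokens) for q in query_tokens)

def sigmoid_loss(score, label, temperature=1.0, bias=0.0):
    """Pairwise sigmoid loss: -log(sigmoid(label * (t * score + b))).
    label is +1 for a matching query-video pair and -1 otherwise.
    temperature and bias are hypothetical parameters for this sketch."""
    x = -label * (temperature * score + bias)
    # Numerically stable softplus(x) = log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dual_sigmoid_loss(query_tokens, spatial_tokens, temporal_tokens, label):
    """Assumed reading of 'dual': one sigmoid loss on the spatial score
    plus one on the spatio-temporal score."""
    spatial = sigmoid_loss(maxsim(query_tokens, spatial_tokens), label)
    temporal = sigmoid_loss(maxsim(query_tokens, temporal_tokens), label)
    return spatial + temporal

# Toy 2-dimensional embeddings, made up for the example.
query = [[1.0, 0.0], [0.0, 1.0]]
spatial_feats = [[1.0, 0.0], [0.5, 0.5]]
temporal_feats = [[0.0, 1.0], [0.5, 0.5]]

score = maxsim(query, spatial_feats)
pos_loss = dual_sigmoid_loss(query, spatial_feats, temporal_feats, +1)
neg_loss = dual_sigmoid_loss(query, spatial_feats, temporal_feats, -1)
```

With these toy vectors, a pair labeled as matching gets a lower loss than the same pair labeled as non-matching, which is the behavior a sigmoid contrastive objective is meant to encourage.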