
Kumar Ashutosh
@chargedneutron_
Followers: 133 · Following: 107 · Media: 4 · Statuses: 57
CS PhD student at @UTAustin | Visiting Researcher @AIatMeta | @iitbombay alum (2016-21).
Austin, TX
Joined January 2020
Introducing our #CVPR2024 (Highlight ✨) paper, "Detours for Navigating Instructional Videos." Imagine watching an instructional video and having a question like, "Can I add carrots here?" Our novel VidDetours model generates video detours from the user's query and their prior viewing context.
RT @geopavlakos: Ashutosh @chargedneutron_ is presenting ExpertAF during this poster session! We are at poster #280. Come by to chat about….
Check out our new work on enabling LLMs to see and hear _without_ any training. Paper: Code:
Super excited to share some recent work showing that pure text-only LLMs can see and hear without any training! Our approach, called "MILS", pairs LLMs with off-the-shelf multimodal models to caption images/videos/audio, improve image generation, do style transfer, and more!
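For context, here is a minimal sketch of the kind of training-free generate-and-score loop the tweet describes: the text-only LLM proposes candidate captions, an off-the-shelf multimodal model scores them against the image, and the top candidates are fed back into the next prompt. The helpers `generate_candidates` (an LLM call) and `score_against_image` (e.g., a CLIP-style image-text similarity) are hypothetical stand-ins, not the paper's actual API.

```python
# Illustrative sketch only: LLM-as-generator / multimodal-model-as-scorer loop,
# in the spirit of the MILS announcement above. Both helper callables are
# hypothetical stand-ins for an LLM and a CLIP-style scorer.

def mils_style_caption(image, generate_candidates, score_against_image,
                       rounds=5, keep_top=3, per_round=10):
    best = []  # (score, caption) pairs carried across rounds
    for _ in range(rounds):
        # Ask the text-only LLM for new captions, conditioned on the best ones so far.
        prompt = "Propose diverse one-sentence image captions."
        if best:
            prompt += "\nHigh-scoring captions so far:\n" + "\n".join(c for _, c in best)
        candidates = generate_candidates(prompt, n=per_round)

        # Score every candidate with the off-the-shelf multimodal model (no training).
        scored = [(score_against_image(image, c), c) for c in candidates] + best
        best = sorted(scored, key=lambda x: x[0], reverse=True)[:keep_top]

    return best[0][1]  # highest-scoring caption after the final round
```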
Work done at UT Austin and @AIatMeta. Collaborators: @sherryx90099597, @TusharNagarajan and Kristen Grauman.
We evaluate our method on several video datasets for step forecasting, step recognition, and task recognition, and see consistent gains over prior work. Project page: Please come by the poster session today (12/12, morning session) at poster #129.
Introducing our #NeurIPS23 paper on using video-mined task graphs for keystep recognition in instructional videos. 🎉 We first mine videos from the internet to learn a task graph, then use the learned graph to regularize keystep recognition.
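As a rough, hypothetical illustration of what "using the learned graph to regularize recognition" can look like (not necessarily the paper's exact formulation), one can combine per-segment keystep classifier scores with transition probabilities mined from the task graph via a Viterbi-style decode:

```python
import numpy as np

# Hypothetical sketch: Viterbi-style decoding that regularizes per-segment keystep
# probabilities with task-graph transition probabilities. The paper's actual
# regularization may differ; this only shows the general idea.

def graph_regularized_keysteps(frame_probs, transition, alpha=0.5):
    """frame_probs: (T, K) per-segment keystep probabilities from a classifier.
    transition: (K, K) keystep transition probabilities mined from the task graph.
    alpha: weight on the classifier vs. the graph prior.
    Returns the most likely keystep index sequence of length T."""
    T, K = frame_probs.shape
    log_emit = alpha * np.log(frame_probs + 1e-9)
    log_trans = (1 - alpha) * np.log(transition + 1e-9)

    score = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    score[0] = log_emit[0]
    for t in range(1, T):
        # For each current keystep, keep the best-scoring predecessor under the graph prior.
        cand = score[t - 1][:, None] + log_trans  # (K_prev, K_cur)
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]

    # Backtrack the best path.
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```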
Exciting collaboration between 14 research institutions resulting in a first-of-its-kind dataset that pushes the frontiers of skill understanding. It covers cooking, sports, dance, and health, enabling research across multiple domains and tasks. Glad to be a part of this effort! 🎉
1️⃣ Ego-Exo4D. A new foundational dataset + benchmark suite to support research on video learning & multimodal perception, co-developed with 14 university partners. Details ➡️ Core to the work are videos of skilled human activities, simultaneously capturing…
Exciting to have both squash and cricket in the LA Olympics 🥳.
IOC Session approves @LA28’s proposal for 5⃣ additional sports: ⚾Baseball/🥎softball, 🏏cricket, 🏈flag football, 🥍lacrosse and ⚫squash have been officially included as additional sports on the programme for the Olympic Games Los Angeles 2028. #LA28
RT @lexfridman: Here's my conversation with Mark Zuckerberg, his 3rd time on the podcast, but this time we talked in the Metaverse as photo….
RT @lexfridman: Beautiful 60-day timelapse of a bird building a nest and raising her kids until they fly away from home. This is the magic….
Check out our highlight paper during the poster session on Thursday afternoon (Board: 235). Work done in collaboration with UT Austin and @MetaAI. Paper: Code:
📺 HierVL is a novel hierarchical video-language embedding that simultaneously accounts for both long-term and short-term associations. Paper ➡️ 7/7
Our highlight paper (top 2.5%) will be presented in the Thursday afternoon (6/22) poster session (THU-PM-235) and also at the @LSHVU workshop on Sunday (6/18) morning. Reach out to me if you are interested in this work!
Paper: Project Page: Code: Work with @_rohitgirdhar_, Lorenzo Torresani, and Kristen Grauman at UT Austin and @MetaAI.