Rasool Fakoor
@rasoolfa
395 Followers · 2K Following · 4 Media · 601 Statuses
Research in RL & ML.
Joined December 2012
Interested in continual learning with IL, adapting to ever-changing data in RL/SL, & at #ICLR2024? Then swing by our posters at Halle B & say hi: Tue 4:30-6:30, poster #223
https://t.co/2ZKUul76mw Wed 4:30-6:30, poster #155
https://t.co/LV2xw6DWE8
This work was recently accepted to TMLR! https://t.co/ja623Ov13Z Beyond the main contributions in our previous post, below are additional insights from the TMLR version on applying preference-based and unlearning-based methods to LLM math reasoning:
openreview.net
Leveraging inference-time search in large language models has proven effective in further enhancing a trained model's capability to solve complex mathematical and reasoning problems. However, this...
Can we make LLMs reason effectively without a huge inference time cost? We show a powerful approach through learning and forgetting! Our recipe: 1️⃣ Aggregate reasoning paths from diverse sources: Chain-of-Thought, inference-time search (Tree-of-Thought, Reasoning-via-Planning),
The application closes on Tuesday (8/12). If you are interested, please apply and don't wait until the last minute.
Our team is *hiring* interns & researchers! We’re a small team of hardcore researchers & engineers working on foundation models, agentic methods, and embodiment. If you have strong publications and related experience, plz fill out the application form. https://t.co/U4gOvNQ9qR
Excited to announce that our web agent paper, AgentOccam, has been accepted to ICLR 2025! 🏂🏂🏂 Huge thanks to all collaborators! 😊 Special thanks to my brilliant and considerate mentor, Yao @yaoliucs, for your constant guidance and encouragement! Sapana @Sapana_007 and Rasool
👾 Introducing AgentOccam: Automating Web Tasks with LLMs! 🌐 AgentOccam showcases the impressive power of Large Language Models (LLMs) on web tasks, without any in-context examples, new agent roles, online feedback, or search strategies. 🏄🏄🏄 🧙 Link: https://t.co/s6GPYFAEFf
I’ll be presenting this work at CoRL 2024 in about a month. Let’s chat about sample-efficient robot adaptation! Website: https://t.co/b4Mcfvn2Fr Paper: https://t.co/RMbVgXNAbv Coauthors: @MinhoHeo, @LiuZuxin, @ebiyik_, @JosephLim_AI, @yaoliucs, @rasoolfa
arxiv.org
Most reinforcement learning (RL) methods focus on learning optimal policies over low-level action spaces. While these methods can perform well in their training environments, they lack the...
How can robots efficiently learn **new tasks/in new settings**? Introducing EXTRACT: a reinforcement learning (RL) framework that extracts a discrete + continuously parameterized skill library from offline data for efficient RL on new tasks! Accepted to CoRL 2024: 🧵👇
Our team at AWS is *hiring* interns and full-time researchers! @yaoliucs, @pratikac, I, and others work on RL, alignment, large models, and ML in general. If you have strong relevant publications in those areas, please fill out this form. https://t.co/al05f0w14d
docs.google.com
Read this first: Our team at AWS is actively looking for candidates with strong backgrounds in RL, RLHF, large language/multi/uni-modal models, and machine learning in general. We look for *both*...
Offline RL is much harder than online RL or imitation learning, as it needs to solve a sequence of counterfactual reasoning problems. That often gives an error of (1+δ)^H, where δ is the one-step divergence of the policy (or the extrapolation error of Q) and H is the horizon. 1/N
One common misconception about (deep) RL is that it is done by first defining some empirical loss as an objective and then deriving the update rules via gradient descent (GD), just like supervised learning. This is NOT the case for popular RL algorithms such as policy gradient or TD-based methods. 1/N
And finally, if you are looking for an internship on RL, large models, alignment, etc., send a message to me, @AsadiKavosh, or @yaoliucs at #NeurIPS2023. See you next week. 6/6
TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models. FMDM workshop, Hall E2 (level 1), Fri 15 Dec, 8:15 a.m.-4:00 p.m. CST https://t.co/VBvm2eEwDG Joint work with Zuxin Liu, @Jesse_Y_Zhang, @AsadiKavosh, @yaoliucs, and Shoham. 5/n
Resetting the Optimizer in Deep RL: An Empirical Study. Great Hall & Hall B1+B2 (level 1), poster #1410, Tue 12 Dec, 5:15-7:15 p.m. CST https://t.co/YEQOgyYUSm Joint work with @AsadiKavosh and Shoham. 4/n
Budgeting Counterfactual for Offline RL. Great Hall & Hall B1+B2 (level 1), poster #1403, Tue 12 Dec, 5:15-7:15 p.m. CST https://t.co/gscDnO0UTk Joint work with @yaoliucs and @pratikac. 3/n
TD Convergence: An Optimization Perspective. Great Hall & Hall B1+B2 (level 1), poster #1503, Wed 13 Dec, 5-7 p.m. CST #NeurIPS2023
https://t.co/iEtXJZj3uU Joint work with @AsadiKavosh, Shoham, and @yaoliucs. 2/n