
Kyle🤖🚀🦭
@KyleMorgenstein
Followers: 15K · Following: 196K · Media: 2K · Statuses: 35K
Full of childlike wonder. Teaching robots manners. UT Austin PhD candidate. 🆕 RL Intern @ Apptronik. Past: Boston Dynamics AI Institute, NASA JPL, MIT ‘20.
he/him
Joined September 2018
RT @EugeneVinitsky: Personally I would have trouble getting up in the morning if my job was "make sure the bot can be antisemitic" but that….
0 · 1 · 0
RT @EugeneVinitsky: If you work at xai, you can just quit. You can get a job almost anywhere. What on earth are you doing. .
0 · 40 · 0
RT @tom_jiahao: Introducing Muscle v0 -- infinite degrees of freedom, from @DaxoRobotics. A different mountain to climb - with a far more b….
0 · 72 · 0
RT @EugeneVinitsky: Still in stealth but our team has grown to 20 and we're still hiring. If you're interested in joining the research fron….
0 · 15 · 0
RT @kiwi_sherbet: Many roboticists focus on designing human-like hands, but we took a closer look at the fingers. Human fingers are soft, r….
0 · 12 · 0
unless your task is finite horizon! most learning libraries don’t differentiate, and most end users would never think about it unless they get really in the weeds with the math.
@jsuarez5341 it’s ultimately a question of how you define your state-value function for the critic; most RL texts present the definitions for both finite and infinite horizons, but most codebases don’t differentiate based on task (they should; it absolutely makes a difference). the two definitions are sketched below.
0 · 0 · 9
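for reference, a minimal sketch of the two textbook definitions being contrasted in the exchange above (standard notation, nothing specific to this thread): the finite-horizon value is indexed by time, while the infinite-horizon discounted value is stationary.

V_t^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{k=t}^{T} \gamma^{k-t}\, r(s_k, a_k) \,\middle|\, s_t = s \right] \quad \text{(finite horizon: depends on } t \text{, since only } T-t \text{ steps remain)}

V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k}\, r(s_{t+k}, a_{t+k}) \,\middle|\, s_t = s \right] \quad \text{(infinite horizon: stationary, the same function at every timestep)}

the practical consequence is how a time-limit cutoff is treated: under the infinite-horizon definition the task hasn’t actually ended there, so the critic target should bootstrap through the cutoff rather than drop to zero, which is exactly the bootstrapping discussed in the next tweet.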
here’s a great example: in many PPO implementations for robotics we use infinite-horizon value bootstrapping, because we derive the algo with a finite-horizon critic but then use it for infinite-horizon tasks like velocity tracking (a sketch of the bootstrapping follows below). this isn’t standard but helps, UNLESS.
@KyleMorgenstein State-based rewards that peak in the target state. This isn’t how reward works in the rest of RL, and it looks like robotics hacks around this with a modification to GAE, which also doesn’t work in the rest of RL.
3 · 1 · 11
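a minimal sketch of that timeout bootstrapping inside GAE, assuming the critic can evaluate the pre-reset observation at a truncation step (array and function names here are illustrative, not taken from any particular codebase): true terminations zero the bootstrap term, time-limit truncations keep it.

import numpy as np

def gae_with_timeout_bootstrap(rewards, values, next_values,
                               terminals, timeouts,
                               gamma=0.99, lam=0.95):
    # rewards, values, next_values, terminals, timeouts: 1-D float arrays over a
    # rollout of length T. `terminals` marks true task terminations (e.g. the
    # robot fell); `timeouts` marks episodes cut off only by the time limit.
    # `next_values` holds V(s_{t+1}), evaluated on the pre-reset observation
    # when the episode was truncated.
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # infinite-horizon treatment: a timeout is NOT a terminal state, so the
        # bootstrap term gamma * V(s_{t+1}) is only zeroed on true terminations.
        nonterminal = 1.0 - terminals[t]
        delta = rewards[t] + gamma * next_values[t] * nonterminal - values[t]
        # the recursion still resets at every episode boundary (terminal or
        # timeout), since the next buffer entry belongs to a different episode.
        nonboundary = 1.0 - max(terminals[t], timeouts[t])
        gae = delta + gamma * lam * nonboundary * gae
        advantages[t] = gae
    value_targets = advantages + values  # critic regression targets
    return advantages, value_targets

zeroing the bootstrap at timeouts too (the purely finite-horizon treatment) teaches the critic that value collapses at an arbitrary cutoff, which is the mismatch being pointed at for indefinite tasks like velocity tracking.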
when training RL policies for robotics, what are some common pitfalls people hit? what feels mysterious or hard to intuit? or what do you intuit but don’t have a better explanation for? starting to outline a blog more explicitly.
I wish there were a good venue to write/present about RL “tricks”: how PD gains affect action scale, how to tune reward functions, actor std, etc. there’s good intuition for all of it, grounded both in learning theory and robot dynamics, but I don’t often see good explanations (the PD-gain point is sketched below).
7 · 6 · 175
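as one concrete instance of the PD-gain point above, a small sketch with made-up numbers (kp, kd, action_scale are illustrative names, not from any specific codebase): with a low-level joint PD controller, a unit of policy action turns into roughly kp * action_scale worth of torque, so changing gains silently rescales the action space.

def pd_torque(action, q, qdot, q_default,
              kp=20.0, kd=0.5, action_scale=0.25):
    # the policy action is interpreted as an offset from the default joint position
    q_target = q_default + action_scale * action
    # low-level PD law: torque per unit of policy output is roughly kp * action_scale,
    # so doubling kp (or action_scale) doubles how hard a unit action pushes the robot
    return kp * (q_target - q) - kd * qdot

the actor’s exploration noise lives in the same units, so its std implicitly sets a torque perturbation of about kp * action_scale * std, which is part of the intuition being gestured at here.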
RT @robot_in_space2: A lot of people, especially practitioners, think they understand RL because they know PPO (or DDPG or SAC). I used to….
0 · 2 · 0