BoxingBytes Profile
BoxingBytes

@BoxingBytes

Followers
7
Following
61
Media
11
Statuses
87

Joined January 2024
@BoxingBytes
BoxingBytes
19 hours
I love wasting the only 90 min a day I have on debugging dumb shit 🙃.
0
0
0
@BoxingBytes
BoxingBytes
2 days
On-policy METRA built on top of pufferlib. Pure skill discovery. Trained with 4 discrete skills in the convert_circle env. The env's action space is multi-discrete and obs are 28-dim. Running experiments on harder envs (non-locomotion-based & partially observable). From @seohong_park & @_oleh, @svlevine
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
0
2
@BoxingBytes
BoxingBytes
11 days
Implementing LSD from @seohong_park. Seems to work well on tasks where exploration is "coordinate"-distance based, but not so much on partial one-hot obs. Going to dive deeper into this, maybe switch to discrete skills?
0
0
0
@BoxingBytes
BoxingBytes
2 months
0
2
0
@BoxingBytes
BoxingBytes
2 months
0
6
0
@BoxingBytes
BoxingBytes
2 months
It's so fun, and it always amazes me to watch such videos.
0
0
1
@BoxingBytes
BoxingBytes
2 months
0
6
0
@BoxingBytes
BoxingBytes
3 months
I think next-gen exploration in RL has to be a function of your goals somehow.
0
0
1
@BoxingBytes
BoxingBytes
3 months
What are the limitations of exploration in RL? Most count-based methods push the agent to discover new states, but that's not what you want in every env. Dynamics-based exploration is useless in very simple envs where the env dynamics are easy to compute.
1
0
1
@BoxingBytes
BoxingBytes
4 months
That doesn't matter as long as your model ENCODES A & B differently, but what if it doesn't? That's why it's probably worth taking some time to look at how the encoding is built; it can probably make a huge difference.
0
0
1
@BoxingBytes
BoxingBytes
4 months
Example: A: You're approaching a green light at a busy intersection, no pedestrians in sight. B: You're approaching a green light at the same intersection, but now a pedestrian is about to cross the street. C: You’re driving on a straight empty highway, far from any intersection.
1
0
0
@BoxingBytes
BoxingBytes
4 months
There are scenarios where you might actually want to act very differently between A and B, but very similarly between A and C. In this case the encoding might fuck you up: your model will always act similarly between A and B and very differently in C.
1
0
0
@BoxingBytes
BoxingBytes
4 months
Let's say you have 3 points in high-dimensional data, A, B and C, and you can measure some form of distance between them. A & B are very close to each other, and C is farther away.
1
0
0
@BoxingBytes
BoxingBytes
5 months
Nothing here, it all just converges. I guess puffer is too good.
Tweet media one
0
0
0
@BoxingBytes
BoxingBytes
5 months
We can better understand what's going on by looking at the average food amount in the envs. Same pattern.
Tweet media one
1
0
0
@BoxingBytes
BoxingBytes
5 months
Experiments where agents die & respawn immediately, with envs truncated after 2k timesteps. Agents clearly learn to take all the food, then keep dying until the env resets. At least they learn fast. Let's see if sweeping gamma & lambda helps them improve.
Tweet media one
1
0
0
@BoxingBytes
BoxingBytes
5 months
Solving Ax=b can be framed as an optimization problem. It's useful to think of it this way, because it means we can use optimization techniques.
0
0
0
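One common way to make the Ax=b remark concrete is the least-squares formulation. This is a sketch under the assumption that A is a real matrix; the step size \alpha and the iterate index k are notation introduced here, not from the tweet.

```latex
\[
  x^\star = \arg\min_{x} f(x),
  \qquad
  f(x) = \tfrac{1}{2}\,\lVert Ax - b \rVert_2^2,
  \qquad
  \nabla f(x) = A^\top (Ax - b).
\]
% Setting the gradient to zero gives the normal equations:
\[
  A^\top A\, x = A^\top b .
\]
% Plain gradient descent on f, with step size \alpha:
\[
  x_{k+1} = x_k - \alpha\, A^\top (A x_k - b),
  \qquad
  0 < \alpha < \frac{2}{\sigma_{\max}(A)^2}.
\]
```

If A is square and invertible, the unique minimizer is exactly the solution of Ax=b. Otherwise the same objective still yields a least-squares solution, which is part of why the optimization view is handy.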
@BoxingBytes
BoxingBytes
5 months
Never use memset on float or int arrays unless you're setting them to 0. memset in C does byte-level assignment, so memset(a, 1, n) makes each int 0x01010101 instead of 1, and filling a float array with -1 gives NaN, not -1.0.
0
0
0
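A small sketch of the memset pitfall above. The helper names (int_after_memset, float_is_nan_after_memset, fill_float) are hypothetical, and the code assumes IEEE 754 floats, which is the case on essentially all current platforms.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the value each 32-bit int holds after memset fills the
 * array with `byte`. memset copies (unsigned char)byte into every
 * byte, so the int ends up as that byte repeated four times. */
int32_t int_after_memset(int byte) {
    int32_t arr[4];
    memset(arr, byte, sizeof arr);
    return arr[0];
}

/* Returns 1 if a float whose bytes were all set to 0xFF is NaN.
 * Assumes IEEE 754: all-ones exponent with a nonzero mantissa.
 * NaN is the only value that compares unequal to itself. */
int float_is_nan_after_memset(void) {
    volatile float arr[4];
    memset((void *)arr, -1, sizeof arr);
    return arr[0] != arr[0];
}

/* The safe alternative: assign element by element. */
void fill_float(float *dst, size_t n, float value) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = value;
    }
}
```

Under these assumptions, int_after_memset(1) gives 16843009 (0x01010101) rather than 1, while 0 works for both types because an all-zero bit pattern is 0 and 0.0f. An explicit loop like fill_float is the portable way to set any other value.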
@BoxingBytes
BoxingBytes
5 months
It's crazy how much time you can waste watching your sweeps run, doing absolutely nothing for hours but staring at charts printing on the screen.
0
0
0