BoxingBytes @BoxingBytes X Profile

BoxingBytes

@BoxingBytes

Followers: 7

Following: 61

Media: 11

Statuses: 87

Joined January 2024

BoxingBytes

@BoxingBytes

19 hours

I love wasting the only 90 min a day I have debugging dumb shit 🙃.

0

BoxingBytes

@BoxingBytes

2 days

On-policy METRA built on top of pufferlib. Pure skill discovery. Trained on 4 discrete skills in the convert_circle. Env action space is multi-discrete and obs are 28-dim. Running experiments on harder envs (non-locomotion-based & partially observable). From @seohong_park, @_oleh & @svlevine

1

0

2

BoxingBytes

@BoxingBytes

5 days

Start building RL with puffer on GPU for brookies today:

boxingbytes.github.io

Reinforcement Learning is hard, and most environment setups are wonky, slow, too expensive to run, or can only run a handful of environments.

0

5

15

BoxingBytes

@BoxingBytes

11 days

Implementing LSD from @seohong_park. Seems to work well on tasks where exploration is "coordinate"-distance based, but not as well on partial one-hot obs. Going to dive deeper into this, maybe switch to discrete skills?

0

BoxingBytes

@BoxingBytes

2 months

RT @spenccheng:

0

2

0

BoxingBytes

@BoxingBytes

2 months

RT @jsuarez5341:

0

6

0

BoxingBytes

@BoxingBytes

2 months

It's so fun, and it always amazes me to watch such videos

0

1

BoxingBytes

@BoxingBytes

2 months

RT @spenccheng:

0

6

0

BoxingBytes

@BoxingBytes

3 months

I think next-gen exploration in RL has to be a function of your goals somehow.

0

1

BoxingBytes

@BoxingBytes

3 months

What are the limitations of exploration in RL? Most count-based methods push the agent to discover new states, but that's not what you should do in all envs. Dynamics-based exploration is useless in very simple envs where the env dynamics are easy to compute.

1

0

1

BoxingBytes

@BoxingBytes

4 months

That doesn't matter as long as your model ENCODES A & B differently, but what if it doesn't? That's why it's probably worth taking some time to look at how the encoding is made: it can prob make a huge difference.

0

1

BoxingBytes

@BoxingBytes

4 months

Example: A: You're approaching a green light at a busy intersection, no pedestrians in sight. B: You're approaching a green light at the same intersection, but now a pedestrian is about to cross the street. C: You’re driving on a straight empty highway, far from any intersection.

1

0

BoxingBytes

@BoxingBytes

4 months

There are scenarios where you might actually want to act very differently between A and B, but very similarly between A and C. In this case encoding might fuck you up and your model will always act similarly between A and B and very differently in C.

1

0

BoxingBytes

@BoxingBytes

4 months

Let's say you have 3 points in high-dimensional data where you can look at some form of distance between them: A, B and C. A & B are very close to each other, and C is farther away.

1

0

BoxingBytes

@BoxingBytes

5 months

Nothing here, it all just converges. I guess puffer is too good

0

BoxingBytes

@BoxingBytes

5 months

We can better understand what's going on by looking at the average food amount in the envs. Same pattern

1

0

BoxingBytes

@BoxingBytes

5 months

Experiments where agents die & respawn immediately, with envs truncated after 2k timesteps. Agents clearly learn to take all the food, then keep dying until the env resets. At least, they learn fast. Let's see if sweeping gamma & lambda helps them improve.

1

0

BoxingBytes

@BoxingBytes

5 months

Solving Ax=b is an optimization problem. Useful to think of it this way: it means we can use optimization techniques.

0
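One common way to make the Ax=b post concrete (a sketch, not necessarily the formulation the post had in mind) is to recast the linear system as a least-squares problem and apply gradient descent. The step size \alpha is an assumed symbol that does not appear in the post; the iteration converges when 0 < \alpha < 2 / \sigma_{\max}(A)^2, and if A is invertible the unique minimizer is the solution of Ax=b.

```latex
\min_{x} \; f(x) = \tfrac{1}{2}\,\lVert Ax - b \rVert_2^2,
\qquad
\nabla f(x) = A^{\top}(Ax - b),
\qquad
x_{k+1} = x_k - \alpha\, A^{\top}(A x_k - b).
```

When A is symmetric positive definite, minimizing f(x) = \tfrac{1}{2} x^{\top} A x - b^{\top} x is an alternative, since its gradient is Ax - b. That is the view behind the conjugate gradient method.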

BoxingBytes

@BoxingBytes

5 months

Never use memset for floats or ints if not setting to 0. memset in C does byte-level assignment, so setting all bytes of a float to -1 doesn’t have the desired effect.

0
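A minimal C sketch of the memset pitfall, assuming IEEE 754 floats and 4-byte ints. The function names are hypothetical, chosen only for this illustration. Filling a float buffer with byte value -1 writes 0xFF into every byte, which is a NaN bit pattern rather than -1.0f. On two's-complement machines an int made of 0xFF bytes happens to read as -1, so that one case looks like it works, but any other non-zero value, such as 1, does not.

```c
#include <stddef.h>
#include <string.h>

/* Byte-level fill: every byte of the float buffer becomes 0xFF.
   Under IEEE 754 (assumed) each element is then NaN, not -1.0f. */
void fill_bytes_minus_one(float *buf, size_t n) {
    memset(buf, -1, n * sizeof *buf);
}

/* Element-level fill: every float is assigned the requested value. */
void fill_floats(float *buf, size_t n, float value) {
    for (size_t i = 0; i < n; ++i) {
        buf[i] = value;
    }
}

/* Byte-level fill of an int buffer with byte value 1.
   With 4-byte ints (assumed) each element is 0x01010101, not 1. */
void fill_int_bytes_one(int *buf, size_t n) {
    memset(buf, 1, n * sizeof *buf);
}
```

For anything other than zero, a plain loop like fill_floats is the safe choice.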

BoxingBytes

@BoxingBytes

5 months

It's crazy how much time you can waste watching your sweeps run while doing absolutely nothing for hours, staring at charts printing on the screen.

0