Oswin So @ NeurIPS (Dec 2 - Dec 7)

@oswinso

Followers 172
Following 119
Media 9
Statuses 48

Graduate Researcher with Chuchu Fan at MIT @mit_REALM. Bringing Guarantees to Safe Reinforcement Learning 🇭🇰

Cambridge, Massachusetts
Joined April 2013
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
17 days
At #NeurIPS from Dec 2 to Dec 7 in San Diego! Looking forward to catching up and meeting new friends. Excited to chat about safety for robotics, constraint satisfaction in RL, and (stochastic) optimal control. Feel free to DM me to grab coffee or have a chat!
0
0
5
@YijieIsabelLiu
Isabel Liu
1 month
Robots can plan, but rarely improvise. How do we move beyond pick-and-place to multi-object, improvisational manipulation without giving up completeness guarantees? We introduce Shortcut Learning for Abstract Planning (SLAP), a new method that uses reinforcement learning (RL) to
1
20
65
@huihan_liu
Huihan Liu
6 months
Meet Casper👻, a friendly robot sidekick who shadows your day, decodes your intent on the fly, and lends a hand while you stay in control! Instead of passively receiving commands, what if a robot actively senses what you need in the background and steps in when confident? (1/n)
6
39
164
@RoboticsSciSys
Robotics: Science and Systems
6 months
🏆 Huge congratulations to the #RSS2025 Award Winners! https://t.co/BJMqRQzQQG
1
10
78
@Almost_Sure
Almost Sure
2 years
#almostsure blog post: On the integral ∫I(W ≥ 0) dW. This looks at the mentioned integral, which displays properties particular to stochastic integration and which may seem counter-intuitive. https://t.co/gRRyvtNg2O
almostsuremath.com
In this post I look at the integral Xt = ∫0t 1{W≥0} dW for standard Brownian motion W. This is a particularly interesting example of stochastic integration with connections to local times, option p…
2
7
52
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
Using intuition from the discrete case ("Xᵤ downcrosses 0 when Wᵤ also downcrosses 0"), such a u should exist. However, I have no idea whether this holds in the continuous limit... Numerical simulations show that u exists, but I feel like this is due to numerical error?
0
0
0
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
Suppose now that Xₜ is started from ε: Xₜ ≔ ε + ∫₀ᵗ 1{Wₛ >= 0} dWₛ Since Xₜ = ε + max(W_t,0) - ½ L(t) and L(t) strictly increases only when Wₜ=0, does there exist a time u such that Xᵤ ≥ Wᵤ AND Xᵤ < 0? https://t.co/1aPExUdxuR
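A minimal numerical sketch (not part of the thread) of this question, assuming NumPy; the step size, horizon, initial ε, and seed are arbitrary choices, and any time u found this way could be a discretization artifact rather than a genuine hit.

```python
# Hedged sketch: simulate X_t = eps + ∫_0^t 1{W_s >= 0} dW_s on one long path using
# left-point Itô sums, then search for a time u with X_u >= W_u and X_u < 0.
# Discretization can create or hide such times, so this is evidence, not a proof.
import numpy as np

rng = np.random.default_rng(0)
eps, T, n_steps = 0.1, 50.0, 1_000_000
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
W = np.concatenate(([0.0], np.cumsum(dW)))                           # W_0 = 0
X = eps + np.concatenate(([0.0], np.cumsum((W[:-1] >= 0.0) * dW)))   # indicator at left endpoints

hits = np.flatnonzero((X >= W) & (X < 0.0))
print("first candidate u ≈", hits[0] * dt if hits.size else "none found on this path")
```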
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
More observations and questions on the following stochastic integral: Xₜ ≔ ∫₀ᵗ 1{Wₛ >= 0} dWₛ Numerically simulating this does confirm that E[Xₜ]=0 and Xₜ does go negative. What I did not expect, however, is the distribution of Xₜ to look the way it does.
2
1
15
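For reference, a Monte-Carlo sketch (not from the thread) of the simulation described above, assuming NumPy; the horizon, step count, path count, and seed are arbitrary. A histogram of the resulting samples of Xₜ can then be plotted to inspect the distribution in question.

```python
# Hedged Monte-Carlo sketch of X_T = ∫_0^T 1{W_s >= 0} dW_s using left-point Itô sums:
# the sample mean should be near 0 (martingale property), and a sizable fraction of
# paths should end with X_T < 0, matching the observations in the thread.
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 1_000, 20_000
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W_left = np.cumsum(dW, axis=1) - dW        # Brownian path at the left endpoint of each step
X_T = np.sum((W_left >= 0.0) * dW, axis=1)

print("mean of X_T      ≈", X_T.mean())    # close to 0
print("fraction X_T < 0 ≈", (X_T < 0.0).mean())
```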
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
Small question about Ito integrals: Consider Xₜ ≔ ∫₀ᵗ 1{Wₛ >= 0} dWₛ where Wₜ is a Brownian Motion and 1 is the indicator. Xₜ is a martingale, so E[Xₜ] = 0. I would think that Xₜ is non-negative, but that doesn't seem to be true?
5
6
94
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
I realized there's a typo: I meant to put 1{Xₛ >= 0} instead of 1{Wₛ >= 0}. That changes the question significantly though.
1
1
6
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
Small question about Ito integrals: Consider Xₜ ≔ ∫₀ᵗ 1{Wₛ >= 0} dWₛ where Wₜ is a Brownian Motion and 1 is the indicator. Xₜ is a martingale, so E[Xₜ] = 0. I would think that Xₜ is non-negative, but that doesn't seem to be true?
4
3
32
@guanhorng_liu
Guan-Horng Liu
2 years
Momentum Schrödinger Bridge is a nice framework for multi-marginal distribution matching (e.g. population modeling) that overcomes the stiff trajectories induced by most pair-wise distribution matching methods. Fun project with @iamct_r @MoleiTaoMath & Evangelos 🌉🌝
@iamct_r
Tianrong Chen 陈天荣
2 years
😀 #NeurIPS2023 Introducing our work #DMSB ( https://t.co/RO6UhEqALi)! #DMSB is an extension of Schrödinger Bridge algorithm ( https://t.co/Lk5bHjqoOy and https://t.co/WiO8mXcHL1) in phase space to tackle trajectory inference task!
0
7
37
@bremen79
Francesco Orabona
2 years
New blog post: Yet Another ICML Award Fiasco. The story of the @icmlconf 2023 Outstanding Paper Award to the D-Adaptation paper, with worse results than the ones from 9 years ago. Please share it to start a needed conversation on mistakenly granted awards https://t.co/pIIl7BDBlX
parameterfree.com
Disclaimer: I deliberated extensively on whether writing this blog post was a good idea. Some kind of action was necessary because this award was just too unfair. I consulted with many senior peopl…
17
99
478
@MoleiTaoMath
Molei Tao
2 years
What is variational optimization? Why can continuous dynamics help? Optimization is already a profound field, what can it bring in? Check out blog https://t.co/RowKS3daUE Comment/Retweet/Like will be deeply appreciated! 1/6
itsdynamical.github.io
TL; DR Gradient Descent (GD) is one of the most popular optimization algorithms for machine learning, and momentum is often used to accelerate its convergence. In this blog, we will start with a...
1
42
159
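A tiny illustrative sketch (not from the blog) of the momentum idea the teaser mentions, comparing plain gradient descent with heavy-ball momentum on an ill-conditioned quadratic; the objective and hyperparameters are arbitrary choices.

```python
# Illustrative only: plain GD vs. heavy-ball momentum on f(x) = 0.5 * x^T diag(1, 100) x.
# The hyperparameters are hand-picked for this toy problem; momentum converges much
# faster on the poorly conditioned (slow) coordinate.
import numpy as np

H = np.diag([1.0, 100.0])
grad = lambda x: H @ x

def run(momentum, lr=0.009, steps=200):
    x, v = np.array([1.0, 1.0]), np.zeros(2)
    for _ in range(steps):
        v = momentum * v - lr * grad(x)   # momentum = 0.0 recovers plain GD
        x = x + v
    return 0.5 * x @ H @ x                # final objective value

print("plain GD   :", run(momentum=0.0))
print("heavy ball :", run(momentum=0.9))
```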
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
I'm excited to be presenting this work at #RSS2023 in Daegu at the Controls & Dynamics session that starts at 1:30 pm tomorrow. Hope to see you there! (8/8)
0
0
0
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
We test EFPPO in simulation on challenging underactuated systems such as the "Top Gun: Maverick"-inspired F16 fighter jet, and find up to ten-fold improvements in stability performance compared to baseline methods. (7/8)
1
0
0
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
However, the "cost structure" of the problem now changes. We prove a policy gradient theorem for this new cost structure and combine it with the improvements from PPO, resulting in the *Epigraph Form PPO* (EFPPO) algorithm. (6/8)
1
0
0
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
Instead, we propose using the #epigraph_form, another technique for tackling constraints in constrained optimization. This introduces a scalar "cost budget" variable z. Importantly, the policy gradients do not scale linearly in this new variable, improving stability. (5/8)
1
0
0
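For context, the standard epigraph reformulation from constrained optimization, written generically; the notation below is illustrative and not necessarily the paper's exact statement for the RL setting.

```latex
% Generic epigraph reformulation: the constrained problem on the left becomes a
% bilevel problem over the scalar budget z on the right.
\min_{x} f(x)\ \ \text{s.t.}\ \ g(x) \le 0
\quad\Longleftrightarrow\quad
\min_{z \in \mathbb{R}}\ z
\quad \text{s.t.}\quad \min_{x}\,\max\{\, f(x) - z,\ g(x) \,\} \le 0 .
```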
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
While this works for "soft constraints", it leads to unstable optimization when we want safety constraints to always hold. In this setting, the Lagrange multipliers monotonically increase as long as safety constraints do not hold, destabilizing training. (4/8)
1
0
0
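A toy sketch (assumed notation, not the paper's algorithm) of the behaviour described above: under dual ascent, the projected multiplier update can only push λ upward while the constraint stays violated.

```python
# Toy dual-ascent illustration with assumed notation: L(theta, lam) = J(theta) + lam * C(theta),
# where C(theta) > 0 means the safety constraint is violated. While violations persist,
# lam <- max(0, lam + eta * C(theta)) is monotonically increasing, which is the growth
# that destabilizes training.
lam, eta = 0.0, 0.1

def constraint_violation(theta):
    # Hypothetical placeholder: pretend the policy stays slightly unsafe at every step.
    return 0.05

theta = None  # stands in for policy parameters; the primal (policy) update is omitted here
for step in range(10):
    lam = max(0.0, lam + eta * constraint_violation(theta))
    print(f"step {step}: lambda = {lam:.2f}")  # strictly increasing while violated
```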
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
To solve constrained problems, we typically introduce a Lagrange multiplier and then solve a minimax problem. In the constrained MDP literature, we can use reinforcement learning to solve this minimax problem. (3/8)
1
0
1
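In assumed notation (J the objective, C ≤ 0 the safety constraint, λ ≥ 0 the multiplier), the generic Lagrangian relaxation referred to here is:

```latex
% Generic Lagrangian relaxation of a constrained policy-optimization problem
% (assumed notation): the constrained problem becomes an unconstrained minimax.
\min_{\pi}\ J(\pi)\ \ \text{s.t.}\ \ C(\pi) \le 0
\qquad\longrightarrow\qquad
\min_{\pi}\ \max_{\lambda \ge 0}\ \big[\, J(\pi) + \lambda\, C(\pi) \,\big].
```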
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
Many problems in robotics have a stability objective and a safety objective. To find a policy that satisfies both requirements, we can solve an infinite-horizon constrained optimal control problem, given some technical assumptions. (2/8)
1
0
0
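One generic way to write such an infinite-horizon constrained optimal control problem, in assumed notation; the paper's precise cost and constraint structure may differ.

```latex
% Generic infinite-horizon constrained optimal control problem (assumed notation):
% minimize a cumulative stability cost subject to a safety constraint at every step.
\begin{aligned}
\min_{\pi}\quad & \mathbb{E}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\, c\big(x_t, \pi(x_t)\big)\Big] \\
\text{s.t.}\quad & h(x_t) \le 0 \ \ \text{for all } t \ge 0, \qquad x_{t+1} = f\big(x_t, \pi(x_t)\big).
\end{aligned}
```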