Oswin So @ NeurIPS (Dec 2 - Dec 7)

@oswinso

Followers 172
Following 119
Media 9
Statuses 48

Graduate Researcher with Chuchu Fan at MIT @mit_REALM. Bringing Guarantees to Safe Reinforcement Learning 🇭🇰

Cambridge, Massachusetts
Joined April 2013
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
17 days
At #NeurIPS from Dec 2 to Dec 7 in San Diego! Looking forward to catching up and meeting new friends. Excited to chat about safety for robotics, constraint satisfaction in RL, and (stochastic) optimal control. Feel free to DM me to grab coffee or have a chat!
0
0
5
@YijieIsabelLiu
Isabel Liu
1 month
Robots can plan, but rarely improvise. How do we move beyond pick-and-place to multi-object, improvisational manipulation without giving up completeness guarantees? We introduce Shortcut Learning for Abstract Planning (SLAP), a new method that uses reinforcement learning (RL) to
1
20
65
@huihan_liu
Huihan Liu
6 months
Meet Casper👻, a friendly robot sidekick who shadows your day, decodes your intent on the fly, and lends a hand while you stay in control! Instead of passively receiving commands, what if a robot actively senses what you need in the background and steps in when confident? (1/n)
6
39
164
@RoboticsSciSys
Robotics: Science and Systems
6 months
🏆 Huge congratulations to the #RSS2025 Award Winners! https://t.co/BJMqRQzQQG
1
10
78
@Almost_Sure
Almost Sure
2 years
#almostsure blog post: On the integral ∫I(W ≥ 0) dW. This looks at the mentioned integral, which displays properties particular to stochastic integration and which may seem counter-intuitive. https://t.co/gRRyvtNg2O
almostsuremath.com
In this post I look at the integral Xt = ∫0t 1{W≥0} dW for standard Brownian motion W. This is a particularly interesting example of stochastic integration with connections to local times, option p…
2
7
52
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
Using intuition from the discrete case ("Xᵤ downcrosses 0 when Wᵤ also downcrosses 0"), such a u should exist. However, I have no idea whether this holds in the continuous limit... Numerical simulations show that u exists, but I feel like this is due to numerical error?
0
0
0
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
Suppose now that Xₜ is started from ε: Xₜ ≔ ε + ∫₀ᵗ 1{Wₛ >= 0} dWₛ Since Xₜ = ε + max(W_t,0) - ½ L(t) and L(t) strictly increases only when Wₜ=0, does there exist a time u such that Xᵤ ≥ Wᵤ AND Xᵤ < 0? https://t.co/1aPExUdxuR
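A minimal numerical sketch (not part of the thread) of this question, assuming NumPy; the step size, horizon, initial ε, and seed are arbitrary choices, and any time u found this way could be a discretization artifact rather than a genuine hit.

```python
# Hedged sketch: simulate X_t = eps + ∫_0^t 1{W_s >= 0} dW_s on one long path using
# left-point Itô sums, then search for a time u with X_u >= W_u and X_u < 0.
# Discretization can create or hide such times, so this is evidence, not a proof.
import numpy as np

rng = np.random.default_rng(0)
eps, T, n_steps = 0.1, 50.0, 1_000_000
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
W = np.concatenate(([0.0], np.cumsum(dW)))                           # W_0 = 0
X = eps + np.concatenate(([0.0], np.cumsum((W[:-1] >= 0.0) * dW)))   # indicator at left endpoints

hits = np.flatnonzero((X >= W) & (X < 0.0))
print("first candidate u ≈", hits[0] * dt if hits.size else "none found on this path")
```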
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
More observations and questions on the following stochastic integral: Xₜ ≔ ∫₀ᵗ 1{Wₛ >= 0} dWₛ Numerically simulating this does confirm that E[Xₜ]=0 and Xₜ does go negative. What I did not expect, however, is the distribution of Xₜ to look the way it does.
2
1
15
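For reference, a Monte-Carlo sketch (not from the thread) of the simulation described above, assuming NumPy; the horizon, step count, path count, and seed are arbitrary. A histogram of the resulting samples of Xₜ can then be plotted to inspect the distribution in question.

```python
# Hedged Monte-Carlo sketch of X_T = ∫_0^T 1{W_s >= 0} dW_s using left-point Itô sums:
# the sample mean should be near 0 (martingale property), and a sizable fraction of
# paths should end with X_T < 0, matching the observations in the thread.
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 1_000, 20_000
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W_left = np.cumsum(dW, axis=1) - dW        # Brownian path at the left endpoint of each step
X_T = np.sum((W_left >= 0.0) * dW, axis=1)

print("mean of X_T      ≈", X_T.mean())    # close to 0
print("fraction X_T < 0 ≈", (X_T < 0.0).mean())
```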
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
Small question about Ito integrals: Consider Xₜ ≔ ∫₀ᵗ 1{Wₛ >= 0} dWₛ where Wₜ is a Brownian Motion and 1 is the indicator. Xₜ is a martingale, so E[Xₜ] = 0. I would think that Xₜ is non-negative, but that doesn't seem to be true?
5
6
94
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
I realized there's a typo: I meant to put 1{Xₛ >= 0} instead of 1{Wₛ >= 0}. That changes the question significantly though.
1
1
6
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
Small question about Ito integrals: Consider Xₜ ≔ ∫₀ᵗ 1{Wₛ >= 0} dWₛ where Wₜ is a Brownian Motion and 1 is the indicator. Xₜ is a martingale, so E[Xₜ] = 0. I would think that Xₜ is non-negative, but that doesn't seem to be true?
4
3
32
@guanhorng_liu
Guan-Horng Liu
2 years
Momentum Schrödinger Bridge is a nice framework for multi-marginal distribution matching (e.g. population modeling) that overcomes the stiff trajectories induced by most pair-wise distribution matching methods. Fun project with @iamct_r @MoleiTaoMath & Evangelos 🌉🌝
@iamct_r
Tianrong Chen 陈天荣
2 years
😀 #NeurIPS2023 Introducing our work #DMSB ( https://t.co/RO6UhEqALi)! #DMSB is an extension of Schrödinger Bridge algorithm ( https://t.co/Lk5bHjqoOy and https://t.co/WiO8mXcHL1) in phase space to tackle trajectory inference task!
0
7
37
@bremen79
Francesco Orabona
2 years
New blog post: Yet Another ICML Award Fiasco. The story of the @icmlconf 2023 Outstanding Paper Award to the D-Adaptation paper, with worse results than the ones from 9 years ago. Please share it to start a needed conversation on mistakenly granted awards https://t.co/pIIl7BDBlX
parameterfree.com
Disclaimer: I deliberated extensively on whether writing this blog post was a good idea. Some kind of action was necessary because this award was just too unfair. I consulted with many senior peopl…
17
99
478
@MoleiTaoMath
Molei Tao
2 years
What is variational optimization? Why can continuous dynamics help? Optimization is already a profound field, what can it bring in? Check out blog https://t.co/RowKS3daUE Comment/Retweet/Like will be deeply appreciated! 1/6
itsdynamical.github.io
TL; DR Gradient Descent (GD) is one of the most popular optimization algorithms for machine learning, and momentum is often used to accelerate its convergence. In this blog, we will start with a...
1
42
159
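A tiny illustrative sketch (not from the blog) of the momentum idea the teaser mentions, comparing plain gradient descent with heavy-ball momentum on an ill-conditioned quadratic; the objective and hyperparameters are arbitrary choices.

```python
# Illustrative only: plain GD vs. heavy-ball momentum on f(x) = 0.5 * x^T diag(1, 100) x.
# The hyperparameters are hand-picked for this toy problem; momentum converges much
# faster on the poorly conditioned (slow) coordinate.
import numpy as np

H = np.diag([1.0, 100.0])
grad = lambda x: H @ x

def run(momentum, lr=0.009, steps=200):
    x, v = np.array([1.0, 1.0]), np.zeros(2)
    for _ in range(steps):
        v = momentum * v - lr * grad(x)   # momentum = 0.0 recovers plain GD
        x = x + v
    return 0.5 * x @ H @ x                # final objective value

print("plain GD   :", run(momentum=0.0))
print("heavy ball :", run(momentum=0.9))
```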
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
I'm excited to be presenting this work at #RSS2023 in Daegu at the Controls & Dynamics session that starts at 1:30 pm tomorrow. Hope to see you there! (8/8)
0
0
0
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
We test EFPPO in simulation on challenging underactuated systems such as the "Top Gun: Maverick"-inspired F16 fighter jet, and find up to ten-fold improvements in stability performance compared to baseline methods. (7/8)
1
0
0
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
However, the "cost structure" of the problem now changes. We prove a policy gradient theorem for this new cost structure and combine it with the improvements from PPO, resulting in the *Epigraph Form PPO* (EFPPO) algorithm. (6/8)
1
0
0
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
Instead, we propose using the #epigraph_form, another technique for tackling constraints in constrained optimization. This introduces a scalar "cost budget" variable z. Importantly, the policy gradients do not scale linearly in this new variable, improving stability. (5/8)
1
0
0
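For context, the standard epigraph reformulation from constrained optimization, written generically; the notation below is illustrative and not necessarily the paper's exact statement for the RL setting.

```latex
% Generic epigraph reformulation: the constrained problem on the left becomes a
% bilevel problem over the scalar budget z on the right.
\min_{x} f(x)\ \ \text{s.t.}\ \ g(x) \le 0
\quad\Longleftrightarrow\quad
\min_{z \in \mathbb{R}}\ z
\quad \text{s.t.}\quad \min_{x}\,\max\{\, f(x) - z,\ g(x) \,\} \le 0 .
```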
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
While this works for "soft constraints", it leads to unstable optimization when we want safety constraints to always hold. In this setting, the Lagrange multipliers monotonically increase as long as safety constraints do not hold, destabilizing training. (4/8)
1
0
0
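A toy sketch (assumed notation, not the paper's algorithm) of the behaviour described above: under dual ascent, the projected multiplier update can only push λ upward while the constraint stays violated.

```python
# Toy dual-ascent illustration with assumed notation: L(theta, lam) = J(theta) + lam * C(theta),
# where C(theta) > 0 means the safety constraint is violated. While violations persist,
# lam <- max(0, lam + eta * C(theta)) is monotonically increasing, which is the growth
# that destabilizes training.
lam, eta = 0.0, 0.1

def constraint_violation(theta):
    # Hypothetical placeholder: pretend the policy stays slightly unsafe at every step.
    return 0.05

theta = None  # stands in for policy parameters; the primal (policy) update is omitted here
for step in range(10):
    lam = max(0.0, lam + eta * constraint_violation(theta))
    print(f"step {step}: lambda = {lam:.2f}")  # strictly increasing while violated
```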
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
To solve constrained problems, we typically introduce a Lagrange multiplier and then solve a minimax problem. In the constrained MDP literature, we can use reinforcement learning to solve this minimax problem. (3/8)
1
0
1
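In assumed notation (J the objective, C ≤ 0 the safety constraint, λ ≥ 0 the multiplier), the generic Lagrangian relaxation referred to here is:

```latex
% Generic Lagrangian relaxation of a constrained policy-optimization problem
% (assumed notation): the constrained problem becomes an unconstrained minimax.
\min_{\pi}\ J(\pi)\ \ \text{s.t.}\ \ C(\pi) \le 0
\qquad\longrightarrow\qquad
\min_{\pi}\ \max_{\lambda \ge 0}\ \big[\, J(\pi) + \lambda\, C(\pi) \,\big].
```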
@oswinso
Oswin So @ NeurIPS (Dec 2 - Dec 7)
2 years
Many problems in robotics have a stability objective and a safety objective. To find a policy that satisfies both requirements, we can solve an infinite-horizon constrained optimal control problem, given some technical assumptions. (2/8)
1
0
0
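One generic way to write such an infinite-horizon constrained optimal control problem, in assumed notation; the paper's precise cost and constraint structure may differ.

```latex
% Generic infinite-horizon constrained optimal control problem (assumed notation):
% minimize a cumulative stability cost subject to a safety constraint at every step.
\begin{aligned}
\min_{\pi}\quad & \mathbb{E}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\, c\big(x_t, \pi(x_t)\big)\Big] \\
\text{s.t.}\quad & h(x_t) \le 0 \ \ \text{for all } t \ge 0, \qquad x_{t+1} = f\big(x_t, \pi(x_t)\big).
\end{aligned}
```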