めんだこ @horromary X Profile

めんだこ

@horromary

Followers

583

Following

282

Media

21

Statuses

192

I write technical articles on deep reinforcement learning.

Tokyo

Joined February 2016


めんだこ

@horromary

3 days

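Posted to Hatena Blog. Deep Reinforcement Learning Implemented with Jax/Flax NNX ②: Learning Robot-Dog Locomotion with PPO - どこから見てもメンダコ. As a summer-vacation independent research project, I ran massively parallel reinforcement learning on Unitree's robot dog. China's humanoid robot games happen to be drawing big crowds right now, too. #はてなブログ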

horomary.hatenablog.com

Implements PPO in Flax NNX to train a robot dog (Unitree Go1) to walk in a MuJoCo-XLA (MJX) environment. What is Jax/Flax NNX? / Massively Parallel Reinforcement Learning / Platforms for massively parallel reinforcement learning / MuJ…

0

7

63
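
For readers unfamiliar with PPO, the clipped surrogate objective at its core can be sketched in a few lines of JAX. This is a minimal illustration and not the code from the linked article: the function name `ppo_clip_loss`, its argument names, and the default `clip_eps=0.2` are assumptions made here.

```python
import jax.numpy as jnp


def ppo_clip_loss(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """Clipped PPO policy loss (hypothetical helper; clip_eps=0.2 is an assumed default)."""
    # Probability ratio between the updated policy and the policy that collected the data.
    ratio = jnp.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantage
    clipped = jnp.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Take the pessimistic (smaller) objective, then negate so it can be minimized.
    return -jnp.mean(jnp.minimum(unclipped, clipped))
```

Taking the minimum means the policy gets no extra credit for pushing the ratio outside the clip range in the direction the advantage favors, which keeps each update close to the data-collecting policy.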

めんだこ

@horromary

12 days

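RT @hillbig: I am publishing the second installment, "Neural Network Potentials," of the series "Advances in Computational Chemistry through AI," which has been running in 現代化学 (Gendai Kagaku) since April 2025. The link is in the thread.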

0

26

0

めんだこ

@horromary

13 days

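My from-scratch PPO finally got Go1 running. The initial implementation could barely stand, but three changes improved performance substantially: a TanhNormal policy with an entropy bonus, SiLU activations, and observation normalization with RunningStats. Continuous control really is hard.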

0

17
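
Of the three changes named in the post above, observation normalization is the easiest to show in isolation. The sketch below keeps a running mean and variance with a batched Welford-style merge. The post does not show its RunningStats code, so the names `RunningStats`, `init_stats`, `update_stats`, and `normalize`, along with the `eps` and `clip` defaults, are assumptions for illustration.

```python
from typing import NamedTuple

import jax.numpy as jnp


class RunningStats(NamedTuple):
    count: jnp.ndarray  # number of observations seen so far
    mean: jnp.ndarray   # running mean per observation dimension
    m2: jnp.ndarray     # running sum of squared deviations from the mean


def init_stats(obs_dim):
    return RunningStats(jnp.zeros(()), jnp.zeros(obs_dim), jnp.zeros(obs_dim))


def update_stats(stats, batch):
    """Merge a batch of shape (n, obs_dim) into the running statistics."""
    n = batch.shape[0]
    batch_mean = batch.mean(axis=0)
    batch_m2 = ((batch - batch_mean) ** 2).sum(axis=0)
    total = stats.count + n
    delta = batch_mean - stats.mean
    mean = stats.mean + delta * n / total
    m2 = stats.m2 + batch_m2 + delta**2 * stats.count * n / total
    return RunningStats(total, mean, m2)


def normalize(stats, obs, eps=1e-8, clip=10.0):
    """Standardize observations and clip outliers (eps and clip are assumed values)."""
    var = stats.m2 / jnp.maximum(stats.count, 1.0)
    return jnp.clip((obs - stats.mean) / jnp.sqrt(var + eps), -clip, clip)
```

Because the state is an immutable tuple of arrays, it can be carried through a jitted training loop like any other JAX pytree.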

めんだこ

@horromary

15 days

Implementing PPO for MuJoCo. Here is Go1 wobbling like a newborn fawn after I got the policy function wrong.

0

1

39

めんだこ

@horromary

17 days

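JAX's random number management was a hassle at first, but once you get used to it, it is very reassuring. I appreciate that even bugs caused by randomness can be reproduced exactly.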

0

4
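
The reproducibility described in the post above comes from JAX's explicit PRNG keys: randomness is a pure function of the key passed in. A minimal sketch follows, in which the function `noisy_rollout` and its `steps` parameter are hypothetical names chosen for this example.

```python
import jax
import jax.numpy as jnp


def noisy_rollout(key, steps=3):
    """Draw a short sequence of random vectors, splitting the key at every step."""
    samples = []
    for _ in range(steps):
        # Never reuse a key: split it, consume the subkey, carry the new key forward.
        key, subkey = jax.random.split(key)
        samples.append(jax.random.normal(subkey, (2,)))
    return jnp.stack(samples)


a = noisy_rollout(jax.random.PRNGKey(0))
b = noisy_rollout(jax.random.PRNGKey(0))  # same seed: identical samples
c = noisy_rollout(jax.random.PRNGKey(1))  # different seed: different samples
```

Since there is no hidden global generator state, rerunning with the same seed replays the same random draws, so a failure that depends on a particular draw can be reproduced.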

めんだこ

@horromary

17 days

The tanh(Normal) policy function involves far too much tacit knowledge.

0

2
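
One piece of that tacit knowledge is the change-of-variables correction to the log-probability after squashing with tanh, along with a numerically stable way to compute it. The sketch below is an illustration, not the author's implementation; the function name and argument names are assumptions.

```python
import jax
import jax.numpy as jnp


def tanh_normal_sample_and_log_prob(key, mean, log_std):
    """Sample a = tanh(u) with u ~ Normal(mean, exp(log_std)) and return log p(a)."""
    std = jnp.exp(log_std)
    # Reparameterized Gaussian sample in the unbounded pre-squash space.
    u = mean + std * jax.random.normal(key, mean.shape)
    action = jnp.tanh(u)
    # Log density of u under the diagonal Gaussian.
    log_prob_u = -0.5 * (((u - mean) / std) ** 2 + 2.0 * log_std + jnp.log(2.0 * jnp.pi))
    # log(1 - tanh(u)^2), rewritten to stay finite when |u| is large.
    log_det = 2.0 * (jnp.log(2.0) - u - jax.nn.softplus(-2.0 * u))
    # Change of variables: log p(a) = log p(u) - log|da/du|, summed over action dims.
    log_prob = jnp.sum(log_prob_u - log_det, axis=-1)
    return action, log_prob
```

The squashed distribution has no closed-form entropy, so an entropy bonus is commonly estimated from the negative log-probability of sampled actions. Whether the post's implementation does this is not stated.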

めんだこ

@horromary

24 days

Installing isaaclab is far too difficult; it won't even start.

0

5

めんだこ

@horromary

1 month

RT @MSFTResearch: Today in the journal Science: BioEmu from Microsoft Research AI for Science. This generative deep learning method emulate…

0

265

0

めんだこ

@horromary

1 month

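Posted to Hatena Blog. Deep Reinforcement Learning Implemented with Jax/Flax NNX: ① DQN (Atari/Breakout) - どこから見てもメンダコ. I implemented DQN with NNX, Flax's new API, which has moved to a PyTorch style and become easier to write. #はてなブログ #強化学習 #jax #flax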

horomary.hatenablog.com

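To get a feel for NNX, Flax's new API, which has moved to a PyTorch style and become easier to write, I implemented DQN for ALE/Breakout. What is Jax? ① The ease of use of NumPy ② Flexible automatic differentiation ③ Distributed parallel computing across multiple CPUs/GPUs/TPUs / What is Flax NNX? PyTor…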

0

2

17

めんだこ

@horromary

2 months

Implemented DQN with Jax/Flax.NNX. NNX combines the feel of writing PyTorch with the performance and scalability of Jax, which is nice.

0

7

めんだこ

@horromary

2 months

RT @KarlPertsch: We’re releasing the RoboArena today!🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist polici…

0

84

0

めんだこ

@horromary

2 months

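RT @hillbig: Investigated whether offline reinforcement learning (learning from previously collected data without interacting with the environment) can learn difficult tasks when given datasets 1,000 times larger than before…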

0

29

0

めんだこ

@horromary

3 months

RT @shizhediao: Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training….

0

66

0

めんだこ

@horromary

3 months

RT @svlevine: Classifier-free guidance is an RL policy improvement operator in (very thin) disguise! This makes it easier than ever to imp…

0

45

0

めんだこ

@horromary

3 months

RT @xuandongzhao: 🚀 Excited to share the most inspiring work I’ve been part of this year: "Learning to Reason without External Rewards"…

0

513

0

めんだこ

@horromary

3 months

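RT @MickeyKubo: I plan to start uploading to Speaker Deck the slides from talks that speakers previously gave at the Moai forum.

speakerdeck.com

Control Based on Mathematical Optimization: A Focus on Model Predictive Control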

0

34

0

めんだこ

@horromary

3 months

Posted to Hatena Blog. Sample-Efficient Reinforcement Learning ②: Latent World Model-Based Reinforcement Learning - どこから見てもメンダコ #はてなブログ

horomary.hatenablog.com

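Implements EfficientZeroV2, a sample-efficient successor to MuZero. Sample efficiency is the key to practical reinforcement learning / What is world model-based reinforcement learning? / Prerequisite method, MuZero: tree search in latent space / EfficientZeroV2: a MuZero derivative with everything included / Implementing EfficientZeroV2 / ① Gumbel-MCT…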

0

3

37

めんだこ

@horromary

3 months

Reimplementation of EfficientZeroV2 (Atari Breakout 100K). While I couldn’t fully reproduce it due to limited computational resources, it still delivered impressive performance with just 100K frames — one of the most sample-efficient RL methods.