Daniel Palenicek

@DPalenicek

Followers: 324 · Following: 246 · Media: 14 · Statuses: 63

PhD Researcher in Robot #ReinforcementLearning 🤖🧠 at @ias_tudarmstadt and @Hessian_AI advised by @Jan_R_Peters. Former intern: @Bosch_AI and @Huawei R&D UK

Darmstadt, Germany
Joined July 2020
@DPalenicek
Daniel Palenicek
1 month
Beyond excited to share that CrossQ+WN has been accepted at #NeurIPS2025 @Hessian_AI @ias_tudarmstadt @CS_TUDarmstadt @DFKI @NeurIPSConf
@DPalenicek
Daniel Palenicek
5 months
🚀 New preprint "Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization"🤖 We propose CrossQ+WN, a simple yet powerful off-policy RL algorithm for improved sample efficiency and scalability to higher update-to-data ratios. 🧵 https://t.co/Z6QrMxZaPY #RL @ias_tudarmstadt
1
3
34
@Theo_Vincent_
Théo Vincent
3 days
Very happy to share that "iterated Q-Network (i-QN)" received a J2C Certification from TMLR🥳 What is i-QN about?👇 https://t.co/PhRg8xOA4R
@Theo_Vincent_
Théo Vincent
3 months
To increase reward propagation in value-based RL algorithms, it is tempting to reduce the target update period🤔 But this makes training unstable💔 At @RL_Conference, I will present i-QN, a new method that allows faster reward propagation while keeping training stable⚡️ 👉🧵
0
4
26
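For readers who want the gist in code: below is a rough, unofficial sketch of my reading of the idea, namely training several Q-networks so that each one regresses toward the Bellman target built from the previous iterate in the chain. All details here (a single frozen anchor network, MSE loss, discrete actions, no done-flag handling) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def chained_bellman_loss(q_nets, frozen_q0, obs, act, reward, next_obs, gamma=0.99):
    """Unofficial sketch: Q_k is regressed toward the Bellman target of Q_{k-1},
    so one gradient step propagates rewards through several Bellman iterations.
    Assumes discrete actions; done flags and target-network details are omitted."""
    loss = 0.0
    prev = frozen_q0                                 # Q_0: frozen anchor of the chain
    for q_net in q_nets:                             # Q_1, ..., Q_K
        with torch.no_grad():
            next_q = prev(next_obs).max(dim=-1).values
            target = reward + gamma * next_q         # Bellman target from previous iterate
        q_sa = q_net(obs).gather(1, act.unsqueeze(-1)).squeeze(-1)
        loss = loss + F.mse_loss(q_sa, target)
        prev = q_net                                 # the next iterate targets this one
    return loss
```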
@Theo_Vincent_
Théo Vincent
16 days
Deep RL is sometimes mysterious...🧐 @YogeshTrip7354 and I found 2 bugs in the implementation of BBF, and... SURPRISE, the performance of the corrected version is the same as the faulty version's! It is even worse for some games🤔 Details on the 2 bugs in🧵👇
3
6
39
@NicoBohlinger
Nico Bohlinger
24 days
⚡️ Can one unified policy control 10 million different robots and zero-shot transfer to completely unseen robots, even humanoids? 🔗 Yes! Check out our paper: https://t.co/fUikut5ZNx If you are interested in massive multi-embodiment learning, come and chat with me at: - Today,
28
92
535
@DPalenicek
Daniel Palenicek
23 days
Read the full preprint here: 👉 https://t.co/3xgo4tpbAK Code coming soon. We’d love feedback & discussion! 💬
0
0
3
@DPalenicek
Daniel Palenicek
23 days
Key takeaway: Well-conditioned optimization > raw scale. XQC shows that principled architectural choices can outperform larger, more complex ones.
1
0
3
@DPalenicek
Daniel Palenicek
23 days
📊 Results across 70 tasks (55 proprioception + 15 vision-based): ⚡️ Matches/outperforms SimbaV2, BRO, BRC, MRQ, and DRQ-V2 🌿~4.5× fewer parameters and 1/10 FLOP/s of SimbaV2 💪Especially strong on the hardest tasks: HumanoidBench, DMC Hard & DMC Humanoids from pixels
1
0
1
@DPalenicek
Daniel Palenicek
23 days
This leads to XQC, a streamlined extension of Soft Actor-Critic with ✅ only 4 hidden layers ✅ BN after each linear layer ✅ WN projection ✅ CE critic loss Simplicity + principled design = efficiency ⚡️
1
0
1
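To make the recipe above concrete, here is a minimal, unofficial sketch in PyTorch of a critic built along those lines. Layer widths, the number of value bins, and the weight-norm projection scheme are my own illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class SketchCritic(nn.Module):
    """Illustrative critic: 4 hidden Linear layers, BatchNorm after each,
    and a categorical head trained with a cross-entropy (CE) critic loss."""
    def __init__(self, obs_dim, act_dim, hidden=256, num_bins=101):
        super().__init__()
        layers, in_dim = [], obs_dim + act_dim
        for _ in range(4):                                   # "only 4 hidden layers"
            layers += [nn.Linear(in_dim, hidden),
                       nn.BatchNorm1d(hidden),               # BN after each linear layer
                       nn.ReLU()]
            in_dim = hidden
        self.trunk = nn.Sequential(*layers)
        self.head = nn.Linear(hidden, num_bins)              # logits over value bins

    def forward(self, obs, act):
        return self.head(self.trunk(torch.cat([obs, act], dim=-1)))

def ce_critic_loss(logits, target_probs):
    """Cross-entropy against a target distribution over value bins
    (e.g. a two-hot encoding of the TD target)."""
    return -(target_probs * torch.log_softmax(logits, dim=-1)).sum(-1).mean()

@torch.no_grad()
def project_weight_norms(net, target_norm=1.0):
    """One possible 'WN projection': rescale each weight matrix back to a
    fixed norm after every optimizer step (an assumption, not the paper's code)."""
    for m in net.modules():
        if isinstance(m, nn.Linear):
            m.weight.mul_(target_norm / (m.weight.norm() + 1e-8))
```

In a training loop, project_weight_norms(critic) would be called right after optimizer.step() to keep the effective learning rate stable.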
@DPalenicek
Daniel Palenicek
23 days
🔑 Insight: A simple synergy of BatchNorm + WeightNorm + Cross-Entropy loss makes critics dramatically better conditioned. ➡️ Result: Stable effective learning rates and smoother optimization.
1
0
1
@DPalenicek
Daniel Palenicek
23 days
Instead of "bigger is better," we ask: Can better conditioning beat scaling? By analyzing the Hessian eigenspectrum of critic networks, we uncover how different architectural choices shape optimization landscapes.
1
0
1
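As a side note on the tooling: the top of the Hessian eigenspectrum can be probed without ever materializing the Hessian, by combining Hessian-vector products with power iteration. A generic sketch (not the paper's analysis code; loss_fn and params are placeholders):

```python
import torch

def top_hessian_eigenvalue(loss_fn, params, num_iters=50):
    """Estimate the magnitude of the dominant Hessian eigenvalue of loss_fn
    w.r.t. params via power iteration on Hessian-vector products."""
    params = [p for p in params if p.requires_grad]
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((vi ** 2).sum() for vi in v))
    v = [vi / norm for vi in v]
    eig = 0.0
    for _ in range(num_iters):
        # Hessian-vector product: differentiate (grad . v) once more
        gv = sum((g * vi).sum() for g, vi in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        eig = torch.sqrt(sum((h ** 2).sum() for h in hv)).item()  # ||Hv|| for unit v
        v = [h / (eig + 1e-12) for h in hv]
    return eig
```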
@DPalenicek
Daniel Palenicek
23 days
🚀 New preprint! Introducing XQC— a simple, well-conditioned actor-critic that achieves SOTA sample efficiency in #RL ✅ ~4.5× fewer parameters than SimbaV2 ✅ Scales to vision-based RL 👉 https://t.co/3xgo4tpbAK Thanks to Florian Vogt @JoeMWatson @Jan_R_Peters @Hessian_AI @DFKI
1
4
12
@Theo_Vincent_
Théo Vincent
3 months
To increase reward propagation in value-based RL algorithms, it is tempting to reduce the target update period🤔 But this makes training unstable💔 At @RL_Conference, I will present i-QN, a new method that allows faster reward propagation while keeping training stable⚡️ 👉🧵
1
12
92
@timschneider94
Tim Schneider
3 months
Pushing for #icra but still missing real robot experiments? 😰 Skip the ROS headaches — get your Franka robot running in minutes with franky! 🦾 Super beginner-friendly, Pythonic, and fast to set up. 🔗 https://t.co/Modc3KX2TY @ias_tudarmstadt @Jan_R_Peters 🧵👇
1
16
67
@DPalenicek
Daniel Palenicek
3 months
Interested in training diffusion policies in online #RL? Then make sure to come to our #ICML2025 poster and talk to my colleague @onclk_ @ias_tudarmstadt @Hessian_AI @CS_TUDarmstadt @Jan_R_Peters
@onclk_
Onur Celik
3 months
🎓 Attending #ICML2025 and interested in training diffusion policies in online RL? Come chat with me about our work DIME: Diffusion-Based Maximum Entropy Reinforcement Learning at 📍 Poster W-719 (West Hall B2-B3) 🗓️ Wednesday, July 16 @ 4:30 p.m.
0
3
18
@DPalenicek
Daniel Palenicek
5 months
@JoeMWatson @Jan_R_Peters @Hessian_AI @ias_tudarmstadt @CS_TUDarmstadt @DFKI If you're working on RL stability, plasticity, or sample efficiency, this might be relevant for you. We'd love to hear your thoughts and feedback! Come talk to us at RLDM in June in Dublin ( https://t.co/KN7NPaitbp)
0
1
3
@DPalenicek
Daniel Palenicek
5 months
📚 TL;DR: We combine BN + WN in CrossQ for stable high-UTD training and SOTA performance on challenging RL benchmarks. No need for network resets, no critic ensembles, no other tricks... Simple regularization, big gains. https://t.co/Z6QrMxZaPY
arxiv.org — Reinforcement learning has achieved significant milestones, but sample efficiency remains a bottleneck for real-world applications. Recently, CrossQ has demonstrated state-of-the-art sample...
1
0
0
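For context on what the BN part does mechanically, here is a rough sketch of the general CrossQ-style idea (not the exact code of this paper): the current and next state-action pairs are pushed through the critic, and hence its BatchNorm layers, as one joint batch, which is what lets the method bootstrap without a target network even at high update-to-data (UTD) ratios. Done flags, multiple critics, and the entropy term are omitted; names are placeholders.

```python
import torch

def crossq_style_td(critic, obs, act, next_obs, next_act, reward, gamma=0.99):
    """Sketch: one joint forward pass so BatchNorm statistics cover both the
    current and the next state-action distribution; bootstrap without a
    separate target network."""
    batch = obs.shape[0]
    joint_obs = torch.cat([obs, next_obs], dim=0)
    joint_act = torch.cat([act, next_act], dim=0)
    q_joint = critic(joint_obs, joint_act)             # BN sees both halves at once
    q_pred, q_next = q_joint[:batch], q_joint[batch:]
    td_target = reward + gamma * q_next.detach()       # no target network
    return ((q_pred - td_target) ** 2).mean()
```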
@DPalenicek
Daniel Palenicek
5 months
👋 We'd love to hear your thoughts and feedback! Come talk to us at RLDM in June in Dublin ( https://t.co/KN7NPaitbp) If you're working on RL stability, plasticity, or sample efficiency, this might be relevant for you.
1
0
2
@DPalenicek
Daniel Palenicek
5 months
⚖️ Simpler ≠ Weaker: Compared to SOTA baselines like BRO, our method: ✅ Needs 90% fewer parameters (~600K vs. 5M) ✅ Avoids parameter resets ✅ Scales stably with compute. We also compare favorably to the concurrent SIMBA algorithm. No tricks, just principled normalization. ✨
1
0
1