Add mean KL divergence logging #24

Giulero · 2025-07-14T13:28:50Z

This pull request introduces a new metric, `mean_kl_divergence`, to the AMP PPO algorithm and its associated runner. The metric is calculated, logged, and integrated into the training loop, giving additional insight into how far the policy distribution shifts during each PPO update.

Updates to `amp_ppo.py` (Algorithm Enhancements):

Added `mean_kl_divergence` as a new variable that tracks the average KL divergence during training. The value is accumulated over the mini-batch updates and normalized by the number of updates once they are complete.
Included `mean_kl_divergence` in the tuple returned by the `update` method so that the runner can access it.
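A minimal, framework-free sketch of the accumulate-then-normalize pattern described above. It assumes, as rsl_rl's PPO does for its adaptive learning-rate schedule, that each mini-batch's KL is measured between the old and the current diagonal Gaussian action distributions. The names `gaussian_kl` and `update`, the mini-batch layout, and the shape of the returned tuple are hypothetical and do not reproduce the real `amp_ppo.update()` signature.

```python
import math
from typing import List, Sequence, Tuple

# One sample: (old_mu, old_sigma, mu, sigma), each a per-action-dimension sequence.
Sample = Tuple[Sequence[float], Sequence[float], Sequence[float], Sequence[float]]


def gaussian_kl(old_mu, old_sigma, mu, sigma) -> float:
    """KL(old || new) between diagonal Gaussians, summed over action dimensions."""
    return sum(
        math.log(s / s_old) + (s_old**2 + (m_old - m) ** 2) / (2.0 * s**2) - 0.5
        for m_old, s_old, m, s in zip(old_mu, old_sigma, mu, sigma)
    )


def update(minibatches: List[List[Sample]]) -> Tuple[int, float]:
    """Hypothetical update loop returning (num_updates, mean_kl_divergence)."""
    mean_kl_divergence: float = 0.0
    num_updates = 0
    for batch in minibatches:
        # Mean KL over the samples of this mini-batch.
        kl_mean = sum(gaussian_kl(*sample) for sample in batch) / len(batch)
        # Accumulate incrementally, like the other mean_* statistics.
        mean_kl_divergence += kl_mean
        num_updates += 1
        # ... the policy/value/discriminator optimization step would go here ...
    # Normalize once, after all mini-batch updates (guarding against zero updates).
    mean_kl_divergence /= max(num_updates, 1)
    return num_updates, mean_kl_divergence


batches = [
    [([0.0], [1.0], [0.0], [1.0])],  # identical distributions: KL = 0
    [([0.0], [1.0], [1.0], [1.0])],  # mean shifted by 1, unit sigma: KL = 0.5
]
num_updates, mean_kl = update(batches)  # mean_kl == 0.25
```

Dividing the sum by the number of updates, rather than by the total number of samples, gives each mini-batch equal weight; that matches "normalized after all updates", though the exact normalization in `amp_ppo.py` is not shown in this PR page.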

Updates to `amp_on_policy_runner.py` (Runner Integration):

Updated the `update_run_name_with_sequence` method to unpack the `mean_kl_divergence` value returned by the algorithm's `update` method.
Extended the `log` method to record `mean_kl_divergence` in TensorBoard under the "Loss" category, making the new metric visible during training.
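On the runner side, the change amounts to unpacking one more tuple element and writing it as a scalar. The sketch below assumes the KL is the last element of the tuple and that the TensorBoard tag is `Loss/mean_kl_divergence`; both are assumptions, as are the names `log_update_stats` and `RecordingWriter` and the placeholder values. Any object exposing `add_scalar(tag, scalar_value, global_step)`, such as `torch.utils.tensorboard.SummaryWriter`, can be passed as `writer`.

```python
from typing import Dict, List, Protocol, Tuple


class ScalarWriter(Protocol):
    """The subset of the SummaryWriter interface used here."""

    def add_scalar(self, tag: str, scalar_value: float, global_step: int) -> None: ...


class RecordingWriter:
    """Hypothetical in-memory writer, useful for tests or dry runs."""

    def __init__(self) -> None:
        self.scalars: List[Tuple[str, float, int]] = []

    def add_scalar(self, tag: str, scalar_value: float, global_step: int) -> None:
        self.scalars.append((tag, scalar_value, global_step))


def log_update_stats(writer: ScalarWriter, stats: Dict[str, float], it: int) -> None:
    """Write every update statistic under the "Loss" category."""
    for name, value in stats.items():
        writer.add_scalar(f"Loss/{name}", value, it)


# Hypothetical update() result with placeholder values:
# (mean_value_loss, mean_surrogate_loss, mean_kl_divergence).
update_result = (0.12, -0.03, 0.008)
mean_value_loss, mean_surrogate_loss, mean_kl_divergence = update_result

writer = RecordingWriter()
log_update_stats(
    writer,
    {
        "value_function": mean_value_loss,
        "surrogate": mean_surrogate_loss,
        "mean_kl_divergence": mean_kl_divergence,
    },
    it=10,
)
```

Typing `writer` against a small `Protocol` keeps the logging code independent of TensorBoard, so the same function can be exercised with an in-memory writer.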

Copilot

Pull Request Overview

This PR adds tracking and logging of the mean KL divergence computed during each PPO update.

Introduce a mean_kl_divergence accumulator in amp_ppo.update()
Normalize and return the KL divergence metric alongside existing update stats
Unpack and log the new mean_kl_divergence scalar in amp_on_policy_runner.py

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| amp_rsl_rl/algorithms/amp_ppo.py | Add `mean_kl_divergence` initialization, accumulation, and normalization, and include it in the `update()` return tuple |
| amp_rsl_rl/runners/amp_on_policy_runner.py | Unpack the new KL divergence value from `update()` and add it to TensorBoard logging |

Comments suppressed due to low confidence (2)

amp_rsl_rl/runners/amp_on_policy_runner.py:486

Consider adding or updating unit/integration tests to verify that mean_kl_divergence is correctly produced by update() and logged by the runner.
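A sketch of what such a test could look like, using only the standard library's `unittest`. `FakeAlgorithm`, `FakeWriter`, `run_one_iteration`, and the tag `Loss/mean_kl_divergence` are hypothetical stand-ins; a real test would replace `FakeAlgorithm` with a tiny instance of the actual AMP PPO algorithm and drive the real runner. Run it with `python -m unittest` or pytest.

```python
import math
import unittest
from typing import List, Tuple


class FakeAlgorithm:
    """Hypothetical stand-in for the AMP PPO algorithm."""

    def update(self) -> Tuple[float, float]:
        # (mean_value_loss, mean_kl_divergence); the real tuple is longer.
        return 0.1, 0.02


class FakeWriter:
    """Records add_scalar calls instead of writing event files."""

    def __init__(self) -> None:
        self.scalars: List[Tuple[str, float, int]] = []

    def add_scalar(self, tag: str, value: float, step: int) -> None:
        self.scalars.append((tag, value, step))


def run_one_iteration(alg, writer, it: int) -> None:
    """Hypothetical runner step: unpack update() and log the KL."""
    *_, mean_kl_divergence = alg.update()
    writer.add_scalar("Loss/mean_kl_divergence", mean_kl_divergence, it)


class TestMeanKLLogging(unittest.TestCase):
    def test_update_returns_finite_non_negative_kl(self):
        *_, kl = FakeAlgorithm().update()
        self.assertTrue(math.isfinite(kl))
        self.assertGreaterEqual(kl, 0.0)  # KL divergence is never negative

    def test_runner_logs_kl_under_loss(self):
        writer = FakeWriter()
        run_one_iteration(FakeAlgorithm(), writer, it=3)
        self.assertIn(("Loss/mean_kl_divergence", 0.02, 3), writer.scalars)
```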

amp_rsl_rl/algorithms/amp_ppo.py:368

The update() method’s docstring and type annotation should be updated to describe the new mean_kl_divergence return value so consumers know what each tuple element represents.
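One way to address this comment, sketched under assumptions: return a `typing.NamedTuple` so each element is named and documented. `UpdateStats` and its fields are hypothetical; only three statistics are shown, while the real tuple carries more.

```python
from typing import NamedTuple


class UpdateStats(NamedTuple):
    """Hypothetical named return type for an AMP PPO update() call."""

    mean_value_loss: float
    mean_surrogate_loss: float
    mean_kl_divergence: float  # KL averaged over all mini-batch updates


def update() -> UpdateStats:
    """Run one PPO update and return its averaged statistics.

    Returns:
        UpdateStats: mean losses plus ``mean_kl_divergence``, the KL
        divergence averaged over all mini-batch updates of this call.
    """
    # Placeholder values; a real implementation accumulates these per mini-batch.
    return UpdateStats(mean_value_loss=0.0, mean_surrogate_loss=0.0, mean_kl_divergence=0.0)


stats = update()
kl = stats.mean_kl_divergence  # named access is self-documenting
value_loss, surrogate_loss, kl_again = stats  # positional unpacking still works
```

Because a `NamedTuple` is still a tuple, existing positional unpacking keeps working, although any caller that unpacks every element must still be updated when a new field such as `mean_kl_divergence` is added.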

        mean_kl_divergence: float = 0.0

amp_rsl_rl/algorithms/amp_ppo.py

Add mean KL divergence logging

46a3846

Giulero requested review from Copilot and GiulioRomualdi and removed request for Copilot July 14, 2025 13:28

Copilot AI reviewed Jul 14, 2025


amp_rsl_rl/algorithms/amp_ppo.py (conversation resolved)

GiulioRomualdi approved these changes Jul 14, 2025


GiulioRomualdi merged commit 6d6e530 into main Jul 14, 2025

GiulioRomualdi deleted the kl_div_log branch July 14, 2025 18:09
