Add mean KL divergence logging #24

Merged · merged 1 commit into main from kl_div_log on Jul 14, 2025

Conversation

@Giulero Giulero (Contributor) commented on Jul 14, 2025

This pull request introduces a new metric, mean_kl_divergence, to the AMP PPO algorithm and its associated runner. The changes ensure that this metric is calculated, logged, and integrated into the training loop, providing additional insights into the divergence between policy and expert distributions.

Updates to amp_ppo.py (Algorithm Enhancements):

  • Added mean_kl_divergence as a new variable to track the average KL divergence during training. The metric is accumulated incrementally while processing each mini-batch and normalized after all updates (a minimal sketch of this pattern follows the list below).
  • Included mean_kl_divergence in the tuple returned by the update method, ensuring it is accessible to the runner.
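
A minimal, self-contained sketch of the accumulate-then-normalize pattern described above. All names (mean_kl_gaussian, the batch shapes, the epoch and mini-batch counts) are illustrative assumptions, and the per-mini-batch KL is shown here as the analytic KL between old and new diagonal-Gaussian action distributions, as in rsl_rl-style PPO; the actual code in amp_ppo.py may differ:

    import torch


    def mean_kl_gaussian(mu_old, sigma_old, mu_new, sigma_new):
        """Batch-averaged analytic KL between two diagonal Gaussians.

        Illustrative helper only, not part of amp_rsl_rl.
        """
        kl = torch.sum(
            torch.log(sigma_new / sigma_old + 1.0e-5)
            + (sigma_old.pow(2) + (mu_old - mu_new).pow(2)) / (2.0 * sigma_new.pow(2))
            - 0.5,
            dim=-1,
        )
        return kl.mean().item()


    # Accumulate over every mini-batch of every learning epoch, then normalize,
    # mirroring the pattern described in the PR.
    mean_kl_divergence = 0.0
    num_learning_epochs, num_mini_batches = 5, 4
    for _ in range(num_learning_epochs * num_mini_batches):
        mu_old, mu_new = torch.zeros(64, 12), 0.05 * torch.randn(64, 12)
        sigma_old, sigma_new = torch.ones(64, 12), 1.1 * torch.ones(64, 12)
        mean_kl_divergence += mean_kl_gaussian(mu_old, sigma_old, mu_new, sigma_new)

    mean_kl_divergence /= num_learning_epochs * num_mini_batches
    print(f"mean_kl_divergence: {mean_kl_divergence:.4f}")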

Updates to amp_on_policy_runner.py (Runner Integration):

  • Updated the update_run_name_with_sequence method to unpack and use the mean_kl_divergence metric from the algorithm's update method.
  • Enhanced the log method to record mean_kl_divergence in TensorBoard under the "Loss" category, providing visibility into this new metric during training (see the sketch below).
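
A hedged sketch of the runner-side logging described above. The tuple contents, variable names, and the exact TensorBoard tag are assumptions based on this description, not the actual amp_on_policy_runner.py code:

    from torch.utils.tensorboard import SummaryWriter

    # Placeholder values standing in for the tuple returned by alg.update();
    # the real tuple in amp_ppo.py contains more elements.
    mean_value_loss, mean_surrogate_loss, mean_kl_divergence = 0.12, -0.03, 0.008

    writer = SummaryWriter(log_dir="runs/amp_ppo_demo")
    iteration = 0
    writer.add_scalar("Loss/value_function", mean_value_loss, iteration)
    writer.add_scalar("Loss/surrogate", mean_surrogate_loss, iteration)
    # New scalar added by this PR, recorded under the "Loss" category.
    writer.add_scalar("Loss/mean_kl_divergence", mean_kl_divergence, iteration)
    writer.close()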

@Giulero Giulero requested review from Copilot and GiulioRomualdi and removed request for Copilot July 14, 2025 13:28

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds tracking and logging of the mean KL divergence computed during each PPO update.

  • Introduce a mean_kl_divergence accumulator in amp_ppo.update()
  • Normalize and return the KL divergence metric alongside existing update stats
  • Unpack and log the new mean_kl_divergence scalar in amp_on_policy_runner.py

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • amp_rsl_rl/algorithms/amp_ppo.py: Add mean_kl_divergence initialization, accumulation, and normalization, and include it in the update return tuple
  • amp_rsl_rl/runners/amp_on_policy_runner.py: Unpack the new KL divergence value from update() and add it to TensorBoard logging
Comments suppressed due to low confidence (2)

amp_rsl_rl/runners/amp_on_policy_runner.py:486

  • Consider adding or updating unit/integration tests to verify that mean_kl_divergence is correctly produced by update() and logged by the runner (a sketch of such a test follows).

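A minimal sketch of the kind of test the comment above suggests. The _DummyAMPPPO stub and the returned tuple shape are illustrative assumptions; a real test would construct the actual AMP_PPO algorithm with a small policy and rollout storage instead of a stand-in:

    import math


    class _DummyAMPPPO:
        # Stand-in for amp_rsl_rl.algorithms.AMP_PPO, used only to show the
        # shape of the assertions; not the real class.
        def update(self):
            mean_value_loss, mean_surrogate_loss, mean_kl_divergence = 0.1, -0.02, 0.005
            return mean_value_loss, mean_surrogate_loss, mean_kl_divergence


    def test_update_returns_finite_mean_kl_divergence():
        alg = _DummyAMPPPO()
        *_, mean_kl_divergence = alg.update()
        assert isinstance(mean_kl_divergence, float)
        assert math.isfinite(mean_kl_divergence)
        assert mean_kl_divergence >= 0.0  # KL divergence is non-negative
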
amp_rsl_rl/algorithms/amp_ppo.py:368

  • The update() method's docstring and type annotation should be updated to describe the new mean_kl_divergence return value, so consumers know what each tuple element represents (a minimal sketch follows).

        mean_kl_divergence: float = 0.0
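
One possible way to document the extended return value, as the comment suggests. This is an illustrative stub, not the actual AMP_PPO.update() signature, whose full return tuple is longer than shown here:

    from typing import Tuple


    class AMPPPOStub:
        # Illustrative stub only; the real class lives in amp_rsl_rl/algorithms/amp_ppo.py.
        def update(self) -> Tuple[float, float, float]:
            """Run one optimization pass over the collected rollout.

            Returns:
                A tuple of per-update statistics whose last element,
                mean_kl_divergence, is the KL divergence averaged over all
                mini-batch updates performed in this call.
            """
            mean_value_loss, mean_surrogate_loss, mean_kl_divergence = 0.0, 0.0, 0.0
            # ... optimization loop omitted ...
            return mean_value_loss, mean_surrogate_loss, mean_kl_divergence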

@GiulioRomualdi GiulioRomualdi merged commit 6d6e530 into main Jul 14, 2025
@GiulioRomualdi GiulioRomualdi deleted the kl_div_log branch July 14, 2025 18:09