Add modified Wasserstein loss in the Discriminator #25
base: main
Conversation
Divide the reward by the discriminator output std (to stabilize)
Co-authored-by: Copilot <[email protected]>
Pull Request Overview
This PR introduces a new Wasserstein-style loss option for the AMP Discriminator and centralizes loss computation within the Discriminator
class, simplifying the AMP PPO update logic.
- Add `loss_type` and `eta_wgan` parameters to support both BCEWithLogits and modified Wasserstein losses.
- Refactor discriminator loss and gradient-penalty computations into `Discriminator.compute_loss`.
- Update the on-policy runner and `amp_ppo` algorithm to use the new loss API.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `amp_rsl_rl/runners/amp_on_policy_runner.py` | Migrate to named args in the `Discriminator` init; pass `loss_type` from config. |
| `amp_rsl_rl/networks/discriminator.py` | Implement `loss_type`/`eta_wgan`, modularize policy/expert/WGAN losses, update gradient penalty. |
| `amp_rsl_rl/algorithms/amp_ppo.py` | Remove redundant discriminator loss methods and call `compute_loss` instead. |
Comments suppressed due to low confidence (3)
`amp_rsl_rl/runners/amp_on_policy_runner.py:175`
- The `eta_wgan` parameter is not passed from the runner's configuration to the Discriminator, preventing users from customizing the WGAN scaling. Consider adding `eta_wgan=self.discriminator_cfg.get("eta_wgan", 0.3)` when initializing the Discriminator.
```python
loss_type=self.discriminator_cfg["loss_type"],
```
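Concretely, the suggested fix would give the runner's initialization roughly this shape (the surrounding arguments are placeholders, not the repository's exact call):

```python
# Hypothetical: forwarding both loss options from the runner's config.
discriminator = Discriminator(
    # ... other named constructor arguments ...
    loss_type=discriminator_cfg["loss_type"],
    eta_wgan=discriminator_cfg.get("eta_wgan", 0.3),  # fall back to the suggested default
)
```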
`amp_rsl_rl/networks/discriminator.py:32`
- The class docstring is not updated to reflect the new `loss_type` and `eta_wgan` constructor parameters. Please update the docstring so users know how to configure these options.
```python
loss_type: str = "BCEWithLogits",
```
`amp_rsl_rl/networks/discriminator.py:182`
- [nitpick] Consider renaming the `lambda_` parameter to a more descriptive name (e.g., `penalty_coeff` or `gp_lambda`) to improve readability and avoid confusion with the Python keyword `lambda`.
```python
lambda_: float = 10,
```
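For illustration, a gradient-penalty helper with the suggested `gp_lambda` name could look like this. It is a sketch under assumed behavior (WGAN-GP-style unit-gradient penalty for the Wasserstein loss, zero-gradient penalty on expert data for BCE), not the repository's exact code:

```python
import torch

def compute_grad_pen_sketch(disc, expert_data: torch.Tensor,
                            policy_data: torch.Tensor,
                            loss_type: str = "BCEWithLogits",
                            gp_lambda: float = 10.0) -> torch.Tensor:
    """Gradient penalty covering both loss types (illustrative only)."""
    if loss_type == "Wasserstein":
        # WGAN-GP: penalize deviation from unit gradient norm on random
        # interpolations between expert and policy samples.
        alpha = torch.rand(expert_data.size(0), 1, device=expert_data.device)
        data = (alpha * expert_data + (1 - alpha) * policy_data).requires_grad_(True)
        target = 1.0
    else:
        # BCE/AMP-style: push the gradient norm on expert data toward zero.
        data = expert_data.clone().requires_grad_(True)
        target = 0.0
    scores = disc(data)
    grad = torch.autograd.grad(scores.sum(), data, create_graph=True)[0]
    return gp_lambda * (grad.norm(2, dim=1) - target).pow(2).mean()
```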
```python
elif self.loss_type == "Wasserstein":
    self.loss_fun = None
    self.eta_wgan = eta_wgan
    print("The Wasserstein-like loss is experimental")
```
[nitpick] Use a logger (e.g., Python's `logging` module) or `warnings.warn` instead of `print` for runtime warnings to ensure consistent logging behavior and better control over output formatting.

Suggested change:
```diff
- print("The Wasserstein-like loss is experimental")
+ warnings.warn("The Wasserstein-like loss is experimental", UserWarning)
```
This pull request introduces support for a new loss type (Wasserstein loss).

Note that the Wasserstein GAN formulation minimizes the discriminator loss

$$\mathcal{L}_D = \mathbb{E}_{s \sim \pi}\left[D(s)\right] - \mathbb{E}_{s \sim \mathcal{D}_{\text{expert}}}\left[D(s)\right]$$

Empirically, I found that if the last layer of the discriminator is a `tanh()`, the training is smoother and more stable.

The reward is then the discriminator output divided by its standard deviation over the batch (to stabilize it):

$$r = \frac{D(s)}{\operatorname{std}\left(D(s)\right)}$$
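A minimal PyTorch sketch of these two computations (the function names and the exact placement of `eta_wgan` are assumptions, not the repository's exact code):

```python
import torch

def wgan_loss(policy_d: torch.Tensor, expert_d: torch.Tensor,
              eta_wgan: float = 0.3) -> torch.Tensor:
    # Critic loss to minimize: lower scores on policy data, higher on expert data.
    # eta_wgan scales the objective (default 0.3 taken from the review comment above).
    return eta_wgan * (policy_d.mean() - expert_d.mean())

def wgan_reward(d_out: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Style reward: discriminator output divided by its batch std,
    # the "divide by disc output std" stabilization described above.
    return d_out / (d_out.std() + eps)
```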
**AMP PPO Algorithm Updates:**
- Removed `discriminator_policy_loss` and `discriminator_expert_loss` from `amp_ppo.py`. Loss computation is now delegated to the `Discriminator` class for better encapsulation.
- Updated the `update` method in `amp_ppo.py` to use the new `compute_loss` method from the `Discriminator` class, simplifying the logic for AMP loss and gradient penalty computation (see the sketch after this list).
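As a rough sketch, the simplified call site could take this shape; the `compute_loss` and `compute_grad_pen` signatures shown are assumptions:

```python
# Hypothetical shape of the discriminator update inside AMP_PPO.update():
policy_d = self.discriminator(policy_transitions)
expert_d = self.discriminator(expert_transitions)

amp_loss = self.discriminator.compute_loss(policy_d, expert_d)
grad_pen = self.discriminator.compute_grad_pen(expert_transitions, policy_transitions)

self.optimizer.zero_grad()
(amp_loss + grad_pen).backward()
self.optimizer.step()
```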
**Discriminator Enhancements:**
- Added `loss_type` and `eta_wgan` parameters to the `Discriminator` class to configure the loss function. [1] [2]
- Refactored the `Discriminator` class to include modular methods for `policy_loss`, `expert_loss`, and a unified `compute_loss` method. This centralizes loss computation and supports both loss types (see the sketch after this list).
- Updated the `compute_grad_pen` method to handle gradient penalty computation for both BCE and Wasserstein loss types. Added a new `wgan_loss` method for Wasserstein loss.
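A condensed, self-contained sketch of this modular structure (signatures and the BCE labeling convention are assumptions; only the dispatch pattern is the point):

```python
import torch
import torch.nn as nn

class DiscriminatorLossSketch(nn.Module):
    """Illustrative dispatch between BCE and Wasserstein losses (not the repo's exact code)."""

    def __init__(self, loss_type: str = "BCEWithLogits", eta_wgan: float = 0.3):
        super().__init__()
        self.loss_type = loss_type
        self.eta_wgan = eta_wgan
        # BCEWithLogits pairs each score with a real/fake label;
        # the Wasserstein variant works on raw scores and needs no criterion.
        self.loss_fun = nn.BCEWithLogitsLoss() if loss_type == "BCEWithLogits" else None

    def policy_loss(self, policy_d: torch.Tensor) -> torch.Tensor:
        # Policy (generated) samples are labeled as fake (0).
        return self.loss_fun(policy_d, torch.zeros_like(policy_d))

    def expert_loss(self, expert_d: torch.Tensor) -> torch.Tensor:
        # Expert (reference motion) samples are labeled as real (1).
        return self.loss_fun(expert_d, torch.ones_like(expert_d))

    def wgan_loss(self, policy_d: torch.Tensor, expert_d: torch.Tensor) -> torch.Tensor:
        # Modified Wasserstein objective, scaled by eta_wgan (placement assumed).
        return self.eta_wgan * (policy_d.mean() - expert_d.mean())

    def compute_loss(self, policy_d: torch.Tensor, expert_d: torch.Tensor) -> torch.Tensor:
        if self.loss_type == "Wasserstein":
            return self.wgan_loss(policy_d, expert_d)
        return self.policy_loss(policy_d) + self.expert_loss(expert_d)
```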
**Runner Updates:**
- Modified `amp_on_policy_runner.py` to pass the `loss_type` configuration to the `Discriminator` during initialization, enabling runtime selection of the loss function (example config below).
- Updated the `load` method for improved readability.
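For example, the discriminator section of the training config might then select the loss at runtime (key names other than `loss_type` and `eta_wgan` are assumptions):

```python
# Hypothetical discriminator config consumed by amp_on_policy_runner.py:
discriminator_cfg = {
    "loss_type": "Wasserstein",  # or "BCEWithLogits" (the default)
    "eta_wgan": 0.3,             # WGAN scaling, per the review suggestion above
}
```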