Description
```
# img_emb : image model embedding [n, dim]
# txt_emb : text model embedding [n, dim]
# t_prime, b : learnable temperature and bias
# n : mini-batch size
t = exp(t_prime)
z_img = l2_normalize(img_emb)
z_txt = l2_normalize(txt_emb)
logits = dot(z_img, z_txt.T) * t + b
labels = 2 * eye(n) - ones(n)  # -1 everywhere, +1 on the diagonal
l = -sum(log_sigmoid(labels * logits)) / n
```
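For reference, here is a minimal runnable translation of the pseudocode above, written as a sketch in PyTorch (the function name `siglip_loss` and the tensor handling are my own; only the structure of the computation comes from the pseudocode):

```python
import torch
import torch.nn.functional as F

def siglip_loss(img_emb, txt_emb, t_prime, b):
    """Pairwise sigmoid loss over the [n, n] logit matrix, mirroring the pseudocode."""
    n = img_emb.shape[0]
    t = torch.exp(t_prime)                     # learnable temperature
    z_img = F.normalize(img_emb, dim=-1)       # l2_normalize
    z_txt = F.normalize(txt_emb, dim=-1)
    logits = z_img @ z_txt.T * t + b           # [n, n] image-text similarity logits
    labels = 2 * torch.eye(n, device=logits.device) - 1  # +1 on diagonal, -1 elsewhere
    # Sum over all n^2 pairs, then divide by the batch size n (not n^2).
    return -F.logsigmoid(labels * logits).sum() / n
```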
Why is the loss divided by n rather than n^2 after the summation? The sum runs over all n^2 entries of the logits matrix, so as n increases the number of negative pairs grows with it, and the averaged loss value keeps getting larger.
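To make the scaling concrete, a small illustrative check (using the `siglip_loss` sketch above on random, untrained embeddings; the dimensions and initializations are assumptions chosen only for illustration): with t_prime = 0 and b = 0 every logit is near zero, so each of the n^2 terms contributes roughly log(2). Dividing the sum by n then gives a value that grows roughly linearly with n, while dividing by n^2 stays roughly constant.

```python
torch.manual_seed(0)
for n in (16, 256):
    img = torch.randn(n, 512)          # random stand-ins for model embeddings
    txt = torch.randn(n, 512)
    t_prime = torch.tensor(0.0)
    b = torch.tensor(0.0)
    loss = siglip_loss(img, txt, t_prime, b)   # sum over n^2 pairs, divided by n
    print(n, float(loss), float(loss / n))     # loss / n is the per-pair (n^2) average
```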