Description
```
# img_emb : image model embedding [n, dim]
# txt_emb : text model embedding [n, dim]
# t_prime, b : learnable temperature and bias
# n : mini-batch size
t = exp(t_prime)
z_img = l2_normalize(img_emb)
z_txt = l2_normalize(txt_emb)
logits = dot(z_img, z_txt.T) * t + b
labels = 2 * eye(n) - ones(n)  # -1 everywhere, +1 on the diagonal
l = -sum(log_sigmoid(labels * logits)) / n
```
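For reference, here is a minimal runnable translation of the pseudocode above, written as a sketch in PyTorch (the function name `siglip_loss` and the tensor handling are my own; only the structure of the computation comes from the pseudocode):

```python
import torch
import torch.nn.functional as F

def siglip_loss(img_emb, txt_emb, t_prime, b):
    """Pairwise sigmoid loss over the [n, n] logit matrix, mirroring the pseudocode."""
    n = img_emb.shape[0]
    t = torch.exp(t_prime)                     # learnable temperature
    z_img = F.normalize(img_emb, dim=-1)       # l2_normalize
    z_txt = F.normalize(txt_emb, dim=-1)
    logits = z_img @ z_txt.T * t + b           # [n, n] image-text similarity logits
    labels = 2 * torch.eye(n, device=logits.device) - 1  # +1 on diagonal, -1 elsewhere
    # Sum over all n^2 pairs, then divide by the batch size n (not n^2).
    return -F.logsigmoid(labels * logits).sum() / n
```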
Why is the loss divided by n rather than n^2 after the summation? The sum runs over all n^2 entries of the logits matrix, so as n increases the number of negative pairs grows with it, and the averaged loss value keeps getting larger.
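To make the scaling concrete, a small illustrative check (using the `siglip_loss` sketch above on random, untrained embeddings; the dimensions and initializations are assumptions chosen only for illustration): with t_prime = 0 and b = 0 every logit is near zero, so each of the n^2 terms contributes roughly log(2). Dividing the sum by n then gives a value that grows roughly linearly with n, while dividing by n^2 stays roughly constant.

```python
torch.manual_seed(0)
for n in (16, 256):
    img = torch.randn(n, 512)          # random stand-ins for model embeddings
    txt = torch.randn(n, 512)
    t_prime = torch.tensor(0.0)
    b = torch.tensor(0.0)
    loss = siglip_loss(img, txt, t_prime, b)   # sum over n^2 pairs, divided by n
    print(n, float(loss), float(loss / n))     # loss / n is the per-pair (n^2) average
```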