# Loss
Requires polishing
# Comparing BinaryCrossEntropy and CrossEntropy
# Definition
Definition of cross entropy:

$$H(p, q) = H(p) + D_{KL}(p \parallel q)$$

where $H(p, q)$ is the cross entropy, $H(p)$ is the entropy of $p$, and $D_{KL}(p \parallel q)$ is the KL divergence.

For discrete probability distributions $p$ and $q$:

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$
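As a quick illustrative check (made-up distributions, natural log): with $p = (0.5, 0.5)$ and $q = (0.25, 0.75)$, $H(p, q) = -(0.5 \log 0.25 + 0.5 \log 0.75) \approx 0.837$, which exceeds $H(p) = \log 2 \approx 0.693$ by exactly $D_{KL}(p \parallel q) \approx 0.144$.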
Considering a binary classification problem with a positive instance:
- BinaryCrossEntropy: the FC layer outputs 1 logit, activated by sigmoid, giving the probability $a$; the loss is $-\log a$.
- CrossEntropy: the FC layer outputs 2 logits, activated by softmax, giving the probabilities $(1-b, b)$; the loss is $-\log b$.
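For example (illustrative numbers): if the sigmoid gives $a = 0.6$ and the softmax gives $(0.4, 0.6)$, i.e. $b = 0.6$, both losses are $-\log 0.6 \approx 0.511$, so the two formulations agree whenever they assign the same probability to the positive class.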
Considering an $X$-class ($X \ge 3$) classification problem with an instance of class 1:
- CrossEntropy: the FC layer outputs $X$ logits, activated by softmax, giving the probabilities $(n_1, n_2, \ldots, n_X)$; the loss is $-\log n_1$ (see the sketch after the tip below).
TIP
The softmax probabilities sum to 1, since one instance can only have one label.
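A minimal sketch of the multi-class case, assuming $X = 3$, made-up probabilities $(n_1, n_2, n_3) = (0.2, 0.5, 0.3)$, and reading "class 1" as the first class (index 0 in code):

```python
import tensorflow as tf

probs = [[0.2, 0.5, 0.3]]  # softmax output, sums to 1
# the sparse variant takes the class index directly; "class 1" above is index 0 here
ce = tf.keras.losses.sparse_categorical_crossentropy([0], probs)
print(ce.numpy())  # ≈ [1.609] == -log(0.2)
```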
Considering an $X$-label ($X \ge 2$) multi-label classification problem, where the instance has only a single label, class 1:
- BinaryCrossEntropy: the FC layer outputs $X$ logits, each activated by sigmoid, giving the probabilities $(m_1, m_2, \ldots, m_X)$; the loss is $-\log m_1 - \sum_{i=2}^{X} \log(1 - m_i)$ (see the sketch after the tip below).
TIP
The sigmoid probabilities are independent of each other, since one instance can have multiple labels.
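A minimal sketch of the multi-label case, assuming $X = 3$ and made-up probabilities $(m_1, m_2, m_3) = (0.7, 0.2, 0.4)$ with only label 1 on; note that Keras averages the per-label terms, whereas the formula above sums them:

```python
import tensorflow as tf

target = [[1.0, 0.0, 0.0]]  # only the first label is on
probs = [[0.7, 0.2, 0.4]]   # independent sigmoid outputs, need not sum to 1
bce = tf.keras.losses.BinaryCrossentropy(from_logits=False)(target, probs)
# formula above: -log(0.7) - log(1 - 0.2) - log(1 - 0.4) ≈ 1.091
print(bce.numpy() * 3)  # ≈ 1.091 (undo the mean over the 3 labels)
```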
"""
Considering binary classification and for a single instance with target 1
"""
target = [1.0] # loss computation requires float
predictions = [0.4]
# bce way 1
bce_fn = tf.keras.losses.BinaryCrossentropy(from_logits=False)
bce = bce_fn(target, predictions)
# bce way 2
bce = tf.keras.losses.binary_crossentropy(target, predictions, from_logits=False)
num_classes = 2
one_hot_target = tf.one_hot(target, 2) # [0., 1.]
predictions = [0.4, 0.6] # sum to 1; if not, tf will normalize them
# ce way 1
ce_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
ce = ce_fn(one_hot_target, predictions)
# ce way 2
ce = tf.keras.losses.categorical_crossentropy(one_hot_target, predictions, from_logits=False)
# ce way 3
ce_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
ce = ce_fn(target, predictions)
# ce way 4
ce = tf.keras.losses.sparse_categorical_crossentropy(target, predictions)
# Problems with gradient stability
WARNING
If softmax/sigmoid is applied first and the log is taken afterwards:
- log of 0 might happen
- exp of a large positive number might overflow

To avoid the gradient-stability problem for softmax, use tf.nn.log_softmax, as per link.
To avoid the gradient-stability problem for sigmoid, use tf.nn.sigmoid_cross_entropy_with_logits, as per link.
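A minimal sketch (made-up logits) of the stable paths named above, with the unstable pattern left as a comment for contrast:

```python
import tensorflow as tf

logits = tf.constant([[2.0, -1.0, 0.5]])  # raw FC outputs, no activation applied
labels = tf.constant([[1.0, 0.0, 0.0]])   # one-hot / multi-label target

# Unstable pattern: activate first, then take the log -- log(0) appears once softmax underflows.
# ce_unstable = -tf.reduce_sum(labels * tf.math.log(tf.nn.softmax(logits)), axis=-1)

# Stable softmax path: log_softmax never materializes the probabilities.
ce = -tf.reduce_sum(labels * tf.nn.log_softmax(logits, axis=-1), axis=-1)

# Stable sigmoid path: the fused op works on logits directly.
bce = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)

# The Keras losses give the same stable behaviour when fed logits with from_logits=True.
ce_keras = tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=True)
```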
# Imbalanced Classification
At the sample level, given a fixed model structure, a class weight of

$$w_A = \frac{\text{total number of samples}}{\text{number of samples in class } A}$$

works well. However, at the model level, given $n$ fully connected outputs, even though the class distribution of each of the $n$ units is imbalanced, it is not good to add weights.
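A minimal sketch of the sample-level weighting, assuming a hypothetical 900/100 split between two classes (the counts and dummy predictions are illustrative only):

```python
import numpy as np
import tensorflow as tf

# hypothetical label counts: 900 negatives, 100 positives
labels = np.array([0] * 900 + [1] * 100)
total = len(labels)
class_weight = {c: total / np.sum(labels == c) for c in (0, 1)}  # {0: ~1.11, 1: 10.0}

# per-sample weights derived from the per-class weights, usable as the
# sample_weight argument of a Keras loss call ...
sample_weight = np.where(labels == 1, class_weight[1], class_weight[0]).astype("float32")
bce_fn = tf.keras.losses.BinaryCrossentropy(from_logits=False)
y_true = labels.astype("float32").reshape(-1, 1)
y_pred = np.full((total, 1), 0.5, dtype="float32")  # dummy predictions
weighted_loss = bce_fn(y_true, y_pred, sample_weight=sample_weight)

# ... or pass class_weight directly to model.fit(x, y, class_weight=class_weight)
```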