Mean absolute error keras

Computes the mean of absolute difference between labels and predictions.

Inherits From: Loss

Main aliases

tf.losses.MeanAbsoluteError

Compat aliases for migration

See the Migration guide for more details.

tf.compat.v1.keras.losses.MeanAbsoluteError

tf.keras.losses.MeanAbsoluteError(
    reduction=losses_utils.ReductionV2.AUTO,
    name='mean_absolute_error'
)

Used in the notebooks

Used in the tutorials
  • Generate Artificial Faces with CelebA Progressive GAN Model

loss = mean(abs(y_true - y_pred), axis=-1)

Standalone usage:

import tensorflow as tf

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]

# Using 'auto'/'sum_over_batch_size' reduction type.
mae = tf.keras.losses.MeanAbsoluteError()
mae(y_true, y_pred).numpy()  # 0.5

# Calling with 'sample_weight'.
mae(y_true, y_pred, sample_weight=[0.7, 0.3]).numpy()  # 0.25

# Using 'sum' reduction type.
mae = tf.keras.losses.MeanAbsoluteError(
    reduction=tf.keras.losses.Reduction.SUM)
mae(y_true, y_pred).numpy()  # 1.0

# Using 'none' reduction type.
mae = tf.keras.losses.MeanAbsoluteError(
    reduction=tf.keras.losses.Reduction.NONE)
mae(y_true, y_pred).numpy()  # array([0.5, 0.5], dtype=float32)
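
The numbers in the standalone example can be checked by hand. The following is a minimal NumPy sketch, not the Keras implementation: the helper names mean_absolute_error and reduce_loss and the string reduction names are assumptions made for illustration. It averages the absolute differences over the last axis, applies the optional per-sample weights, and then applies the chosen reduction.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Per-sample MAE: mean of |y_true - y_pred| over the last axis."""
    y_true = np.asarray(y_true, dtype=np.float32)
    y_pred = np.asarray(y_pred, dtype=np.float32)
    return np.mean(np.abs(y_true - y_pred), axis=-1)


def reduce_loss(per_sample, sample_weight=None, reduction="sum_over_batch_size"):
    """Apply optional per-sample weights, then the named reduction (illustrative)."""
    if sample_weight is not None:
        per_sample = per_sample * np.asarray(sample_weight, dtype=np.float32)
    if reduction == "none":
        return per_sample
    total = per_sample.sum()
    if reduction == "sum":
        return total
    if reduction == "sum_over_batch_size":
        # Divide by the number of per-sample loss values, not by the weight sum.
        return total / per_sample.size
    raise ValueError(f"Unknown reduction: {reduction!r}")


y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
per_sample = mean_absolute_error(y_true, y_pred)

print(reduce_loss(per_sample))                             # 0.5
print(reduce_loss(per_sample, sample_weight=[0.7, 0.3]))   # 0.25
print(reduce_loss(per_sample, reduction="sum"))            # 1.0
print(reduce_loss(per_sample, reduction="none"))           # [0.5 0.5]
```

The weighted case shows that the weighted sum (0.35 + 0.15) is divided by the batch size 2, which is why the result is 0.25 rather than 0.5.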

Usage with the compile() API:

model.compile(optimizer='sgd', loss=tf.keras.losses.MeanAbsoluteError())

Args

reduction: Type of tf.keras.losses.Reduction to apply to the loss. Default value is AUTO. AUTO indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to SUM_OVER_BATCH_SIZE. When used with tf.distribute.Strategy, outside of built-in training loops such as tf.keras compile and fit, using AUTO or SUM_OVER_BATCH_SIZE will raise an error. Please see the custom training tutorial (https://www.tensorflow.org/tutorials/distribute/custom_training) for more details.
name: Optional name for the instance. Defaults to 'mean_absolute_error'.
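
The restriction on AUTO and SUM_OVER_BATCH_SIZE under tf.distribute.Strategy is easier to see with numbers. The sketch below is plain NumPy with made-up per-example losses and a hypothetical two-replica split. It does not call TensorFlow; in a real custom training loop the scaling step is usually done with tf.nn.compute_average_loss. It compares averaging on each replica with summing on each replica and dividing by the global batch size.

```python
import numpy as np

GLOBAL_BATCH_SIZE = 4  # hypothetical global batch size
per_example = np.array([1.0, 3.0, 2.0, 6.0])  # made-up per-example losses

# Pretend the global batch is split evenly across two replicas.
replicas = np.split(per_example, 2)

# Naive: every replica averages over its local batch, then results are summed.
naive = sum(r.mean() for r in replicas)

# Pattern suggested by the docs: keep per-example losses (reduction NONE),
# sum them on each replica, and scale by the global batch size.
scaled = sum(r.sum() / GLOBAL_BATCH_SIZE for r in replicas)

print(naive)               # 6.0, twice the true mean
print(scaled)              # 3.0
print(per_example.mean())  # 3.0
```

Because per-replica results are summed across replicas, only the version scaled by the global batch size reproduces the mean over the whole batch.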

Methods

from_config

@classmethod
from_config(
    config
)

Instantiates a Loss from its config (output of get_config()).

Args
config: Output of get_config().
Returns
A keras.losses.Loss instance.

get_config

get_config()

Returns the config dictionary for a Loss instance.
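
The get_config()/from_config() pair is a plain round trip through a dictionary. The sketch below uses a hypothetical stand-in class, SketchLoss, rather than the real Keras class, so that it runs without TensorFlow. It mirrors the pattern visible in the source dump further down: get_config() returns the reduction and name, and from_config() calls cls(**config).

```python
class SketchLoss:
    """Hypothetical stand-in for a Loss subclass; not part of Keras."""

    def __init__(self, reduction="auto", name="mean_absolute_error"):
        self.reduction = reduction
        self.name = name

    def get_config(self):
        # Everything needed to rebuild the instance, as plain Python values.
        return {"reduction": self.reduction, "name": self.name}

    @classmethod
    def from_config(cls, config):
        # The config keys match the constructor arguments one to one.
        return cls(**config)


original = SketchLoss(reduction="sum", name="my_mae")
config = original.get_config()
restored = SketchLoss.from_config(config)

print(config)
print(restored.get_config() == config)
```

With the real class the same two calls should apply, for example tf.keras.losses.MeanAbsoluteError.from_config(mae.get_config()).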

__call__

__call__(
    y_true, y_pred, sample_weight=None
)

Invokes the Loss instance.

Args
y_true: Ground truth values. shape = [batch_size, d0, .. dN], except sparse loss functions such as sparse categorical crossentropy where shape = [batch_size, d0, .. dN-1]
y_pred: The predicted values. shape = [batch_size, d0, .. dN]
sample_weight: Optional sample_weight acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value. If sample_weight is a tensor of size [batch_size], then the total loss for each sample of the batch is rescaled by the corresponding element in the sample_weight vector. If the shape of sample_weight is [batch_size, d0, .. dN-1] (or can be broadcasted to this shape), then each loss element of y_pred is scaled by the corresponding value of sample_weight. (Note on dN-1: all loss functions reduce by 1 dimension, usually axis=-1.)
Returns
Weighted loss float Tensor. If reduction is NONE, this has shape [batch_size, d0, .. dN-1]; otherwise, it is scalar. (Note dN-1 because all loss functions reduce by 1 dimension, usually axis=-1.)
Raises
ValueError: If the shape of sample_weight is invalid.
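
The three accepted sample_weight shapes can be illustrated with NumPy broadcasting. This is only a sketch of the behaviour described above, with random inputs and a hypothetical helper apply_weight. It is not the code Keras runs internally.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=(2, 3, 4))  # [batch_size, d0, d1]
y_pred = rng.normal(size=(2, 3, 4))

# The loss reduces the last axis, leaving shape [batch_size, d0] = (2, 3).
losses = np.mean(np.abs(y_true - y_pred), axis=-1)


def apply_weight(losses, sample_weight):
    """Scale losses by a scalar, a [batch_size] vector, or a full-shape array."""
    w = np.asarray(sample_weight, dtype=losses.dtype)
    # Append trailing axes so a [batch_size] vector lines up with axis 0.
    while 0 < w.ndim < losses.ndim:
        w = w[..., np.newaxis]
    return losses * w


scaled_all = apply_weight(losses, 2.0)                 # scalar: scales everything
per_sample = apply_weight(losses, [1.0, 0.0])          # [batch_size]: one weight per sample
per_element = apply_weight(losses, np.ones((2, 3)))    # [batch_size, d0]: one weight per element

print(losses.shape, scaled_all.shape, per_sample.shape, per_element.shape)
```

In the [batch_size] case every loss value belonging to the second sample is multiplied by 0, which is how a whole sample can be masked out.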

Using loss functions

A loss function (also called an objective function or optimization score function) is one of the two parameters required to compile a model:

model.compile(loss='mean_squared_error', optimizer='sgd')
from keras import losses
model.compile(loss=losses.mean_squared_error, optimizer='sgd')

You can either pass the name of an existing loss function, or pass a TensorFlow/Theano symbolic function that returns a scalar for each data point and takes the following two arguments:

y_true: True labels. TensorFlow/Theano tensor.

y_pred: Predictions. TensorFlow/Theano tensor of the same shape as y_true.

The actual optimized objective is the mean of the output array across all data points.
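
The contract for a custom loss is small: take y_true and y_pred, and return one scalar per data point. The sketch below shows that contract in NumPy with a hypothetical loss, max_abs_error, which is not part of Keras. A loss passed to model.compile would be written with backend tensor operations instead.

```python
import numpy as np


def max_abs_error(y_true, y_pred):
    """Hypothetical custom loss: largest absolute error of each data point."""
    return np.max(np.abs(y_true - y_pred), axis=-1)


y_true = np.array([[0., 1.], [0., 0.], [1., 1.]])
y_pred = np.array([[1., 1.], [1., 0.], [1., 1.]])

per_point = max_abs_error(y_true, y_pred)  # one value per data point
objective = per_point.mean()               # the quantity that gets minimized

print(per_point)
print(objective)
```

The per-point values here are 1, 1 and 0, so the optimized objective is their mean, 2/3.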

Available loss functions

mean_squared_error

keras.losses.mean_squared_error(y_true, y_pred)


mean_absolute_error

keras.losses.mean_absolute_error(y_true, y_pred)


mean_absolute_percentage_error

keras.losses.mean_absolute_percentage_error(y_true, y_pred)


mean_squared_logarithmic_error

keras.losses.mean_squared_logarithmic_error(y_true, y_pred)


squared_hinge

keras.losses.squared_hinge(y_true, y_pred)


hinge

keras.losses.hinge(y_true, y_pred)


categorical_hinge

keras.losses.categorical_hinge(y_true, y_pred)


logcosh

keras.losses.logcosh(y_true, y_pred)

Logarithm of the hyperbolic cosine of the prediction error.

log(cosh(x)) is approximately equal to (x ** 2) / 2 for small x and to abs(x) - log(2) for large x. This means that 'logcosh' works mostly like the mean squared error, but is not as strongly affected by the occasional wildly incorrect prediction.

Arguments

  • y_true: tensor of true targets.
  • y_pred: tensor of predicted targets.

Returns

Tensor with one scalar loss entry per sample.
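
The two approximations for log(cosh(x)) can be checked numerically. The sketch below uses only the standard library. The helper name logcosh and the sample points 0.1 and 10 are arbitrary choices for illustration, and the stable form |x| + log1p(exp(-2|x|)) - log(2) is an algebraic rewrite of log(cosh(x)), not necessarily what Keras evaluates.

```python
import math


def logcosh(x):
    """Numerically stable log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2)."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


small, large = 0.1, 10.0

# Small error: close to the quadratic x**2 / 2.
print(logcosh(small), small ** 2 / 2)

# Large error: close to the linear |x| - log(2), far below x**2 / 2.
print(logcosh(large), abs(large) - math.log(2.0), large ** 2 / 2)
```

For the large error the loss grows roughly linearly (about 9.3) while the quadratic term would be 50, which is the sense in which a single outlier has less influence than under squared error.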


huber_loss

keras.losses.huber_loss(y_true, y_pred, delta=1.0)


categorical_crossentropy

keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0)


sparse_categorical_crossentropy

keras.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False, axis=-1)


binary_crossentropy

keras.losses.binary_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0)


kullback_leibler_divergence

keras.losses.kullback_leibler_divergence(y_true, y_pred)


poisson

keras.losses.poisson(y_true, y_pred)


cosine_proximity

keras.losses.cosine_proximity(y_true, y_pred, axis=-1)


is_categorical_crossentropy

keras.losses.is_categorical_crossentropy(loss)


Note: when using the categorical_crossentropy loss, your targets should be in categorical format (for example, if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all zeros except for a 1 at the index corresponding to the class of the sample). To convert integer targets into categorical targets, you can use the Keras utility to_categorical:

from keras.utils import to_categorical
categorical_labels = to_categorical(int_labels, num_classes=None)
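
What the conversion produces can be shown without Keras. The following NumPy sketch uses a hypothetical helper, one_hot, written for illustration. It is not the to_categorical implementation, although it follows the same convention of inferring the number of classes from the largest label when num_classes is None.

```python
import numpy as np


def one_hot(int_labels, num_classes=None):
    """Turn integer class labels into rows with a single 1 (illustrative)."""
    labels = np.asarray(int_labels, dtype=int).ravel()
    if num_classes is None:
        num_classes = int(labels.max()) + 1
    out = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Set the column given by each label to 1, one row per sample.
    out[np.arange(labels.size), labels] = 1.0
    return out


int_labels = [0, 2, 1]
categorical_labels = one_hot(int_labels)
print(categorical_labels)
```

Each row is all zeros except for a 1 at the index of the sample's class, which is the format categorical_crossentropy expects.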

When using the sparse_categorical_crossentropy loss, your targets should be integer targets. If you have categorical targets, you should use categorical_crossentropy.

categorical_crossentropy is another term for multi-class log loss.
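
The two losses differ only in how the targets are encoded. The NumPy sketch below reuses the predictions from the CategoricalCrossentropy docstring in the source dump further down and computes the multi-class log loss once from one-hot targets and once from integer targets. The clipping constant eps is an assumption for illustration.

```python
import numpy as np

probs = np.array([[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]])  # predicted class probabilities
int_labels = np.array([1, 2])                            # integer targets
one_hot_labels = np.eye(3)[int_labels]                   # the same targets, one-hot

eps = 1e-7  # assumed small constant to avoid log(0)
clipped = np.clip(probs, eps, 1.0)

# One-hot targets: sum over classes of -target * log(probability).
categorical = -np.sum(one_hot_labels * np.log(clipped), axis=-1)

# Integer targets: pick the log-probability of the true class directly.
sparse = -np.log(clipped[np.arange(len(int_labels)), int_labels])

print(categorical)
print(sparse)
```

Both print approximately [0.0513, 2.3026], which are the per-sample values shown for the 'none' reduction in that docstring.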

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Built-in loss functions."""

import abc
import functools
import warnings

import tensorflow.compat.v2 as tf

from keras import backend
from keras.saving import saving_lib
from keras.saving.legacy import serialization as legacy_serialization
from keras.saving.legacy.serialization import deserialize_keras_object
from keras.saving.legacy.serialization import serialize_keras_object
from keras.utils import losses_utils
from keras.utils import tf_utils

# isort: off
from tensorflow.python.ops.ragged import ragged_map_ops
from tensorflow.python.ops.ragged import ragged_util
from tensorflow.python.util import dispatch
from tensorflow.python.util.tf_export import keras_export
from tensorflow.tools.docs import doc_controls


@keras_export("keras.losses.Loss")
class Loss:
    """Loss base class.

    To be implemented by subclasses:
    * `call()`: Contains the logic for loss calculation using `y_true`,
        `y_pred`.

    Example subclass implementation:

    ```python
    class MeanSquaredError(Loss):

      def call(self, y_true, y_pred):
        return tf.reduce_mean(tf.math.square(y_pred - y_true), axis=-1)
    ```

    When using a Loss under a `tf.distribute.Strategy`, except passing it
    to `Model.compile()` for use by `Model.fit()`, please use reduction
    types 'SUM' or 'NONE', and reduce losses explicitly.
    Using 'AUTO' or 'SUM_OVER_BATCH_SIZE' will raise an error when calling
    the Loss object from a custom training loop or from user-defined code
    in `Layer.call()`. Please see this custom training
    [tutorial](https://www.tensorflow.org/tutorials/distribute/custom_training)
    for more details on this.
    """

    def __init__(self, reduction=losses_utils.ReductionV2.AUTO, name=None):
        """Initializes `Loss` class.

        Args:
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the
            reduction option will be determined by the usage context. For
            almost all cases this defaults to `SUM_OVER_BATCH_SIZE`. When
            used under a `tf.distribute.Strategy`, except via
            `Model.compile()` and `Model.fit()`, using `AUTO` or
            `SUM_OVER_BATCH_SIZE` will raise an error. Please see this
            custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance.
        """
        losses_utils.ReductionV2.validate(reduction)
        self.reduction = reduction
        self.name = name
        # SUM_OVER_BATCH is only allowed in losses managed by `fit` or
        # CannedEstimators.
        self._allow_sum_over_batch_size = False
        self._set_name_scope()

    def _set_name_scope(self):
        """Creates a valid `name_scope` name."""
        if self.name is None:
            self._name_scope = self.__class__.__name__.strip("_")
        elif self.name == "<lambda>":
            self._name_scope = "lambda"
        else:
            # E.g. '_my_loss' => 'my_loss'
            self._name_scope = self.name.strip("_")

    def __call__(self, y_true, y_pred, sample_weight=None):
        """Invokes the `Loss` instance.

        Args:
          y_true: Ground truth values. shape = `[batch_size, d0, .. dN]`,
            except sparse loss functions such as sparse categorical
            crossentropy where shape = `[batch_size, d0, .. dN-1]`
          y_pred: The predicted values. shape = `[batch_size, d0, .. dN]`
          sample_weight: Optional `sample_weight` acts as a coefficient for
            the loss. If a scalar is provided, then the loss is simply
            scaled by the given value.
            If `sample_weight` is a tensor of size `[batch_size]`, then
            the total loss for each sample of the batch is rescaled by the
            corresponding element in the `sample_weight` vector. If the
            shape of `sample_weight` is `[batch_size, d0, .. dN-1]` (or can
            be broadcasted to this shape), then each loss element of
            `y_pred` is scaled by the corresponding value of
            `sample_weight`. (Note on`dN-1`: all loss functions reduce by 1
            dimension, usually axis=-1.)

        Returns:
          Weighted loss float `Tensor`. If `reduction` is `NONE`, this has
            shape `[batch_size, d0, .. dN-1]`; otherwise, it is scalar.
            (Note `dN-1` because all loss functions reduce by 1 dimension,
            usually axis=-1.)

        Raises:
          ValueError: If the shape of `sample_weight` is invalid.
        """
        # If we are wrapping a lambda function strip '<>' from the name as it is
        # not accepted in scope name.
        graph_ctx = tf_utils.graph_context_for_symbolic_tensors(
            y_true, y_pred, sample_weight
        )
        with backend.name_scope(self._name_scope), graph_ctx:
            if tf.executing_eagerly():
                call_fn = self.call
            else:
                call_fn = tf.__internal__.autograph.tf_convert(
                    self.call, tf.__internal__.autograph.control_status_ctx()
                )

            losses = call_fn(y_true, y_pred)

            in_mask = losses_utils.get_mask(y_pred)
            out_mask = losses_utils.get_mask(losses)

            if in_mask is not None and out_mask is not None:
                mask = in_mask & out_mask
            elif in_mask is not None:
                mask = in_mask
            elif out_mask is not None:
                mask = out_mask
            else:
                mask = None

            reduction = self._get_reduction()
            sample_weight = losses_utils.apply_valid_mask(
                losses, sample_weight, mask, reduction
            )
            return losses_utils.compute_weighted_loss(
                losses, sample_weight, reduction=reduction
            )

    @classmethod
    def from_config(cls, config):
        """Instantiates a `Loss` from its config (output of `get_config()`).

        Args:
            config: Output of `get_config()`.

        Returns:
            A `Loss` instance.
        """
        return cls(**config)

    def get_config(self):
        """Returns the config dictionary for a `Loss` instance."""
        return {"reduction": self.reduction, "name": self.name}

    @abc.abstractmethod
    @doc_controls.for_subclass_implementers
    def call(self, y_true, y_pred):
        """Invokes the `Loss` instance.

        Args:
          y_true: Ground truth values. shape = `[batch_size, d0, .. dN]`,
            except sparse loss functions such as sparse categorical
            crossentropy where shape = `[batch_size, d0, .. dN-1]`
          y_pred: The predicted values. shape = `[batch_size, d0, .. dN]`

        Returns:
          Loss values with the shape `[batch_size, d0, .. dN-1]`.
        """
        raise NotImplementedError("Must be implemented in subclasses.")

    def _get_reduction(self):
        """Handles `AUTO` reduction cases and returns the reduction value."""
        if (
            not self._allow_sum_over_batch_size
            and tf.distribute.has_strategy()
            and (
                self.reduction == losses_utils.ReductionV2.AUTO
                or self.reduction
                == losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE
            )
        ):
            raise ValueError(
                "Please use `tf.keras.losses.Reduction.SUM` or "
                "`tf.keras.losses.Reduction.NONE` for loss reduction when "
                "losses are used with `tf.distribute.Strategy`, "
                "except for specifying losses in `Model.compile()` "
                "for use by the built-in training looop `Model.fit()`.\n"
                "Please see https://www.tensorflow.org/tutorials"
                "/distribute/custom_training for more details."
            )

        if self.reduction == losses_utils.ReductionV2.AUTO:
            return losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE
        return self.reduction


@keras_export("keras.__internal__.losses.LossFunctionWrapper", v1=[])
class LossFunctionWrapper(Loss):
    """Wraps a loss function in the `Loss` class."""

    def __init__(
        self, fn, reduction=losses_utils.ReductionV2.AUTO, name=None, **kwargs
    ):
        """Initializes `LossFunctionWrapper` class.

        Args:
          fn: The loss function to wrap, with signature `fn(y_true, y_pred,
            **kwargs)`.
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`.
            `AUTO` indicates that the reduction option will be determined
            by the usage context. For almost all cases this defaults to
            `SUM_OVER_BATCH_SIZE`. When used under a
            `tf.distribute.Strategy`, except via `Model.compile()` and
            `Model.fit()`, using `AUTO` or `SUM_OVER_BATCH_SIZE` will raise
            an error. Please see this custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance.
          **kwargs: The keyword arguments that are passed on to `fn`.
        """
        super().__init__(reduction=reduction, name=name)
        self.fn = fn
        self._fn_kwargs = kwargs

    def call(self, y_true, y_pred):
        """Invokes the `LossFunctionWrapper` instance.

        Args:
          y_true: Ground truth values.
          y_pred: The predicted values.

        Returns:
          Loss values per sample.
        """
        if tf.is_tensor(y_pred) and tf.is_tensor(y_true):
            y_pred, y_true = losses_utils.squeeze_or_expand_dimensions(
                y_pred, y_true
            )

        ag_fn = tf.__internal__.autograph.tf_convert(
            self.fn, tf.__internal__.autograph.control_status_ctx()
        )
        return ag_fn(y_true, y_pred, **self._fn_kwargs)

    def get_config(self):
        config = {}
        for k, v in self._fn_kwargs.items():
            config[k] = (
                backend.eval(v) if tf_utils.is_tensor_or_variable(v) else v
            )

        if saving_lib.saving_v3_enabled():
            from keras.utils import get_registered_name

            config["fn"] = get_registered_name(self.fn)

        base_config = super().get_config()
        return dict(list(base_config.items()) + list(config.items()))

    @classmethod
    def from_config(cls, config):
        """Instantiates a `Loss` from its config (output of `get_config()`).

        Args:
            config: Output of `get_config()`.

        Returns:
            A `keras.losses.Loss` instance.
        """
        if saving_lib.saving_v3_enabled():
            fn_name = config.pop("fn", None)
            if fn_name and cls is LossFunctionWrapper:
                config["fn"] = get(fn_name)
        return cls(**config)


@keras_export("keras.losses.MeanSquaredError")
class MeanSquaredError(LossFunctionWrapper):
    """Computes the mean of squares of errors between labels and predictions.

    `loss = square(y_true - y_pred)`

    Standalone usage:

    >>> y_true = [[0., 1.], [0., 0.]]
    >>> y_pred = [[1., 1.], [1., 0.]]
    >>> # Using 'auto'/'sum_over_batch_size' reduction type.
    >>> mse = tf.keras.losses.MeanSquaredError()
    >>> mse(y_true, y_pred).numpy()
    0.5

    >>> # Calling with 'sample_weight'.
    >>> mse(y_true, y_pred, sample_weight=[0.7, 0.3]).numpy()
    0.25

    >>> # Using 'sum' reduction type.
    >>> mse = tf.keras.losses.MeanSquaredError(
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> mse(y_true, y_pred).numpy()
    1.0

    >>> # Using 'none' reduction type.
    >>> mse = tf.keras.losses.MeanSquaredError(
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> mse(y_true, y_pred).numpy()
    array([0.5, 0.5], dtype=float32)

    Usage with the `compile()` API:

    ```python
    model.compile(optimizer='sgd', loss=tf.keras.losses.MeanSquaredError())
    ```
    """

    def __init__(
        self, reduction=losses_utils.ReductionV2.AUTO, name="mean_squared_error"
    ):
        """Initializes `MeanSquaredError` instance.

        Args:
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the
            reduction option will be determined by the usage context. For
            almost all cases this defaults to `SUM_OVER_BATCH_SIZE`. When
            used under a `tf.distribute.Strategy`, except via
            `Model.compile()` and `Model.fit()`, using `AUTO` or
            `SUM_OVER_BATCH_SIZE` will raise an error. Please see this
            custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance. Defaults to
            'mean_squared_error'.
        """
        super().__init__(mean_squared_error, name=name, reduction=reduction)


@keras_export("keras.losses.MeanAbsoluteError")
class MeanAbsoluteError(LossFunctionWrapper):
    """Computes the mean of absolute difference between labels and predictions.

    `loss = abs(y_true - y_pred)`

    Standalone usage:

    >>> y_true = [[0., 1.], [0., 0.]]
    >>> y_pred = [[1., 1.], [1., 0.]]
    >>> # Using 'auto'/'sum_over_batch_size' reduction type.
    >>> mae = tf.keras.losses.MeanAbsoluteError()
    >>> mae(y_true, y_pred).numpy()
    0.5

    >>> # Calling with 'sample_weight'.
    >>> mae(y_true, y_pred, sample_weight=[0.7, 0.3]).numpy()
    0.25

    >>> # Using 'sum' reduction type.
    >>> mae = tf.keras.losses.MeanAbsoluteError(
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> mae(y_true, y_pred).numpy()
    1.0

    >>> # Using 'none' reduction type.
    >>> mae = tf.keras.losses.MeanAbsoluteError(
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> mae(y_true, y_pred).numpy()
    array([0.5, 0.5], dtype=float32)

    Usage with the `compile()` API:

    ```python
    model.compile(optimizer='sgd', loss=tf.keras.losses.MeanAbsoluteError())
    ```
    """

    def __init__(
        self,
        reduction=losses_utils.ReductionV2.AUTO,
        name="mean_absolute_error",
    ):
        """Initializes `MeanAbsoluteError` instance.

        Args:
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the
            reduction option will be determined by the usage context. For
            almost all cases this defaults to `SUM_OVER_BATCH_SIZE`. When
            used under a `tf.distribute.Strategy`, except via
            `Model.compile()` and `Model.fit()`, using `AUTO` or
            `SUM_OVER_BATCH_SIZE` will raise an error. Please see this
            custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance. Defaults to
            'mean_absolute_error'.
        """
        super().__init__(mean_absolute_error, name=name, reduction=reduction)


@keras_export("keras.losses.MeanAbsolutePercentageError")
class MeanAbsolutePercentageError(LossFunctionWrapper):
    """Computes the mean absolute percentage error between `y_true` & `y_pred`.

    Formula:

    `loss = 100 * abs((y_true - y_pred) / y_true)`

    Note that to avoid dividing by zero, a small epsilon value
    is added to the denominator.

    Standalone usage:

    >>> y_true = [[2., 1.], [2., 3.]]
    >>> y_pred = [[1., 1.], [1., 0.]]
    >>> # Using 'auto'/'sum_over_batch_size' reduction type.
    >>> mape = tf.keras.losses.MeanAbsolutePercentageError()
    >>> mape(y_true, y_pred).numpy()
    50.

    >>> # Calling with 'sample_weight'.
    >>> mape(y_true, y_pred, sample_weight=[0.7, 0.3]).numpy()
    20.

    >>> # Using 'sum' reduction type.
    >>> mape = tf.keras.losses.MeanAbsolutePercentageError(
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> mape(y_true, y_pred).numpy()
    100.

    >>> # Using 'none' reduction type.
    >>> mape = tf.keras.losses.MeanAbsolutePercentageError(
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> mape(y_true, y_pred).numpy()
    array([25., 75.], dtype=float32)

    Usage with the `compile()` API:

    ```python
    model.compile(optimizer='sgd',
                  loss=tf.keras.losses.MeanAbsolutePercentageError())
    ```
    """

    def __init__(
        self,
        reduction=losses_utils.ReductionV2.AUTO,
        name="mean_absolute_percentage_error",
    ):
        """Initializes `MeanAbsolutePercentageError` instance.

        Args:
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the
            reduction option will be determined by the usage context. For
            almost all cases this defaults to `SUM_OVER_BATCH_SIZE`. When
            used under a `tf.distribute.Strategy`, except via
            `Model.compile()` and `Model.fit()`, using `AUTO` or
            `SUM_OVER_BATCH_SIZE` will raise an error. Please see this
            custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance. Defaults to
            'mean_absolute_percentage_error'.
        """
        super().__init__(
            mean_absolute_percentage_error, name=name, reduction=reduction
        )


@keras_export("keras.losses.MeanSquaredLogarithmicError")
class MeanSquaredLogarithmicError(LossFunctionWrapper):
    """Computes the mean squared logarithmic error between `y_true` & `y_pred`.

    `loss = square(log(y_true + 1.) - log(y_pred + 1.))`

    Standalone usage:

    >>> y_true = [[0., 1.], [0., 0.]]
    >>> y_pred = [[1., 1.], [1., 0.]]
    >>> # Using 'auto'/'sum_over_batch_size' reduction type.
    >>> msle = tf.keras.losses.MeanSquaredLogarithmicError()
    >>> msle(y_true, y_pred).numpy()
    0.240

    >>> # Calling with 'sample_weight'.
    >>> msle(y_true, y_pred, sample_weight=[0.7, 0.3]).numpy()
    0.120

    >>> # Using 'sum' reduction type.
    >>> msle = tf.keras.losses.MeanSquaredLogarithmicError(
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> msle(y_true, y_pred).numpy()
    0.480

    >>> # Using 'none' reduction type.
    >>> msle = tf.keras.losses.MeanSquaredLogarithmicError(
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> msle(y_true, y_pred).numpy()
    array([0.240, 0.240], dtype=float32)

    Usage with the `compile()` API:

    ```python
    model.compile(optimizer='sgd',
                  loss=tf.keras.losses.MeanSquaredLogarithmicError())
    ```
    """

    def __init__(
        self,
        reduction=losses_utils.ReductionV2.AUTO,
        name="mean_squared_logarithmic_error",
    ):
        """Initializes `MeanSquaredLogarithmicError` instance.

        Args:
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the
            reduction option will be determined by the usage context. For
            almost all cases this defaults to `SUM_OVER_BATCH_SIZE`. When
            used under a `tf.distribute.Strategy`, except via
            `Model.compile()` and `Model.fit()`, using `AUTO` or
            `SUM_OVER_BATCH_SIZE` will raise an error. Please see this
            custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance. Defaults to
            'mean_squared_logarithmic_error'.
        """
        super().__init__(
            mean_squared_logarithmic_error, name=name, reduction=reduction
        )


@keras_export("keras.losses.BinaryCrossentropy")
class BinaryCrossentropy(LossFunctionWrapper):
    """Computes the cross-entropy loss between true labels and predicted labels.

    Use this cross-entropy loss for binary (0 or 1) classification
    applications. The loss function requires the following inputs:

    - `y_true` (true label): This is either 0 or 1.
    - `y_pred` (predicted value): This is the model's prediction, i.e, a
      single floating-point value which either represents a
      [logit](https://en.wikipedia.org/wiki/Logit), (i.e, value in [-inf, inf]
      when `from_logits=True`) or a probability (i.e, value in [0., 1.] when
      `from_logits=False`).

    **Recommended Usage:** (set `from_logits=True`)

    With `tf.keras` API:

    ```python
    model.compile(
      loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
      ....
    )
    ```

    As a standalone function:

    >>> # Example 1: (batch_size = 1, number of samples = 4)
    >>> y_true = [0, 1, 0, 0]
    >>> y_pred = [-18.6, 0.51, 2.94, -12.8]
    >>> bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
    >>> bce(y_true, y_pred).numpy()
    0.865

    >>> # Example 2: (batch_size = 2, number of samples = 4)
    >>> y_true = [[0, 1], [0, 0]]
    >>> y_pred = [[-18.6, 0.51], [2.94, -12.8]]
    >>> # Using default 'auto'/'sum_over_batch_size' reduction type.
    >>> bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
    >>> bce(y_true, y_pred).numpy()
    0.865
    >>> # Using 'sample_weight' attribute
    >>> bce(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy()
    0.243
    >>> # Using 'sum' reduction` type.
    >>> bce = tf.keras.losses.BinaryCrossentropy(from_logits=True,
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> bce(y_true, y_pred).numpy()
    1.730
    >>> # Using 'none' reduction type.
    >>> bce = tf.keras.losses.BinaryCrossentropy(from_logits=True,
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> bce(y_true, y_pred).numpy()
    array([0.235, 1.496], dtype=float32)

    **Default Usage:** (set `from_logits=False`)

    >>> # Make the following updates to the above "Recommended Usage" section
    >>> # 1. Set `from_logits=False`
    >>> tf.keras.losses.BinaryCrossentropy() # OR ...('from_logits=False')
    >>> # 2. Update `y_pred` to use probabilities instead of logits
    >>> y_pred = [0.6, 0.3, 0.2, 0.8] # OR [[0.6, 0.3], [0.2, 0.8]]
    """

    def __init__(
        self,
        from_logits=False,
        label_smoothing=0.0,
        axis=-1,
        reduction=losses_utils.ReductionV2.AUTO,
        name="binary_crossentropy",
    ):
        """Initializes `BinaryCrossentropy` instance.

        Args:
          from_logits: Whether to interpret `y_pred` as a tensor of
            [logit](https://en.wikipedia.org/wiki/Logit) values. By default,
            we assume that `y_pred` contains probabilities (i.e., values in
            [0, 1]).
          label_smoothing: Float in [0, 1]. When 0, no smoothing occurs.
            When > 0, we compute the loss between the predicted labels and a
            smoothed version of the true labels, where the smoothing
            squeezes the labels towards 0.5. Larger values of
            `label_smoothing` correspond to heavier smoothing.
          axis: The axis along which to compute crossentropy (the features
            axis). Defaults to -1.
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the
            reduction option will be determined by the usage context. For
            almost all cases this defaults to `SUM_OVER_BATCH_SIZE`. When
            used under a `tf.distribute.Strategy`, except via
            `Model.compile()` and `Model.fit()`, using `AUTO` or
            `SUM_OVER_BATCH_SIZE` will raise an error. Please see this
            custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Name for the op. Defaults to 'binary_crossentropy'.
        """
        super().__init__(
            binary_crossentropy,
            name=name,
            reduction=reduction,
            from_logits=from_logits,
            label_smoothing=label_smoothing,
            axis=axis,
        )
        self.from_logits = from_logits


@keras_export("keras.losses.BinaryFocalCrossentropy")
class BinaryFocalCrossentropy(LossFunctionWrapper):
    """Computes focal cross-entropy loss between true labels and predictions.

    Binary cross-entropy loss is often used for binary (0 or 1) classification
    tasks. The loss function requires the following inputs:

    - `y_true` (true label): This is either 0 or 1.
    - `y_pred` (predicted value): This is the model's prediction, i.e, a
      single floating-point value which either represents a
      [logit](https://en.wikipedia.org/wiki/Logit), (i.e, value in [-inf, inf]
      when `from_logits=True`) or a probability (i.e, value in `[0., 1.]` when
      `from_logits=False`).

    According to [Lin et al., 2018](https://arxiv.org/pdf/1708.02002.pdf), it
    helps to apply a "focal factor" to down-weight easy examples and focus
    more on hard examples. By default, the focal tensor is computed as
    follows:

    `focal_factor = (1 - output) ** gamma` for class 1
    `focal_factor = output ** gamma` for class 0
    where `gamma` is a focusing parameter. When `gamma=0`, this function is
    equivalent to the binary crossentropy loss.

    With the `compile()` API:

    ```python
    model.compile(
      loss=tf.keras.losses.BinaryFocalCrossentropy(gamma=2.0, from_logits=True),
      ....
    )
    ```

    As a standalone function:

    >>> # Example 1: (batch_size = 1, number of samples = 4)
    >>> y_true = [0, 1, 0, 0]
    >>> y_pred = [-18.6, 0.51, 2.94, -12.8]
    >>> loss = tf.keras.losses.BinaryFocalCrossentropy(gamma=2,
    ...                                                from_logits=True)
    >>> loss(y_true, y_pred).numpy()
    0.691

    >>> # Apply class weight
    >>> loss = tf.keras.losses.BinaryFocalCrossentropy(
    ...     apply_class_balancing=True, gamma=2, from_logits=True)
    >>> loss(y_true, y_pred).numpy()
    0.51

    >>> # Example 2: (batch_size = 2, number of samples = 4)
    >>> y_true = [[0, 1], [0, 0]]
    >>> y_pred = [[-18.6, 0.51], [2.94, -12.8]]
    >>> # Using default 'auto'/'sum_over_batch_size' reduction type.
    >>> loss = tf.keras.losses.BinaryFocalCrossentropy(gamma=3,
    ...                                                from_logits=True)
    >>> loss(y_true, y_pred).numpy()
    0.647

    >>> # Apply class weight
    >>> loss = tf.keras.losses.BinaryFocalCrossentropy(
    ...     apply_class_balancing=True, gamma=3, from_logits=True)
    >>> loss(y_true, y_pred).numpy()
    0.482

    >>> # Using 'sample_weight' attribute with focal effect
    >>> loss = tf.keras.losses.BinaryFocalCrossentropy(gamma=3,
    ...                                                from_logits=True)
    >>> loss(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy()
    0.133

    >>> # Apply class weight
    >>> loss = tf.keras.losses.BinaryFocalCrossentropy(
    ...     apply_class_balancing=True, gamma=3, from_logits=True)
    >>> loss(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy()
    0.097

    >>> # Using 'sum' reduction` type.
    >>> loss = tf.keras.losses.BinaryFocalCrossentropy(gamma=4,
    ...     from_logits=True,
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> loss(y_true, y_pred).numpy()
    1.222

    >>> # Apply class weight
    >>> loss = tf.keras.losses.BinaryFocalCrossentropy(
    ...     apply_class_balancing=True, gamma=4, from_logits=True,
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> loss(y_true, y_pred).numpy()
    0.914

    >>> # Using 'none' reduction type.
    >>> loss = tf.keras.losses.BinaryFocalCrossentropy(
    ...     gamma=5, from_logits=True,
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> loss(y_true, y_pred).numpy()
    array([0.0017 1.1561], dtype=float32)

    >>> # Apply class weight
    >>> loss = tf.keras.losses.BinaryFocalCrossentropy(
    ...     apply_class_balancing=True, gamma=5, from_logits=True,
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> loss(y_true, y_pred).numpy()
    array([0.0004 0.8670], dtype=float32)

    Args:
      apply_class_balancing: A bool, whether to apply weight balancing on the
        binary classes 0 and 1.
      alpha: A weight balancing factor for class 1, default is `0.25` as
        mentioned in reference [Lin et al., 2018](
        https://arxiv.org/pdf/1708.02002.pdf).  The weight for class 0 is
        `1.0 - alpha`.
      gamma: A focusing parameter used to compute the focal factor, default is
        `2.0` as mentioned in the reference
        [Lin et al., 2018](https://arxiv.org/pdf/1708.02002.pdf).
      from_logits: Whether to interpret `y_pred` as a tensor of
        [logit](https://en.wikipedia.org/wiki/Logit) values. By default, we
        assume that `y_pred` are probabilities (i.e., values in `[0, 1]`).
      label_smoothing: Float in `[0, 1]`. When `0`, no smoothing occurs.
        When > `0`, we compute the loss between the predicted labels and a
        smoothed version of the true labels, where the smoothing squeezes
        the labels towards `0.5`. Larger values of `label_smoothing`
        correspond to heavier smoothing.
      axis: The axis along which to compute crossentropy (the features axis).
        Defaults to `-1`.
      reduction: Type of `tf.keras.losses.Reduction` to apply to
        loss. Default value is `AUTO`. `AUTO` indicates that the reduction
        option will be determined by the usage context. For almost all cases
        this defaults to `SUM_OVER_BATCH_SIZE`. When used under a
        `tf.distribute.Strategy`, except via `Model.compile()` and
        `Model.fit()`, using `AUTO` or `SUM_OVER_BATCH_SIZE` will raise an
        error. Please see this custom training [tutorial](
        https://www.tensorflow.org/tutorials/distribute/custom_training)
        for more details.
      name: Name for the op. Defaults to 'binary_focal_crossentropy'.
    """

    def __init__(
        self,
        apply_class_balancing=False,
        alpha=0.25,
        gamma=2.0,
        from_logits=False,
        label_smoothing=0.0,
        axis=-1,
        reduction=losses_utils.ReductionV2.AUTO,
        name="binary_focal_crossentropy",
    ):
        """Initializes `BinaryFocalCrossentropy` instance."""
        super().__init__(
            binary_focal_crossentropy,
            apply_class_balancing=apply_class_balancing,
            alpha=alpha,
            gamma=gamma,
            name=name,
            reduction=reduction,
            from_logits=from_logits,
            label_smoothing=label_smoothing,
            axis=axis,
        )
        self.from_logits = from_logits
        self.apply_class_balancing = apply_class_balancing
        self.alpha = alpha
        self.gamma = gamma

    def get_config(self):
        config = {
            "apply_class_balancing": self.apply_class_balancing,
            "alpha": self.alpha,
            "gamma": self.gamma,
        }
        base_config = super().get_config()
        return dict(list(base_config.items()) + list(config.items()))


@keras_export("keras.losses.CategoricalCrossentropy")
class CategoricalCrossentropy(LossFunctionWrapper):
    """Computes the crossentropy loss between the labels and predictions.

    Use this crossentropy loss function when there are two or more label
    classes. We expect labels to be provided in a `one_hot` representation. If
    you want to provide labels as integers, please use
    `SparseCategoricalCrossentropy` loss. There should be `# classes` floating
    point values per feature.

    In the snippet below, there are `# classes` floating point values per
    example. Both `y_pred` and `y_true` have shape
    `[batch_size, num_classes]`.

    Standalone usage:

    >>> y_true = [[0, 1, 0], [0, 0, 1]]
    >>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
    >>> # Using 'auto'/'sum_over_batch_size' reduction type.
    >>> cce = tf.keras.losses.CategoricalCrossentropy()
    >>> cce(y_true, y_pred).numpy()
    1.177

    >>> # Calling with 'sample_weight'.
    >>> cce(y_true, y_pred, sample_weight=tf.constant([0.3, 0.7])).numpy()
    0.814

    >>> # Using 'sum' reduction type.
    >>> cce = tf.keras.losses.CategoricalCrossentropy(
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> cce(y_true, y_pred).numpy()
    2.354

    >>> # Using 'none' reduction type.
    >>> cce = tf.keras.losses.CategoricalCrossentropy(
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> cce(y_true, y_pred).numpy()
    array([0.0513, 2.303], dtype=float32)

    Usage with the `compile()` API:

    ```python
    model.compile(optimizer='sgd',
                  loss=tf.keras.losses.CategoricalCrossentropy())
    ```
    """

    def __init__(
        self,
        from_logits=False,
        label_smoothing=0.0,
        axis=-1,
        reduction=losses_utils.ReductionV2.AUTO,
        name="categorical_crossentropy",
    ):
        """Initializes `CategoricalCrossentropy` instance.

        Args:
          from_logits: Whether `y_pred` is expected to be a logits tensor. By
            default, we assume that `y_pred` encodes a probability distribution.
          label_smoothing: Float in [0, 1]. When > 0, label values are smoothed,
            meaning the confidence on label values is relaxed. For example, if
            `0.1`, use `0.1 / num_classes` for non-target labels and
            `0.9 + 0.1 / num_classes` for target labels.
          axis: The axis along which to compute crossentropy (the features
            axis). Defaults to -1.
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the reduction
            option will be determined by the usage context. For almost all cases
            this defaults to `SUM_OVER_BATCH_SIZE`. When used under a
            `tf.distribute.Strategy`, except via `Model.compile()` and
            `Model.fit()`, using `AUTO` or `SUM_OVER_BATCH_SIZE`
            will raise an error. Please see this custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance.
            Defaults to 'categorical_crossentropy'.
        """
        super().__init__(
            categorical_crossentropy,
            name=name,
            reduction=reduction,
            from_logits=from_logits,
            label_smoothing=label_smoothing,
            axis=axis,
        )


@keras_export("keras.losses.SparseCategoricalCrossentropy")
class SparseCategoricalCrossentropy(LossFunctionWrapper):
    """Computes the crossentropy loss between the labels and predictions.

    Use this crossentropy loss function when there are two or more label
    classes. We expect labels to be provided as integers. If you want to
    provide labels using `one-hot` representation, please use
    `CategoricalCrossentropy` loss. There should be `# classes` floating point
    values per feature for `y_pred` and a single floating point value per
    feature for `y_true`.

    In the snippet below, there is a single floating point value per example for
    `y_true` and `# classes` floating point values per example for `y_pred`.
    The shape of `y_true` is `[batch_size]` and the shape of `y_pred` is
    `[batch_size, num_classes]`.

    Standalone usage:

    >>> y_true = [1, 2]
    >>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
    >>> # Using 'auto'/'sum_over_batch_size' reduction type.
    >>> scce = tf.keras.losses.SparseCategoricalCrossentropy()
    >>> scce(y_true, y_pred).numpy()
    1.177

    >>> # Calling with 'sample_weight'.
    >>> scce(y_true, y_pred, sample_weight=tf.constant([0.3, 0.7])).numpy()
    0.814

    >>> # Using 'sum' reduction type.
    >>> scce = tf.keras.losses.SparseCategoricalCrossentropy(
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> scce(y_true, y_pred).numpy()
    2.354

    >>> # Using 'none' reduction type.
    >>> scce = tf.keras.losses.SparseCategoricalCrossentropy(
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> scce(y_true, y_pred).numpy()
    array([0.0513, 2.303], dtype=float32)

    Usage with the `compile()` API:

    ```python
    model.compile(optimizer='sgd',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy())
    ```
    """

    def __init__(
        self,
        from_logits=False,
        ignore_class=None,
        reduction=losses_utils.ReductionV2.AUTO,
        name="sparse_categorical_crossentropy",
    ):
        """Initializes `SparseCategoricalCrossentropy` instance.

        Args:
          from_logits: Whether `y_pred` is expected to be a logits tensor. By
            default, we assume that `y_pred` encodes a probability distribution.
          ignore_class: Optional integer. The ID of a class to be ignored during
            loss computation. This is useful, for example, in segmentation
            problems featuring a "void" class (commonly -1 or 255) in
            segmentation maps.
            By default (`ignore_class=None`), all classes are considered.
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the reduction
            option will be determined by the usage context. For almost all cases
            this defaults to `SUM_OVER_BATCH_SIZE`. When used under a
            `tf.distribute.Strategy`, except via `Model.compile()` and
            `Model.fit()`, using `AUTO` or `SUM_OVER_BATCH_SIZE`
            will raise an error. Please see this custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance. Defaults to
            'sparse_categorical_crossentropy'.
        """
        super().__init__(
            sparse_categorical_crossentropy,
            name=name,
            reduction=reduction,
            from_logits=from_logits,
            ignore_class=ignore_class,
        )


@keras_export("keras.losses.Hinge")
class Hinge(LossFunctionWrapper):
    """Computes the hinge loss between `y_true` & `y_pred`.

    `loss = maximum(1 - y_true * y_pred, 0)`

    `y_true` values are expected to be -1 or 1. If binary (0 or 1) labels are
    provided we will convert them to -1 or 1.

    Standalone usage:

    >>> y_true = [[0., 1.], [0., 0.]]
    >>> y_pred = [[0.6, 0.4], [0.4, 0.6]]
    >>> # Using 'auto'/'sum_over_batch_size' reduction type.
    >>> h = tf.keras.losses.Hinge()
    >>> h(y_true, y_pred).numpy()
    1.3

    >>> # Calling with 'sample_weight'.
    >>> h(y_true, y_pred, sample_weight=[1, 0]).numpy()
    0.55

    >>> # Using 'sum' reduction type.
    >>> h = tf.keras.losses.Hinge(
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> h(y_true, y_pred).numpy()
    2.6

    >>> # Using 'none' reduction type.
    >>> h = tf.keras.losses.Hinge(
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> h(y_true, y_pred).numpy()
    array([1.1, 1.5], dtype=float32)

    Usage with the `compile()` API:

    ```python
    model.compile(optimizer='sgd', loss=tf.keras.losses.Hinge())
    ```
    """

    def __init__(self, reduction=losses_utils.ReductionV2.AUTO, name="hinge"):
        """Initializes `Hinge` instance.

        Args:
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the reduction
            option will be determined by the usage context. For almost all cases
            this defaults to `SUM_OVER_BATCH_SIZE`. When used under a
            `tf.distribute.Strategy`, except via `Model.compile()` and
            `Model.fit()`, using `AUTO` or `SUM_OVER_BATCH_SIZE`
            will raise an error. Please see this custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance. Defaults to 'hinge'.
        """
        super().__init__(hinge, name=name, reduction=reduction)


@keras_export("keras.losses.SquaredHinge")
class SquaredHinge(LossFunctionWrapper):
    """Computes the squared hinge loss between `y_true` & `y_pred`.

    `loss = square(maximum(1 - y_true * y_pred, 0))`

    `y_true` values are expected to be -1 or 1. If binary (0 or 1) labels are
    provided we will convert them to -1 or 1.

    Standalone usage:

    >>> y_true = [[0., 1.], [0., 0.]]
    >>> y_pred = [[0.6, 0.4], [0.4, 0.6]]
    >>> # Using 'auto'/'sum_over_batch_size' reduction type.
    >>> h = tf.keras.losses.SquaredHinge()
    >>> h(y_true, y_pred).numpy()
    1.86

    >>> # Calling with 'sample_weight'.
    >>> h(y_true, y_pred, sample_weight=[1, 0]).numpy()
    0.73

    >>> # Using 'sum' reduction type.
    >>> h = tf.keras.losses.SquaredHinge(
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> h(y_true, y_pred).numpy()
    3.72

    >>> # Using 'none' reduction type.
    >>> h = tf.keras.losses.SquaredHinge(
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> h(y_true, y_pred).numpy()
    array([1.46, 2.26], dtype=float32)

    Usage with the `compile()` API:

    ```python
    model.compile(optimizer='sgd', loss=tf.keras.losses.SquaredHinge())
    ```
    """

    def __init__(
        self, reduction=losses_utils.ReductionV2.AUTO, name="squared_hinge"
    ):
        """Initializes `SquaredHinge` instance.

        Args:
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the reduction
            option will be determined by the usage context. For almost all cases
            this defaults to `SUM_OVER_BATCH_SIZE`. When used under a
            `tf.distribute.Strategy`, except via `Model.compile()` and
            `Model.fit()`, using `AUTO` or `SUM_OVER_BATCH_SIZE`
            will raise an error. Please see this custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance. Defaults to 'squared_hinge'.
        """
        super().__init__(squared_hinge, name=name, reduction=reduction)


@keras_export("keras.losses.CategoricalHinge")
class CategoricalHinge(LossFunctionWrapper):
    """Computes the categorical hinge loss between `y_true` & `y_pred`.

    `loss = maximum(neg - pos + 1, 0)`
    where `neg=maximum((1-y_true)*y_pred) and pos=sum(y_true*y_pred)`

    Standalone usage:

    >>> y_true = [[0, 1], [0, 0]]
    >>> y_pred = [[0.6, 0.4], [0.4, 0.6]]
    >>> # Using 'auto'/'sum_over_batch_size' reduction type.
    >>> h = tf.keras.losses.CategoricalHinge()
    >>> h(y_true, y_pred).numpy()
    1.4

    >>> # Calling with 'sample_weight'.
    >>> h(y_true, y_pred, sample_weight=[1, 0]).numpy()
    0.6

    >>> # Using 'sum' reduction type.
    >>> h = tf.keras.losses.CategoricalHinge(
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> h(y_true, y_pred).numpy()
    2.8

    >>> # Using 'none' reduction type.
    >>> h = tf.keras.losses.CategoricalHinge(
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> h(y_true, y_pred).numpy()
    array([1.2, 1.6], dtype=float32)

    Usage with the `compile()` API:

    ```python
    model.compile(optimizer='sgd', loss=tf.keras.losses.CategoricalHinge())
    ```
    """

    def __init__(
        self, reduction=losses_utils.ReductionV2.AUTO, name="categorical_hinge"
    ):
        """Initializes `CategoricalHinge` instance.

        Args:
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the reduction
            option will be determined by the usage context. For almost all cases
            this defaults to `SUM_OVER_BATCH_SIZE`. When used under a
            `tf.distribute.Strategy`, except via `Model.compile()` and
            `Model.fit()`, using `AUTO` or `SUM_OVER_BATCH_SIZE`
            will raise an error. Please see this custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance.
            Defaults to 'categorical_hinge'.
        """
        super().__init__(categorical_hinge, name=name, reduction=reduction)


@keras_export("keras.losses.Poisson")
class Poisson(LossFunctionWrapper):
    """Computes the Poisson loss between `y_true` & `y_pred`.

    `loss = y_pred - y_true * log(y_pred)`

    Standalone usage:

    >>> y_true = [[0., 1.], [0., 0.]]
    >>> y_pred = [[1., 1.], [0., 0.]]
    >>> # Using 'auto'/'sum_over_batch_size' reduction type.
    >>> p = tf.keras.losses.Poisson()
    >>> p(y_true, y_pred).numpy()
    0.5

    >>> # Calling with 'sample_weight'.
    >>> p(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy()
    0.4

    >>> # Using 'sum' reduction type.
    >>> p = tf.keras.losses.Poisson(
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> p(y_true, y_pred).numpy()
    0.999

    >>> # Using 'none' reduction type.
    >>> p = tf.keras.losses.Poisson(
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> p(y_true, y_pred).numpy()
    array([0.999, 0.], dtype=float32)

    Usage with the `compile()` API:

    ```python
    model.compile(optimizer='sgd', loss=tf.keras.losses.Poisson())
    ```
    """

    def __init__(self, reduction=losses_utils.ReductionV2.AUTO, name="poisson"):
        """Initializes `Poisson` instance.

        Args:
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the reduction
            option will be determined by the usage context. For almost all cases
            this defaults to `SUM_OVER_BATCH_SIZE`. When used under a
            `tf.distribute.Strategy`, except via `Model.compile()` and
            `Model.fit()`, using `AUTO` or `SUM_OVER_BATCH_SIZE`
            will raise an error. Please see this custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance. Defaults to 'poisson'.
        """
        super().__init__(poisson, name=name, reduction=reduction)


@keras_export("keras.losses.LogCosh")
class LogCosh(LossFunctionWrapper):
    """Computes the logarithm of the hyperbolic cosine of the prediction error.

    `logcosh = log((exp(x) + exp(-x))/2)`,
    where x is the error `y_pred - y_true`.

    Standalone usage:

    >>> y_true = [[0., 1.], [0., 0.]]
    >>> y_pred = [[1., 1.], [0., 0.]]
    >>> # Using 'auto'/'sum_over_batch_size' reduction type.
    >>> l = tf.keras.losses.LogCosh()
    >>> l(y_true, y_pred).numpy()
    0.108

    >>> # Calling with 'sample_weight'.
    >>> l(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy()
    0.087

    >>> # Using 'sum' reduction type.
    >>> l = tf.keras.losses.LogCosh(
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> l(y_true, y_pred).numpy()
    0.217

    >>> # Using 'none' reduction type.
    >>> l = tf.keras.losses.LogCosh(
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> l(y_true, y_pred).numpy()
    array([0.217, 0.], dtype=float32)

    Usage with the `compile()` API:

    ```python
    model.compile(optimizer='sgd', loss=tf.keras.losses.LogCosh())
    ```
    """

    def __init__(
        self, reduction=losses_utils.ReductionV2.AUTO, name="log_cosh"
    ):
        """Initializes `LogCosh` instance.

        Args:
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the reduction
            option will be determined by the usage context. For almost all cases
            this defaults to `SUM_OVER_BATCH_SIZE`. When used under a
            `tf.distribute.Strategy`, except via `Model.compile()` and
            `Model.fit()`, using `AUTO` or `SUM_OVER_BATCH_SIZE`
            will raise an error. Please see this custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance. Defaults to 'log_cosh'.
        """
        super().__init__(log_cosh, name=name, reduction=reduction)


@keras_export("keras.losses.KLDivergence")
class KLDivergence(LossFunctionWrapper):
    """Computes Kullback-Leibler divergence loss between `y_true` & `y_pred`.

    `loss = y_true * log(y_true / y_pred)`

    See: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence

    Standalone usage:

    >>> y_true = [[0, 1], [0, 0]]
    >>> y_pred = [[0.6, 0.4], [0.4, 0.6]]
    >>> # Using 'auto'/'sum_over_batch_size' reduction type.
    >>> kl = tf.keras.losses.KLDivergence()
    >>> kl(y_true, y_pred).numpy()
    0.458

    >>> # Calling with 'sample_weight'.
    >>> kl(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy()
    0.366

    >>> # Using 'sum' reduction type.
    >>> kl = tf.keras.losses.KLDivergence(
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> kl(y_true, y_pred).numpy()
    0.916

    >>> # Using 'none' reduction type.
    >>> kl = tf.keras.losses.KLDivergence(
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> kl(y_true, y_pred).numpy()
    array([0.916, -3.08e-06], dtype=float32)

    Usage with the `compile()` API:

    ```python
    model.compile(optimizer='sgd', loss=tf.keras.losses.KLDivergence())
    ```
    """

    def __init__(
        self, reduction=losses_utils.ReductionV2.AUTO, name="kl_divergence"
    ):
        """Initializes `KLDivergence` instance.

        Args:
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the reduction
            option will be determined by the usage context. For almost all cases
            this defaults to `SUM_OVER_BATCH_SIZE`. When used under a
            `tf.distribute.Strategy`, except via `Model.compile()` and
            `Model.fit()`, using `AUTO` or `SUM_OVER_BATCH_SIZE`
            will raise an error. Please see this custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance. Defaults to 'kl_divergence'.
        """
        super().__init__(kl_divergence, name=name, reduction=reduction)


@keras_export("keras.losses.Huber")
class Huber(LossFunctionWrapper):
    """Computes the Huber loss between `y_true` & `y_pred`.

    For each value x in `error = y_true - y_pred`:

    ```
    loss = 0.5 * x^2                  if |x| <= d
    loss = 0.5 * d^2 + d * (|x| - d)  if |x| > d
    ```
    where d is `delta`. See: https://en.wikipedia.org/wiki/Huber_loss

    Standalone usage:

    >>> y_true = [[0, 1], [0, 0]]
    >>> y_pred = [[0.6, 0.4], [0.4, 0.6]]
    >>> # Using 'auto'/'sum_over_batch_size' reduction type.
    >>> h = tf.keras.losses.Huber()
    >>> h(y_true, y_pred).numpy()
    0.155

    >>> # Calling with 'sample_weight'.
    >>> h(y_true, y_pred, sample_weight=[1, 0]).numpy()
    0.09

    >>> # Using 'sum' reduction type.
    >>> h = tf.keras.losses.Huber(
    ...     reduction=tf.keras.losses.Reduction.SUM)
    >>> h(y_true, y_pred).numpy()
    0.31

    >>> # Using 'none' reduction type.
    >>> h = tf.keras.losses.Huber(
    ...     reduction=tf.keras.losses.Reduction.NONE)
    >>> h(y_true, y_pred).numpy()
    array([0.18, 0.13], dtype=float32)

    Usage with the `compile()` API:

    ```python
    model.compile(optimizer='sgd', loss=tf.keras.losses.Huber())
    ```
    """

    def __init__(
        self,
        delta=1.0,
        reduction=losses_utils.ReductionV2.AUTO,
        name="huber_loss",
    ):
        """Initializes `Huber` instance.

        Args:
          delta: A float, the point where the Huber loss function changes from a
            quadratic to linear.
          reduction: Type of `tf.keras.losses.Reduction` to apply to
            loss. Default value is `AUTO`. `AUTO` indicates that the reduction
            option will be determined by the usage context. For almost all cases
            this defaults to `SUM_OVER_BATCH_SIZE`. When used under a
            `tf.distribute.Strategy`, except via `Model.compile()` and
            `Model.fit()`, using `AUTO` or `SUM_OVER_BATCH_SIZE`
            will raise an error. Please see this custom training [tutorial](
            https://www.tensorflow.org/tutorials/distribute/custom_training)
            for more details.
          name: Optional name for the instance. Defaults to 'huber_loss'.
        """
        super().__init__(huber, name=name, reduction=reduction, delta=delta)


@keras_export(
    "keras.metrics.mean_squared_error",
    "keras.metrics.mse",
    "keras.metrics.MSE",
    "keras.losses.mean_squared_error",
    "keras.losses.mse",
    "keras.losses.MSE",
)
@tf.__internal__.dispatch.add_dispatch_support
def mean_squared_error(y_true, y_pred):
    """Computes the mean squared error between labels and predictions.

    After computing the squared distance between the inputs, the mean value over
    the last dimension is returned.

    `loss = mean(square(y_true - y_pred), axis=-1)`

    Standalone usage:

    >>> y_true = np.random.randint(0, 2, size=(2, 3))
    >>> y_pred = np.random.random(size=(2, 3))
    >>> loss = tf.keras.losses.mean_squared_error(y_true, y_pred)
    >>> assert loss.shape == (2,)
    >>> assert np.array_equal(
    ...     loss.numpy(), np.mean(np.square(y_true - y_pred), axis=-1))

    Args:
      y_true: Ground truth values. shape = `[batch_size, d0, .. dN]`.
      y_pred: The predicted values.
        shape = `[batch_size, d0, .. dN]`.

    Returns:
      Mean squared error values. shape = `[batch_size, d0, .. dN-1]`.
    """
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    # Average over the last (feature) axis.
    return backend.mean(tf.math.squared_difference(y_pred, y_true), axis=-1)


def _ragged_tensor_apply_loss(loss_fn, y_true, y_pred, y_pred_extra_dim=False):
    """Apply a loss function on a per batch basis.

    Args:
      loss_fn: The loss function
      y_true: truth values (RaggedTensor)
      y_pred: predicted values (RaggedTensor)
      y_pred_extra_dim: whether y_pred has an additional dimension compared to
        y_true

    Returns:
      Loss-function result. A dense tensor if the output has a single dimension
      (per-batch loss value); a ragged tensor otherwise.
    """

    def rt_is_equiv_dense(rt):
        """Returns true if this RaggedTensor has the same row_lengths across
        all ragged dimensions and thus can be converted to a dense tensor
        without loss of information.

        Args:
          rt: RaggedTensor.
        """
        return tf.reduce_all(
            [
                tf.equal(
                    tf.math.reduce_variance(
                        tf.cast(row_lens, backend.floatx())
                    ),
                    tf.constant([0.0]),
                )
                for row_lens in rt.nested_row_lengths()
            ]
        )

    def _convert_to_dense(inputs):
        return tuple(
            rt.to_tensor() if isinstance(rt, tf.RaggedTensor) else rt
            for rt in inputs
        )

    def _call_loss(inputs, ragged_output):
        """Adapt the result to ragged or dense tensor according to the expected
        output type. This is done so that all the return values of the map
        operation have the same type.
        """
        r = loss_fn(*inputs)
        if ragged_output and not isinstance(r, tf.RaggedTensor):
            r = tf.RaggedTensor.from_tensor(r)
        elif not ragged_output and isinstance(r, tf.RaggedTensor):
            r = r.to_tensor()
        return r

    def _wrapper(inputs, ragged_output):
        _, y_pred = inputs
        if isinstance(y_pred, tf.RaggedTensor):
            return tf.cond(
                rt_is_equiv_dense(y_pred),
                lambda: _call_loss(_convert_to_dense(inputs), ragged_output),
                lambda: _call_loss(inputs, ragged_output),
            )

        return loss_fn(*inputs)

    if not isinstance(y_true, tf.RaggedTensor):
        return loss_fn(y_true, y_pred.to_tensor())

    # Per-example output shape: drop the batch axis and the reduced last axis.
    lshape = y_pred.shape.as_list()[1:-1]
    if len(lshape) > 0:
        spec = tf.RaggedTensorSpec(shape=lshape, dtype=y_pred.dtype)
    else:
        spec = tf.TensorSpec(shape=[], dtype=y_pred.dtype)

    nested_splits_list = [rt.nested_row_splits for rt in (y_true, y_pred)]
    if y_pred_extra_dim:
        # The last dimension of a categorical prediction may be ragged or not.
        rdims = [len(slist) for slist in nested_splits_list]
        if rdims[0] == rdims[1] - 1:
            nested_splits_list[1] = nested_splits_list[1][:-1]

    map_fn = functools.partial(_wrapper, ragged_output=len(lshape) > 1)

    assertion_list = ragged_util.assert_splits_match(nested_splits_list)
    with tf.control_dependencies(assertion_list):
        return ragged_map_ops.map_fn(map_fn, elems=(y_true, y_pred), dtype=spec)


@dispatch.dispatch_for_types(mean_squared_error, tf.RaggedTensor)
def _ragged_tensor_mse(y_true, y_pred):
    """Implements support for handling RaggedTensors.

    Args:
      y_true: RaggedTensor truth values. shape = `[batch_size, d0, .. dN]`.
      y_pred: RaggedTensor predicted values. shape = `[batch_size, d0, .. dN]`.

    Returns:
      Mean squared error values. shape = `[batch_size, d0, .. dN-1]`.
      When the number of dimensions of the batch feature vector [d0, .. dN] is
      greater than one the return value is a RaggedTensor. Otherwise a Dense
      tensor with dimensions [batch_size] is returned.
    """
    return _ragged_tensor_apply_loss(mean_squared_error, y_true, y_pred)


@keras_export(
    "keras.metrics.mean_absolute_error",
    "keras.metrics.mae",
    "keras.metrics.MAE",
    "keras.losses.mean_absolute_error",
    "keras.losses.mae",
    "keras.losses.MAE",
)
@tf.__internal__.dispatch.add_dispatch_support
def mean_absolute_error(y_true, y_pred):
    """Computes the mean absolute error between labels and predictions.

    `loss = mean(abs(y_true - y_pred), axis=-1)`

    Standalone usage:

    >>> y_true = np.random.randint(0, 2, size=(2, 3))
    >>> y_pred = np.random.random(size=(2, 3))
    >>> loss = tf.keras.losses.mean_absolute_error(y_true, y_pred)
    >>> assert loss.shape == (2,)
    >>> assert np.array_equal(
    ...     loss.numpy(), np.mean(np.abs(y_true - y_pred), axis=-1))

    Args:
      y_true: Ground truth values. shape = `[batch_size, d0, .. dN]`.
      y_pred: The predicted values. shape = `[batch_size, d0, .. dN]`.

    Returns:
      Mean absolute error values. shape = `[batch_size, d0, .. dN-1]`.
    """
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    # Average the absolute differences over the last (feature) axis.
    return backend.mean(tf.abs(y_pred - y_true), axis=-1)


@dispatch.dispatch_for_types(mean_absolute_error, tf.RaggedTensor)
def _ragged_tensor_mae(y_true, y_pred):
    """RaggedTensor adapter for mean_absolute_error."""
    return _ragged_tensor_apply_loss(mean_absolute_error, y_true, y_pred)


@keras_export(
    "keras.metrics.mean_absolute_percentage_error",
    "keras.metrics.mape",
    "keras.metrics.MAPE",
    "keras.losses.mean_absolute_percentage_error",
    "keras.losses.mape",
    "keras.losses.MAPE",
)
@tf.__internal__.dispatch.add_dispatch_support
def mean_absolute_percentage_error(y_true, y_pred):
    """Computes the mean absolute percentage error between `y_true` & `y_pred`.

    `loss = 100 * mean(abs((y_true - y_pred) / y_true), axis=-1)`

    Standalone usage:

    >>> y_true = np.random.random(size=(2, 3))
    >>> y_true = np.maximum(y_true, 1e-7)  # Prevent division by zero
    >>> y_pred = np.random.random(size=(2, 3))
    >>> loss = tf.keras.losses.mean_absolute_percentage_error(y_true, y_pred)
    >>> assert loss.shape == (2,)
    >>> assert np.array_equal(
    ...     loss.numpy(),
    ...     100. * np.mean(np.abs((y_true - y_pred) / y_true), axis=-1))

    Args:
      y_true: Ground truth values. shape = `[batch_size, d0, .. dN]`.
      y_pred: The predicted values. shape = `[batch_size, d0, .. dN]`.

    Returns:
      Mean absolute percentage error values. shape = `[batch_size, d0, ..
      dN-1]`.
    """
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    # Clamp the denominator to epsilon to avoid division by zero.
    diff = tf.abs(
        (y_true - y_pred) / backend.maximum(tf.abs(y_true), backend.epsilon())
    )
    return 100.0 * backend.mean(diff, axis=-1)


@dispatch.dispatch_for_types(mean_absolute_percentage_error, tf.RaggedTensor)
def _ragged_tensor_mape(y_true, y_pred):
    """Support RaggedTensors."""
    return _ragged_tensor_apply_loss(
        mean_absolute_percentage_error, y_true, y_pred
    )


@keras_export(
    "keras.metrics.mean_squared_logarithmic_error",
    "keras.metrics.msle",
    "keras.metrics.MSLE",
    "keras.losses.mean_squared_logarithmic_error",
    "keras.losses.msle",
    "keras.losses.MSLE",
)
@tf.__internal__.dispatch.add_dispatch_support
def mean_squared_logarithmic_error(y_true, y_pred):
    """Computes the mean squared logarithmic error between `y_true` & `y_pred`.

    `loss = mean(square(log(y_true + 1) - log(y_pred + 1)), axis=-1)`

    Standalone usage:

    >>> y_true = np.random.randint(0, 2, size=(2, 3))
    >>> y_pred = np.random.random(size=(2, 3))
    >>> loss = tf.keras.losses.mean_squared_logarithmic_error(y_true, y_pred)
    >>> assert loss.shape == (2,)
    >>> y_true = np.maximum(y_true, 1e-7)
    >>> y_pred = np.maximum(y_pred, 1e-7)
    >>> assert np.allclose(
    ...     loss.numpy(),
    ...     np.mean(
    ...         np.square(np.log(y_true + 1.) - np.log(y_pred + 1.)), axis=-1))

    Args:
      y_true: Ground truth values.
        shape = `[batch_size, d0, .. dN]`.
      y_pred: The predicted values. shape = `[batch_size, d0, .. dN]`.

    Returns:
      Mean squared logarithmic error values. shape = `[batch_size, d0, ..
      dN-1]`.
    """
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    first_log = tf.math.log(backend.maximum(y_pred, backend.epsilon()) + 1.0)
    second_log = tf.math.log(backend.maximum(y_true, backend.epsilon()) + 1.0)
    return backend.mean(
        tf.math.squared_difference(first_log, second_log), axis=-1
    )


@dispatch.dispatch_for_types(mean_squared_logarithmic_error, tf.RaggedTensor)
def _ragged_tensor_msle(y_true, y_pred):
    """Implements support for handling RaggedTensors."""
    return _ragged_tensor_apply_loss(
        mean_squared_logarithmic_error, y_true, y_pred
    )


def _maybe_convert_labels(y_true):
    """Converts binary labels into -1/1."""
    are_zeros = tf.equal(y_true, 0)
    are_ones = tf.equal(y_true, 1)
    is_binary = tf.reduce_all(tf.logical_or(are_zeros, are_ones))

    def _convert_binary_labels():
        # Convert the binary labels to -1 or 1.
        return 2.0 * y_true - 1.0

    updated_y_true = tf.__internal__.smart_cond.smart_cond(
        is_binary, _convert_binary_labels, lambda: y_true
    )
    return updated_y_true


@keras_export("keras.metrics.squared_hinge", "keras.losses.squared_hinge")
@tf.__internal__.dispatch.add_dispatch_support
def squared_hinge(y_true, y_pred):
    """Computes the squared hinge loss between `y_true` & `y_pred`.

    `loss = mean(square(maximum(1 - y_true * y_pred, 0)), axis=-1)`

    Standalone usage:

    >>> y_true = np.random.choice([-1, 1], size=(2, 3))
    >>> y_pred = np.random.random(size=(2, 3))
    >>> loss = tf.keras.losses.squared_hinge(y_true, y_pred)
    >>> assert loss.shape == (2,)
    >>> assert np.array_equal(
    ...     loss.numpy(),
    ...     np.mean(np.square(np.maximum(1. - y_true * y_pred, 0.)), axis=-1))

    Args:
      y_true: The ground truth values. `y_true` values are expected to be -1 or
        1. If binary (0 or 1) labels are provided we will convert them to -1 or
        1. shape = `[batch_size, d0, .. dN]`.
      y_pred: The predicted values.
        shape = `[batch_size, d0, .. dN]`.

    Returns:
       Squared hinge loss values. shape = `[batch_size, d0, .. dN-1]`.
    """
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    y_true = _maybe_convert_labels(y_true)
    return backend.mean(
        tf.square(tf.maximum(1.0 - y_true * y_pred, 0.0)), axis=-1
    )


@keras_export("keras.metrics.hinge", "keras.losses.hinge")
@tf.__internal__.dispatch.add_dispatch_support
def hinge(y_true, y_pred):
    """Computes the hinge loss between `y_true` & `y_pred`.

    `loss = mean(maximum(1 - y_true * y_pred, 0), axis=-1)`

    Standalone usage:

    >>> y_true = np.random.choice([-1, 1], size=(2, 3))
    >>> y_pred = np.random.random(size=(2, 3))
    >>> loss = tf.keras.losses.hinge(y_true, y_pred)
    >>> assert loss.shape == (2,)
    >>> assert np.array_equal(
    ...     loss.numpy(),
    ...     np.mean(np.maximum(1. - y_true * y_pred, 0.), axis=-1))

    Args:
      y_true: The ground truth values. `y_true` values are expected to be -1 or
        1. If binary (0 or 1) labels are provided they will be converted to -1
        or 1. shape = `[batch_size, d0, .. dN]`.
      y_pred: The predicted values. shape = `[batch_size, d0, .. dN]`.

    Returns:
      Hinge loss values. shape = `[batch_size, d0, .. dN-1]`.
    """
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    y_true = _maybe_convert_labels(y_true)
    return backend.mean(tf.maximum(1.0 - y_true * y_pred, 0.0), axis=-1)


@keras_export("keras.losses.categorical_hinge")
@tf.__internal__.dispatch.add_dispatch_support
def categorical_hinge(y_true, y_pred):
    """Computes the categorical hinge loss between `y_true` & `y_pred`.

    `loss = maximum(neg - pos + 1, 0)`
    where `neg=maximum((1-y_true)*y_pred) and pos=sum(y_true*y_pred)`

    Standalone usage:

    >>> y_true = np.random.randint(0, 3, size=(2,))
    >>> y_true = tf.keras.utils.to_categorical(y_true, num_classes=3)
    >>> y_pred = np.random.random(size=(2, 3))
    >>> loss = tf.keras.losses.categorical_hinge(y_true, y_pred)
    >>> assert loss.shape == (2,)
    >>> pos = np.sum(y_true * y_pred, axis=-1)
    >>> neg = np.amax((1. - y_true) * y_pred, axis=-1)
    >>> assert np.array_equal(loss.numpy(), np.maximum(0., neg - pos + 1.))

    Args:
      y_true: The ground truth values. `y_true` values are expected to be
        either `{-1, +1}` or `{0, 1}` (i.e. a one-hot-encoded tensor).
      y_pred: The predicted values.

    Returns:
      Categorical hinge loss values.
    """
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    # pos: score of the true class; neg: highest score among the other classes.
    pos = tf.reduce_sum(y_true * y_pred, axis=-1)
    neg = tf.reduce_max((1.0 - y_true) * y_pred, axis=-1)
    zero = tf.cast(0.0, y_pred.dtype)
    return tf.maximum(neg - pos + 1.0, zero)


@keras_export("keras.losses.huber", v1=[])
@tf.__internal__.dispatch.add_dispatch_support
def huber(y_true, y_pred, delta=1.0):
    """Computes Huber loss value.

    For each value x in `error = y_true - y_pred`:

    ```
    loss = 0.5 * x^2                  if |x| <= d
    loss = d * |x| - 0.5 * d^2        if |x| > d
    ```
    where d is `delta`. See: https://en.wikipedia.org/wiki/Huber_loss

    Args:
      y_true: tensor of true targets.
      y_pred: tensor of predicted targets.
      delta: A float, the point where the Huber loss function changes from a
        quadratic to linear.

    Returns:
      Tensor with one scalar loss entry per sample.
    """
    y_pred = tf.cast(y_pred, dtype=backend.floatx())
    y_true = tf.cast(y_true, dtype=backend.floatx())
    delta = tf.cast(delta, dtype=backend.floatx())
    error = tf.subtract(y_pred, y_true)
    abs_error = tf.abs(error)
    half = tf.convert_to_tensor(0.5, dtype=abs_error.dtype)
    # Quadratic inside the delta band, linear outside it.
    return backend.mean(
        tf.where(
            abs_error <= delta,
            half * tf.square(error),
            delta * abs_error - half * tf.square(delta),
        ),
        axis=-1,
    )


@keras_export(
    "keras.losses.log_cosh",
    "keras.losses.logcosh",
    "keras.metrics.log_cosh",
    "keras.metrics.logcosh",
)
@tf.__internal__.dispatch.add_dispatch_support
def log_cosh(y_true, y_pred):
    """Logarithm of the hyperbolic cosine of the prediction error.

    `log(cosh(x))` is approximately equal to `(x ** 2) / 2` for small `x` and
    to `abs(x) - log(2)` for large `x`.
This means that ‘logcosh’ works mostly like the mean squared error, but will not be so strongly affected by the occasional wildly incorrect prediction. Standalone usage: >>> y_true = np.random.random(size=(2, 3)) >>> y_pred = np.random.random(size=(2, 3)) >>> loss = tf.keras.losses.logcosh(y_true, y_pred) >>> assert loss.shape == (2,) >>> x = y_pred — y_true >>> assert np.allclose( … loss.numpy(), … np.mean(x + np.log(np.exp(-2. * x) + 1.) — tf.math.log(2.), … axis=-1), … atol=1e-5) Args: y_true: Ground truth values. shape = `[batch_size, d0, .. dN]`. y_pred: The predicted values. shape = `[batch_size, d0, .. dN]`. Returns: Logcosh error values. shape = `[batch_size, d0, .. dN-1]`. «»» y_pred = tf.convert_to_tensor(y_pred) y_true = tf.cast(y_true, y_pred.dtype) def _logcosh(x): return ( x + tf.math.softplus(2.0 * x) tf.cast(tf.math.log(2.0), x.dtype) ) return backend.mean(_logcosh(y_pred y_true), axis=1) @keras_export( «keras.metrics.categorical_crossentropy», «keras.losses.categorical_crossentropy», ) @tf.__internal__.dispatch.add_dispatch_support def categorical_crossentropy( y_true, y_pred, from_logits=False, label_smoothing=0.0, axis=1 ): «»»Computes the categorical crossentropy loss. Standalone usage: >>> y_true = [[0, 1, 0], [0, 0, 1]] >>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]] >>> loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred) >>> assert loss.shape == (2,) >>> loss.numpy() array([0.0513, 2.303], dtype=float32) Args: y_true: Tensor of one-hot true targets. y_pred: Tensor of predicted targets. from_logits: Whether `y_pred` is expected to be a logits tensor. By default, we assume that `y_pred` encodes a probability distribution. label_smoothing: Float in [0, 1]. If > `0` then smooth the labels. For example, if `0.1`, use `0.1 / num_classes` for non-target labels and `0.9 + 0.1 / num_classes` for target labels. axis: Defaults to -1. The dimension along which the entropy is computed. Returns: Categorical crossentropy loss value. 
«»» if isinstance(axis, bool): raise ValueError( «`axis` must be of type `int`. « f»Received: axis={axis} of type {type(axis)}« ) y_pred = tf.convert_to_tensor(y_pred) y_true = tf.cast(y_true, y_pred.dtype) label_smoothing = tf.convert_to_tensor(label_smoothing, dtype=y_pred.dtype) if y_pred.shape[1] == 1: warnings.warn( «In loss categorical_crossentropy, expected « «y_pred.shape to be (batch_size, num_classes) « f»with num_classes > 1. Received: y_pred.shape={y_pred.shape}. « «Consider using ‘binary_crossentropy’ if you only have 2 classes.», SyntaxWarning, stacklevel=2, ) def _smooth_labels(): num_classes = tf.cast(tf.shape(y_true)[1], y_pred.dtype) return y_true * (1.0 label_smoothing) + ( label_smoothing / num_classes ) y_true = tf.__internal__.smart_cond.smart_cond( label_smoothing, _smooth_labels, lambda: y_true ) return backend.categorical_crossentropy( y_true, y_pred, from_logits=from_logits, axis=axis ) @dispatch.dispatch_for_types(categorical_crossentropy, tf.RaggedTensor) def _ragged_tensor_categorical_crossentropy( y_true, y_pred, from_logits=False, label_smoothing=0.0, axis=1 ): «»»Implements support for handling RaggedTensors. Args: y_true: Tensor of one-hot true targets. y_pred: Tensor of predicted targets. from_logits: Whether `y_pred` is expected to be a logits tensor. By default, we assume that `y_pred` encodes a probability distribution. label_smoothing: Float in [0, 1]. If > `0` then smooth the labels. For example, if `0.1`, use `0.1 / num_classes` for non-target labels and `0.9 + 0.1 / num_classes` for target labels. axis: The axis along which to compute crossentropy (the features axis). Defaults to -1. Returns: Categorical crossentropy loss value. Expected shape: (batch, sequence_len, n_classes) with sequence_len being variable per batch. Return shape: (batch, sequence_len). 
When used by CategoricalCrossentropy() with the default reduction (SUM_OVER_BATCH_SIZE), the reduction averages the loss over the number of elements independent of the batch. E.g. if the RaggedTensor has 2 batches with [2, 1] values respectively the resulting loss is the sum of the individual loss values divided by 3. «»» fn = functools.partial( categorical_crossentropy, from_logits=from_logits, label_smoothing=label_smoothing, axis=axis, ) return _ragged_tensor_apply_loss(fn, y_true, y_pred) @keras_export( «keras.metrics.sparse_categorical_crossentropy», «keras.losses.sparse_categorical_crossentropy», ) @tf.__internal__.dispatch.add_dispatch_support def sparse_categorical_crossentropy( y_true, y_pred, from_logits=False, axis=1, ignore_class=None ): «»»Computes the sparse categorical crossentropy loss. Standalone usage: >>> y_true = [1, 2] >>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]] >>> loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred) >>> assert loss.shape == (2,) >>> loss.numpy() array([0.0513, 2.303], dtype=float32) >>> y_true = [[[ 0, 2], … [-1, -1]], … [[ 0, 2], … [-1, -1]]] >>> y_pred = [[[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], … [[0.2, 0.5, 0.3], [0.0, 1.0, 0.0]]], … [[[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]], … [[0.2, 0.5, 0.3], [0.0, 1.0, 0.0]]]] >>> loss = tf.keras.losses.sparse_categorical_crossentropy( … y_true, y_pred, ignore_class=-1) >>> loss.numpy() array([[[2.3841855e-07, 2.3841855e-07], [0.0000000e+00, 0.0000000e+00]], [[2.3841855e-07, 6.9314730e-01], [0.0000000e+00, 0.0000000e+00]]], dtype=float32) Args: y_true: Ground truth values. y_pred: The predicted values. from_logits: Whether `y_pred` is expected to be a logits tensor. By default, we assume that `y_pred` encodes a probability distribution. axis: Defaults to -1. The dimension along which the entropy is computed. ignore_class: Optional integer. The ID of a class to be ignored during loss computation. 
This is useful, for example, in segmentation problems featuring a «void» class (commonly -1 or 255) in segmentation maps. By default (`ignore_class=None`), all classes are considered. Returns: Sparse categorical crossentropy loss value. «»» return backend.sparse_categorical_crossentropy( y_true, y_pred, from_logits=from_logits, ignore_class=ignore_class, axis=axis, ) @dispatch.dispatch_for_types(sparse_categorical_crossentropy, tf.RaggedTensor) def _ragged_tensor_sparse_categorical_crossentropy( y_true, y_pred, from_logits=False, axis=1, ignore_class=None ): «»»Implements support for handling RaggedTensors. Expected y_pred shape: (batch, sequence_len, n_classes) with sequence_len being variable per batch. Return shape: (batch, sequence_len). When used by SparseCategoricalCrossentropy() with the default reduction (SUM_OVER_BATCH_SIZE), the reduction averages the loss over the number of elements independent of the batch. E.g. if the RaggedTensor has 2 batches with [2, 1] values respectively, the resulting loss is the sum of the individual loss values divided by 3. «»» fn = functools.partial( sparse_categorical_crossentropy, from_logits=from_logits, ignore_class=ignore_class, axis=axis, ) return _ragged_tensor_apply_loss(fn, y_true, y_pred, y_pred_extra_dim=True) @keras_export( «keras.metrics.binary_crossentropy», «keras.losses.binary_crossentropy» ) @tf.__internal__.dispatch.add_dispatch_support def binary_crossentropy( y_true, y_pred, from_logits=False, label_smoothing=0.0, axis=1 ): «»»Computes the binary crossentropy loss. Standalone usage: >>> y_true = [[0, 1], [0, 0]] >>> y_pred = [[0.6, 0.4], [0.4, 0.6]] >>> loss = tf.keras.losses.binary_crossentropy(y_true, y_pred) >>> assert loss.shape == (2,) >>> loss.numpy() array([0.916 , 0.714], dtype=float32) Args: y_true: Ground truth values. shape = `[batch_size, d0, .. dN]`. y_pred: The predicted values. shape = `[batch_size, d0, .. dN]`. from_logits: Whether `y_pred` is expected to be a logits tensor. 
By default, we assume that `y_pred` encodes a probability distribution. label_smoothing: Float in [0, 1]. If > `0` then smooth the labels by squeezing them towards 0.5 That is, using `1. — 0.5 * label_smoothing` for the target class and `0.5 * label_smoothing` for the non-target class. axis: The axis along which the mean is computed. Defaults to -1. Returns: Binary crossentropy loss value. shape = `[batch_size, d0, .. dN-1]`. «»» y_pred = tf.convert_to_tensor(y_pred) y_true = tf.cast(y_true, y_pred.dtype) label_smoothing = tf.convert_to_tensor(label_smoothing, dtype=y_pred.dtype) def _smooth_labels(): return y_true * (1.0 label_smoothing) + 0.5 * label_smoothing y_true = tf.__internal__.smart_cond.smart_cond( label_smoothing, _smooth_labels, lambda: y_true ) return backend.mean( backend.binary_crossentropy(y_true, y_pred, from_logits=from_logits), axis=axis, ) @dispatch.dispatch_for_types(binary_crossentropy, tf.RaggedTensor) def _ragged_tensor_binary_crossentropy( y_true, y_pred, from_logits=False, label_smoothing=0.0, axis=1 ): «»»Implements support for handling RaggedTensors. Args: y_true: Tensor of one-hot true targets. y_pred: Tensor of predicted targets. from_logits: Whether `y_pred` is expected to be a logits tensor. By default, we assume that `y_pred` encodes a probability distribution. label_smoothing: Float in [0, 1]. If > `0` then smooth the labels. For example, if `0.1`, use `0.1 / num_classes` for non-target labels and `0.9 + 0.1 / num_classes` for target labels. axis: Axis along which to compute crossentropy. Returns: Binary crossentropy loss value. Expected shape: (batch, sequence_len) with sequence_len being variable per batch. Return shape: (batch,); returns the per batch mean of the loss values. When used by BinaryCrossentropy() with the default reduction (SUM_OVER_BATCH_SIZE), the reduction averages the per batch losses over the number of batches. 
«»» fn = functools.partial( binary_crossentropy, from_logits=from_logits, label_smoothing=label_smoothing, axis=axis, ) return _ragged_tensor_apply_loss(fn, y_true, y_pred) @keras_export( «keras.metrics.binary_focal_crossentropy», «keras.losses.binary_focal_crossentropy», ) @tf.__internal__.dispatch.add_dispatch_support def binary_focal_crossentropy( y_true, y_pred, apply_class_balancing=False, alpha=0.25, gamma=2.0, from_logits=False, label_smoothing=0.0, axis=1, ): «»»Computes the binary focal crossentropy loss. According to [Lin et al., 2018](https://arxiv.org/pdf/1708.02002.pdf), it helps to apply a focal factor to down-weight easy examples and focus more on hard examples. By default, the focal tensor is computed as follows: `focal_factor = (1 — output)**gamma` for class 1 `focal_factor = output**gamma` for class 0 where `gamma` is a focusing parameter. When `gamma` = 0, there is no focal effect on the binary crossentropy loss. If `apply_class_balancing == True`, this function also takes into account a weight balancing factor for the binary classes 0 and 1 as follows: `weight = alpha` for class 1 (`target == 1`) `weight = 1 — alpha` for class 0 where `alpha` is a float in the range of `[0, 1]`. Standalone usage: >>> y_true = [[0, 1], [0, 0]] >>> y_pred = [[0.6, 0.4], [0.4, 0.6]] >>> loss = tf.keras.losses.binary_focal_crossentropy(y_true, y_pred, … gamma=2) >>> assert loss.shape == (2,) >>> loss.numpy() array([0.330, 0.206], dtype=float32) Args: y_true: Ground truth values, of shape `(batch_size, d0, .. dN)`. y_pred: The predicted values, of shape `(batch_size, d0, .. dN)`. apply_class_balancing: A bool, whether to apply weight balancing on the binary classes 0 and 1. alpha: A weight balancing factor for class 1, default is `0.25` as mentioned in the reference. The weight for class 0 is `1.0 — alpha`. gamma: A focusing parameter, default is `2.0` as mentioned in the reference. from_logits: Whether `y_pred` is expected to be a logits tensor. 
By default, we assume that `y_pred` encodes a probability distribution. label_smoothing: Float in `[0, 1]`. If higher than 0 then smooth the labels by squeezing them towards `0.5`, i.e., using `1. — 0.5 * label_smoothing` for the target class and `0.5 * label_smoothing` for the non-target class. axis: The axis along which the mean is computed. Defaults to `-1`. Returns: Binary focal crossentropy loss value. shape = `[batch_size, d0, .. dN-1]`. «»» y_pred = tf.convert_to_tensor(y_pred) y_true = tf.cast(y_true, y_pred.dtype) label_smoothing = tf.convert_to_tensor(label_smoothing, dtype=y_pred.dtype) def _smooth_labels(): return y_true * (1.0 label_smoothing) + 0.5 * label_smoothing y_true = tf.__internal__.smart_cond.smart_cond( label_smoothing, _smooth_labels, lambda: y_true ) return backend.mean( backend.binary_focal_crossentropy( target=y_true, output=y_pred, apply_class_balancing=apply_class_balancing, alpha=alpha, gamma=gamma, from_logits=from_logits, ), axis=axis, ) @dispatch.dispatch_for_types(binary_focal_crossentropy, tf.RaggedTensor) def _ragged_tensor_binary_focal_crossentropy( y_true, y_pred, apply_class_balancing=False, alpha=0.25, gamma=2.0, from_logits=False, label_smoothing=0.0, axis=1, ): «»»Implements support for handling RaggedTensors. Expected shape: `(batch, sequence_len)` with sequence_len being variable per batch. Return shape: `(batch,)`; returns the per batch mean of the loss values. When used by BinaryFocalCrossentropy() with the default reduction (SUM_OVER_BATCH_SIZE), the reduction averages the per batch losses over the number of batches. Args: y_true: Tensor of one-hot true targets. y_pred: Tensor of predicted targets. apply_class_balancing: A bool, whether to apply weight balancing on the binary classes 0 and 1. alpha: A weight balancing factor for class 1, default is `0.25` as mentioned in the reference [Lin et al., 2018]( https://arxiv.org/pdf/1708.02002.pdf). The weight for class 0 is `1.0 — alpha`. 
gamma: A focusing parameter, default is `2.0` as mentioned in the reference. from_logits: Whether `y_pred` is expected to be a logits tensor. By default, we assume that `y_pred` encodes a probability distribution. label_smoothing: Float in `[0, 1]`. If > `0` then smooth the labels. For example, if `0.1`, use `0.1 / num_classes` for non-target labels and `0.9 + 0.1 / num_classes` for target labels. axis: Axis along which to compute crossentropy. Returns: Binary focal crossentropy loss value. «»» fn = functools.partial( binary_focal_crossentropy, apply_class_balancing=apply_class_balancing, alpha=alpha, gamma=gamma, from_logits=from_logits, label_smoothing=label_smoothing, axis=axis, ) return _ragged_tensor_apply_loss(fn, y_true, y_pred) @keras_export( «keras.metrics.kl_divergence», «keras.metrics.kullback_leibler_divergence», «keras.metrics.kld», «keras.metrics.KLD», «keras.losses.kl_divergence», «keras.losses.kullback_leibler_divergence», «keras.losses.kld», «keras.losses.KLD», ) @tf.__internal__.dispatch.add_dispatch_support def kl_divergence(y_true, y_pred): «»»Computes Kullback-Leibler divergence loss between `y_true` & `y_pred`. `loss = y_true * log(y_true / y_pred)` See: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence Standalone usage: >>> y_true = np.random.randint(0, 2, size=(2, 3)).astype(np.float64) >>> y_pred = np.random.random(size=(2, 3)) >>> loss = tf.keras.losses.kullback_leibler_divergence(y_true, y_pred) >>> assert loss.shape == (2,) >>> y_true = tf.keras.backend.clip(y_true, 1e-7, 1) >>> y_pred = tf.keras.backend.clip(y_pred, 1e-7, 1) >>> assert np.array_equal( … loss.numpy(), np.sum(y_true * np.log(y_true / y_pred), axis=-1)) Args: y_true: Tensor of true targets. y_pred: Tensor of predicted targets. Returns: A `Tensor` with loss. Raises: TypeError: If `y_true` cannot be cast to the `y_pred.dtype`. 
«»» y_pred = tf.convert_to_tensor(y_pred) y_true = tf.cast(y_true, y_pred.dtype) y_true = backend.clip(y_true, backend.epsilon(), 1) y_pred = backend.clip(y_pred, backend.epsilon(), 1) return tf.reduce_sum(y_true * tf.math.log(y_true / y_pred), axis=1) @keras_export(«keras.metrics.poisson», «keras.losses.poisson») @tf.__internal__.dispatch.add_dispatch_support def poisson(y_true, y_pred): «»»Computes the Poisson loss between y_true and y_pred. The Poisson loss is the mean of the elements of the `Tensor` `y_pred — y_true * log(y_pred)`. Standalone usage: >>> y_true = np.random.randint(0, 2, size=(2, 3)) >>> y_pred = np.random.random(size=(2, 3)) >>> loss = tf.keras.losses.poisson(y_true, y_pred) >>> assert loss.shape == (2,) >>> y_pred = y_pred + 1e-7 >>> assert np.allclose( … loss.numpy(), np.mean(y_pred — y_true * np.log(y_pred), axis=-1), … atol=1e-5) Args: y_true: Ground truth values. shape = `[batch_size, d0, .. dN]`. y_pred: The predicted values. shape = `[batch_size, d0, .. dN]`. Returns: Poisson loss value. shape = `[batch_size, d0, .. dN-1]`. Raises: InvalidArgumentError: If `y_true` and `y_pred` have incompatible shapes. «»» y_pred = tf.convert_to_tensor(y_pred) y_true = tf.cast(y_true, y_pred.dtype) return backend.mean( y_pred y_true * tf.math.log(y_pred + backend.epsilon()), axis=1 ) @keras_export( «keras.losses.cosine_similarity», v1=[ «keras.metrics.cosine_proximity», «keras.metrics.cosine», «keras.losses.cosine_proximity», «keras.losses.cosine», «keras.losses.cosine_similarity», ], ) @tf.__internal__.dispatch.add_dispatch_support def cosine_similarity(y_true, y_pred, axis=1): «»»Computes the cosine similarity between labels and predictions. Note that it is a number between -1 and 1. When it is a negative number between -1 and 0, 0 indicates orthogonality and values closer to -1 indicate greater similarity. The values closer to 1 indicate greater dissimilarity. 
This makes it usable as a loss function in a setting where you try to maximize the proximity between predictions and targets. If either `y_true` or `y_pred` is a zero vector, cosine similarity will be 0 regardless of the proximity between predictions and targets. `loss = -sum(l2_norm(y_true) * l2_norm(y_pred))` Standalone usage: >>> y_true = [[0., 1.], [1., 1.], [1., 1.]] >>> y_pred = [[1., 0.], [1., 1.], [-1., -1.]] >>> loss = tf.keras.losses.cosine_similarity(y_true, y_pred, axis=1) >>> loss.numpy() array([-0., -0.999, 0.999], dtype=float32) Args: y_true: Tensor of true targets. y_pred: Tensor of predicted targets. axis: Axis along which to determine similarity. Returns: Cosine similarity tensor. «»» y_true = tf.linalg.l2_normalize(y_true, axis=axis) y_pred = tf.linalg.l2_normalize(y_pred, axis=axis) return tf.reduce_sum(y_true * y_pred, axis=axis) @keras_export(«keras.losses.CosineSimilarity») class CosineSimilarity(LossFunctionWrapper): «»»Computes the cosine similarity between labels and predictions. Note that it is a number between -1 and 1. When it is a negative number between -1 and 0, 0 indicates orthogonality and values closer to -1 indicate greater similarity. The values closer to 1 indicate greater dissimilarity. This makes it usable as a loss function in a setting where you try to maximize the proximity between predictions and targets. If either `y_true` or `y_pred` is a zero vector, cosine similarity will be 0 regardless of the proximity between predictions and targets. `loss = -sum(l2_norm(y_true) * l2_norm(y_pred))` Standalone usage: >>> y_true = [[0., 1.], [1., 1.]] >>> y_pred = [[1., 0.], [1., 1.]] >>> # Using ‘auto’/’sum_over_batch_size’ reduction type. >>> cosine_loss = tf.keras.losses.CosineSimilarity(axis=1) >>> # l2_norm(y_true) = [[0., 1.], [1./1.414, 1./1.414]] >>> # l2_norm(y_pred) = [[1., 0.], [1./1.414, 1./1.414]] >>> # l2_norm(y_true) . l2_norm(y_pred) = [[0., 0.], [0.5, 0.5]] >>> # loss = mean(sum(l2_norm(y_true) . 
l2_norm(y_pred), axis=1)) >>> # = -((0. + 0.) + (0.5 + 0.5)) / 2 >>> cosine_loss(y_true, y_pred).numpy() -0.5 >>> # Calling with ‘sample_weight’. >>> cosine_loss(y_true, y_pred, sample_weight=[0.8, 0.2]).numpy() -0.0999 >>> # Using ‘sum’ reduction type. >>> cosine_loss = tf.keras.losses.CosineSimilarity(axis=1, … reduction=tf.keras.losses.Reduction.SUM) >>> cosine_loss(y_true, y_pred).numpy() -0.999 >>> # Using ‘none’ reduction type. >>> cosine_loss = tf.keras.losses.CosineSimilarity(axis=1, … reduction=tf.keras.losses.Reduction.NONE) >>> cosine_loss(y_true, y_pred).numpy() array([-0., -0.999], dtype=float32) Usage with the `compile()` API: «`python model.compile(optimizer=’sgd’, loss=tf.keras.losses.CosineSimilarity(axis=1)) «` Args: axis: The axis along which the cosine similarity is computed (the features axis). Defaults to -1. reduction: Type of `tf.keras.losses.Reduction` to apply to loss. Default value is `AUTO`. `AUTO` indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to `SUM_OVER_BATCH_SIZE`. When used under a `tf.distribute.Strategy`, except via `Model.compile()` and `Model.fit()`, using `AUTO` or `SUM_OVER_BATCH_SIZE` will raise an error. Please see this custom training [tutorial]( https://www.tensorflow.org/tutorials/distribute/custom_training) for more details. name: Optional name for the instance. «»» def __init__( self, axis=1, reduction=losses_utils.ReductionV2.AUTO, name=«cosine_similarity», ): super().__init__( cosine_similarity, reduction=reduction, name=name, axis=axis ) # Aliases. 
bce = BCE = binary_crossentropy
mse = MSE = mean_squared_error
mae = MAE = mean_absolute_error
mape = MAPE = mean_absolute_percentage_error
msle = MSLE = mean_squared_logarithmic_error
kld = KLD = kullback_leibler_divergence = kl_divergence
logcosh = log_cosh
huber_loss = huber


def is_categorical_crossentropy(loss):
    result = (
        isinstance(loss, CategoricalCrossentropy)
        or (
            isinstance(loss, LossFunctionWrapper)
            and loss.fn == categorical_crossentropy
        )
        or (
            hasattr(loss, "__name__")
            and loss.__name__ == "categorical_crossentropy"
        )
        or (loss == "categorical_crossentropy")
    )
    return result


@keras_export("keras.losses.serialize")
def serialize(loss, use_legacy_format=False):
    """Serializes loss function or `Loss` instance.

    Args:
      loss: A Keras `Loss` instance or a loss function.

    Returns:
      Loss configuration dictionary.
    """
    if use_legacy_format:
        return legacy_serialization.serialize_keras_object(loss)
    return serialize_keras_object(loss)


@keras_export("keras.losses.deserialize")
def deserialize(name, custom_objects=None, use_legacy_format=False):
    """Deserializes a serialized loss class/function instance.

    Args:
        name: Loss configuration.
        custom_objects: Optional dictionary mapping names (strings) to custom
          objects (classes and functions) to be considered during
          deserialization.

    Returns:
        A Keras `Loss` instance or a loss function.
    """
    if use_legacy_format:
        return legacy_serialization.deserialize_keras_object(
            name,
            module_objects=globals(),
            custom_objects=custom_objects,
            printable_module_name="loss function",
        )
    return deserialize_keras_object(
        name,
        module_objects=globals(),
        custom_objects=custom_objects,
        printable_module_name="loss function",
    )


@keras_export("keras.losses.get")
def get(identifier):
    """Retrieves a Keras loss as a `function`/`Loss` class instance.

    The `identifier` may be the string name of a loss function or `Loss` class.

    >>> loss = tf.keras.losses.get("categorical_crossentropy")
    >>> type(loss)
    <class 'function'>
    >>> loss = tf.keras.losses.get("CategoricalCrossentropy")
    >>> type(loss)
    <class '...keras.losses.CategoricalCrossentropy'>

    You can also specify `config` of the loss to this function by passing dict
    containing `class_name` and `config` as an identifier. Also note that the
    `class_name` must map to a `Loss` class

    >>> identifier = {"class_name": "CategoricalCrossentropy",
    ...               "config": {"from_logits": True}}
    >>> loss = tf.keras.losses.get(identifier)
    >>> type(loss)
    <class '...keras.losses.CategoricalCrossentropy'>

    Args:
      identifier: A loss identifier. One of None or string name of a loss
        function/class or loss configuration dictionary or a loss function or a
        loss class instance.

    Returns:
      A Keras loss as a `function`/ `Loss` class instance.

    Raises:
      ValueError: If `identifier` cannot be interpreted.
    """
    if identifier is None:
        return None
    if isinstance(identifier, str):
        identifier = str(identifier)
        use_legacy_format = "module" not in identifier
        return deserialize(identifier, use_legacy_format=use_legacy_format)
    if isinstance(identifier, dict):
        return deserialize(identifier)
    if callable(identifier):
        return identifier
    raise ValueError(
        f"Could not interpret loss function identifier: {identifier}"
    )


LABEL_DTYPES_FOR_LOSSES = {
    tf.compat.v1.losses.sparse_softmax_cross_entropy: "int32",
    sparse_categorical_crossentropy: "int32",
}

You’ve created a deep learning model in Keras and prepared the data, and now you are wondering which loss function to choose for your problem.

We’ll get to that in a second, but first: what is a loss function?

In deep learning, the loss is computed to get the gradients with respect to the model weights, and those weights are then updated accordingly via backpropagation. The loss is calculated and the network is updated after every iteration until further updates no longer improve the desired evaluation metric.
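To make the loop of "compute loss, compute gradient, update weights" concrete, here is a minimal plain-Python sketch. It assumes a hypothetical one-weight model `y_hat = w * x` trained with mean squared error; the data, learning rate, and function names are illustrative and are not part of Keras.

```python
# Toy model: y_hat = w * x, trained with mean squared error (MSE).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated with the "true" weight w = 2


def mse_loss(w):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w):
    # d/dw of mean((w*x - y)^2) = mean(2 * x * (w*x - y))
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


w = 0.0
learning_rate = 0.05
history = []
for step in range(50):
    history.append(mse_loss(w))       # loss before this update
    w -= learning_rate * mse_grad(w)  # gradient descent update
```

After the loop, `w` should sit very close to 2 and the recorded losses should have shrunk toward zero, which is the behaviour a real optimizer reproduces across many weights at once.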

So while you keep using the same evaluation metric, such as F1 score or AUC on the validation set, for long stretches of your machine learning project, the loss can be changed, adjusted, and modified to get the best performance on that metric.

You can think of the loss function just like you think about the model architecture or the optimizer and it is important to put some thought into choosing it. In this piece we’ll look at:

  • loss functions available in Keras and how to use them,
  • how you can define your own custom loss function in Keras,
  • how to add sample weighting to create observation-sensitive losses,
  • how to avoid NaNs in the loss,
  • how you can monitor the loss function via plotting and callbacks.

Let’s get into it!

Keras loss functions 101

In Keras, loss functions are passed during the compile stage, as shown below. 

In this example, we’re defining the loss function by creating an instance of the loss class. Using the class is advantageous because you can pass some additional parameters. 

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.Dense(64, kernel_initializer='uniform', input_shape=(10,)))
model.add(layers.Activation('softmax'))

# The model already applies softmax, so the loss receives probabilities, not logits.
loss_function = keras.losses.SparseCategoricalCrossentropy(from_logits=False)
model.compile(loss=loss_function, optimizer='adam')

If you want to use a loss function that is built into Keras without specifying any parameters you can just use the string alias as shown below:

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

You might be wondering how to decide which loss function to use.

Keras ships with a variety of built-in loss functions, and in some cases you might have to implement your own custom loss function.

Let’s dive into all those scenarios.

Which loss functions are available in Keras?

Binary Classification

Binary classification loss functions come into play when solving a problem involving just two classes. For example, when predicting fraud in credit card transactions, a transaction is either fraudulent or not.

Binary Cross Entropy

Binary cross-entropy calculates the cross-entropy loss between the predicted classes and the true classes. By default, the sum_over_batch_size reduction is used, which means that the loss returns the average of the per-sample losses in the batch.

import tensorflow as tf

y_true = [[0., 1.], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
y_pred = [[0.6, 0.4], [0.4, 0.6], [0.6, 0.4], [0.8, 0.2]]
bce = tf.keras.losses.BinaryCrossentropy(reduction='sum_over_batch_size')
bce(y_true, y_pred).numpy()

The sum reduction means that the loss function will return the sum of the per-sample losses in the batch.

bce = tf.keras.losses.BinaryCrossentropy(reduction='sum')
bce(y_true, y_pred).numpy()

Setting the reduction to none returns the full array of per-sample losses.

bce = tf.keras.losses.BinaryCrossentropy(reduction='none')
bce(y_true, y_pred).numpy()
array([0.9162905 , 0.5919184 , 0.79465103, 1.0549198 ], dtype=float32)
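The three reduction modes can be reproduced by hand. The sketch below recomputes the per-sample binary cross-entropy for the same labels and predictions in plain Python and then derives the sum and the batch average from it. The helper name `bce_per_sample` is illustrative, and the sketch skips the small epsilon clipping that Keras applies, so the values agree only up to floating-point precision.

```python
import math

y_true = [[0.0, 1.0], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
y_pred = [[0.6, 0.4], [0.4, 0.6], [0.6, 0.4], [0.8, 0.2]]


def bce_per_sample(t_row, p_row):
    # Average the element-wise binary cross-entropy over the last axis.
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for t, p in zip(t_row, p_row)]
    return sum(terms) / len(terms)


per_sample = [bce_per_sample(t, p) for t, p in zip(y_true, y_pred)]  # 'none'
loss_sum = sum(per_sample)                                           # 'sum'
loss_mean = loss_sum / len(per_sample)                 # 'sum_over_batch_size'
```

Here `per_sample` matches the array printed above, `loss_sum` is what the sum reduction returns, and `loss_mean` is the default batch average.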

In binary classification, the output layer typically uses the sigmoid activation function, which constrains the output to a number between 0 and 1.

Multiclass classification

Problems involving more than two classes use different loss functions. In this section we’ll look at a couple:

Categorical Crossentropy

CategoricalCrossentropy also computes the cross-entropy loss between the true classes and the predicted classes. The labels are given in a one-hot format.

y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
cce = tf.keras.losses.CategoricalCrossentropy()
cce(y_true, y_pred).numpy()
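As a rough illustration of what the categorical cross-entropy computes per sample, the sketch below evaluates `-sum(y_true * log(y_pred))` in plain Python. The `EPS` clipping constant and the function name are assumptions for this sketch; Keras handles normalization and clipping internally and may differ in detail.

```python
import math

EPS = 1e-7  # assumed clipping constant to avoid log(0)


def categorical_crossentropy(t_row, p_row):
    # -sum over classes of y_true * log(y_pred), with y_pred clipped
    return -sum(t * math.log(min(max(p, EPS), 1 - EPS))
                for t, p in zip(t_row, p_row))


y_true = [[0, 1, 0], [0, 0, 1]]                 # one-hot labels
y_pred = [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]]   # predicted probabilities
per_sample = [categorical_crossentropy(t, p) for t, p in zip(y_true, y_pred)]
batch_loss = sum(per_sample) / len(per_sample)
```

Only the probability assigned to the true class contributes: the first sample is confident and correct, so its loss is small, while the second assigns 0.1 to the true class and is penalized heavily.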

Sparse Categorical Crossentropy

If you have two or more classes and the labels are integers, SparseCategoricalCrossentropy should be used.

y_true = [0, 1, 2]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1], [0.1, 0.8, 0.1]]
scce = tf.keras.losses.SparseCategoricalCrossentropy()
scce(y_true, y_pred).numpy()
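The sparse variant computes the same quantity as the one-hot variant; only the label encoding differs. The plain-Python sketch below shows this equivalence on illustrative data (the helper names and values are assumptions for this sketch, not Keras functions).

```python
import math


def sparse_categorical_crossentropy(label, p_row):
    # The integer label selects the predicted probability of the true class.
    return -math.log(p_row[label])


def one_hot(label, num_classes):
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def dense_categorical_crossentropy(t_row, p_row):
    return -sum(t * math.log(p) for t, p in zip(t_row, p_row))


y_true = [1, 2]                                   # integer class ids
y_pred = [[0.05, 0.9, 0.05], [0.1, 0.8, 0.1]]     # predicted probabilities

sparse = [sparse_categorical_crossentropy(t, p) for t, p in zip(y_true, y_pred)]
dense = [dense_categorical_crossentropy(one_hot(t, 3), p)
         for t, p in zip(y_true, y_pred)]
```

Both lists hold the same values, so the choice between the two classes comes down to how your labels are stored.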

The Poisson Loss

You can also use the Poisson class to compute the Poisson loss. It’s a great choice if your data follows a Poisson distribution, for example the number of calls a call center receives per hour.

y_true = [[0.1, 1.,0.8], [0.1, 0.9,0.1],[0.2, 0.7,0.1],[0.3, 0.1,0.6]]
y_pred = [[0.6, 0.2,0.2], [0.2, 0.6,0.2],[0.7, 0.1,0.2],[0.8, 0.1,0.1]]
p = tf.keras.losses.Poisson()
p(y_true, y_pred).numpy()
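The Poisson loss is the mean of `y_pred - y_true * log(y_pred)`. The sketch below evaluates that expression in plain Python on hypothetical call-center counts; the `EPS` constant, the data, and the function name are assumptions made for illustration.

```python
import math

EPS = 1e-7  # assumed small constant that keeps log() finite


def poisson_loss(t_row, p_row):
    terms = [p - t * math.log(p + EPS) for t, p in zip(t_row, p_row)]
    return sum(terms) / len(terms)


# Hypothetical data: observed calls per hour vs. two sets of predicted rates.
calls_observed = [3.0, 5.0, 2.0]
rates_good = [3.0, 5.0, 2.0]   # predictions equal to the observations
rates_bad = [1.0, 1.0, 1.0]    # predictions that ignore the observations

loss_good = poisson_loss(calls_observed, rates_good)
loss_bad = poisson_loss(calls_observed, rates_bad)
```

Note that the loss can be negative; what matters is that predictions closer to the observed counts give a lower value, as `loss_good < loss_bad` shows here.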

Kullback-Leibler Divergence Loss

The relative entropy can be computed using the KLDivergence class. According to the official docs at PyTorch:

KL divergence is a useful distance measure for continuous distributions and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions. 

y_true = [[0.1, 1.,0.8], [0.1, 0.9,0.1],[0.2, 0.7,0.1],[0.3, 0.1,0.6]]
y_pred = [[0.6, 0.2,0.2], [0.2, 0.6,0.2],[0.7, 0.1,0.2],[0.8, 0.1,0.1]]
kl = tf.keras.losses.KLDivergence()
kl(y_true, y_pred).numpy()
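For intuition, the KL divergence per sample is `sum(y_true * log(y_true / y_pred))`. The sketch below computes it in plain Python for two illustrative probability distributions; the clipping constant `EPS` and the function name are assumptions for this sketch.

```python
import math

EPS = 1e-7  # assumed clipping constant, keeps the ratio and log finite


def kl_divergence(t_row, p_row):
    total = 0.0
    for t, p in zip(t_row, p_row):
        t = min(max(t, EPS), 1.0)
        p = min(max(p, EPS), 1.0)
        total += t * math.log(t / p)
    return total


p_true = [0.5, 0.3, 0.2]
kl_same = kl_divergence(p_true, p_true)            # identical distributions
kl_diff = kl_divergence(p_true, [0.2, 0.3, 0.5])   # mass moved to another class
```

Identical distributions give a divergence of zero, and the value grows as the predicted distribution moves away from the true one.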

In a multi-class problem, the activation function used is the softmax function.
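As a reminder of what softmax does to the raw outputs (logits) of the last layer, here is a minimal plain-Python sketch. The function and the example logits are illustrative; in a Keras model you would normally rely on the built-in activation.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
```

The outputs are positive, sum to 1, and preserve the ordering of the logits, which is why they can be read as class probabilities by the cross-entropy losses above.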

Object Detection

The Focal Loss

In classification problems involving imbalanced data and in object detection problems, you can use the Focal Loss. The loss introduces an adjustment to the cross-entropy criterion.

It does so by reshaping the loss so that well-classified examples are down-weighted. This helps the model learn from the minority class instead of being dominated by the majority class.

The cross-entropy loss is multiplied by a scaling factor that decays to zero as confidence in the correct class increases. This factor down-weights the contribution of easy examples during training and focuses the model on the hard ones.

import tensorflow_addons as tfa

# Labels are 0/1 and predictions are probabilities.
y_true = [[1.0], [1.0], [0.0]]
y_pred = [[0.97], [0.91], [0.03]]
sfc = tfa.losses.SigmoidFocalCrossEntropy()
sfc(y_true, y_pred).numpy()
array([6.8532745e-06, 1.9097870e-04, 2.0559824e-05], dtype=float32)
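The scaling described above can be written out by hand. The sketch below assumes the common formulation `alpha_t * (1 - p_t) ** gamma * cross_entropy` with `alpha=0.25` and `gamma=2.0`; the function name and the data are illustrative, and a given library version may differ in details.

```python
import math


def sigmoid_focal_loss(y, p, alpha=0.25, gamma=2.0):
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    cross_entropy = -math.log(p_t)
    # The modulating factor (1 - p_t)**gamma shrinks the loss of easy examples.
    return alpha_t * (1.0 - p_t) ** gamma * cross_entropy


labels = [1, 1, 0]
probs = [0.97, 0.91, 0.03]
focal = [sigmoid_focal_loss(y, p) for y, p in zip(labels, probs)]
plain = [-math.log(p if y == 1 else 1.0 - p) for y, p in zip(labels, probs)]
```

Comparing `focal` with `plain` shows how strongly confident, correct predictions are down-weighted: each focal value is several orders of magnitude smaller than the corresponding cross-entropy.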

Generalized Intersection over Union

The Generalized Intersection over Union loss from TensorFlow Addons can also be used. The Intersection over Union (IoU) is a very common metric in object detection problems. However, IoU is not very informative for non-overlapping bounding boxes: it is 0 no matter how far apart the boxes are.

The Generalized Intersection over Union was introduced to address this limitation. It keeps the scale-invariant property of IoU, encodes the shape properties of the compared objects into the region property, and correlates strongly with IoU when the objects overlap.

gl = tfa.losses.GIoULoss()
boxes1 = tf.constant([[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]])
boxes2 = tf.constant([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0]])
loss = gl(boxes1, boxes2)
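To see how GIoU differs from plain IoU, the sketch below computes the loss `1 - GIoU` by hand for the same two box pairs. It assumes boxes are given as `[y_min, x_min, y_max, x_max]` and that the per-pair losses are averaged; the function name is illustrative.

```python
def giou_loss(box_a, box_b):
    # Boxes are assumed to be [y_min, x_min, y_max, x_max].
    ay1, ax1, ay2, ax2 = box_a
    by1, bx1, by2, bx2 = box_b
    area_a = (ay2 - ay1) * (ax2 - ax1)
    area_b = (by2 - by1) * (bx2 - bx1)
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter = inter_h * inter_w
    union = area_a + area_b - inter
    iou = inter / union
    # Area of the smallest box enclosing both inputs.
    enclose = (max(ay2, by2) - min(ay1, by1)) * (max(ax2, bx2) - min(ax1, bx1))
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou


losses = [
    giou_loss([4.0, 3.0, 7.0, 5.0], [3.0, 4.0, 6.0, 8.0]),       # overlapping
    giou_loss([5.0, 6.0, 10.0, 7.0], [14.0, 14.0, 15.0, 15.0]),  # disjoint
]
mean_loss = sum(losses) / len(losses)
```

The second pair does not overlap, so its IoU is 0, yet the enclosing-box term still pushes the loss close to 2, giving the optimizer a signal about how far apart the boxes are.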

Regression

In regression problems, you have to calculate the differences between the predicted values and the true values, but as always, there are many ways to do it.

Mean Squared Error

The MeanSquaredError class can be used to compute the mean square of errors between the predictions and the true values. 

y_true = [12, 20, 29., 60.]
y_pred = [14., 18., 27., 55.]
mse = tf.keras.losses.MeanSquaredError()
mse(y_true, y_pred).numpy()

Use Mean Squared Error when you desire to have large errors penalized more than smaller ones. 
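The effect of squaring is easy to check by hand. This plain-Python sketch computes both the mean squared error and the mean absolute error for the same four values, so you can compare how much the single largest error contributes to each; the variable names are illustrative.

```python
y_true = [12.0, 20.0, 29.0, 60.0]
y_pred = [14.0, 18.0, 27.0, 55.0]

squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
absolute_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]

mse = sum(squared_errors) / len(squared_errors)     # 9.25
mae = sum(absolute_errors) / len(absolute_errors)   # 2.75

# Share of the total contributed by the largest error (the last sample).
share_in_mse = squared_errors[-1] / sum(squared_errors)
share_in_mae = absolute_errors[-1] / sum(absolute_errors)
```

The error of 5 accounts for roughly two thirds of the squared total but less than half of the absolute total, which is what "penalizing large errors more" means in practice.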

Mean Absolute Percentage Error

The mean absolute percentage error is computed using the formula below.

loss = 100 * mean(abs((y_true - y_pred) / y_true))

It is calculated in Keras as shown below.

y_true = [12, 20, 29., 60.]
y_pred = [14., 18., 27., 55.]
mape = tf.keras.losses.MeanAbsolutePercentageError()
mape(y_true, y_pred).numpy()

Consider using this loss when you want a loss that you can explain intuitively. People understand percentages easily. The loss is also robust to outliers. 
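The percentage can also be worked out by hand for the same example. This NumPy sketch assumes no true value is zero, since the ratio is undefined in that case:

```python
import numpy as np

y_true = np.array([12., 20., 29., 60.])
y_pred = np.array([14., 18., 27., 55.])

# Relative error of each prediction, e.g. |12 - 14| / 12 for the first one.
relative_errors = np.abs((y_true - y_pred) / y_true)
mape_value = 100.0 * relative_errors.mean()
print(relative_errors, mape_value)
```

On average, the predictions here are off by roughly ten percent of the true value.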

Mean Squared Logarithmic Error

The mean squared logarithmic error can be computed using the formula below:

loss = mean(square(log(y_true + 1) - log(y_pred + 1)))

Here’s an implementation of the same:

y_true = [12, 20, 29., 60.]
y_pred = [14., 18., 27., 55.]
msle = tf.keras.losses.MeanSquaredLogarithmicError()
msle(y_true, y_pred).numpy()

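Mean Squared Logarithmic Error penalizes underestimates more than it does overestimates. Because it works on the logarithm of the values, it measures relative rather than absolute differences, so it is a good choice when you prefer not to penalize large absolute errors on large targets heavily.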
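The asymmetry between underestimates and overestimates can be seen with a small NumPy sketch. The target of 100 and the two predictions are made-up values for illustration:

```python
import numpy as np

def msle(y_true, y_pred):
    # log1p(x) computes log(x + 1).
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

target = np.array([100.0])
under = msle(target, np.array([50.0]))    # prediction is 50 too low
over = msle(target, np.array([150.0]))    # prediction is 50 too high
print(under, over)
```

Both predictions miss by the same absolute amount, yet the underestimate receives the larger loss.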

Cosine Similarity Loss

If your interest is in computing the cosine similarity between the true and predicted values, you’d use the CosineSimilarity class. It is computed as:

loss = -sum(l2_norm(y_true) * l2_norm(y_pred))

The result is a number between -1 and 1. Because the loss is the negative of the cosine similarity, 0 indicates orthogonality, while values close to -1 indicate greater similarity.

y_true = [[12, 20], [29., 60.]]
y_pred = [[14., 18.], [27., 55.]]
cosine_loss = tf.keras.losses.CosineSimilarity(axis=1)
cosine_loss(y_true, y_pred).numpy()
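To see where the sign convention comes from, here is a NumPy sketch of the same computation. The function name is hypothetical and this is not the Keras source:

```python
import numpy as np

def cosine_similarity_loss(y_true, y_pred, axis=-1):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # L2-normalize both inputs, then take the negative dot product.
    t = y_true / np.linalg.norm(y_true, axis=axis, keepdims=True)
    p = y_pred / np.linalg.norm(y_pred, axis=axis, keepdims=True)
    return -np.sum(t * p, axis=axis)

per_sample = cosine_similarity_loss([[12., 20.], [29., 60.]],
                                    [[14., 18.], [27., 55.]])
same_direction = cosine_similarity_loss([[1., 2.]], [[2., 4.]])
orthogonal = cosine_similarity_loss([[1., 0.]], [[0., 1.]])
print(per_sample, same_direction, orthogonal)
```

Vectors pointing in the same direction give -1, orthogonal vectors give 0, so minimizing the loss maximizes the similarity.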

LogCosh Loss

The LogCosh class computes the logarithm of the hyperbolic cosine of the prediction error.

Here’s its implementation as a stand-alone function. 

y_true = [[12, 20], [29., 60.]]
y_pred = [[14., 18.], [27., 55.]]
l = tf.keras.losses.LogCosh()
l(y_true, y_pred).numpy()

LogCosh Loss works like the mean squared error, but will not be so strongly affected by the occasional wildly incorrect prediction. — TensorFlow Docs
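The behaviour described in the quote can be checked numerically: log(cosh(x)) is close to x²/2 for small errors and close to |x| - log(2) for large ones. A minimal NumPy sketch, with error values picked for illustration:

```python
import numpy as np

def log_cosh(error):
    return np.log(np.cosh(error))

small_error, large_error = 0.1, 10.0

small_loss = log_cosh(small_error)        # behaves like a squared error
large_loss = log_cosh(large_error)        # behaves like an absolute error
quadratic_approx = small_error ** 2 / 2
linear_approx = abs(large_error) - np.log(2)
print(small_loss, quadratic_approx, large_loss, linear_approx)
```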

Huber loss

When you need a regression loss that is less sensitive to outliers than the mean squared error, the Huber loss is used. It is quadratic for small errors and linear for large ones.

y_true = [12, 20, 29., 60.]
y_pred = [14., 18., 27., 55.]
h = tf.keras.losses.Huber()
h(y_true, y_pred).numpy()
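A hand-written NumPy sketch of the Huber formula shows how the two regimes are combined. The helper name is hypothetical, and delta=1.0 is used as the threshold between the quadratic and linear parts:

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_error = np.abs(error)
    quadratic = 0.5 * error ** 2                  # used when |error| <= delta
    linear = delta * (abs_error - 0.5 * delta)    # used when |error| > delta
    return np.mean(np.where(abs_error <= delta, quadratic, linear))

# Errors are [-2, 2, 2, 5], all larger than delta, so every term is linear:
# (1.5 + 1.5 + 1.5 + 4.5) / 4
huber_value = huber([12., 20., 29., 60.], [14., 18., 27., 55.])
print(huber_value)
```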

Learning Embeddings

Triplet Loss

You can also compute the triplet loss with semi-hard negative mining via TensorFlow addons. The loss encourages the positive distances between pairs of embeddings with the same labels to be less than the minimum negative distance. 

import tensorflow_addons as tfa

model.compile(optimizer='adam',
              loss=tfa.losses.TripletSemiHardLoss(),
              metrics=['accuracy'])

Creating custom loss functions in Keras

Sometimes there is no good loss available or you need to implement some modifications. Let’s learn how to do that.

A custom loss function can be created by defining a function that takes the true values and predicted values as required parameters. The function should return an array of losses. The function can then be passed at the compile stage. 

def custom_loss_function(y_true, y_pred):
   squared_difference = tf.square(y_true - y_pred)
   return tf.reduce_mean(squared_difference, axis=-1)

model.compile(optimizer='adam', loss=custom_loss_function)

Let’s see how we can apply this custom loss function to an array of predicted and true values.

import numpy as np

y_true = [12, 20, 29., 60.]
y_pred = [14., 18., 27., 55.]
cl = custom_loss_function(np.array(y_true),np.array(y_pred))
cl.numpy()

Use of Keras loss weights

During the training process, one can weigh the loss function by observations or samples. The weights can be arbitrary, but a typical choice is class weights derived from the distribution of labels. Each observation is weighted by the inverse of the frequency of the class it belongs to, so that minority class observations count for more when calculating the loss.

One of the ways to do this is to pass the class weights during the training process. 

The weights are passed using a dictionary that contains the weight for each class. You can compute the weights using Scikit-learn or calculate the weights based on your own criterion. 

weights = {0: 1.01300017, 1: 0.88994364, 2: 1.00704935, 3: 0.97863318, 4: 1.02704553,
           5: 1.10680686, 6: 1.01385603, 7: 0.95770152, 8: 1.02546573, 9: 1.00857287}
model.fit(x_train, y_train, verbose=1, epochs=10, class_weight=weights)
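One way to obtain such a dictionary is the inverse-frequency heuristic n_samples / (n_classes * class_count), which is the same rule Scikit-learn applies for "balanced" class weights. Below is a NumPy sketch; the helper name and the toy labels are hypothetical, and it assumes integer labels 0..K-1 with every class present at least once.

```python
import numpy as np

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count per class)."""
    labels = np.asarray(labels)
    counts = np.bincount(labels)          # number of samples per class
    weights = len(labels) / (len(counts) * counts)
    return {cls: float(w) for cls, w in enumerate(weights)}

toy_labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])   # imbalanced toy data
class_weights = balanced_class_weights(toy_labels)
print(class_weights)
```

The resulting dictionary has the shape expected by the class_weight argument, and the rarest class receives the largest weight.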

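A related option is to pass loss weights at the compile stage. Note that loss_weights does not weight classes or samples: it takes one scalar coefficient per model output and scales the contribution of each output's loss, which matters mostly for multi-output models.

model.compile(optimizer=tf.keras.optimizers.SGD(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              loss_weights=[1.0],  # one scalar per model output
              metrics=['accuracy'])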

How to monitor Keras loss function [example]

It is usually a good idea to monitor the loss function on the training and validation set as the model is training. Looking at those learning curves is a good indication of overfitting or other problems with model training.

There are two main options of how this can be done.

Monitor Keras loss using console logs 

The quickest and easiest way to log and look at the losses is simply printing them to the console. 

import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
   tf.keras.layers.Flatten(input_shape=(28, 28)),
   tf.keras.layers.Dense(512, activation='relu'),
   tf.keras.layers.Dropout(0.2),
   tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='sgd',
             loss='sparse_categorical_crossentropy',
             metrics=['accuracy'])

model.fit(x_train, y_train,verbose=1, epochs=10)

The problem with this approach is that those logs can be easily lost, it is difficult to see progress, and when working on remote machines, you may not have access to them.

Monitor Keras loss using a callback

Another, cleaner option is to use a callback that will log the loss somewhere at the end of every batch and epoch. 

You need to decide where and what you would like to log, but it is really simple. 

For example, logging Keras loss to neptune.ai could look like this:

from keras.callbacks import Callback

class NeptuneCallback(Callback):
    def on_batch_end(self, batch, logs=None):
        for metric_name, metric_value in logs.items():
            neptune_run[f"{metric_name}"].log(metric_value)

    def on_epoch_end(self, epoch, logs=None):
        for metric_name, metric_value in logs.items():
            neptune_run[f"{metric_name}"].log(metric_value)

You can create the monitoring callback yourself or use one of the many available Keras callbacks both in the Keras library and in other libraries that integrate with it, like neptune.ai, TensorBoard, and others.

Once you have the callback ready, you simply pass it to the model.fit(...):

pip install neptune-tensorflow-keras
# the same as above
import neptune.new as neptune
from neptune.new.integrations.tensorflow_keras import NeptuneCallback
 
 
run = neptune.init_run()
 
neptune_callback = NeptuneCallback(run=run)
 
model.fit(
    x_train,
    y_train,
    validation_split=0.2,
    epochs=10,
    callbacks=[neptune_callback],
)

And monitor your experiment learning curves in the web app: 

Note: For the most up-to-date code examples, please refer to the Neptune-Keras integration docs.

With neptune.ai, you can not only track losses, but also other metrics and parameters, as well as artifacts, source code, system metrics and more.

Why Keras loss nan happens

Most of the time, losses you log will be just some regular values, but sometimes you might get nans when working with Keras loss functions. 

When that happens, your model will not update its weights and will stop learning, so this situation needs to be avoided.

There could be many reasons for nan loss but usually, what happens is:

  • nans in the training set, which lead to nans in the loss,
  • infinite values (for example, NumPy inf) in the training set, which also lead to nans in the loss,
  • a training set that is not scaled,
  • very large l2 regularizers and a learning rate above 1,
  • the wrong optimizer function,
  • large (exploding) gradients that result in a large update to network weights during training.

So, in order to avoid nans in the loss:

  • Check that your training data is properly scaled and doesn’t contain nans or infinite values;
  • Check that you are using the right optimizer and that your learning rate is not too large;
  • Check that the l2 regularization is not too large;
  • If you are facing the exploding gradient problem, you can either re-design the network or use gradient clipping so that your gradients have a certain “maximum allowed model update”.
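The data checks and the idea behind gradient clipping can be sketched in a few lines of NumPy. Both helper names and the max_abs threshold are hypothetical; in practice you would rely on the clipping options of your optimizer rather than this hand-written version.

```python
import numpy as np

def check_training_data(x, max_abs=10.0):
    """Report nans, infinities and suspiciously large (unscaled) values."""
    x = np.asarray(x, dtype=float)
    finite = x[np.isfinite(x)]
    return {
        "has_nan": bool(np.isnan(x).any()),
        "has_inf": bool(np.isinf(x).any()),
        "looks_unscaled": bool(finite.size > 0 and np.abs(finite).max() > max_abs),
    }

def clip_by_norm(gradient, max_norm):
    """Rescale the gradient so that its L2 norm never exceeds max_norm."""
    gradient = np.asarray(gradient, dtype=float)
    norm = np.linalg.norm(gradient)
    if norm > max_norm:
        return gradient * (max_norm / norm)
    return gradient

report = check_training_data([[0.1, np.nan], [np.inf, 255.0]])
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
print(report, clipped)
```

Clipping keeps the direction of the update and only limits its size, which is what bounds the "maximum allowed model update".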


Final thoughts

Hopefully, this article gave you some background into loss functions in Keras.

We’ve covered:

  • Built-in loss functions in Keras,
  • Implementation of your own custom loss functions,
  • How to add sample weighting to create observation-sensitive losses,
  • How to avoid loss nans,
  • How you can visualize loss as your model is training.

For more information, check out the Keras Repository and the TensorFlow Loss Functions documentation.

How to build custom loss functions in Keras for any use case

In this article, there is an in-depth discussion on

  • What are Loss Functions 
  • What are Evaluation Metrics?
  • Commonly used Loss functions in Keras (Regression and Classification)
  • Built-in loss functions in Keras
  • What is the custom loss function?
  • Implementation of common loss functions in Keras
  • Custom Loss Function for Layers i.e Custom Regularization Loss
  • Dealing with NaN values in Keras Loss
  • Why should you use a Custom Loss?
  • Monitoring Keras Loss using callbacks

What are Loss Functions

Loss functions are one of the core parts of a machine learning model. If you’ve been in the field of data science for some time, you must have heard of them. Loss functions, also known as cost functions, measure the error of a model; minimizing them brings the predictions as close as possible to the expected output.

In deep learning, the loss is computed to get the gradients for the model weights and update those weights accordingly using backpropagation.

A basic understanding of error comes from comparing an actual value with a predicted value. The difference between the actual value and the predicted value is known as the error. 

This can be written in the equation form as

So our goal is to minimize the difference between the predicted value, hθ(x), and the actual value, y. In other words, you have to minimize the value of the cost function. This main idea can be understood better from the following picture by Professor Andrew Ng, where he explains choosing the values of θ0 and θ1, which are the weights of a model, such that our prediction hθ is closest to the actual output y.

Here Professor Andrew Ng is using the Mean Squared Error function, which will be discussed later on.

Put simply, the goal of a machine learning model is to minimize the cost and maximize the evaluation metric. This can be achieved by updating the weights of a machine learning model using an algorithm such as Gradient Descent.
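As a rough illustration of this weight update, the sketch below fits hθ(x) = θ0 + θ1·x with plain gradient descent on a squared-error cost. The data is a small dummy set where y = 0.6·x; the learning rate and the number of steps are assumptions picked so that this toy example converges.

```python
import numpy as np

x = np.array([10., 20., 30., 40., 50., 60., 10., 20.])
y = np.array([6., 12., 18., 24., 30., 36., 6., 12.])   # y = 0.6 * x

def cost(theta0, theta1):
    predictions = theta0 + theta1 * x
    return np.mean((predictions - y) ** 2) / 2.0

theta0, theta1 = 0.0, 0.0
learning_rate = 1e-4
initial_cost = cost(theta0, theta1)

for _ in range(200):
    error = (theta0 + theta1 * x) - y
    # Gradients of the cost with respect to each weight.
    grad0 = error.mean()
    grad1 = (error * x).mean()
    # Move each weight a small step against its gradient.
    theta0 -= learning_rate * grad0
    theta1 -= learning_rate * grad1

final_cost = cost(theta0, theta1)
print(theta0, theta1, initial_cost, final_cost)
```

Each step lowers the cost, and θ1 ends up close to the true slope of 0.6.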

Here you can see the weight that is being updated and the cost function, that is used to update the weight of a machine learning model.

What are Evaluation Metrics

Evaluation metrics are the metrics used to evaluate and judge the performance of a machine learning model. Evaluating a machine learning project is essential. There are different types of evaluation metrics such as ‘Mean Squared Error’, ‘Accuracy’, ‘Mean Absolute Error’ etc. The cost functions used, such as mean squared error or binary cross-entropy, are also metrics, but it is difficult to read from them how well our model is performing. So there is a need for other metrics like Accuracy, Precision, Recall, etc. Using different metrics is important because a model may perform well according to one evaluation metric, but poorly according to another.

Here you can see the performance of our model using 2 metrics. The first one is Loss and the second one is accuracy. It can be seen that our loss function (which was cross-entropy in this example) has a value of 0.4474 which is difficult to interpret whether it is a good loss or not, but it can be seen from the accuracy that currently it has an accuracy of 80%.

Hence it is important to use different evaluation metrics other than loss/cost function only to properly evaluate the model’s performance and capabilities.

Some of the common evaluation metrics include:

  • Accuracy
  • Precision
  • Recall
  • F-1 Score
  • MSE
  • MAE
  • Confusion Matrix
  • Logarithmic Loss
  • ROC curve

And many more.

Commonly Used Loss Functions in Machine Learning Algorithms and their Keras Implementation

Common Regression Losses:

Regression is the type of problem where you are going to predict a continuous variable. This means that our variable can be any number, not some specific labels.

For example, when you have to predict prices of houses, it can be a house of any price, so it is a regression problem. 

Some of the common examples of regression tasks are

  • Prices Prediction
  • Stock Market Prediction
  • Financial Forecasting
  • Trend Analysis
  • Time Series Predictions

And many more.

The figure above illustrates a regression problem where you are going to predict the price of a house from three features: the size of the house, the number of rooms, and the number of baths. Our model will check these features and predict a continuous number, which will be the price of the house.

Since regression problems deal with predicting a continuous number, you have to use different types of loss than for classification problems. Some of the commonly used loss functions in regression problems are as follows.

Mean Squared Error  

Mean squared error, also known as L2 Loss is mainly used for Regression Tasks. As the name suggests, it is calculated by taking the mean of the square of the loss/error which is the difference between actual and predicted value.

The mathematical equation for Mean Squared Error is

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2$$

Where Ŷi is the predicted value, Yi is the actual value, and n is the number of samples. Mean Squared Error penalizes the model for making errors by squaring them, so large errors dominate the loss. This is the reason that this loss function is less robust to outliers in the dataset.

Implementation in Keras.

				
import keras
import numpy as np

y_true = np.array([[10.0, 7.0]])  # sample data
y_pred = np.array([[8.0, 6.0]])

a = keras.losses.MSE(y_true, y_pred)

print(f'Value of Mean Squared Error is {a.numpy()}')

				
			

Here the predicted values and the true values are passed to the MSE function from keras.losses, which computes the loss. It returns a tf.Tensor object, which has been converted to a NumPy array to see the value more clearly.

Using via compile Method:

Keras losses can be specified for a deep learning model using the compile method of keras.Model.

				
model = keras.Sequential([
    keras.layers.Dense(10, input_shape=(1,), activation='relu'),
    keras.layers.Dense(1)
])

				
			

And now the compile method can be used to specify the loss and metrics.

				
model.compile(loss='mse', optimizer='adam')

				
			

Now when our model is trained, it will use the Mean Squared Error loss function to compute the loss and update the weights using the Adam optimizer.

				
model.fit(np.array([[10.0], [20.0], [30.0], [40.0], [50.0], [60.0], [10.0], [20.0]]),
          np.array([6, 12, 18, 24, 30, 36, 6, 12]), epochs=10)

				
			

Mean Absolute Error

Mean Absolute Error, also known as L1 Error, is defined as the average of the absolute differences between the actual value and the predicted value.

Mathematically, it can be shown as:

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - \hat{Y}_i \right|$$

Where Ŷi is the predicted value, Yi is the actual value, and n is the number of samples.

The Mean Absolute Error is a scale-dependent accuracy measure, which means that it uses the same scale as the data being measured. Thus it cannot be used to make comparisons between series that use different scales.

Mean Absolute Error is also a common regression loss, which means that it is used when predicting a continuous variable.

Standalone Implementation in Keras:

				
import keras
import numpy as np

y_true = np.array([[10.0, 7.0]])  # dummy data
y_pred = np.array([[8.0, 6.0]])

c = keras.losses.MAE(y_true, y_pred)  # calculating loss

print(f'Value of Mean Absolute Error is {c.numpy()}')

				
			

What you have to do is call the MAE function from keras.losses and pass in the true and predicted labels to calculate the loss using the equation given above.

Implementing using compile method

When working with a deep learning model in Keras, you have to define the model structure first.

				
model = keras.models.Sequential([
    keras.layers.Dense(10, input_shape=(1,), activation='relu'),
    keras.layers.Dense(1)
])

				
			

After defining the model architecture, you have to compile it and use the MAE loss function. Notice that having either a linear activation or no activation function in the last layer means that you are going to predict a continuous variable.

				
model.compile(loss='mae', optimizer='adam')

				
			

You can now simply just fit the model to check our model’s progress. Here our model is going to train on a very small dummy random array just to check the progress.

				
model.fit(np.array([[10.0], [20.0], [30.0], [40.0], [50.0], [60.0], [10.0], [20.0]]),
          np.array([6, 12, 18, 24, 30, 36, 6, 12]), epochs=10)

				
			

And you can see the loss value which has been calculated using the MAE formula for each epoch.

Common Classification Losses:

Classification problems are those problems, in which you have to predict a label. This means that the output should be only from the given labels that you have provided to the model.

For example, suppose you have to detect whether the input image belongs to one of the given classes, such as dog, cat, or horse. The model will predict 3 numbers ranging from 0 to 1, and the class with the highest probability will be picked.

If you want to predict whether it is going to rain tomorrow or not, this means that the model can output between 0 and 1, and you will choose the option of rain if it is greater than 0.5, and no rain if it is less than 0.5.

Common Classification Loss:

1. Cross-Entropy

Cross Entropy is one of the most commonly used classification loss functions. You can say that it is the measure of the degrees of the dissimilarity between two probabilistic distributions. For example, in the task of predicting whether it will rain tomorrow or not, there are two distributions, one for True, and one for False.

Cross Entropy is of 3 main types. 

  a. Binary Cross Entropy

Binary Cross Entropy, as the name suggests, is the cross entropy that occurs between two classes, or in the problem of binary classification where you have to detect whether it belongs to class ‘A’, and if it does not belong to class ‘A’, then it belongs to class ‘B’.

Just like in the example of rain prediction, if it is going to rain tomorrow, then it belongs to rain class, and if there is less probability of rain tomorrow, then this means that it belongs to no rain class.

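The mathematical equation for Binary Cross Entropy is

$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

Here yi is the actual label and ŷi is the predicted probability. This loss function has 2 parts. If our actual label is 1, the term after ‘+’ becomes 0 because 1-1 = 0. So the loss when our label is 1 is

$$L = -\log(\hat{y}_i)$$

And when our label is 0, then the first part becomes 0. So our loss in that case would be

$$L = -\log(1 - \hat{y}_i)$$

This loss function is also known as the Log Loss function, because of the logarithm in the loss.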
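The two cases can be checked with a small NumPy sketch. The function name and the eps clipping value are assumptions added to avoid log(0); this is not the Keras implementation.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so the logarithm stays finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

loss_label_1 = binary_cross_entropy([1.0], [0.9])   # only -log(0.9) contributes
loss_label_0 = binary_cross_entropy([0.0], [0.9])   # only -log(1 - 0.9) contributes
print(loss_label_1, loss_label_0)
```

A confident prediction of 0.9 costs little when the label is 1 and a lot when the label is 0.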

Standalone Implementation:

You can create an object for Binary Cross Entropy from keras.losses. Then you have to pass in the true and predicted labels.

				
import keras
import numpy as np

y_true = np.array([[1.0]])
y_pred = np.array([[0.9]])

loss = keras.losses.BinaryCrossentropy()
print(f"BCE LOSS VALUE IS {loss(y_true, y_pred).numpy()}")
				
			

Implementation using compile method

To use Binary Cross Entropy in a deep learning model, design the architecture, and compile the model while specifying the loss as Binary Cross Entropy.

				
import keras
import numpy as np

model = keras.models.Sequential([
    keras.layers.Dense(16, input_shape=(2,), activation='relu'),
    keras.layers.Dense(8, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')  # Sigmoid for probabilistic distribution
])
model.compile(optimizer='sgd', loss=keras.losses.BinaryCrossentropy(), metrics=['acc'])  # binary cross entropy

model.fit(np.array([[10.0, 20.0], [20.0, 30.0], [30.0, 6.0], [8.0, 20.0]]), np.array([1, 1, 0, 1]), epochs=10)

				
			

This will train the model using Binary Cross Entropy Loss function.

  b. Categorical Cross Entropy

Categorical Cross Entropy is the cross entropy that is used for multi-class classification. This means for a single training example, you have n probabilities, and you take the class with maximum probability where n is number of classes.

Mathematically, you can write it as:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} \mathbb{1}_{y_i \in C_c} \log p_{model}[y_i \in C_c]$$

This double sum is over the N examples and the C categories. The term 1yi ∈ Cc is an indicator that the ith observation belongs to the cth category. Pmodel[yi ∈ Cc] is the probability predicted by the model for the ith observation to belong to the cth category. When there are more than 2 categories, the neural network outputs a vector of C probabilities, one for each class. When the number of categories is just two, the neural network outputs a single probability ŷi, with the other one being 1 minus the output. This is why the binary cross entropy looks a bit different from categorical cross entropy, despite being a special case of it.
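The double sum can be written out in NumPy as a sketch. The function name and the eps clipping are assumptions for illustration; the one-hot rows of y_true play the role of the indicator term.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that a predicted probability of exactly 0 does not produce log(0).
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)
    # Inner sum over the C categories: only the true class survives the one-hot mask.
    per_example = -np.sum(y_true * np.log(y_pred), axis=-1)
    # Outer sum over the N examples, divided by N.
    return per_example.mean()

y_true = [[0, 1, 0], [0, 0, 1]]                 # 3 classes, one-hot encoded
y_pred = [[0.05, 0.95, 0], [0.1, 0.5, 0.4]]
cce_value = categorical_cross_entropy(y_true, y_pred)
print(cce_value)   # mean of -log(0.95) and -log(0.4)
```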

Standalone Implementation

You will create a Categorical Cross Entropy object from keras.losses and pass in our true and predicted labels, on which it will calculate the Cross Entropy and return a Tensor.

Note that you have to provide a matrix that is one hot encoded showing probability for each class, as shown in this example.

				
import keras
import numpy as np

y_true = [[0, 1, 0], [0, 0, 1]]  # 3 classes

y_pred = [[0.05, 0.95, 0], [0.1, 0.5, 0.4]]

loss = keras.losses.CategoricalCrossentropy()
print(f"CCE LOSS VALUE IS {loss(y_true, y_pred).numpy()}")
				
			

Implementation using compile method

When implemented using the compile method, you have to design a model in Keras, and compile it using Categorical Cross Entropy loss. Now when the model is trained, it is calculating the loss based on categorical cross entropy, and updating the weights according to the given optimizer.

				
import keras
import numpy as np
from keras.utils import to_categorical  # to one hot encode the data

model = keras.models.Sequential([
    keras.layers.Dense(16, input_shape=(2,), activation='relu'),
    keras.layers.Dense(8, activation='relu'),
    keras.layers.Dense(3, activation='softmax')  # Softmax for multiclass probability
])

model.compile(optimizer='sgd', loss=keras.losses.CategoricalCrossentropy(), metrics=['acc'])  # categorical cross entropy

model.fit(np.array([[10.0, 20.0], [20.0, 30.0], [30.0, 6.0], [8.0, 20.0], [-1.0, -100.0], [-10.0, -200.0]]), to_categorical(np.array([1, 1, 0, 1, 2, 2])), epochs=10)

				
			

Here, it will train the model on our dummy dataset.

  c. Sparse Categorical Cross Entropy

Mathematically, there is no difference between Categorical Cross Entropy and Sparse Categorical Cross Entropy; they differ only in the format of the labels. According to the official documentation: use this cross entropy loss function when there are two or more label classes. Labels are expected to be provided as integers. If you want to provide labels using one-hot representation, please use the CategoricalCrossentropy loss. There should be # classes floating point values per feature for y_pred and a single floating point value per feature for y_true.
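The equivalence of the two label formats can be verified with a short NumPy sketch. The integer labels and probabilities below are illustrative values, not the output of a model.

```python
import numpy as np

y_pred = np.array([[0.1, 0.1, 0.8],
                   [0.1, 0.4, 0.5],
                   [0.5, 0.3, 0.2],
                   [0.6, 0.3, 0.1]])
labels = np.array([2, 2, 0, 1])             # integer-encoded classes

# Sparse form: pick the probability of the true class by integer indexing.
sparse_loss = -np.log(y_pred[np.arange(len(labels)), labels]).mean()

# Dense form: one-hot encode the labels, then apply the categorical formula.
one_hot = np.eye(y_pred.shape[1])[labels]
dense_loss = -(one_hot * np.log(y_pred)).sum(axis=1).mean()

print(sparse_loss, dense_loss)
```

Both forms give the same value; the sparse version simply avoids building the one-hot matrix.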

As you have seen earlier in Categorical Cross Entropy, a one-hot matrix was passed as the true labels. An example of which is as follows:

				
to_categorical(np.array([1, 1, 0, 1, 2, 2]))

				
			

For using sparse categorical cross entropy in Keras, you need to pass in the label encoded labels. You can use sklearn for this purpose.

Let’s see this example to understand better.

				
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
output = le.fit_transform(np.array(['High chance of Rain', 'No Rain', 'Maybe', 'Maybe', 'No Rain', 'High chance of Rain']))
				
			

Here a LabelEncoder object has been created, and the fit_transform method is used to encode it. The output of it is as follows.
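LabelEncoder assigns integer codes following the sorted order of the class names. As a sketch that does not require scikit-learn, np.unique with return_inverse=True reproduces the same mapping for these labels:

```python
import numpy as np

labels = np.array(['High chance of Rain', 'No Rain', 'Maybe',
                   'Maybe', 'No Rain', 'High chance of Rain'])

# classes holds the sorted unique names; codes holds one integer per label.
classes, codes = np.unique(labels, return_inverse=True)
print(classes)
print(codes)
```

'High chance of Rain' is mapped to 0, 'Maybe' to 1, and 'No Rain' to 2.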

Standalone Implementation

To perform standalone implementation, you need to perform label encoding on the labels. y_true should then hold a single integer per sample, and y_pred should hold n floating point values per sample, where n is the total number of classes.

				
import keras
from sklearn.preprocessing import LabelEncoder

t = LabelEncoder()
# one row of class probabilities per sample
y_pred = [[0.1, 0.1, 0.8], [0.1, 0.4, 0.5], [0.5, 0.3, 0.2], [0.6, 0.3, 0.1]]
# integer-encoded class labels
y_true = t.fit_transform(['Rain', 'Rain', 'High Chance of Rain', 'No Rain'])
loss = keras.losses.SparseCategoricalCrossentropy()
print(f"Sparse Categorical Loss is {loss(y_true, y_pred).numpy()} ")
				
			

Implementation using model.compile

To implement Sparse Categorical Cross Entropy in a deep learning model, you have to design the model, and compile it using the loss sparse categorical cross entropy. Remember to perform label encoding of your class labels so that sparse categorical cross entropy can work.

				
import keras
import numpy as np
from sklearn.preprocessing import LabelEncoder

model = keras.models.Sequential([
    keras.layers.Dense(16, input_shape=(2,), activation='relu'),
    keras.layers.Dense(8, activation='relu'),
    keras.layers.Dense(3, activation='softmax')  # Softmax for multiclass probability
])
le = LabelEncoder()
model.compile(optimizer='sgd', loss=keras.losses.SparseCategoricalCrossentropy(), metrics=['acc'])  # sparse categorical cross entropy
				
			

Now the model will be trained on the dummy dataset.

				
model.fit(np.array([[10.0, 20.0], [20.0, 30.0], [30.0, 6.0], [8.0, 20.0], [-1.0, -100.0], [-10.0, -200.0]]), le.fit_transform(np.array(['High chance of Rain', 'High chance of Rain', 'High chance of Rain', 'Maybe', 'No Rain', 'No Rain'])), epochs=10)


				
			

The model has been trained, where the loss is calculated using sparse categorical cross entropy, and the weights have been updated using stochastic gradient descent.

2. Hinge Loss

Hinge loss is a commonly used loss function for classification problems. It is mainly used in problems where you have to do ‘maximum-margin’ classification. A common example of which is Support Vector Machines.

The following image shows how maximum margin classification works.

Hinge Loss

Source: Stanford NLP Group

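The mathematical formula for hinge loss is:

$$L = \max(0,\ 1 - y_i \cdot \hat{y}_i)$$

Where yi is the actual label (-1 or 1) and ŷi is the predicted value. The loss is zero when the prediction has the same sign as the label and a magnitude of at least 1; otherwise it grows linearly as the prediction moves toward the wrong side. Pushing predictions beyond this margin is why hinge loss is associated with maximum-margin classification.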
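A NumPy sketch of the hinge formula is shown below. The function name is hypothetical; the conversion of 0/1 labels to -1/1 mirrors what Keras does when it receives binary labels.

```python
import numpy as np

def hinge(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Hinge loss expects labels in {-1, 1}; convert 0/1 labels if needed.
    if np.all(np.isin(y_true, (0.0, 1.0))):
        y_true = 2.0 * y_true - 1.0
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))

# Terms are max(0, 1 - y * y_hat): 1.6, 0.6, 1.4 and 1.6, whose mean is 1.3.
hinge_value = hinge([[0., 1.], [0., 0.]], [[0.6, 0.4], [0.4, 0.6]])
print(hinge_value)
```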

Standalone Implementation:

To perform standalone implementation of Hinge Loss in Keras, you are going to use Hinge Loss Class from keras.losses.

				
import keras
import numpy as np

y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
h = keras.losses.Hinge()
print(f'Value for Hinge Loss is {h(y_true, y_pred).numpy()}')
				
			

Implementation using compile Method

To implement Hinge loss using compile method, you will design our model and compile it where you will mention our loss as Hinge.
Note that Hinge Loss works best with tanh as the activation in the last layer.

				
import keras
import numpy as np
from sklearn.preprocessing import LabelEncoder

model = keras.models.Sequential([
    keras.layers.Dense(16, input_shape=(2,), activation='relu'),
    keras.layers.Dense(8, activation='tanh'),
    keras.layers.Dense(1, activation='tanh')
])

model.compile(optimizer='adam', loss=keras.losses.Hinge(), metrics=['acc'])  # Hinge Loss

le = LabelEncoder()

model.fit(np.array([[10.0, 20.0], [20.0, 30.0], [30.0, 6.0], [-1.0, -100.0], [-10.0, -200.0]]), le.fit_transform(np.array(['High chance of Rain', 'High chance of Rain', 'High chance of Rain', 'No Rain', 'No Rain'])), epochs=10, batch_size=5)
				
			

This will train the model using Hinge Loss and update the weights using Adam optimizer.

Custom Loss Functions

So far you have seen some of the important cost functions that are widely used in industry and are good and easy to understand, and are built-in by famous deep learning frameworks such as Keras, or PyTorch. These built-in loss functions are enough for most of the typical tasks such as classification, or regression.

But there are some tasks, which can not be performed well using these built-in loss functions, and require some other loss that is more suitable for that task. For that purpose, a custom loss function is designed that calculates the error between the predicted value and actual value based on custom criteria.

Why you should use Custom Loss

Artificial Intelligence in general, and Deep Learning in particular, is a very active research field. There are various industries using Deep Learning to solve complex scenarios.

There is a lot of research on how to perform a specific task using Deep Learning. For example, there is a task on generating different recipes of food using the picture of the food. Now on Papers with Code (a famous site for deep learning and machine learning research papers), there are a lot of research papers on this topic.

Now Imagine you are reading a research paper where the researchers thought that using Cross Entropy, or Mean Squared Error, or whatever the general loss function is for that specific type of the problem is not good enough. It may require you to modify it according to the need. This may involve adding some new parameters, or a whole new technique to achieve better results. 

Now when you are implementing that problem, or you hired some data scientists to solve that specific problem for you, you may find that this specific problem is best solved using that specific loss function which is not available by default in Keras, and you need to implement it yourself.

A custom loss function can improve the model’s performance significantly, and can be really useful in solving some specific problems.

To create a custom loss, you have to take care of some rules.

  1. The loss function must only take two values, that are true labels, and predicted labels. This is because in order to calculate the error in prediction, these two values are needed. These arguments are passed from the model itself when the model is being fitted.

For example:

				
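import tensorflow as tf

def customLoss(y_true, y_pred):
    # placeholder computation; replace it with your own formula
    loss = tf.reduce_mean(tf.abs(y_true - y_pred), axis=-1)
    return loss

model.compile(loss=customLoss, optimizer='sgd')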
				
			

   2. Make sure the loss function actually uses y_pred (the predicted value). If it does not, the gradient of the loss with respect to the model weights is undefined, and training will raise an error.

   3. You can then pass it to model.compile just like any other loss function.
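
Rule 2 can be checked directly with tf.GradientTape. The following is only a sketch under assumptions: the single trainable scalar w stands in for a model's weights, and bad_loss, good_loss and gradient_wrt_w are hypothetical names introduced for illustration.

```python
import tensorflow as tf

y_true = tf.constant([[1.0], [2.0]])
w = tf.Variable(3.0)  # hypothetical stand-in for the model's weights


def bad_loss(y_true, y_pred):
    # Ignores y_pred, so the result does not depend on the weights
    return tf.reduce_mean(tf.square(y_true), axis=-1)


def good_loss(y_true, y_pred):
    # Depends on y_pred, so gradients can flow back to the weights
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)


def gradient_wrt_w(loss_fn):
    with tf.GradientTape() as tape:
        y_pred = w * tf.ones_like(y_true)  # toy "model": predicts w for every sample
        loss = tf.reduce_mean(loss_fn(y_true, y_pred))
    return tape.gradient(loss, w)


bad_grad = gradient_wrt_w(bad_loss)    # None: there is no path from the loss to w
good_grad = gradient_wrt_w(good_loss)  # a scalar tensor
print(bad_grad, good_grad)
```

When a loss like bad_loss is passed to model.compile, the optimizer has no gradients to apply, which is why training fails.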

Example:

Let’s say you want to perform a regression task with a custom loss function that divides the Mean Squared Error by 10. Mathematically, it can be denoted as:

loss = mean(square(y_true - y_pred), axis=-1) / 10

To implement it in Keras, define a custom loss function with two parameters, the true and predicted values. Inside it, perform the mathematical operations the formula calls for and return the loss value.

Note that Keras backend functions and TensorFlow mathematical operations are used instead of NumPy functions, because the loss has to operate on tensors and stay differentiable. Keras backend functions work similarly to their NumPy counterparts.

				
import keras
import numpy as np
from keras import backend as K  # K is used below, so it has to be imported
from tensorflow.python.ops import math_ops

def custom_loss(y_true, y_pred):
    diff = math_ops.squared_difference(y_pred, y_true)  # squared difference
    loss = K.mean(diff, axis=-1)  # mean over the last dimension
    loss = loss / 10.0
    return loss
				
			

Here you can see a custom function with two parameters, the true and predicted values. The first step calculates the squared difference between the predicted labels and the true labels using the squared_difference function from TensorFlow's Python ops. The mean is then taken to complete the mean squared error, and the result is divided by 10 to complete the formula. Finally, the loss value is returned.

You can use it in a deep learning model by compiling the model with the loss set to the custom loss defined above.

				
model = keras.Sequential([
    keras.layers.Dense(10, activation='relu', input_shape=(1,)),
    keras.layers.Dense(1)
])

model.compile(loss=custom_loss, optimizer='sgd')

X_train = np.array([[10.0], [20.0], [30.0], [40.0], [50.0], [60.0], [10.0], [20.0]])
y_train = np.array([6.0, 12, 18, 24, 30, 36, 6, 12])  # dummy data

model.fit(X_train, y_train, batch_size=2, epochs=10)
				
			

Passing multiple arguments to a Keras Loss Function

Suppose you want to add extra parameters to the loss function. In the formula above the MSE is divided by 10; if you want to divide it by a value supplied by the user instead, you need to create a wrapper function that takes those extra parameters.

A wrapper function, in short, is a function whose job is to call another function while doing little or no computation itself. The additional parameters are passed to the wrapper, while the two main parameters stay the same in the original loss function.

Let’s see it with the code.

				
def wrapper(param1):
    def custom_loss_1(y_true, y_pred):
        diff = math_ops.squared_difference(y_pred, y_true)  # squared difference
        loss = K.mean(diff, axis=-1)  # mean over the last dimension
        loss = loss / param1
        return loss
    return custom_loss_1
				
			

To do a standalone computation, first call the wrapper to obtain the loss function, and then pass the y_true and y_pred values to it.

				
					loss = wrapper(10.0)
 
final_loss = loss(y_true=[[10.0,7.0]], y_pred=[[8.0, 6.0]])
print(f"Final Loss is {final_loss.numpy()}")
				
			

You can use it in a deep learning model by calling the wrapper with an appropriate value for param1.

				
model1 = keras.Sequential([
    keras.layers.Dense(10, activation='relu', input_shape=(1,)),
    keras.layers.Dense(1)
])
 
model1.compile(loss=wrapper(10.0), optimizer='sgd')
				
			

Here the model has been compiled with param1 set to 10.0.

The model can now be trained and the results inspected.

				
					model1.fit(X_train, y_train, batch_size=2, epochs=10)
				
			

Creating Custom Loss for Layers

Loss functions applied to the output of the model (i.e. what you have seen so far) are not the only way to compute losses. Custom layers and subclassed models can also add losses for quantities you want to minimize during training, such as regularization losses.

These losses are added using the add_loss() method of keras.layers.Layer.

For example, suppose you want to add a custom L2 regularization to a layer, with the following formula:

loss = rate * sum(square(inputs))

You can create your own custom regularizer class, which should inherit from keras.layers.Layer.

				
import tensorflow as tf
from keras.layers import Layer

class MyActivityRegularizer(Layer):
    """Pass-through layer that adds an L2 penalty on its inputs."""

    def __init__(self, rate=1e-2):
        super(MyActivityRegularizer, self).__init__()
        self.rate = rate

    def call(self, inputs):
        # Register rate * sum(inputs^2) as an extra loss term
        self.add_loss(self.rate * tf.reduce_sum(tf.square(inputs)))
        return inputs
				
			

Now that the regularization layer has been defined, you can place it after any built-in layer or inside your own custom layer.

				
					
import tensorflow as tf
from keras import layers  # layers and tf are used below, so they have to be imported

class SparseMLP(Layer):
    """Stack of Linear layers with our custom regularization loss."""

    def __init__(self, output_dim):
        super(SparseMLP, self).__init__()
        self.dense_1 = layers.Dense(32, activation=tf.nn.relu)
        self.regularization = MyActivityRegularizer(1e-2)
        self.dense_2 = layers.Dense(output_dim)

    def call(self, inputs):
        x = self.dense_1(inputs)
        x = self.regularization(x)
        return self.dense_2(x)


				
			

Here a custom sparse MLP layer has been defined that stacks two linear layers and adds the custom loss between them. Because the penalty is computed on the output of the first dense layer, it regularizes that layer's activations.

It can be tested:

				
					mlp = SparseMLP(1)
y = mlp(np.random.normal(size=(10, 10)))
 
print(mlp.losses)  # List containing one float32 scalar
				
			

mlp.losses is a list containing one tf.Tensor, which can be converted to NumPy with mlp.losses[0].numpy().

Dealing with NaN in Custom Loss in Keras

There are many reasons why a loss function in Keras can produce NaN values. If you are new to Keras or to practical deep learning, this can be very frustrating, because you have no idea why Keras is not giving the desired output. Since Keras is a high-level API built on top of lower-level frameworks such as Theano and TensorFlow, the underlying problem can be hard to pinpoint.

People run into NaN losses for many different reasons, as shown in the figure below.

Some of the main reasons, which are very common, are as follows:

 1. Missing Values in training dataset

This is one of the most common reasons for a NaN loss during training. You should remove all missing values from your dataset, or fill them using a sensible strategy, such as filling with the mean. You can check for NaN values using NumPy (pandas offers equivalent built-in functions).

				
					
print(np.any(np.isnan(X_test)))


				
			

And if there are any null values, you can either use pandas fillna() function to fill them, or dropna() function to drop those values.
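
As a minimal sketch of both options, assuming the features live in a pandas DataFrame (the table and its column names below are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical feature table with two missing entries
df = pd.DataFrame({"x1": [1.0, np.nan, 3.0], "x2": [4.0, 5.0, np.nan]})

has_nan = bool(df.isna().any().any())  # True if any cell is missing
filled = df.fillna(df.mean())          # replace each NaN with its column mean
dropped = df.dropna()                  # or drop every row that contains a NaN

print(has_nan)
print(filled)
print(dropped)
```

Filling keeps all rows at the cost of inventing values, while dropping keeps only complete rows, so the better choice depends on how much data you can afford to lose.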

 2. Loss is unable to get traction on training dataset

This means that the custom loss function you designed is not suitable for the dataset or for the business problem you are trying to solve. Look at the problem from another perspective and try to find a loss function that fits it better.

 3. Exploding Gradients

Exploding gradients are a very common problem, especially in large neural networks, where the values of the gradients become very large. This problem can be mitigated with gradient clipping.

In Keras, you can add gradient clipping when compiling your model by passing the parameter clipnorm=x to the optimizer of your choice. Each gradient whose L2 norm exceeds x is then rescaled so that its norm equals x.

For example:

				
					
opt = keras.optimizers.Adam(clipnorm=1.0)


				
			

With this setting, any gradient whose L2 norm is greater than 1.0 is rescaled to have a norm of 1.0. You can add it to your model as follows:

				
					
model.compile(loss=custom_loss, optimizer=opt)
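
To make the effect of clipnorm concrete, here is a NumPy sketch of norm clipping. It only illustrates the rule and is not the optimizer's internal code; clip_by_norm is a hypothetical helper name.

```python
import numpy as np


def clip_by_norm(grad, clipnorm):
    """Rescale grad so that its L2 norm is at most clipnorm."""
    norm = np.linalg.norm(grad)
    if norm > clipnorm:
        return grad * (clipnorm / norm)
    return grad


large = clip_by_norm(np.array([3.0, 4.0]), clipnorm=1.0)  # norm 5.0, rescaled to norm 1.0
small = clip_by_norm(np.array([0.3, 0.4]), clipnorm=1.0)  # norm 0.5, left unchanged
print(large, small)
```

Note that the direction of the gradient is preserved; only its length is capped.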


				
			

Using the RMSProp optimizer together with heavy regularization also helps reduce the exploding gradients problem.

4. Dataset is not scaled

Scaling and normalizing the dataset is important. Unscaled data can make a neural network behave very strangely, so it is advised to scale the data properly.

Two scaling methods are most commonly used, and both are easy to apply with sklearn, a well-known machine learning library for Python.

  1. StandardScaler
				
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()

# fit_transform learns the mean and standard deviation from the training data and scales it
X_train_scale = ss.fit_transform(X_train)
# transform reuses the training statistics so the test data ends up on the same scale
X_test_scale = ss.transform(X_test)
				
			

				
				
			

5. Dying ReLU problem 

A dead ReLU occurs when the relu function always outputs the same value (usually 0). The unit then plays no role in discriminating between inputs. Once a ReLU reaches this state, it is unlikely to recover, because the gradient of the function for those inputs is also 0, so gradient descent will not change the weights and the model will not improve.

This can be mitigated by using the Leaky ReLU activation function, which has a small positive gradient for negative inputs, for example y = 0.01x when x < 0.
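
The difference is easy to see numerically. The following NumPy sketch compares the two activations and their gradients; the slope of 0.01 follows the example above, and the function names are chosen here for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def relu_grad(x):
    # Gradient is 0 for every non-positive input
    return (x > 0).astype(float)


def leaky_relu(x, slope=0.01):
    return np.where(x < 0, slope * x, x)


def leaky_relu_grad(x, slope=0.01):
    # Negative inputs keep a small, non-zero gradient
    return np.where(x < 0, slope, 1.0)


x = np.array([-2.0, -0.5, 1.0])
print(relu(x), relu_grad(x))
print(leaky_relu(x), leaky_relu_grad(x))
```

For the negative inputs, relu_grad is exactly 0 while leaky_relu_grad stays at 0.01, so the weights feeding a Leaky ReLU unit can still be updated.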

Hence, it is advised to try Leaky ReLU if dead units are behind the NaNs in your loss.
In Keras, you can add a Leaky ReLU layer as follows.

				
keras.layers.LeakyReLU(alpha=0.3)
				
			

6. Not a good choice of optimizer function

If you are using plain Stochastic Gradient Descent, you are more likely to run into the exploding gradients problem. One way to tackle it is to schedule the learning rate over the epochs, but with a per-parameter adaptive learning rate algorithm such as the Adam optimizer, manual learning rate scheduling is often unnecessary.

So there is a chance that you are not using the right optimizer.

To use the Adam optimizer in Keras, take it from the keras.optimizers module.

				
opt = keras.optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    amsgrad=False,
    name="Adam",
)
model.compile(optimizer=opt, loss=custom_loss)
				
			

7. Wrong Activation Function

The wrong choice of activation function can also lead to very strange behaviour. For example, if you are working on a multi-class classification problem and use the relu or sigmoid activation function in the final layer instead of softmax, while training with the categorical_crossentropy loss function, the model can behave very strangely.

8. Low Batch Size

It has been observed that optimizers tend to be less stable with a very small batch size, such as 16 or 32, than with a batch size of 64 or 128.

9. High Learning Rate

A high learning rate can prevent the model from converging to the optimum: the updates overshoot, and the loss may oscillate or diverge.

Hence it is advisable to use a lower learning rate. The right value can also be found through hyperparameter tuning.
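
The effect can be reproduced on a toy problem. The sketch below minimizes f(w) = w**2 with plain gradient descent; the function and the two learning rates are chosen here purely for illustration.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        w = w - learning_rate * grad
    return w


w_small = gradient_descent(learning_rate=0.1)  # each step multiplies w by 0.8
w_large = gradient_descent(learning_rate=1.1)  # each step multiplies w by -1.2
print(abs(w_small), abs(w_large))
```

With the small learning rate w shrinks toward the minimum at 0, while with the large one every step overshoots and |w| keeps growing. In a real network, growth like this is what eventually overflows into NaN.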

10. Different file type (for NLP Problems)

If you are working on a text problem, you can check the encoding of your input file by running the following command.

Linux
$ file -i {input}

OSX
$ file -I {input}

This reports the file's MIME type and character set. If the character set is ISO-8859-1 or us-ascii, try converting the file to utf-8 or utf-16le.
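
One way to do the conversion is from Python itself. The following is a minimal sketch, assuming the source file really is ISO-8859-1; convert_to_utf8 and the file names are hypothetical, and the temporary file only exists to make the example self-contained.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="iso-8859-1"):
    """Re-encode a text file from src_encoding to UTF-8."""
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input_latin1.txt")
    dst = os.path.join(tmp, "input_utf8.txt")
    # Write a small ISO-8859-1 file to convert
    with open(src, "wb") as f:
        f.write("café".encode("iso-8859-1"))
    convert_to_utf8(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted)
```

If the declared source encoding is wrong, the text will be decoded incorrectly without raising an error, so it is worth checking the result with the file command again.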

Monitoring Keras Loss using Callbacks

It is important to monitor the loss while training the model, so that you can understand how the model is behaving. Keras provides many callbacks that you can use to monitor the loss. Some of the best-known ones are:

1. CSVLogger

CSVLogger is a callback provided by Keras that can be used to save the epoch result in a csv file, so that later on it can be visualized, information could be extracted, and the results of epochs can be stored.

You can use CSVLogger from keras.callbacks.

				
					from keras.callbacks import CSVLogger

csv_logger = CSVLogger('training.csv')
model.fit(X_train, Y_train, callbacks=[csv_logger])
				
			

This fits the model on the dataset and stores the per-epoch results in a training.csv file, which you can load into a dataframe and visualize.

2. TerminateOnNaN

Imagine you set the training limit of your model to 1000 epochs, and your model starts showing a NaN loss. You cannot just sit and stare at the screen while no progress is being made. Keras provides a TerminateOnNaN callback that terminates training as soon as a NaN loss is encountered.

				
					import keras
terNan = keras.callbacks.TerminateOnNaN()
model.fit(X_train, Y_train, callbacks=[terNan])
				
			

3. RemoteMonitor

RemoteMonitor is a powerful callback in Keras that helps you monitor and visualize the learning in real time.

To use this callback, you need to clone hualos by François Chollet, the creator of Keras.

				
					git clone https://github.com/fchollet/hualos
cd hualos
python api.py
				
			

You can now access hualos at localhost:9000 from your browser. Next, define the callback and add it to your model during training:

				
from keras.callbacks import RemoteMonitor

monitor = RemoteMonitor()
# epochs replaces the old nb_epoch argument
hist = model.fit(train_X, train_Y, epochs=50, callbacks=[monitor])
				
			

During training, the page at localhost:9000 is updated automatically, and you can watch the learning curves in real time.

4. EarlyStopping

EarlyStopping is a very useful callback provided by Keras that stops training earlier than planned, based on a monitored value.

For example, suppose you set the number of epochs to 100 and your model stops improving after the 10th epoch. You do not want to sit and stare at the screen until training finishes before you can change the architecture of the model. The EarlyStopping callback stops training as soon as a chosen criterion is met.

				
					
es = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=3,
)


				
			

Here the EarlyStopping callback has been defined with the validation loss as the monitored value. If the validation loss does not improve for 3 consecutive epochs, training stops.
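
The patience rule can be sketched in plain Python. This is a simplified illustration of the idea, not Keras' actual implementation; early_stopping_epoch and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: the best value (0.70) is reached at epoch 2
stalled = early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.71, 0.70, 0.69])
improving = early_stopping_epoch([1.0, 0.9, 0.8, 0.7, 0.6])
print(stalled, improving)
```

In the first run, epochs 3, 4 and 5 all fail to beat 0.70, so training stops at epoch 5 and the better value at epoch 6 is never seen. That is the trade-off a small patience makes.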

This article should give you a good foundation in working with loss functions in Keras: implementing custom loss functions, whether you developed them yourself or are implementing one from a researcher's paper, avoiding common errors such as NaNs in your loss, and monitoring your loss during training.

Hopefully, now you have a good grip on these topics:

  • What are Loss Functions 
  • What are Evaluation Metrics?
  • Commonly used Loss functions in Keras (Regression and Classification)
  • Built-in loss functions in Keras
  • What is the custom loss function?
  • Why should you use a Custom Loss?
  • Implementation of common loss functions in Keras
  • Custom Loss Function for Layers i.e Custom Regularization Loss
  • Dealing with NaN values in Keras Loss
  • Monitoring Keras Loss using callbacks

Contents

  1. Losses
  2. Available losses
  3. Probabilistic losses
  4. Regression losses
  5. Hinge losses for "maximum-margin" classification
  6. Usage of losses with compile() & fit()
  7. Standalone usage of losses
  8. Creating custom losses
  9. The add_loss() API
  10. Regression losses
  11. MeanSquaredError class
  12. MeanAbsoluteError class
  13. MeanAbsolutePercentageError class
  14. MeanSquaredLogarithmicError class
  15. CosineSimilarity class
  16. mean_squared_error function
  17. mean_absolute_error function
  18. mean_absolute_percentage_error function
  19. mean_squared_logarithmic_error function
  20. cosine_similarity function
  21. Huber class
  22. Regression metrics
  23. MeanSquaredError class
  24. RootMeanSquaredError class
  25. MeanAbsoluteError class
  26. MeanAbsolutePercentageError class
  27. MeanSquaredLogarithmicError class
  28. CosineSimilarity class
  29. LogCoshError class
  30. How to Use Metrics for Deep Learning with Keras in Python
  31. Tutorial Overview
  32. Keras Metrics
  33. Keras Regression Metrics
  34. Keras Classification Metrics
  35. Custom Metrics in Keras
  36. Further Reading
  37. Summary

Losses

The purpose of loss functions is to compute the quantity that a model should seek to minimize during training.

Available losses

Note that all losses are available both via a class handle and via a function handle. The class handles enable you to pass configuration arguments to the constructor (e.g. loss_fn = CategoricalCrossentropy(from_logits=True) ), and they perform reduction by default when used in a standalone way (see details below).

Probabilistic losses

Regression losses

Hinge losses for "maximum-margin" classification

Usage of losses with compile() & fit()

A loss function is one of the two arguments required for compiling a Keras model:

All built-in loss functions may also be passed via their string identifier:

Loss functions are typically created by instantiating a loss class (e.g. keras.losses.SparseCategoricalCrossentropy ). All losses are also provided as function handles (e.g. keras.losses.sparse_categorical_crossentropy ).

Using classes enables you to pass configuration arguments at instantiation time, e.g.:
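
The three ways of specifying a loss described above can be sketched as follows. The tiny classifier is a hypothetical model introduced only so that compile() has something to work on.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical 10-feature, 3-class classifier that outputs raw logits
model = keras.Sequential([
    keras.Input(shape=(10,)),
    layers.Dense(3),
])

# 1. Loss class instance, configured at instantiation time
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(loss=loss_fn, optimizer="adam")

# 2. String identifier: uses the default configuration, which expects
#    probabilities, so it would normally be paired with a softmax output
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

# Standalone check of the configured instance on one sample with two equal logits
value = loss_fn(tf.constant([0]), tf.constant([[0.0, 0.0]]))
print(float(value))  # about 0.693, i.e. log(2)
```

The string form is convenient, but only the class form lets you change options such as from_logits.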

Standalone usage of losses

A loss is a callable with arguments loss_fn(y_true, y_pred, sample_weight=None) :

  • y_true: Ground truth values, of shape (batch_size, d0, ... dN). For sparse loss functions, such as sparse categorical crossentropy, the shape should be (batch_size, d0, ... dN-1).
  • y_pred: The predicted values, of shape (batch_size, d0, ... dN).
  • sample_weight: Optional sample_weight acts as reduction weighting coefficient for the per-sample losses. If a scalar is provided, then the loss is simply scaled by the given value. If sample_weight is a tensor of size [batch_size], then the total loss for each sample of the batch is rescaled by the corresponding element in the sample_weight vector. If the shape of sample_weight is (batch_size, d0, ... dN-1) (or can be broadcast to this shape), then each loss element of y_pred is scaled by the corresponding value of sample_weight. (Note on dN-1: all loss functions reduce by 1 dimension, usually axis=-1.)
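
As a worked example of per-sample weighting, the NumPy sketch below reproduces the arithmetic for mean absolute error with a [batch_size] weight vector and the default averaging over the batch. It mirrors the calculation only and does not call Keras.

```python
import numpy as np

y_true = np.array([[0.0, 1.0], [0.0, 0.0]])
y_pred = np.array([[1.0, 1.0], [1.0, 0.0]])
sample_weight = np.array([0.7, 0.3])

per_sample = np.mean(np.abs(y_true - y_pred), axis=-1)  # one loss per sample: [0.5, 0.5]
weighted = per_sample * sample_weight                   # [0.35, 0.15]
loss = weighted.sum() / len(weighted)                   # averaged over the batch size: 0.25
print(per_sample, weighted, loss)
```

Note that the weighted sum is divided by the batch size, not by the sum of the weights.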

By default, loss functions return one scalar loss value per input sample, e.g.

However, loss class instances feature a reduction constructor argument, which defaults to "sum_over_batch_size" (i.e. average). Allowable values are "sum_over_batch_size", "sum", and "none":

  • "sum_over_batch_size" means the loss instance will return the average of the per-sample losses in the batch.
  • "sum" means the loss instance will return the sum of the per-sample losses in the batch.
  • "none" means the loss instance will return the full array of per-sample losses.

Note that this is an important difference between loss functions like tf.keras.losses.mean_squared_error and default loss class instances like tf.keras.losses.MeanSquaredError : the function version does not perform reduction, but by default the class instance does.
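
The difference between the function handle and the class instance, and between the three reduction modes, can be sketched as follows. The string values passed to reduction are the ones listed above.

```python
import tensorflow as tf

y_true = tf.ones((2, 2))
y_pred = tf.zeros((2, 2))

# Function handle: no reduction, one value per sample
per_sample = tf.keras.losses.mean_squared_error(y_true, y_pred)

# Class instances: the reduction is chosen in the constructor
mse_mean = tf.keras.losses.MeanSquaredError()(y_true, y_pred)
mse_sum = tf.keras.losses.MeanSquaredError(reduction="sum")(y_true, y_pred)
mse_none = tf.keras.losses.MeanSquaredError(reduction="none")(y_true, y_pred)

print(per_sample.numpy(), mse_mean.numpy(), mse_sum.numpy(), mse_none.numpy())
```

Every element differs by exactly 1, so each per-sample loss is 1: the default instance averages them to 1, "sum" adds them up to 2, and "none" returns the same per-sample array as the function handle.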

When using fit() , this difference is irrelevant since reduction is handled by the framework.

Here’s how you would use a loss class instance as part of a simple training loop:

Creating custom losses

Any callable with the signature loss_fn(y_true, y_pred) that returns an array of losses (one per sample in the input batch) can be passed to compile() as a loss. Note that sample weighting is automatically supported for any such loss.

Here’s a simple example:
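
A minimal sketch of such a callable is shown below. The name my_loss_fn and the small model are assumptions made for illustration.

```python
import tensorflow as tf


def my_loss_fn(y_true, y_pred):
    squared_difference = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_difference, axis=-1)  # one loss value per sample


# Hypothetical model, only here so that compile() can be shown
model = tf.keras.Sequential([tf.keras.Input(shape=(3,)), tf.keras.layers.Dense(2)])
model.compile(optimizer="adam", loss=my_loss_fn)

per_sample = my_loss_fn(tf.constant([[0.0, 1.0]]), tf.constant([[1.0, 1.0]]))
print(per_sample.numpy())
```

The function reduces only over the last axis, so it returns one value per sample and leaves the batch-level reduction to Keras.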

The add_loss() API

Loss functions applied to the output of a model aren’t the only way to create losses.

When writing the call method of a custom layer or a subclassed model, you may want to compute scalar quantities that you want to minimize during training (e.g. regularization losses). You can use the add_loss() layer method to keep track of such loss terms.

Here’s an example of a layer that adds a sparsity regularization loss based on the L2 norm of the inputs:

Loss values added via add_loss can be retrieved in the .losses list property of any Layer or Model (they are recursively retrieved from every underlying layer):

Regression losses

MeanSquaredError class

Computes the mean of squares of errors between labels and predictions.

loss = square(y_true - y_pred)

Usage with the compile() API:

MeanAbsoluteError class

Computes the mean of absolute difference between labels and predictions.

loss = abs(y_true - y_pred)

Usage with the compile() API:

MeanAbsolutePercentageError class

Computes the mean absolute percentage error between y_true & y_pred .

loss = 100 * abs((y_true - y_pred) / y_true)

Note that to avoid dividing by zero, a small epsilon value is added to the denominator.

Usage with the compile() API:

MeanSquaredLogarithmicError class

Computes the mean squared logarithmic error between y_true & y_pred .

loss = square(log(y_true + 1.) - log(y_pred + 1.))

Usage with the compile() API:

CosineSimilarity class

Computes the cosine similarity between labels and predictions.

Note that it is a number between -1 and 1. When it is a negative number between -1 and 0, 0 indicates orthogonality and values closer to -1 indicate greater similarity. The values closer to 1 indicate greater dissimilarity. This makes it usable as a loss function in a setting where you try to maximize the proximity between predictions and targets. If either y_true or y_pred is a zero vector, cosine similarity will be 0 regardless of the proximity between predictions and targets.

loss = -sum(l2_norm(y_true) * l2_norm(y_pred))

Usage with the compile() API:

Arguments

  • axis: The axis along which the cosine similarity is computed (the features axis). Defaults to -1.
  • reduction: Type of tf.keras.losses.Reduction to apply to loss. Default value is AUTO . AUTO indicates that the reduction option will be determined by the usage context. For almost all cases this defaults to SUM_OVER_BATCH_SIZE . When used with tf.distribute.Strategy , outside of built-in training loops such as tf.keras compile and fit , using AUTO or SUM_OVER_BATCH_SIZE will raise an error. Please see this custom training tutorial for more details.
  • name: Optional name for the instance.

mean_squared_error function

Computes the mean squared error between labels and predictions.

After computing the squared distance between the inputs, the mean value over the last dimension is returned.

loss = mean(square(y_true - y_pred), axis=-1)

Arguments

  • y_true: Ground truth values. shape = [batch_size, d0, .. dN] .
  • y_pred: The predicted values. shape = [batch_size, d0, .. dN] .

Returns

Mean squared error values. shape = [batch_size, d0, .. dN-1] .

mean_absolute_error function

Computes the mean absolute error between labels and predictions.

loss = mean(abs(y_true - y_pred), axis=-1)

Arguments

  • y_true: Ground truth values. shape = [batch_size, d0, .. dN] .
  • y_pred: The predicted values. shape = [batch_size, d0, .. dN] .

Returns

Mean absolute error values. shape = [batch_size, d0, .. dN-1] .

mean_absolute_percentage_error function

Computes the mean absolute percentage error between y_true & y_pred .

loss = 100 * mean(abs((y_true - y_pred) / y_true), axis=-1)

Arguments

  • y_true: Ground truth values. shape = [batch_size, d0, .. dN] .
  • y_pred: The predicted values. shape = [batch_size, d0, .. dN] .

Returns

Mean absolute percentage error values. shape = [batch_size, d0, .. dN-1] .

mean_squared_logarithmic_error function

Computes the mean squared logarithmic error between y_true & y_pred .

loss = mean(square(log(y_true + 1) - log(y_pred + 1)), axis=-1)

Arguments

  • y_true: Ground truth values. shape = [batch_size, d0, .. dN] .
  • y_pred: The predicted values. shape = [batch_size, d0, .. dN] .

Returns

Mean squared logarithmic error values. shape = [batch_size, d0, .. dN-1] .
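
The four formulas above can be transcribed directly into NumPy to see what each one returns for a small batch. This is a sketch of the formulas as written, not of the Keras implementation, which also guards against division by zero; the input values are made up.

```python
import numpy as np

y_true = np.array([[1.0, 2.0], [4.0, 4.0]])
y_pred = np.array([[2.0, 2.0], [4.0, 2.0]])

mse = np.mean(np.square(y_true - y_pred), axis=-1)
mae = np.mean(np.abs(y_true - y_pred), axis=-1)
mape = 100 * np.mean(np.abs((y_true - y_pred) / y_true), axis=-1)
msle = np.mean(np.square(np.log(y_true + 1) - np.log(y_pred + 1)), axis=-1)

print(mse)   # [0.5, 2.0]
print(mae)   # [0.5, 1.0]
print(mape)  # [50.0, 25.0]
print(msle)
```

Each result has shape [batch_size], because the mean is taken over the last axis only.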

cosine_similarity function

Computes the cosine similarity between labels and predictions.

Note that it is a number between -1 and 1. When it is a negative number between -1 and 0, 0 indicates orthogonality and values closer to -1 indicate greater similarity. The values closer to 1 indicate greater dissimilarity. This makes it usable as a loss function in a setting where you try to maximize the proximity between predictions and targets. If either y_true or y_pred is a zero vector, cosine similarity will be 0 regardless of the proximity between predictions and targets.

loss = -sum(l2_norm(y_true) * l2_norm(y_pred))

Arguments

  • y_true: Tensor of true targets.
  • y_pred: Tensor of predicted targets.
  • axis: Axis along which to determine similarity.

Returns

Cosine similarity tensor.
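
The formula loss = -sum(l2_norm(y_true) * l2_norm(y_pred)) can be sketched in NumPy as follows. The helper names and the eps guard for zero vectors are assumptions made for this illustration, not the Keras source.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    # Divide by the L2 norm; eps keeps a zero vector at zero instead of dividing by 0
    norm = np.sqrt(np.maximum(np.sum(np.square(x), axis=axis, keepdims=True), eps))
    return x / norm


def cosine_similarity_loss(y_true, y_pred, axis=-1):
    return -np.sum(l2_normalize(y_true, axis) * l2_normalize(y_pred, axis), axis=axis)


y_true = np.array([[0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
y_pred = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
loss = cosine_similarity_loss(y_true, y_pred)
print(loss)
```

The three rows cover the cases described above: orthogonal vectors give 0, identical directions give -1, and a zero vector gives 0 regardless of the other input.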

Huber class

Computes the Huber loss between y_true & y_pred .

For each value x in error = y_true - y_pred :
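
As a sketch of the per-value rule, assuming the standard Huber definition with a threshold delta (quadratic for |x| <= delta, linear beyond it), in NumPy:

```python
import numpy as np


def huber(y_true, y_pred, delta=1.0):
    x = y_true - y_pred
    abs_x = np.abs(x)
    quadratic = 0.5 * np.square(x)                       # used where |x| <= delta
    linear = 0.5 * delta ** 2 + delta * (abs_x - delta)  # used where |x| > delta
    return np.where(abs_x <= delta, quadratic, linear)


per_element = huber(np.array([0.0, 0.0]), np.array([0.5, 2.0]))
print(per_element)  # [0.125, 1.5]
```

Small errors are penalized like squared error and large ones like absolute error, which makes the loss less sensitive to outliers than MSE. The helper above returns element-wise values; a loss class would additionally reduce them.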

Regression metrics

MeanSquaredError class

Computes the mean squared error between y_true and y_pred .

Arguments

  • name: (Optional) string name of the metric instance.
  • dtype: (Optional) data type of the metric result.

Usage with compile() API:

RootMeanSquaredError class

Computes root mean squared error metric between y_true and y_pred .

Usage with compile() API:

MeanAbsoluteError class

Computes the mean absolute error between the labels and predictions.

Arguments

  • name: (Optional) string name of the metric instance.
  • dtype: (Optional) data type of the metric result.

Usage with compile() API:

MeanAbsolutePercentageError class

Computes the mean absolute percentage error between y_true and y_pred .

Arguments

  • name: (Optional) string name of the metric instance.
  • dtype: (Optional) data type of the metric result.

Usage with compile() API:

MeanSquaredLogarithmicError class

Computes the mean squared logarithmic error between y_true and y_pred .

Arguments

  • name: (Optional) string name of the metric instance.
  • dtype: (Optional) data type of the metric result.

Usage with compile() API:

CosineSimilarity class

Computes the cosine similarity between the labels and predictions.

cosine similarity = (a . b) / ||a|| ||b||

This metric keeps the average cosine similarity between predictions and labels over a stream of data.

Arguments

  • name: (Optional) string name of the metric instance.
  • dtype: (Optional) data type of the metric result.
  • axis: (Optional) Defaults to -1. The dimension along which the cosine similarity is computed.

Usage with compile() API:

LogCoshError class

Computes the logarithm of the hyperbolic cosine of the prediction error.

logcosh = log((exp(x) + exp(-x))/2) , where x is the error (y_pred - y_true)

Arguments

  • name: (Optional) string name of the metric instance.
  • dtype: (Optional) data type of the metric result.
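
The logcosh formula is easy to explore numerically. The NumPy sketch below evaluates it for a zero, a small and a large error; the sample values are chosen here for illustration.

```python
import numpy as np


def logcosh(x):
    return np.log((np.exp(x) + np.exp(-x)) / 2)


x = np.array([0.0, 0.1, 10.0])
values = logcosh(x)
print(values)
print(x ** 2 / 2)              # close to logcosh for small errors
print(np.abs(x) - np.log(2))   # close to logcosh for large errors
```

For small errors the value is approximately x**2 / 2, and for large errors it is approximately abs(x) - log(2), so it behaves like squared error near zero and like absolute error far from it.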

How to Use Metrics for Deep Learning with Keras in Python

The Keras library provides a way to calculate and report on a suite of standard metrics when training deep learning models.

In addition to offering standard metrics for classification and regression problems, Keras also allows you to define and report on your own custom metrics when training deep learning models. This is particularly useful if you want to keep track of a performance measure that better captures the skill of your model during training.

In this tutorial, you will discover how to use the built-in metrics and how to define and use your own metrics when training deep learning models in Keras.

After completing this tutorial, you will know:

  • How Keras metrics work and how you can use them when training your models.
  • How to use regression and classification metrics in Keras with worked examples.
  • How to define and use your own custom metric in Keras with a worked example.

Tutorial Overview

This tutorial is divided into 4 parts; they are:

  1. Keras Metrics
  2. Keras Regression Metrics
  3. Keras Classification Metrics
  4. Custom Metrics in Keras

Keras Metrics

Keras allows you to list the metrics to monitor during the training of your model.

You can do this by specifying the "metrics" argument and providing a list of function names (or function name aliases) to the compile() function on your model.

The specific metrics that you list can be the names of Keras functions (like mean_squared_error) or string aliases for those functions (like 'mse').

Metric values are recorded at the end of each epoch on the training dataset. If a validation dataset is also provided, then the metric is also calculated for the validation dataset.

All metrics are reported in verbose output and in the history object returned from calling the fit() function. In both cases, the name of the metric function is used as the key for the metric values. In the case of metrics for the validation dataset, the "val_" prefix is added to the key.

Both loss functions and explicitly defined Keras metrics can be used as training metrics.

Keras Regression Metrics

Below is a list of the metrics that you can use in Keras on regression problems.

  • Mean Squared Error: mean_squared_error, MSE or mse
  • Mean Absolute Error: mean_absolute_error, MAE, mae
  • Mean Absolute Percentage Error: mean_absolute_percentage_error, MAPE, mape
  • Cosine Proximity: cosine_proximity, cosine

The example below demonstrates these 4 built-in regression metrics on a simple contrived regression problem.

Running the example prints the metric values at the end of each epoch.

A line plot of the 4 metrics over the training epochs is then created.

Note that the metrics were specified using the string aliases ['mse', 'mae', 'mape', 'cosine'] and were referenced as key values on the history object using their expanded function names.

We could also specify the metrics using their expanded names, as follows:

We can also specify the function names directly if they are imported into the script.

You can also use the loss functions as metrics.

For example, you could use the Mean Squared Logarithmic Error (mean_squared_logarithmic_error, MSLE or msle) loss function as a metric as follows:

Keras Classification Metrics

Below is a list of the metrics that you can use in Keras on classification problems.

  • Binary Accuracy: binary_accuracy, acc
  • Categorical Accuracy: categorical_accuracy
  • Sparse Categorical Accuracy: sparse_categorical_accuracy
  • Top k Categorical Accuracy: top_k_categorical_accuracy (requires you to specify a k parameter)
  • Sparse Top k Categorical Accuracy: sparse_top_k_categorical_accuracy (requires you to specify a k parameter)

Regardless of whether your problem is a binary or multi-class classification problem, you can specify the 'acc' metric to report on accuracy.

Below is an example of a binary classification problem with the built-in accuracy metric.

Running the example reports the accuracy at the end of each training epoch.

A line plot of accuracy over the epochs is created.

Custom Metrics in Keras

You can also define your own metrics and specify the function name in the list of functions for the "metrics" argument when calling the compile() function.

A metric I often like to keep track of is the root mean squared error, or RMSE.

You can get an idea of how to write a custom metric by examining the code for an existing metric.

K is the backend used by Keras.

From this example and other examples of loss functions and metrics, the approach is to use standard math functions on the backend to calculate the metric of interest.

For example, we can write a custom metric to calculate RMSE as follows:

You can see that the function is the same code as MSE with the addition of sqrt() wrapping the result.
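As a rough illustration of that relationship, here is a plain-Python sketch (no Keras required) that computes MSE and RMSE for a single sample. The function names `mse` and `rmse` and the sample values are assumptions made for illustration; in Keras itself the same expression would be written with backend math functions so that it operates on tensors.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared differences for one sample (a list of floats)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Same computation as mse(), with sqrt() wrapped around the result."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical targets and predictions for one sample.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [1.0, 5.0, 4.0, 9.0]

sample_mse = mse(y_true, y_pred)    # (4 + 0 + 4 + 4) / 4 = 3.0
sample_rmse = rmse(y_true, y_pred)  # sqrt(3.0)
print(sample_mse, sample_rmse)
```

Because RMSE is expressed in the same units as the target, it is often easier to read than MSE when tracking training progress.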

We can test this in our regression example as follows. Note that we simply list the function name directly rather than providing it as a string or alias for Keras to resolve.

Running the example reports the custom RMSE metric at the end of each training epoch.

At the end of the run, a line plot of the custom RMSE metric is created.

Your custom metric function must operate on Keras internal data structures, which may differ depending on the backend used (e.g. tensorflow.python.framework.ops.Tensor when using TensorFlow), rather than on the raw yhat and y values directly.

For this reason, I would recommend using the backend math functions wherever possible for consistency and execution speed.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this tutorial, you discovered how to use Keras metrics when training your deep learning models.

Specifically, you learned:

  • How Keras metrics work and how you configure your models to report on metrics during training.
  • How to use the classification and regression metrics built into Keras.
  • How to define and report on your own custom metrics efficiently while training your deep learning models.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Types of Keras Loss Functions Explained for Beginners

Contents

  • 1 Introduction
  • 2 What is Loss Function?
  • 3 Types of Loss Functions in Keras
    • 3.1 1. Keras Loss Function for Classification 
      • 3.1.1 i) Keras Binary Cross Entropy 
        • 3.1.1.1 Syntax of Keras Binary Cross Entropy
        • 3.1.1.2 Keras Binary Cross Entropy Example
      • 3.1.2 ii) Keras Categorical Cross Entropy 
        • 3.1.2.1 Syntax of Keras Categorical Cross Entropy
        • 3.1.2.2 Keras Categorical Cross Entropy Example
      • 3.1.3 iii) Keras KL Divergence
        • 3.1.3.1 Syntax of Keras KL Divergence
        • 3.1.3.2 Keras KL Divergence Example
      • 3.1.4 iv) Keras Poisson Loss Function
        • 3.1.4.1 Syntax of Keras Poisson Loss Function
        • 3.1.4.2 Keras Poisson Loss Function Example
      • 3.1.5 v) Keras Hinge Loss
        • 3.1.5.1 Syntax of Keras Hinge Loss
        • 3.1.5.2 Keras Hinge Loss Example
      • 3.1.6 vi) Keras Squared Hinge Loss
        • 3.1.6.1 Syntax of Squared Hinge Loss in Keras
        • 3.1.6.2 Example of Squared Hinge Loss in Keras
      • 3.1.7 vii) Keras Categorical Hinge Loss
        • 3.1.7.1 Syntax of Keras Categorical Hinge Loss
        • 3.1.7.2 Keras Categorical Hinge Loss Example
    • 3.2 2. Keras Loss Function for Regression
      • 3.2.1 i) Keras Mean Square Error Loss
        • 3.2.1.1 Syntax of Mean Square Error Loss in Keras
        • 3.2.1.2 Keras Mean Square Error Loss Example
      • 3.2.2 ii) Keras Mean Absolute Error Loss
        • 3.2.2.1 Syntax of Mean Absolute Error Loss in Keras
        • 3.2.2.2 Keras Mean Absolute Error Loss Example
      • 3.2.3 iii) Keras Cosine Similarity Loss
        • 3.2.3.1 Syntax of Cosine Similarity Loss in Keras
        • 3.2.3.2 Keras Cosine Similarity Loss Example
      • 3.2.4 iv) Keras Huber Loss Function
        • 3.2.4.1 Syntax of Huber Loss Function in Keras
        • 3.2.4.2 Huber Loss Function in Keras Example
    • 3.3 Keras Custom Loss Function
      • 3.3.1 Keras Custom Loss function Example
    • 3.4 Keras add_loss() API
      • 3.4.1 Keras add_loss() API Example
  • 4 Conclusion

Introduction

In this tutorial, we will look at various types of Keras loss functions for training neural networks. Loss functions are an important part of any neural network training process because they help the network minimize the error and get as close as possible to the expected output. Here we will go through Keras loss functions for regression and classification, and also see how to create a custom loss function in Keras.

What is Loss Function?

Loss Functions, also known as cost functions, are used for computing the error with the aim that the model should minimize it during training.

Loss functions also supply the slope, i.e. the gradient of the error with respect to the weights used in the model. Backpropagation computes these gradients, and the optimizer then uses them to update the weights after each batch.

The below animation shows how a loss function works.

Types of Keras Loss Functions

Selecting a loss function is not so easy, so we’ll be going over some prominent loss functions that can be helpful in various instances.

1. Keras Loss Function for Classification 

Let us first look at the Keras loss functions for classification, which are usually computed as probabilistic losses.

i) Keras Binary Cross Entropy 

The binary cross entropy loss function computes the loss between the true labels and the predicted labels for binary classification models whose output is a probability between 0 and 1.

Types of Keras Loss Functions for Classification

Syntax of Keras Binary Cross Entropy

Following is the syntax of Binary Cross Entropy Loss Function in Keras.

In [1]:

tf.keras.losses.BinaryCrossentropy(
    from_logits=False, label_smoothing=0, reduction="auto", name="binary_crossentropy"
)
Keras Binary Cross Entropy Example

The example for Keras binary cross entropy uses two small hand-written arrays as the true labels and the predictions, and then applies the BinaryCrossentropy class from the losses module to them.

In [3]:

import tensorflow as tf  # imported once; the later examples reuse `tf`

y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
# Using 'auto'/'sum_over_batch_size' reduction type.
bce = tf.keras.losses.BinaryCrossentropy()
bce(y_true, y_pred).numpy()
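To make the arithmetic behind this example visible, the following is a minimal plain-Python sketch of binary cross entropy on the same inputs. It is not the Keras implementation: the helper name `binary_crossentropy` and the clipping constant `EPS` are assumptions chosen for illustration.

```python
import math

EPS = 1e-7  # assumed clipping constant that keeps log() away from 0

def binary_crossentropy(y_true, y_pred):
    """Binary cross entropy for one sample, averaged over its outputs."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(y_true)

y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]

# One loss value per sample, then the mean over the batch.
per_sample = [binary_crossentropy(t, p) for t, p in zip(y_true, y_pred)]
bce_value = sum(per_sample) / len(per_sample)
print(round(bce_value, 3))  # 0.815
```

The first sample is penalized more heavily because both of its outputs put only 0.4 probability on the correct answer.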

ii) Keras Categorical Cross Entropy 

This is the second type of probabilistic loss function for classification in Keras and is a generalized version of binary cross entropy that we discussed above. Categorical Cross Entropy is used for multiclass classification where there are more than two class labels.

Syntax of Keras Categorical Cross Entropy

Following is the syntax of Categorical Cross Entropy Loss Function in Keras.

In [4]:

tf.keras.losses.CategoricalCrossentropy(
    from_logits=False, label_smoothing=0, reduction="auto", name="categorical_crossentropy"
)
Keras Categorical Cross Entropy Example

The following is an example of Keras categorical cross entropy. y_true denotes the actual probability distribution of the output and y_pred denotes the probability distribution we got from the model.

In [5]:

y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
# Using 'auto'/'sum_over_batch_size' reduction type.  
cce = tf.keras.losses.CategoricalCrossentropy()
cce(y_true, y_pred).numpy()
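The same numbers can be worked through by hand. The sketch below is a plain-Python approximation of categorical cross entropy, assuming the predictions already sum to 1 per sample; the helper name and the `EPS` clipping constant are illustrative assumptions rather than the Keras internals.

```python
import math

EPS = 1e-7  # assumed lower bound so that log(0) is never evaluated

def categorical_crossentropy(y_true, y_pred):
    """-sum(y_true * log(y_pred)) for one sample."""
    return -sum(t * math.log(min(max(p, EPS), 1.0))
                for t, p in zip(y_true, y_pred))

y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]

per_sample = [categorical_crossentropy(t, p) for t, p in zip(y_true, y_pred)]
cce_value = sum(per_sample) / len(per_sample)  # mean over the batch
print(round(cce_value, 3))  # 1.177
```

Only the probability assigned to the true class matters: the first sample contributes -log(0.95) and the second contributes -log(0.1), which dominates the average.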

iii) Keras KL Divergence

The KL Divergence or Kullback-Leibler Divergence loss function measures how much the predicted probability distribution diverges from the actual probability distribution.

Syntax of Keras KL Divergence

Below is the syntax of KL Divergence in Keras –

In [8]:

tf.keras.losses.KLDivergence(reduction="auto", name="kl_divergence")
Keras KL Divergence Example

The KLDivergence class is used in this case. For these inputs the loss works out to roughly 0.458, which is not huge but still considerable: the prediction for the first sample assigns only 0.4 probability to the true class.

In [9]:

y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
# Using 'auto'/'sum_over_batch_size' reduction type.  
kl = tf.keras.losses.KLDivergence()
kl(y_true, y_pred).numpy()
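A hand computation of this example could look like the sketch below. It assumes the loss is sum(y_true * log(y_true / y_pred)) per sample with both inputs clipped to a small positive range; the helper name and `EPS` are assumptions for illustration.

```python
import math

EPS = 1e-7  # assumed clipping constant

def kl_divergence(y_true, y_pred):
    """sum(y_true * log(y_true / y_pred)) for one sample, with clipping."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        t = min(max(t, EPS), 1.0)
        p = min(max(p, EPS), 1.0)
        total += t * math.log(t / p)
    return total

y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]

per_sample = [kl_divergence(t, p) for t, p in zip(y_true, y_pred)]
kl_value = sum(per_sample) / len(per_sample)  # mean over the batch
print(round(kl_value, 3))  # 0.458
```

The second sample has an all-zero target, so after clipping it contributes almost nothing; nearly all of the loss comes from the log(1 / 0.4) term in the first sample.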

iv) Keras Poisson Loss Function

In the Poisson loss function, we calculate the Poisson loss between the actual value and the predicted value. The Poisson loss is generally used when the target is count data that follows a Poisson distribution. An example of a Poisson-distributed quantity is the number of calls received by a call center in an hour.

Syntax of Keras Poisson Loss Function

Following is the syntax of Poisson Loss Function in Keras.

In [6]:

tf.keras.losses.Poisson(reduction="auto", name="poisson")
Keras Poisson Loss Function Example

The example below applies the Poisson loss to two small arrays.

In [7]:

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.  
p = tf.keras.losses.Poisson()
p(y_true, y_pred).numpy()
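For reference, the Poisson loss on these inputs can be reproduced by hand. This sketch assumes the per-element form y_pred - y_true * log(y_pred + eps); the helper name and the value of `EPS` are illustrative assumptions.

```python
import math

EPS = 1e-7  # assumed small constant that avoids log(0)

def poisson(y_true, y_pred):
    """Mean of y_pred - y_true * log(y_pred + EPS) for one sample."""
    terms = [p - t * math.log(p + EPS) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]

per_sample = [poisson(t, p) for t, p in zip(y_true, y_pred)]
poisson_value = sum(per_sample) / len(per_sample)  # mean over the batch
print(round(poisson_value, 3))  # 0.5
```

The second sample predicts a rate of 0 for a count of 0, so its loss is 0; the first sample contributes about 1.0, giving a batch mean of about 0.5.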

v) Keras Hinge Loss

The Keras loss functions for classification shown above use probabilistic losses as their basis for calculation. Now we are going to see some loss functions in Keras that use hinge loss for maximum-margin classification, as in SVMs.

The hinge loss is computed as maximum(1 - y_true * y_pred, 0), where the y_true values are expected to be -1 or 1.

Syntax of Keras Hinge Loss

Below is the syntax of Keras Hinge loss –

In [18]:

tf.keras.losses.Hinge(reduction="auto", name="hinge")
Keras Hinge Loss Example

The Hinge class from Keras computes the hinge loss. The labels in this example are 0 or 1, which Keras converts to -1 or 1 before applying the formula.

In [19]:

y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
# Using 'auto'/'sum_over_batch_size' reduction type.  
h = tf.keras.losses.Hinge()
h(y_true, y_pred).numpy()
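The hinge computation for this example can be traced in plain Python. The sketch below assumes the form mean(max(1 - y_true * y_pred, 0)) with 0 labels mapped to -1 first; the helper names `to_signed` and `hinge` are hypothetical.

```python
def to_signed(label):
    """Map a 0 label to -1; labels that are already -1 or 1 are kept."""
    return -1.0 if label == 0 else float(label)

def hinge(y_true, y_pred):
    """Mean of max(1 - y_true * y_pred, 0) for one sample."""
    terms = [max(1.0 - to_signed(t) * p, 0.0) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]

per_sample = [hinge(t, p) for t, p in zip(y_true, y_pred)]
hinge_value = sum(per_sample) / len(per_sample)  # mean over the batch
print(round(hinge_value, 3))  # 1.3
```

A term only drops to 0 when the prediction has the same sign as the label and a magnitude of at least 1, which is the margin that hinge loss enforces.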

vi) Keras Squared Hinge Loss

The squared hinge loss is calculated using squared_hinge() function and is similar to Hinge Loss calculation discussed above except that the result is squared.

Syntax of Squared Hinge Loss in Keras

In [22]:

tf.keras.losses.squared_hinge(y_true, y_pred)
Example of Squared Hinge Loss in Keras

In this example, the data is first generated using NumPy's random functions, and then the Keras squared hinge loss function calculates the loss.

In [23]:

import numpy as np

y_true = np.random.choice([-1, 1], size=(2, 3))
y_pred = np.random.random(size=(2, 3))
loss = tf.keras.losses.squared_hinge(y_true, y_pred)
assert loss.shape == (2,)
# allclose tolerates tiny floating-point differences between TensorFlow and NumPy.
assert np.allclose(loss.numpy(), np.mean(np.square(np.maximum(1. - y_true * y_pred, 0.)), axis=-1))

vii) Keras Categorical Hinge Loss

Another hinge-based loss function is the categorical hinge loss. It computes the categorical hinge loss between the true values and the predicted values for multiclass classification.

Syntax of Keras Categorical Hinge Loss

Below is the syntax of Categorical Hinge Loss in Keras –

In [20]:

tf.keras.losses.CategoricalHinge(reduction="auto", name="categorical_hinge")
Keras Categorical Hinge Loss Example

With the CategoricalHinge() function we calculate the final result for categorical hinge loss.

In [21]:

y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
# Using 'auto'/'sum_over_batch_size' reduction type.  
h = tf.keras.losses.CategoricalHinge()
h(y_true, y_pred).numpy()
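As a worked version of this example, the sketch below assumes the categorical hinge form max(neg - pos + 1, 0), where pos is the score given to the true class and neg is the largest score given to any other class. The helper name is hypothetical.

```python
def categorical_hinge(y_true, y_pred):
    """max(neg - pos + 1, 0) for one sample with one-hot style labels."""
    pos = sum(t * p for t, p in zip(y_true, y_pred))
    neg = max((1.0 - t) * p for t, p in zip(y_true, y_pred))
    return max(neg - pos + 1.0, 0.0)

y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]

per_sample = [categorical_hinge(t, p) for t, p in zip(y_true, y_pred)]
cat_hinge_value = sum(per_sample) / len(per_sample)  # mean over the batch
print(round(cat_hinge_value, 3))  # 1.4
```

In the first sample the wrong class scores 0.6 against 0.4 for the true class, so the margin is violated and the loss is 1.2; the second sample has no true class at all, which gives 1.6.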

2. Keras Loss Function for Regression

Let us now look at the second category of loss functions in Keras, those for regression models.

These regression loss functions are calculated on the basis of residual or error of the actual value and predicted value. The below animation shows this concept.

Types of Keras Loss Functions for Regression

Different types of Regression Loss function in Keras are as follows:

i) Keras Mean Square Error Loss

The mean squared error loss in Keras computes the mean of the squared differences between the actual values and the predicted values.

Syntax of Mean Square Error Loss in Keras

Below is the syntax of mean squared error loss in Keras –

In [10]:

tf.keras.losses.MeanSquaredError(reduction="auto", name="mean_squared_error")
Keras Mean Square Error Loss Example

The below code snippet shows how we can implement mean square error in Keras.

In [11]:

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.  
mse = tf.keras.losses.MeanSquaredError()
mse(y_true, y_pred).numpy()

ii) Keras Mean Absolute Error Loss

The mean absolute error is the mean of the absolute differences between the labels and the predicted values.

Syntax of Mean Absolute Error Loss in Keras

Below is the syntax of mean absolute error loss in Keras –

In [12]:

tf.keras.losses.MeanAbsoluteError(
    reduction="auto", name="mean_absolute_error"
)
Keras Mean Absolute Error Loss Example

Using the MeanAbsoluteError class from the Keras losses module, we can compute the mean absolute error loss between a set of labels and predictions.

In [13]:

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.  
mae = tf.keras.losses.MeanAbsoluteError()
mae(y_true, y_pred).numpy()
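To show how the per-sample losses are turned into a single number, here is a plain-Python sketch of mean absolute error on the same inputs, including a sum reduction and a sample-weighted variant. The helper name and the weights 0.7 and 0.3 are assumptions chosen for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of |y_true - y_pred| for one sample."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]

# 'none' reduction: one loss value per sample.
per_sample = [mean_absolute_error(t, p) for t, p in zip(y_true, y_pred)]

# 'sum_over_batch_size' reduction: sum divided by the number of samples.
mae_mean = sum(per_sample) / len(per_sample)

# 'sum' reduction: plain sum of the per-sample losses.
mae_sum = sum(per_sample)

# Sample weights scale each per-sample loss before the batch average.
sample_weight = [0.7, 0.3]
mae_weighted = sum(w * l for w, l in zip(sample_weight, per_sample)) / len(per_sample)

print(per_sample, mae_mean, mae_sum, round(mae_weighted, 3))
```

For these inputs the per-sample losses are both 0.5, so the batch mean is 0.5, the sum is 1.0 and the weighted value is about 0.25.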

iii) Keras Cosine Similarity Loss

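The cosine similarity loss measures the cosine of the angle between the labels and the predictions. Cosine similarity itself ranges from -1 to 1; Keras returns its negative as the loss, so a value of 0 indicates orthogonality and values closer to -1 indicate greater similarity.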

Syntax of Cosine Similarity Loss in Keras

Below is the syntax of cosine similarity loss in Keras –

In [14]:

tf.keras.losses.CosineSimilarity(
    axis=-1, reduction="auto", name="cosine_similarity"
)
Keras Cosine Similarity Loss Example

In this example, we create an instance of the CosineSimilarity class, named cosine_loss, and call it on the labels and predictions.

In [15]:

y_true = [[0., 1.], [1., 1.]]
y_pred = [[1., 0.], [1., 1.]]
# Using 'auto'/'sum_over_batch_size' reduction type.  
cosine_loss = tf.keras.losses.CosineSimilarity(axis=1)
cosine_loss(y_true, y_pred).numpy()
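The sign convention is easier to see with the numbers written out. The sketch below assumes the loss is the negative dot product of the L2-normalized label and prediction vectors; the helper names are hypothetical.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def cosine_similarity_loss(y_true, y_pred):
    """Negative cosine similarity for one sample."""
    t = l2_normalize(y_true)
    p = l2_normalize(y_pred)
    return -sum(a * b for a, b in zip(t, p))

y_true = [[0., 1.], [1., 1.]]
y_pred = [[1., 0.], [1., 1.]]

per_sample = [cosine_similarity_loss(t, p) for t, p in zip(y_true, y_pred)]
cosine_value = sum(per_sample) / len(per_sample)  # mean over the batch
print(round(cosine_value, 3))  # -0.5
```

The first pair of vectors is orthogonal (loss 0) and the second pair is identical (loss -1), so minimizing this loss pushes predictions toward the direction of the labels.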

iv) Keras Huber Loss Function

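In regression problems where the data contains outliers, we can use the Huber loss function, which is less sensitive to outliers than the mean squared error: it is quadratic for errors smaller than delta and linear for larger errors.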

Syntax of Huber Loss Function in Keras

Below is the syntax of Huber Loss function in Keras –

In [16]:

tf.keras.losses.Huber(delta=1.0, reduction="auto", name="huber_loss")
Huber Loss Function in Keras Example

Keras library provides Huber function for calculating the Huber loss.

In [17]:

y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
# Using 'auto'/'sum_over_batch_size' reduction type.  
h = tf.keras.losses.Huber()
h(y_true, y_pred).numpy()
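A plain-Python sketch of the Huber computation follows. It assumes the usual piecewise form: 0.5 * error^2 when |error| <= delta, and delta * |error| - 0.5 * delta^2 otherwise. The helper name and the extra outlier value are assumptions for illustration.

```python
def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss for one sample."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        error = abs(t - p)
        if error <= delta:
            total += 0.5 * error ** 2                  # quadratic region
        else:
            total += delta * error - 0.5 * delta ** 2  # linear region
    return total / len(y_true)

y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]

per_sample = [huber(t, p) for t, p in zip(y_true, y_pred)]
huber_value = sum(per_sample) / len(per_sample)  # mean over the batch
print(round(huber_value, 3))  # 0.155

# A hypothetical outlier with an error of 10: the linear branch gives 9.5,
# while half the squared error would be 50.
outlier_loss = huber([0.0], [10.0])
print(outlier_loss)
```

All errors in the tutorial example are below delta, so only the quadratic branch is used there; the outlier case shows how the linear branch keeps a single large error from dominating.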

Keras Custom Loss Function

In spite of so many loss functions, there are cases when these loss functions do not serve the purpose. In such scenarios, we can build a custom loss function in Keras, which is especially useful for research purposes.

You can pass this custom loss function in Keras as a parameter while compiling the model. There is one constraint: the custom loss function must take the true values (y_true) and the predicted values (y_pred) as input and return an array of losses, one per sample. If your function does not match this signature, you cannot use it as a custom loss function in Keras.

Keras Custom Loss function Example

The below code snippet shows how to build a custom loss function. Once this function is created, we use it to compile the model using Keras.

In [24]:

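import tensorflow as tf

def custom_loss_function(y_true, y_pred):
    # Per-sample mean of the squared differences (the same form as MSE).
    squared_difference = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_difference, axis=-1)

# `model` is assumed to be an already-built Keras model.
model.compile(optimizer='adam', loss=custom_loss_function)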

Keras add_loss() API

As we saw above, the custom loss function in Keras has a restriction to use a specific signature of having y_true and y_pred as arguments. Keras provides another option of add_loss() API which does not have this constraint.

Keras add_loss() API Example

The cell below contains an example of how the add_loss() method is used inside a custom layer to add a loss term.

In [25]:

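import tensorflow as tf
from tensorflow.keras.layers import Layer

class Custom_layer(Layer):
    def __init__(self, rate=1e-2):
        super(Custom_layer, self).__init__()
        self.rate = rate

    def call(self, inputs):
        # Register an extra loss term that depends on the layer's inputs;
        # reduce_sum turns it into a single scalar value.
        self.add_loss(self.rate * tf.reduce_sum(tf.square(inputs)))
        return inputs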

Usage of loss functions

A loss function (or objective function, or optimization score function) is one of the two parameters required to compile a model:

model.compile(loss='mean_squared_error', optimizer='sgd')

from keras import losses

model.compile(loss=losses.mean_squared_error, optimizer='sgd')

You can either pass the name of an existing loss function, or pass a TensorFlow/Theano symbolic function that returns a scalar for each data-point and takes the following two arguments:

  • y_true: True labels. TensorFlow/Theano tensor.
  • y_pred: Predictions. TensorFlow/Theano tensor of the same shape as y_true.

The actual optimized objective is the mean of the output array across all datapoints.

For a few examples of such functions, check out the losses source.

Available loss functions

mean_squared_error

mean_squared_error(y_true, y_pred)

mean_absolute_error

mean_absolute_error(y_true, y_pred)

mean_absolute_percentage_error

mean_absolute_percentage_error(y_true, y_pred)

mean_squared_logarithmic_error

mean_squared_logarithmic_error(y_true, y_pred)

squared_hinge

squared_hinge(y_true, y_pred)

hinge

hinge(y_true, y_pred)

categorical_hinge

categorical_hinge(y_true, y_pred)

logcosh

logcosh(y_true, y_pred)

Logarithm of the hyperbolic cosine of the prediction error.

log(cosh(x)) is approximately equal to (x ** 2) / 2 for small x and
to abs(x) - log(2) for large x. This means that ‘logcosh’ works mostly
like the mean squared error, but will not be so strongly affected by the
occasional wildly incorrect prediction. However, it may return NaNs if the
intermediate value cosh(y_pred - y_true) is too large to be represented
in the chosen precision.
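The two approximations and the overflow problem mentioned above can be checked with a short plain-Python sketch. The rewritten form |x| + log(1 + exp(-2|x|)) - log(2) is algebraically equal to log(cosh(x)) but never evaluates cosh directly; the function name is an assumption, and this is not presented as the Keras implementation.

```python
import math

def logcosh(x):
    """Numerically stable log(cosh(x))."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)

small = logcosh(0.01)  # close to 0.01 ** 2 / 2
large = logcosh(50.0)  # close to 50 - log(2)

# The naive form overflows in double precision for a large error.
try:
    math.log(math.cosh(1000.0))
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

stable_large = logcosh(1000.0)  # still finite with the rewritten form
print(small, large, naive_overflowed, stable_large)
```

In pure Python the naive computation raises an OverflowError rather than returning NaN, but the cause is the same one described above: cosh of a large error cannot be represented in the chosen precision.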


categorical_crossentropy

categorical_crossentropy(y_true, y_pred)

sparse_categorical_crossentropy

sparse_categorical_crossentropy(y_true, y_pred)

binary_crossentropy

binary_crossentropy(y_true, y_pred)

kullback_leibler_divergence

kullback_leibler_divergence(y_true, y_pred)

poisson

poisson(y_true, y_pred)

cosine_proximity

cosine_proximity(y_true, y_pred)

Note: when using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all-zeros except for a 1 at the index corresponding to the class of the sample). In order to convert integer targets into categorical targets, you can use the Keras utility to_categorical:

from keras.utils import to_categorical

categorical_labels = to_categorical(int_labels, num_classes=None)
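To illustrate what the categorical format looks like, here is a small plain-Python sketch of the conversion. The function `one_hot` and the sample labels are hypothetical stand-ins written for this example; it only mimics the idea of to_categorical and is not the Keras utility itself.

```python
def one_hot(int_labels, num_classes=None):
    """Convert integer class labels into one-hot rows."""
    if num_classes is None:
        # Assume classes are numbered 0 .. max(label).
        num_classes = max(int_labels) + 1
    return [[1.0 if index == label else 0.0 for index in range(num_classes)]
            for label in int_labels]

int_labels = [0, 2, 1, 2]
categorical_labels = one_hot(int_labels)
for row in categorical_labels:
    print(row)
```

Each row is all zeros except for a 1 at the index of the sample's class, which is the shape of target that categorical_crossentropy expects.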
