class torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')

Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input $x$ and target $y$.

The unreduced (i.e. with reduction set to 'none') loss can be described as:

$$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = (x_n - y_n)^2,$$

where $N$ is the batch size. If reduction is not 'none' (default 'mean'), then:

$$\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean';} \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} \end{cases}$$

$x$ and $y$ are tensors of arbitrary shapes with a total of $n$ elements each.

The mean operation still operates over all the elements, and divides by $n$. The division by $n$ can be avoided if one sets reduction = 'sum'.

Parameters:
- size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
- reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
- reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
Shape:
- Input: $(*)$, where $*$ means any number of dimensions.
- Target: $(*)$, same shape as the input.
Examples:

>>> loss = nn.MSELoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()
Loss functions are fundamental to training ML models; in most machine learning projects, there is no way to drive your model toward making correct predictions without one. In layman's terms, a loss function is a mathematical function or expression used to measure how well a model is doing on some dataset. Knowing how well a model is doing on a particular dataset gives the developer insight into many decisions during training, such as switching to a new, more powerful model or even changing the loss function itself to a different type. Speaking of types, several loss functions have been developed over the years, each suited to a particular training task.
In this article, we are going to explore these different loss functions which are part of the PyTorch nn module. We will further take a deep dive into how PyTorch exposes these loss functions to users as part of its nn module API by building a custom one.
Now that we have a high-level understanding of what loss functions are, let’s explore some more technical details about how loss functions work.
What are loss functions?
We stated earlier that loss functions tell us how well a model does on a particular dataset. Technically, they do this by measuring how close a predicted value is to the actual value. When our model makes predictions that are very close to the actual values on both our training and testing datasets, it means we have a quite robust model.
Although loss functions give us critical information about the performance of our model, that is not their primary role, as there are more direct techniques for assessing our models, such as accuracy and F-scores. The importance of loss functions is mostly realized during training, where we nudge the weights of our model in the direction that minimizes the loss. By doing so, we increase the probability of our model making correct predictions, something which would not be possible without a loss function.
Different loss functions suit different problems, each carefully crafted by researchers to ensure stable gradient flow during training.
Sometimes, the mathematical expressions of loss functions can be a bit daunting, and this has led to some developers treating them as black boxes. We are going to uncover some of PyTorch’s most used loss functions later, but before that, let us take a look at how we use loss functions in the world of PyTorch.
Loss functions in PyTorch
PyTorch comes out of the box with many canonical loss functions that follow simple design patterns, allowing developers to iterate over different loss functions very quickly during training. All of PyTorch's loss functions are packaged in the nn module, the same module that houses nn.Module, PyTorch's base class for all neural networks. This makes adding a loss function to your project as easy as adding a single line of code. Let's look at how to add a Mean Square Error loss function in PyTorch.
import torch.nn as nn
MSE_loss_fn = nn.MSELoss()
The function returned from the code above can be used to calculate how far a prediction is from the actual value using the format below.
#predicted_value is the prediction from our neural network
#target is the actual value in our dataset
#loss_value is the loss between the predicted value and the actual value
Loss_value = MSE_loss_fn(predicted_value, target)
Now that we have an idea of how to use loss functions in PyTorch, let’s dive deep into the behind the scenes of several of the loss functions PyTorch offers.
Which loss functions are available in PyTorch?
A lot of these loss functions PyTorch comes with are broadly categorised into 3 groups — Regression loss, Classification loss and Ranking loss.
Regression losses are mostly concerned with continuous values which can take any value between two limits. One example of this would be predictions of the house prices of a community.
Classification loss functions deal with discrete values, like the task of classifying an object as a box, pen or bottle.
Ranking losses predict the relative distances between values. An example is face verification, where we want to know which face images belong to a particular person; we can do so by ranking which faces do and do not belong to the original face-holder according to their degree of similarity to the target face scan.
L1 loss function
The L1 loss function computes the mean absolute error between each value in the predicted tensor and that of the target. It first calculates the absolute difference between each value in the predicted tensor and that of the target, then sums the values returned from each absolute difference computation. Finally, it computes the average of this sum to obtain the Mean Absolute Error (MAE). The L1 loss function is very robust to noise.
import torch
import torch.nn as nn

# size_average and reduce are deprecated; use reduction instead.
# reduction specifies the method of reduction to apply to the output. Possible values are
# 'mean' (default), which averages the output, 'sum', which sums the output, and
# 'none', which applies no reduction.
loss_fn = nn.L1Loss(size_average=None, reduce=None, reduction='mean')
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss_fn(input, target)
print(output) #tensor(0.7772, grad_fn=<L1LossBackward>)
The single value returned is the computed loss between two tensors with dimension 3 by 5.
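Since the reduction is simply a mean of absolute differences, the criterion can be sanity-checked against a manual computation; a minimal sketch:

import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)

builtin = nn.L1Loss()(input, target)
manual = torch.mean(torch.abs(input - target))  # mean absolute error by hand

print(torch.allclose(builtin, manual))  # True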
Mean Square Error
The Mean Square Error shares some striking similarities with the Mean Absolute Error. Instead of computing the absolute difference between values in the prediction tensor and the target, as is the case with Mean Absolute Error, it computes the squared difference between values in the prediction tensor and those of the target tensor. By doing so, relatively large differences are penalized more, while relatively small differences are penalized less. MSE is considered less robust to outliers and noise than MAE, however.
import torch
import torch.nn as nn

loss = nn.MSELoss(size_average=None, reduce=None, reduction='mean')
# The parameter explanation for the L1 loss function applies here as well.
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
print(output) #tensor(0.9823, grad_fn=<MseLossBackward>)
Cross Entropy Loss
Cross Entropy loss is used in classification problems involving a number of discrete classes. It measures the difference between two probability distributions for a given set of random variables. Usually, when using Cross Entropy Loss, the output of our network is a Softmax layer, which ensures that the output of the neural network is a probability value (value between 0-1).
The softmax function consists of two parts. The first is the exponent of the prediction for a particular class, exp(yi), where yi is the output of the neural network for that class. The exponent is a number close to zero (but never exactly zero) if yi is large and negative, and a very large number if yi is large and positive.
import numpy as np
np.exp(34) #583461742527454.9
np.exp(-34) #1.713908431542013e-15
The second part is a normalization value and is used to ensure that the output of the softmax layer is always a probability value.
This is obtained by summing the exponents of all the class values. The final equation of softmax is therefore:

$$\operatorname{softmax}(y_i) = \frac{e^{y_i}}{\sum_{j} e^{y_j}}$$
In PyTorch’s nn module, cross-entropy loss combines log-softmax and Negative Log-Likelihood Loss into a single loss function.
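A short sketch makes this concrete (the tensor values are random; what matters is the gradient function shown in the printed output):

import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)   # raw, unnormalized scores (logits)
target = torch.tensor([1, 0, 4])                # class indices, 0 <= value < 5

cross_entropy = nn.CrossEntropyLoss()
output = cross_entropy(input, target)
print(output)  # e.g. tensor(2.1971, grad_fn=<NllLossBackward>)

# The same value can be reproduced with LogSoftmax followed by NLLLoss.
log_softmax = nn.LogSoftmax(dim=1)
nll = nn.NLLLoss()
print(nll(log_softmax(input), target))  # same value, up to floating-point precision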
Notice how the gradient function in the printed output is a Negative Log-Likelihood loss (NLL). This actually reveals that Cross-Entropy loss combines NLL loss under the hood with a log-softmax layer.
Negative Log-Likelihood Loss
The NLL loss function works quite similarly to the Cross-Entropy Loss function. As mentioned earlier in the Cross Entropy section, Cross-Entropy Loss combines a log-softmax layer and NLL loss to obtain the value of the Cross Entropy loss. This means that NLL loss can be used to obtain the Cross Entropy loss value by having the last layer of the neural network be a log-softmax layer instead of a normal softmax layer.
import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0, 4])
output = loss(m(input), target)
output.backward()
# 2D loss example (used, for example, with image inputs)
N, C = 5, 4
loss = nn.NLLLoss()
# input is of size N x C x height x width
data = torch.randn(N, 16, 10, 10)
conv = nn.Conv2d(16, C, (3, 3))
m = nn.LogSoftmax(dim=1)
# each element in target has to have 0 <= value < C
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
output = loss(m(conv(data)), target)
print(output) #tensor(1.4892, grad_fn=<NllLoss2DBackward>)
#credit NLLLoss — PyTorch 1.9.0 documentation
Binary Cross Entropy Loss
Binary Cross-Entropy loss is a special class of Cross-Entropy losses used for the special problem of classifying data points into only two classes. Labels for this type of problem are usually binary, and our goal is therefore to push the model to predict a number close to zero for a zero label and a number close to one for a one label. Usually when using BCE loss for binary classification, the output of the neural network is passed through a Sigmoid layer to ensure that it lies between zero and one.
import torch
import torch.nn as nn
m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(m(input), target)
print(output) #tensor(0.4198, grad_fn=<BinaryCrossEntropyBackward>)
Binary Cross Entropy Loss with Logits
I mentioned in the previous section that the network output is usually passed through a sigmoid layer before Binary Cross-Entropy loss so that it lies between 0 and 1. Binary Cross-Entropy Loss with Logits combines these two operations into a single layer. According to the PyTorch documentation, this is a more numerically stable version, as it takes advantage of the log-sum-exp trick.
import torch
import torch.nn as nn
target = torch.ones([10, 64], dtype=torch.float32) # 64 classes, batch size = 10
output = torch.full([10, 64], 1.5) # A prediction (logit)
pos_weight = torch.ones([64]) # All weights are equal to 1
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = criterion(output, target) # -log(sigmoid(1.5))
print(loss) #tensor(0.2014)
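To see the equivalence (ignoring pos_weight, which defaults to all ones), here is a small sketch comparing the fused criterion with an explicit sigmoid followed by BCELoss:

import torch
import torch.nn as nn

logits = torch.randn(10, 64)
target = torch.empty(10, 64).random_(2)  # random 0/1 labels

fused = nn.BCEWithLogitsLoss()(logits, target)
explicit = nn.BCELoss()(torch.sigmoid(logits), target)

print(torch.allclose(fused, explicit, atol=1e-6))  # True, up to numerical precision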
Smooth L1 Loss
The smooth L1 loss function combines the benefits of MSE loss and MAE loss through a heuristic value beta. This criterion was introduced in the Fast R-CNN paper. When the absolute difference between the ground truth value and the predicted value is below beta, the criterion uses a squared difference, much like MSE loss. The graph of MSE loss is a continuous curve, so the gradient varies with the loss value and is differentiable everywhere. Moreover, as the loss value shrinks, the gradient diminishes, which is convenient during gradient descent. For very large loss values, however, the gradient explodes. Hence, once the absolute difference becomes larger than beta, the criterion switches to a mean absolute error, whose gradient is almost constant for every loss value, and the potential gradient explosion is eliminated.
import torch
import torch.nn as nn
loss = nn.SmoothL1Loss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
print(output) #tensor(0.7838, grad_fn=<SmoothL1LossBackward>)
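The piecewise behaviour described above can be checked by hand; a minimal sketch using the default beta of 1.0:

import torch
import torch.nn as nn

input = torch.randn(3, 5)
target = torch.randn(3, 5)
beta = 1.0

builtin = nn.SmoothL1Loss(beta=beta)(input, target)

# Manual piecewise computation: quadratic below beta, linear above it.
diff = torch.abs(input - target)
manual = torch.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta).mean()

print(torch.allclose(builtin, manual))  # True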
Hinge Embedding Loss
Hinge Embedding Loss is mostly used in semi-supervised learning tasks to measure the similarity between two inputs. It is used when there is an input tensor and a label tensor containing values of 1 or -1, and it comes up in problems involving non-linear embeddings.
import torch
import torch.nn as nn
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
hinge_loss = nn.HingeEmbeddingLoss()
output = hinge_loss(input, target)
output.backward()
print('input: ', input)
print('target: ', target)
print('output: ', output)
#input: tensor([[ 1.4668e+00, 2.9302e-01, -3.5806e-01, 1.8045e-01, #1.1793e+00],
# [-6.9471e-05, 9.4336e-01, 8.8339e-01, -1.1010e+00, #1.5904e+00],
# [-4.7971e-02, -2.7016e-01, 1.5292e+00, -6.0295e-01, #2.3883e+00]],
# requires_grad=True)
#target: tensor([[-0.2386, -1.2860, -0.7707, 1.2827, -0.8612],
# [ 0.6747, 0.1610, 0.5223, -0.8986, 0.8069],
# [ 1.0354, 0.0253, 1.0896, -1.0791, -0.0834]])
#output: tensor(1.2103, grad_fn=<MeanBackward0>)
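Note that the target above is drawn from torch.randn and is therefore not restricted to 1 and -1, which the criterion formally expects. A sketch with proper labels, assuming a binary similar/dissimilar setup:

import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)
# Labels must be 1 (similar) or -1 (dissimilar).
target = torch.randint(0, 2, (3, 5)).float() * 2 - 1

hinge_loss = nn.HingeEmbeddingLoss(margin=1.0)
output = hinge_loss(input, target)
print(output)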
Margin Ranking Loss
Margin Ranking loss belongs to the ranking losses, whose main objective, unlike other loss functions, is to measure the relative distance between a set of inputs in a dataset. The Margin Ranking loss function takes two inputs and a label containing only 1 or -1. If the label is 1, it is assumed that the first input should have a higher ranking than the second input, and if the label is -1, it is assumed that the second input should have a higher ranking than the first input. This relationship is captured by loss(x1, x2, y) = max(0, -y * (x1 - x2) + margin), as the code below illustrates.
import torch
import torch.nn as nn
loss = nn.MarginRankingLoss()
input1 = torch.randn(3, requires_grad=True)
input2 = torch.randn(3, requires_grad=True)
target = torch.randn(3).sign()
output = loss(input1, input2, target)
print('input1: ', input1)
print('input2: ', input2)
print('output: ', output)
#input1: tensor([-1.1109, 0.1187, 0.9441], requires_grad=True)
#input2: tensor([ 0.9284, -0.3707, -0.7504], requires_grad=True)
#output: tensor(0.5648, grad_fn=<MeanBackward0>)
Triplet Margin loss
This criterion measures similarity between data points by using triplets of the training data sample. The triplets involved are an anchor sample, a positive sample, and a negative sample. The objective is 1) to make the distance between the positive sample and the anchor as small as possible, and 2) to make the distance between the anchor and the negative sample greater than the distance between the positive sample and the anchor plus a margin value. Usually, the positive sample belongs to the same class as the anchor, but the negative sample does not. Hence, by using this loss function, we drive the model to predict a high similarity value between the anchor and the positive sample and a low similarity value between the anchor and the negative sample.
import torch
import torch.nn as nn
triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)
anchor = torch.randn(100, 128, requires_grad=True)
positive = torch.randn(100, 128, requires_grad=True)
negative = torch.randn(100, 128, requires_grad=True)
output = triplet_loss(anchor, positive, negative)
print(output) #tensor(1.1151, grad_fn=<MeanBackward0>)
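The computed value can be reproduced from the definition; a minimal sketch using pairwise Euclidean distances and the same margin:

import torch
import torch.nn as nn
import torch.nn.functional as F

anchor = torch.randn(100, 128)
positive = torch.randn(100, 128)
negative = torch.randn(100, 128)
margin = 1.0

builtin = nn.TripletMarginLoss(margin=margin, p=2)(anchor, positive, negative)

# Manual computation: clamp(d(a, p) - d(a, n) + margin, min=0), averaged over the batch.
d_ap = F.pairwise_distance(anchor, positive, p=2)
d_an = F.pairwise_distance(anchor, negative, p=2)
manual = torch.clamp(d_ap - d_an + margin, min=0).mean()

print(torch.allclose(builtin, manual, atol=1e-6))  # True, up to numerical precision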
Cosine Embedding loss
Cosine Embedding loss measures the loss given inputs x1, x2, and a label tensor y containing values 1 or -1. It is used for measuring the degree to which two inputs are similar or dissimilar.
The criterion measures similarity by computing the cosine distance between the two data points in space. The cosine distance correlates to the angle between the two points which means that the smaller the angle, the closer the inputs and hence the more similar they are.
import torch
import torch.nn as nn
loss = nn.CosineEmbeddingLoss()
input1 = torch.randn(3, 6, requires_grad=True)
input2 = torch.randn(3, 6, requires_grad=True)
target = torch.randn(3).sign()
output = loss(input1, input2, target)
print('input1: ', input1)
print('input2: ', input2)
print('output: ', output)
#input1: tensor([[ 1.2969e-01, 1.9397e+00, -1.7762e+00, -1.2793e-01, #-4.7004e-01,
# -1.1736e+00],
# [-3.7807e-02, 4.6385e-03, -9.5373e-01, 8.4614e-01, -1.1113e+00,
# 4.0305e-01],
# [-1.7561e-01, 8.8705e-01, -5.9533e-02, 1.3153e-03, -6.0306e-01,
# 7.9162e-01]], requires_grad=True)
#input2: tensor([[-0.6177, -0.0625, -0.7188, 0.0824, 0.3192, 1.0410],
# [-0.5767, 0.0298, -0.0826, 0.5866, 1.1008, 1.6463],
# [-0.9608, -0.6449, 1.4022, 1.2211, 0.8248, -1.9933]],
# requires_grad=True)
#output: tensor(0.0033, grad_fn=<MeanBackward0>)
Kullback-Leibler Divergence loss
Given two distributions, P and Q, Kullback-Leibler Divergence (KLD) loss measures how much information is lost when P (assumed to be the true distribution) is replaced with Q. By measuring how much information is lost when we use Q to approximate P, we are able to obtain the similarity between P and Q and hence drive our algorithm to produce a distribution very close to the true distribution, P. The information loss when Q is used to approximate P is not the same as when P is used to approximate Q, and thus KL Divergence is not symmetric.
import torch
import torch.nn as nn
loss = nn.KLDivLoss(size_average=None, reduce=None, reduction='mean', log_target=False)
input1 = torch.randn(3, 6, requires_grad=True)
input2 = torch.randn(3, 6, requires_grad=True)
output = loss(input1, input2)
print('output: ', output) #tensor(-0.0284, grad_fn=<KlDivBackward>)
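Strictly speaking, nn.KLDivLoss expects the input to contain log-probabilities and the target to contain probabilities (when log_target=False), so feeding raw torch.randn tensors as above produces a number but not a meaningful divergence. A sketch of the intended usage:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Predicted distribution as log-probabilities, target as probabilities.
pred_log_probs = F.log_softmax(torch.randn(3, 6, requires_grad=True), dim=1)
target_probs = F.softmax(torch.randn(3, 6), dim=1)

kl_loss = nn.KLDivLoss(reduction='batchmean')  # 'batchmean' matches the mathematical definition
output = kl_loss(pred_log_probs, target_probs)
print(output)  # a non-negative scalar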
Building your own custom loss function
PyTorch provides us with two popular ways to build our own loss function to suit our problem; these are namely using a class implementation and using a function implementation. Let’s see how we can implement both methods starting with the function implementation.
Custom loss with functions
This is easily the simplest way to write your own custom loss function. It’s just as easy as creating a function, passing into it the required inputs and other parameters, performing some operation using PyTorch’s core API or Functional API, and returning a value. Let’s see a demonstration with Custom Mean Square Error.
import torch
import torch.nn as nn

def custom_mean_square_error(y_predictions, target):
square_difference = torch.square(y_predictions - target)
loss_value = torch.mean(square_difference)
return loss_value
In the code above, we define a custom loss function to calculate the mean square error given a prediction tensor and a target tensor.
y_predictions = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
pytorch_loss = nn.MSELoss()
p_loss = pytorch_loss(y_predictions, target)
loss = custom_mean_square_error(y_predictions, target)
print('custom loss: ', loss)
print('pytorch loss: ', p_loss)
#custom loss: tensor(2.3134, grad_fn=<MeanBackward0>)
#pytorch loss: tensor(2.3134, grad_fn=<MseLossBackward>)
We can compute the loss using our custom loss function and PyTorch’s MSE loss function to observe that we have obtained the same results.
Custom loss with Python classes
This approach is probably the standard and recommended method of defining custom losses in PyTorch. The loss function is created as a node in the neural network graph by subclassing nn.Module. This means that our custom loss function is a PyTorch layer in exactly the same way a convolutional layer is. Let's see a demonstration of how this works with a custom MSE loss.
class Custom_MSE(nn.Module):
    def __init__(self):
        super(Custom_MSE, self).__init__()

    def forward(self, predictions, target):
        square_difference = torch.square(predictions - target)
        loss_value = torch.mean(square_difference)
        return loss_value

    # def __call__(self, predictions, target):
    #     square_difference = torch.square(predictions - target)
    #     loss_value = torch.mean(square_difference)
    #     return loss_value
We can define the actual implementation of the loss inside the ‘forward’ function call or inside ‘__call__’. See the IPython notebook on Gradient to see the custom MSE function used in practice.
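For completeness, a quick usage sketch of the class defined above (it builds on the imports and Custom_MSE class from the previous snippet):

criterion = Custom_MSE()

y_predictions = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)

loss = criterion(y_predictions, target)  # calls forward() under the hood
print(loss)                              # matches nn.MSELoss()(y_predictions, target)
loss.backward()                          # gradients flow through the custom loss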
Final Thoughts
We have discussed a lot about loss functions available in PyTorch and also taken a deep dive into the inner workings of most of these loss functions. Choosing the right loss function for a particular problem can be an overwhelming task. Hopefully, this tutorial alongside the official PyTorch documentation serves as a guideline when trying to understand which loss function suits your problem well.
Your neural networks can do a lot of different tasks, whether it's classifying data (like grouping pictures of animals into cats and dogs), regression tasks (like predicting monthly revenues), or anything else. Every task has a different output and needs a different type of loss function.
The way you configure your loss functions can make or break the performance of your algorithm. By correctly configuring the loss function, you can make sure your model will work how you want it to.
Luckily for us, there are loss functions we can use to make the most of machine learning tasks.
In this article, we’ll talk about popular loss functions in PyTorch, and about building custom loss functions. Once you’re done reading, you should know which one to choose for your project.
Check how you can monitor your PyTorch model training and keep track of all model-building metadata with Neptune + PyTorch integration.
What are loss functions?
Before we jump into PyTorch specifics, let’s refresh our memory of what loss functions are.
Loss functions are used to gauge the error between the prediction output and the provided target value. A loss function tells us how far the algorithm model is from realizing the expected outcome. The word ‘loss’ means the penalty that the model gets for failing to yield the desired results.
For example, a loss function (let’s call it J) can take the following two parameters:
- Predicted output (y_pred)
- Target value (y)
This function will determine your model’s performance by comparing its predicted output with the expected output. If the deviation between y_pred and y is very large, the loss value will be very high.
If the deviation is small or the values are nearly identical, it’ll output a very low loss value. Therefore, you need to use a loss function that can penalize a model properly when it is training on the provided dataset.
Loss functions change based on the problem statement that your algorithm is trying to solve.
How to add PyTorch loss functions?
PyTorch’s torch.nn module has multiple standard loss functions that you can use in your project.
To add them, you need to first import the libraries:
import torch
import torch.nn as nn
Next, define the type of loss you want to use. Here’s how to define the mean absolute error loss function:
loss = nn.L1Loss()
After adding a function, you can use it to accomplish your specific task.
Which loss functions are available in PyTorch?
Broadly speaking, loss functions in PyTorch are divided into three main categories: regression losses, classification losses, and ranking losses.
Regression loss functions are used when the model is predicting a continuous value, like the age of a person.
Classification loss functions are used when the model is predicting a discrete value, such as whether an email is spam or not.
Ranking loss functions are used when the model is predicting the relative distances between inputs, such as ranking products according to their relevance on an e-commerce search page.
Now we’ll explore the different types of loss functions in PyTorch, and how to use them:
- Mean Absolute Error Loss
- Mean Squared Error Loss
- Negative Log-Likelihood Loss
- Cross-Entropy Loss
- Hinge Embedding Loss
- Margin Ranking Loss
- Triplet Margin Loss
- Kullback-Leibler divergence
1. PyTorch Mean Absolute Error (L1 Loss Function)
torch.nn.L1Loss
The Mean Absolute Error (MAE), also called L1 Loss, computes the average of the sum of absolute differences between actual values and predicted values.
It checks the size of errors in a set of predicted values, without caring about their positive or negative direction. If the absolute values of the errors are not used, then negative values could cancel out the positive values.
The Pytorch L1 Loss is expressed as:

$$\ell(x, y) = \frac{1}{N} \sum_{n=1}^{N} |x_n - y_n|$$

where x represents the actual value and y the predicted value.
When could it be used?
- Regression problems, especially when the distribution of the target variable has outliers, such as small or big values that are a great distance from the mean value. It is considered to be more robust to outliers.
Example
import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)

mae_loss = nn.L1Loss()
output = mae_loss(input, target)
output.backward()

print('input: ', input)
print('target: ', target)
print('output: ', output)
###################### OUTPUT ######################
input: tensor([[ 0.2423, 2.0117, -0.0648, -0.0672, -0.1567], [-0.2198, -1.4090, 1.3972, -0.7907, -1.0242], [ 0.6674, -0.2657, -0.9298, 1.0873, 1.6587]], requires_grad=True)
target: tensor([[-0.7271, -0.6048, 1.7069, -1.5939, 0.1023], [-0.7733, -0.7241, 0.3062, 0.9830, 0.4515], [-0.4787, 1.3675, -0.7110, 2.0257, -0.9578]])
output: tensor(1.2850, grad_fn=<L1LossBackward>)
2. PyTorch Mean Squared Error Loss Function
torch.nn.MSELoss
The Mean Squared Error (MSE), also called L2 Loss, computes the average of the squared differences between actual values and predicted values.
Pytorch MSE Loss always outputs a positive result, regardless of the sign of actual and predicted values. To enhance the accuracy of the model, you should try to reduce the L2 Loss—a perfect value is 0.0.
The squaring implies that larger mistakes produce even larger errors than smaller ones. If the classifier is off by 100, the error is 10,000. If it’s off by 0.1, the error is 0.01. This punishes the model for making big mistakes and encourages small mistakes.
The Pytorch L2 Loss is expressed as:

$$\ell(x, y) = \frac{1}{N} \sum_{n=1}^{N} (x_n - y_n)^2$$

where x represents the actual value and y the predicted value.
When could it be used?
- MSE is the default loss function for most Pytorch regression problems.
Example
import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)

mse_loss = nn.MSELoss()
output = mse_loss(input, target)
output.backward()

print('input: ', input)
print('target: ', target)
print('output: ', output)
###################### OUTPUT ######################
input: tensor([[ 0.3177, 1.1312, -0.8966, -0.0772, 2.2488], [ 0.2391, 0.1840, -1.2232, 0.2017, 0.9083], [-0.0057, -3.0228, 0.0529, 0.4084, -0.0084]], requires_grad=True)
target: tensor([[ 0.2767, 0.0823, 1.0074, 0.6112, -0.1848], [ 2.6384, -1.4199, 1.2608, 1.8084, 0.6511], [ 0.2333, -0.9921, 1.5340, 0.3703, -0.5324]])
output: tensor(2.3280, grad_fn=<MseLossBackward>)
3. PyTorch Negative Log-Likelihood Loss Function
torch.nn.NLLLoss
The Negative Log-Likelihood Loss function (NLL) is applied to models whose output layer produces class probabilities, typically via a softmax activation (in PyTorch, the criterion itself expects log-probabilities, i.e. a log-softmax output). Softmax refers to an activation function that calculates the normalized exponential function of every unit in the layer.
The Softmax function is expressed as:

$$\operatorname{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}$$

The function takes an input vector of size N, and then modifies the values such that every one of them falls between 0 and 1. Furthermore, it normalizes the output such that the sum of the N values of the vector equals 1.
NLL uses a negative connotation since the probabilities (or likelihoods) vary between zero and one, and the logarithms of values in this range are negative. In the end, the loss value becomes positive.
In NLL, minimizing the loss function helps us get a better output. The negative log-likelihood comes from maximum likelihood estimation (MLE): we try to maximize the model's log-likelihood and, as a result, minimize the NLL.
In NLL, the model is punished for making the correct prediction with smaller probabilities and encouraged for making the prediction with higher probabilities. The logarithm does the punishment.
NLL does not only care about the prediction being correct but also about the model being certain about the prediction with a high score.
The Pytorch NLL Loss is expressed as:

$$\ell(x, y) = \frac{\sum_{n=1}^{N} -w_{y_n}\, x_{n, y_n}}{\sum_{n=1}^{N} w_{y_n}}$$

where x is the input (log-probabilities), y is the target, w is the weight, and N is the batch size.
When could it be used?
- Multi-class classification problems
Example
import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)
target = torch.tensor([1, 0, 4])

m = nn.LogSoftmax(dim=1)
nll_loss = nn.NLLLoss()
output = nll_loss(m(input), target)
output.backward()

print('input: ', input)
print('target: ', target)
print('output: ', output)
input: tensor([[ 1.6430, -1.1819, 0.8667, -0.5352, 0.2585], [ 0.8617, -0.1880, -0.3865, 0.7368, -0.5482], [-0.9189, -0.1265, 1.1291, 0.0155, -2.6702]], requires_grad=True)
target: tensor([1, 0, 4])
output: tensor(2.9472, grad_fn=<NllLossBackward>)
4. PyTorch Cross-Entropy Loss Function
torch.nn.CrossEntropyLoss
This loss function computes the difference between two probability distributions for a provided set of occurrences or random variables.
It is used to work out a score that summarizes the average difference between the predicted values and the actual values. To enhance the accuracy of the model, you should try to minimize the score—the cross-entropy score is between 0 and 1, and a perfect value is 0.
Other loss functions, like the squared loss, punish incorrect predictions. Cross-Entropy penalizes greatly for being very confident and wrong.
Cross-Entropy punishes incorrect but confident predictions heavily, as well as correct but under-confident predictions.
The Cross-Entropy function has a wide range of variants, of which the most common type is the Binary Cross-Entropy (BCE). The BCE Loss is mainly used for binary classification models; that is, models having only 2 classes.
The Pytorch Cross-Entropy Loss is expressed as:

$$\text{loss}(x, \text{class}) = w_{\text{class}}\left(-x_{\text{class}} + \log \sum_{j=1}^{C} \exp(x_j)\right)$$

averaged over the mini-batch, where x is the input, the target class is y, w is the weight, C is the number of classes, and N spans the mini-batch dimension.
When could it be used?
- Binary and multi-class classification tasks; it's the default classification loss function in Pytorch.
- Creating confident models—the prediction will be accurate and with a higher probability.
Example
import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)

cross_entropy_loss = nn.CrossEntropyLoss()
output = cross_entropy_loss(input, target)
output.backward()

print('input: ', input)
print('target: ', target)
print('output: ', output)
input: tensor([[ 0.1639, -1.2095, 0.0496, 1.1746, 0.9474], [ 1.0429, 1.3255, -1.2967, 0.2183, 0.3562], [-0.1680, 0.2891, 1.9272, 2.2542, 0.1844]], requires_grad=True)
target: tensor([4, 0, 3])
output: tensor(1.0393, grad_fn=<NllLossBackward>)
5. PyTorch Hinge Embedding Loss Function
torch.nn.HingeEmbeddingLoss
The Hinge Embedding Loss is used for computing the loss when there is an input tensor, x, and a labels tensor, y. Target values are between {1, -1}, which makes it good for binary classification tasks.
With the Hinge Loss function, you can give more error whenever a difference exists in the sign between the actual class values and the predicted class values. This motivates examples to have the right sign.
The Hinge Embedding Loss is expressed as:

$$l_n = \begin{cases} x_n, & \text{if } y_n = 1 \\ \max\{0, \Delta - x_n\}, & \text{if } y_n = -1 \end{cases}$$

where Δ is the margin.
When could it be used?
- Classification problems, especially when determining if two inputs are dissimilar or similar.
- Learning nonlinear embeddings or semi-supervised learning tasks.
Example
import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)

hinge_loss = nn.HingeEmbeddingLoss()
output = hinge_loss(input, target)
output.backward()

print('input: ', input)
print('target: ', target)
print('output: ', output)
###################### OUTPUT ######################
input: tensor([[ 0.1054, -0.4323, -0.0156, 0.8425, 0.1335], [ 1.0882, -0.9221, 1.9434, 1.8930, -1.9206], [ 1.5480, -1.9243, -0.8666, 0.1467, 1.8022]], requires_grad=True)
target: tensor([[-1.0748, 0.1622, -0.4852, -0.7273, 0.4342], [-1.0646, -0.7334, 1.9260, -0.6870, -1.5155], [-0.3828, -0.4476, -0.3003, 0.6489, -2.7488]])
output: tensor(1.2183, grad_fn=<MeanBackward0>)
6. PyTorch Margin Ranking Loss Function
torch.nn.MarginRankingLoss
The Margin Ranking Loss computes a criterion to predict the relative distances between inputs. This is different from other loss functions, like MSE or Cross-Entropy, which learn to predict directly from a given set of inputs.
With the Margin Ranking Loss, you can calculate the loss provided there are inputs x1, x2, as well as a label tensor, y (containing 1 or -1).
When y == 1, the first input will be assumed as a larger value. It’ll be ranked higher than the second input. If y == -1, the second input will be ranked higher.
The Pytorch Margin Ranking Loss is expressed as:

$$\text{loss}(x_1, x_2, y) = \max(0, -y \cdot (x_1 - x_2) + \text{margin})$$
When could it be used?
- Ranking problems
Example
import torch
import torch.nn as nn

input_one = torch.randn(3, requires_grad=True)
input_two = torch.randn(3, requires_grad=True)
target = torch.randn(3).sign()

ranking_loss = nn.MarginRankingLoss()
output = ranking_loss(input_one, input_two, target)
output.backward()

print('input one: ', input_one)
print('input two: ', input_two)
print('target: ', target)
print('output: ', output)
input one: tensor([1.7669, 0.5297, 1.6898], requires_grad=True)
input two: tensor([ 0.1008, -0.2517, 0.1402], requires_grad=True)
target: tensor([-1., -1., -1.])
output: tensor(1.3324, grad_fn=<MeanBackward0>)
7. PyTorch Triplet Margin Loss Function
torch.nn.TripletMarginLoss
The Triplet Margin Loss computes a criterion for measuring the triplet loss in models. With this loss function, you can calculate the loss provided there are input tensors, x1, x2, x3, as well as margin with a value greater than zero.
A triplet consists of a (the anchor), p (a positive example), and n (a negative example).

The Pytorch Triplet Margin Loss is expressed as:

$$L(a, p, n) = \max\{d(a_i, p_i) - d(a_i, n_i) + \text{margin},\ 0\}$$

where $d(x_i, y_i) = \lVert x_i - y_i \rVert_p$ is the pairwise distance.
When could it be used?
- Determining the relative similarity existing between samples.
- It is used in content-based retrieval problems
Example
import torch
import torch.nn as nn

anchor = torch.randn(100, 128, requires_grad=True)
positive = torch.randn(100, 128, requires_grad=True)
negative = torch.randn(100, 128, requires_grad=True)

triplet_margin_loss = nn.TripletMarginLoss(margin=1.0, p=2)
output = triplet_margin_loss(anchor, positive, negative)
output.backward()

print('anchor: ', anchor)
print('positive: ', positive)
print('negative: ', negative)
print('output: ', output)
anchor: tensor([[ 0.6152, -0.2224, 2.2029, ..., -0.6894, 0.1641, 1.7254], [ 1.3034, -1.0999, 0.1705, ..., 0.4506, -0.2095, -0.8019], [-0.1638, -0.2643, 1.5279, ..., -0.3873, 0.9648, -0.2975], ..., [-1.5240, 0.4353, 0.3575, ..., 0.3086, -0.8936, 1.7542], [-1.8443, -2.0940, -0.1264, ..., -0.6701, -1.7227, 0.6539], [-3.3725, -0.4695, -0.2689, ..., 2.6315, -1.3222, -0.9542]], requires_grad=True)
positive: tensor([[-0.4267, -0.1484, -0.9081, ..., 0.3615, 0.6648, 0.3271], [-0.0404, 1.2644, -1.0385, ..., -0.1272, 0.8937, 1.9377], [-1.2159, -0.7165, -0.0301, ..., -0.3568, -0.9472, 0.0750], ..., [ 0.2893, 1.7894, -0.0040, ..., 2.0052, -3.3667, 0.5894], [-1.5308, 0.5288, 0.5351, ..., 0.8661, -0.9393, -0.5939], [ 0.0709, -0.4492, -0.9036, ..., 0.2101, -0.8306, -0.6935]], requires_grad=True)
negative: tensor([[-1.8089, -1.3162, -1.7045, ..., 1.7220, 1.6008, 0.5585], [-0.4567, 0.3363, -1.2184, ..., -2.3124, 0.7193, 0.2762], [-0.8471, 0.7779, 0.1627, ..., -0.8704, 1.4201, 1.2366], ..., [-1.9165, 1.7768, -1.9975, ..., -0.2091, -0.7073, 2.4570], [-1.7506, 0.4662, 0.9482, ..., 0.0916, -0.2020, -0.5102], [-0.7463, -1.9737, 1.3279, ..., 0.1629, -0.3693, -0.6008]], requires_grad=True)
output: tensor(1.0755, grad_fn=<MeanBackward0>)
8. PyTorch Kullback-Leibler Divergence Loss Function
torch.nn.KLDivLoss
The Kullback-Leibler Divergence, shortened to KL Divergence, computes the difference between two probability distributions.
With this loss function, you can compute the amount of lost information (expressed in bits) in case the predicted probability distribution is utilized to estimate the expected target probability distribution.
Its output tells you the proximity of two probability distributions. If the predicted probability distribution is very far from the true probability distribution, it’ll lead to a big loss. If the value of KL Divergence is zero, it implies that the probability distributions are the same.
KL Divergence behaves just like Cross-Entropy Loss, with a key difference in how they handle predicted and actual probability. Cross-Entropy punishes the model according to the confidence of predictions, and KL Divergence doesn’t. KL Divergence only assesses how the probability distribution prediction is different from the distribution of ground truth.
The KL Divergence Loss is expressed as:

$$D_{KL}(x \parallel y) = \sum_{i} x_i \log\frac{x_i}{y_i}$$

where x represents the true label's probability and y represents the predicted label's probability.
When could it be used?
- Approximating complex functions
- Multi-class classification tasks
- If you want to make sure that the distribution of predictions is similar to that of training data
Example
import torch
import torch.nn as nn

input = torch.randn(2, 3, requires_grad=True)
target = torch.randn(2, 3)

kl_loss = nn.KLDivLoss(reduction='batchmean')
output = kl_loss(input, target)
output.backward()

print('input: ', input)
print('target: ', target)
print('output: ', output)
###################### OUTPUT ######################
input: tensor([[ 1.4676, -1.5014, -1.5201], [ 1.8420, -0.8228, -0.3931]], requires_grad=True)
target: tensor([[ 0.0300, -1.7714, 0.8712], [-1.7118, 0.9312, -1.9843]])
output: tensor(0.8774, grad_fn=<DivBackward0>)
How to create a custom loss function in PyTorch?
PyTorch lets you create your own custom loss functions to implement in your projects.
Here’s how you can create your own simple Cross-Entropy Loss function.
Creating custom loss function as a python function
import torch
import torch.nn.functional as F

def myCustomLoss(my_outputs, my_labels):
    # number of examples in the batch
    my_batch_size = my_outputs.size()[0]
    # convert raw scores to log-probabilities
    my_outputs = F.log_softmax(my_outputs, dim=1)
    # pick the log-probability of the correct class for each example
    my_outputs = my_outputs[range(my_batch_size), my_labels]
    # average the negative log-likelihood over the batch
    return -torch.sum(my_outputs) / my_batch_size
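As a quick check (not part of the original snippet), the function above should agree with the built-in criterion on the same inputs:

import torch
import torch.nn as nn

outputs = torch.randn(8, 5, requires_grad=True)  # raw scores for 8 examples, 5 classes
labels = torch.randint(0, 5, (8,))

print(myCustomLoss(outputs, labels))             # custom negative log-likelihood
print(nn.CrossEntropyLoss()(outputs, labels))    # same value, up to floating-point precision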
You can also create other advanced PyTorch custom loss functions.
Creating custom loss function with a class definition
Let’s modify the Dice coefficient, which computes the similarity between two samples, to act as a loss function for binary classification problems:
class DiceLoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        super(DiceLoss, self).__init__()

    def forward(self, inputs, targets, smooth=1):
        inputs = F.sigmoid(inputs)
        inputs = inputs.view(-1)
        targets = targets.view(-1)
        intersection = (inputs * targets).sum()
        dice = (2.*intersection + smooth)/(inputs.sum() + targets.sum() + smooth)
        return 1 - dice
How to monitor PyTorch loss functions?
It is quite obvious that while training a model, one needs to keep an eye on the loss function values to track the model’s performance. As the loss value keeps decreasing, the model keeps getting better. There are a number of ways that we can do this. Let’s take a look at them.
For this, we will be training a simple Neural Network created in PyTorch which will perform classification on the famous Iris dataset.
Making the required imports for getting the dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
Loading the dataset.
iris = load_iris()
X = iris['data']
y = iris['target']
names = iris['target_names']
feature_names = iris['feature_names']
Scaling the dataset to have mean=0 and variance=1, gives quick model convergence.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
Splitting the dataset into train and test in an 80-20 ratio.
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=2)
Making the necessary imports for our Neural Network and its training.
import torch
import torch.nn.functional as F
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np

plt.style.use('ggplot')
Defining our network.
class PyTorch_NN(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(PyTorch_NN, self).__init__()
        self.input_layer = nn.Linear(input_dim, 128)
        self.hidden_layer = nn.Linear(128, 64)
        self.output_layer = nn.Linear(64, output_dim)

    def forward(self, x):
        x = F.relu(self.input_layer(x))
        x = F.relu(self.hidden_layer(x))
        x = F.softmax(self.output_layer(x), dim=1)
        return x
Defining functions for getting accuracy and training the network.
def get_accuracy(pred_arr, original_arr):
    pred_arr = pred_arr.detach().numpy()
    original_arr = original_arr.numpy()
    final_pred = []
    for i in range(len(pred_arr)):
        final_pred.append(np.argmax(pred_arr[i]))
    final_pred = np.array(final_pred)
    count = 0
    for i in range(len(original_arr)):
        if final_pred[i] == original_arr[i]:
            count += 1
    return count / len(final_pred) * 100

def train_network(model, optimizer, criterion, X_train, y_train, X_test, y_test, num_epochs):
    train_loss = []
    train_accuracy = []
    test_accuracy = []
    for epoch in range(num_epochs):
        output_train = model(X_train)
        train_accuracy.append(get_accuracy(output_train, y_train))
        loss = criterion(output_train, y_train)
        train_loss.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            output_test = model(X_test)
            test_accuracy.append(get_accuracy(output_test, y_test))
        if (epoch + 1) % 5 == 0:
            print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {loss.item():.4f}, Train Accuracy: {sum(train_accuracy)/len(train_accuracy):.2f}, Test Accuracy: {sum(test_accuracy)/len(test_accuracy):.2f}")
    return train_loss, train_accuracy, test_accuracy
Creating model, optimizer, and loss function object.
input_dim = 4
output_dim = 3
learning_rate = 0.01

model = PyTorch_NN(input_dim, output_dim)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
1. Monitoring PyTorch loss in the notebook
Now you must have noticed the print statements in the train_network function to monitor the loss as well as accuracy. This is one way to do it.
X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)
y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)
train_loss, train_accuracy, test_accuracy = train_network(model=model, optimizer=optimizer, criterion=criterion, X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test, num_epochs=100)
We get an output like this.
If we want we can also plot these values using Matplotlib.
fig, (ax1, ax2, ax3) = plt.subplots(3, figsize=(12, 6), sharex=True)
ax1.plot(train_accuracy)
ax1.set_ylabel("training accuracy")
ax2.plot(train_loss)
ax2.set_ylabel("training loss")
ax3.plot(test_accuracy)
ax3.set_ylabel("test accuracy")
ax3.set_xlabel("epochs")
We would see a graph like this indicating the correlation between loss and accuracy.
This method is not bad and does the job. But we must remember that the more complex our problem statement and model get, the more sophisticated a monitoring technique they will require.
2. Monitoring PyTorch loss with neptune.ai
A simpler way to monitor your metrics would be to log them in a service like Neptune, and focus on more important tasks, such as building and training the model.
To do this, we just need to follow a couple of small steps.
Note: For the most up-to-date code examples, please refer to the Neptune-PyTorch integration docs.
First, let’s install the required stuff.
pip install neptune-client
Now let’s initialize a Neptune run.
import neptune.new as neptune
run = neptune.init_run()
We can also assign config variables such as:
run["config/model"] = type(model).__name__
run["config/criterion"] = type(criterion).__name__
run["config/optimizer"] = type(optimizer).__name__
Here’s how it looks in the UI.
Finally, we can log our loss by adding just a couple of lines to our train_network function. Notice the ‘run’ associated lines.
def train_network(model, optimizer, criterion, X_train, y_train, X_test, y_test, num_epochs):
    train_loss = []
    train_accuracy = []
    test_accuracy = []
    for epoch in range(num_epochs):
        output_train = model(X_train)
        acc = get_accuracy(output_train, y_train)
        train_accuracy.append(acc)
        run["training/epoch/accuracy"].log(acc)
        loss = criterion(output_train, y_train)
        run["training/epoch/loss"].log(loss)
        train_loss.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            output_test = model(X_test)
            test_acc = get_accuracy(output_test, y_test)
            test_accuracy.append(test_acc)
            run["test/epoch/accuracy"].log(test_acc)
        if (epoch + 1) % 5 == 0:
            print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {loss.item():.4f}, Train Accuracy: {sum(train_accuracy)/len(train_accuracy):.2f}, Test Accuracy: {sum(test_accuracy)/len(test_accuracy):.2f}")
    return train_loss, train_accuracy, test_accuracy
Here’s what we get in the dashboard. Absolutely seamless.
You can view this run here in the Neptune app. Needless to say, you can do this with any loss function.
Note: For the most up-to-date code examples, please refer to the Neptune-PyTorch integration docs.
Final thoughts
We went through the most common loss functions in PyTorch. You can choose any function that will fit your project, or create your own custom function.
Hopefully, this article will serve as your quick start guide to using PyTorch loss functions in your machine learning tasks.
If you want to immerse yourself more deeply into the subject or learn about other loss functions, you can visit the PyTorch official documentation.
Mean squared error is computed as the mean of the squared differences between the input and target (predicted and actual) values. To compute the mean squared error in PyTorch, we apply the MSELoss() function provided by the torch.nn module. It creates a criterion that measures the mean squared error. It is also known as the squared L2 norm.
Both the actual and predicted values are torch tensors having the same number of elements. Both tensors may have any number of dimensions. This function returns a tensor of a scalar value. It is a type of loss function provided by the torch.nn module. The loss functions are used to optimize a deep neural network by minimizing the loss.
Syntax
torch.nn.MSELoss()
Steps
To measure the mean squared error, one could follow the steps given below
-
Import the required libraries. In all the following examples, the required Python libraries are torch and torch.nn. Make sure you have already installed them.
import torch
import torch.nn as nn
-
Create the input and target tensors and print them.
input = torch.tensor([0.10, 0.20, 0.40, 0.50])
target = torch.tensor([0.09, 0.2, 0.38, 0.52])
-
Create a criterion to measure the mean squared error
mse = nn.MSELoss()
-
Compute the mean squared error (loss) and print it.
output = mse(input, target)
print("MSE loss:", output)
Example 1
In this program, we measure the mean squared error between the input and target
tensors. Both the input and target tensors are 1D torch tensors.
# Import the required libraries
import torch
import torch.nn as nn

# define the input and target tensors
input = torch.tensor([0.10, 0.20, 0.40, 0.50])
target = torch.tensor([0.09, 0.2, 0.38, 0.52])

# print input and target tensors
print("Input Tensor:\n", input)
print("Target Tensor:\n", target)

# create a criterion to measure the mean squared error
mse = nn.MSELoss()

# compute the loss (mean squared error)
output = mse(input, target)
# output.backward()
print("MSE loss:", output)
Output
Input Tensor: tensor([0.1000, 0.2000, 0.4000, 0.5000])
Target Tensor: tensor([0.0900, 0.2000, 0.3800, 0.5200])
MSE loss: tensor(0.0002)
Notice that the mean squared error is a scalar value.
Example 2
In this program, we measure the mean squared error between the input and target
tensors. Both the input and target tensors are 2D torch tensors.
# Import the required libraries
import torch
import torch.nn as nn

# define the input and target tensors
input = torch.randn(3, 4)
target = torch.randn(3, 4)

# print input and target tensors
print("Input Tensor:\n", input)
print("Target Tensor:\n", target)

# create a criterion to measure the mean squared error
mse = nn.MSELoss()

# compute the loss (mean squared error)
output = mse(input, target)
# output.backward()
print("MSE loss:", output)
Output
Input Tensor: tensor([[-1.6413, 0.8950, -1.0392, 0.2382], [-0.3868, 0.2483, 0.9811, -0.9260], [-0.0263, -0.0911, -0.6234, 0.6360]])
Target Tensor: tensor([[-1.6068, 0.7233, -0.0925, -0.3140], [-0.4978, 1.3121, -1.4910, -1.4643], [-2.2589, 0.3073, 0.2038, -1.5656]])
MSE loss: tensor(1.6209)
Example 3
In this program, we measure the mean squared error between the input and target tensors. Both the input and target tensors are 2D torch tensors. The input tensor takes the parameter requires_grad=True, so we also compute the gradients of this function w.r.t. the input tensor.
# Import the required libraries
import torch
import torch.nn as nn

# define the input and target tensors
input = torch.randn(4, 5, requires_grad = True)
target = torch.randn(4, 5)

# print input and target tensors
print("Input Tensor:\n", input)
print("Target Tensor:\n", target)

# create a criterion to measure the mean squared error
loss = nn.MSELoss()

# compute the loss (mean squared error)
output = loss(input, target)
output.backward()

print("MSE loss:", output)
print("input.grad:\n", input.grad)
Output
Input Tensor: tensor([[ 0.1813, 0.4199, 1.1768, -0.7068, 0.2960], [ 0.7950, 0.0945, -0.0954, -1.0170, -0.1471], [ 1.2264, 1.7573, 0.9099, 1.3720, -0.9087], [-1.0122, -0.8649, -0.7797, -0.7787, 0.9944]], requires_grad=True)
Target Tensor: tensor([[-0.6370, -0.8421, 1.2474, 0.4363, -0.1481], [-0.1500, -1.3141, 0.7349, 0.1184, -2.7065], [-1.0776, 1.3530, 0.6939, -1.3191, 0.7406], [ 0.2058, 0.4765, 0.0695, 1.2146, 1.1519]])
MSE loss: tensor(1.9330, grad_fn=<MseLossBackward>)
input.grad: tensor([[ 0.0818, 0.1262, -0.0071, -0.1143, 0.0444], [ 0.0945, 0.1409, -0.0830, -0.1135, 0.2559], [ 0.2304, 0.0404, 0.0216, 0.2691, -0.1649], [-0.1218, -0.1341, -0.0849, -0.1993, -0.0158]])
Mean Squared Error (MSE)
Module Interface
The MeanSquaredError metric (from TorchMetrics) computes

$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$$

where $y$ is a tensor of target values and $\hat{y}$ is a tensor of predictions.
Arguments:
- squared (bool) – if True returns the MSE value, if False returns the RMSE value.
- kwargs (Any) – additional keyword arguments; see Advanced metric settings for more info.
update(preds, target) updates the metric state with new predictions and targets; compute() returns the mean squared error over the accumulated state.
- preds (Tensor) – predictions from model
- target (Tensor) – ground truth values
Functional Interface
The functional mean_squared_error computes the mean squared error directly.
- preds (Tensor) – estimated labels
- target (Tensor) – ground truth labels
- squared (bool) – returns the RMSE value if set to False
Returns: a tensor with the MSE.
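A short usage sketch of the API above, assuming the torchmetrics package is installed:

import torch
from torchmetrics import MeanSquaredError
from torchmetrics.functional import mean_squared_error

preds = torch.tensor([2.5, 0.0, 2.0, 8.0])
target = torch.tensor([3.0, -0.5, 2.0, 7.0])

# Module interface: accumulate state with update(), read it with compute()
metric = MeanSquaredError(squared=True)
metric.update(preds, target)
print(metric.compute())  # tensor(0.3750)

# Functional interface: one-shot computation
print(mean_squared_error(preds, target))                  # tensor(0.3750)
print(mean_squared_error(preds, target, squared=False))   # RMSE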
PyTorch MSELoss – Detailed Guide
In this Python tutorial, we will learn about PyTorch MSELoss in Python and we will also cover different examples related to it. And, we will cover these topics.
- PyTorch MSELoss
- PyTorch MSELoss code
- PyTorch MSELoss implementation
- PyTorch MSELoss weighted
- PyTorch MSELoss nan
- PyTorch MSELoss batch size
- PyTorch MSELoss with regularization
PyTorch MSELoss
In this section, we will learn about how PyTorch MSELoss works in python.
Before moving forward, we should know a little about MSELoss.
- MSE stands for mean squared error, which is the most commonly used loss function for regression.
- The loss is the mean of the squared differences between the true and predicted values over the data.
- PyTorch MSELoss measures the average of the squared difference between the actual value and the predicted value.
Code:
In the following code, we will import the torch library, with which we can measure the mean squared error between each element; a reconstructed sketch of this code follows the list.
- nn.MSELoss() is used to measure the mean square error.
- torch.randn(5, 7, requires_grad=True) is used to generate the random numbers.
- loss(inputs, targets) is used to calculate the loss.
- print(output) is used to print the output on the screen.
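A minimal sketch matching the description in the list above (anything not named in the bullets, such as the exact print format, is an assumption) could be:

```python
import torch
from torch import nn

# Generate random input and target tensors
inputs = torch.randn(5, 7, requires_grad=True)
targets = torch.randn(5, 7)

# nn.MSELoss() measures the mean square error
loss = nn.MSELoss()
output = loss(inputs, targets)

print(output)
```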
Output:
After running the above code, we get the following output in which we can see that the mean square error loss is printed on the screen.
PyTorch MSELoss code
In this section, we will learn about the PyTorch MSELoss code in python.
PyTorch MSELoss code is defined as code that measures the average of the squared difference between the actual value and the predicted value.
Code:
In the following code, we will import the torch module from which we can calculate the MSELoss; a sketch of the described code is shown after the list.
- torch.randn(2, 4, requires_grad=True) is used to generate the random numbers.
- nn.MSELoss() is used to calculate the mean square error.
- mseloss(inputs, targets) is used to get the output value.
- print(‘input: ‘, inputs) is used to print the input value.
- print(‘target: ‘, targets) is used to print the target value.
- print(‘output: ‘, outputs) is used to print the output value.
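A sketch reconstructed from the bullets above (the 2x4 tensor shape follows the description; everything else is assumed) might look like this:

```python
import torch
from torch import nn

# Generate random input and target tensors
inputs = torch.randn(2, 4, requires_grad=True)
targets = torch.randn(2, 4)

# Calculate the mean square error
mseloss = nn.MSELoss()
outputs = mseloss(inputs, targets)

print('input: ', inputs)
print('target: ', targets)
print('output: ', outputs)
```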
Output:
In the following output, we can see that the mean square error loss value is printed on the screen.
PyTorch MSELoss implementation
In this section, we will learn about how to implement the MSELoss in python.
Implementing PyTorch MSELoss means calculating the average of the squared difference between the actual value and the predicted value.
Code:
In the following code, we will import the torch module with which we implement the MSELoss; a sketch of this code follows the list.
- target = torch.randn(3, 5) is used to generate the random numbers for the target variable.
- nn.MSELoss() is used to calculate the mean square loss error.
- mseloss(input, target) is used to calculate the output value.
- print(‘input_value : ‘, input) is used to print the input variables.
- print(‘target_value : ‘, target) is used to print the target variables.
- print(‘output_value : ‘, outputval) is used to print the output value.
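Based on the bullets above, a sketch of the implementation (the input shape mirroring the target shape is an assumption) could be:

```python
import torch
from torch import nn

# Random input and target tensors
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)

# Calculate the mean squared error loss
mseloss = nn.MSELoss()
outputval = mseloss(input, target)

print('input_value : ', input)
print('target_value : ', target)
print('output_value : ', outputval)
```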
Output:
After running the above code, we get the following output, where the average squared difference between the input and target values is printed on the screen.
PyTorch MSELoss Weighted
In this section, we will learn about PyTorch MSELoss weighted in Python.
- PyTorch MSELoss weighted is defined as the process of calculating the mean of the squared difference between the input variable and the target variable.
- The MSELoss is most commonly used for regression; in linear regression, every target variable is modeled as a weighted sum of the input variables.
Code:
In the following code, we will import some libraries with which we can calculate the mean of the squared difference between the input variable and target variable; see the sketch after this list.
- torch.from_numpy(input_vals) is used to create the input tensor.
- torch.from_numpy(target_vals) is used to create the target tensor.
- print(input_vals) is used to print the input variables.
- print(target_vals) is used to print the target variables.
- weights = torch.randn(4, 5, requires_grad=True) is used to generate the random weights.
- biases = torch.randn(4, requires_grad=True) is used to generate the random biases.
- predictions = model(input_vals) is used to make predictions.
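A sketch of the described weighted (linear-model) setup is shown below; the NumPy input data and its shapes are assumptions, while the weight and bias shapes follow the bullets:

```python
import numpy as np
import torch
from torch import nn

# Example data as NumPy arrays (placeholder values)
input_vals = np.random.rand(10, 5).astype('float32')
target_vals = np.random.rand(10, 4).astype('float32')

# Convert the arrays to tensors and print them
input_vals = torch.from_numpy(input_vals)
target_vals = torch.from_numpy(target_vals)
print(input_vals)
print(target_vals)

# Random weights and biases for a simple linear model (a weighted sum of the inputs)
weights = torch.randn(4, 5, requires_grad=True)
biases = torch.randn(4, requires_grad=True)

def model(x):
    # y = x @ W^T + b
    return x @ weights.t() + biases

# Generate predictions and compute the MSE loss
predictions = model(input_vals)
mse = nn.MSELoss()
loss = mse(predictions, target_vals)
print(loss)
```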
Output:
After running the above code, we get the following output in which we can see that PyTorch MSELoss weighted is printed on the screen.
PyTorch MSELoss ignore index
In this section, we will learn about the PyTorch MSELoss ignore index in python.
Before moving forward, we should know a little about ignore index. ignore_index is a parameter that specifies a target value that is ignored and does not contribute to the input gradients. Note that nn.MSELoss itself does not accept an ignore_index argument; that parameter belongs to classification losses such as nn.NLLLoss and nn.CrossEntropyLoss. A common masking workaround for MSELoss is sketched after the parameter list below.
Syntax:
Parameters:
- size_average: the losses are averaged over every loss element in the batch.
- reduce: the losses are averaged or summed over observations for each mini-batch, depending on size_average.
- ignore_index: a parameter that specifies a target value that is ignored and does not contribute to the input gradients.
- reduction: specifies the reduction applied to the output.
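Since nn.MSELoss does not take an ignore_index argument, a common workaround is to compute an unreduced loss and mask out the ignored targets manually. The sketch below assumes a sentinel value of -100.0 marks the elements to ignore:

```python
import torch
from torch import nn

IGNORE_VALUE = -100.0  # assumed sentinel value marking targets to ignore

input = torch.randn(4, 3, requires_grad=True)
target = torch.randn(4, 3)
target[0, 0] = IGNORE_VALUE  # pretend this element should not contribute

# Per-element loss, zero out ignored positions, then average over the rest
criterion = nn.MSELoss(reduction='none')
elementwise = criterion(input, target)
mask = target != IGNORE_VALUE
loss = (elementwise * mask).sum() / mask.sum()
print(loss)
```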
PyTorch MSELoss batch size
In this section, we will learn about how to implement PyTorch MSELoss batch size in python.
PyTorch MSELoss batch size is defined as the number of training examples used in one iteration.
Code:
In the following code, we will import some libraries with which we can control the batch size; a sketch of the described code follows the list.
- torch.from_numpy(input_var) is used to create the input tensor.
- torch.from_numpy(target_var) is used to create the target tensor.
- train_dataset = TensorDataset(input_var, target_var) is used to create the training dataset.
- batchsize = 3 is used to define the size of each batch.
- train_dl = DataLoader(train_dataset, batchsize, shuffle=True) is used to load the data.
- nn.Linear(3, 2) is used to create the feed forward network.
- print(model.weight) is used to print the model weights.
- predictions = model(input_var) is used to generate the predictions.
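A sketch of the batch-size example described above (the dataset contents are placeholders; the batch size of 3 and the Linear(3, 2) model follow the bullets):

```python
import numpy as np
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

# Example data as NumPy arrays converted to tensors (shapes are assumptions)
input_var = torch.from_numpy(np.random.rand(15, 3).astype('float32'))
target_var = torch.from_numpy(np.random.rand(15, 2).astype('float32'))

# Wrap the tensors in a dataset and load it in batches of 3
train_dataset = TensorDataset(input_var, target_var)
batchsize = 3
train_dl = DataLoader(train_dataset, batchsize, shuffle=True)

# A simple feed-forward (linear) model
model = nn.Linear(3, 2)
print(model.weight)

# Compute the MSE loss batch by batch
criterion = nn.MSELoss()
for xb, yb in train_dl:
    predictions = model(xb)
    loss = criterion(predictions, yb)
    print(loss)
```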
Output:
In the following output, we break the large dataset into smaller ones with a batch size of 3, and the PyTorch MSE loss values for each batch are printed on the screen.
PyTorch MSELoss with regularization
In this section, we will learn about PyTorch MSELoss with regularization in python.
PyTorch MSELoss with regularization is a process that adds the sum of the absolute values of all the weights in the model to the loss (L1 regularization).
Code:
In the following code, we will import the torch library with which we can calculate PyTorch MSELoss with regularization; see the sketch after this list.
- nn.MSELoss() is used to calculate the mean square error loss.
- input_var = torch.randn(10, 12, requires_grad=True) is used to generate the random input variables.
- target_var = torch.randn(10, 12) is used to generate the random target variables.
- MSE_loss(input_var, target_var) is used to compute the loss, to which the sum of the absolute values of the weights is added.
- print(“Output: “, output_var) is used to print the output variables.
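A sketch combining MSELoss with an L1 penalty on the weights is shown below; the small linear model is an assumption, added so that there are weights to regularize:

```python
import torch
from torch import nn

# Random input and target tensors
input_var = torch.randn(10, 12, requires_grad=True)
target_var = torch.randn(10, 12)

# A small model whose weights will be regularized
model = nn.Linear(12, 12)
predictions = model(input_var)

# MSE loss plus an L1 penalty: the sum of the absolute values of all model weights
MSE_loss = nn.MSELoss()
l1_lambda = 0.01
l1_penalty = sum(p.abs().sum() for p in model.parameters())
output_var = MSE_loss(predictions, target_var) + l1_lambda * l1_penalty

print("Output: ", output_var)
```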
Output:
After running the above code we get the following output in which we can see that the PyTorch MSELoss regularization variable is printed on the screen.
So, in this tutorial, we discussed PyTorch MSELoss and we have also covered different examples related to its implementation. Here is the list of examples that we have covered.
- PyTorch MSELoss
- PyTorch MSELoss code
- PyTorch MSELoss implementation
- PyTorch MSELoss weighted
- PyTorch MSELoss ignore index
- PyTorch MSELoss batch size
- PyTorch MSELoss with regularization
Source
torch.nn¶
These are the basic building blocks for graphs:
A kind of Tensor that is to be considered a module parameter.
A parameter that is not initialized.
A buffer that is not initialized.
Containers¶
Base class for all neural network modules.
A sequential container.
Holds submodules in a list.
Holds submodules in a dictionary.
Holds parameters in a list.
Holds parameters in a dictionary.
Global Hooks For Module
Registers a forward pre-hook common to all modules.
Registers a global forward hook for all the modules
Registers a backward hook common to all the modules.
Registers a backward hook common to all the modules.
Convolution Layers¶
Applies a 1D convolution over an input signal composed of several input planes.
Applies a 2D convolution over an input signal composed of several input planes.
Applies a 3D convolution over an input signal composed of several input planes.
Applies a 1D transposed convolution operator over an input image composed of several input planes.
Applies a 2D transposed convolution operator over an input image composed of several input planes.
Applies a 3D transposed convolution operator over an input image composed of several input planes.
A torch.nn.Conv1d module with lazy initialization of the in_channels argument of the Conv1d that is inferred from the input.size(1) .
A torch.nn.Conv2d module with lazy initialization of the in_channels argument of the Conv2d that is inferred from the input.size(1) .
A torch.nn.Conv3d module with lazy initialization of the in_channels argument of the Conv3d that is inferred from the input.size(1) .
A torch.nn.ConvTranspose1d module with lazy initialization of the in_channels argument of the ConvTranspose1d that is inferred from the input.size(1) .
A torch.nn.ConvTranspose2d module with lazy initialization of the in_channels argument of the ConvTranspose2d that is inferred from the input.size(1) .
A torch.nn.ConvTranspose3d module with lazy initialization of the in_channels argument of the ConvTranspose3d that is inferred from the input.size(1) .
Extracts sliding local blocks from a batched input tensor.
Combines an array of sliding local blocks into a large containing tensor.
Pooling layers¶
Applies a 1D max pooling over an input signal composed of several input planes.
Applies a 2D max pooling over an input signal composed of several input planes.
Applies a 3D max pooling over an input signal composed of several input planes.
Computes a partial inverse of MaxPool1d .
Computes a partial inverse of MaxPool2d .
Computes a partial inverse of MaxPool3d .
Applies a 1D average pooling over an input signal composed of several input planes.
Applies a 2D average pooling over an input signal composed of several input planes.
Applies a 3D average pooling over an input signal composed of several input planes.
Applies a 2D fractional max pooling over an input signal composed of several input planes.
Applies a 3D fractional max pooling over an input signal composed of several input planes.
Applies a 1D power-average pooling over an input signal composed of several input planes.
Applies a 2D power-average pooling over an input signal composed of several input planes.
Applies a 1D adaptive max pooling over an input signal composed of several input planes.
Applies a 2D adaptive max pooling over an input signal composed of several input planes.
Applies a 3D adaptive max pooling over an input signal composed of several input planes.
Applies a 1D adaptive average pooling over an input signal composed of several input planes.
Applies a 2D adaptive average pooling over an input signal composed of several input planes.
Applies a 3D adaptive average pooling over an input signal composed of several input planes.
Padding Layers¶
Pads the input tensor using the reflection of the input boundary.
Pads the input tensor using the reflection of the input boundary.
Pads the input tensor using the reflection of the input boundary.
Pads the input tensor using replication of the input boundary.
Pads the input tensor using replication of the input boundary.
Pads the input tensor using replication of the input boundary.
Pads the input tensor boundaries with zero.
Pads the input tensor boundaries with a constant value.
Pads the input tensor boundaries with a constant value.
Pads the input tensor boundaries with a constant value.
Non-linear Activations (weighted sum, nonlinearity)¶
Applies the Exponential Linear Unit (ELU) function, element-wise, as described in the paper: Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs).
Applies the Hard Shrinkage (Hardshrink) function element-wise.
Applies the Hardsigmoid function element-wise.
Applies the HardTanh function element-wise.
Applies the Hardswish function, element-wise, as described in the paper: Searching for MobileNetV3.
Applies the element-wise function:
Applies the element-wise function:
Allows the model to jointly attend to information from different representation subspaces as described in the paper: Attention Is All You Need.
Applies the element-wise function:
Applies the rectified linear unit function element-wise:
Applies the element-wise function:
Applies the randomized leaky rectified liner unit function, element-wise, as described in the paper:
Applied element-wise, as:
Applies the element-wise function:
Applies the Gaussian Error Linear Units function:
Applies the element-wise function:
Applies the Sigmoid Linear Unit (SiLU) function, element-wise.
Applies the Mish function, element-wise.
Applies the soft shrinkage function elementwise:
Applies the element-wise function:
Applies the Hyperbolic Tangent (Tanh) function element-wise.
Applies the element-wise function:
Thresholds each element of the input Tensor.
Non-linear Activations (other)¶
Applies the Softmin function to an n-dimensional input Tensor rescaling them so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1.
Applies the Softmax function to an n-dimensional input Tensor rescaling them so that the elements of the n-dimensional output Tensor lie in the range [0,1] and sum to 1.
Applies SoftMax over features to each spatial location.
Normalization Layers¶
Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift .
Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift .
A torch.nn.BatchNorm1d module with lazy initialization of the num_features argument of the BatchNorm1d that is inferred from the input.size(1) .
A torch.nn.BatchNorm2d module with lazy initialization of the num_features argument of the BatchNorm2d that is inferred from the input.size(1) .
A torch.nn.BatchNorm3d module with lazy initialization of the num_features argument of the BatchNorm3d that is inferred from the input.size(1) .
Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization
Applies Batch Normalization over a N-Dimensional input (a mini-batch of [N-2]D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift .
Applies Instance Normalization over a 2D (unbatched) or 3D (batched) input as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
A torch.nn.InstanceNorm1d module with lazy initialization of the num_features argument of the InstanceNorm1d that is inferred from the input.size(1) .
A torch.nn.InstanceNorm2d module with lazy initialization of the num_features argument of the InstanceNorm2d that is inferred from the input.size(1) .
A torch.nn.InstanceNorm3d module with lazy initialization of the num_features argument of the InstanceNorm3d that is inferred from the input.size(1) .
Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization
Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension.
Recurrent Layers¶
Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.
An Elman RNN cell with tanh or ReLU non-linearity.
A long short-term memory (LSTM) cell.
A gated recurrent unit (GRU) cell
Transformer Layers¶
A transformer model.
TransformerEncoder is a stack of N encoder layers.
TransformerDecoder is a stack of N decoder layers
TransformerEncoderLayer is made up of self-attn and feedforward network.
TransformerDecoderLayer is made up of self-attn, multi-head-attn and feedforward network.
Linear Layers¶
A placeholder identity operator that is argument-insensitive.
Applies a linear transformation to the incoming data: y = xA^T + b.
Applies a bilinear transformation to the incoming data: y = x_1^T A x_2 + b.
A torch.nn.Linear module where in_features is inferred.
Dropout Layers¶
During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.
Applies Alpha Dropout over the input.
Randomly masks out entire channels (a channel is a feature map, e.g.
Sparse Layers¶
A simple lookup table that stores embeddings of a fixed dictionary and size.
Computes sums or means of ‘bags’ of embeddings, without instantiating the intermediate embeddings.
Distance Functions¶
Returns cosine similarity between x_1 and x_2, computed along dim.
Computes the pairwise distance between input vectors, or between columns of input matrices.
Loss Functions¶
Creates a criterion that measures the mean absolute error (MAE) between each element in the input x and target y.
Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input x and target y.
This criterion computes the cross entropy loss between input logits and target.
The Connectionist Temporal Classification loss.
The negative log likelihood loss.
Negative log likelihood loss with Poisson distribution of target.
Gaussian negative log likelihood loss.
The Kullback-Leibler divergence loss.
Creates a criterion that measures the Binary Cross Entropy between the target and the input probabilities:
This loss combines a Sigmoid layer and the BCELoss in one single class.
Measures the loss given an input tensor x and a labels tensor y (containing 1 or -1).
Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input x (a 2D mini-batch Tensor) and output y (which is a 2D Tensor of target class indices).
Creates a criterion that uses a squared term if the absolute element-wise error falls below delta and a delta-scaled L1 term otherwise.
Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise.
Creates a criterion that optimizes a two-class classification logistic loss between input tensor x and target tensor y (containing 1 or -1).
Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input x and target y of size (N, C).
Creates a criterion that measures the loss given input tensors x_1, x_2 and a Tensor label y with values 1 or -1.
Creates a criterion that measures the triplet loss given input tensors x_1, x_2, x_3 and a margin with a value greater than 0.
Vision Layers¶
Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data.
Applies a 2D nearest neighbor upsampling to an input signal composed of several input channels.
Applies a 2D bilinear upsampling to an input signal composed of several input channels.
Shuffle Layers¶
DataParallel Layers (multi-GPU, distributed)¶
Implements data parallelism at the module level.
Implements distributed data parallelism that is based on torch.distributed package at the module level.
Utilities¶
From the torch.nn.utils module
Clips gradient norm of an iterable of parameters.
Clips gradient of an iterable of parameters at specified value.
Convert parameters to one vector
Convert one vector to the parameters
Abstract base class for creation of new pruning techniques.
Container holding a sequence of pruning methods for iterative pruning.
Utility pruning method that does not prune any units but generates the pruning parametrization with a mask of ones.
Prune (currently unpruned) units in a tensor at random.
Prune (currently unpruned) units in a tensor by zeroing out the ones with the lowest L1-norm.
Prune entire (currently unpruned) channels in a tensor at random.
Prune entire (currently unpruned) channels in a tensor based on their Ln-norm.
Applies pruning reparametrization to the tensor corresponding to the parameter called name in module without actually pruning any units.
Prunes tensor corresponding to parameter called name in module by removing the specified amount of (currently unpruned) units selected at random.
Prunes tensor corresponding to parameter called name in module by removing the specified amount of (currently unpruned) units with the lowest L1-norm.
Prunes tensor corresponding to parameter called name in module by removing the specified amount of (currently unpruned) channels along the specified dim selected at random.
Prunes tensor corresponding to parameter called name in module by removing the specified amount of (currently unpruned) channels along the specified dim with the lowest Ln-norm.
Globally prunes tensors corresponding to all parameters in parameters by applying the specified pruning_method .
Prunes tensor corresponding to parameter called name in module by applying the pre-computed mask in mask .
Removes the pruning reparameterization from a module and the pruning method from the forward hook.
Check whether module is pruned by looking for forward_pre_hooks in its modules that inherit from the BasePruningMethod .
Applies weight normalization to a parameter in the given module.
Removes the weight normalization reparameterization from a module.
Applies spectral normalization to a parameter in the given module.
Removes the spectral normalization reparameterization from a module.
Given a module class object and args / kwargs, instantiates the module without initializing parameters / buffers.
Parametrizations implemented using the new parametrization functionality in torch.nn.utils.parameterize.register_parametrization() .
Applies an orthogonal or unitary parametrization to a matrix or a batch of matrices.
Applies spectral normalization to a parameter in the given module.
Utility functions to parametrize Tensors on existing Modules. Note that these functions can be used to parametrize a given Parameter or Buffer given a specific function that maps from an input space to the parametrized space. They are not parameterizations that would transform an object into a parameter. See the Parametrizations tutorial for more information on how to implement your own parametrizations.
Adds a parametrization to a tensor in a module.
Removes the parametrizations on a tensor in a module.
Context manager that enables the caching system within parametrizations registered with register_parametrization() .
Returns True if module has an active parametrization.
A sequential container that holds and manages the original or original0 , original1 , .
Utility functions to calls a given Module in a stateless manner.
Performs a functional call on the module by replacing the module parameters and buffers with the provided ones.
Utility functions in other modules
Holds the data and list of batch_sizes of a packed sequence.
Packs a Tensor containing padded sequences of variable length.
Pads a packed batch of variable length sequences.
Pad a list of variable length Tensors with padding_value
Packs a list of variable length Tensors
Flattens a contiguous range of dims into a tensor.
Unflattens a tensor dim expanding it to a desired shape.
Quantized Functions¶
Quantization refers to techniques for performing computations and storing tensors at lower bitwidths than floating point precision. PyTorch supports both per tensor and per channel asymmetric linear quantization. To learn more how to use quantized functions in PyTorch, please refer to the Quantization documentation.
Lazy Modules Initialization¶
A mixin for modules that lazily initialize parameters, also known as "lazy modules."
Source
| title | date |
| --- | --- |
| How to use PyTorch loss functions | 2021-07-19 |
Loss functions are an important component of a neural network. Interfacing between the forward and backward pass within a Deep Learning model, they effectively compute how poor a model performs (how big its loss) is. In this article, we’re going to cover how to use a variety of PyTorch loss functions for classification and regression.
After reading this article, you will…
- Understand what the role of a loss function in a neural network is.
- Be familiar with a variety of PyTorch based loss functions for classification and regression.
- Be able to use these loss functions in your Deep Learning models.
Are you ready? Let’s take a look! 😎
[toc]
What is a loss function?
Training a Deep Learning model involves what I call a high-level training process. This process is visualized below. It all starts with a training dataset, which — in the case of classification and regression — contains a set of descriptive variables (features) that jointly are capable of predicting some target variable.
Training the Deep Learning model, which often is a neural network, involves sequentially performing a forward pass and a backward pass, followed by optimization. In the forward pass, the dataset is fed to the network (in a batched fashion). This leads to predictions for the targets, which can then be compared with the true labels. No prediction is perfect, and hence there will be an error value. Using this error value, the error can be computed backwards into the neural network using backpropagation. Subsequently, with an optimizer, the model can be changed slightly in the hope that it performs better next time. By repeating this process over and over again, the model can improve and learn to generate accurate predictions.
Let’s get back to this error value. As the name suggests, it is used to illustrate how poorly the model performs. In Deep Learning jargon, this value is also called a loss value. It is computed by means of a loss function. There are many functions that can be used for this purpose. Choosing one depends on the problem you are solving (i.e. classification or regression), the characteristics of your dataset, and quite frequently on trial and error. In the rest of this article, we’re going to walk through a lot of loss functions available in PyTorch. Let’s take a look!
PyTorch Classification loss function examples
The first category of loss functions that we will take a look at is the one of classification models.
Binary Cross-entropy loss, on Sigmoid (nn.BCELoss) example
Binary cross-entropy loss or BCE Loss compares a target [latex]t[/latex] with a prediction [latex]p[/latex] in a logarithmic and hence exponential fashion. In neural network implementations, the value for [latex]t[/latex] is either 0 or 1, while [latex]p[/latex] can take any value between 0 and 1. This is the formula for binary cross-entropy loss:

[latex]BCE(t, p) = -\left( t \cdot \log(p) + (1 - t) \cdot \log(1 - p) \right)[/latex]
When visualizing BCE loss for a target value of 1, you can see that loss increases exponentially when the prediction approaches the opposite — 0, in our case.
This suggests that small deviations are punished albeit lightly, whereas big prediction errors are punished significantly.
This makes binary cross-entropy loss a good candidate for binary classification problems, where a classifier has two classes.
Implementing binary cross-entropy loss with PyTorch is easy. It involves the following steps:
- Ensuring that the output of your neural network is a value between 0 and 1. Recall that the Sigmoid activation function can be used for this purpose. This is why we apply nn.Sigmoid() in our neural network below.
- Ensuring that you use nn.BCELoss() as your loss function of choice during the training loop.
A full example of using binary cross-entropy loss is given next, using the torchvision.datasets.FakeData dataset:
import os
import torch
from torch import nn
from torchvision.datasets import FakeData
from torch.utils.data import DataLoader
from torchvision import transforms
class MLP(nn.Module):
'''
Multilayer Perceptron.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Flatten(),
nn.Linear(28 * 28 * 3, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 1),
nn.Sigmoid()
)
def forward(self, x):
'''Forward pass'''
return self.layers(x)
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Prepare FakeData dataset
dataset = FakeData(size=15000, image_size=(3, 28, 28), num_classes=2, transform=transforms.ToTensor())
trainloader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True, num_workers = 4, pin_memory = True)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.BCELoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get inputs
inputs, targets = data
# Prepare targets
targets = targets.type(torch.FloatTensor).reshape((targets.shape[0], 1))
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 10 == 0:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 500))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
Binary Cross-entropy loss, on logits (nn.BCEWithLogitsLoss)
Simple binary cross-entropy loss (represented by nn.BCELoss in PyTorch) computes BCE loss on the predictions [latex]p[/latex] generated in the range [0, 1].
However, it is possible to generate a more numerically stable variant of binary cross-entropy loss by combining the Sigmoid and the BCE Loss into one loss function:
This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
PyTorch (n.d.)
This trick is summarized here.
In PyTorch, this is combined into the **nn.BCEWithLogitsLoss** function. The difference between nn.BCEWithLogitsLoss and nn.BCELoss is that BCE With Logits Loss adds the Sigmoid function into the loss function. With simple BCE Loss, you will have to add Sigmoid to the neural network, whereas with BCE With Logits Loss you will not.
Here is an example demonstrating nn.BCEWithLogitsLoss using the torchvision.datasets.FakeData dataset:
import os
import torch
from torch import nn
from torchvision.datasets import FakeData
from torch.utils.data import DataLoader
from torchvision import transforms
class MLP(nn.Module):
'''
Multilayer Perceptron.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Flatten(),
nn.Linear(28 * 28 * 3, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 1)
)
def forward(self, x):
'''Forward pass'''
return self.layers(x)
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Prepare FakeData dataset
dataset = FakeData(size=15000, image_size=(3, 28, 28), num_classes=2, transform=transforms.ToTensor())
trainloader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True, num_workers = 4, pin_memory = True)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get inputs
inputs, targets = data
# Prepare targets
targets = targets.type(torch.FloatTensor).reshape((targets.shape[0], 1))
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 10 == 0:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 500))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
Negative log likelihood loss (nn.NLLLoss)
The previous two loss functions involved binary classification. In other words, they can be used for a classifier that works with two possible targets only — a class 0 and a class 1.
However, many classification problems involve more than two classes. The MNIST dataset (torchvision.datasets.MNIST) is a good example of such a classification problem: in MNIST, there is one class per digit, and hence there are 10 classes.
Negative log likelihood loss (represented in PyTorch as nn.NLLLoss) can be used for this purpose. Sometimes also called categorical cross-entropy, it computes the negative log likelihood of each prediction, and multiplies each log prediction with the true target value.
For example, if we have a three-class classification problem with a sample where the third target class is the true target class, our target vector is as follows: [0 0 1].
Say that our model predicts a 60% likelihood that it's the second class: [0.3 0.6 0.1]. In that case, our negative log likelihood loss (using base-10 logarithms here for readability; PyTorch itself uses natural logarithms) is as follows:
-(0 * log(0.3) + 0 * log(0.6) + 1 * log(0.1)) = 1
Now suppose that after a few more epochs, it successfully starts predicting the third class: [0.3 0.2 0.5]. The loss now becomes 0.30. When it is even more confident (say [0.05 0.05 0.90]), the loss is 0.045. In other words, using this loss, you can create a multiclass classifier.
Note as well that by consequence, you can also model a binary classification problem this way: it is then a multiclass classification problem with two classes.
As you can see above, the prediction of our classifier should be a pseudo probability distribution over all the target classes. The softmax activation function serves this purpose. Using nn.NLLLoss therefore requires that our neural network outputs log-probabilities, i.e. a LogSoftmax-activated output. nn.LogSoftmax is also faster and more numerically stable than taking the logarithm of a pure nn.Softmax output; that's why we are using nn.LogSoftmax in the nn.NLLLoss example for PyTorch below.
import os
import torch
from torch import nn
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from torchvision import transforms
class MLP(nn.Module):
'''
Multilayer Perceptron.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Flatten(),
nn.Linear(28 * 28 * 1, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 10),
nn.LogSoftmax(dim = 1)
)
def forward(self, x):
'''Forward pass'''
return self.layers(x)
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Prepare MNIST dataset
dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.NLLLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get inputs
inputs, targets = data
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 500 == 499:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 500))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
Cross-entropy loss (nn.CrossEntropyLoss)
Recall that nn.NLLLoss requires the application of a Softmax (or LogSoftmax) layer. As with the difference between BCELoss and BCEWithLogitsLoss, combining the Softmax and the NLLLoss into one loss function likely gives you computational benefits (PyTorch, n.d.). That's why you can also choose to use nn.CrossEntropyLoss instead.
This is an example of nn.CrossEntropyLoss with a PyTorch neural network. Note that the final layer does not use any Softmax-related activation; this is already built into the loss function!
import os
import torch
from torch import nn
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from torchvision import transforms
class MLP(nn.Module):
'''
Multilayer Perceptron.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Flatten(),
nn.Linear(28 * 28 * 1, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 10)
)
def forward(self, x):
'''Forward pass'''
return self.layers(x)
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Prepare MNIST dataset
dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get inputs
inputs, targets = data
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 500 == 499:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 500))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
Poisson Negative log likelihood loss (nn.PoissonNLLLoss)
Suppose that your multiclass classification targets are drawn from a Poisson distribution (PyTorch, n.d.); that you already know this fact from your exploratory data analysis. You can then use the Poisson Negative log likelihood loss instead of regular nn.NLLLoss. In PyTorch (n.d.), this loss is described as follows:

[latex]\text{target} \sim \mathrm{Poisson}(\text{input}), \quad \text{loss}(\text{input}, \text{target}) = \text{input} - \text{target} \cdot \log(\text{input}) + \log(\text{target}!)[/latex]
An example using Poisson Negative log likelihood loss with PyTorch is as follows:
import os
import torch
from torch import nn
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from torchvision import transforms
class MLP(nn.Module):
'''
Multilayer Perceptron.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Flatten(),
nn.Linear(28 * 28 * 1, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 10),
nn.LogSoftmax(dim = 1)
)
def forward(self, x):
'''Forward pass'''
return self.layers(x)
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Prepare MNIST dataset
dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.PoissonNLLLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get inputs
inputs, targets = data
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 500 == 499:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 500))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
Gaussian Negative log likelihood loss (nn.GaussianNLLLoss)
Suppose that your multiclass classification targets are drawn from a Gaussian distribution (PyTorch, n.d.). Loss can then be computed differently, by using the Gaussian Negative log likelihood loss. This loss function is represented within PyTorch (n.d.) as nn.GaussianNLLLoss.
This is an example of using Gaussian Negative log likelihood loss with PyTorch.
import os
import torch
from torch import nn
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from torchvision import transforms
class MLP(nn.Module):
'''
Multilayer Perceptron.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Flatten(),
nn.Linear(28 * 28 * 1, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 10),
nn.LogSoftmax(dim = 1)
)
def forward(self, x):
'''Forward pass'''
return self.layers(x)
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Prepare MNIST dataset
dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.GaussianNLLLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get inputs
inputs, targets = data
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss (nn.GaussianNLLLoss expects an input, a target and a variance tensor)
targets = targets.float().reshape((targets.shape[0], 1))
var = torch.ones_like(outputs)
loss = loss_function(outputs, targets, var)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 500 == 499:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 500))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
Hinge embedding loss (nn.HingeEmbeddingLoss)
In PyTorch, the Hinge Embedding Loss is defined as follows:

[latex]l_n = \begin{cases} x_n, & \text{if } y_n = 1 \\ \max\{0, \Delta - x_n\}, & \text{if } y_n = -1 \end{cases}[/latex]

It can be used to measure whether two inputs (x and y) are similar, and works only if the ys are either 1 or -1.
import os
import torch
from torch import nn
from torchvision.datasets import FakeData
from torch.utils.data import DataLoader
from torchvision import transforms
class MLP(nn.Module):
'''
Multilayer Perceptron.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Flatten(),
nn.Linear(28 * 28 * 3, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 1),
nn.Tanh()
)
def forward(self, x):
'''Forward pass'''
return self.layers(x)
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Prepare FakeData dataset
dataset = FakeData(size=15000, image_size=(3, 28, 28), num_classes=2, transform=transforms.ToTensor())
trainloader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True, num_workers = 4, pin_memory = True)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.HingeEmbeddingLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get inputs
inputs, targets = data
# For this example, change zero targets into -1
targets[targets == 0] = -1
# Prepare targets
targets = targets.type(torch.FloatTensor).reshape((targets.shape[0], 1))
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 10 == 0:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 500))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
Two-class soft margin loss (nn.SoftMarginLoss)
The two-class soft margin loss optimizes the following formula (PyTorch, n.d.):

[latex]\text{loss}(x, y) = \sum_i \frac{\log\left(1 + \exp(-y[i] \cdot x[i])\right)}{x.\text{nelement}()}[/latex]
It can be used in binary classification problems as follows:
import os
import torch
from torch import nn
from torchvision.datasets import FakeData
from torch.utils.data import DataLoader
from torchvision import transforms
class MLP(nn.Module):
'''
Multilayer Perceptron.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Flatten(),
nn.Linear(28 * 28 * 3, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 1),
nn.Tanh()
)
def forward(self, x):
'''Forward pass'''
return self.layers(x)
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Prepare FakeData dataset
dataset = FakeData(size=15000, image_size=(3, 28, 28), num_classes=2, transform=transforms.ToTensor())
trainloader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True, num_workers = 4, pin_memory = True)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.SoftMarginLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get inputs
inputs, targets = data
# For this example, change zero targets into -1
targets[targets == 0] = -1
# Prepare targets
targets = targets.type(torch.FloatTensor).reshape((targets.shape[0], 1))
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 10 == 0:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 500))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
Multi-class margin loss (nn.MultiMarginLoss)
For multiclass classification problems, a multi-class hinge loss can be used, represented by nn.MultiMarginLoss (PyTorch, n.d.):

[latex]\text{loss}(x, y) = \frac{\sum_{i \neq y} \max\left(0, \text{margin} - x[y] + x[i]\right)^p}{x.\text{size}(0)}[/latex]
Here is an example using nn.MultiMarginLoss with PyTorch for multi-class single-label classification problems:
import os
import torch
from torch import nn
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from torchvision import transforms
class MLP(nn.Module):
'''
Multilayer Perceptron.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Flatten(),
nn.Linear(28 * 28 * 1, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 10),
nn.LogSoftmax(dim = 1)
)
def forward(self, x):
'''Forward pass'''
return self.layers(x)
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Prepare MNIST dataset
dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.MultiMarginLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get inputs
inputs, targets = data
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 500 == 499:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 500))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
Multilabel soft margin loss (nn.MultiLabelSoftMarginLoss)
In multilabel classification problems, the neural network learns to predict multiple labels for an input sample. It can also be viewed as solving a tagging problem, as you are essentially assigning multiple tags (instead of just one class) to one input sample.
Multilabel soft margin loss (implemented in PyTorch as nn.MultiLabelSoftMarginLoss) can be used for this purpose. Here is an example with PyTorch. If you look closely, you will see that:
- We use the MNIST dataset for this purpose. By replacing the targets with one of three multilabel Tensors, we are simulating a multilabel classification problem. Note that there is no resemblence whatsoever between targets and inputs, as this is simply an example.
- The final Linear layer outputs a 10-dimensional Tensor, which makes sense since we need 10 logits per sample.
import os
import numpy as np
import torch
from torch import nn
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from torchvision import transforms
class MLP(nn.Module):
'''
Multilayer Perceptron.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Flatten(),
nn.Linear(28 * 28 * 1, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 10)
)
def forward(self, x):
'''
Forward pass
Note that
'''
fp = self.layers(x)
return fp
def draw_label(label):
if label < 5:
return [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
if label < 8:
return [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
else:
return [0, 0, 0, 0, 1, 0, 0, 1, 1, 0]
def replace_labels(labels):
''' Randomly replace labels '''
new_labels = []
for label in labels:
new_labels.append(draw_label(label))
return torch.from_numpy(np.array(new_labels))
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Prepare MNIST dataset
dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.MultiLabelSoftMarginLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get inputs
inputs, targets = data
targets = replace_labels(targets)
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 500 == 499:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 500))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
Kullback-Leibler Divergence (KL Divergence) loss (nn.KLDivLoss)
KL Divergence can be used for Variational Autoencoders, multiclass classification and replacing Least Squares regression. Note that nn.KLDivLoss expects its input to contain log-probabilities and its target to contain probabilities. Here is an example that uses KL Divergence with PyTorch:
import os
import numpy as np
import torch
from torch import nn
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from torchvision import transforms
class MLP(nn.Module):
'''
Multilayer Perceptron.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Flatten(),
nn.Linear(28 * 28 * 1, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 1),
nn.Sigmoid()
)
def forward(self, x):
'''
Forward pass
Note that
'''
fp = self.layers(x)
return fp
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Prepare MNIST dataset
dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.KLDivLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get inputs
inputs, targets = data
targets = targets.float()
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 500 == 499:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 500))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
PyTorch Regression loss function examples
Let’s now take a look at PyTorch loss functions for regression models.
Mean Absolute Error (MAE) / L1 Loss (nn.L1Loss)
Mean Absolute Error (MAE) is one of the loss functions for regression. This is what it looks like:

[latex]\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - y_i \right|[/latex]
As you can see, it simply computes the difference between the input x and the expected value for y, then takes the absolute value (so that the outcome is always positive). It then averages this error.
Below, you will see an example of MAE loss (also called L1 Loss) within PyTorch, using nn.L1Loss and the Boston Housing dataset:
import os
import numpy as np
import torch
from torch import nn
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from torchvision import transforms
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
class BostonDataset(torch.utils.data.Dataset):
'''
Prepare the Boston dataset for regression
'''
def __init__(self, X, y, scale_data=True):
if not torch.is_tensor(X) and not torch.is_tensor(y):
# Apply scaling if necessary
if scale_data:
X = StandardScaler().fit_transform(X)
self.X = torch.from_numpy(X)
self.y = torch.from_numpy(y)
def __len__(self):
return len(self.X)
def __getitem__(self, i):
return self.X[i], self.y[i]
class MLP(nn.Module):
'''
Multilayer Perceptron for regression.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Linear(13, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 1)
)
def forward(self, x):
'''
Forward pass
Note that
'''
fp = self.layers(x)
return fp
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Load Boston dataset
X, y = load_boston(return_X_y=True)
# Prepare Boston dataset
dataset = BostonDataset(X, y)
trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.L1Loss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get and prepare inputs
inputs, targets = data
inputs, targets = inputs.float(), targets.float()
targets = targets.reshape((targets.shape[0], 1))
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 10 == 9:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 10))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
Mean Squared Error (MSE) loss (nn.MSELoss)
The Mean Squared Error loss (nn.MSELoss) works much like MAE, except that it squares the difference instead of taking its absolute value: MSE(x, y) = (1/n) · Σ (x_i − y_i)². Squaring also removes the sign of the error, and MSE behaves well when the differences between errors are relatively small. Note that this comes at the cost of being sensitive to outliers, since large errors are amplified quadratically.
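A minimal sketch (again separate from the full example that follows) makes this outlier sensitivity visible: a single prediction that is far off inflates nn.MSELoss much more than nn.L1Loss.
import torch
from torch import nn

predictions = torch.tensor([1.0, 2.0, 3.0, 10.0])  # the last prediction is far off
targets = torch.tensor([1.0, 2.0, 3.0, 3.0])

print(nn.L1Loss()(predictions, targets))   # tensor(1.7500)  -> (0 + 0 + 0 + 7) / 4
print(nn.MSELoss()(predictions, targets))  # tensor(12.2500) -> (0 + 0 + 0 + 49) / 4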
This is an example of using MSE Loss with PyTorch, which is provided as nn.MSELoss:
import torch
from torch import nn
from torch.utils.data import DataLoader
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
class BostonDataset(torch.utils.data.Dataset):
'''
Prepare the Boston dataset for regression
'''
def __init__(self, X, y, scale_data=True):
if not torch.is_tensor(X) and not torch.is_tensor(y):
# Apply scaling if necessary
if scale_data:
X = StandardScaler().fit_transform(X)
self.X = torch.from_numpy(X)
self.y = torch.from_numpy(y)
def __len__(self):
return len(self.X)
def __getitem__(self, i):
return self.X[i], self.y[i]
class MLP(nn.Module):
'''
Multilayer Perceptron for regression.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Linear(13, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 1)
)
def forward(self, x):
'''
Forward pass
'''
fp = self.layers(x)
return fp
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Load Boston dataset
X, y = load_boston(return_X_y=True)
# Prepare Boston dataset
dataset = BostonDataset(X, y)
trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.MSELoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get and prepare inputs
inputs, targets = data
inputs, targets = inputs.float(), targets.float()
targets = targets.reshape((targets.shape[0], 1))
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 10 == 9:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 10))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
Smooth MAE / L1 Loss (nn.SmoothL1Loss)
Recall from above that MAE Loss (L1 Loss) copes better with many outliers, while MSE Loss works better when there are few outliers and the differences between errors are relatively small. Sometimes, however, you want a loss function that sits precisely in between these two, and Smooth MAE Loss can then be used. Provided as nn.SmoothL1Loss, it computes a squared error when the absolute error is smaller than a value beta (benefiting from the MSE-like behavior around zero); in all other cases, an MAE-like absolute value is computed, so large errors contribute only linearly.
The beta parameter is configurable in the nn.SmoothL1Loss(...) initialization (its default value is 1.0).
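To get a feeling for beta, here is a minimal sketch with arbitrary example values that evaluates nn.SmoothL1Loss per element for one small and one large error; the full training example below then uses the default beta on the Boston Housing dataset.
import torch
from torch import nn

prediction = torch.tensor([0.2, 5.0])  # one small error, one large error
target = torch.tensor([0.0, 0.0])

for beta in (0.5, 1.0, 2.0):
    loss = nn.SmoothL1Loss(beta=beta, reduction='none')
    # errors below beta are squared (and scaled by 1/beta); larger errors grow only linearly
    print(beta, loss(prediction, target))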
import torch
from torch import nn
from torch.utils.data import DataLoader
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
class BostonDataset(torch.utils.data.Dataset):
'''
Prepare the Boston dataset for regression
'''
def __init__(self, X, y, scale_data=True):
if not torch.is_tensor(X) and not torch.is_tensor(y):
# Apply scaling if necessary
if scale_data:
X = StandardScaler().fit_transform(X)
self.X = torch.from_numpy(X)
self.y = torch.from_numpy(y)
def __len__(self):
return len(self.X)
def __getitem__(self, i):
return self.X[i], self.y[i]
class MLP(nn.Module):
'''
Multilayer Perceptron for regression.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Linear(13, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 1)
)
def forward(self, x):
'''
Forward pass
'''
fp = self.layers(x)
return fp
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Load Boston dataset
X, y = load_boston(return_X_y=True)
# Prepare Boston dataset
dataset = BostonDataset(X, y)
trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.SmoothL1Loss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get and prepare inputs
inputs, targets = data
inputs, targets = inputs.float(), targets.float()
targets = targets.reshape((targets.shape[0], 1))
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 10 == 9:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 10))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
Huber loss (nn.HuberLoss)
Huber loss is another loss function that can be used for regression. How it behaves depends on a value for delta: when the absolute error is at most delta, the loss is quadratic, 0.5 · (x − y)²; for larger errors it switches to a linear term, delta · (|x − y| − 0.5 · delta), so outliers are penalized less harshly.
In other words, by tweaking the value for delta, we can adapt the loss function's sensitivity to outliers. Huber loss therefore also lies somewhere between MSE and MAE loss (and is closely related to Smooth L1 loss above: nn.SmoothL1Loss equals Huber loss divided by its delta, with beta playing the role of delta).
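As a quick sketch with arbitrary values (independent of the full example below), you can see the effect of delta directly by evaluating the loss per element:
import torch
from torch import nn

prediction = torch.tensor([0.5, 5.0])  # one small error, one large error
target = torch.tensor([0.0, 0.0])

for delta in (0.5, 1.0, 2.0):
    loss = nn.HuberLoss(delta=delta, reduction='none')
    # the small error stays quadratic; the large error is penalized linearly, scaled by delta
    print(delta, loss(prediction, target))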
Being available as nn.HuberLoss (with a configurable delta parameter), it can be used in the following way:
import torch
from torch import nn
from torch.utils.data import DataLoader
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
class BostonDataset(torch.utils.data.Dataset):
'''
Prepare the Boston dataset for regression
'''
def __init__(self, X, y, scale_data=True):
if not torch.is_tensor(X) and not torch.is_tensor(y):
# Apply scaling if necessary
if scale_data:
X = StandardScaler().fit_transform(X)
self.X = torch.from_numpy(X)
self.y = torch.from_numpy(y)
def __len__(self):
return len(self.X)
def __getitem__(self, i):
return self.X[i], self.y[i]
class MLP(nn.Module):
'''
Multilayer Perceptron for regression.
'''
def __init__(self):
super().__init__()
self.layers = nn.Sequential(
nn.Linear(13, 64),
nn.ReLU(),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 1)
)
def forward(self, x):
'''
Forward pass
'''
fp = self.layers(x)
return fp
if __name__ == '__main__':
# Set fixed random number seed
torch.manual_seed(42)
# Load Boston dataset
X, y = load_boston(return_X_y=True)
# Prepare Boston dataset
dataset = BostonDataset(X, y)
trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)
# Initialize the MLP
mlp = MLP()
# Define the loss function and optimizer
loss_function = nn.HuberLoss(delta=1.0)
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)
# Run the training loop
for epoch in range(0, 5): # 5 epochs at maximum
# Print epoch
print(f'Starting epoch {epoch+1}')
# Set current loss value
current_loss = 0.0
# Iterate over the DataLoader for training data
for i, data in enumerate(trainloader, 0):
# Get and prepare inputs
inputs, targets = data
inputs, targets = inputs.float(), targets.float()
targets = targets.reshape((targets.shape[0], 1))
# Zero the gradients
optimizer.zero_grad()
# Perform forward pass
outputs = mlp(inputs)
# Compute loss
loss = loss_function(outputs, targets)
# Perform backward pass
loss.backward()
# Perform optimization
optimizer.step()
# Print statistics
current_loss += loss.item()
if i % 10 == 9:
print('Loss after mini-batch %5d: %.3f' %
(i + 1, current_loss / 10))
current_loss = 0.0
# Process is complete.
print('Training process has finished.')
Summary
In this article, you have…
- Learned what the role of a loss function in a neural network is.
- Been familiarized with a variety of PyTorch-based loss functions for classification and regression.
- Been able to use these loss functions in your Deep Learning models.
I hope that this article was useful to you! If it was, please feel free to drop a comment in the comments section 💬. Feel free to do the same if you have questions or other remarks.
Thank you for reading MachineCurve today and happy engineering! 😎
Sources
PyTorch. (n.d.). BCELoss — PyTorch 1.7.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html
PyTorch. (n.d.). BCEWithLogitsLoss — PyTorch 1.8.1 documentation. https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html
PyTorch. (2019, March 7). Difference between cross-entropy loss or log likelihood loss? PyTorch Forums. https://discuss.pytorch.org/t/difference-between-cross-entropy-loss-or-log-likelihood-loss/38816/2
PyTorch. (n.d.). NLLLoss — PyTorch 1.9.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html
PyTorch. (n.d.). CrossEntropyLoss — PyTorch 1.9.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
PyTorch. (n.d.). SoftMarginLoss — PyTorch 1.9.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.SoftMarginLoss.html
PyTorch. (n.d.). HingeEmbeddingLoss — PyTorch 1.9.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.HingeEmbeddingLoss.html#torch.nn.HingeEmbeddingLoss
PyTorch. (n.d.). MultiMarginLoss — PyTorch 1.9.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.MultiMarginLoss.html
PyTorch. (n.d.). MultiLabelSoftMarginLoss — PyTorch 1.9.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.MultiLabelSoftMarginLoss.html#torch.nn.MultiLabelSoftMarginLoss
PyTorch. (n.d.). MultiLabelMarginLoss — PyTorch 1.9.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.MultiLabelMarginLoss.html#torch.nn.MultiLabelMarginLoss
MachineCurve. (2019, December 22). How to use kullback-leibler divergence (KL divergence) with Keras? https://www.machinecurve.com/index.php/2019/12/21/how-to-use-kullback-leibler-divergence-kl-divergence-with-keras/
PyTorch. (n.d.). HuberLoss — PyTorch 1.9.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.HuberLoss.html#torch.nn.HuberLoss
PyTorch. (n.d.). L1Loss — PyTorch 1.9.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.L1Loss.html#torch.nn.L1Loss
PyTorch. (n.d.). MSELoss — PyTorch 1.9.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss
PyTorch. (n.d.). SmoothL1Loss — PyTorch 1.9.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.SmoothL1Loss.html#torch.nn.SmoothL1Loss