I am using Visual Studio 2019 to build a .NET Core Web API.
When I build the application with Docker via VS2019, I get the error:
"Docker command failed with exit code 125".
Any idea what the cause is?
This is the Output:
1>d2c0ee48dbf44507b3dee44a17ab8b8186fe4ec59c283af834191e8f3c902f1a
1>C:\Users\97254\.nuget\packages\microsoft.visualstudio.azure.containers.tools.targets\1.9.5\build\Container.targets(196,5): error CTC1015: Docker command failed with exit code 125.
1>C:\Users\97254\.nuget\packages\microsoft.visualstudio.azure.containers.tools.targets\1.9.5\build\Container.targets(196,5): error CTC1015: docker: Error response from daemon: hcsshim::CreateComputeSystem d2c0ee48dbf44507b3dee44a17ab8b8186fe4ec59c283af834191e8f3c902f1a: The request is not supported.
1>C:Users97254.nugetpackagesmicrosoft.visualstudio.azure.containers.tools.targets1.9.5buildContainer.targets(196,5): error CTC1015: (extra info: {"SystemType":"Container","Name":"d2c0ee48dbf44507b3dee44a17ab8b8186fe4ec59c283af834191e8f3c902f1a","Owner":"docker","IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\ProgramData\Docker\windowsfilter\d2c0ee48dbf44507b3dee44a17ab8b8186fe4ec59c283af834191e8f3c902f1a","Layers":[{"ID":"51d2ce01-5190-5398-8d36-73a41302a60e","Path":"C:\ProgramData\Docker\windowsfilter\47c9023ce74aa96d2e5002cdf0f7e354f5de55b217e08498dda14e1be5d6998f"},{"ID":"c6ab7d12-aab5-5873-9f2e-0ec11613a58d","Path":"C:\ProgramData\Docker\windowsfilter\035fac58f721cc9c158ef24fdb84a7e74eb4eea2cf29421a3744241cc62eabe7"},{"ID":"cdf32ccb-53d2-56ad-8b1d-9dfa6ae076d7","Path":"C:\ProgramData\Docker\windowsfilter\7ac67c0c7c4a6dfc2c0becbc948078b48873f003c49f16c1d0be0d69b179d3b3"},{"ID":"4c8a0736-dba8-5fde-8cc0-56aa33e0149d","Path":"C:\ProgramData\Docker\windowsfilter\43ad281ee856dabf07411276e5774b386bd37aee8b3099c3b7faa1d314a7013e"},{"ID":"af791769-1fd1-5170-b6e4-2245156a8f6f","Path":"C:\ProgramData\Docker\windowsfilter\878625f6c364e37ff07532212a6979f096de46d9eb455c964366ecd5a9c03ba9"},{"ID":"082795f2-b562-5088-a34f-91d16d7e5c36","Path":"C:\ProgramData\Docker\windowsfilter\4dbda25002ed56956709c11b4cc902083e712fc8a862b2b62970e839ec2bffec"},{"ID":"e409f795-d9cf-539a-ac95-dbedc3506ccb","Path":"C:\ProgramData\Docker\windowsfilter\7e2f83b59544b3cf2e553cdd9d94dd27a4684360c23f44d93a2b39e5dd0301cb"},{"ID":"a5fdd7a2-0ea0-553a-9c1e-976a729321e3","Path":"C:\ProgramData\Docker\windowsfilter\27f0f73d07810d0877a35fc1215e5336e6034eba1c08974f2f308796c9a32890"},{"ID":"b0175521-e8e7-55e8-97a8-77d96d2bb78a","Path":"C:\ProgramData\Docker\windowsfilter\03bb0041548802485ca5cc3a0475fde8905f048e50eb6b43413b1796b34773ad"},{"ID":"85e06fce-32e5-5cb6-8678-a4750186c146","Path":"C:\ProgramData\Docker\windowsfilter\8f5d06ad16da8edecc28898e8cda1ce15e4087e16e1b9a5d2d7a4d10a2c55398"}],"HostName":"d2c0ee48dbf4","MappedDirectories":[{"HostPath":"c:\users\97254\onecoremsvsmon\16.3.0039.0","ContainerPath":"c:\remote_debugger","ReadOnly":true,"BandwidthMaximum":0,"IOPSMaximum":0,"CreateInUtilityVM":false},{"HostPath":"c:\cheggtest\qfetcherapi","ContainerPath":"c:\app","ReadOnly":false,"BandwidthMaximum":0,"IOPSMaximum":0,"CreateInUtilityVM":false},{"HostPath":"c:\cheggtest","ContainerPath":"c:\src","ReadOnly":false,"BandwidthMaximum":0,"IOPSMaximum":0,"CreateInUtilityVM":false},{"HostPath":"c:\users\97254\.nuget\packages","ContainerPath":"c:\.nuget\fallbackpackages2","ReadOnly":false,"BandwidthMaximum":0,"IOPSMaximum":0,"CreateInUtilityVM":false},{"HostPath":"c:\program files\dotnet\sdk\nugetfallbackfolder","ContainerPath":"c:\.nuget\fallbackpackages","ReadOnly":false,"BandwidthMaximum":0,"IOPSMaximum":0,"CreateInUtilityVM":false}],"HvPartition":true,"EndpointList":["766EA888-2541-4D88-B330-EBD3ECA2FF64"],"HvRuntime":{"ImagePath":"C:\ProgramData\Docker\windowsfilter\8f5d06ad16da8edecc28898e8cda1ce15e4087e16e1b9a5d2d7a4d10a2c55398\UtilityVM"},"AllowUnqualifiedDNSQuery":true}).
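The "HvPartition":true in the extra info suggests Hyper-V isolation is involved. One way to narrow this down is to take Visual Studio out of the loop and start a container from the same Windows base image by hand, once with each isolation mode (the image tag below is only an assumed example of the ASP.NET Core base image; substitute the one from your Dockerfile):

docker run --rm -it --isolation=hyperv mcr.microsoft.com/dotnet/core/aspnet:3.1 cmd
docker run --rm -it --isolation=process mcr.microsoft.com/dotnet/core/aspnet:3.1 cmd

If only the hyperv variant reproduces the hcsshim "request is not supported" error, the problem is the host's virtualization/isolation setup rather than the project itself.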
Contents
- Exit Codes In Containers & Kubernetes – The Complete Guide
- Learning objective
- What are Container Exit Codes
- The Container Lifecycle
- Detect and fix errors 5x faster
- Understanding Container Exit Codes
- Exit Code 0: Purposely Stopped
- Exit Code 1: Application Error
- Exit Code 125: Container Failed to Run
- Exit Code 126: Command Invoke Error
- Exit Code 127: File or Directory Not Found
- Exit Code 128: Invalid Argument Used on Exit
- Exit Code 134: Abnormal Termination (SIGABRT)
- Exit Code 137: Immediate Termination (SIGKILL)
- Exit Code 139: Segmentation Fault (SIGSEGV)
- Exit Code 143: Graceful Termination (SIGTERM)
- Exit Code 255: Exit Status Out Of Range
- Which Kubernetes Errors are Related to Container Exit Codes?
- Troubleshooting Kubernetes Pod Termination with Komodor
- See Our Additional Guides on Key Observability Topics
Exit Codes In Containers & Kubernetes – The Complete Guide
Learning objective
What are Container Exit Codes
Exit codes are used by container engines, when a container terminates, to report why it was terminated.
If you are a Kubernetes user, container failures are one of the most common causes of pod exceptions, and understanding container exit codes can help you get to the root cause of pod failures when troubleshooting.
The most common exit codes used by containers are:
Code # | Name | What it means |
---|---|---|
Exit Code 0 | Purposely stopped | Used by developers to indicate that the container was automatically stopped |
Exit Code 1 | Application error | Container was stopped due to application error or incorrect reference in the image specification |
Exit Code 125 | Container failed to run error | The docker run command did not execute successfully |
Exit Code 126 | Command invoke error | A command specified in the image specification could not be invoked |
Exit Code 127 | File or directory not found | File or directory specified in the image specification was not found |
Exit Code 128 | Invalid argument used on exit | Exit was triggered with an invalid exit code (valid codes are integers between 0-255) |
Exit Code 134 | Abnormal termination (SIGABRT) | The container aborted itself using the abort() function. |
Exit Code 137 | Immediate termination (SIGKILL) | Container was immediately terminated by the operating system via SIGKILL signal |
Exit Code 139 | Segmentation fault (SIGSEGV) | Container attempted to access memory that was not assigned to it and was terminated |
Exit Code 143 | Graceful termination (SIGTERM) | Container received warning that it was about to be terminated, then terminated |
Exit Code 255 | Exit Status Out Of Range | Container exited, returning an exit code outside the acceptable range, meaning the cause of the error is not known |
Below we’ll explain how to troubleshoot failed containers on a self-managed host and in Kubernetes, and provide more details on all of the exit codes listed above.
This is part of an extensive series of guides about Observability.
The Container Lifecycle
To better understand the causes of container failure, let’s discuss the lifecycle of a container first. Taking Docker as an example – at any given time, a Docker container can be in one of several states:
- Created – the Docker container is created but not started yet (this is the status after running docker create, but before actually running the container)
- Up – the Docker container is currently running. This means the operating system process managed by the container is running. This happens when you use the docker start or docker run commands.
- Paused – the container process was running, but Docker purposely paused the container. Typically this happens when you run the Docker pause command
- Exited – the Docker container has been terminated, usually because the container’s process was killed
When a container reaches the Exited status, Docker will report an exit code in the logs, to inform you what happened to the container that caused it to shut down.
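To see these states and the final exit code on a self-managed Docker host, you can list containers and inspect the one that stopped; a minimal sketch using standard Docker CLI commands:

docker ps -a --format 'table {{.Names}}\t{{.Status}}'
docker inspect --format '{{.State.Status}} {{.State.ExitCode}}' <container-name>

The Status column of docker ps already shows the code inline, for example "Exited (137) 2 minutes ago".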
Detect and fix errors 5x faster
Komodor monitors your entire K8s stack, identifies issues, and uncovers their root cause.
Understanding Container Exit Codes
Below we cover each of the exit codes in more detail.
Exit Code 0: Purposely Stopped
Exit Code 0 is triggered by developers when they purposely stop their container after a task completes. Technically, Exit Code 0 means that the foreground process is not attached to a specific container.
What to do if a container terminated with Exit Code 0?
- Check the container logs to identify which library caused the container to exit
- Review the code of the existing library and identify why it triggered Exit Code 0, and whether it is functioning correctly
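As a quick illustration (a sketch assuming a Linux Docker host and the public alpine image), a container whose main process simply finishes its work reports Exit Code 0:

docker run --name exit0-demo alpine true
docker inspect --format '{{.State.ExitCode}}' exit0-demo
docker rm exit0-demo

The true command does nothing and exits successfully, so the inspect command prints 0.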
Exit Code 1: Application Error
Exit Code 1 indicates that the container was stopped due to one of the following:
- An application error – this could be a simple programming error in code run by the container, such as “divide by zero”, or advanced errors related to the runtime environment, such as Java, Python, etc
- An invalid reference – this means the image specification refers to a file that does not exist in the container image
What to do if a container terminated with Exit Code 1?
- Check the container log to see if one of the files listed in the image specification could not be found. If this is the issue, correct the image specification to point to the correct path and filename.
- If you cannot find an incorrect file reference, check the container logs for an application error, and debug the library that caused the error.
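For example (again assuming the public alpine image), you can reproduce an application-style failure and read the code back from Docker:

docker run --name exit1-demo alpine sh -c 'exit 1'
docker inspect --format '{{.State.ExitCode}}' exit1-demo
docker logs exit1-demo
docker rm exit1-demo

Here the inspect command prints 1, and docker logs is where you would look for the actual application error.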
Exit Code 125: Container Failed to Run
Exit Code 125 means that the command used to run the container (for example docker run) was invoked in the system shell but did not execute successfully. Here are common reasons this might happen:
- An undefined flag was used in the command, for example docker run --abcd
- The user defined in the image specification does not have sufficient permissions on the machine
- Incompatibility between the container engine and the host operating system or hardware
What to do if a container terminated with Exit Code 125?
- Check if the command used to run the container uses the proper syntax
- Check if the user running the container, or the context in which the command is executed in the image specification, has sufficient permissions to create containers on the host
- If your container engine provides other options for running a container, try them. For example, in Docker, try docker start instead of docker run
- Test if you are able to run other containers on the host using the same username or context. If not, reinstall the container engine, or resolve the underlying compatibility issue between the container engine and the host setup
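A quick way to see Exit Code 125 in action (a sketch assuming a shell on a Docker host and the public alpine image) is to pass an unknown flag, so the docker run command itself fails before any container process starts:

docker run --abcd alpine echo hello
echo $?

The second command prints 125; the error came from the Docker CLI/daemon, not from anything inside the container.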
Exit Code 126: Command Invoke Error
Exit Code 126 means that a command used in the container specification could not be invoked. This is often caused by a missing dependency or by an error in a continuous integration script used to run the container.
What to do if a container terminated with Exit Code 126?
- Check the container logs to see which command could not be invoked
- Try running the container specification without the command to ensure you isolate the problem
- Troubleshoot the command to ensure you are using the correct syntax and all dependencies are available
- Correct the container specification and retry running the container
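To see how 126 differs from 125, you can ask Docker to run something inside the container that exists but cannot be executed (a sketch assuming the public alpine image):

docker run alpine /etc/passwd
echo $?

The file exists but is not executable, so the start fails with "permission denied" and the shell reports 126.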
Exit Code 127: File or Directory Not Found
Exit Code 127 means a command specified in the container specification refers to a non-existent file or directory.
What to do if a container terminated with Exit Code 127?
Same as Exit Code 126, identify the failing command and make sure you reference a valid filename and file path available within the container image.
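Similarly, referencing a command that simply does not exist in the image produces 127 (a sketch assuming the public alpine image):

docker run alpine no-such-command
echo $?

Docker reports "executable file not found in $PATH" and the shell prints 127.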
Exit Code 128: Invalid Argument Used on Exit
Exit Code 128 means that code within the container triggered an exit command, but did not provide a valid exit code. The Linux exit command only allows integers between 0-255, so if the process was exited with, for example, exit code 3.5, the logs will report Exit Code 128.
What to do if a container terminated with Exit Code 128?
- Check the container logs to identify which library caused the container to exit.
- Identify where the offending library uses the exit command, and correct it to provide a valid exit code.
Exit Code 134: Abnormal Termination (SIGABRT)
Exit Code 134 means that the container abnormally terminated itself, closed the process and flushed open streams. This operation is irreversible, like SIGKILL (see Exit Code 137 below). A process can trigger SIGABRT by doing one of the following:
- Calling the abort() function in the libc library
- Calling the assert() macro, used for debugging. The process is then aborted if the assertion is false.
What to do if a container terminated with Exit Code 134?
- Check container logs to see which library triggered the SIGABRT signal
- Check if process abortion was planned (for example because the library was in debug mode), and if not, troubleshoot the library and modify it to avoid aborting the container.
Exit Code 137: Immediate Termination (SIGKILL)
Exit Code 137 means that the container has received a SIGKILL signal from the host operating system. This signal instructs a process to terminate immediately, with no grace period. This can be either:
- Triggered when a container is killed via the container engine, for example when using the docker kill command
- Triggered by a Linux user sending a kill -9 command to the process
- Triggered by Kubernetes after attempting to terminate a container and waiting for a grace period of 30 seconds (by default)
- Triggered automatically by the host, usually due to running out of memory. In this case, the docker inspect command will indicate an OOMKilled error.
What to do if a container terminated with Exit Code 137?
- Check logs on the host to see what happened prior to the container terminating, and whether it previously received a SIGTERM signal (graceful termination) before receiving SIGKILL
- If there was a prior SIGTERM signal, check if your container process handles SIGTERM and is able to gracefully terminate
- If there was no SIGTERM and the container reported an OOMKilled error, troubleshoot memory issues on the host
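Two hedged ways to see 137 on a Docker host (the container name and image below are examples, assuming the public alpine image): kill a running container explicitly, or give it a memory limit low enough to trigger the OOM killer.

docker run -d --name kill-demo alpine sleep 300
docker kill kill-demo
docker inspect --format 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}}' kill-demo
docker rm kill-demo

docker kill sends SIGKILL, so inspect reports exit=137 with oom=false; after a genuine out-of-memory kill the exit code is the same but OOMKilled is true.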
Exit Code 139: Segmentation Fault (SIGSEGV)
Exit Code 139 means that the container received a SIGSEGV signal from the operating system. This indicates a segmentation error – a memory violation, caused by a container trying to access a memory location to which it does not have access. There are three common causes of SIGSEGV errors:
- Coding error—container process did not initialize properly, or it tried to access memory through a pointer to previously freed memory
- Incompatibility between binaries and libraries—container process runs a binary file that is not compatible with a shared library, and thus may try to access inappropriate memory addresses
- Hardware incompatibility or misconfiguration—if you see multiple segmentation errors across multiple libraries, there may be a problem with memory subsystems on the host or a system configuration issue
What to do if a container terminated with Exit Code 139?
- Check if the container process handles SIGSEGV. On both Linux and Windows, you can handle a container’s response to segmentation violations. For example, the container can collect and report a stack trace
- If you need to further troubleshoot SIGSEGV, you may need to set the operating system to allow programs to run even after a segmentation fault occurs, to allow for investigation and debugging. Then, try to intentionally cause a segmentation violation and debug the library causing the issue
- If you cannot replicate the issue, check memory subsystems on the host and troubleshoot memory configuration
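If you just want to confirm how a segmentation fault surfaces in Docker, one hedged way (assuming the public python:3-alpine image) is to force a read from address zero:

docker run --name segv-demo python:3-alpine python3 -c "import ctypes; ctypes.string_at(0)"
docker inspect --format '{{.State.ExitCode}}' segv-demo
docker rm segv-demo

The null-pointer read triggers SIGSEGV and inspect prints 139 (128 + 11, the signal number of SIGSEGV).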
Exit Code 143: Graceful Termination (SIGTERM)
Exit Code 143 means that the container received a SIGTERM signal from the operating system, which asks the container to gracefully terminate, and the container succeeded in gracefully terminating (otherwise you will see Exit Code 137). This exit code can be:
- Triggered by the container engine stopping the container, for example when using the docker stop or docker-compose down commands
- Triggered by Kubernetes setting a pod to Terminating status, and giving containers a 30 second period to gracefully shut down
What to do if a container terminated with Exit Code 143?
Check host logs to see the context in which the operating system sent the SIGTERM signal. If you are using Kubernetes, check the kubelet logs to see if and when the pod was shut down.
In general, Exit Code 143 does not require troubleshooting. It means the container was properly shut down after being instructed to do so by the host.
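A minimal sketch (assuming a Linux Docker host and the public alpine image) of how 143 shows up in practice, and of how it relates to 137:

docker run -d --init --name term-demo alpine sleep 300
docker stop term-demo
docker inspect --format '{{.State.ExitCode}}' term-demo
docker rm term-demo

With --init, the init process forwards SIGTERM to sleep, which terminates, and inspect prints 143 (128 + 15). Without --init, a PID 1 like sleep that installs no handler ignores SIGTERM, docker stop falls back to SIGKILL after the grace period, and you see 137 instead.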
Exit Code 255: Exit Status Out Of Range
When you see exit code 255, it implies the main entrypoint of a container stopped with that status. It means that the container stopped, but it is not known for what reason.
What to do if a container terminated with Exit Code 255?
- If the container is running in a virtual machine, first try removing overlay networks configured on the virtual machine and recreating them.
- If this does not solve the problem, try deleting and recreating the virtual machine, then rerunning the container on it.
- Failing the above, bash into the container and examine logs or other clues about the entrypoint process and why it is failing.
Which Kubernetes Errors are Related to Container Exit Codes?
Whenever containers fail within a pod, or Kubernetes instructs a pod to terminate for any reason, containers will shut down with exit codes. Identifying the exit code can help you understand the underlying cause of a pod exception.
You can use the following command to view pod errors: kubectl describe pod [name]
The result will look something like this:
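An illustrative excerpt of the container status section (the values below are made up for the example):

    Containers:
      my-app:
        Last State:     Terminated
          Reason:       OOMKilled
          Exit Code:    137
        Restart Count:  3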
Use the Exit Code provided by kubectl to troubleshoot the issue:
- If the Exit Code is 0 – the container exited normally, no troubleshooting is required
- If the Exit Code is between 1-128 – the container terminated due to an internal error, such as a missing or invalid command in the image specification
- If the Exit Code is between 129-255 – the container was stopped as the result of an operating system signal, such as SIGKILL or SIGINT
- If the Exit Code was exit(-1) or another value outside the 0-255 range, kubectl translates it to a value within the 0-255 range.
Refer to the relevant section above to see how to troubleshoot the container for each exit code.
Troubleshooting Kubernetes Pod Termination with Komodor
As a Kubernetes administrator or user, pods or containers terminating unexpectedly can be a pain and can result in severe production issues. The troubleshooting process in Kubernetes is complex and, without the right tools, can be stressful, ineffective, and time-consuming.
Some best practices can help minimize the chances of container failure affecting your applications, but eventually, something will go wrong—simply because it can.
This is the reason why we created Komodor, a tool that helps dev and ops teams stop wasting their precious time looking for needles in (hay)stacks every time things go wrong.
Acting as a single source of truth (SSOT) for all of your k8s troubleshooting needs, Komodor offers:
- Change intelligence: Every issue is a result of a change. Within seconds we can help you understand exactly who did what and when.
- In-depth visibility: A complete activity timeline, showing all code and config changes, deployments, alerts, code diffs, pod logs, and more. All within one pane of glass with easy drill-down options.
- Insights into service dependencies: An easy way to understand cross-service changes and visualize their ripple effects across your entire system.
- Seamless notifications: Direct integration with your existing communication channels (e.g., Slack) so you’ll have all the information you need, when you need it.
See Our Additional Guides on Key Observability Topics
Together with our content partners, we have authored in-depth guides on several other topics that can also be useful as you explore the world of observability.
Issue created Apr 19, 2017
gitlab-ci $CI_REGISTRY_IMAGE is not set
Summary
Maybe we are doing something wrong, but, already tried a lot to get this working and the $CI_REGISTRY_IMAGE environment variable is not being set when CI runs.
Steps to reproduce
Use the docker sample .gitlab-ci.yml file:
image: docker:latest
services:
  - docker:dind

before_script:
  - docker info
  - export

build:
  stage: build
  script:
    - docker login -u "gitlab-ci-token" -p "$CI_JOB_TOKEN" $CI_REGISTRY
    - docker build --pull -t "$CI_REGISTRY_IMAGE:latest" .
    - docker push "$CI_REGISTRY_IMAGE:latest"
The export command in before_script does not show the CI_REGISTRY_IMAGE variable. Copied the portion where it should be set.
export CI_PROJECT_URL='http://gitlab.sample/group/sample'
export CI_REGISTRY='docker.sampleserver.net'
export CI_REGISTRY_PASSWORD='xxxxxxxxxxxxxxxxxxxx'
export CI_REGISTRY_USER='gitlab-ci-token'
export CI_REPOSITORY_URL='http://gitlab-ci-token:xxxxxxxxxxxxxxxxxxxx@gitlab.sample/group/sample.git'
export CI_RUNNER_DESCRIPTION='Shared Runner'
What is the current bug behavior?
When running the build step it gives an error:
$ docker login -u "gitlab-ci-token" -p "$CI_JOB_TOKEN" $CI_REGISTRY
Login Succeeded
$ docker build --pull -t "$CI_REGISTRY_IMAGE:latest" .
invalid argument ":latest" for t: invalid reference format
See 'docker build --help'.
ERROR: Job failed: exit code 125
What is the expected correct behavior?
It should set CI_REGISTRY_IMAGE
Output of checks
docker info:
$ docker info
WARNING: No swap limit support
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 17.04.0-ce
Storage Driver: vfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary:
containerd version: 422e31ce907fd9c3833a38d7b8fdd023e5a76e73
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-62-generic
Operating System: Alpine Linux v3.5 (containerized)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 488.4MiB
Name: dab683fbc13e
ID: VVCA:N2CB:WYHK:IW6F:ICM7:S224:4TNR:OX75:2BM5:PW4P:3IZE:RWFG
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Results of GitLab environment info
root@git:/etc/gitlab# sudo gitlab-rake gitlab:env:info
System information
System: Ubuntu 14.04
Current User: git
Using RVM: no
Ruby Version: 2.3.3p222
Gem Version: 2.6.6
Bundler Version:1.13.7
Rake Version: 10.5.0
Redis Version: 3.2.5
Git Version: 2.10.2
Sidekiq Version:4.2.7
GitLab information
Version: 9.0.5
Revision: a6b9899d
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
URL: http://***********
HTTP Clone URL: http:///some-group/some-project.git
SSH Clone URL: git@:some-group/some-project.git
Using LDAP: yes
Using Omniauth: no
GitLab Shell
Version: 5.0.0
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks/
Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
root@git:/etc/gitlab# sudo gitlab-rake gitlab:check SANITIZE=true
Checking GitLab Shell …
GitLab Shell version >= 5.0.0 ? … OK (5.0.0)
Repo base directory exists?
default… yes
Repo storage directories are symlinks?
default… no
Repo paths owned by git:git?
default… yes
Repo paths access is drwxrws---?
default… yes
hooks directories in repos are links: …
4/1 … ok
10/4 … ok
3/5 … ok
12/14 … ok
12/15 … ok
12/16 … ok
3/18 … ok
10/19 … ok
16/23 … ok
12/24 … ok
16/27 … ok
17/30 … ok
17/31 … ok
18/32 … ok
18/33 … ok
18/34 … ok
18/35 … ok
20/43 … ok
4/44 … ok
16/45 … ok
3/51 … ok
23/52 … ok
2/58 … repository is empty
20/60 … ok
3/61 … ok
3/62 … ok
24/63 … ok
27/64 … ok
27/65 … ok
27/66 … ok
27/67 … ok
27/68 … ok
27/69 … ok
15/70 … ok
15/71 … ok
35/73 … ok
35/74 … ok
12/75 … ok
12/76 … ok
12/77 … ok
37/78 … ok
37/79 … ok
37/80 … ok
39/81 … ok
15/82 … ok
3/83 … ok
2/84 … ok
12/85 … ok
3/86 … ok
40/87 … ok
40/88 … ok
3/89 … ok
3/90 … ok
3/91 … ok
3/92 … ok
3/93 … ok
41/94 … ok
42/95 … ok
18/96 … ok
44/97 … ok
45/98 … ok
45/99 … ok
45/100 … ok
45/101 … ok
45/102 … ok
45/103 … ok
45/104 … ok
45/105 … ok
45/106 … ok
45/107 … ok
45/108 … ok
45/109 … ok
45/110 … ok
46/111 … ok
4/112 … ok
3/113 … ok
2/114 … ok
47/115 … ok
27/116 … ok
18/117 … ok
3/118 … repository is empty
48/120 … ok
48/121 … ok
48/122 … ok
48/123 … ok
48/124 … ok
2/125 … ok
12/126 … ok
48/127 … ok
48/128 … ok
48/129 … ok
4/130 … ok
47/131 … repository is empty
4/132 … ok
45/133 … ok
48/135 … ok
40/136 … ok
48/137 … ok
51/138 … ok
52/139 … ok
4/140 … ok
4/141 … ok
4/142 … ok
4/143 … repository is empty
48/144 … ok
45/145 … ok
2/148 … repository is empty
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Check GitLab API access: OK
Access to /var/opt/gitlab/.ssh/authorized_keys: OK
Send ping to redis server: OK
gitlab-shell self-check successful
Checking GitLab Shell … Finished
Checking Sidekiq …
Running? … yes
Number of Sidekiq processes … 1
Checking Sidekiq … Finished
Checking Reply by email …
Reply by email is disabled in config/gitlab.yml
Checking Reply by email … Finished
Checking LDAP …
Server: ldapmain
LDAP authentication… Success
LDAP users with access to your GitLab server (only showing the first 100 results)
SANITIZED
Checking LDAP … Finished
Checking GitLab …
Git configured with autocrlf=input? … yes
Database config exists? … yes
All migrations up? … yes
Database contains orphaned GroupMembers? … no
GitLab config exists? … yes
GitLab config outdated? … no
Log directory writable? … yes
Tmp directory writable? … yes
Uploads directory setup correctly? … yes
Init script exists? … skipped (omnibus-gitlab has no init script)
Init script up-to-date? … skipped (omnibus-gitlab has no init script)
projects have namespace: …
4/1 … yes
10/4 … yes
3/5 … yes
12/14 … yes
12/15 … yes
12/16 … yes
3/18 … yes
10/19 … yes
16/23 … yes
12/24 … yes
16/27 … yes
17/30 … yes
17/31 … yes
18/32 … yes
18/33 … yes
18/34 … yes
18/35 … yes
20/43 … yes
4/44 … yes
16/45 … yes
3/51 … yes
23/52 … yes
2/58 … yes
20/60 … yes
3/61 … yes
3/62 … yes
24/63 … yes
27/64 … yes
27/65 … yes
27/66 … yes
27/67 … yes
27/68 … yes
27/69 … yes
15/70 … yes
15/71 … yes
35/73 … yes
35/74 … yes
12/75 … yes
12/76 … yes
12/77 … yes
37/78 … yes
37/79 … yes
37/80 … yes
39/81 … yes
15/82 … yes
3/83 … yes
2/84 … yes
12/85 … yes
3/86 … yes
40/87 … yes
40/88 … yes
3/89 … yes
3/90 … yes
3/91 … yes
3/92 … yes
3/93 … yes
41/94 … yes
42/95 … yes
18/96 … yes
44/97 … yes
45/98 … yes
45/99 … yes
45/100 … yes
45/101 … yes
45/102 … yes
45/103 … yes
45/104 … yes
45/105 … yes
45/106 … yes
45/107 … yes
45/108 … yes
45/109 … yes
45/110 … yes
46/111 … yes
4/112 … yes
3/113 … yes
2/114 … yes
47/115 … yes
27/116 … yes
18/117 … yes
3/118 … yes
48/120 … yes
48/121 … yes
48/122 … yes
48/123 … yes
48/124 … yes
2/125 … yes
12/126 … yes
48/127 … yes
48/128 … yes
48/129 … yes
4/130 … yes
47/131 … yes
4/132 … yes
45/133 … yes
48/135 … yes
40/136 … yes
48/137 … yes
51/138 … yes
52/139 … yes
4/140 … yes
4/141 … yes
4/142 … yes
4/143 … yes
48/144 … yes
45/145 … yes
2/148 … yes
Redis version >= 2.8.0? … yes
Ruby version >= 2.1.0 ? … yes (2.3.3)
Your git bin path is "/opt/gitlab/embedded/bin/git"
Git version >= 2.7.3 ? … yes (2.10.2)
Active users: 20
Checking GitLab … Finished
Possible fixes
For the time being we solved the problem using:
docker build --pull -t "$CI_REGISTRY/$CI_PROJECT_PATH:latest" .
docker push "$CI_REGISTRY/$CI_PROJECT_PATH:latest"
$ docker info
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 2
Server Version: swarm/1.2.8
Role: primary
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint, whitelist
Nodes: 2
(unknown): 192.168.99.101:2376
└ ID:
└ Status: Pending
└ Containers: 0
└ Reserved CPUs: 0 / 0
└ Reserved Memory: 0 B / 0 B
└ Labels:
└ UpdatedAt: 2018-01-19T15:05:48Z
└ ServerVersion:
swarm-manager: 192.168.99.100:2376
└ ID: EWQQ:VGZY:YGCQ:GMWV:R5VI:XDZA:ZCG6:RHDQ:MJQW:UWHD:RXC5:GIUA|192.168.99.100:2376
└ Status: Healthy
└ Containers: 2 (2 Running, 0 Paused, 0 Stopped)
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 1.021 GiB
└ Labels: kernelversion=4.4.111-boot2docker, operatingsystem=Boot2Docker 18.01.0-ce (TCL 8.2.1); HEAD : 0bb7bbd - Thu Jan 11 16:32:39 UTC 2018, ostype=linux, provider=virtualbox, storagedriver=aufs
└ UpdatedAt: 2018-01-19T15:05:48Z
└ ServerVersion: 18.01.0-ce
Plugins:
Volume:
Network:
Log:
Swarm:
NodeID:
Is Manager: false
Node Address:
Kernel Version: 4.4.111-boot2docker
Operating System: linux
Architecture: amd64
CPUs: 1
Total Memory: 1.021GiB
Name: 2c7fad8d8f61
Docker Root Dir:
Debug Mode (client): false
Debug Mode (server): false
Experimental: false
Live Restore Enabled: false
WARNING: No kernel memory limit support
Summary
Unable to docker push with Bamboo as the docker login command fails with exit code: 125. Example:
Caused by: com.atlassian.utils.process.ProcessException: Error executing '/usr/bin/docker login -u bamboo --password-stdin docker-registry.com:5002', exit code: 125
at com.atlassian.bamboo.plugins.docker.process.DefaultDockerProcessService.execute(DefaultDockerProcessService.java:66)
at com.atlassian.bamboo.plugins.docker.process.DefaultDockerProcessService.executeSilently(DefaultDockerProcessService.java:76)
Environment
- Docker version < 17.07 (e.g. Docker 1.13) installed on the agent performing the build
- Bamboo 6.9.0 or more recent
Diagnosis
Running the below commands in a terminal on the agent server can help validate the presence of the issue:
Running the same command that Bamboo uses:
$ /usr/bin/docker login -u bamboo --password-stdin docker-registry.com:5002
..
> `unknown flag: --password-stdin`
Verifying the docker version:
$ docker --version
Docker version 1.13.1, build 64e9980/1.13.1
Cause
- Versions of docker before 17.07 do not support the argument --password-stdin
- Bamboo 6.9.0 and more recent only support Docker 17.07: Bamboo 6.9 - Supported Platforms
Solution
Solution #1
Upgrade Docker to a supported version on your Bamboo agents. At time of writing, this is Docker 17.07.
Solution #2
Switch the authentication type on the Docker task in Bamboo to Use the agent’s native credentials and ensure that docker login has been performed manually on each agent as the same user that runs the agent (this will cause the credentials to be stored, base64 encoded, in $HOME/.docker/config.json for the user running the agent). You may also choose to configure a credential store as described in the docker login documentation:
- https://docs.docker.com/engine/reference/commandline/login/
Solution #3
Convert your Docker push task into a Bamboo Script task and manually write the command using a syntax (--password instead of --password-stdin) that is compatible with your Docker version. Example:
/usr/bin/docker login -u bamboo --password ${bamboo.dockerPassword} docker-registry.com:5002
/usr/bin/docker push [OPTIONS] NAME[:TAG]
- Be sure to define a dockerPassword variable in your build variables to use here.
I have a simple Flask application running in a Docker container; development happens in a GitLab repository. We decided to simplify pushing changes to production as much as possible: make a commit, push, and everything else happens automatically (CI should build the image, upload it to the registry, connect to Docker on the production server, and update the container).
Here is the gitlab-ci.yml right away:
image: docker:18.09.7

services:
  - docker:18.09.7-dind

stages:
  - build
  - deploy

before_script:
  - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN $CI_REGISTRY

build:
  only:
    - pushes
  stage: build
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA -t $CI_REGISTRY_IMAGE:latest .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
    - docker push $CI_REGISTRY_IMAGE:latest

deploy:
  only:
    - pushes
  stage: deploy
  before_script:
    - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN $CI_REGISTRY
    - cp $DOCKER_CACERT ~/.docker/ca.pem && cp $DOCKER_CERT ~/.docker/cert.pem && cp $DOCKER_KEY ~/.docker/key.pem
    - export DOCKER_HOST=tcp://MYSERVER.COM:2376 DOCKER_TLS_VERIFY=1
    - docker stop $CI_PROJECT_NAME || true
    - docker rm $CI_PROJECT_NAME || true
    - docker rmi $CI_REGISTRY_IMAGE:latest
    - docker pull $CI_REGISTRY_IMAGE:latest
  script:
    - docker run --name $CI_PROJECT_NAME -d -p 127.0.0.1:5000:5000 --restart unless-stopped -e DATABASE_URL="$DATABASE_URL" -e SECRET_KEY=$SECRET_KEY -e GOOGLE_OAUTH_CLIENT_ID=$GOOGLE_OAUTH_CLIENT_ID -e GOOGLE_OAUTH_CLIENT_SECRET=$GOOGLE_OAUTH_CLIENT_SECRET -e SENTRY_ENV=$SENTRY_ENV -e SENTRY_DSN="$SENTRY_DSN" $CI_REGISTRY_IMAGE:latest
Important points about the file:
- In before_script we log in to the local image registry
- only: pushes (the build and deploy are triggered by a push to the repository; otherwise, if you push 10 commits, 10 build pipelines will start)
- In the deploy->before_script section we log in again, but this time to our production server, using certificates that must be added as variables in the repository settings (type: file, 3 variables: $DOCKER_CACERT (contents of ca.pem), $DOCKER_CERT (contents of client.pem), $DOCKER_KEY (contents of client.key)). I wrote here about how to enable remote access and generate certificates for docker.
- Change the line "DOCKER_HOST=tcp://MYSERVER.COM:2376" to the correct address and API port of your remote docker server.
- In the script section a container is started from the freshly built image, with the environment variables your application needs; change them to match your software's requirements.
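Before relying on the pipeline, it is worth checking the remote Docker API connection by hand from any machine that holds the three certificate files; a hedged sketch (the host name is the placeholder from the config above, and docker expects the files to be named ca.pem, cert.pem and key.pem in ~/.docker, exactly as the deploy job copies them):

export DOCKER_HOST=tcp://MYSERVER.COM:2376 DOCKER_TLS_VERIFY=1
docker info

If docker info prints the remote daemon's details instead of the local one, the same environment will work from the CI job.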
-
#1
Using 7-1.12.
Backup of Ubuntu VM repeatedly fails with error 125 at 65% — same with stop and snapshot backups, same with local and two LAN storage drives. LXCs backup without error. Trying to clone the drive also fails. Any advice appreciated.
Code:
VMID NAME STATUS TIME SIZE FILENAME
105 Jammy FAILED 00:06:23 job failed with err -125 - Operation canceled
TOTAL 00:06:23 0KB
Detailed backup logs:
vzdump 105 --remove 0 --storage vm --node pve --compress zstd --mode stop --mailto *****@gmail.com
105: 2022-05-06 19:30:01 INFO: Starting Backup of VM 105 (qemu)
105: 2022-05-06 19:30:01 INFO: status = running
105: 2022-05-06 19:30:01 INFO: backup mode: stop
105: 2022-05-06 19:30:01 INFO: ionice priority: 7
105: 2022-05-06 19:30:01 INFO: VM Name: Jammy
105: 2022-05-06 19:30:01 INFO: include disk 'scsi0' 'vm:105/vm-105-disk-0.qcow2' 80G
105: 2022-05-06 19:30:01 INFO: include disk 'scsi1' 'vm:105/vm-105-disk-1.qcow2' 100G
105: 2022-05-06 19:30:01 INFO: stopping virtual guest
105: 2022-05-06 19:30:15 INFO: creating vzdump archive '/mnt/storage/vm/dump/vzdump-qemu-105-2022_05_06-19_30_01.vma.zst'
105: 2022-05-06 19:30:15 INFO: starting kvm to execute backup task
105: 2022-05-06 19:30:16 INFO: started backup task '0c50dc63-2ede-45b6-8c04-7a65b77db7f7'
105: 2022-05-06 19:30:16 INFO: resuming VM again after 15 seconds
105: 2022-05-06 19:30:19 INFO: 0% (1.2 GiB of 180.0 GiB) in 3s, read: 406.0 MiB/s, write: 351.9 MiB/s
105: 2022-05-06 19:30:22 INFO: 1% (2.3 GiB of 180.0 GiB) in 6s, read: 392.7 MiB/s, write: 350.0 MiB/s
105: 2022-05-06 19:30:31 INFO: 2% (3.7 GiB of 180.0 GiB) in 15s, read: 155.9 MiB/s, write: 154.1 MiB/s
105: 2022-05-06 19:30:36 INFO: 3% (5.6 GiB of 180.0 GiB) in 20s, read: 391.5 MiB/s, write: 365.9 MiB/s
105: 2022-05-06 19:36:09 INFO: 64% (115.2 GiB of 180.0 GiB) in 5m 53s, read: 452.2 MiB/s, write: 270.2 MiB/s
105: 2022-05-06 19:36:16 INFO: 65% (117.1 GiB of 180.0 GiB) in 6m, read: 272.4 MiB/s, write: 270.5 MiB/s
105: 2022-05-06 19:36:21 INFO: 65% (118.2 GiB of 180.0 GiB) in 6m 5s, read: 232.3 MiB/s, write: 187.3 MiB/s
105: 2022-05-06 19:36:21 ERROR: job failed with err -125 - Operation canceled
105: 2022-05-06 19:36:21 INFO: aborting backup job
105: 2022-05-06 19:36:21 INFO: resuming VM again
105: 2022-05-06 19:36:24 ERROR: Backup of VM 105 failed - job failed with err -125 - Operation canceled
Last edited: May 6, 2022
-
#2
job failed with err -125 - Operation canceled
This sounds like the backup job is stopped by your system for some reason, perhaps the OOM Killer? You could try to monitor memory/system resource usage during the backup, are there any abnormalities?
Still, some more information would be nice. Please attach your VM config (qm config <vmid>). Furthermore, journalctl may give you some more insights into what exactly is going wrong.
-
#3
I take it you mean the out-of-memory killer? I’m not sure how to monitor system resources, but it is an Intel i7 11th gen with 1 TB NVMe and 32 GB RAM, so not short on resources.
The qm config is:
root@pve:~# qm config 105
agent: 1,fstrim_cloned_disks=1
boot: order=scsi0;net0
cores: 8
memory: 8192
meta: creation-qemu=6.1.1,ctime=1649458020
name: Jammy
net0: virtio=CA:30:6A:32:38:51,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: vm:105/vm-105-disk-0.qcow2,discard=on,size=80G,ssd=1
scsi1: vm:105/vm-105-disk-1.qcow2,discard=on,size=100G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=15221f6b-f18a-4654-a900-66f27cef30a6
sockets: 1
vmgenid: 2c1e134c-88b9-4845-85da-ba6235c13330
journalctl shows:
May 06 20:56:03 pve kernel: blk_update_request: critical medium error, dev nvme0n1, sector 416706816 op 0x0:(READ) flags 0x4000 phys_seg 1 prio class 2
Which looks like a hardware error on the NVME drive?
-
#4
critical medium error, dev nvme0n1, sector 416706816 op 0x0:(READ) flags 0x4000 phys_seg 1 prio class 2
You are right, this very much sounds like a problem with your device.
Please run smartctl -a <device>; this should be able to tell you if something is wrong.
Last edited: May 6, 2022
-
#5
Code:
root@pve:~# smartctl -a /dev/nvme0n1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.30-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: ShiJi 1TB M.2-NVMe
Serial Number: 2021121400001
Firmware Version: U0520A0L
PCI Vendor/Subsystem ID: 0x126f
IEEE OUI Identifier: 0x000001
Total NVM Capacity: 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 000001 0000000000
Local Time is: Fri May 6 21:17:47 2022 NZST
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0007): Security Format Frmw_DL
Optional NVM Commands (0x0015): Comp DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x03): S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 83 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 9.00W - - 0 0 0 0 0 0
1 + 4.60W - - 1 1 1 1 0 0
2 + 3.80W - - 2 2 2 2 0 0
3 - 0.0250W - - 3 3 3 3 2200 3000
4 - 0.0120W - - 4 4 4 4 43000 43000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
I'm no expert here but this looks OK?
-
#6
SMART overall-health self-assessment test result: PASSED
seems to imply that your device is healthy. Though, when I run smartctl locally there are a lot more entries after === START OF SMART DATA SECTION ===. Especially entries after Error Information might be of note here. Did you accidentally truncate that?
What I meant by "monitoring system resources" would be to check e.g. how much RAM and CPU your system is using and if it is reaching some kind of threshold. You can check your memory usage simply with free; something more sophisticated like htop might give you some more insights though.
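A hedged way to keep an eye on this while the backup runs is to watch memory from another shell and check the kernel log for OOM events afterwards:

watch -n 5 free -m
journalctl -k | grep -i -E 'oom|out of memory'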
-
#7
The above has all the smartctl info, nil truncated. I removed the virtual 100 GB drive from the backup config and now I get a new error:
ERROR: Backup of VM 105 failed - job failed with err -61 - No data available
Googling this leads to several dead ends. Reboot does not change anything and I believe I have eliminated the LAN backup device as the problem.
The VM is working fine and all I have done since the last backup is add a couple of Docker apps. The VM has 8 GB allocated, plenty for what I have installed I think. Increasing it to 16 GB made no difference.
Trying to clone the VM gives error:
drive-scsi0: transferred 16.8 GiB of 80.0 GiB (20.99%) in 28s
drive-scsi0: Cancelling block job
drive-scsi0: Done.
TASK ERROR: clone failed: block job (mirror) error: drive-scsi0: ‘mirror’ has been cancelled
Following this advice:
https://supportportal.juniper.net/s…ntegrity-of-QCOW2-file-for-VNF?language=en_US
No errors were found. Rebooting to earlier kernel has not made any difference either.
Thanks for your input.
Last edited: May 6, 2022
-
#8
Somewhere in the midst of several reboots I am now on on version 7.2-3
-
#9
Ran smartctl -a /dev/nvme0n1 again - I must have missed the section below the first time.
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 45 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 12,164,103 [6.22 TB]
Data Units Written: 11,558,119 [5.91 TB]
Host Read Commands: 65,126,755
Host Write Commands: 143,713,681
Controller Busy Time: 4,762
Power Cycles: 53
Power On Hours: 249
Unsafe Shutdowns: 9
Media and Data Integrity Errors: 127
Error Information Log Entries: 127
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Thermal Temp. 1 Transition Count: 88
Thermal Temp. 1 Total Time: 1465
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
-
#10
Just to finish this off in case anyone is interested or has a similar problem I think the problem was bad blocks on the NVME drive.
badblocks -v /dev/nvme0n1 > ~/bad_sectors.txt found eight bad blocks. This site proved useful:
https://www.debugpoint.com/2020/07/scan-repair-bad-sector-disk-linux/
I created a new VM with same specs as old one, rsync’d my stuff over and the new VM is working and able to be backed up.
Last edited: May 9, 2022
-
- #1
Hi everyone,
here is my quick solution to the problem:
2>&1' with exit code '125': docker: conflicting options: cannot attach both user-defined and non-user-defined network-modes. See 'docker run --help'.
First of all, you don't need the extra argument --network xxx anymore. But first things first:
- Create the letsencrypt container like in the TDL video, only with the extra argument --cap-add=NET_ADMIN
- Create a new network (i.e. my-net) in bridge mode and put your letsencrypt and nextcloud containers both in here
- Putty or ShellinaBox:
- docker restart
- docker logs -f nameofyourletsencryptcontainer
- now it will take some time to create the certificate:
- It should show: Generating DH parameters, 2048 bit long safe prime, generator 2
This is going to take a long time
……
DH parameters successfully created - 2048 bits
IMPORTANT NOTES:
- Congratulations! Your certificate and chain have been saved at …
- docker restart
- Edit letsencrypt file according to your conditions: nextcloud.subdomain.conf
- Edit nextcloud file according to your conditions: config.php
- Check if all ports on your router are forwarded correctly
- restart nextcloud
For me, it works like a charm.
Regards
-
- #2
Hi,
I ran into the exact same problem and I am really grateful for your posted solution.
However, as a newbie, I am a bit unsure how to execute step 2: «Create a new network (i.E. my-net) in bridge mode and put your letsencrypt and nextcloud containers both in here»
Just to make sure: Part 1 of that step is executed through the docker UI, in the networks tab, right? I get that part. And the two containers I add by choosing the relevant network in the settings of the relevant containers?
I think I might have it right, just trying to verify before I start messing something up.
Thanks in advance!
-
- #3
I got the same error at first, so I removed the "my-net" argument and I was able to start the container. After that I could access the "server setup" page on my duckdns domain. What I want to know is: what is the deal with the my-net stuff, why do we use it? It worked for me without it. Do I still need it? I'm a bit confused.
-
- #4
You attach the containers to a customized docker network so Docker's internal DNS service can resolve the docker container names. This is needed so containers can interact with each other.
That it works for you means that the containers can find each other in some way. Either you edited the proxy-conf files to point at the ip address of your omv server, or you used docker-compose (which automatically joins containers in a network), or maybe some other method I am not aware of. As a sidenote: if you edited the proxy-confs to point at an ip address, it can cause your nginx reverse proxy to crash if the ip is not reachable (https://blog.linuxserver.io/20…rypt-nginx-starter-guide/)
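For anyone unsure how step 2 of the first post translates to the command line, a minimal sketch (assuming your containers are literally named letsencrypt and nextcloud):

docker network create -d bridge my-net
docker network connect my-net letsencrypt
docker network connect my-net nextcloud
docker network inspect my-net

The inspect command should list both containers under "Containers" once they are attached.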
-
- #5
(…)
-
- Edit nextcloud file according to your conditions: config.php
- Check if all ports on your router are forwarded correctly
- restart nextcloud
(…)
Hello,
thanks for your help! I almost got it done now. All the steps went smoothly until… for some reason I still can't get to nextcloud via the subdomain (https://mysubdomain.duckdns.org).
Despite double and triple checking the trusted domains section in the config.php and restarting the nextcloud container, I keep getting an "Access through untrusted domain" error. Does anyone have an idea why my duckdns subdomain still does not get trusted?
Update: I found my mistake. In my foolish first attempts I created two different folders for nextcloud, and I edited the config.php within the first, obsolete folder. So my modifications did not have any effect. It's all good now!
What are Container Exit Codes
Exit codes are used by container engines, when a container terminates, to report why it was terminated.
If you are a Kubernetes user, container failures are one of the most common causes of pod exceptions, and understanding container exit codes can help you get to the root cause of pod failures when troubleshooting.
The most common exit codes used by containers are:
Code # | Name | What it means |
---|---|---|
Exit Code 0 | Purposely stopped | Used by developers to indicate that the container was automatically stopped |
Exit Code 1 | Application error | Container was stopped due to application error or incorrect reference in the image specification |
Exit Code 125 | Container failed to run error | The docker run command did not execute successfully |
Exit Code 126 | Command invoke error | A command specified in the image specification could not be invoked |
Exit Code 127 | File or directory not found | File or directory specified in the image specification was not found |
Exit Code 128 | Invalid argument used on exit | Exit was triggered with an invalid exit code (valid codes are integers between 0-255) |
Exit Code 134 | Abnormal termination (SIGABRT) | The container aborted itself using the abort() function. |
Exit Code 137 | Immediate termination (SIGKILL) | Container was immediately terminated by the operating system via SIGKILL signal |
Exit Code 139 | Segmentation fault (SIGSEGV) | Container attempted to access memory that was not assigned to it and was terminated |
Exit Code 143 | Graceful termination (SIGTERM) | Container received warning that it was about to be terminated, then terminated |
Exit Code 255 | Exit Status Out Of Range | Container exited, returning an exit code outside the acceptable range, meaning the cause of the error is not known |
Below we’ll explain how to troubleshoot failed containers on a self-managed host and in Kubernetes, and provide more details on all of the exit codes listed above.
This is part of an extensive series of guides about Observability.
The Container Lifecycle
To better understand the causes of container failure, let’s discuss the lifecycle of a container first. Taking Docker as an example – at any given time, a Docker container can be in one of several states:
- Created – the Docker container is created but not started yet (this is the status after running docker create, but before actually running the container)
- Up – the Docker container is currently running. This means the operating system process managed by the container is running. This happens when you use the commands docker start or docker run can happen using docker start or docker run.
- Paused – the container process was running, but Docker purposely paused the container. Typically this happens when you run the Docker pause command
- Exited – the Docker container has been terminated, usually because the container’s process was killed
When a container reaches the Exited status, Docker will report an exit code in the logs, to inform you what happened to the container that caused it to shut down.
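As a quick way to see these states and the recorded exit code on a self-managed host, the following commands can be used (the container name below is a placeholder; the format strings are standard Docker templates):

```bash
# List all containers, including exited ones, with their current status
docker ps -a --format 'table {{.Names}}\t{{.Status}}'

# Read back the recorded state and exit code of a specific container
# ("mycontainer" is a placeholder name)
docker inspect --format '{{.State.Status}} (exit code {{.State.ExitCode}})' mycontainer
```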
Understanding Container Exit Codes
Below we cover each of the exit codes in more detail.
Exit Code 0: Purposely Stopped
Exit Code 0 is triggered by developers when they purposely stop their container after a task completes. Technically, Exit Code 0 means that the foreground process is not attached to a specific container.
What to do if a container terminated with Exit Code 0?
- Check the container logs to identify which library caused the container to exit
- Review the code of the existing library and identify why it triggered Exit Code 0, and whether it is functioning correctly
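For a minimal reproduction (container name and image are arbitrary), a container whose main process completes successfully records exit code 0:

```bash
# Run a container whose process finishes immediately and successfully
docker run --name demo-exit0 alpine true

# Inspect the recorded exit code – prints 0
docker inspect --format '{{.State.ExitCode}}' demo-exit0
```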
Exit Code 1: Application Error
Exit Code 1 indicates that the container was stopped due to one of the following:
- An application error – this could be a simple programming error in code run by the container, such as “divide by zero”, or advanced errors related to the runtime environment, such as Java, Python, etc
- An invalid reference – this means the image specification refers to a file that does not exist in the container image
What to do if a container terminated with Exit Code 1?
- Check the container log to see if one of the files listed in the image specification could not be found. If this is the issue, correct the image specification to point to the correct path and filename.
- If you cannot find an incorrect file reference, check the container logs for an application error, and debug the library that caused the error.
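A simple way to see this code on a self-managed host (names are placeholders) is to run a process that fails with a generic error:

```bash
# The containerized process exits with a non-zero application error
docker run --name demo-exit1 alpine sh -c 'exit 1'

# Inspect the recorded exit code – prints 1
docker inspect --format '{{.State.ExitCode}}' demo-exit1
```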
Exit Code 125: Container Failed to Run
Exit Code 125 means that the command used to run the container (for example, docker run) was invoked in the system shell but did not execute successfully. Here are common reasons this might happen:
- An undefined flag was used in the command, for example docker run --abcd
- The user defined in the image specification does not have sufficient permissions on the machine
- Incompatibility between the container engine and the host operating system or hardware
What to do if a container terminated with Exit Code 125?
- Check if the command used to run the container uses the proper syntax
- Check if the user running the container, or the context in which the command is executed in the image specification, has sufficient permissions to create containers on the host
- If your container engine provides other options for running a container, try them. For example, in Docker, try docker start instead of docker run
- Test if you are able to run other containers on the host using the same username or context. If not, reinstall the container engine, or resolve the underlying compatibility issue between the container engine and the host setup
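To see the difference between a container failure and a docker run failure, try passing an undefined flag; the docker CLI itself fails before any container is created:

```bash
# The flag --abcd does not exist, so docker run fails rather than the container
docker run --abcd alpine echo hello
echo $?   # 125 – the error is in the docker run command, not in the application
```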
Exit Code 126: Command Invoke Error
Exit Code 126 means that a command used in the container specification could not be invoked. This is often caused by a missing dependency or an error in a continuous integration script used to run the container.
What to do if a container terminated with Exit Code 126?
- Check the container logs to see which command could not be invoked
- Try running the container specification without the command to ensure you isolate the problem
- Troubleshoot the command to ensure you are using the correct syntax and all dependencies are available
- Correct the container specification and retry running the container
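One way to reproduce this code with a stock image (assuming the standard alpine image) is to ask Docker to execute a file that exists but is not executable:

```bash
# /etc/hosts exists inside the image but has no execute permission,
# so the specified command cannot be invoked
docker run alpine /etc/hosts
echo $?   # 126
```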
Exit Code 127: File or Directory Not Found
Exit Code 127 means a command specified in the container specification refers to a non-existent file or directory.
What to do if a container terminated with Exit Code 127?
Same as Exit Code 126, identify the failing command and make sure you reference a valid filename and file path available within the container image.
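For example (again assuming the standard alpine image), referencing a command that does not exist anywhere in the image produces this code:

```bash
# The command is not present in the image, so the lookup fails
docker run alpine no-such-command
echo $?   # 127
```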
Exit Code 128: Invalid Argument Used on Exit
Exit Code 128 means that code within the container triggered an exit command but did not provide a valid exit code. The Linux exit command only allows integers between 0-255, so if the process exited with, for example, exit code 3.5, the logs will report Exit Code 128.
What to do if a container terminated with Exit Code 128?
- Check the container logs to identify which library caused the container to exit.
- Identify where the offending library uses the exit command, and correct it to provide a valid exit code.
Exit Code 134: Abnormal Termination (SIGABRT)
Exit Code 134 means that the container abnormally terminated itself, closed the process and flushed open streams. This operation is irreversible, like SIGKILL (see Exit Code 137 below). A process can trigger SIGABRT by doing one of the following:
- Calling the abort() function in the libc library
- Calling the assert() macro, used for debugging. The process is then aborted if the assertion is false.
What to do if a container terminated with Exit Code 134?
- Check container logs to see which library triggered the SIGABRT signal
- Check if process abortion was planned (for example because the library was in debug mode), and if not, troubleshoot the library and modify it to avoid aborting the container.
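One way to reproduce just the exit code (names are placeholders; in a real incident the abort would come from the application calling abort() or failing an assert) is to send SIGABRT to the container's main process:

```bash
# Send SIGABRT (signal 6) to the shell running as the container's main process
docker run --name demo-sigabrt alpine sh -c 'kill -6 $$'

# Inspect the recorded exit code – prints 134 (128 + 6)
docker inspect --format '{{.State.ExitCode}}' demo-sigabrt
```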
Exit Code 137: Immediate Termination (SIGKILL)
Exit Code 137 means that the container has received a SIGKILL signal from the host operating system. This signal instructs a process to terminate immediately, with no grace period. This can be either:
- Triggered when a container is killed via the container engine, for example when using the docker kill command
- Triggered by a Linux user sending a kill -9 command to the process
- Triggered by Kubernetes after attempting to terminate a container and waiting for a grace period of 30 seconds (by default)
- Triggered automatically by the host, usually due to running out of memory. In this case, the docker inspect command will indicate an OOMKilled error.
What to do if a container terminated with Exit Code 137?
- Check logs on the host to see what happened prior to the container terminating, and whether it previously received a SIGTERM signal (graceful termination) before receiving SIGKILL
- If there was a prior SIGTERM signal, check if your container process handles SIGTERM and is able to gracefully terminate
- If there was no SIGTERM and the container reported an OOMKilled error, troubleshoot memory issues on the host
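To reproduce the engine-initiated case and see how the OOMKilled flag separates it from an out-of-memory kill (names are placeholders):

```bash
# Start a long-running container, then kill it with no grace period
docker run -d --name demo-sigkill alpine sleep 300
docker kill demo-sigkill

# Prints: exit=137 oomkilled=false
# OOMKilled would be true if the host had killed the container due to memory pressure
docker inspect --format 'exit={{.State.ExitCode}} oomkilled={{.State.OOMKilled}}' demo-sigkill
```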
Learn more in our detailed guide to the SIGKILL signal >>
Exit Code 139: Segmentation Fault (SIGSEGV)
Exit Code 139 means that the container received a SIGSEGV signal from the operating system. This indicates a segmentation error – a memory violation, caused by a container trying to access a memory location to which it does not have access. There are three common causes of SIGSEGV errors:
- Coding error—container process did not initialize properly, or it tried to access memory through a pointer to previously freed memory
- Incompatibility between binaries and libraries—container process runs a binary file that is not compatible with a shared library, and thus may try to access inappropriate memory addresses
- Hardware incompatibility or misconfiguration—if you see multiple segmentation errors across multiple libraries, there may be a problem with memory subsystems on the host or a system configuration issue
What to do if a container terminated with Exit Code 139?
- Check if the container process handles SIGSEGV. On both Linux and Windows, you can handle a container’s response to segmentation violations. For example, the container can collect and report a stack trace
- If you need to further troubleshoot SIGSEGV, you may need to set the operating system to allow programs to run even after a segmentation fault occurs, to allow for investigation and debugging. Then, try to intentionally cause a segmentation violation and debug the library causing the issue
- If you cannot replicate the issue, check memory subsystems on the host and troubleshoot memory configuration
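If it helps to see the exit code in isolation (names are placeholders; an actual segmentation fault would come from faulty native code rather than a manually sent signal):

```bash
# Send SIGSEGV (signal 11) to the container's main process to simulate a segfault
docker run --name demo-sigsegv alpine sh -c 'kill -11 $$'

# Inspect the recorded exit code – prints 139 (128 + 11)
docker inspect --format '{{.State.ExitCode}}' demo-sigsegv
```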
Learn more in our detailed guide to the SIGSEGV signal >>
Exit Code 143: Graceful Termination (SIGTERM)
Exit Code 143 means that the container received a SIGTERM signal from the operating system, which asks the container to gracefully terminate, and the container succeeded in gracefully terminating (otherwise you will see Exit Code 137). This exit code can be:
- Triggered by the container engine stopping the container, for example when using the docker stop or docker-compose down commands
- Triggered by Kubernetes setting a pod to Terminating status, and giving containers a 30 second period to gracefully shut down
What to do if a container terminated with Exit Code 143?
Check host logs to see the context in which the operating system sent the SIGTERM signal. If you are using Kubernetes, check the kubelet logs to see if and when the pod was shut down.
In general, Exit Code 143 does not require troubleshooting. It means the container was properly shut down after being instructed to do so by the host.
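A minimal reproduction on a self-managed host (names are placeholders) shows the stop sequence ending in 143; if the process ignored SIGTERM, the engine would escalate to SIGKILL and you would see 137 instead:

```bash
# Start a long-running container, then stop it gracefully
docker run -d --name demo-sigterm alpine sleep 300
docker stop demo-sigterm   # sends SIGTERM, then SIGKILL only if the grace period expires

# Inspect the recorded exit code – prints 143 (128 + 15)
docker inspect --format '{{.State.ExitCode}}' demo-sigterm
```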
Learn more in our detailed guide to the SIGTERM signal >>
Exit Code 255: Exit Status Out Of Range
When you see exit code 255, it implies the main entrypoint of a container stopped with that status. It means that the container stopped, but it is not known for what reason.
What to do if a container terminated with Exit Code 255?
- If the container is running in a virtual machine, first try removing overlay networks configured on the virtual machine and recreating them.
- If this does not solve the problem, try deleting and recreating the virtual machine, then rerunning the container on it.
- Failing the above, bash into the container and examine logs or other clues about the entrypoint process and why it is failing.
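As a sanity check (names are placeholders), any entrypoint that ends with status 255 is recorded as-is, which is why the code by itself says little about the cause:

```bash
# The containerized process exits with status 255
docker run --name demo-exit255 alpine sh -c 'exit 255'

# Inspect the recorded exit code – prints 255
docker inspect --format '{{.State.ExitCode}}' demo-exit255
```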
Which Kubernetes Errors are Related to Container Exit Codes?
Whenever containers fail within a pod, or Kubernetes instructs a pod to terminate for any reason, containers will shut down with exit codes. Identifying the exit code can help you understand the underlying cause of a pod exception.
You can use the following command to view pod errors: kubectl describe pod [name]
The result will look something like this:
Containers:
  kubedns:
    Container ID:   ...
    Image:          ...
    Image ID:       ...
    Ports:          ...
    Host Ports:     ...
    Args:           ...
    State:          Running
      Started:      Fri, 15 Oct 2021 12:06:01 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Fri, 15 Oct 2021 11:43:42 +0800
      Finished:     Fri, 15 Oct 2021 12:05:17 +0800
    Ready:          True
    Restart Count:  1
Use the Exit Code provided by kubectl to troubleshoot the issue:
- If the Exit Code is 0 – the container exited normally, no troubleshooting is required
- If the Exit Code is between 1-128 – the container terminated due to an internal error, such as a missing or invalid command in the image specification
- If the Exit Code is between 129-255 – the container was stopped as the result of an operating signal, such as SIGKILL or SIGINT
- If the Exit Code was exit(-1) or another value outside the 0-255 range, kubectl translates it to a value within the 0-255 range.
Refer to the relevant section above to see how to troubleshoot the container for each exit code.
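If only the numeric code is needed rather than the full description, it can also be read directly from the pod status ("my-pod" is a placeholder name; the pod's first container is assumed):

```bash
# Exit code recorded for the last termination of the pod's first container
kubectl get pod my-pod \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
```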
Troubleshooting Kubernetes Pod Termination with Komodor
As a Kubernetes administrator or user, pods or containers terminating unexpectedly can be a pain and can result in severe production issues. The troubleshooting process in Kubernetes is complex and, without the right tools, can be stressful, ineffective, and time-consuming.
Some best practices can help minimize the chances of container failure affecting your applications, but eventually, something will go wrong—simply because it can.
This is the reason why we created Komodor, a tool that helps dev and ops teams stop wasting their precious time looking for needles in (hay)stacks every time things go wrong.
Acting as a single source of truth (SSOT) for all of your k8s troubleshooting needs, Komodor offers:
- Change intelligence: Every issue is a result of a change. Within seconds we can help you understand exactly who did what and when.
- In-depth visibility: A complete activity timeline showing all code and config changes, deployments, alerts, code diffs, pod logs, and more, all within one pane of glass with easy drill-down options.
- Insights into service dependencies: An easy way to understand cross-service changes and visualize their ripple effects across your entire system.
- Seamless notifications: Direct integration with your existing communication channels (e.g., Slack) so you’ll have all the information you need, when you need it.