OpenShift error 143

Code  Message
1     Non-specific error
97    Invalid user credentials
99    User does not exist
100   An application with specified name already exists
101   An application with specified name does not exist and cannot be operated on
102   A user with login already exists
103   Given namespace is already in use
104   User's gear limit has been reached
105   Invalid application name
106   Invalid namespace
107   Invalid user login
108   Invalid SSH key
109   Invalid cartridge types
110   Invalid application type specified
111   Invalid action
112   Invalid API
113   Invalid auth key
114   Invalid auth iv
115   Too many cartridges of one type per user
116   Invalid SSH key type
117   Invalid SSH key name or tag
118   SSH key name does not exist
119   SSH key or key name not specified
120   SSH key name already exists
121   SSH key already exists
122   Last SSH key for user
123   No SSH key for user
124   Could not delete default or primary key
125   Invalid template
126   Invalid event
127   A domain with specified namespace does not exist and cannot be operated on
128   Could not delete domain because domain has valid applications
129   The application is not configured with this cartridge
130   Invalid parameters to estimates controller
131   Error during estimation
132   Insufficient Access Rights
133   Could not delete user
134   Invalid gear profile
135   Cartridge not found in the application
136   Cartridge already embedded in the application
137   Cartridge cannot be added or removed from the application
138   User deletion not permitted for normal or non-subaccount user
139   Could not delete user because user has valid domain or applications
140   Alias already in use
141   Unable to find nameservers for domain
150   A plan with specified id does not exist
151   Billing account was not found for user
152   Billing account status not active
153   User has more consumed gears than the new plan allows
154   User has gears that the new plan does not allow
155   Error getting account information from billing provider
156   Updating user plan on billing provider failed
157   Plan change not allowed for subaccount user
158   Domain already exists for user
159   User has additional filesystem storage that the new plan does not allow
160   User max gear limit capability does not match with current plan
161   User gear sizes capability does not match with current plan
162   User max untracked additional filesystem storage per gear capability does not match with current plan
163   Gear group does not exist
164   User is not allowed to change storage quota
165   Invalid storage quota value provided
166   Storage value not within allowed range
167   Invalid value for nolinks parameter
168   Invalid scaling factor provided. Value out of range.
169   Could not completely distribute scales_from to all groups
170   Could not resolve DNS
171   Could not obtain lock
172   Invalid or missing private key is required for SSL certificate
173   Alias does exist for this application
174   Invalid SSL certificate
175   User is not authorized to add private certificates
176   User has private certificates that the new plan does not allow
180   This command is not available in this application
181   User maximum tracked additional filesystem storage per gear capability does not match with current plan
182   User does not have gear_sizes capability provided by current plan
183   User does not have max_untracked_addtl_storage_per_gear capability provided by current plan
184   User does not have max_tracked_addtl_storage_per_gear capability provided by current plan
185   Cartridge X can not be added without cartridge Y
186   Invalid environment variables: expected array of hashes
187   Invalid environment variable X. Valid keys name (required), value
188   Invalid environment variable name X: specified multiple times
189   Environment name X not found in application
190   Value not specified for environment variable X
191   Specify parameters name/value or environment_variables
192   Environment name X already exists in application
193   Environment variable deletion not allowed for this operation
194   Name can only contain letters, digits and underscore and cannot begin with a digit
210   Cannot override existing location for Git repository
211   Parent directory for Git repository does not exist
212   Could not find libra_id_rsa
213   Could not read from SSH configuration file
214   Could not write to SSH configuration file
215   Host could not be created or found
216   Error in Git pull
217   Destroy aborted
218   Not found response from request
219   Unable to communicate with server
220   Plan change is not allowed for this account
221   Plan change is not allowed at this time for this account. Wait a few minutes and try again. If problem persists contact Red Hat support.
253   Could not open configuration file
255   Usage error

I am running a containerized Spring Boot application in Kubernetes, but the application automatically exits and restarts with exit code 143 and the error message "Error".

I am not sure how to identify the reason for this error.

My first idea was that Kubernetes stopped the container due to excessive resource usage, as described here, but I can't find the corresponding kubelet logs.

Is there any way to identify the cause/origin of the SIGTERM? Maybe from Spring Boot itself, or from the JVM?

asked May 16, 2022 at 17:42 by bennex


Exit Code 143

  1. It denotes that the process was terminated by an external signal.

  2. The number 143 is the sum of two numbers, 128 + x, where x is the number of the signal sent to the process that caused it to terminate.

  3. In this case, x equals 15, which is the number of the SIGTERM signal, meaning the process was terminated by a SIGTERM sent from outside the container (a quick shell check is sketched below).
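
A quick way to confirm the 128 + 15 arithmetic in any POSIX shell (nothing here is specific to containers):

sleep 300 &          # start a long-running background process
kill -TERM "$!"      # send it SIGTERM (signal 15)
wait "$!"            # collect its exit status
echo "$?"            # prints 143, i.e. 128 + 15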

Hope this helps.

answered Sep 23, 2022 at 14:09 by Gupta


I’ve just run into this exact same problem. I was able to track down the origin of the Exit Code 143 by looking at the logs on the Kubernetes nodes (note: the logs on the node, not the pod). (I use Lens as an easy way to get a node shell, but there are other ways.)

Then, if you search /var/log/messages for terminated, you’ll see something like this:

Feb  2 11:52:27 np-26992252-3 kubelet[23125]: I0202 11:52:27.541751   23125 kubelet.go:2214] "SyncLoop (probe)" probe="liveness" status="unhealthy" pod="default/app-compute-deployment-56ccffd87f-8s78v"
Feb  2 11:52:27 np-26992252-3 kubelet[23125]: I0202 11:52:27.541920   23125 kubelet.go:2214] "SyncLoop (probe)" probe="readiness" status="" pod="default/app-compute-deployment-56ccffd87f-8s78v"
Feb  2 11:52:27 np-26992252-3 kubelet[23125]: I0202 11:52:27.543274   23125 kuberuntime_manager.go:707] "Message for Container of pod" containerName="app" containerStatusID={Type:containerd ID:c3426d6b07fe3bd60bcbe675bab73b6b4b3619ef4639e1c23bca82692633765e} pod="default/app-compute-deployment-56ccffd87f-8s78v" containerMessage="Container app failed liveness probe, will be restarted"
Feb  2 11:52:27 np-26992252-3 kubelet[23125]: I0202 11:52:27.543374   23125 kuberuntime_container.go:723] "Killing container with a grace period" pod="default/app-compute-deployment-56ccffd87f-8s78v" podUID=89fdc1a2-3a3b-4d57-8a4d-ab115e52dc85 containerName="app" containerID="containerd://c3426d6b07fe3bd60bcbe675bab73b6b4b3619ef4639e1c23bca82692633765e" gracePeriod=30
Feb  2 11:52:27 np-26992252-3 containerd[22741]: time="2023-02-02T11:52:27.543834687Z" level=info msg="StopContainer for "c3426d6b07fe3bd60bcbe675bab73b6b4b3619ef4639e1c23bca82692633765e" with timeout 30 (s)"
Feb  2 11:52:27 np-26992252-3 containerd[22741]: time="2023-02-02T11:52:27.544593294Z" level=info msg="Stop container "c3426d6b07fe3bd60bcbe675bab73b6b4b3619ef4639e1c23bca82692633765e" with signal terminated"

The bit to look out for is containerMessage="Container app failed liveness probe, will be restarted"
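
For what it's worth, the same information can usually be cross-checked from the cluster side without a node shell (assuming kubectl access; the pod and namespace names below are the ones from the log above, and the output shown in comments is only indicative):

kubectl describe pod app-compute-deployment-56ccffd87f-8s78v -n default
#   Last State:   Terminated
#     Reason:     Error
#     Exit Code:  143
kubectl get events -n default --sort-by=.lastTimestamp | grep -i liveness
#   Warning  Unhealthy ... Liveness probe failed: ...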

answered Feb 3 at 14:34 by David


Hello,

EDITED: If I restart my machine and then run telepresence, it works the first time, but after I exit and run it a second time, I get the 143 error code, and this keeps happening.

I get "exited with code 143" when trying to telepresence a Docker container. It seems telepresence cannot start httpd in the foreground. I have another Docker container with the PHP CLI that can be telepresenced properly; only in this case, where I need the apache2 server, it does not work. Can you please help?

THE ERROR

Successfully built f959b2c2c059
Successfully tagged xxxx-xxxx:latest
T: How Telepresence uses sudo: 
T: https://www.telepresence.io/reference/install#dependencies
T: Invoking sudo. Please enter your sudo password.
[sudo] password for xxxx: 
T: Volumes are rooted at $TELEPRESENCE_ROOT. See 
T: https://telepresence.io/howto/volumes.html for details.
T: Starting network proxy to cluster by swapping out Deployment xxxx-web 
T: with a proxy
T: Forwarding remote port 80 to local port 80.

T: Setup complete. Launching your container.
Starting webserver
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message
T: **Your process exited with return code 143.**
T: Exit cleanup in progress
T: Swapping Deployment xxxx-web back to its original state

Here is the telepresence.sh file

#!/bin/bash
docker build . -t xxxx-xxxx

telepresence --mount=/tmp/known --namespace xxxx-14447706-develop --swap-deployment xxxx-web --docker-run --rm -it \
  -v /tmp/known/secret:/secret -v $(pwd):/code \
  xxxx-xxxx

Here is the telepresence.log file

   0.0 TEL | Telepresence 0.104 launched at Tue Feb 25 14:37:26 2020
   0.0 TEL |   /usr/bin/telepresence --mount=/tmp/known --namespace xxxx-14447706-develop --swap-deployment xxxx-web --docker-run --rm -it -v /tmp/known/secret:/secret -v /home/xxxx/code/delete/xxxx:/code xxxx-xxxx
   0.0 TEL | uname: uname_result(system='Linux', node='xxxx-Aspire-V3-572G', release='4.15.0-88-generic', version='#88-Ubuntu SMP Tue Feb 11 20:11:34 UTC 2020', machine='x86_64', processor='x86_64')
   0.0 TEL | Platform: linux
   0.0 TEL | WSL: False
   0.0 TEL | Python 3.6.9 (default, Nov  7 2019, 10:44:02)
   0.0 TEL | [GCC 8.3.0]
   0.0 TEL | BEGIN SPAN main.py:40(main)
   0.0 TEL | BEGIN SPAN startup.py:83(set_kube_command)
   0.0 TEL | Found kubectl -> /usr/local/bin/kubectl
   0.0 TEL | [1] Capturing: kubectl config current-context
   0.1 TEL | [1] captured in 0.06 secs.
   0.1 TEL | [2] Capturing: kubectl --context do-lon1-xxxx-xxxx version --short
   0.4 TEL | [2] captured in 0.30 secs.
   0.4 TEL | [3] Capturing: kubectl --context do-lon1-xxxx-xxxx config view -o json
   0.4 TEL | [3] captured in 0.06 secs.
   0.4 TEL | [4] Capturing: kubectl --context do-lon1-xxxx-xxxx get ns xxxx-14447706-develop
   4.0 TEL | [4] captured in 3.53 secs.
   4.0 TEL | [5] Capturing: kubectl --context do-lon1-xxxx-xxxx api-versions
   4.3 TEL | [5] captured in 0.30 secs.
   4.3 TEL | Command: kubectl 1.17.3
   4.3 TEL | Context: do-lon1-xxxx-xxxx, namespace: xxxx-14447706-develop, version: 1.15.4
   4.3 TEL | Warning: kubectl 1.17.3 may not work correctly with cluster version 1.15.4 due to the version discrepancy. See https://kubernetes.io/docs/setup/version-skew-policy/ for more information.
   4.3 TEL | END SPAN startup.py:83(set_kube_command)    4.2s
   4.3 TEL | Found ssh -> /usr/bin/ssh
   4.3 TEL | [6] Capturing: ssh -V
   4.3 TEL | [6] captured in 0.01 secs.
   4.3 TEL | Found docker -> /usr/bin/docker
   4.3 TEL | [7] Capturing: docker run --rm -v /tmp/tel-yyhfl23v:/tel alpine:3.6 cat /tel/session_id.txt
   9.8   7 | 0d047cf55f334f55b45a93a50ec8ed83
   9.8 TEL | [7] captured in 5.55 secs.
   9.8 TEL | Found sudo -> /usr/bin/sudo
   9.8 TEL | [8] Running: sudo -n echo -n
   9.8 TEL | [8] ran in 0.01 secs.
   9.8 TEL | Found sshfs -> /usr/bin/sshfs
   9.8 TEL | Found fusermount -> /bin/fusermount
   9.8 >>> | Volumes are rooted at $TELEPRESENCE_ROOT. See https://telepresence.io/howto/volumes.html for details.
   9.8 TEL | [9] Running: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pods telepresence-connectivity-check --ignore-not-found
  10.0 TEL | [9] ran in 0.17 secs.
  10.6 TEL | Scout info: {'latest_version': '0.104', 'application': 'telepresence', 'notices': []}
  10.6 TEL | BEGIN SPAN deployment.py:193(supplant_deployment)
  10.6 >>> | Starting network proxy to cluster by swapping out Deployment xxxx-web with a proxy
  10.6 TEL | BEGIN SPAN remote.py:75(get_deployment_json)
  10.6 TEL | [10] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get deployment -o json xxxx-web
  10.7 TEL | [10] captured in 0.13 secs.
  10.7 TEL | END SPAN remote.py:75(get_deployment_json)    0.1s
  10.7 TEL | [11] Running: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop delete deployment xxxx-web-0d047cf55f334f55b45a93a50ec8ed83 --ignore-not-found
  10.8 TEL | [11] ran in 0.14 secs.
  10.8 TEL | [12] Running: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop apply -f -
  11.4  12 | deployment.extensions/xxxx-web-0d047cf55f334f55b45a93a50ec8ed83 created
  11.4 TEL | [12] ran in 0.52 secs.
  11.4 TEL | [13] Running: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop scale deployment xxxx-web --replicas=0
  11.6  13 | deployment.extensions/xxxx-web scaled
  11.6 TEL | [13] ran in 0.23 secs.
  11.6 TEL | END SPAN deployment.py:193(supplant_deployment)    1.0s
  11.6 TEL | BEGIN SPAN remote.py:142(get_remote_info)
  11.6 TEL | BEGIN SPAN remote.py:75(get_deployment_json)
  11.6 TEL | [14] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get deployment -o json --selector=telepresence=0d047cf55f334f55b45a93a50ec8ed83
  11.8 TEL | [14] captured in 0.22 secs.
  11.8 TEL | END SPAN remote.py:75(get_deployment_json)    0.2s
  11.8 TEL | Searching for Telepresence pod:
  11.8 TEL |   with name xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-*
  11.8 TEL |   with labels {'app': 'xxxx-web', 'telepresence': '0d047cf55f334f55b45a93a50ec8ed83'}
  11.8 TEL | [15] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod -o json --selector=telepresence=0d047cf55f334f55b45a93a50ec8ed83
  12.0 TEL | [15] captured in 0.18 secs.
  12.0 TEL | Checking xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s
  12.0 TEL | Looks like we've found our pod!
  12.0 TEL | BEGIN SPAN remote.py:104(wait_for_pod)
  12.0 TEL | [16] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  12.1 TEL | [16] captured in 0.16 secs.
  12.4 TEL | [17] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  12.5 TEL | [17] captured in 0.13 secs.
  12.8 TEL | [18] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  12.9 TEL | [18] captured in 0.15 secs.
  13.2 TEL | [19] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  13.3 TEL | [19] captured in 0.13 secs.
  13.6 TEL | [20] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  13.7 TEL | [20] captured in 0.15 secs.
  14.0 TEL | [21] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  14.1 TEL | [21] captured in 0.14 secs.
  14.4 TEL | [22] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  14.5 TEL | [22] captured in 0.14 secs.
  14.8 TEL | [23] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  14.9 TEL | [23] captured in 0.16 secs.
  15.2 TEL | [24] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  15.3 TEL | [24] captured in 0.13 secs.
  15.5 TEL | [25] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  15.7 TEL | [25] captured in 0.13 secs.
  15.9 TEL | [26] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  16.1 TEL | [26] captured in 0.16 secs.
  16.3 TEL | [27] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  16.5 TEL | [27] captured in 0.15 secs.
  16.8 TEL | [28] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  16.9 TEL | [28] captured in 0.15 secs.
  17.2 TEL | [29] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  17.3 TEL | [29] captured in 0.13 secs.
  17.5 TEL | [30] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  17.7 TEL | [30] captured in 0.13 secs.
  17.9 TEL | [31] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop get pod xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s -o json
  18.1 TEL | [31] captured in 0.14 secs.
  18.1 TEL | END SPAN remote.py:104(wait_for_pod)    6.1s
  18.1 TEL | END SPAN remote.py:142(get_remote_info)    6.5s
  18.1 TEL | BEGIN SPAN connect.py:37(connect)
  18.1 TEL | [32] Launching kubectl logs: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop logs -f xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s --container xxxx-web --tail=10
  18.1 TEL | [33] Launching kubectl port-forward: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop port-forward xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s 46439:8022
  18.1 TEL | [34] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 46439 telepresence@127.0.0.1 /bin/true
  18.1 TEL | [34] exit 255 in 0.02 secs.
  18.3  32 | Retrieving this pod's namespace from the process environment
  18.3  32 | Pod's namespace is 'xxxx-14447706-develop'
  18.3  32 | Listening...
  18.3  32 | 2020-02-25T14:37:44+0000 [-] Loading ./forwarder.py...
  18.3  32 | 2020-02-25T14:37:44+0000 [-] /etc/resolv.conf changed, reparsing
  18.3  32 | 2020-02-25T14:37:44+0000 [-] Resolver added ('10.245.0.10', 53) to server list
  18.3  32 | 2020-02-25T14:37:44+0000 [-] SOCKSv5Factory starting on 9050
  18.3  32 | 2020-02-25T14:37:44+0000 [socks.SOCKSv5Factory#info] Starting factory <socks.SOCKSv5Factory object at 0x7fe97cf16a90>
  18.3  32 | 2020-02-25T14:37:44+0000 [-] DNSDatagramProtocol starting on 9053
  18.3  32 | 2020-02-25T14:37:44+0000 [-] Starting protocol <twisted.names.dns.DNSDatagramProtocol object at 0x7fe97cf16e48>
  18.3  32 | 2020-02-25T14:37:44+0000 [-] Loaded.
  18.3  32 | 2020-02-25T14:37:44+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 19.10.0 (/usr/bin/python3.6 3.6.8) starting up.
  18.3  32 | 2020-02-25T14:37:44+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
  18.3 TEL | [35] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 46439 telepresence@127.0.0.1 /bin/true
  18.3 TEL | [35] exit 255 in 0.01 secs.
  18.4  33 | Forwarding from 127.0.0.1:46439 -> 8022
  18.4  33 | Forwarding from [::1]:46439 -> 8022
  18.6 TEL | [36] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 46439 telepresence@127.0.0.1 /bin/true
  18.6  33 | Handling connection for 46439
  18.9 TEL | [36] ran in 0.27 secs.
  18.9 >>> | Forwarding remote port 80 to local port 80.
  18.9 >>> | 
  18.9 TEL | Launching Web server for proxy poll
  18.9 TEL | [37] Launching SSH port forward (socks and proxy poll): ssh -N -oServerAliveInterval=1 -oServerAliveCountMax=10 -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 46439 telepresence@127.0.0.1 -L127.0.0.1:36037:127.0.0.1:9050 -R9055:127.0.0.1:36361
  18.9 TEL | END SPAN connect.py:37(connect)    0.8s
  18.9 TEL | BEGIN SPAN remote_env.py:29(get_remote_env)
  18.9 TEL | [38] Capturing: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop exec xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s --container xxxx-web -- python3 podinfo.py
  18.9  33 | Handling connection for 46439
  19.5 TEL | [38] captured in 0.65 secs.
  19.5 TEL | END SPAN remote_env.py:29(get_remote_env)    0.6s
  19.5 TEL | BEGIN SPAN mount.py:30(mount_remote_volumes)
  19.5 TEL | [39] Running: sudo sshfs -p 46439 -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -o allow_other telepresence@127.0.0.1:/ /tmp/known
  19.5  33 | Handling connection for 46439
  19.8 TEL | [39] ran in 0.28 secs.
  19.8 TEL | END SPAN mount.py:30(mount_remote_volumes)    0.3s
  19.8 TEL | BEGIN SPAN container.py:127(run_docker_command)
  19.8 TEL | [40] Launching Network container: docker run --publish=127.0.0.1:39775:38022/tcp --hostname=xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s --dns=10.245.0.10 --dns-search=xxxx-14447706-develop.svc.cluster.local --dns-search=svc.cluster.local --dns-search=cluster.local --dns-opt=ndots:5 --rm --privileged --name=telepresence-1582641466-1587806-12268 datawire/telepresence-local:0.104 proxy '{"cidrs": ["0/0"], "expose_ports": [[80, 80]], "to_pod": [], "from_pod": []}'
  19.8 TEL | [41] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 39775 root@127.0.0.1 /bin/true
  19.8 TEL | [41] exit 255 in 0.01 secs.
  20.1 TEL | [42] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 39775 root@127.0.0.1 /bin/true
  20.1 TEL | [42] exit 255 in 0.02 secs.
  20.3 TEL | [43] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 39775 root@127.0.0.1 /bin/true
  20.3 TEL | [43] exit 255 in 0.01 secs.
  20.6 TEL | [44] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 39775 root@127.0.0.1 /bin/true
  20.6 TEL | [44] exit 255 in 0.02 secs.
  20.9 TEL | [45] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 39775 root@127.0.0.1 /bin/true
  20.9 TEL | [45] exit 255 in 0.01 secs.
  21.1 TEL | [46] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 39775 root@127.0.0.1 /bin/true
  21.1 TEL | [46] exit 255 in 0.01 secs.
  21.4 TEL | [47] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 39775 root@127.0.0.1 /bin/true
  21.4 TEL | [47] exit 255 in 0.01 secs.
  21.7 TEL | [48] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 39775 root@127.0.0.1 /bin/true
  21.7 TEL | [48] exit 255 in 0.01 secs.
  21.9 TEL | [49] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 39775 root@127.0.0.1 /bin/true
  21.9 TEL | [49] exit 255 in 0.01 secs.
  22.2 TEL | [50] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 39775 root@127.0.0.1 /bin/true
  22.6  40 | [INFO  tini (1)] Spawned child process 'python3' with pid '6'
  22.8  40 |    0.0 TEL | Telepresence 0+unknown launched at Tue Feb 25 14:37:49 2020
  22.8  40 |    0.0 TEL |   /usr/bin/entrypoint.py proxy '{"cidrs": ["0/0"], "expose_ports": [[80, 80]], "to_pod": [], "from_pod": []}'
  22.8  40 |    0.0 TEL | uname: uname_result(system='Linux', node='xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s', release='4.15.0-88-generic', version='#88-Ubuntu SMP Tue Feb 11 20:11:34 UTC 2020', machine='x86_64', processor='')
  22.8  40 |    0.0 TEL | Platform: linux
  22.8  40 |    0.0 TEL | WSL: False
  22.8  40 |    0.0 TEL | Python 3.6.8 (default, Apr 22 2019, 10:28:12)
  22.8  40 |    0.0 TEL | [GCC 6.3.0]
  22.8  40 |    0.0 TEL | [1] Running: /usr/sbin/sshd -e
  22.8  40 |    0.0 TEL | [1] ran in 0.01 secs.
  22.8  40 |    0.0 TEL | [2] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 38023 telepresence@127.0.0.1 /bin/true
  22.8  40 |    0.0 TEL | [2] exit 255 in 0.01 secs.
  23.1  40 |    0.3 TEL | [3] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 38023 telepresence@127.0.0.1 /bin/true
  23.1  40 |    0.3 TEL | [3] exit 255 in 0.01 secs.
  23.3  40 |    0.5 TEL | [4] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 38023 telepresence@127.0.0.1 /bin/true
  23.4  40 |    0.5 TEL | [4] exit 255 in 0.01 secs.
  23.4 TEL | [50] ran in 1.17 secs.
  23.4 TEL | [51] Launching Local SSH port forward: ssh -N -oServerAliveInterval=1 -oServerAliveCountMax=10 -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 39775 root@127.0.0.1 -R 38023:127.0.0.1:46439
  23.4 TEL | [52] Running: docker run --network=container:telepresence-1582641466-1587806-12268 --rm datawire/telepresence-local:0.104 wait
  23.6  40 |    0.8 TEL | [5] Running: ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 38023 telepresence@127.0.0.1 /bin/true
  23.6  33 | Handling connection for 46439
  24.0  40 |    1.2 TEL | [5] ran in 0.37 secs.
  24.0  40 |    1.2 TEL | [6] Capturing: netstat -n
  24.0  40 |    1.2 TEL | [6] captured in 0.00 secs.
  24.0  40 |    1.2 TEL | [7] Launching SSH port forward (exposed ports): ssh -N -oServerAliveInterval=1 -oServerAliveCountMax=10 -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 38023 telepresence@127.0.0.1 -R '*:80:127.0.0.1:80'
  24.0  40 |    1.2 TEL | Everything launched. Waiting to exit...
  24.0  33 | Handling connection for 46439
  24.0  40 |    1.2 TEL | BEGIN SPAN runner.py:725(wait_for_exit)
  24.2  40 | Starting sshuttle proxy.
  24.4  40 | firewall manager: Starting firewall with Python version 3.6.8
  24.4  40 | firewall manager: ready method name nat.
  24.4  40 | IPv6 enabled: False
  24.4  40 | UDP enabled: False
  24.4  40 | DNS enabled: True
  24.4  40 | TCP redirector listening on ('127.0.0.1', 12300).
  24.4  40 | DNS listening on ('127.0.0.1', 12300).
  24.4  40 | Starting client with Python version 3.6.8
  24.4  40 | c : connecting to server...
  24.4  33 | Handling connection for 46439
  24.5  40 | Warning: Permanently added '[127.0.0.1]:38023' (ECDSA) to the list of known hosts.
  24.8  40 | Starting server with Python version 3.6.8
  24.8  40 |  s: latency control setting = True
  24.8  40 |  s: available routes:
  24.8  40 | c : Connected.
  24.8  40 | firewall manager: setting up.
  24.9  40 | >> iptables -t nat -N sshuttle-12300
  24.9  40 | >> iptables -t nat -F sshuttle-12300
  24.9  40 | >> iptables -t nat -I OUTPUT 1 -j sshuttle-12300
  24.9  40 | >> iptables -t nat -I PREROUTING 1 -j sshuttle-12300
  24.9  40 | >> iptables -t nat -A sshuttle-12300 -j RETURN --dest 172.17.0.2/32 -p tcp
  24.9  40 | >> iptables -t nat -A sshuttle-12300 -j RETURN --dest 172.17.0.1/32 -p tcp
  24.9  40 | >> iptables -t nat -A sshuttle-12300 -j RETURN --dest 127.0.0.1/32 -p tcp
  24.9  40 | >> iptables -t nat -A sshuttle-12300 -j REDIRECT --dest 0.0.0.0/0 -p tcp --to-ports 12300 -m ttl ! --ttl 42
  25.1  40 | >> iptables -t nat -A sshuttle-12300 -j REDIRECT --dest 10.245.0.10/32 -p udp --dport 53 --to-ports 12300 -m ttl ! --ttl 42
  25.1  40 | >> iptables -t nat -A sshuttle-12300 -j REDIRECT --dest 224.0.0.252/32 -p udp --dport 5355 --to-ports 12300 -m ttl ! --ttl 42
  25.1  40 | conntrack v1.4.4 (conntrack-tools): 0 flow entries have been deleted.
  26.1  52 | [INFO  tini (1)] Spawned child process 'python3' with pid '7'
  26.3  40 | c : DNS request from ('172.17.0.2', 39094) to None: 80 bytes
  26.4  40 | c : DNS request from ('172.17.0.2', 37549) to None: 54 bytes
  27.4  52 | [INFO  tini (1)] Main child exited normally (with status '100')
  29.1 TEL | [52] exit 100 in 5.70 secs.
  29.1 TEL | [53] Capturing: docker run --help
  29.1 TEL | [53] captured in 0.08 secs.
  29.1 TEL | END SPAN container.py:127(run_docker_command)    9.3s
  29.1 >>> | Setup complete. Launching your container.
  29.1 TEL | Everything launched. Waiting to exit...
  29.1 TEL | BEGIN SPAN runner.py:725(wait_for_exit)
  33.8 TEL | Main process (docker run --name=telepresence-1582641475-4157932-12268 --network=container:telepresence-1582641466-1587806-12268 -e=TELEPRESENCE_POD -e=TELEPRESENCE_CONTAINER -e=TELEPRESENCE_MOUNTS -e=TELEPRESENCE_CONTAINER_NAMESPACE -e=xxxx_WEB_SERVICE_HOST -e=xxxx_WEB_SERVICE_PORT -e=xxxx_WEB_PORT_80_TCP -e=xxxx_WEB_PORT_80_TCP_PROTO -e=xxxx_WEB_PORT_80_TCP_ADDR -e=KUBERNETES_PORT_443_TCP_PROTO -e=KUBERNETES_SERVICE_HOST -e=KUBERNETES_SERVICE_PORT -e=KUBERNETES_PORT -e=KUBERNETES_PORT_443_TCP -e=KUBERNETES_PORT_443_TCP_PORT -e=KUBERNETES_PORT_443_TCP_ADDR -e=xxxx_WEB_PORT_80_TCP_PORT -e=xxxx_WEB_SERVICE_PORT_HTTP -e=xxxx_WEB_PORT -e=KUBERNETES_SERVICE_PORT_HTTPS -e=TELEPRESENCE_ROOT -e=TELEPRESENCE_METHOD --volume=/tmp/known:/tmp/known --init --rm -it -v /tmp/known/secret:/secret -v /home/xxxx/code/delete/xxxx:/code xxxx-xxxx)
  33.8 TEL |  exited with code 143.
  33.9 TEL | END SPAN runner.py:725(wait_for_exit)    4.7s
  33.9 >>> | Your process exited with return code 143.
  33.9 TEL | EXITING successful session.
  33.9 >>> | Exit cleanup in progress
  33.9 TEL | (Cleanup) Terminate local container
  33.9 TEL | Shutting down containers...
  33.9 TEL | (Cleanup) Kill BG process [51] Local SSH port forward
  33.9 TEL | [51] Local SSH port forward: exit 0
  33.9  40 | Connection to 127.0.0.1 closed by remote host.
  33.9 TEL | (Cleanup) Kill BG process [40] Network container
  33.9 TEL | [54] Running: docker stop --time=1 telepresence-1582641466-1587806-12268
  33.9  40 |   11.0   7 | Connection to 127.0.0.1 closed by remote host.
  33.9  40 |   11.1 TEL | [7] SSH port forward (exposed ports): exit 255
  33.9  40 | >> iptables -t nat -D OUTPUT -j sshuttle-12300
  33.9  40 | >> iptables -t nat -D PREROUTING -j sshuttle-12300
  33.9  40 | >> iptables -t nat -F sshuttle-12300
  33.9  40 | >> iptables -t nat -X sshuttle-12300
  33.9  40 | firewall manager: Error trying to undo /etc/hosts changes.
  33.9  40 | firewall manager: ---> Traceback (most recent call last):
  33.9  40 | firewall manager: --->   File "/usr/lib/python3.6/site-packages/sshuttle/firewall.py", line 274, in main
  33.9  40 | firewall manager: --->     restore_etc_hosts(port_v6 or port_v4)
  33.9  40 | firewall manager: --->   File "/usr/lib/python3.6/site-packages/sshuttle/firewall.py", line 50, in restore_etc_hosts
  33.9  40 | firewall manager: --->     rewrite_etc_hosts({}, port)
  33.9  40 | firewall manager: --->   File "/usr/lib/python3.6/site-packages/sshuttle/firewall.py", line 29, in rewrite_etc_hosts
  33.9  40 | firewall manager: --->     os.link(HOSTSFILE, BAKFILE)
  33.9  40 | firewall manager: ---> OSError: [Errno 18] Cross-device link: '/etc/hosts' -> '/etc/hosts.sbak'
  33.9  40 |   11.1 TEL | END SPAN runner.py:725(wait_for_exit)    9.9s
  33.9  40 |   11.1 >>> |
  33.9  40 |   11.1 >>> | Background process (SSH port forward (exposed ports)) exited with return code 255. Command was:
  33.9  40 |   11.1 >>> |   ssh -N -oServerAliveInterval=1 -oServerAliveCountMax=10 -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 38023 telepresence@127.0.0.1 -R '*:80:127.0.0.1:80'
  33.9  40 |   11.1 >>> |
  33.9  40 |   11.1 >>> | Recent output was:
  33.9  40 |   11.1 >>> |   Connection to 127.0.0.1 closed by remote host.
  33.9  40 |   11.1 >>> |
  33.9  40 |   11.1 >>> |
  33.9  40 |   11.1 >>> | Proxy to Kubernetes exited. This is typically due to a lost connection.
  33.9  40 |   11.1 >>> |
  33.9  40 |   11.1 TEL | EXITING with status code 255
  33.9  40 | 
  33.9  40 | T: Background process (SSH port forward (exposed ports)) exited with return code 255. Command was:
  33.9  40 | T:   ssh -N -oServerAliveInterval=1 -oServerAliveCountMax=10 -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q -p 38023 telepresence@127.0.0.1 -R '*:80:127.0.0.1:80'
  33.9  40 | 
  33.9  40 | T: Recent output was:
  33.9  40 | T:   Connection to 127.0.0.1 closed by remote host.
  33.9  40 | 
  33.9  40 | 
  33.9  40 | T: Proxy to Kubernetes exited. This is typically due to a lost connection.
  33.9  40 | 
  33.9  40 | c : fatal: server died with error code 255
  33.9  40 |   11.1 TEL | Main process (sshuttle-telepresence -v --dns --method nat -e 'ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null' -r telepresence@127.0.0.1:38023 -x 172.17.0.2 -x 172.17.0.1 0/0)
  33.9  40 |   11.1 TEL |  exited with code 99.
  34.0  40 | [INFO  tini (1)] Main child exited normally (with status '255')
  35.5  54 | telepresence-1582641466-1587806-12268
  35.5 TEL | [54] ran in 1.66 secs.
  35.5 TEL | (Cleanup) Unmount remote filesystem
  35.5 TEL | [55] Running: sudo fusermount -z -u /tmp/known
  35.5 TEL | [40] Network container: exit 255
  35.6 TEL | [55] ran in 0.04 secs.
  35.6 TEL | (Cleanup) Kill BG process [37] SSH port forward (socks and proxy poll)
  35.6 TEL | (Cleanup) Kill Web server for proxy poll
  35.6 TEL | [37] SSH port forward (socks and proxy poll): exit 0
  35.9 TEL | (Cleanup) Kill BG process [33] kubectl port-forward
  35.9 TEL | (Cleanup) Kill BG process [32] kubectl logs
  35.9 TEL | [33] kubectl port-forward: exit -15
  35.9 TEL | [32] kubectl logs: exit -15
  35.9 TEL | Background process (kubectl logs) exited with return code -15. Command was:
  35.9 TEL |   kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop logs -f xxxx-web-0d047cf55f334f55b45a93a50ec8ed83-749cbcbfd4-7pm7s --container xxxx-web --tail=10
  35.9 TEL | 
  35.9 TEL | Recent output was:
  35.9 TEL |   2020-02-25T14:37:44+0000 [-] Loading ./forwarder.py...
  35.9 TEL |   2020-02-25T14:37:44+0000 [-] /etc/resolv.conf changed, reparsing
  35.9 TEL |   2020-02-25T14:37:44+0000 [-] Resolver added ('10.245.0.10', 53) to server list
  35.9 TEL |   2020-02-25T14:37:44+0000 [-] SOCKSv5Factory starting on 9050
  35.9 TEL |   2020-02-25T14:37:44+0000 [socks.SOCKSv5Factory#info] Starting factory <socks.SOCKSv5Factory object at 0x7fe97cf16a90>
  35.9 TEL |   2020-02-25T14:37:44+0000 [-] DNSDatagramProtocol starting on 9053
  35.9 TEL |   2020-02-25T14:37:44+0000 [-] Starting protocol <twisted.names.dns.DNSDatagramProtocol object at 0x7fe97cf16e48>
  35.9 TEL |   2020-02-25T14:37:44+0000 [-] Loaded.
  35.9 TEL |   2020-02-25T14:37:44+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 19.10.0 (/usr/bin/python3.6 3.6.8) starting up.
  35.9 TEL |   2020-02-25T14:37:44+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
  35.9 TEL | (Cleanup) Re-scale original deployment
  35.9 TEL | [56] Running: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop scale deployment xxxx-web --replicas=1
  36.1  56 | deployment.extensions/xxxx-web scaled
  36.1 TEL | [56] ran in 0.19 secs.
  36.1 TEL | (Cleanup) Delete new deployment
  36.1 >>> | Swapping Deployment xxxx-web back to its original state
  36.1 TEL | [57] Running: kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop delete deployment xxxx-web-0d047cf55f334f55b45a93a50ec8ed83
  36.3  57 | deployment.extensions "xxxx-web-0d047cf55f334f55b45a93a50ec8ed83" deleted
  36.4 TEL | [57] ran in 0.29 secs.
  36.4 TEL | (Cleanup) Kill sudo privileges holder
  36.4 TEL | (Cleanup) Stop time tracking
  36.4 TEL | END SPAN main.py:40(main)   36.4s
  36.4 TEL | SPAN SUMMARY:
  36.4 TEL |   36.4s main.py:40(main)
  36.4 TEL |    4.2s   startup.py:83(set_kube_command)
  36.4 TEL |    0.1s     1 kubectl config current-context
  36.4 TEL |    0.3s     2 kubectl --context do-lon1-xxxx-xxxx version --short
  36.4 TEL |    0.1s     3 kubectl --context do-lon1-xxxx-xxxx config view -o json
  36.4 TEL |    3.5s     4 kubectl --context do-lon1-xxxx-xxxx get ns xxxx-14447706-develop
  36.4 TEL |    0.3s     5 kubectl --context do-lon1-xxxx-xxxx api-versions
  36.4 TEL |    0.0s   6 ssh -V
  36.4 TEL |    5.5s   7 docker run --rm -v /tmp/tel-yyhfl23v:/tel alpine:3.6 cat /tel/session_id.txt
  36.4 TEL |    0.0s   8 sudo -n echo -n
  36.4 TEL |    0.2s   9 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop g
  36.4 TEL |    1.0s   deployment.py:193(supplant_deployment)
  36.4 TEL |    0.1s     remote.py:75(get_deployment_json)
  36.4 TEL |    0.1s       10 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.1s     11 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.5s     12 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.2s     13 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    6.5s   remote.py:142(get_remote_info)
  36.4 TEL |    0.2s     remote.py:75(get_deployment_json)
  36.4 TEL |    0.2s       14 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.2s     15 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    6.1s     remote.py:104(wait_for_pod)
  36.4 TEL |    0.2s       16 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.1s       17 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.1s       18 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.1s       19 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.2s       20 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.1s       21 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.1s       22 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.2s       23 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.1s       24 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.1s       25 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.2s       26 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.2s       27 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.1s       28 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.1s       29 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.1s       30 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.1s       31 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.8s   connect.py:37(connect)
  36.4 TEL |    0.0s     34 ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q
  36.4 TEL |    0.0s     35 ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q
  36.4 TEL |    0.3s     36 ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q
  36.4 TEL |    0.6s   remote_env.py:29(get_remote_env)
  36.4 TEL |    0.6s     38 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.3s   mount.py:30(mount_remote_volumes)
  36.4 TEL |    0.3s     39 sudo sshfs -p 46439 -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsF
  36.4 TEL |    9.3s   container.py:127(run_docker_command)
  36.4 TEL |    0.0s     41 ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q
  36.4 TEL |    0.0s     42 ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q
  36.4 TEL |    0.0s     43 ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q
  36.4 TEL |    0.0s     44 ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q
  36.4 TEL |    0.0s     45 ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q
  36.4 TEL |    0.0s     46 ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q
  36.4 TEL |    0.0s     47 ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q
  36.4 TEL |    0.0s     48 ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q
  36.4 TEL |    0.0s     49 ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q
  36.4 TEL |    1.2s     50 ssh -F /dev/null -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -q
  36.4 TEL |    5.7s     52 docker run --network=container:telepresence-1582641466-1587806-12268 --rm dat
  36.4 TEL |    0.1s     53 docker run --help
  36.4 TEL |    4.7s   runner.py:725(wait_for_exit)
  36.4 TEL |    1.7s   54 docker stop --time=1 telepresence-1582641466-1587806-12268
  36.4 TEL |    0.0s   55 sudo fusermount -z -u /tmp/known
  36.4 TEL |    0.2s   56 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL |    0.3s   57 kubectl --context do-lon1-xxxx-xxxx --namespace xxxx-14447706-develop
  36.4 TEL | (Cleanup) Remove temporary directory
  36.4 TEL | (Cleanup) Save caches
  36.8 TEL | (sudo privileges holder thread exiting)

Here is the Dockerfile

# Stage 1
# Run composer
FROM composer:1.6 as composer
COPY ./composer.json /app
COPY ./composer.lock /app

RUN composer install --no-interaction --no-dev --optimize-autoloader

# Stage 2
# Swagger
FROM swaggerapi/swagger-ui:latest as swagger

# Stage 3
# PHP container
FROM alpine:3.8
MAINTAINER Xibo xxxx <info@xibo-xxxx.com>

# Install apache, PHP, and supplimentary programs.
RUN apk update && apk upgrade && apk add tar \
    bash \
    curl \
    php7 \
    php7-apache2 \
    php7-zmq \
    php7-json \
    php7-gd \
    php7-dom \
    php7-zip \
    php7-gettext \
    php7-soap \
    php7-iconv \
    php7-curl \
    php7-session \
    php7-ctype \
    php7-fileinfo \
    php7-xml \
    php7-simplexml \
    php7-mbstring \
    php7-zlib \
    php7-openssl \
    php7-mongodb \
    php7-redis \
    php7-tokenizer \
    ssmtp \
    apache2 \
    ca-certificates \
    tzdata \
    && rm -rf /var/cache/apk/*

COPY ./docker/httpd-foreground /usr/local/bin/httpd-foreground
COPY ./docker/php.ini /etc/php7/conf.d/custom.ini

# Composer generated vendor files
COPY --from=composer /app/vendor /code/vendor

COPY ./docker/apache.conf /etc/apache2/conf.d/xxxx.conf
COPY . /code

# Make apache less insecure
RUN rm /etc/apache2/conf.d/info.conf && \
    rm /etc/apache2/conf.d/userdir.conf

# Swagger
RUN mkdir /code/web/swagger
COPY --from=swagger /usr/share/nginx/html /code/web/swagger
RUN sed -i "s#https://petstore.swagger.io/v2/swagger.json#/swagger.json#" /code/web/swagger/index.html

EXPOSE 80

CMD ["/usr/local/bin/httpd-foreground"]

Here is the httpd-foreground script

#!/bin/bash
echo "Starting webserver"

set -e

# Apache gets grumpy about PID files pre-existing
rm -rf /run/apache2/*
mkdir -p /run/apache2/

/usr/sbin/httpd -DFOREGROUND

What are Container Exit Codes

Exit codes are used by container engines, when a container terminates, to report why it was terminated.

If you are a Kubernetes user, container failures are one of the most common causes of pod exceptions, and understanding container exit codes can help you get to the root cause of pod failures when troubleshooting.

The most common exit codes used by containers are:

Exit Code 0 (Purposely stopped): Used by developers to indicate that the container was automatically stopped
Exit Code 1 (Application error): Container was stopped due to application error or incorrect reference in the image specification
Exit Code 125 (Container failed to run error): The docker run command did not execute successfully
Exit Code 126 (Command invoke error): A command specified in the image specification could not be invoked
Exit Code 127 (File or directory not found): File or directory specified in the image specification was not found
Exit Code 128 (Invalid argument used on exit): Exit was triggered with an invalid exit code (valid codes are integers between 0 and 255)
Exit Code 134 (Abnormal termination, SIGABRT): The container aborted itself using the abort() function
Exit Code 137 (Immediate termination, SIGKILL): Container was immediately terminated by the operating system via a SIGKILL signal
Exit Code 139 (Segmentation fault, SIGSEGV): Container attempted to access memory that was not assigned to it and was terminated
Exit Code 143 (Graceful termination, SIGTERM): Container received warning that it was about to be terminated, then terminated
Exit Code 255 (Exit status out of range): Container exited, returning an exit code outside the acceptable range, meaning the cause of the error is not known
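
As a rough sketch, on a plain Docker host the reported code can be read straight from the container state (the container name myapp is a placeholder):

docker ps -a --filter status=exited --format '{{.Names}}\t{{.Status}}'
#   myapp   Exited (143) 2 minutes ago
docker inspect myapp --format 'exit code: {{.State.ExitCode}}, OOM killed: {{.State.OOMKilled}}'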

Below we’ll explain how to troubleshoot failed containers on a self-managed host and in Kubernetes, and provide more details on all of the exit codes listed above.


The Container Lifecycle

To better understand the causes of container failure, let’s discuss the lifecycle of a container first. Taking Docker as an example – at any given time, a Docker container can be in one of several states:

  • Created – the Docker container is created but not started yet (this is the status after running docker create, but before actually running the container)
  • Up – the Docker container is currently running, meaning the operating system process managed by the container is running. This happens when you use the docker start or docker run commands.
  • Paused – the container process was running, but Docker purposely paused the container. Typically this happens when you run the Docker pause command
  • Exited – the Docker container has been terminated, usually because the container’s process was killed

When a container reaches the Exited status, Docker will report an exit code in the logs, to inform you what happened to the container that caused it to shut down.
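
A minimal walk-through of these states with the Docker CLI (the image and container name are arbitrary choices for the example):

docker create --name lifecycle-demo alpine sleep 300     # Created
docker start lifecycle-demo                              # Up
docker pause lifecycle-demo                              # Paused
docker unpause lifecycle-demo                            # Up again
docker stop lifecycle-demo                               # Exited: SIGTERM first, SIGKILL after the grace period
docker inspect lifecycle-demo --format '{{.State.Status}} (exit code {{.State.ExitCode}})'
docker rm lifecycle-demo                                 # clean up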

Understanding Container Exit Codes

Below we cover each of the exit codes in more detail.

Exit Code 0: Purposely Stopped

Exit Code 0 is triggered by developers when they purposely stop their container after a task completes. Technically, Exit Code 0 means that the foreground process is not attached to a specific container.

What to do if a container terminated with Exit Code 0?

  1. Check the container logs to identify which library caused the container to exit
  2. Review the code of the existing library and identify why it triggered Exit Code 0, and whether it is functioning correctly

Exit Code 1: Application Error

Exit Code 1 indicates that the container was stopped due to one of the following:

  • An application error – this could be a simple programming error in code run by the container, such as “divide by zero”, or advanced errors related to the runtime environment, such as Java, Python, etc
  • An invalid reference – this means the image specification refers to a file that does not exist in the container image

What to do if a container terminated with Exit Code 1?

  1. Check the container log to see if one of the files listed in the image specification could not be found. If this is the issue, correct the image specification to point to the correct path and filename.
  2. If you cannot find an incorrect file reference, check the container logs for an application error, and debug the library that caused the error.
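
A sketch of step 1 on a Kubernetes cluster or on a plain Docker host (my-pod and my-container are placeholders):

kubectl logs my-pod --previous      # logs from the crashed container instance
docker logs my-container            # equivalent on a plain Docker host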

Exit Code 125: Container Failed to Run

Exit Code 125 means that the command used to run the container (for example, docker run) was invoked in the system shell but did not execute successfully. Here are common reasons this might happen:

  • An undefined flag was used in the command, for example docker run --abcd
  • The user defined in the image specification does not have sufficient permissions on the machine
  • Incompatibility between the container engine and the host operating system or hardware

What to do if a container terminated with Exit Code 125?

  1. Check if the command used to run the container uses the proper syntax
  2. Check if the user running the container, or the context in which the command is executed in the image specification, has sufficient permissions to create containers on the host
  3. If your container engine provides other options for running a container, try them. For example, in Docker, try docker start instead of docker run
  4. Test if you are able to run other containers on the host using the same username or context. If not, reinstall the container engine, or resolve the underlying compatibility issue between the container engine and the host setup
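
For example, the undefined-flag case above can be reproduced directly; the exact error text varies by Docker version:

docker run --abcd alpine echo hello
#   unknown flag: --abcd
echo $?     # 125: the failure happened in the Docker CLI/daemon, before any container process ran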

Exit Code 126: Command Invoke Error

Exit Code 126 means that a command used in the container specification could not be invoked. This is often caused by a missing dependency or by an error in a continuous integration script used to run the container.

What to do if a container terminated with Exit Code 126?

  1. Check the container logs to see which command could not be invoked
  2. Try running the container specification without the command to ensure you isolate the problem
  3. Troubleshoot the command to ensure you are using the correct syntax and all dependencies are available
  4. Correct the container specification and retry running the container
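
A minimal reproduction of the "found but cannot be invoked" case; /etc/os-release is just an example of a file that exists in the image but is not executable:

docker run alpine /etc/os-release
#   ... permission denied ...
echo $?     # 126: the path exists in the image but could not be invoked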

Exit Code 127: File or Directory Not Found

Exit Code 127 means a command specified in the container specification refers to a non-existent file or directory.

What to do if a container terminated with Exit Code 127?

Same as Exit Code 126, identify the failing command and make sure you reference a valid filename and file path available within the container image.
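
For comparison with Exit Code 126, a command that simply does not exist in the image (the command name is made up):

docker run alpine no-such-command
#   ... executable file not found in $PATH ...
echo $?     # 127: the command could not be found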

Exit Code 128: Invalid Argument Used on Exit

Exit Code 128 means that code within the container triggered an exit command, but did not provide a valid exit code. The Linux exit command only allows integers between 0 and 255, so if the process exits with, for example, exit code 3.5, the logs will report Exit Code 128.

What to do if a container terminated with Exit Code 128?

  1. Check the container logs to identify which library caused the container to exit.
  2. Identify where the offending library uses the exit command, and correct it to provide a valid exit code.

Exit Code 134: Abnormal Termination (SIGABRT)

Exit Code 134 means that the container abnormally terminated itself, closed the process and flushed open streams. This operation is irreversible, like SIGKILL (see Exit Code 137 below). A process can trigger SIGABRT by doing one of the following:

  • Calling the abort() function in the libc library
  • Calling the assert() macro, used for debugging. The process is then aborted if the assertion is false.

What to do if a container terminated with Exit Code 134?

  1. Check container logs to see which library triggered the SIGABRT signal
  2. Check if process abortion was planned (for example because the library was in debug mode), and if not, troubleshoot the library and modify it to avoid aborting the container.
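
The 128 + 6 mapping can be checked from a shell by sending SIGABRT to a child process; this demonstrates only the exit-code arithmetic, not a real abort() call:

sh -c 'kill -ABRT $$'    # the child shell is terminated by SIGABRT
echo $?                  # prints 134, i.e. 128 + 6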

Exit Code 137: Immediate Termination (SIGKILL)

Exit Code 137 means that the container has received a SIGKILL signal from the host operating system. This signal instructs a process to terminate immediately, with no grace period. This can be either:

  • Triggered when a container is killed via the container engine, for example when using the docker kill command
  • Triggered by a Linux user sending a kill -9 command to the process
  • Triggered by Kubernetes after attempting to terminate a container and waiting for a grace period of 30 seconds (by default)
  • Triggered automatically by the host, usually due to running out of memory. In this case, the docker inspect command will indicate an OOMKilled error.

What to do if a container terminated with Exit Code 137?

  1. Check logs on the host to see what happened prior to the container terminating, and whether it previously received a SIGTERM signal (graceful termination) before receiving SIGKILL
  2. If there was a prior SIGTERM signal, check if your container process handles SIGTERM and is able to gracefully terminate
  3. If there was no SIGTERM and the container reported an OOMKilled error, troubleshoot memory issues on the host
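
Two quick checks, assuming a shell on the host and the Docker CLI (my-container is a placeholder):

sh -c 'kill -KILL $$'    # the child shell is terminated by SIGKILL
echo $?                  # prints 137, i.e. 128 + 9
docker inspect my-container --format 'OOM killed: {{.State.OOMKilled}}'   # true if the kernel OOM killer was involved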


Exit Code 139: Segmentation Fault (SIGSEGV)

Exit Code 139 means that the container received a SIGSEGV signal from the operating system. This indicates a segmentation error – a memory violation, caused by a container trying to access a memory location to which it does not have access. There are three common causes of SIGSEGV errors:

  1. Coding error—container process did not initialize properly, or it tried to access memory through a pointer to previously freed memory
  2. Incompatibility between binaries and libraries—container process runs a binary file that is not compatible with a shared library, and thus may try to access inappropriate memory addresses
  3. Hardware incompatibility or misconfiguration—if you see multiple segmentation errors across multiple libraries, there may be a problem with memory subsystems on the host or a system configuration issue

What to do if a container terminated with Exit Code 139?

  1. Check if the container process handles SIGSEGV. On both Linux and Windows, you can handle a container’s response to segmentation violations. For example, the container can collect and report a stack trace
  2. If you need to further troubleshoot SIGSEGV, you may need to set the operating system to allow programs to run even after a segmentation fault occurs, to allow for investigation and debugging. Then, try to intentionally cause a segmentation violation and debug the library causing the issue
  3. If you cannot replicate the issue, check memory subsystems on the host and troubleshoot memory configuration
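
As with the other fatal signals, the exit code is simply 128 + the signal number (11 for SIGSEGV); the check below demonstrates only that mapping, not an actual memory violation:

sh -c 'kill -SEGV $$'
echo $?     # prints 139, i.e. 128 + 11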


Exit Code 143: Graceful Termination (SIGTERM)

Exit Code 143 means that the container received a SIGTERM signal from the operating system, which asks the container to gracefully terminate, and the container succeeded in gracefully terminating (otherwise you will see Exit Code 137). This exit code can be:

  • Triggered by the container engine stopping the container, for example when using the docker stop or docker-compose down commands
  • Triggered by Kubernetes setting a pod to Terminating status, and giving containers a 30 second period to gracefully shut down

What to do if a container terminated with Exit Code 143?

Check host logs to see the context in which the operating system sent the SIGTERM signal. If you are using Kubernetes, check the kubelet logs to see if and when the pod was shut down.

In general, Exit Code 143 does not require troubleshooting. It means the container was properly shut down after being instructed to do so by the host.
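
A sketch of those checks, assuming a systemd-based node and kubectl access (my-pod and the time window are placeholders):

# On the node that ran the pod:
journalctl -u kubelet --since "30 min ago" | grep -i my-pod
# On nodes that log to /var/log/messages instead:
grep -i "failed liveness probe" /var/log/messages
# From anywhere with kubectl access:
kubectl get pod my-pod -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'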


Exit Code 1: Application Error

Exit Code 1 indicates that the container was stopped due to one of the following:

  • An application error – this could be a simple programming error in code run by the container, such as “divide by zero”, or advanced errors related to the runtime environment, such as Java, Python, etc
  • An invalid reference – this means the image specification refers to a file that does not exist in the container image

What to do if a container terminated with Exit Code 1?

  1. Check the container log to see if one of the files listed in the image specification could not be found. If this is the issue, correct the image specification to point to the correct path and filename.
  2. If you cannot find an incorrect file reference, check the container logs for an application error, and debug the library that caused the error.

Exit Code 125

Exit Code 125: Container Failed to Run

Exit Code 125 means that the command is used to run the container. For example docker run was invoked in the system shell but did not execute successfully. Here are common reasons this might happen:

  • An undefined flag was used in the command, for example docker run --abcd
  • The user defined in the image specification does not have sufficient permissions on the machine
  • Incompatibility between the container engine and the host operating system or hardware

What to do if a container terminated with Exit Code 125?

  1. Check if the command used to run the container uses the proper syntax
  2. Check if the user running the container, or the context in which the command is executed in the image specification, has sufficient permissions to create containers on the host
  3. If your container engine provides other options for running a container, try them. For example, in Docker, try docker start instead of docker run
  4. Test if you are able to run other containers on the host using the same username or context. If not, reinstall the container engine, or resolve the underlying compatibility issue between the container engine and the host setup

Exit Code 126: Command Invoke Error

Exit Code 126 means that a command used in the container specification could not be invoked. This is often caused by a missing dependency or an error in a continuous integration script used to run the container.

What to do if a container terminated with Exit Code 126?

  1. Check the container logs to see which command could not be invoked
  2. Try running the container specification without the command to ensure you isolate the problem
  3. Troubleshoot the command to ensure you are using the correct syntax and all dependencies are available
  4. Correct the container specification and retry running the container

Exit Code 127: File or Directory Not Found

Exit Code 127 means a command specified in the container specification refers to a non-existent file or directory.

What to do if a container terminated with Exit Code 127?

Same as Exit Code 126 above, identify the failing command and make sure you reference a valid filename and file path available within the container image.
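If you want to see what these two statuses look like before involving a container image, you can reproduce them from any shell; the sketch below uses Python's subprocess module, and /etc/hosts is just an arbitrary example of a file that exists but is not executable:

import subprocess

# A file that exists but cannot be invoked ("Permission denied") -> 126
print(subprocess.run(["/bin/sh", "-c", "/etc/hosts"]).returncode)

# A command that does not exist at all ("command not found") -> 127
print(subprocess.run(["/bin/sh", "-c", "no_such_command"]).returncode)

The same convention applies inside the container: a non-executable entrypoint yields 126, a missing one yields 127.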

Exit Code 128: Invalid Argument Used on Exit

Exit Code 128 means that code within the container triggered an exit command but did not provide a valid exit code. The Linux exit command only accepts integers between 0 and 255, so if a process exits with, for example, exit code 3.5, the logs will report Exit Code 128.
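As a quick illustration of the 0-255 limit (a sketch, assuming a Python interpreter is available), an exit status outside that range is truncated to its low 8 bits, so the value in the logs is not the value the process asked for:

import subprocess, sys

# The child asks to exit with 300; the kernel keeps only the low 8 bits.
child = [sys.executable, "-c", "import sys; sys.exit(300)"]
print(subprocess.run(child).returncode)   # prints 44, i.e. 300 % 256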

What to do if a container terminated with Exit Code 128?

  1. Check the container logs to identify which library caused the container to exit.
  2. Identify where the offending library uses the exit command, and correct it to provide a valid exit code.

Exit Code 134: Abnormal Termination (SIGABRT)

Exit Code 134 means that the container abnormally terminated itself, closed the process and flushed open streams. This operation is irreversible, like SIGKILL (see Exit Code 137 below). A process can trigger SIGABRT by doing one of the following:

  • Calling the abort() function in the libc library
  • Calling the assert() macro, used for debugging; the process is aborted if the assertion is false
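A quick way to see where the 134 value comes from (a sketch in Python, not part of the original guide): a child process that aborts itself dies from SIGABRT, and container runtimes report that as 128 plus the signal number:

import os, signal, subprocess, sys

# The child aborts itself, just as a call to libc's abort() would.
child = subprocess.run([sys.executable, "-c", "import os; os.abort()"])
print(child.returncode)         # -6: the child was killed by SIGABRT
print(128 + signal.SIGABRT)     # 134: the code reported by the container runtime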

What to do if a container terminated with Exit Code 134?

  1. Check container logs to see which library triggered the SIGABRT signal
  2. Check if process abortion was planned (for example because the library was in debug mode), and if not, troubleshoot the library and modify it to avoid aborting the container.

Exit Code 137: Immediate Termination (SIGKILL)

Exit Code 137 means that the container has received a SIGKILL signal from the host operating system. This signal instructs a process to terminate immediately, with no grace period. This can be either:

  • Triggered when a container is killed via the container engine, for example when using the docker kill command
  • Triggered by a Linux user sending a kill -9 command to the process
  • Triggered by Kubernetes after attempting to terminate a container and waiting for a grace period of 30 seconds (by default)
  • Triggered automatically by the host, usually due to running out of memory. In this case, the docker inspect command will indicate an OOMKilled error.
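To see how the 137 value arises (a sketch in Python, mirroring the kill -9 case above), note that SIGKILL cannot be caught, so the process gets no chance to clean up, and the runtime reports 128 plus the signal number:

import signal, subprocess, sys, time

# Start a long-running child, then kill it the way `kill -9` or the OOM killer would.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
time.sleep(1)
proc.send_signal(signal.SIGKILL)   # no grace period, no handler runs
print(proc.wait())                 # -9: killed by SIGKILL
print(128 + signal.SIGKILL)        # 137: the exit code surfaced by the container runtime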

What to do if a container terminated with Exit Code 137?

  1. Check logs on the host to see what happened prior to the container terminating, and whether it previously received a SIGTERM signal (graceful termination) before receiving SIGKILL
  2. If there was a prior SIGTERM signal, check if your container process handles SIGTERM and is able to gracefully terminate
  3. If there was no SIGTERM and the container reported an OOMKilled error, troubleshoot memory issues on the host

Learn more in our detailed guide to the SIGKILL signal >>

Exit Code 139: Segmentation Fault (SIGSEGV)

Exit Code 139 means that the container received a SIGSEGV signal from the operating system. This indicates a segmentation error – a memory violation, caused by a container trying to access a memory location to which it does not have access. There are three common causes of SIGSEGV errors:

  1. Coding error—container process did not initialize properly, or it tried to access memory through a pointer to previously freed memory
  2. Incompatibility between binaries and libraries—container process runs a binary file that is not compatible with a shared library, and thus may try to access inappropriate memory addresses
  3. Hardware incompatibility or misconfiguration—if you see multiple segmentation errors across multiple libraries, there may be a problem with memory subsystems on the host or a system configuration issue

What to do if a container terminated with Exit Code 139?

  1. Check if the container process handles SIGSEGV. On both Linux and Windows, you can handle a container’s response to segmentation violations. For example, the container can collect and report a stack trace (see the sketch after this list)
  2. If you need to further troubleshoot SIGSEGV, you may need to set the operating system to allow programs to run even after a segmentation fault occurs, to allow for investigation and debugging. Then, try to intentionally cause a segmentation violation and debug the library causing the issue
  3. If you cannot replicate the issue, check memory subsystems on the host and troubleshoot memory configuration
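One minimal way to implement step 1 in a Python-based container (a sketch, not part of the original guide) is the standard library's faulthandler module, which dumps the interpreter's traceback when a segmentation violation is received:

import ctypes, faulthandler

# Dump the Python traceback to stderr if the process receives SIGSEGV
# (faulthandler also covers SIGFPE, SIGABRT, SIGBUS and SIGILL).
faulthandler.enable()

# Deliberately dereference a null pointer to trigger a segmentation fault.
ctypes.string_at(0)

The process still terminates with Exit Code 139, but the dumped traceback shows which call triggered the violation, which is usually enough to identify the offending library.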

Learn more in our detailed guide to the SIGSEGV signal >>

Exit Code 143: Graceful Termination (SIGTERM)

Exit Code 143 means that the container received a SIGTERM signal from the operating system, which asks the container to gracefully terminate, and the container succeeded in gracefully terminating (otherwise you will see Exit Code 137). This exit code can be:

  • Triggered by the container engine stopping the container, for example when using the docker stop or docker-compose down commands
  • Triggered by Kubernetes setting a pod to Terminating status, and giving containers a 30 second period to gracefully shut down
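Below is a minimal sketch (assuming a Python process whose main loop can be interrupted) of a container process that traps SIGTERM, finishes its cleanup, and exits within the grace period:

import signal, sys, time

def handle_sigterm(signum, frame):
    # Flush buffers, close connections and finish in-flight work here.
    # Exiting with 128 + 15 keeps the conventional SIGTERM code (143) visible;
    # exiting with 0 instead would be reported as a normal shutdown.
    sys.exit(128 + signum)

signal.signal(signal.SIGTERM, handle_sigterm)

while True:
    time.sleep(1)   # stand-in for the container's real work loop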

What to do if a container terminated with Exit Code 143?

Check host logs to see the context in which the operating system sent the SIGTERM signal. If you are using Kubernetes, check the kubelet logs to see if and when the pod was shut down.

In general, Exit Code 143 does not require troubleshooting. It means the container was properly shut down after being instructed to do so by the host.

Learn more in our detailed guide to the SIGTERM signal >>

Exit Code 255: Exit Status Out Of Range

When you see exit code 255, it implies the main entrypoint of a container stopped with that status. It means that the container stopped, but it is not known for what reason.
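One common way a process ends up with 255 (a sketch, similar to the wrap-around shown for Exit Code 128 above) is an entrypoint that calls exit(-1), which has no representation in the 0-255 range:

import subprocess, sys

# exit(-1) cannot be represented in 0-255, so the operating system reports 255.
child = [sys.executable, "-c", "import sys; sys.exit(-1)"]
print(subprocess.run(child).returncode)   # 255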

What to do if a container terminated with Exit Code 255?

  1. If the container is running in a virtual machine, first try removing overlay networks configured on the virtual machine and recreating them.
  2. If this does not solve the problem, try deleting and recreating the virtual machine, then rerunning the container on it.
  3. Failing the above, bash into the container and examine logs or other clues about the entrypoint process and why it is failing.

Which Kubernetes Errors are Related to Container Exit Codes?

Whenever containers fail within a pod, or Kubernetes instructs a pod to terminate for any reason, containers will shut down with exit codes. Identifying the exit code can help you understand the underlying cause of a pod exception.

You can use the following command to view pod errors: kubectl describe pod [name]

The result will look something like this:

Containers:
  kubedns:
    Container ID:   ...
    Image:          ...
    Image ID:       ...
    Ports:          ...
    Host Ports:     ...
    Args:           ...
    State:          Running
      Started:      Fri, 15 Oct 2021 12:06:01 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Fri, 15 Oct 2021 11:43:42 +0800
      Finished:     Fri, 15 Oct 2021 12:05:17 +0800
    Ready:          True
    Restart Count:  1

Use the Exit Code provided by kubectl to troubleshoot the issue:

  • If the Exit Code is 0 – the container exited normally, no troubleshooting is required
  • If the Exit Code is between 1 and 128 – the container terminated due to an internal error, such as a missing or invalid command in the image specification
  • If the Exit Code is between 129 and 255 – the container was stopped as the result of an operating system signal, such as SIGKILL or SIGINT
  • If the Exit Code was exit(-1) or another value outside the 0-255 range, kubectl translates it to a value within the 0-255 range.
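The 129-255 range follows the shell convention of reporting 128 plus the signal number, so you can often recover the signal name from the code. The helper below is a hypothetical sketch of that mapping (it is a heuristic: a process can also call exit() with one of these values directly):

import signal

def describe_exit_code(code: int) -> str:
    # Rough interpretation of a container exit code, mirroring the ranges above.
    if code == 0:
        return "success"
    if 1 <= code <= 128:
        return "internal or application error"
    if 129 <= code <= 255:
        sig = code - 128
        try:
            return f"terminated by {signal.Signals(sig).name}"
        except ValueError:
            return f"terminated by signal {sig}"
    return "out of range (the runtime wraps it into 0-255)"

print(describe_exit_code(137))   # terminated by SIGKILL
print(describe_exit_code(143))   # terminated by SIGTERM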

Refer to the relevant section above to see how to troubleshoot the container for each exit code.

Troubleshooting Kubernetes Pod Termination with Komodor

As a Kubernetes administrator or user, pods or containers terminating unexpectedly can be a pain and can result in severe production issues. The troubleshooting process in Kubernetes is complex and, without the right tools, can be stressful, ineffective, and time-consuming.

Some best practices can help minimize the chances of container failure affecting your applications, but eventually, something will go wrong—simply because it can.

This is the reason why we created Komodor, a tool that helps dev and ops teams stop wasting their precious time looking for needles in (hay)stacks every time things go wrong.

Acting as a single source of truth (SSOT) for all of your k8s troubleshooting needs, Komodor offers:

  • Change intelligence: Every issue is a result of a change. Within seconds we can help you understand exactly who did what and when.
  • In-depth visibility: A complete activity timeline showing all code and config changes, deployments, alerts, code diffs, pod logs, and more, all within one pane of glass with easy drill-down options.
  • Insights into service dependencies: An easy way to understand cross-service changes and visualize their ripple effects across your entire system.
  • Seamless notifications: Direct integration with your existing communication channels (e.g., Slack) so you’ll have all the information you need, when you need it.

See Our Additional Guides on Key Observability Topics

Together with our content partners, we have authored in-depth guides on several other topics that can also be useful as you explore the world of observability.

5xx Server Errors

  • How to Fix Kubernetes ‘502 Bad Gateway’ Error
  • How to Fix Kubernetes ‘Service 503’ (Service Unavailable) Error

Git Errors

  • Git Revert: Rolling Back in GitOps and Kubernetes
  • How to Fix ‘failed to push some refs to’ Git Errors

Zero Trust
Authored by Tigera

  • Zero Trust Architecture: The Basic Building Blocks
  • Zero Trust Network: Why It’s Important and Implementing Zero Trust for K8s
  • Zero Trust Security: 4 Principles & 5 Simple Implementation Steps

Inspired by my SO, I decided to write this post, which aims to tackle the notorious memory-related problems of Apache Spark when handling big data. The error’s most important messages are:

16/09/01 18:07:54 WARN TaskSetManager: Lost task 113.0 in stage 0.0 (TID 461, gsta32512.foo.com): ExecutorLostFailure (executor 28 exited caused by one of the running tasks) Reason: Container marked as failed: container_e05_1472185459203_255575_01_000183 on host: gsta32512.foo.com. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

16/09/01 18:11:39 WARN TaskSetManager: Lost task 503.0 in stage 0.0 (TID 739, gsta31371.foo.com):
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at ...
16/09/01 18:11:39 WARN TaskSetManager: Lost task 512.0 in stage 0.0 (TID 748, gsta31371.foo.com): 
java.lang.OutOfMemoryError: Java heap space
    at ...
16/09/01 18:11:39 WARN TaskSetManager: Lost task 241.0 in stage 0.0 (TID 573, gsta31371.foo.com):
java.io.IOException: Filesystem closed
    at ...
16/09/01 18:11:41 ERROR YarnScheduler: Lost executor 1 on gsta31371.foo.com: Container marked as failed: container_e05_1472185459203_255575_01_000004 on host: gsta31371.foo.com. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

In order to tackle memory issues with Spark, you first have to understand what happens under the hood. I won’t expand as much as I do in memoryOverhead issue in Spark, but I would like you to keep this in mind: cores, memory, and memoryOverhead are the three things you can tune to give your job a better chance of succeeding.

memoryOverhead simply lets your container (the driver or the executor(s)) run until its memory footprint reaches the memoryOverhead limit; once it exceeds that limit, it is doomed to be killed by YARN. Here are the two relevant flags:

spark.yarn.executor.memoryOverhead          4096
spark.yarn.driver.memoryOverhead            8192

Memory is important too. The cluster I am currently using has machines with up to 8 cores and 12G of memory (that is, heap memory). I am running Python with Spark (PySpark), so all of my code runs off the heap. For that reason I have to request “not much” heap memory, because whatever heap I request is deducted from the total memory my container is allowed to use: if the total is 20G and I request 12G of heap, only 8G is left for my Python application, whereas if I request 4G, 16G is left for it. Set it with these flags:

spark.executor.memory                       4G
spark.driver.memory                         4G

The number of cores is also very important. The number of cores you configure (4 vs. 8) determines how many tasks an executor can run concurrently, which in turn affects how much execution memory each task gets, because the executor memory is shared between those tasks. With 12G of heap memory and 8 concurrent tasks, each task gets about 1.5G; with the same 12G heap and 4 tasks, each gets about 3G. This is obviously just a rough approximation. Set it with these flags:

spark.executor.cores                        4
spark.driver.cores                          4
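For completeness, here is one way to set the flags above from PySpark itself. This is a sketch with illustrative values and a hypothetical application name; note that in client mode the driver-side settings usually have to be passed to spark-submit (or spark-defaults.conf) instead, because the driver JVM is already running by the time this code executes, and newer Spark versions rename the YARN overhead flags to spark.executor.memoryOverhead and spark.driver.memoryOverhead:

from pyspark.sql import SparkSession

# Illustrative values only; tune them for your own cluster.
spark = (
    SparkSession.builder
    .appName("memory-tuning-example")                       # hypothetical app name
    .config("spark.executor.memory", "4g")
    .config("spark.executor.cores", "4")
    .config("spark.yarn.executor.memoryOverhead", "4096")
    .config("spark.driver.memory", "4g")
    .config("spark.driver.cores", "4")
    .config("spark.yarn.driver.memoryOverhead", "8192")
    .getOrCreate()
)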

This page was written while I was working on that case:
Who were you DenverCoder9? What did you see?

About ‘java.lang.OutOfMemoryError: GC overhead limit exceeded’, I would suggest this tutorial: A Beginner’s Guide on Troubleshooting Spark Applications, which points to Tuning Java Garbage Collection for Apache Spark Applications. If the tasks are still spending most of their time in garbage collection, you should reconfigure the memory: for instance, leave the number of cores the same (say, 4), but increase the memory, e.g. `spark.executor.memory=8G`.

Have questions? Comments? Did you find a bug? Let me know! 😀
Page created by G. (George) Samaras (DIT)
