Upstream connect error or disconnect/reset before headers. reset reason: connection termination (Chrome)


Bug description

When I deploy a service, at some point I start getting

upstream connect error or disconnect/reset before headers. reset reason: connection termination. The error is intermittent, but whenever it occurs it is always right after the following line appears in the sidecar log; from then on, the connection is gone.

kubectl logs -n namespace $POD -c istio-proxy

2020-09-24T17:12:35.507004Z     warning envoy config    [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamAggregatedResources gRPC config stream closed: 13, 

Also, after this error starts and I can't connect to my Istio endpoint/service, roughly 30 minutes later I get this line:

2020-09-24T17:43:49.473767Z     warning envoy config    [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamAggregatedResources gRPC config stream closed: 13,

after which the connection is reinstated and I can reach the service again. It does not happen all the time, but when it does, it takes almost exactly 30 minutes for the connection to come back.

I ran a curl test every 30 seconds; the outage starts at 2020-09-24-17-12 and ends at 2020-09-24-17-43:

tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-11
tracing-rest-checker-pg8hp tracing-rest-checker {"data":{"names":["proba"],"ndarray":[[0.39731466202150834]]},"meta":{}}
tracing-rest-checker-pg8hp tracing-rest-checker
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-12
tracing-rest-checker-pg8hp tracing-rest-checker {"data":{"names":["proba"],"ndarray":[[0.39731466202150834]]},"meta":{}}
tracing-rest-checker-pg8hp tracing-rest-checker
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-12
tracing-rest-checker-pg8hp tracing-rest-checker upstream connect error or disconnect/reset before headers. reset reason: connection termination
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-13
tracing-rest-checker-pg8hp tracing-rest-checker upstream connect error or disconnect/reset before headers. reset reason: connection termination
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
...
...
...
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-42
tracing-rest-checker-pg8hp tracing-rest-checker upstream connect error or disconnect/reset before headers. reset reason: connection termination
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-43
tracing-rest-checker-pg8hp tracing-rest-checker upstream connect error or disconnect/reset before headers. reset reason: connection termination
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-43
tracing-rest-checker-pg8hp tracing-rest-checker upstream connect error or disconnect/reset before headers. reset reason: connection termination
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-44
tracing-rest-checker-pg8hp tracing-rest-checker {"data":{"names":["proba"],"ndarray":[[0.39731466202150834]]},"meta":{}}
tracing-rest-checker-pg8hp tracing-rest-checker
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################
tracing-rest-checker-pg8hp tracing-rest-checker 2020-09-24-17-44
tracing-rest-checker-pg8hp tracing-rest-checker {"data":{"names":["proba"],"ndarray":[[0.39731466202150834]]},"meta":{}}
tracing-rest-checker-pg8hp tracing-rest-checker
tracing-rest-checker-pg8hp tracing-rest-checker ################################################################################

Based on the suggestion, I also captured before-and-after proxy config dumps with:

 while true ; do  kubectl exec -n istio-system  $(kubectl get pods -n istio-system -l app=istiod -o name) -- curl 'localhost:8080/debug/config_dump?proxyID=tracing-and-logging-rest-tracing-0-model1-865b5596fd-jl2vp.emtech'  > after/after_config_dump.txt-$(date -d '1 hour ago' +%Y-%m-%d-%H-%M); sleep 300;  done

Detailed logs are attached:

istio-bug.zip

[ ] Docs
[ ] Installation
[x] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[x] User Experience
[ ] Developer Infrastructure

Expected behavior
To be able to connect to the endpoint at all times.

Steps to reproduce the bug

Version (include the output of istioctl version --remote and kubectl version --short and helm version if you used Helm)
$ istioctl version --remote
client version: 1.6.8
control plane version: 1.6.8
data plane version: 1.6.8 (7 proxies)
$ kubectl version --short
Client Version: v1.18.6
Server Version: v1.16.13

How was Istio installed?
istioctl install

Environment where bug was observed (cloud vendor, OS, etc)
I have seen this bug on

AWS EKS 1.17
kind 1.17 cluster (on-prem)
Azure AKS 1.16

Contents

  1. upstream connect error or disconnect/reset before headers. reset reason: connection termination #19966
  2. upstream connect error or disconnect/reset before headers #2852
  3. "upstream connect error or disconnect/reset before headers. reset reason: connection failure" error for .NET Core apps run in docker-compose #15727

upstream connect error or disconnect/reset before headers. reset reason: connection termination #19966

Bug description
I upgraded Istio from 1.3.6 to 1.4.2 and suddenly started getting the error below. Are there any changes I need to make on version 1.4.2 to run my previous applications? How can I debug this error to find the actual issue? In the logs there is no info other than error code 503.

upstream connect error or disconnect/reset before headers. reset reason: connection termination

I checked that the service is up and running with a valid endpoint.

service.yaml

Application istio-proxy logs

ingress gateway logs

Extra info

Expected behavior
The application should run without error message over ingress gateway.

Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)
1.4.2

How was Istio installed?
helm template

Environment where bug was observed (cloud vendor, OS, etc)
AKS


Not sure why this is happening, but when I added a name to the Service ports it worked.

Just commenting here to say that I encountered this same error (upstream connect error or disconnect/reset before headers. reset reason: connection termination) when I upgraded from 1.3 to 1.4, and wasted a ton of time trying to figure out what exactly was causing it. I was able to downgrade to 1.3.x with no issue, so it was not a huge blocker, but I had no idea how to fix it.

Your solution of adding names to the ports in the Kubernetes Services worked for me and I am very grateful.

This should be documented somewhere, as it is not obvious. Kubernetes Service port names are optional if you only have a single port, so I am sure a lot of other people are hitting this wall. Here, for example.

Thx @rnkhouse, it works for me too

I had this same issue when I upgraded to Istio 1.4.6, but I did NOT see it with Istio 1.4.3. However, simply giving the port a name did not work: I had previously named it interface, but that resulted in the above error. When I named it http, it worked fine.

Seeing it too with Istio 1.4.4.

I've just run into this as well (tested in 1.4.0; the same symptom was observed on 1.4.6). This feels like something that should have been mentioned at https://istio.io/news/releases/1.4.x/announcing-1.4/upgrade-notes/
It looks like charts such as https://github.com/helm/charts/blob/master/stable/concourse/templates/web-svc.yaml#L36 are incompatible with this requirement?

Setting PILOT_ENABLE_PROTOCOL_SNIFFING_FOR_OUTBOUND=false in the istio-pilot deployment environment and deleting the istio-ingressgateway/concourse-web pods has also done the trick, even with the ServicePort still named atc.
I've also found that skipping 1.4.x entirely and going to 1.5 is fine.
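For anyone who wants to try the same workaround, here is a minimal sketch of that pilot change as a strategic-merge patch. It assumes a 1.4.x install where the control plane is the istio-pilot deployment in istio-system with a container named discovery; adjust the names to your install:

# pilot-sniffing-patch.yaml (hypothetical file name); apply with:
# kubectl -n istio-system patch deployment istio-pilot --patch "$(cat pilot-sniffing-patch.yaml)"
spec:
  template:
    spec:
      containers:
        - name: discovery   # the pilot container in typical 1.4.x installs
          env:
            - name: PILOT_ENABLE_PROTOCOL_SNIFFING_FOR_OUTBOUND
              value: "false"

After the patch, the gateway and workload pods have to be restarted so their sidecars pick up configuration generated without sniffing, which matches the pod-deletion step described above.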

Had the same issue with the Jaeger service on Istio 1.4.3.
Changed the port name from query-http to http-query and it worked!
Please fix this.

Not sure why this is happening, but when I added a name to the Service ports it worked.

This one worked for us too. Phew, great save.

FWIW, I had the same problem with the Service port names, though in my case grpcurl could talk to the gRPC server backend behind Envoy while a web app could not. I changed the name from grpc to grpc-web, which made it work for both the web app and grpcurl. There is something about upgrading HTTP/1.1 to HTTP/2 that I do not fully understand, namely why the Kubernetes Service port name would have such an effect; grpcurl speaks HTTP/2 natively, whereas the gRPC-Web magic does not.

Before:

ports:
  - port: 12306
    name: web-http
    targetPort: 12306

After:

ports:
  - port: 12306
    name: grpc-web-http
    targetPort: 12306

Just to make it clear for others: the name is not free text.
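Concretely, in these Istio versions the port name has to follow the <protocol>[-<suffix>] convention, because Istio derives the protocol from the prefix before the first dash. A minimal sketch using the Jaeger example from above (the service name and ports are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: jaeger-query        # illustrative name
spec:
  selector:
    app: jaeger
  ports:
    - name: http-query      # OK: prefix "http" is a protocol Istio recognizes
      port: 16686
      targetPort: 16686
    # - name: query-http    # NOT OK: prefix "query" is not a recognized protocol
    # - name: interface     # NOT OK: not a recognized protocol at all

Recognized prefixes include http, http2, grpc, grpc-web, tcp, tls, mongo, and redis; anything else is treated as plain TCP, which is why interface failed above while http worked.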


upstream connect error or disconnect/reset before headers #2852

I and others have recently been seeing the "upstream connect error or disconnect/reset before headers" error with some frequency.

It doesn't seem to be deterministic; for example, only one of the requests below failed.

and upon refreshing the page, a different one, or more, of those same requests may fail.

The errors seem to dissipate after refreshing the page a few times, and I have not yet encountered this while port-forwarding, as opposed to using the "cluster.endpoints.project.cloud.goog" URL for my deployment.

I wasn’t sure if this should be its own issue, or should be added to #1710.


I think upstream errors are an issue indicating that Ambassador thinks the backend it's forwarding traffic to is unhealthy.

Are there particular backends you are seeing this error with?

I have seen the same error. This happens to me when loading runs for a scheduled pipeline in the Pipelines UI. @jlewi, do you think this could be caused by Pipelines?

FWIW, this is happening within a batch of requests; the rest of the requests succeeded, indicating the backend should be running.

I think upstream errors are an issue indicating that Ambassador thinks the backend it's forwarding traffic to is unhealthy.

Are there particular backends you are seeing this error with?

This is happening with the root Kubeflow UX on Kubeflow deployments with IAM enabled.
It seems to be happening more and more. Previously it happened after waiting for several hours; now it can happen after a few minutes.

@Ark-kun @IronPan @rileyjbauer when you observe this error, can you take a look and provide your Ambassador pod logs?

I noticed this, and when I looked at the logs (see below) I saw errors like the following.

If you observe this, I might suggest trying to kill all your Ambassador pods.

Ambassador tries to set up a K8s watch on the APIServer to be notified about service changes. It looks like it is having a problem establishing a connection to the APIServer.

The problem might be dependent on Ambassador as well as your APIServer; is your APIServer under a lot of load?

We are using
quay.io/datawire/ambassador:0.37.0

It might be worth trying a newer version of Ambassador.

@ellis-bigelow Do you recall what the performance issues you saw with Ambassador were?
ambassador-5cf8cd97d5-pqrsw.pods.txt

I ran into this problem while installing Seldon into my cluster. I added it twice, once as seldon and again as seldon-core. This might have been the root cause of this issue, as well as of Argo CD not syncing.

Thanks for the direction @jlewi

I tried killing the pods, but after the new ones were up I continued to see the errors, and there didn't seem to be anything notable in the Ambassador or API server pod logs.

Seeing this too from recent master in EC2. Went down to 1 ambassador replica but no joy.

Reposting, as this thread seems more recent and active.

Envoy upstream had an issue, only recently fixed in dev and not yet in any stable version, where if the service it was proxying to ended the connection with a FIN/ACK, Envoy would respond with only an ACK, still leave the connection in its pool, and send the next request to that service over that connection.

The service would receive it, say a GET request, and then send a RST, since having already sent its FIN/ACK it has no way to reply to the request.

It's a roll of the dice whether your request gets assigned to an HTTP connection in the pool that is already dead (but Envoy doesn't know it yet) or to a live one, which is why the symptoms of this issue are so intermittent.

This may be related to what you're seeing. To confirm, if you have a way to capture packets on the service side, you should see the odd behavior of the service sending a FIN/ACK, Envoy responding with only an ACK, and then some time later sending another request on that TCP stream, triggering the service to send a RST.

In Envoy 1.10 they improved the message you get back: after upstream connect error or disconnect/reset before headers you now get more information; in my case I got a message like connection terminated. So if you upgrade to the latest Envoy, you may at least get additional information to confirm the source of the problem, even if it isn't this specific Envoy issue.
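If you manage the Envoy config yourself, one hedged mitigation for that race is to cap how long Envoy keeps pooled upstream connections, so Envoy closes them before the backend's keep-alive timeout fires and never reuses a socket the app is about to FIN. A sketch of the static_resources.clusters fragment against the v2 cluster API; the 58s value assumes a backend keep-alive of about 60s, and field availability depends on your Envoy version:

clusters:
  - name: my-backend                  # hypothetical cluster name
    connect_timeout: 1s
    type: STRICT_DNS
    common_http_protocol_options:
      idle_timeout: 58s               # just under the backend's keep-alive timeout,
                                      # so Envoy drops pooled connections first
    load_assignment:
      cluster_name: my-backend
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address: { address: my-backend, port_value: 8080 }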


"upstream connect error or disconnect/reset before headers. reset reason: connection failure" error for .NET Core apps run in docker-compose #15727

Description:
Hello, I have 2 .NET Core apps (a Razor Pages web app and a gRPC service) running in docker-compose. Both run on different localhost ports. If I access them directly, like:

  • http://localhost:5105/ or http://127.0.0.1:5105 for the web app,
  • http://localhost:5104/ or http://127.0.0.1:5104 for the gRPC service,

both work. But when I add the Envoy listener and cluster configuration and try to access them via:

  • http://localhost:8080/imageslibs
  • http://localhost:8080/imagesservice

Envoy returns the error upstream connect error or disconnect/reset before headers. reset reason: connection failure for both apps.
The docker-compose.yml:
version: '3.4'

Config:
Envoy’s dockerfile:

front-envoy_1 | [2021-03-28 16:47:54.444][14][debug][http] [source/common/http/conn_manager_impl.cc:255] [C6] new stream
front-envoy_1 | [2021-03-28 16:47:54.445][14][debug][http] [source/common/http/conn_manager_impl.cc:883] [C6][S14144009116599918894] request headers complete (end_stream=true):
front-envoy_1 | ':authority', 'localhost:8080'
front-envoy_1 | ':path', '/imageslibs'
front-envoy_1 | ':method', 'GET'
front-envoy_1 | 'connection', 'keep-alive'
front-envoy_1 | 'cache-control', 'max-age=0'
front-envoy_1 | 'sec-ch-ua', '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"'
front-envoy_1 | 'sec-ch-ua-mobile', '?0'
front-envoy_1 | 'upgrade-insecure-requests', '1'
front-envoy_1 | 'user-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36'
front-envoy_1 | 'accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
front-envoy_1 | 'sec-fetch-site', 'none'
front-envoy_1 | 'sec-fetch-mode', 'navigate'
front-envoy_1 | 'sec-fetch-user', '?1'
front-envoy_1 | 'sec-fetch-dest', 'document'
front-envoy_1 | 'accept-encoding', 'gzip, deflate, br'
front-envoy_1 | 'accept-language', 'en-US,en;q=0.9'
front-envoy_1 | ‘cookie’, ‘idsrv.session=NlW8VRtzuNJguQYDdVVpIA; .AspNetCore.Cookie=CfDJ8BR22IBZi6xAvAD2wBqZBlG2IUeWsw7hHPiNq4LrY2HBNRWyhGZ2gZuzRIbMi9MLO7IDORqkSIvDTuZDsLDz6RYtLccXi9x2CwlSzHS169Pgs3hs6biCcFKuriLkWZ4lpWHv4OCqZdO4lGgWmdzcrf2ctQbQOA-xPS7O7NSoQ0-a8VGjjthlIolqaxh5gYLtvvdjSI043UZWVOCb_ZDnFNiD4H_WKAtpKmdENFk_4NbSZmmQ3Indj2ty72kNNUUv8OLEswzxI5dBGA9AYI7i-lzMjbl8GjXNhplHR5j7XJTgG7i9dsF2antRfonV_IpL4sabtmLhdti-ZaumXhPewS702E_1BKo-8ELV3LOMfiE_jdkKJTPR15sCSWkSo0-nllUoQczL7de0F8KMolWK8KoB13z8E388w2juHXnmiDYQIAn3MWzKUvhH_bhgK_ZBCEExWvDqgGRRBroI90Nvg6IAwc_-PoJcPE1HE2i6ouzdkNXoBRg6IQWmelHAtDb8uI2CYzYeBu3zYrnJq28vOhAx_Qpr_y7A0GenqHyJO5cw; .AspNetCore.Antiforgery.9TtSrW0hzOs=CfDJ8Do6rlT2pe5IndjlZXmKm7GvuVL61tmcxXKqGH7eWnem071yNAndO5zwY5WDwxxHjY8CnoRIsalbkPMWIIq_ZFysZ-fkQJJdPm78T8dCxUe5DGeKiJqu5GjjEldMAkcnvmYjNYO9Ht13ldBWwzbBUqs’
front-envoy_1 |
front-envoy_1 | [2021-03-28 16:47:54.445][14][debug][http] [source/common/http/filter_manager.cc:774] [C6][S14144009116599918894] request end stream
front-envoy_1 | [2021-03-28 16:47:54.445][14][debug][router] [source/common/router/router.cc:426] [C6][S14144009116599918894] cluster ‘imageslibs’ match for URL ‘/imageslibs’
front-envoy_1 | [2021-03-28 16:47:54.446][14][debug][router] [source/common/router/router.cc:583] [C6][S14144009116599918894] router decoding headers:
front-envoy_1 | ':authority', 'localhost:8080'
front-envoy_1 | ':path', '/imageslibs'
front-envoy_1 | ':method', 'GET'
front-envoy_1 | ':scheme', 'http'
front-envoy_1 | 'cache-control', 'max-age=0'
front-envoy_1 | 'sec-ch-ua', '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"'
front-envoy_1 | 'sec-ch-ua-mobile', '?0'
front-envoy_1 | 'upgrade-insecure-requests', '1'
front-envoy_1 | 'user-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36'
front-envoy_1 | 'accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
front-envoy_1 | 'sec-fetch-site', 'none'
front-envoy_1 | 'sec-fetch-mode', 'navigate'
front-envoy_1 | 'sec-fetch-user', '?1'
front-envoy_1 | 'sec-fetch-dest', 'document'
front-envoy_1 | 'accept-encoding', 'gzip, deflate, br'
front-envoy_1 | 'accept-language', 'en-US,en;q=0.9'
front-envoy_1 | ‘cookie’, ‘idsrv.session=NlW8VRtzuNJguQYDdVVpIA; .AspNetCore.Cookie=CfDJ8BR22IBZi6xAvAD2wBqZBlG2IUeWsw7hHPiNq4LrY2HBNRWyhGZ2gZuzRIbMi9MLO7IDORqkSIvDTuZDsLDz6RYtLccXi9x2CwlSzHS169Pgs3hs6biCcFKuriLkWZ4lpWHv4OCqZdO4lGgWmdzcrf2ctQbQOA-xPS7O7NSoQ0-a8VGjjthlIolqaxh5gYLtvvdjSI043UZWVOCb_ZDnFNiD4H_WKAtpKmdENFk_4NbSZmmQ3Indj2ty72kNNUUv8OLEswzxI5dBGA9AYI7i-lzMjbl8GjXNhplHR5j7XJTgG7i9dsF2antRfonV_IpL4sabtmLhdti-ZaumXhPewS702E_1BKo-8ELV3LOMfiE_jdkKJTPR15sCSWkSo0-nllUoQczL7de0F8KMolWK8KoB13z8E388w2juHXnmiDYQIAn3MWzKUvhH_bhgK_ZBCEExWvDqgGRRBroI90Nvg6IAwc_-PoJcPE1HE2i6ouzdkNXoBRg6IQWmelHAtDb8uI2CYzYeBu3zYrnJq28vOhAx_Qpr_y7A0GenqHyJO5cw; .AspNetCore.Antiforgery.9TtSrW0hzOs=CfDJ8Do6rlT2pe5IndjlZXmKm7GvuVL61tmcxXKqGH7eWnem071yNAndO5zwY5WDwxxHjY8CnoRIsalbkPMWIIq_ZFysZ-fkQJJdPm78T8dCxUe5DGeKiJqu5GjjEldMAkcnvmYjNYO9Ht13ldBWwzbBUqs’
front-envoy_1 | 'x-forwarded-proto', 'http'
front-envoy_1 | 'x-request-id', '6def488d-7020-4a79-acee-d1bd5a9f7252'
front-envoy_1 | 'x-envoy-expected-rq-timeout-ms', '15000'
front-envoy_1 |
front-envoy_1 | [2021-03-28 16:47:54.446][14][debug][pool] [source/common/http/conn_pool_base.cc:79] queueing stream due to no available connections
front-envoy_1 | [2021-03-28 16:47:54.446][14][debug][pool] [source/common/conn_pool/conn_pool_base.cc:229] trying to create new connection
front-envoy_1 | [2021-03-28 16:47:54.446][14][debug][pool] [source/common/conn_pool/conn_pool_base.cc:132] creating a new connection
front-envoy_1 | [2021-03-28 16:47:54.446][14][debug][client] [source/common/http/codec_client.cc:41] [C8] connecting
front-envoy_1 | [2021-03-28 16:47:54.446][14][debug][connection] [source/common/network/connection_impl.cc:861] [C8] connecting to 127.0.0.1:5105
front-envoy_1 | [2021-03-28 16:47:54.446][14][debug][connection] [source/common/network/connection_impl.cc:880] [C8] connection in progress
front-envoy_1 | [2021-03-28 16:47:54.446][14][debug][connection] [source/common/network/connection_impl.cc:671] [C8] delayed connection error: 111
front-envoy_1 | [2021-03-28 16:47:54.447][14][debug][connection] [source/common/network/connection_impl.cc:243] [C8] closing socket: 0
front-envoy_1 | [2021-03-28 16:47:54.447][14][debug][client] [source/common/http/codec_client.cc:101] [C8] disconnect. resetting 0 pending requests
front-envoy_1 | [2021-03-28 16:47:54.447][14][debug][pool] [source/common/conn_pool/conn_pool_base.cc:380] [C8] client disconnected, failure reason:
front-envoy_1 | [2021-03-28 16:47:54.447][14][debug][router] [source/common/router/router.cc:1040] [C6][S14144009116599918894] upstream reset: reset reason: connection failure, transport failure reason:
front-envoy_1 | [2021-03-28 16:47:54.447][14][debug][http] [source/common/http/filter_manager.cc:858] [C6][S14144009116599918894] Sending local reply with details upstream_reset_before_response_started
front-envoy_1 | [2021-03-28 16:47:54.447][14][debug][http] [source/common/http/conn_manager_impl.cc:1454] [C6][S14144009116599918894] encoding headers via codec (end_stream=false):
front-envoy_1 | ':status', '503'
front-envoy_1 | 'content-length', '91'
front-envoy_1 | 'content-type', 'text/plain'
front-envoy_1 | 'date', 'Sun, 28 Mar 2021 16:47:54 GMT'
front-envoy_1 | 'server', 'envoy'

Here is the localhost:9999/clusters output:

imageslibs::default_priority::max_connections::1024
imageslibs::default_priority::max_pending_requests::1024
imageslibs::default_priority::max_requests::1024
imageslibs::default_priority::max_retries::3
imageslibs::high_priority::max_connections::1024
imageslibs::high_priority::max_pending_requests::1024
imageslibs::high_priority::max_requests::1024
imageslibs::high_priority::max_retries::3
imageslibs::added_via_api::false
imageslibs::127.0.0.1:5105::cx_active::0
imageslibs::127.0.0.1:5105::cx_connect_fail::2
imageslibs::127.0.0.1:5105::cx_total::2
imageslibs::127.0.0.1:5105::rq_active::0
imageslibs::127.0.0.1:5105::rq_error::2
imageslibs::127.0.0.1:5105::rq_success::0
imageslibs::127.0.0.1:5105::rq_timeout::0
imageslibs::127.0.0.1:5105::rq_total::0
imageslibs::127.0.0.1:5105::hostname::127.0.0.1
imageslibs::127.0.0.1:5105::health_flags::healthy
imageslibs::127.0.0.1:5105::weight::1
imageslibs::127.0.0.1:5105::region::
imageslibs::127.0.0.1:5105::zone::
imageslibs::127.0.0.1:5105::sub_zone::
imageslibs::127.0.0.1:5105::canary::false
imageslibs::127.0.0.1:5105::priority::0
imageslibs::127.0.0.1:5105::success_rate::-1.0
imageslibs::127.0.0.1:5105::local_origin_success_rate::-1.0
secure_imageslibs::default_priority::max_connections::1024
secure_imageslibs::default_priority::max_pending_requests::1024
secure_imageslibs::default_priority::max_requests::1024
secure_imageslibs::default_priority::max_retries::3
secure_imageslibs::high_priority::max_connections::1024
secure_imageslibs::high_priority::max_pending_requests::1024
secure_imageslibs::high_priority::max_requests::1024
secure_imageslibs::high_priority::max_retries::3
secure_imageslibs::added_via_api::false
secure_imageslibs::127.0.0.1:9105::cx_active::0
secure_imageslibs::127.0.0.1:9105::cx_connect_fail::0
secure_imageslibs::127.0.0.1:9105::cx_total::0
secure_imageslibs::127.0.0.1:9105::rq_active::0
secure_imageslibs::127.0.0.1:9105::rq_error::0
secure_imageslibs::127.0.0.1:9105::rq_success::0
secure_imageslibs::127.0.0.1:9105::rq_timeout::0
secure_imageslibs::127.0.0.1:9105::rq_total::0
secure_imageslibs::127.0.0.1:9105::hostname::127.0.0.1
secure_imageslibs::127.0.0.1:9105::health_flags::healthy
secure_imageslibs::127.0.0.1:9105::weight::1
secure_imageslibs::127.0.0.1:9105::region::
secure_imageslibs::127.0.0.1:9105::zone::
secure_imageslibs::127.0.0.1:9105::sub_zone::
secure_imageslibs::127.0.0.1:9105::canary::false
secure_imageslibs::127.0.0.1:9105::priority::0
secure_imageslibs::127.0.0.1:9105::success_rate::-1.0
secure_imageslibs::127.0.0.1:9105::local_origin_success_rate::-1.0
imagesservice::default_priority::max_connections::1024
imagesservice::default_priority::max_pending_requests::1024
imagesservice::default_priority::max_requests::1024
imagesservice::default_priority::max_retries::3
imagesservice::high_priority::max_connections::1024
imagesservice::high_priority::max_pending_requests::1024
imagesservice::high_priority::max_requests::1024
imagesservice::high_priority::max_retries::3
imagesservice::added_via_api::false
imagesservice::127.0.0.1:5104::cx_active::0
imagesservice::127.0.0.1:5104::cx_connect_fail::1
imagesservice::127.0.0.1:5104::cx_total::1
imagesservice::127.0.0.1:5104::rq_active::0
imagesservice::127.0.0.1:5104::rq_error::1
imagesservice::127.0.0.1:5104::rq_success::0
imagesservice::127.0.0.1:5104::rq_timeout::0
imagesservice::127.0.0.1:5104::rq_total::0
imagesservice::127.0.0.1:5104::hostname::127.0.0.1
imagesservice::127.0.0.1:5104::health_flags::healthy
imagesservice::127.0.0.1:5104::weight::1
imagesservice::127.0.0.1:5104::region::
imagesservice::127.0.0.1:5104::zone::
imagesservice::127.0.0.1:5104::sub_zone::
imagesservice::127.0.0.1:5104::canary::false
imagesservice::127.0.0.1:5104::priority::0
imagesservice::127.0.0.1:5104::success_rate::-1.0
imagesservice::127.0.0.1:5104::local_origin_success_rate::-1.0
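One observation on this report: delayed connection error: 111 in the log above is ECONNREFUSED, and Envoy is dialing 127.0.0.1:5105, which inside the front-envoy container is Envoy's own loopback, not the host or the app containers. In docker-compose each service has its own network namespace, so the clusters likely need to target the compose service names (resolved by Docker's DNS) rather than localhost. A hedged sketch of one cluster, with webapp as a hypothetical compose service name; the container-internal port may differ from the published host port:

clusters:
  - name: imageslibs
    connect_timeout: 1s
    type: STRICT_DNS        # resolve the compose service name via Docker DNS
    load_assignment:
      cluster_name: imageslibs
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  # compose service name, not 127.0.0.1 (Envoy's own loopback);
                  # use the port the app listens on inside its container
                  socket_address: { address: webapp, port_value: 5105 }

On Docker Desktop, host.docker.internal also works if the apps really run on the host rather than inside compose.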


Here’s what “upstream connect error or disconnect/reset before headers connection failure” means and how to fix it:

If you are an everyday user, and you see this message while browsing the internet, then it simply means that you need to clear your cache and cookies.

If you are a developer and see this message, then you need to check your service routes, destination rules, and/or traffic management with applications.

So if you want to learn all about what this 503 error means exactly and how to fix it, then this article is for you.

Let’s delve deeper into it!

Upstream Connect Error or Disconnect/Reset: Meaning? (Fix)


Upstream connect error or disconnect/reset before headers. reset reason: connection failure.

That’s a very specific, yet unclear error message to see.

What is it trying to tell you?

Let’s start with an overview.

This is a 503 error message.

It’s a generic message that actually applies to a lot of different scenarios, and the fix for it will depend on the specific scenario at hand.

In general, this error is telling you that the proxy in front of a service hit a connection error, and that error is usually linked to service routing and rules.

That leaves an absolute ton of possibilities, but I’ll take you through the most common sources.

Then, we can talk about troubleshooting and fixing the problem.


That covers the very zoomed-out picture of this error message, but if you’re getting it, then you probably want to get it to go away.

To fix the problem, we have to address the root cause.

That’s the essence of troubleshooting, and it definitely applies here.

There’s a problem when it comes to identifying the cause of this error.

There are basically two instances where you’re going to see this error, and they are completely different.

One place where you’ll run into it is when you’re coding specific functions that relate to network connection management.

I’m going to break down the three most common scenarios that lead to this error in the next few sections.

But, the other common time you see this error is when you’re browsing the internet.

That means that I’m really answering this question for two very different groups of people.

One group is developing or coding networking resources.

The other group is just browsing the internet.

As you might imagine, it’s hard to consolidate all of that into a single, concise answer.

So, I’m going to split this up.

First, I’ll tackle the developer problems.

If you’re just trying to browse the internet and don’t want to get deep into networking and how it works, then skip to the section that is clearly labeled as not for developers and programmers.

That said, if you want to take a peek behind the curtain and learn a little more about networking, I’ll try to keep these explanations as light as possible.

#1 Reconfiguring Service Routes


I mentioned before that this is a 503 error.

One common place you’ll find it is when reconfiguring service routes.

The boiled-down essence here is that it's easy to mix up service routing and rules such that the system routes traffic to subsets before they have been defined.

Naturally, the system doesn’t know what to do in that case, and you get a 503 error.

The key to avoiding this problem with service route reconfiguring is to follow what you might call a “make-before-break” rule.

Essentially, the steps force the system to add the new subset first and then update the virtual services.
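As a sketch of that ordering with Istio resources (names are illustrative): apply the DestinationRule that defines the new subset first, wait for it to propagate, and only then point the VirtualService at it:

# Step 1: define the subset first (make).
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  subsets:
    - name: v2            # the subset must exist before any route uses it
      labels:
        version: v2
---
# Step 2: only after step 1 has propagated, route to the subset (break).
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service
  http:
    - route:
        - destination:
            host: my-service
            subset: v2

Doing it in the other order leaves a window where the proxy routes to a subset it has no endpoints for, which surfaces as exactly this 503.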

#2 Setting Destination Rules


Considering the issue above, it should not come as a surprise that you can trigger 503 errors when setting destination rules.

Most commonly, destination rules are the issue if you see the 503 errors start right after applying a new rule for a service.

This issue goes hand in hand with the one above.

The problem is still that the destination rule is creating the issue.

The difference is that this isn't necessarily a problem with receiving subsets before they have been defined.

Virtually any destination rule error can lead to a 503 message.

Since there are so many ways these rules can break down and so many ways the problems can manifest, I’m going to cheat a little.

If you noticed that the problem correlates with new destination rules, then you can follow this guide.

It breaks down the most common destination rule problems and shows you how to overcome them.

#3 Traffic Management With Applications


The third primary issue is related to conflicts between applications and any proxy sidecar.

In other words, the applications that work with your traffic management rules might not know those rules, and the application can do things that don’t play well with the traffic management system.

That’s pretty vague because, once again, there are a lot of specific possibilities.

The gist is that you’re trying to offload as much error recovery to the applications as you can.

That will minimize these conflicts and resolve most instances of 503 errors.


Considering the detailed problems we just covered, what can you do about the 503 error?

I included some solutions and linked to even more, but if you’re looking for a general guide, then here’s another way to think about the whole thing.

This specific message is telling you that the connection to the upstream service failed, or was reset before any response headers came back.

Somewhere in your system, you have conflicting rules that are trying to do things out of order.

The best way to find the specific area is to focus on rules changes as they relate to traffic management.

Essentially, start with what you touched most recently, and work your way backward from there.

Ok, but What if I’m Not a Developer or Programmer? (3 Steps)


Alright. That was a relatively deep walk-through of connection rules development.

If you’re still with me, that’s great.

We’re going to switch gears and look at this from a simple user perspective.

You don’t need to know any coding to run into this problem, and I’m going to show you how to solve it without any coding either.

It’s actually pretty simple.

#1 The Walmart Bug


The fix makes more sense, though, when you know more about what went wrong.

So, I’m going to cite one of the most prolific examples of everyday 503 errors.

In 2020, Walmart’s website ran into widespread issues.

Users could browse the site just fine, but when they tried to go to a specific product page to make a purchase, they got the 503 error.

It popped up word for word as I mentioned before: upstream connect error or disconnect/reset before headers. reset reason: connection failure.

People were just trying to buy some stuff, and they got hit with this crazy message.

What are you supposed to do with it?

#2 An Easy Fix


Well, the message is actually giving you very specific advice, once you know how to read it.

It’s telling you that your computer and the Walmart servers had a connection failure, and when they tried to automatically fix that connection problem, things broke down.

A quick note: I’m using the famous Walmart bug as an example, but the problems and solutions discussed here will work any time you see this message while browsing the web.

What that means is that there is some piece of information that is tied to your connection to the Walmart site that is messing up the automatic reconnect protocols.

While that might sound a little vague and mysterious, it actually tells us exactly where the problem lies.

The only information that could exist in this space would have to be stored in your browser’s cache.

This is related to your cookies.

Basically, when the error first went wrong, your computer remembered the problem, and so it just kept doing things the wrong way over and over again.

The solution requires you to make your computer forget the bad rule that it’s following.

To do that, you simply need to clear your cache and cookies.

#3 Clearing the Cache


The famous Walmart problem plagued Chrome users, so I'll walk you through how to do this on Google Chrome.

If you use a different browser, you can just look up how to clear cache and cookies.

Before we go through the steps, let me explain what is going to happen here.

We’re not deleting anything that is particularly important.

Your internet cache is just storing information related to the websites you visit.

Then, if you go back to that website or reload it, the stored information means that your computer doesn’t actually have to download as much information, and everything can load a little faster and easier.

So, when you delete this cache, it’s going to do a few things.

It’s going to slow down your first visit to any site that no longer has cached files.

But after you visit a site, it will build new cache files, and things will work normally.

This is also going to make your computer forget your sign-in information for any sites that require it.

Sticking with Walmart as an example, if you were signed into the website with your account, then after you clear the cache, you’re going to be automatically signed out again.

Make sure you know your passwords and usernames.

Because of this last issue, some people don’t like to clear their cache.

If you’re worried about that, then you don’t have to clear everything.

Just clear the cache back through the day when the error started.

Ok. With all of that covered, let’s go through the steps: 

  • Look for the three dots and click on them (this opens the tools menu).
  • Choose “history” from the list.
  • Choose the time frame on the right that covers the data you want to clear.
  • Click on “Clear browsing data.”
  • Look at the checkboxes. You can choose cookies, cached images and files, and browsing history.
  • To be sure you resolve the 503 error, clear the cookies and cached files.
  • Click on “Clear Data” and you’re done.

Problem description

A K8s (v1.13.5) + Istio (v1.1.7) environment was set up for testing, and one day more than 30 services (front end, back end, gateway) were deployed into the Istio cluster, with the corresponding Istio routing rules configured. Later, full of confidence, I tested the routing between services simply by clicking through the external pages, which call the gateway, which in turn calls the other internal services (web front end -> gateway -> back-end service). But during actual testing, the gateway kept reporting HTTP response code 503 from the internal services, and the gateway itself also returned 503 from time to time; there seemed to be no pattern to when the errors occurred, which left me confused...

Related issues

The first thing that came to mind was to look for related issues in github -> istio. For the specific issues, see the following links:

503 «upstream connect error or disconnect/reset before headers» in 1.1 with low traffic

Sporadic 503 errors

Almost every app gets UC errors, 0.012% of all requests in 24h period

There is a lot of discussion of 503s in these issues. Istio introduced the sidecar (Envoy) concept. A simple way to understand a sidecar is as a local network proxy sitting in front of each application in the service mesh (corresponding to a Pod in K8s, which contains several containers: istio-proxy and app, which can communicate over localhost). In Istio the sidecar component is implemented by extending Envoy. The sidecar brings convenience (routing, circuit breaking, connection pool configuration, etc.), but at the same time it makes calls between services more complex. The original simple call Application1 -> Application2 becomes Application1 -> Envoy1 -> Envoy2 -> Application2 in Istio, as shown below:

In essence, any problem in the communication between Envoy2 and Application2 is wrapped up as a 503, sent back to Envoy1, and finally returned to Application1.

Re-reading the issues showed that the 503 problem usually mentioned there is caused by the connection pool in Envoy2 caching connections to Application2 that have become invalid. Envoy2 calls Application2 over an invalid connection, which causes a connection reset; Envoy2 then wraps this as a 503 and returns it to the downstream caller.

The typical signature of this 503 can be seen in the istio-proxy log of the affected application. The command to set the istio-proxy log level is as follows:

curl -X POST localhost:15000/logging?level=trace

A typical 503 log looks like this:

[2019-06-28 13:02:36.790][37][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:97] [C26] using existing connection
[2019-06-28 13:02:36.790][37][debug][router] [external/envoy/source/common/router/router.cc:1210] [C21][S3699665653477458718] pool ready
[2019-06-28 13:02:36.790][37][debug][connection] [external/envoy/source/common/network/connection_impl.cc:518] [C26] remote close
[2019-06-28 13:02:36.790][37][debug][connection] [external/envoy/source/common/network/connection_impl.cc:188] [C26] closing socket: 0
[2019-06-28 13:02:36.791][37][debug][client] [external/envoy/source/common/http/codec_client.cc:82] [C26] disconnect. resetting 1 pending requests
[2019-06-28 13:02:36.791][37][debug][client] [external/envoy/source/common/http/codec_client.cc:105] [C26] request reset
[2019-06-28 13:02:36.791][37][debug][router] [external/envoy/source/common/router/router.cc:671] [C21][S3699665653477458718] upstream reset: reset reason connection termination
[2019-06-28 13:02:36.791][37][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1137] [C21][S3699665653477458718] Sending local reply with details upstream_reset_before_response_started{connection termination}
[2019-06-28 13:02:36.791][37][debug][filter] [src/envoy/http/mixer/filter.cc:141] Called Mixer::Filter : encodeHeaders 2
[2019-06-28 13:02:36.791][37][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1329] [C21][S3699665653477458718] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '95'
'content-type', 'text/plain'
'date', 'Fri, 28 Jun 2019 13:02:36 GMT'
'server', 'istio-envoy'

In the log above, upstream reset: reset reason connection termination means that a connection in the Envoy connection pool was terminated.

Basic solutions

The following four tuning methods can be used to address the problems above (see the sketch after this list):
(1) Set HTTPRetry (attempts, perTryTimeout, retryOn) in the VirtualService to configure an error retry policy.
(Note: the timeout must be set in Envoy at the same time (see the Envoy reference), i.e. the total retry time must be less than the timeout;
in Istio this means HttpRoute.timeout must be set as well.)

(2) Set HTTPSettings.idleTimeout in the DestinationRule to limit how long idle connections are cached in the Envoy connection pool.

(3) Set HTTPSettings.maxRequestsPerConnection in the DestinationRule to 1 (this disables keep-alive; connections are not reused and performance drops).

(4) Increase the Tomcat connectionTimeout (the Spring Boot server.connectionTimeout setting) to raise the web container's connection keep-alive timeout.
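A hedged sketch of options (1)-(3) as Istio resources; the host name and values are illustrative, and the retry budget is kept below the overall route timeout as noted above:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: backend
spec:
  hosts:
    - backend
  http:
    - route:
        - destination:
            host: backend
      timeout: 10s                  # overall timeout > attempts * perTryTimeout
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: connect-failure,refused-stream,reset
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: backend
spec:
  host: backend
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 30s              # option (2): drop pooled connections before the app does
        maxRequestsPerConnection: 1   # option (3): no reuse at all, at a performance cost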

You can also consult the following articles for Istio 503 troubleshooting techniques:

[English version] Istio: 503's with UC's and TCP Fun Times

[Chinese version] Istio: 503, UC and TCP

Overall, the investigation breaks down into four main methods:

(1) Inspecting JaegerUI trace records (searching with the tag error=true);

(2) Inspecting metrics (Istio, Envoy);

(3) Inspecting the istio-proxy debug log;

(4) Capturing network packets.

I only used methods (1), (3) and (4) in my own troubleshooting.

JaegerUI

When using method (1), Jaeger, to investigate (you can temporarily set PILOT_TRACE_SAMPLING to 100, i.e. trace everything), pay attention to the following points:

(1) Set the tag error=true in the search conditions to quickly locate traces with errors;

(2) Pay attention to the response_flags field in the trace details: it indicates the type of response failure and lets you quickly pin down the cause.

For a description of response_flags, see the Envoy documentation:

The istio-proxy log

For method (3), set the istio-proxy log level to debug (or trace) and focus on the following log content:

(1) the HTTP response code, e.g. "503";

(2) above the HTTP response code (e.g. the 503), find the corresponding reset log, e.g. upstream reset: reset reason connection termination, to pin down the failure reason;

(3) keep looking further up for how the connection was obtained: using existing connection OR creating a new connection (an existing connection or a new one).

Usually a using existing connection failure means a connection cached in the Envoy pool had already become invalid, while a creating a new connection failure means you have to look for other causes. Below I explain the creating a new connection problem I ran into in practice.

Packet capture

You can use the kubectl ksniff plugin, but I could not get it to work (wireshark-gtk would not start), so I used the plain tcpdump command. The main steps are as follows:

(1) Get a shell in the application container: kubectl exec -it xxx -c app -n tsp /bin/bash;

(2) Run tcpdump and write the result to a file: sudo tcpdump -ni lo port 8080 -vv -w my-packets.pcap;
-i selects the lo (loopback) interface, capturing only traffic between the local Envoy and the application (they are in the same Pod and talk over localhost)
-n prints numeric IPs (does not resolve addresses to names)
port 8080 captures only port 8080 (the port the application exposes)
-vv prints verbose output
-w writes the capture to the file my-packets.pcap

(3) Log in to the Pod's worker node and copy the my-packets.pcap file from step (2) out of the container with docker cp;

(4) Fetch my-packets.pcap from the node host and inspect it with Wireshark.

Note: the istio-proxy container has a read-only file system and cannot write files, so run tcpdump in the application container instead.

The source of my problem

After all the churning above, I had modified my VirtualServices and DestinationRules, but the 503 problem remained. I also considered whether it was related to host connection limits and network settings (ulimit, tcp_tw_recycle, etc.), and I upgraded Istio (from 1.1.7 to 1.1.11; releases after 1.1.7 contain a fix for a 503 bug), but no matter what I tried, the 503s did not go away.

And strangely, on GitHub everyone said the problem occurs with using existing connection, but mine occurred with creating a new connection. My full log looks like this:

[2019-07-16 08:59:23.853][31][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:92] creating a new connection
[2019-07-16 08:59:23.853][31][debug][client] [external/envoy/source/common/http/codec_client.cc:26] [C297] connecting
[2019-07-16 08:59:23.853][31][debug][connection] [external/envoy/source/common/network/connection_impl.cc:644] [C297] connecting to 127.0.0.1:8080
[2019-07-16 08:59:23.853][31][debug][connection] [external/envoy/source/common/network/connection_impl.cc:653] [C297] connection in progress
[2019-07-16 08:59:23.853][31][debug][pool] [external/envoy/source/common/http/conn_pool_base.cc:20] queueing request due to no available connections
[2019-07-16 08:59:23.853][31][debug][filter] [src/envoy/http/mixer/filter.cc:94] Called Mixer::Filter : decodeData (84, false)
[2019-07-16 08:59:23.853][31][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1040] [C93][S18065063288515590867] request end stream
[2019-07-16 08:59:23.853][31][debug][filter] [src/envoy/http/mixer/filter.cc:94] Called Mixer::Filter : decodeData (0, true)
[2019-07-16 08:59:23.853][31][debug][connection] [external/envoy/source/common/network/connection_impl.cc:526] [C297] delayed connection error: 111
[2019-07-16 08:59:23.853][31][debug][connection] [external/envoy/source/common/network/connection_impl.cc:183] [C297] closing socket: 0
[2019-07-16 08:59:23.853][31][debug][client] [external/envoy/source/common/http/codec_client.cc:82] [C297] disconnect. resetting 0 pending requests
[2019-07-16 08:59:23.853][31][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:133] [C297] client disconnected, failure reason: 
[2019-07-16 08:59:23.853][31][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:173] [C297] purge pending, failure reason: 
[2019-07-16 08:59:23.853][31][debug][router] [external/envoy/source/common/router/router.cc:644] [C93][S18065063288515590867] upstream reset: reset reason connection failure
[2019-07-16 08:59:23.853][31][debug][filter] [src/envoy/http/mixer/filter.cc:133] Called Mixer::Filter : encodeHeaders 2
[2019-07-16 08:59:23.853][31][trace][http] [external/envoy/source/common/http/conn_manager_impl.cc:1200] [C93][S18065063288515590867] encode headers called: filter=0x5c79f40 status=0
[2019-07-16 08:59:23.853][31][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1305] [C93][S18065063288515590867] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '91'
'content-type', 'text/plain'
'date', 'Tue, 16 Jul 2019 08:59:23 GMT'
'server', 'istio-envoy'

From the log I found that my problem occurred when Envoy connected to the local application on 127.0.0.1:8080: connection failure. The response_flags in JaegerUI was UF (upstream connection failure), and the failure was intermittent, sometimes succeeding and sometimes failing.

On a clear Friday morning (after nearly a week of thrashing around >_<|||), I noticed the following:

Checking my application containers with docker ps | grep app, why had all the application containers only been up for 6 or 7 minutes?

It looked like the problem had been found. So many containers having an uptime of only 6 or 7 minutes meant the application containers were constantly restarting, and the reason for the restarts was that the K8s health check was failing. I immediately went to check the health check configuration:

The port exposed by the container is containerPort = 8080, but the tcpSocket.port set in the livenessProbe is 80. These two values do not match, and given the health check configuration:

initial delay of 300 s (5 minutes) + first failed probe + (3 - 1) failed retries * 60 s retry interval = 5 minutes + 2 * 1 minute = more than 7 minutes (roughly 7 to 8 minutes)

As a result, the application is marked unhealthy after 7-8 minutes, so no application container lives longer than about 8 minutes and they restart continuously; while a restart is in progress, Envoy inevitably fails to connect to the application with connection failure, which is exactly the intermittent 503. The time windows in which the front end (which polls on a timer) reported 503 from the back-end service also lined up with the application container restarts, further confirming the cause of the connection failure:

The misconfigured health check caused the application containers to restart continuously, and connections attempted during a restart failed with connection failure.

After fixing the livenessProbe in all Deployments, the 503 problem disappeared...
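For reference, a minimal sketch of the corrected probe (only a fragment of the Deployment spec; the values mirror the configuration described above). The whole point is that tcpSocket.port must match the port the application actually listens on:

containers:
  - name: app
    ports:
      - containerPort: 8080
    livenessProbe:
      tcpSocket:
        port: 8080             # was 80: probing a port nothing listens on
      initialDelaySeconds: 300 # 5 minutes before the first probe
      periodSeconds: 60        # probe every minute
      failureThreshold: 3      # restart after 3 consecutive failures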

I can go have fun again this weekend...

Summary

Because of my carelessness, a health check misconfiguration ended up causing the Istio 503 problem. I still do not fully understand all the related configuration and need to study it more deeply.

However, by troubleshooting this 503 I now understand Istio's troubleshooting methods much better, and I will be able to locate such problems faster in the future.

Don't give up easily...
