Hi,
I am installing Istio on an Azure Kubernetes Service (AKS) cluster using the Istio Operator. Once Istio is installed, I create a VirtualService to route traffic. For the past few days, however, I have been seeing "Site is not reachable" when I hit the URL of the service,
and the ingress gateway is printing the logs below.
[Envoy (Epoch 0)] [2020-11-30 07:45:54.938][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:54] Unable to establish new stream
2020-11-30T07:45:55.382650Z info sds node:router~172.18.40.10~istio-ingressgateway-76d59b8f49-qsvp6.istio-system~istio-system.svc.cluster.local-1 resource:ROOTCA new connection
2020-11-30T07:45:55.382679Z info sds node:router~172.18.40.10~istio-ingressgateway-76d59b8f49-qsvp6.istio-system~istio-system.svc.cluster.local-2 resource:default new connection
2020-11-30T07:45:55.558572Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-11-30T07:45:55.770409Z info cache Root cert has changed, start rotating root cert for SDS clients
2020-11-30T07:45:55.770676Z info sds node:router~172.18.40.10~istio-ingressgateway-76d59b8f49-qsvp6.istio-system~istio-system.svc.cluster.local-2 resource:default pushed key/cert pair to proxy
2020-11-30T07:45:55.770711Z info sds node:router~172.18.40.10~istio-ingressgateway-76d59b8f49-qsvp6.istio-system~istio-system.svc.cluster.local-2 resource:default pushed secret
2020-11-30T07:45:55.783291Z info sds node:router~172.18.40.10~istio-ingressgateway-76d59b8f49-qsvp6.istio-system~istio-system.svc.cluster.local-1 resource:ROOTCA pushed root cert to proxy
2020-11-30T07:45:55.783317Z info sds node:router~172.18.40.10~istio-ingressgateway-76d59b8f49-qsvp6.istio-system~istio-system.svc.cluster.local-1 resource:ROOTCA pushed secret
2020-11-30T07:45:57.560354Z info Envoy proxy is ready
[Envoy (Epoch 0)] [2020-11-30 08:15:51.977][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
2020-11-30T08:16:39.176830Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-11-30T08:16:39.176923Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {CONNECTING <nil>}
2020-11-30T08:16:39.182270Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {READY <nil>}
[Envoy (Epoch 0)] [2020-11-30 08:46:20.185][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
2020-11-30T08:48:05.413973Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-11-30T08:48:05.414033Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {CONNECTING <nil>}
2020-11-30T08:48:05.419261Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {READY <nil>}
2020-11-30T09:17:55.718346Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-11-30T09:17:55.718396Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {CONNECTING <nil>}
2020-11-30T09:17:55.724485Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {READY <nil>}
[Envoy (Epoch 0)] [2020-11-30 09:18:36.376][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
2020-11-30T09:48:51.632720Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-11-30T09:48:51.632852Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {CONNECTING <nil>}
2020-11-30T09:48:51.638595Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {READY <nil>}
[Envoy (Epoch 0)] [2020-11-30 09:50:48.693][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
2020-11-30T10:20:10.098491Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-11-30T10:20:10.098549Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {CONNECTING <nil>}
2020-11-30T10:20:10.103931Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {READY <nil>}
[Envoy (Epoch 0)] [2020-11-30 10:23:30.186][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
2020-11-30T10:52:25.858932Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-11-30T10:52:25.858992Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {CONNECTING <nil>}
2020-11-30T10:52:25.864645Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {READY <nil>}
[Envoy (Epoch 0)] [2020-11-30 10:54:42.187][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
[Envoy (Epoch 0)] [2020-11-30 11:23:47.335][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
Versions
Istio version: 1.5.5
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.11", GitCommit:"3a3612132641768edd7f7e73d07772225817f630", GitTreeState:"clean", BuildDate:"2020-09-02T06:46:33Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
How it was Installed
I used the Istio Operator to install Istio. Below is the YAML file.
Profile used: default profile
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: istiocontrolplane
spec:
  profile: default
  components:
    pilot:
      k8s:
        resources:
          requests:
            cpu: 500m
            memory: 2048Mi
          limits:
            cpu: 2000m
            memory: 2048Mi
    base:
      enabled: true
    telemetry:
      enabled: true
      k8s:
        resources:
          requests:
            cpu: 500m
            memory: 2048Mi
          limits:
            cpu: 2000m
            memory: 2048Mi
    sidecarInjector:
      enabled: true
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        hpaSpec:
          maxReplicas: 5
          minReplicas: 1
          scaleTargetRef:
            apiVersion: apps/v1
            kind: Deployment
            name: istio-ingressgateway
          metrics:
          - type: Resource
            resource:
              name: cpu
              targetAverageUtilization: 60
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 1500m
            memory: 1024Mi
        strategy:
          rollingUpdate:
            maxSurge: "100%"
            maxUnavailable: "25%"
    egressGateways:
    - name: istio-egressgateway
      enabled: false
    citadel:
      enabled: true
  addonComponents:
    grafana:
      enabled: true
      k8s:
        replicaCount: 1
    kiali:
      enabled: true
      k8s:
        replicaCount: 1
    tracing:
      enabled: true
Once Istio is installed successfully, I create the VirtualServices for routing. Below is a sample VirtualService file:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: demo-virtual-service
  namespace: istio-system
spec:
  hosts:
  - "*"
  gateways:
  - istio-ingressgateway
  http:
  - match:
    - uri:
        prefix: /demo/
    route:
    - destination:
        port:
          number: 8080
        host: demo-service.demo.svc.cluster.local
Contents
- transport: loopyWriter.run returning. connection error: desc = "transport is closing" #401
- 1.19.2 kube-apiserver can't write to etcd — "transport is closing" #95334
- transport is closing error #2298
- grpc transport closing issues #1462
- Go gRPC Noisy Transport Closing Log #2901
transport: loopyWriter.run returning. connection error: desc = "transport is closing" #401
kiam is not working well.
k8s version: v1.14.6
kiam is deployed in an external OpenStack cloud (to fetch keys from AWS KMS).
kiam version: v3.5
Can anyone help me?
I also created the roles with Terraform.
Hm, I found that wget -qO- http://169.254.169.254 returns info from the internal OpenStack API.
UPD:
I found the problem:
"error warming credentials: RequestError: send request failed\ncaused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority"
and solved it by changing the path.
Also, in the server logs I can see that they can get the key, but the problem with
INFO: 2020/05/06 16:51:56 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
WARNING: 2020/05/06 16:55:06 transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:443->127.0.0.1:53962: read: connection reset by peer
persists.
I found that #385 says that access to the host EC2 API is still needed.
Does this mean that Kiam can only work inside AWS?
Kiam does not need to be run in AWS.
What is the actual problem you're seeing? You've posted some info and warning logs, but those don't necessarily indicate any actual problems.
Are your Kiam servers running and passing their health checks? Are you seeing any error messages in the Kiam server? Are your Kiam agents running and passing their health checks? Are they reporting any errors in their logs?
Source
1.19.2 kube-apiserver can't write to etcd — "transport is closing" #95334
What happened: We run a self-hosted kubernetes v1.18.2 cluster on Flatcar Container Linux and are testing an upgrade to v1.19.2. While doing this on a separate staging cluster we noticed that the kube-apiserver has problems writing to etcd (meaning etcd is empty) without any obvious reason.
What you expected to happen: kube-apiserver running as smoothly as always without any errors.
How to reproduce it (as minimally and precisely as possible): Set up a new Kubernetes v1.19.2 cluster with etcd 3.3.12.
Anything else we need to know?: These are the logs from the kube-apiserver:
However the etcd cluster is healthy and I can read and write from a separate etcd pod inside the cluster:
I don’t think that the etcd version is playing a huge role since someone has reported the same issue (comment in #94155) with k8s v1.19.0 and etcd 3.4.13.
Environment:
- Kubernetes version: v1.19.2
- Cloud provider or hardware configuration: AWS
- OS: Flatcar Container Linux 2605.6.0
- Kernel: 5.4.66-flatcar
- Install tools: Manually provisioned with Rancher kubelet container & official kube-* images
- Network plugin and version (if this is a network-related bug): Flannel v0.11.0
- Others:
- Etcd: 3.3.12
Source
transport is closing error #2298
What version of gRPC are you using?
What version of Go are you using ( go version )?
What operating system (Linux, Windows, …) and version?
macOS and Linux. The problem appears on both OSes.
What did you do?
The client connects to the server and listens for data on the stream (server-side streaming). The connection state shows READY and then transitions to TRANSIENT_FAILURE.
INFO: 2018/09/07 21:50:07 pickfirstBalancer: HandleSubConnStateChange: 0xc420566d00, CONNECTING
INFO: 2018/09/07 21:50:07 pickfirstBalancer: HandleSubConnStateChange: 0xc420566d00, READY
INFO: 2018/09/07 21:50:07 pickfirstBalancer: HandleSubConnStateChange: 0xc420566d00, TRANSIENT_FAILURE
INFO: 2018/09/07 21:50:07 pickfirstBalancer: HandleSubConnStateChange: 0xc420566d00, CONNECTING
INFO: 2018/09/07 21:50:07 pickfirstBalancer: HandleSubConnStateChange: 0xc420566d00, READY
After the connection becomes READY, calling the Recv() method on the stream returns the following error:
INFO: 2018/09/07 21:50:07 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
The server does not start streaming as soon as the client connects; it waits for some other processing to happen and then starts streaming data. On the client side I tried setting the keepalive parameter PermitWithoutStream to true, but the error persists. I don't override any other default client or server keepalive parameters.
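For context, client-side keepalive in grpc-go (including PermitWithoutStream) is configured through a dial option. A minimal sketch follows; the address and durations are illustrative, not taken from this issue, and the server has to allow pings at this rate via its keepalive.EnforcementPolicy or it will close the connection with a GOAWAY.

package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	conn, err := grpc.Dial("localhost:9000", // placeholder address
		grpc.WithInsecure(),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                30 * time.Second, // send a ping after 30s with no activity
			Timeout:             10 * time.Second, // wait 10s for the ping ack
			PermitWithoutStream: true,             // ping even when no RPC is active
		}),
	)
	if err != nil {
		log.Fatalf("grpc.Dial failed: %v", err)
	}
	defer conn.Close()
}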
What did you expect to see?
Source
grpc transport closing issues #1462
Client
Spanner
Cloud Tasks
Describe Your Environment
Google App Engine Flexible.
Pinned dependencies:
cloud.google.com/go v0.40.0
google.golang.org/grpc v1.21.1
Expected Behavior
No connection errors and GRPC disconnect errors.
Actual Behavior
Seeing both Spanner connection and Cloud Tasks connection errors that seem grpc related:
I enabled grpc logging and am observing several connections dying every few minutes:
At first it seemed related to grpc/grpc-go#2663, but that retry backoff issue is fixed in the grpc version we are using.
There's a wiiiiiide range of reasons explaining why the transport is closing:
- A proxy is closing your connection.
- You’re using an intermittently-failing network connection.
- There’s a bug in gRPC, or our libraries.
- etc.
Do you have any reason to suspect that this is a problem in our libraries, specifically?
We're using this Google Cloud library in a Google App Engine Flexible app with no special configuration or anything. Everything is standard in this deployment. The application is running on the default GAE network, and I don't believe there is a proxy involved in network connections to the Cloud services. I can't think of anything we're doing that's special that would cause it.
I don’t know if it’s specific to the GCloud libraries or gRPC, but I’ve also been discussing on this gRPC issue: grpc/grpc-go#2663.
I don’t know how to diagnose this.
Oh right, good points. Hmmm. I’m also not sure how to diagnose this. Let me check with some folks and see if I can find how to proceed, or who can help you out.
Do you have some code to help reproduce or help us figure out what’s happening? It looks like your context is being canceled — where is that happening? This would be somewhere in your code (the spanner/tasks library doesn’t/shouldn’t cancel any contexts, but just pass yours right through).
The "transport is closing" messages are part of normal operation. From the logs you pasted, it doesn't look like they correlate to actual failed RPCs.
I should provide a more detailed explanation on the issues we’re encountering.
On 5/26/19, from 1:20AM to 1:44AM EST we saw a large number of errors relating to requests to Spanner and Cloud Tasks. They were all context cancellation related, e.g.:
Cloud Task errors:
We are not doing any explicit context cancellations in our HTTP server app. We are using the Go http.Server to serve requests (which make RPC calls to Spanner+Cloud Tasks) with the default configuration for timeouts:
We do configure max idle connections on the HTTP transport:
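The configuration snippet itself did not survive the copy here; judging by the exact setting the same poster quotes later in this thread, it was presumably along these lines (a sketch, not the poster's actual code):

package main

import "net/http"

func main() {
	// Raise the per-host idle-connection limit on the default HTTP transport.
	// (Plain net/http clients only; grpc-go manages its own connections.)
	http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 500
}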
My thought was that the context cancellations are coming from HTTP request timeouts due to blocking/hanging RPC calls.
Also, this interval on 5/26/19 is the only extended period of time in which we had a burst of these errors. Perhaps there was a network issue during that time? I checked the Google Cloud service status dashboard and did not see any reported incidents during that time though.
Other than that period on 5/26/19, we see infrequent Cloud Task errors (one every couple of days) that are seemingly gRPC related:
Digging into those errors is how I arrived at the gRPC issue grpc/grpc-go#2663. I enabled gRPC logging and saw the GRPC transport disconnect messages I posted earlier:
I’m not sure if those transport disconnect/reconnection messages indicate any issue or they can be safely ignored, but this comment by a gRPC developer gave me pause: grpc/grpc-go#2663 (comment)
Maybe there isn’t a gRPC transport issue but we just had an isolated network issue on 5/26/19? I’m still surprised at the frequency of gRPC transport connection resets on our GAE Flexible app as I don’t see those occur as frequently when running from my local machine.
The codebase is large, but I could provide a sample app based on it if it would help.
Any help would be greatly appreciated.
I can reproduce this in a situation with very low load (in this case: 1 query per 5 minutes). And my log shows exactly the same as @mikesun's, i.e. a TRANSIENT_FAILURE and then CONNECT, READY. The timing is also similar. The problem occurs at intervals of whole minutes, i.e. ending on :38 seconds every time. My hunch is that the log entries are caused by the session pool maintainer, which runs exactly once per minute.
(The COL1, code:INT64 string_value:"1" etc. is from my test query.)
It seems that this is caused by Cloud Spanner closing the network connection server-side if there has been no activity for a certain amount of time. If I run the same test application against a local test server, I do not get the same messages. According to the code where the message is logged, it should also not be considered an error, hence also the fact that it is logged at INFO level.
Closing this issue as this is not something that is actionable in the client library.
Source
Go gRPC Noisy Transport Closing Log #2901
Go Version: go version go1.12.6 darwin/amd64
Go gRPC version: v1.22.0
I run a gRPC server like this:
I have some things registered to this server. Then I use gRPC cli to call an RPC.
Every time I do grpc_cli call localhost:9080 Live "" for instance, it says "Rpc succeeded with OK status" from gRPC CLI, which is good. But in my service log I see {"level":"info","message":"transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"","system":"grpc","grpc_log":true}. There were previous issues from people seeing this, but they all just said "upgrade Go gRPC", and I did upgrade. I see it's coming from here:
infof("transport: loopyWriter.run returning. %v", err)
Any help will be appreciated 🙂
I see there is a GRPC_GO_LOG_VERBOSITY_LEVEL which I can set to 1 to remove the log, but this does not work for me because I replace the logger with grpc_zap, and that does not respect GRPC_GO_LOG_VERBOSITY_LEVEL.
I'm wondering if we can disable this log by default for all users if it is meant to happen every time you make a gRPC call (or make it a debug-level log instead of info), or whether I'm doing something wrong and it's not supposed to happen all the time.
This log happens when the underlying connection is closed (for many potential reasons). So it could be useful sometimes for debugging.
In your case, every time grpc_cli is called, it creates a new connection to the server, sends the RPC, and kills the connection. So it triggers this log a lot.
If your connections are stable, you shouldn’t see this log very often.
I agree it can be spammy, and that's why we put it as an info log at verbose level 2 (we don't have debug-level logs).
On the logging library: I would think it's reasonable to ask for verbose-level support in grpc_zap (it's still using the v1 logger interface; verbose is part of v2).
Also, it should be pretty easy to wrap grpc zap into a v2 interface and add support for verbose yourself.
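For reference, when using grpc-go's stock logger (rather than the grpc_zap replacement discussed here), the verbosity can also be set in code instead of through the environment variable. A minimal sketch; the choice of verbosity 0 is illustrative:

package main

import (
	"os"

	"google.golang.org/grpc/grpclog"
)

func main() {
	// Install a stock LoggerV2 with verbosity 0 before any other gRPC calls.
	// The "loopyWriter.run returning" line is only emitted at verbosity >= 2,
	// so it is suppressed, while info, warning, and error logs still reach stderr.
	grpclog.SetLoggerV2(grpclog.NewLoggerV2WithVerbosity(os.Stderr, os.Stderr, os.Stderr, 0))
}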
Thank you for your response @menghanl, this makes sense to me. I think the default verbose level is 0; when I take away the grpc_zap stuff it uses that, and the logging isn't spammy. I can close this question. I opened this instead: grpc-ecosystem/go-grpc-middleware#219. If you're also a maintainer of go-grpc-middleware, let me know and I can take on this work.
@kushp Thanks for filing the issue. But grpc middleware is not maintained by us.
No worries, I got in touch with a maintainer of grpc middleware @menghanl ! I have this open PR grpc-ecosystem/go-grpc-middleware#221 which should fix my problem as it will implement V2 logger interface and adhere to verbosity etc.
Source
Description
Xingxing Xia
2021-02-02 11:09:53 UTC
Description of problem:
KAS/OAS/OauthAS logs spam: loopyWriter.run returning. connection error: desc = "transport is closing"
But seems these messages are not indicative of any problem we care about, so they quite confuse users.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-01-31-031653

How reproducible:
*Always* The spam can be seen in any env on your hand, so will not attach must-gather.

Steps to Reproduce:
1.
$ oc logs --timestamps -n openshift-kube-apiserver -c kube-apiserver kube-apiserver-ip-10-143-1-199.ec2.internal > logs/kube-apiserver-ip-10-143-1-199.ec2.internal.log
$ oc logs --timestamps -n openshift-apiserver -c openshift-apiserver apiserver-6cc8bf5c8f-7gr7n > logs/apiserver-6cc8bf5c8f-7gr7n.log
$ oc logs --timestamps -n openshift-oauth-apiserver oauth-apiserver-6bd4fbd859-l2f8b > logs/oauth-apiserver-6bd4fbd859-l2f8b.log
2.
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' logs/kube-apiserver-ip-10-143-1-199.ec2.internal.log | wc -l
253
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' logs/apiserver-6cc8bf5c8f-7gr7n.log | wc -l
827
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' logs/oauth-apiserver-6bd4fbd859-l2f8b.log | wc -l
804
Take OAS for example:
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' logs/apiserver-6cc8bf5c8f-7gr7n.log | grep "^2021-02-02T09:50"
2021-02-02T09:50:02.522899544Z I0202 09:50:02.522863 1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:05.128339529Z I0202 09:50:05.128313 1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:13.314228703Z I0202 09:50:13.314204 1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:35.511715474Z I0202 09:50:35.511680 1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:45.454702141Z I0202 09:50:45.454672 1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:47.907317981Z I0202 09:50:47.907288 1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"

Actual results:
2. KAS/OAS/OauthAS spam such logs, they confuse users.

Expected results:
2. No such confusing spam.

Additional info:
Bug 1844448#c0 once ever showed such logs too, now not see:
$ oc logs --timestamps -n openshift-etcd-operator etcd-operator-fb8cbf695-hlzf2 > logs/etcd-operator-fb8cbf695-hlzf2.log
$ grep 'loopyWriter.run.*connection error' logs/etcd-operator-fb8cbf695-hlzf2.log
# none
Also not see in etcd container logs.
Comment 3
Michal Fojtik
2021-03-07 20:26:55 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
Comment 4
Fatima
2021-04-06 08:21:58 UTC
We have a customer facing this for kube-apiserver pods. One of the pods shows:
~~~
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' kas1.logs | wc -l
13583
~~~
where kas1 is "oc logs <podname> -c kube-apiserver > kas1.logs" for one of the masters. This master is showing more than 60% CPU consumption, and specifically this pod is consuming more CPU. If any information is required, please let me know and I'll update the bug.
Comment 5
Fatima
2021-04-08 00:39:29 UTC
Dear team, Any updates here? Thanks.
Comment 6
Fatima
2021-04-16 05:38:03 UTC
Dear team, Any updates here? Thanks.
Comment 14
Sam Batschelet
2021-06-18 20:47:14 UTC
We will keep the BZ open until it lands.
Comment 18
Sandeep
2021-07-07 06:25:20 UTC
Steps followed:
1. Install cluster version 4.9.0-0.nightly-2021-07-06-025559.
2. Check the logs of the apiserver; the message "loopyWriter.run returning. connection error: desc = "transport is closing"" is still seen.
The same issue is also seen in Server Version: 4.7.0-0.nightly-2021-07-06-050841.
Comment 20
Sam Batschelet
2021-07-07 09:55:47 UTC
1.22 rebase has not landed in openshift so we can retest when that happens.
Comment 21
Mridul Markandey
2021-07-12 19:14:38 UTC
We have a customer facing the same issue: the kube-apiserver is maxing out CPU, and the kube-apiserver pod logs show this output:
~~~
ccResolverWrapper: sending update to cc: {[{https://localhost:2379 <nil> 0 <nil>}] <nil> <nil>}
2021-07-09T06:36:50.297990740Z I0709 06:36:50.297885 17 clientconn.go:948] ClientConn switching balancer to "pick_first"
2021-07-09T06:36:50.300944216Z I0709 06:36:50.298272 17 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02705cd60, {CONNECTING <nil>}
2021-07-09T06:36:50.318796017Z I0709 06:36:50.313817 17 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02705cd60, {READY <nil>}
2021-07-09T06:36:50.318796017Z I0709 06:36:50.315980 17 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-07-09T06:36:52.532112913Z W0709 06:36:52.531817 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:52.582434039Z W0709 06:36:52.582254 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:52.731282776Z W0709 06:36:52.731187 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:52.931948178Z W0709 06:36:52.931846 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:53.131199569Z W0709 06:36:53.131091 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:53.330927116Z W0709 06:36:53.330835 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:53.531801993Z W0709 06:36:53.531689 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
~~~
Can you suggest a workaround for this? If more information is required, please let me know and I'll update the bug.
Comment 22
Mridul Markandey
2021-07-14 13:35:00 UTC
Hello Team, Any update on this? Thanks
Comment 23
Sam Batschelet
2021-07-14 13:47:10 UTC
There is no workaround for etcd client balancer gRPC logs in apiserver, we are going to improve this in 4.9.
>2021-07-09T06:36:50.297990740Z I0709 06:36:50.297885 17 clientconn.go:948] ClientConn switching balancer to "pick_first"
>2021-07-09T06:36:50.300944216Z I0709 06:36:50.298272 17 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02705cd60, {CONNECTING <nil>}
>2021-07-09T06:36:50.318796017Z I0709 06:36:50.313817 17 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02705cd60, {READY <nil>}
>2021-07-09T06:36:50.318796017Z I0709 06:36:50.315980 17 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
This is not the root cause for CPU; if you have issues around that, you should start a new bug against the apiserver.
1. etcdserver: user name is empty
The user and password were not specified when the command was run. Specify them with --user; there are two forms: --user=root:123 (password inline), or --user=root, which prompts for the password interactively.
2. etcdserver: permission denied
The user does not have permission to run the command. Make sure to run it as a specific user, and that this user has a role granting permissions on the key being operated on.
3. context deadline exceeded
Two possible reasons:
1. The address is wrong, i.e. the endpoint used for the connection is incorrect.
2. cancel() was called before the operation completed normally.
4. [transport] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
This is similar to "with maintenance client: context canceled": the client was closed in the code and then reused for the next operation.
5. grpc: the client connection is closing
Caused by calling cancel() between two operations and then continuing to use the same ctx for the next one (a Go sketch of the safe pattern follows the command list at the end of this section).
6. etcdserver: root user does not have root role
This message is generally caused by creating the root user before enabling authentication. It can be ignored, because at that point the root role already exists.
If you want to deal with it anyway, or if the error appears and the role list shows no root role, you can disable authentication and then enable it again; the error will no longer be reported.
Related commands
Disable authentication
etcdctl --user=root:123 auth disable
Enable authentication
etcdctl --user=root:123 auth enable
View authentication status
etcdctl --user=root:123 auth status (in the output below, "Authentication Status: true" means authentication is enabled):
Authentication Status: true
AuthRevision: 2
List roles
etcdctl --user=root:123 role list
View the root role
etcdctl --user=root:123 role get root
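Errors 4 and 5 above both come down to reusing something that has already been torn down: a closed client, or a context whose cancel() has already run. Below is a minimal sketch of the safe pattern with the etcd v3 Go client; the import path shown is the one used by etcd v3.3/v3.4, and the endpoint, credentials, and keys are illustrative.

package main

import (
	"context"
	"log"
	"time"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
		Username:    "root",
		Password:    "123",
	})
	if err != nil {
		log.Fatalf("clientv3.New: %v", err)
	}
	defer cli.Close() // close once, only when the client is no longer needed

	// Derive a fresh context per operation and cancel it afterwards.
	// Reusing an already-cancelled context (or a closed client) is what
	// produces "context canceled" / "the client connection is closing".
	for _, kv := range [][2]string{{"/demo/k1", "v1"}, {"/demo/k2", "v2"}} {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		_, err := cli.Put(ctx, kv[0], kv[1])
		cancel()
		if err != nil {
			log.Printf("put %s: %v", kv[0], err)
		}
	}
}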
Please answer these questions before submitting your issue.
What version of gRPC are you using?
1.18.0
What version of Go are you using (go version)?
1.11.5
What operating system (Linux, Windows, …) and version?
Alpine Linux 3.9 on Google Kubernetes Engine
What did you do?
After the upgrade, I noticed a significant increase in the number of gRPC errors, leading to failed requests and service outages.
We don't use any connection settings on the clients, and we don't have any special settings on the server (keepalive, fail-fast, etc.). The only thing we have on the connections is mutual TLS. It is configured roughly like this:
Client:
creds := grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{
	RootCAs:      certPool,
	Certificates: []tls.Certificate{cert},
	MinVersion:   tls.VersionTLS12,
}))
grpc.Dial("foo.foo.svc.cluster.local:80", creds)
Server:
creds := grpc.Creds(credentials.NewTLS(&tls.Config{
	Certificates:             []tls.Certificate{cert},
	ClientCAs:                certPool,
	ClientAuth:               tls.RequireAndVerifyClientCert,
	MinVersion:               tls.VersionTLS12,
	NextProtos:               []string{"h2"},
	PreferServerCipherSuites: true,
	CurvePreferences: []tls.CurveID{
		tls.CurveP256,
		tls.X25519,
	},
	CipherSuites: []uint16{
		tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
		tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
		tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,
		tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,
		tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
		tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
	},
}))
grpc.NewServer(creds)
This may be related to one of these two other issues:
https://github.com/grpc/grpc-go/issues/2653
https://github.com/grpc/grpc-go/issues/2636
Based on feedback in one of the issues linked above, I set these env variables for one service that had a lot of failures: GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info
I added (__GRPC CALL HERE__) at the end of the log lines below where the error was logged for a failed gRPC call that we made.
WARNING: 2019/03/01 04:41:42 grpc: addrConn.createTransport failed to connect to {foo.foo.svc.cluster.local:80 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: context deadline exceeded". Reconnecting...
INFO: 2019/03/01 04:41:42 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 04:41:42 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, TRANSIENT_FAILURE
timestamp=2019-03-01T04:41:42.887562062Z level=error error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: authentication handshake failed: context deadline exceeded"" (__GRPC CALL HERE))
INFO: 2019/03/01 04:41:43 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 04:41:43 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, CONNECTING
INFO: 2019/03/01 04:41:44 Subchannel Connectivity change to READY
INFO: 2019/03/01 04:41:44 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, READY
INFO: 2019/03/01 04:45:22 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 04:45:22 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 04:45:22 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 04:45:22 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 04:45:23 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 04:45:23 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 04:45:28 Subchannel Connectivity change to READY
INFO: 2019/03/01 04:45:28 Subchannel Connectivity change to READY
INFO: 2019/03/01 05:34:23 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:34:23 Subchannel Connectivity change to READY
INFO: 2019/03/01 05:35:25 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 05:35:25 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:35:27 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:35:32 Subchannel Connectivity change to READY
INFO: 2019/03/01 05:35:59 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:35:59 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 05:36:01 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:36:01 Subchannel Connectivity change to READY
INFO: 2019/03/01 05:38:19 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:19 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 05:38:19 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:19 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:19 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
timestamp=2019-03-01T05:38:19.972966567Z level=error error="rpc error: code = Unavailable desc = transport is closing" (__GRPC CALL HERE__)
INFO: 2019/03/01 05:38:19 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 05:38:19 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:19 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:21 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:38:21 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, CONNECTING
INFO: 2019/03/01 05:38:21 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:38:21 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, CONNECTING
INFO: 2019/03/01 05:38:21 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:38:21 Subchannel Connectivity change to READY
INFO: 2019/03/01 05:38:21 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 16:13:05 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 16:13:05 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, TRANSIENT_FAILURE
INFO: 2019/03/01 16:13:05 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
timestamp=2019-03-01T16:13:05.92908361Z level=error error="rpc error: code = Unavailable desc = transport is closing" (__GRPC CALL HERE__)
INFO: 2019/03/01 16:13:07 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 16:13:07 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, CONNECTING
INFO: 2019/03/01 16:13:07 Subchannel Connectivity change to READY
INFO: 2019/03/01 16:13:07 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, READY
INFO: 2019/03/01 16:20:17 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 16:20:17 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 16:20:18 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 16:20:23 Subchannel Connectivity change to READY
INFO: 2019/03/01 16:42:24 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 16:42:24 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 16:42:25 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 16:42:25 Subchannel Connectivity change to READY
INFO: 2019/03/01 17:20:47 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 17:20:47 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 17:20:49 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 17:20:54 Subchannel Connectivity change to READY
INFO: 2019/03/01 17:43:01 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 17:43:01 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 17:43:02 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 17:43:07 Subchannel Connectivity change to READY
All 18 comments
Can you also get the server-side logs, to see whether they contain any additional information?
The "handshake deadline exceeded" error looks very suspicious. It would be helpful to know why that happened.
One more thing worth trying: can you set GRPC_GO_REQUIRE_HANDSHAKE=off on the client side and see whether anything changes? (#2406)
@menghanl Here is what I have.
Example program
Client
package main

import (
	"context"
	"log"
	"time"

	"github.com/golang/glog"
	"github.com/kazegusuri/grpc-test/pb"
	"google.golang.org/grpc"
	_ "google.golang.org/grpc/grpclog/glogger"
)

func main() {
	ctx := context.Background()
	conn1, err := grpc.DialContext(ctx, "127.0.0.1:9000", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("grpc.DialContext failed: %v", err)
	}
	defer func() { _ = conn1.Close() }()
	cli1 := pb.NewEchoClient(conn1)
	for {
		time.Sleep(1 * time.Millisecond)
		_, err := cli1.Echo(ctx, &pb.EchoMessage{Message: "hello"})
		if err != nil {
			glog.Infof("goroutine: echo failed: %v", err)
		} else {
			glog.Infof("goroutine: echo succeeded")
		}
	}
}
Server
func main() {
	server := newServer()
	s := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
		MaxConnectionAge: 10 * time.Second, // this kills connection 10s after connected
	}))
	pb.RegisterEchoServer(s, server)
	lis, err := net.Listen("tcp", ":9000")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	if err := s.Serve(lis); err != nil {
		log.Fatalf("err %v\n", err)
	}
	select {}
}
Results
With the programs above, I run the server with v1.19.0 and the client with v1.17, v1.18, and v1.19.
The server drops the connection to the client 10 seconds after it connects.
The client should reconnect to the server, but its behavior differs between client versions.
In versions 1.18 and 1.19 the client takes more than 1 second to reconnect to the server.
v1.17.0 (default)
I0303 02:22:59.745529 6773 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00001e030, TRANSIENT_FAILURE
I0303 02:22:59.745566 6773 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00001e030, CONNECTING
I0303 02:22:59.745732 6773 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00001e030, READY
v1.18.0 (default)
I0303 02:20:58.820793 5169 clientconn.go:941] Subchannel Connectivity change to TRANSIENT_FAILURE
I0303 02:20:58.820892 5169 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00016e010, TRANSIENT_FAILURE
I0303 02:20:59.821028 5169 clientconn.go:941] Subchannel Connectivity change to CONNECTING
I0303 02:20:59.821189 5169 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00016e010, CONNECTING
I0303 02:20:59.822058 5169 clientconn.go:941] Subchannel Connectivity change to READY
I0303 02:20:59.822155 5169 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00016e010, READY
v1.18.0 (GRPC_GO_REQUIRE_HANDSHAKE=off)
I0303 02:25:59.393033 8568 clientconn.go:941] Subchannel Connectivity change to TRANSIENT_FAILURE
I0303 02:25:59.393096 8568 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00017c010, TRANSIENT_FAILURE
I0303 02:26:00.393385 8568 clientconn.go:941] Subchannel Connectivity change to CONNECTING
I0303 02:26:00.393510 8568 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00017c010, CONNECTING
I0303 02:26:00.393937 8568 clientconn.go:941] Subchannel Connectivity change to READY
I0303 02:26:00.393979 8568 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00017c010, READY
v1.19.0 (default)
I0303 02:28:57.655932 11292 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000160010, TRANSIENT_FAILURE
I0303 02:28:58.656125 11292 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000160010, CONNECTING
I0303 02:28:58.656489 11292 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000160010, READY
v1.19.0 (GRPC_GO_REQUIRE_HANDSHAKE=off)
I0303 02:30:30.642447 11438 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000162010, TRANSIENT_FAILURE
I0303 02:30:31.642680 11438 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000162010, CONNECTING
I0303 02:30:31.643004 11438 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000162010, READY
This causes a performance degradation against servers that expose a single subconnection, such as Google Spanner.
Spanner seems to kill connections on the server side after an hour, and the connection then goes into transient failure. During the second between the transient failure and READY, the connection remains unavailable.
If it is acceptable to try creating the transport immediately after a transient failure on the first attempt, how about adding code like this here?
if ac.backoffIdx == 0 {
	backoffFor = 0
}
I have a question.
I found baseDelay = 1.0 * time.Second here.
I think this value is a bit too large. Why is 1 second the default?
1 second is too long for a user-facing interface.
I ran into the same problem using the Google Cloud Bigtable Go client (which is based on gRPC). The Bigtable gRPC server seems to have a maximum connection age of one hour, which means that every hour I literally stop processing my traffic for one second.
I agree with @vvakame: this default is too large for many use cases and should be easy to configure in the other Google clients based on this repo.
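For completeness: later grpc-go releases (not the 1.17-1.19 versions compared above) expose the reconnect backoff as a dial option, so BaseDelay can be lowered without patching the library. A minimal sketch with illustrative values and a placeholder address:

package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/backoff"
)

func main() {
	// Tune the reconnect backoff so the first retry after a dropped
	// connection is not delayed by the default 1s BaseDelay.
	conn, err := grpc.Dial("localhost:9000", // placeholder address
		grpc.WithInsecure(),
		grpc.WithConnectParams(grpc.ConnectParams{
			Backoff: backoff.Config{
				BaseDelay:  100 * time.Millisecond,
				Multiplier: 1.6,
				Jitter:     0.2,
				MaxDelay:   10 * time.Second,
			},
			MinConnectTimeout: 5 * time.Second,
		}),
	)
	if err != nil {
		log.Fatalf("grpc.Dial: %v", err)
	}
	defer conn.Close()
}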
The bug is in the reconnect logic: the backoff time is not reset after a connection is established.
The fix is in #2669. Can you folks try it out and see whether it resolves the problems you are seeing? Thanks!
Closing. Please comment if you are still running into problems.
Hopefully not, since I'm going to remove that option soon (this week or next). Please let me know if you have problems with it turned off.
@dfawley
It looks like this is still a problem for us. I reported it to Google Cloud support, who pointed me to this thread and suggested setting GRPC_GO_REQUIRE_HANDSHAKE=off as a workaround. It seems that is no longer the case.
We are running a Golang Google App Engine Flexible app with Go v1.11 and grpc-go v1.20.0.
With Spanner we see context timeouts:
spanner: code = "Canceled", desc = "context canceled, lastErr is <<nil>>"
With Cloud Tasks we see the following RPC errors:
all SubConns are in TransientFailure, latest connection error:
These errors don't occur constantly, only periodically and in bursts.
These errors don't occur constantly, only periodically and in bursts.
How long do these bursts last, and how often do they occur?
Does setting GRPC_GO_REQUIRE_HANDSHAKE=off help you?
These errors don't occur constantly, only periodically and in bursts.
How long do these bursts last, and how often do they occur?
We currently don't have a production environment with any significant load, but in a trial production run that lasted 3 days we had one burst of errors for ~20 minutes. Otherwise we see this less than once a day, for about a minute.
I enabled gRPC logging in one of our deployments, and I seem to be getting TRANSIENT_FAILURE errors consistently, regardless of whether GRPC_GO_REQUIRE_HANDSHAKE=off is set.
With grpc v1.21.1 and GRPC_GO_REQUIRE_HANDSHAKE=off:
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239980, TRANSIENT_FAILURE
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239980, CONNECTING
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239980, READY
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239770, TRANSIENT_FAILURE
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239770, CONNECTING
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239770, READY
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY
Reconnection happens immediately, and I haven't seen any RPC failures with "all SubConns are in TransientFailure", so I suppose this is acceptable behavior?
We configure http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 500; will that be a problem?
@mikesun
I suppose this is acceptable behavior?
It's not normal for your connection to die this often (several times per second). Does this happen constantly, or are these the only such entries? Unfortunately, the logging doesn't explain why the connection is being closed. But it does transition to READY, which at least means TLS succeeded.
We configure http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 500; will that be a problem?
Go HTTP server settings should not affect grpc-go; we have our own server implementation.
It's not normal for your connection to die this often (several times per second). Does this happen constantly, or are these the only such entries? Unfortunately, the logging doesn't explain why the connection is being closed. But it does transition to READY, which at least means TLS succeeded.
It seems to happen in bursts (~3/sec) every few minutes (1-4 min). This is a GAE Flexible instance with practically no load:
A INFO: 2019/06/17 16:10:15 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239900, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239900, CONNECTING
A INFO: 2019/06/17 16:10:15 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239870, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239870, CONNECTING
A INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239900, READY
A INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239870, READY
A INFO: 2019/06/17 16:10:37 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:10:37 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:10:37 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, CONNECTING
A INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, READY
A INFO: 2019/06/17 16:10:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
A INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
A INFO: 2019/06/17 16:10:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
A INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY
A INFO: 2019/06/17 16:14:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, CONNECTING
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, READY
A INFO: 2019/06/17 16:14:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
A INFO: 2019/06/17 16:14:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY
A INFO: 2019/06/17 16:18:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, CONNECTING
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, READY
A INFO: 2019/06/17 16:18:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
A INFO: 2019/06/17 16:18:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY
A INFO: 2019/06/17 16:22:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, CONNECTING
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, READY
A INFO: 2019/06/17 16:22:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
A INFO: 2019/06/17 16:22:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY
@mikesun: the last thing I can think of is to make sure you are not using a grpc-go version between #2740 (82fdf27, 21 days ago) and #2862 (a1d4c28, 10 days ago). We had a bug in that range that could potentially cause problems like this.
If that's not it, then the problem is most likely not on the client side. Server-side or proxy-side logs would be needed to debug further. Also, since your problems are not related to this issue, could you file a new issue if you suspect the problem is in grpc-go?
I've got an issue with a gRPC service. I'm using grpc-web in my client-side Angular application.
I constantly receive a CORS error:
has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
Sometimes I also receive the following message on the gRPC server:
transport: loopyWriter.run returning. connection error: desc = "transport is closing"
and the logging shows that the gRPC request was received but no response was sent back to the client app.
Here are my configs:
Consul LDS:
[{
  "name": "listener_0",
  "address": {
    "socket_address": {
      "address": "0.0.0.0",
      "port_value": 8080
    }
  },
  "filter_chains": [{
    "filters": [{
      "name": "envoy.http_connection_manager",
      "config": {
        "codec_type": "auto",
        "stat_prefix": "ingress_http",
        "route_config": {
          "name": "local_route",
          "virtual_hosts": [{
            "name": "local_service",
            "domains": [
              "*"
            ],
            "routes": [{
              "match": {
                "prefix": "/App.Grpc.MyApp1"
              },
              "route": {
                "cluster": "main_service",
                "max_grpc_timeout": "0s"
              }
            }],
            "cors": {
              "allow_origin": [
                "*"
              ],
              "allow_methods": "GET, PUT, DELETE, POST, OPTIONS",
              "allow_headers": "keep-alive, user-agent, cache-control, content-type, content-transfer-encoding, Authorization, x-ip, x-accept-content-transfer-encoding, x-accept-response-streaming, x-user-agent, x-grpc-web, grpc-timeout",
              "max_age": "1728000",
              "expose_headers": "grpc-status, grpc-message",
              "enable": true
            }
          }]
        },
        "http_filters": [{
          "name": "envoy.cors"
        },
        {
          "name": "envoy.grpc_web"
        },
        {
          "name": "envoy.router"
        }]
      }
    }]
  }]
}]
Consul CDS:
[{
  "name": "MyApp1",
  "connect_timeout": "0.25s",
  "type": "logical_dns",
  "lb_policy": "round_robin",
  "http2_protocol_options": {},
  "hosts": [{
    "socket_address": {
      "address": "host.docker.internal", # also tried docker service name my_app
      "port_value": 50053
    }
  }]
}]
When I run Envoy and the Go service without Pilot, everything works fine with exactly the same configs; I'm trying to figure out how to update Envoy dynamically. Any help would be appreciated!