Hi,
I am installing Istio on an Azure Kubernetes Service (AKS) cluster using the Istio Operator. Once Istio is installed, I create a VirtualService to route traffic. For the past few days, however, I have been seeing "Site is not reachable" when I hit the URL of the service,
and the ingress gateway is printing the logs below.
[Envoy (Epoch 0)] [2020-11-30 07:45:54.938][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:54] Unable to establish new stream
2020-11-30T07:45:55.382650Z info sds node:router~172.18.40.10~istio-ingressgateway-76d59b8f49-qsvp6.istio-system~istio-system.svc.cluster.local-1 resource:ROOTCA new connection
2020-11-30T07:45:55.382679Z info sds node:router~172.18.40.10~istio-ingressgateway-76d59b8f49-qsvp6.istio-system~istio-system.svc.cluster.local-2 resource:default new connection
2020-11-30T07:45:55.558572Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-11-30T07:45:55.770409Z info cache Root cert has changed, start rotating root cert for SDS clients
2020-11-30T07:45:55.770676Z info sds node:router~172.18.40.10~istio-ingressgateway-76d59b8f49-qsvp6.istio-system~istio-system.svc.cluster.local-2 resource:default pushed key/cert pair to proxy
2020-11-30T07:45:55.770711Z info sds node:router~172.18.40.10~istio-ingressgateway-76d59b8f49-qsvp6.istio-system~istio-system.svc.cluster.local-2 resource:default pushed secret
2020-11-30T07:45:55.783291Z info sds node:router~172.18.40.10~istio-ingressgateway-76d59b8f49-qsvp6.istio-system~istio-system.svc.cluster.local-1 resource:ROOTCA pushed root cert to proxy
2020-11-30T07:45:55.783317Z info sds node:router~172.18.40.10~istio-ingressgateway-76d59b8f49-qsvp6.istio-system~istio-system.svc.cluster.local-1 resource:ROOTCA pushed secret
2020-11-30T07:45:57.560354Z info Envoy proxy is ready
[Envoy (Epoch 0)] [2020-11-30 08:15:51.977][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
2020-11-30T08:16:39.176830Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-11-30T08:16:39.176923Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {CONNECTING <nil>}
2020-11-30T08:16:39.182270Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {READY <nil>}
[Envoy (Epoch 0)] [2020-11-30 08:46:20.185][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
2020-11-30T08:48:05.413973Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-11-30T08:48:05.414033Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {CONNECTING <nil>}
2020-11-30T08:48:05.419261Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {READY <nil>}
2020-11-30T09:17:55.718346Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-11-30T09:17:55.718396Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {CONNECTING <nil>}
2020-11-30T09:17:55.724485Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {READY <nil>}
[Envoy (Epoch 0)] [2020-11-30 09:18:36.376][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
2020-11-30T09:48:51.632720Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-11-30T09:48:51.632852Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {CONNECTING <nil>}
2020-11-30T09:48:51.638595Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {READY <nil>}
[Envoy (Epoch 0)] [2020-11-30 09:50:48.693][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
2020-11-30T10:20:10.098491Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-11-30T10:20:10.098549Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {CONNECTING <nil>}
2020-11-30T10:20:10.103931Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {READY <nil>}
[Envoy (Epoch 0)] [2020-11-30 10:23:30.186][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
2020-11-30T10:52:25.858932Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-11-30T10:52:25.858992Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {CONNECTING <nil>}
2020-11-30T10:52:25.864645Z info pickfirstBalancer: HandleSubConnStateChange: 0xc00031df70, {READY <nil>}
[Envoy (Epoch 0)] [2020-11-30 10:54:42.187][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
[Envoy (Epoch 0)] [2020-11-30 11:23:47.335][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
Versions
Istio version: 1.5.5
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.11", GitCommit:"3a3612132641768edd7f7e73d07772225817f630", GitTreeState:"clean", BuildDate:"2020-09-02T06:46:33Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
How it was Installed
I used the Istio Operator to install Istio. Below is the YAML file.
Profile used: default profile
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: istiocontrolplane
spec:
  profile: default
  components:
    pilot:
      k8s:
        resources:
          requests:
            cpu: 500m
            memory: 2048Mi
          limits:
            cpu: 2000m
            memory: 2048Mi
    base:
      enabled: true
    telemetry:
      enabled: true
      k8s:
        resources:
          requests:
            cpu: 500m
            memory: 2048Mi
          limits:
            cpu: 2000m
            memory: 2048Mi
    sidecarInjector:
      enabled: true
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        hpaSpec:
          maxReplicas: 5
          minReplicas: 1
          scaleTargetRef:
            apiVersion: apps/v1
            kind: Deployment
            name: istio-ingressgateway
          metrics:
          - type: Resource
            resource:
              name: cpu
              targetAverageUtilization: 60
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 1500m
            memory: 1024Mi
        strategy:
          rollingUpdate:
            maxSurge: "100%"
            maxUnavailable: "25%"
    egressGateways:
    - name: istio-egressgateway
      enabled: false
    citadel:
      enabled: true
  addonComponents:
    grafana:
      enabled: true
      k8s:
        replicaCount: 1
    kiali:
      enabled: true
      k8s:
        replicaCount: 1
    tracing:
      enabled: true
Once Istio is installed successfully, I create the VirtualServices for routing. Below is a sample VirtualService file:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: demo-virtual-service
  namespace: istio-system
spec:
  hosts:
  - "*"
  gateways:
  - istio-ingressgateway
  http:
  - match:
    - uri:
        prefix: /demo/
    route:
    - destination:
        port:
          number: 8080
        host: demo-service.demo.svc.cluster.local
Contents
- transport: loopyWriter.run returning. connection error: desc = "transport is closing" #401
- 1.19.2 kube-apiserver can't write to etcd — "transport is closing" #95334
- transport is closing error #2298
- grpc transport closing issues #1462
- Go gRPC Noisy Transport Closing Log #2901
transport: loopyWriter.run returning. connection error: desc = "transport is closing" #401
kiam is not working well.
k8s version: v1.14.6
kiam is deployed in an external OpenStack cloud (to fetch keys from AWS KMS).
kiam version: v3.5
Can anyone help me?
I also created the roles with Terraform.
Hm, I found that wget -qO- http://169.254.169.254 returns info from the internal OpenStack API.
UPD:
I found the problem:
"error warming credentials: RequestError: send request failed\ncaused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority"
and solved it by changing the path.
Also, in the server logs I can see that they can get the key, but the problem with
INFO: 2020/05/06 16:51:56 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
WARNING: 2020/05/06 16:55:06 transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:443->127.0.0.1:53962: read: connection reset by peer
persists.
I found that #385 says that access to the host EC2 API is still needed.
Does this mean that Kiam can only work inside AWS?
Kiam does not need to be run in AWS.
What is the actual problem you're seeing? You've posted some info and warning logs, but those don't necessarily indicate any actual problems.
Are your Kiam servers running and passing their health checks? Are you seeing any error messages in the Kiam server? Are your Kiam agents running and passing their health checks? Are they reporting any errors in their logs?
Source
1.19.2 kube-apiserver can't write to etcd — "transport is closing" #95334
What happened: We run a self-hosted kubernetes v1.18.2 cluster on Flatcar Container Linux and are testing an upgrade to v1.19.2. While doing this on a separate staging cluster we noticed that the kube-apiserver has problems writing to etcd (meaning etcd is empty) without any obvious reason.
What you expected to happen: kube-apiserver running as smoothly as always without any errors.
How to reproduce it (as minimally and precisely as possible): Set up a new Kubernetes v1.19.2 cluster with etcd 3.3.12.
Anything else we need to know?: These are the logs from the kube-apiserver:
However the etcd cluster is healthy and I can read and write from a separate etcd pod inside the cluster:
I don’t think that the etcd version is playing a huge role since someone has reported the same issue (comment in #94155) with k8s v1.19.0 and etcd 3.4.13.
Environment:
- Kubernetes version: v1.19.2
- Cloud provider or hardware configuration: AWS
- OS: Flatcar Container Linux 2605.6.0
- Kernel: 5.4.66-flatcar
- Install tools: Manually provisioned with Rancher kubelet container & official kube-* images
- Network plugin and version (if this is a network-related bug): Flannel v0.11.0
- Others:
- Etcd: 3.3.12
Source
transport is closing error #2298
What version of gRPC are you using?
What version of Go are you using ( go version )?
What operating system (Linux, Windows, …) and version?
macOS and Linux. The problem appears on both OSes.
What did you do?
The client connects to the server and listens for data on the stream (server-side streaming). The connection state shows READY and then transitions to TRANSIENT_FAILURE.
INFO: 2018/09/07 21:50:07 pickfirstBalancer: HandleSubConnStateChange: 0xc420566d00, CONNECTING
INFO: 2018/09/07 21:50:07 pickfirstBalancer: HandleSubConnStateChange: 0xc420566d00, READY
INFO: 2018/09/07 21:50:07 pickfirstBalancer: HandleSubConnStateChange: 0xc420566d00, TRANSIENT_FAILURE
INFO: 2018/09/07 21:50:07 pickfirstBalancer: HandleSubConnStateChange: 0xc420566d00, CONNECTING
INFO: 2018/09/07 21:50:07 pickfirstBalancer: HandleSubConnStateChange: 0xc420566d00, READY
After the connection becomes READY, calling the Recv() method on the stream returns the following error:
INFO: 2018/09/07 21:50:07 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
The server does not start streaming as soon as the client connects; it waits for some other processing to happen and then starts streaming data. On the client side I tried setting the keepalive parameter PermitWithoutStream to true, but the error persists. I don't override any other default client or server keepalive parameters.
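For context, client-side keepalive in grpc-go (including PermitWithoutStream) is configured through a dial option. A minimal sketch follows; the address and durations are illustrative, not taken from this issue, and the server has to allow pings at this rate via its keepalive.EnforcementPolicy or it will close the connection with a GOAWAY.

package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	conn, err := grpc.Dial("localhost:9000", // placeholder address
		grpc.WithInsecure(),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                30 * time.Second, // send a ping after 30s with no activity
			Timeout:             10 * time.Second, // wait 10s for the ping ack
			PermitWithoutStream: true,             // ping even when no RPC is active
		}),
	)
	if err != nil {
		log.Fatalf("grpc.Dial failed: %v", err)
	}
	defer conn.Close()
}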
What did you expect to see?
Source
grpc transport closing issues #1462
Client
Spanner
Cloud Tasks
Describe Your Environment
Google App Engine Flexible.
Pinned dependencies:
cloud.google.com/go v0.40.0
google.golang.org/grpc v1.21.1
Expected Behavior
No connection errors and GRPC disconnect errors.
Actual Behavior
Seeing both Spanner connection and Cloud Tasks connection errors that seem grpc related:
I enabled grpc logging and am observing several connections dying every few minutes:
At first it seemed related to grpc/grpc-go#2663, but that retry backoff issue is fixed in the grpc version we are using.
There's a wiiiiiide range of reasons explaining why the transport is closing:
- A proxy is closing your connection.
- You’re using an intermittently-failing network connection.
- There’s a bug in gRPC, or our libraries.
- etc.
Do you have any reason to suspect that this is a problem in our libraries, specifically?
We're using this Google Cloud library in a Google App Engine Flexible app with no special configuration or anything. Everything is standard in this deployment. The application is running on the default GAE network, and I don't believe there is a proxy involved in network connections to the Cloud services. I can't think of anything we're doing that's special that would cause it.
I don’t know if it’s specific to the GCloud libraries or gRPC, but I’ve also been discussing on this gRPC issue: grpc/grpc-go#2663.
I don’t know how to diagnose this.
Oh right, good points. Hmmm. I’m also not sure how to diagnose this. Let me check with some folks and see if I can find how to proceed, or who can help you out.
Do you have some code to help reproduce or help us figure out what’s happening? It looks like your context is being canceled — where is that happening? This would be somewhere in your code (the spanner/tasks library doesn’t/shouldn’t cancel any contexts, but just pass yours right through).
The "transport is closing" messages are part of normal operation. From the logs you pasted, it doesn't look like they correlate to actual failed RPCs.
I should provide a more detailed explanation on the issues we’re encountering.
On 5/26/19, from 1:20AM to 1:44AM EST we saw a large number of errors relating to requests to Spanner and Cloud Tasks. They were all context cancellation related, e.g.:
Cloud Task errors:
We are not doing any explicit context cancellations in our HTTP server app. We are using the Go http.Server to serve requests (which make RPC calls to Spanner+Cloud Tasks) with the default configuration for timeouts:
We do configure max idle connections on the HTTP transport:
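The configuration snippet itself did not survive the copy here; judging by the exact setting the same poster quotes later in this thread, it was presumably along these lines (a sketch, not the poster's actual code):

package main

import "net/http"

func main() {
	// Raise the per-host idle-connection limit on the default HTTP transport.
	// (Plain net/http clients only; grpc-go manages its own connections.)
	http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 500
}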
My thought was that the context cancellations are coming from HTTP request timeouts due to blocking/hanging RPC calls.
Also, this interval on 5/26/19 is the only extended period of time in which we had a burst of these errors. Perhaps there was a network issue during that time? I checked the Google Cloud service status dashboard and did not see any reported incidents during that time though.
Other than that period on 5/26/19, we see infrequent Cloud Task errors (one every couple of days) that are seemingly gRPC related:
Digging into those errors is how I arrived at the gRPC issue grpc/grpc-go#2663. I enabled gRPC logging and saw the GRPC transport disconnect messages I posted earlier:
I’m not sure if those transport disconnect/reconnection messages indicate any issue or they can be safely ignored, but this comment by a gRPC developer gave me pause: grpc/grpc-go#2663 (comment)
Maybe there isn’t a gRPC transport issue but we just had an isolated network issue on 5/26/19? I’m still surprised at the frequency of gRPC transport connection resets on our GAE Flexible app as I don’t see those occur as frequently when running from my local machine.
The codebase is large, but I could provide a sample app based on it if it would help.
Any help would be greatly appreciated.
I can reproduce this in a situation with very low load (in this case: 1 query per 5 minutes). And my log shows exactly the same as @mikesun's, i.e. a TRANSIENT_FAILURE and then CONNECT, READY. The timing is also similar. The problem occurs at intervals of whole minutes, i.e. ending on :38 seconds every time. My hunch is that the log entries are caused by the session pool maintainer, which runs exactly once per minute.
(The COL1, code:INT64 string_value:"1" etc. is from my test query.)
It seems that this is caused by Cloud Spanner closing the network connection server-side if there has been no activity for a certain amount of time. If I run the same test application against a local test server, I do not get the same messages. According to the code where the message is logged, it should also not be considered an error, hence also the fact that it is logged at INFO level.
Closing this issue as this is not something that is actionable in the client library.
Source
Go gRPC Noisy Transport Closing Log #2901
Go Version: go version go1.12.6 darwin/amd64
Go gRPC version: v1.22.0
I run a gRPC server like this:
I have some things registered to this server. Then I use gRPC cli to call an RPC.
Every time I do grpc_cli call localhost:9080 Live "" for instance, it says "Rpc succeeded with OK status" from gRPC CLI, which is good. But in my service log I see {"level":"info","message":"transport: loopyWriter.run returning. connection error: desc = \"transport is closing\"","system":"grpc","grpc_log":true}. There were previous issues from people seeing this, but they all just said "upgrade Go gRPC", and I did upgrade. I see it's coming from here:
infof("transport: loopyWriter.run returning. %v", err)
Any help will be appreciated 🙂
I see there is a GRPC_GO_LOG_VERBOSITY_LEVEL which I can set to 1 to remove the log, but this does not work for me because I replace the logger with grpc_zap, and that does not respect GRPC_GO_LOG_VERBOSITY_LEVEL.
I'm wondering if we can disable this log by default for all users if it is meant to happen every time you make a gRPC call (or make it a debug-level log instead of info), or whether I'm doing something wrong and it's not supposed to happen all the time.
This log happens when the underlying connection is closed (for many potential reasons). So it could be useful sometimes for debugging.
In your case, every time grpc_cli is called, it creates a new connection to the server, sends the RPC, and kills the connection. So it triggers this log a lot.
If your connections are stable, you shouldn’t see this log very often.
I agree it can be spammy, and that's why we put it as an info log at verbose level 2 (we don't have debug-level logs).
On the logging library: I would think it's reasonable to ask for verbose-level support in grpc_zap (it's still using the v1 logger interface; verbose is part of v2).
Also, it should be pretty easy to wrap grpc zap into a v2 interface and add support for verbose yourself.
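For reference, when using grpc-go's stock logger (rather than the grpc_zap replacement discussed here), the verbosity can also be set in code instead of through the environment variable. A minimal sketch; the choice of verbosity 0 is illustrative:

package main

import (
	"os"

	"google.golang.org/grpc/grpclog"
)

func main() {
	// Install a stock LoggerV2 with verbosity 0 before any other gRPC calls.
	// The "loopyWriter.run returning" line is only emitted at verbosity >= 2,
	// so it is suppressed, while info, warning, and error logs still reach stderr.
	grpclog.SetLoggerV2(grpclog.NewLoggerV2WithVerbosity(os.Stderr, os.Stderr, os.Stderr, 0))
}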
Thank you for your response @menghanl, this makes sense to me. I think the default verbose level is 0; when I take away the grpc_zap stuff it uses that, and the logging isn't spammy. I can close this question. I opened this instead: grpc-ecosystem/go-grpc-middleware#219. If you're also a maintainer of go-grpc-middleware, let me know and I can take on this work.
@kushp Thanks for filing the issue. But grpc middleware is not maintained by us.
No worries, I got in touch with a maintainer of grpc middleware @menghanl ! I have this open PR grpc-ecosystem/go-grpc-middleware#221 which should fix my problem as it will implement V2 logger interface and adhere to verbosity etc.
Source
Description
Xingxing Xia
2021-02-02 11:09:53 UTC
Description of problem:
KAS/OAS/OauthAS logs spam: loopyWriter.run returning. connection error: desc = "transport is closing"
But seems these messages are not indicative of any problem we care about, so they quite confuse users.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-01-31-031653

How reproducible:
*Always* The spam can be seen in any env on your hand, so will not attach must-gather.

Steps to Reproduce:
1.
$ oc logs --timestamps -n openshift-kube-apiserver -c kube-apiserver kube-apiserver-ip-10-143-1-199.ec2.internal > logs/kube-apiserver-ip-10-143-1-199.ec2.internal.log
$ oc logs --timestamps -n openshift-apiserver -c openshift-apiserver apiserver-6cc8bf5c8f-7gr7n > logs/apiserver-6cc8bf5c8f-7gr7n.log
$ oc logs --timestamps -n openshift-oauth-apiserver oauth-apiserver-6bd4fbd859-l2f8b > logs/oauth-apiserver-6bd4fbd859-l2f8b.log
2.
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' logs/kube-apiserver-ip-10-143-1-199.ec2.internal.log | wc -l
253
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' logs/apiserver-6cc8bf5c8f-7gr7n.log | wc -l
827
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' logs/oauth-apiserver-6bd4fbd859-l2f8b.log | wc -l
804
Take OAS for example:
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' logs/apiserver-6cc8bf5c8f-7gr7n.log | grep "^2021-02-02T09:50"
2021-02-02T09:50:02.522899544Z I0202 09:50:02.522863 1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:05.128339529Z I0202 09:50:05.128313 1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:13.314228703Z I0202 09:50:13.314204 1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:35.511715474Z I0202 09:50:35.511680 1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:45.454702141Z I0202 09:50:45.454672 1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-02-02T09:50:47.907317981Z I0202 09:50:47.907288 1 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"

Actual results:
2. KAS/OAS/OauthAS spam such logs, they confuse users.

Expected results:
2. No such confusing spam.

Additional info:
Bug 1844448#c0 once ever showed such logs too, now not see:
$ oc logs --timestamps -n openshift-etcd-operator etcd-operator-fb8cbf695-hlzf2 > logs/etcd-operator-fb8cbf695-hlzf2.log
$ grep 'loopyWriter.run.*connection error' logs/etcd-operator-fb8cbf695-hlzf2.log
# none
Also not see in etcd container logs.
Comment 3
Michal Fojtik
2021-03-07 20:26:55 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
Comment 4
Fatima
2021-04-06 08:21:58 UTC
We have a customer facing this for kube-apiserver pods. One of the pods shows:
~~~
$ grep 'loopyWriter.run returning. connection error: desc = "transport is closing"' kas1.logs | wc -l
13583
~~~
where kas1 is "oc logs <podname> -c kube-apiserver > kas1.logs" for one of the masters. This master is showing more than 60% CPU consumption, and specifically this pod is consuming more CPU. If any information is required, please let me know and I'll update the bug.
Comment 5
Fatima
2021-04-08 00:39:29 UTC
Dear team, Any updates here? Thanks.
Comment 6
Fatima
2021-04-16 05:38:03 UTC
Dear team, Any updates here? Thanks.
Comment 14
Sam Batschelet
2021-06-18 20:47:14 UTC
We will keep the BZ open until it lands.
Comment 18
Sandeep
2021-07-07 06:25:20 UTC
Steps followed:
1. Install cluster version 4.9.0-0.nightly-2021-07-06-025559.
2. Check the logs of the apiserver; the message "loopyWriter.run returning. connection error: desc = "transport is closing"" is still seen.
The same issue is also seen in Server Version: 4.7.0-0.nightly-2021-07-06-050841.
Comment 20
Sam Batschelet
2021-07-07 09:55:47 UTC
1.22 rebase has not landed in openshift so we can retest when that happens.
Comment 21
Mridul Markandey
2021-07-12 19:14:38 UTC
We have a customer facing the same issue: the kube-apiserver is maxing out CPU, and the kube-apiserver pod logs show this output:
~~~
ccResolverWrapper: sending update to cc: {[{https://localhost:2379 <nil> 0 <nil>}] <nil> <nil>}
2021-07-09T06:36:50.297990740Z I0709 06:36:50.297885 17 clientconn.go:948] ClientConn switching balancer to "pick_first"
2021-07-09T06:36:50.300944216Z I0709 06:36:50.298272 17 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02705cd60, {CONNECTING <nil>}
2021-07-09T06:36:50.318796017Z I0709 06:36:50.313817 17 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02705cd60, {READY <nil>}
2021-07-09T06:36:50.318796017Z I0709 06:36:50.315980 17 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2021-07-09T06:36:52.532112913Z W0709 06:36:52.531817 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:52.582434039Z W0709 06:36:52.582254 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:52.731282776Z W0709 06:36:52.731187 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:52.931948178Z W0709 06:36:52.931846 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:53.131199569Z W0709 06:36:53.131091 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:53.330927116Z W0709 06:36:53.330835 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
2021-07-09T06:36:53.531801993Z W0709 06:36:53.531689 17 admission.go:98] ClusterQuotaAdmission received non-meta object *unstructured.Unstructured
~~~
Can you suggest a workaround for this? If more information is required, please let me know and I'll update the bug.
Comment 22
Mridul Markandey
2021-07-14 13:35:00 UTC
Hello Team, Any update on this? Thanks
Comment 23
Sam Batschelet
2021-07-14 13:47:10 UTC
There is no workaround for etcd client balancer gRPC logs in apiserver, we are going to improve this in 4.9.
>2021-07-09T06:36:50.297990740Z I0709 06:36:50.297885 17 clientconn.go:948] ClientConn switching balancer to "pick_first"
>2021-07-09T06:36:50.300944216Z I0709 06:36:50.298272 17 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02705cd60, {CONNECTING <nil>}
>2021-07-09T06:36:50.318796017Z I0709 06:36:50.313817 17 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc02705cd60, {READY <nil>}
>2021-07-09T06:36:50.318796017Z I0709 06:36:50.315980 17 controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
This is not the root cause for CPU; if you have issues around that, you should start a new bug against the apiserver.
1. etcdserver: user name is empty
The user and password were not specified when the command was run. Specify them with --user; there are two forms: --user=root:123 (password inline), or --user=root, which prompts for the password interactively.
2. etcdserver: permission denied
The user does not have permission to run the command. Make sure to run it as a specific user, and that this user has a role granting permissions on the key being operated on.
3. context deadline exceeded
Two possible reasons:
1. The address is wrong, i.e. the endpoint used for the connection is incorrect.
2. cancel() was called before the operation completed normally.
4. [transport] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
This is similar to "with maintenance client: context canceled": the client was closed in the code and then reused for the next operation.
5. grpc: the client connection is closing
Caused by calling cancel() between two operations and then continuing to use the same ctx for the next one (a Go sketch of the safe pattern follows the command list at the end of this section).
6. etcdserver: root user does not have root role
This message is generally caused by creating the root user before enabling authentication. It can be ignored, because at that point the root role already exists.
If you want to deal with it anyway, or if the error appears and the role list shows no root role, you can disable authentication and then enable it again; the error will no longer be reported.
Related commands
Disable authentication
etcdctl --user=root:123 auth disable
Enable authentication
etcdctl --user=root:123 auth enable
View authentication status
etcdctl --user=root:123 auth status (in the output below, "Authentication Status: true" means authentication is enabled):
Authentication Status: true
AuthRevision: 2
List roles
etcdctl --user=root:123 role list
View the root role
etcdctl --user=root:123 role get root
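Errors 4 and 5 above both come down to reusing something that has already been torn down: a closed client, or a context whose cancel() has already run. Below is a minimal sketch of the safe pattern with the etcd v3 Go client; the import path shown is the one used by etcd v3.3/v3.4, and the endpoint, credentials, and keys are illustrative.

package main

import (
	"context"
	"log"
	"time"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
		Username:    "root",
		Password:    "123",
	})
	if err != nil {
		log.Fatalf("clientv3.New: %v", err)
	}
	defer cli.Close() // close once, only when the client is no longer needed

	// Derive a fresh context per operation and cancel it afterwards.
	// Reusing an already-cancelled context (or a closed client) is what
	// produces "context canceled" / "the client connection is closing".
	for _, kv := range [][2]string{{"/demo/k1", "v1"}, {"/demo/k2", "v2"}} {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		_, err := cli.Put(ctx, kv[0], kv[1])
		cancel()
		if err != nil {
			log.Printf("put %s: %v", kv[0], err)
		}
	}
}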
Please answer these questions before submitting your issue.
What version of gRPC are you using?
1.18.0
What version of Go are you using (go version)?
1.11.5
What operating system (Linux, Windows, …) and version?
Alpine Linux 3.9 on Google Kubernetes Engine
What did you do?
After the upgrade, I noticed a significant increase in the number of gRPC errors, leading to failed requests and service outages.
We don't use any connection settings on the clients, and we don't have any special settings on the server (keepalive, fail-fast, etc.). The only thing we have on the connections is mutual TLS. It is configured roughly like this:
Client:
creds := grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{
	RootCAs:      certPool,
	Certificates: []tls.Certificate{cert},
	MinVersion:   tls.VersionTLS12,
}))
grpc.Dial("foo.foo.svc.cluster.local:80", creds)
Server:
creds := grpc.Creds(credentials.NewTLS(&tls.Config{
	Certificates:             []tls.Certificate{cert},
	ClientCAs:                certPool,
	ClientAuth:               tls.RequireAndVerifyClientCert,
	MinVersion:               tls.VersionTLS12,
	NextProtos:               []string{"h2"},
	PreferServerCipherSuites: true,
	CurvePreferences: []tls.CurveID{
		tls.CurveP256,
		tls.X25519,
	},
	CipherSuites: []uint16{
		tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
		tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
		tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,
		tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,
		tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
		tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
	},
}))
grpc.NewServer(creds)
This may be related to one of these two other issues:
https://github.com/grpc/grpc-go/issues/2653
https://github.com/grpc/grpc-go/issues/2636
Based on feedback in one of the issues linked above, I set these env variables for one service that had a lot of failures: GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info
I added (__GRPC CALL HERE__) at the end of the log lines below where the error was logged for a failed gRPC call that we made.
WARNING: 2019/03/01 04:41:42 grpc: addrConn.createTransport failed to connect to {foo.foo.svc.cluster.local:80 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: context deadline exceeded". Reconnecting...
INFO: 2019/03/01 04:41:42 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 04:41:42 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, TRANSIENT_FAILURE
timestamp=2019-03-01T04:41:42.887562062Z level=error error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: authentication handshake failed: context deadline exceeded"" (__GRPC CALL HERE))
INFO: 2019/03/01 04:41:43 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 04:41:43 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, CONNECTING
INFO: 2019/03/01 04:41:44 Subchannel Connectivity change to READY
INFO: 2019/03/01 04:41:44 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, READY
INFO: 2019/03/01 04:45:22 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 04:45:22 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 04:45:22 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 04:45:22 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 04:45:23 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 04:45:23 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 04:45:28 Subchannel Connectivity change to READY
INFO: 2019/03/01 04:45:28 Subchannel Connectivity change to READY
INFO: 2019/03/01 05:34:23 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:34:23 Subchannel Connectivity change to READY
INFO: 2019/03/01 05:35:25 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 05:35:25 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:35:27 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:35:32 Subchannel Connectivity change to READY
INFO: 2019/03/01 05:35:59 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:35:59 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 05:36:01 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:36:01 Subchannel Connectivity change to READY
INFO: 2019/03/01 05:38:19 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:19 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 05:38:19 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:19 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:19 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
timestamp=2019-03-01T05:38:19.972966567Z level=error error="rpc error: code = Unavailable desc = transport is closing" (__GRPC CALL HERE__)
INFO: 2019/03/01 05:38:19 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 05:38:19 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:19 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:21 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:38:21 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, CONNECTING
INFO: 2019/03/01 05:38:21 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:38:21 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, CONNECTING
INFO: 2019/03/01 05:38:21 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:38:21 Subchannel Connectivity change to READY
INFO: 2019/03/01 05:38:21 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 16:13:05 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 16:13:05 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, TRANSIENT_FAILURE
INFO: 2019/03/01 16:13:05 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
timestamp=2019-03-01T16:13:05.92908361Z level=error error="rpc error: code = Unavailable desc = transport is closing" (__GRPC CALL HERE__)
INFO: 2019/03/01 16:13:07 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 16:13:07 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, CONNECTING
INFO: 2019/03/01 16:13:07 Subchannel Connectivity change to READY
INFO: 2019/03/01 16:13:07 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, READY
INFO: 2019/03/01 16:20:17 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 16:20:17 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 16:20:18 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 16:20:23 Subchannel Connectivity change to READY
INFO: 2019/03/01 16:42:24 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 16:42:24 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 16:42:25 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 16:42:25 Subchannel Connectivity change to READY
INFO: 2019/03/01 17:20:47 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 17:20:47 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 17:20:49 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 17:20:54 Subchannel Connectivity change to READY
INFO: 2019/03/01 17:43:01 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 17:43:01 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 17:43:02 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 17:43:07 Subchannel Connectivity change to READY
All 18 comments
Can you also get the server-side logs, to see whether they contain any additional information?
The "handshake deadline exceeded" error looks very suspicious. It would be helpful to know why that happened.
One more thing worth trying: can you set GRPC_GO_REQUIRE_HANDSHAKE=off on the client side and see whether anything changes? (#2406)
@menghanl Here is what I have.
Example program
Client
package main

import (
	"context"
	"log"
	"time"

	"github.com/golang/glog"
	"github.com/kazegusuri/grpc-test/pb"
	"google.golang.org/grpc"
	_ "google.golang.org/grpc/grpclog/glogger"
)

func main() {
	ctx := context.Background()
	conn1, err := grpc.DialContext(ctx, "127.0.0.1:9000", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("grpc.DialContext failed: %v", err)
	}
	defer func() { _ = conn1.Close() }()
	cli1 := pb.NewEchoClient(conn1)
	for {
		time.Sleep(1 * time.Millisecond)
		_, err := cli1.Echo(ctx, &pb.EchoMessage{Message: "hello"})
		if err != nil {
			glog.Infof("goroutine: echo failed: %v", err)
		} else {
			glog.Infof("goroutine: echo succeeded")
		}
	}
}
Server
func main() {
	server := newServer()
	s := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
		MaxConnectionAge: 10 * time.Second, // this kills connection 10s after connected
	}))
	pb.RegisterEchoServer(s, server)
	lis, err := net.Listen("tcp", ":9000")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	if err := s.Serve(lis); err != nil {
		log.Fatalf("err %v\n", err)
	}
	select {}
}
Results
With the programs above, I run the server with v1.19.0 and the client with v1.17, v1.18, and v1.19.
The server drops the connection to the client 10 seconds after it connects.
The client should reconnect to the server, but its behavior differs between client versions.
In versions 1.18 and 1.19 the client takes more than 1 second to reconnect to the server.
v1.17.0 (default)
I0303 02:22:59.745529 6773 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00001e030, TRANSIENT_FAILURE
I0303 02:22:59.745566 6773 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00001e030, CONNECTING
I0303 02:22:59.745732 6773 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00001e030, READY
v1.18.0 (default)
I0303 02:20:58.820793 5169 clientconn.go:941] Subchannel Connectivity change to TRANSIENT_FAILURE
I0303 02:20:58.820892 5169 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00016e010, TRANSIENT_FAILURE
I0303 02:20:59.821028 5169 clientconn.go:941] Subchannel Connectivity change to CONNECTING
I0303 02:20:59.821189 5169 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00016e010, CONNECTING
I0303 02:20:59.822058 5169 clientconn.go:941] Subchannel Connectivity change to READY
I0303 02:20:59.822155 5169 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00016e010, READY
v1.18.0 (GRPC_GO_REQUIRE_HANDSHAKE=off)
I0303 02:25:59.393033 8568 clientconn.go:941] Subchannel Connectivity change to TRANSIENT_FAILURE
I0303 02:25:59.393096 8568 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00017c010, TRANSIENT_FAILURE
I0303 02:26:00.393385 8568 clientconn.go:941] Subchannel Connectivity change to CONNECTING
I0303 02:26:00.393510 8568 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00017c010, CONNECTING
I0303 02:26:00.393937 8568 clientconn.go:941] Subchannel Connectivity change to READY
I0303 02:26:00.393979 8568 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00017c010, READY
v1.19.0 (default)
I0303 02:28:57.655932 11292 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000160010, TRANSIENT_FAILURE
I0303 02:28:58.656125 11292 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000160010, CONNECTING
I0303 02:28:58.656489 11292 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000160010, READY
v1.19.0 (GRPC_GO_REQUIRE_HANDSHAKE=off)
I0303 02:30:30.642447 11438 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000162010, TRANSIENT_FAILURE
I0303 02:30:31.642680 11438 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000162010, CONNECTING
I0303 02:30:31.643004 11438 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000162010, READY
This causes a performance degradation against servers that expose a single subconnection, such as Google Spanner.
Spanner seems to kill connections on the server side after an hour, and the connection then goes into transient failure. During the second between the transient failure and READY, the connection remains unavailable.
If it is acceptable to try creating the transport immediately after a transient failure on the first attempt, how about adding code like this here?
if ac.backoffIdx == 0 {
	backoffFor = 0
}
I have a question.
I found baseDelay = 1.0 * time.Second here.
I think this value is a bit too large. Why is 1 second the default?
1 second is too long for a user-facing interface.
I ran into the same problem using the Google Cloud Bigtable Go client (which is based on gRPC). The Bigtable gRPC server seems to have a maximum connection age of one hour, which means that every hour I literally stop processing my traffic for one second.
I agree with @vvakame: this default is too large for many use cases and should be easy to configure in the other Google clients based on this repo.
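For completeness: later grpc-go releases (not the 1.17-1.19 versions compared above) expose the reconnect backoff as a dial option, so BaseDelay can be lowered without patching the library. A minimal sketch with illustrative values and a placeholder address:

package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/backoff"
)

func main() {
	// Tune the reconnect backoff so the first retry after a dropped
	// connection is not delayed by the default 1s BaseDelay.
	conn, err := grpc.Dial("localhost:9000", // placeholder address
		grpc.WithInsecure(),
		grpc.WithConnectParams(grpc.ConnectParams{
			Backoff: backoff.Config{
				BaseDelay:  100 * time.Millisecond,
				Multiplier: 1.6,
				Jitter:     0.2,
				MaxDelay:   10 * time.Second,
			},
			MinConnectTimeout: 5 * time.Second,
		}),
	)
	if err != nil {
		log.Fatalf("grpc.Dial: %v", err)
	}
	defer conn.Close()
}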
The bug is in the reconnect logic: the backoff time is not reset after a connection is established.
The fix is in #2669. Can you folks try it out and see whether it resolves the problems you are seeing? Thanks!
Closing. Please comment if you are still running into problems.
Hopefully not, since I'm going to remove that option soon (this week or next). Please let me know if you have problems with it turned off.
@dfawley
It looks like this is still a problem for us. I reported it to Google Cloud support, who pointed me to this thread and suggested setting GRPC_GO_REQUIRE_HANDSHAKE=off as a workaround. It seems that is no longer the case.
We are running a Golang Google App Engine Flexible app with Go v1.11 and grpc-go v1.20.0.
With Spanner we see context timeouts:
spanner: code = "Canceled", desc = "context canceled, lastErr is <<nil>>"
With Cloud Tasks we see the following RPC errors:
all SubConns are in TransientFailure, latest connection error:
These errors don't occur constantly, only periodically and in bursts.
These errors don't occur constantly, only periodically and in bursts.
How long do these bursts last, and how often do they occur?
Does setting GRPC_GO_REQUIRE_HANDSHAKE=off help you?
These errors don't occur constantly, only periodically and in bursts.
How long do these bursts last, and how often do they occur?
We currently don't have a production environment with any significant load, but in a trial production run that lasted 3 days we had one burst of errors for ~20 minutes. Otherwise we see this less than once a day, for about a minute.
I enabled gRPC logging in one of our deployments, and I seem to be getting TRANSIENT_FAILURE errors consistently, regardless of whether GRPC_GO_REQUIRE_HANDSHAKE=off is set.
With grpc v1.21.1 and GRPC_GO_REQUIRE_HANDSHAKE=off:
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239980, TRANSIENT_FAILURE
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239980, CONNECTING
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239980, READY
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239770, TRANSIENT_FAILURE
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239770, CONNECTING
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239770, READY
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
2019-06-14 20:50:14 default[20190614t163806] INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY
Reconnection happens immediately, and I haven't seen any RPC failures with "all SubConns are in TransientFailure", so I suppose this is acceptable behavior?
We configure http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 500; will that be a problem?
@mikesun
I suppose this is acceptable behavior?
It's not normal for your connection to die this often (several times per second). Does this happen constantly, or are these the only such entries? Unfortunately, the logging doesn't explain why the connection is being closed. But it does transition to READY, which at least means TLS succeeded.
We configure http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 500; will that be a problem?
Go HTTP server settings should not affect grpc-go; we have our own server implementation.
It's not normal for your connection to die this often (several times per second). Does this happen constantly, or are these the only such entries? Unfortunately, the logging doesn't explain why the connection is being closed. But it does transition to READY, which at least means TLS succeeded.
It seems to happen in bursts (~3/sec) every few minutes (1-4 min). This is a GAE Flexible instance with practically no load:
A INFO: 2019/06/17 16:10:15 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239900, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239900, CONNECTING
A INFO: 2019/06/17 16:10:15 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239870, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239870, CONNECTING
A INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239900, READY
A INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239870, READY
A INFO: 2019/06/17 16:10:37 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:10:37 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:10:37 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, CONNECTING
A INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, READY
A INFO: 2019/06/17 16:10:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
A INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
A INFO: 2019/06/17 16:10:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
A INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY
A INFO: 2019/06/17 16:14:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, CONNECTING
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, READY
A INFO: 2019/06/17 16:14:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
A INFO: 2019/06/17 16:14:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
A INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY
A INFO: 2019/06/17 16:18:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, CONNECTING
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, READY
A INFO: 2019/06/17 16:18:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
A INFO: 2019/06/17 16:18:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
A INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY
A INFO: 2019/06/17 16:22:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, CONNECTING
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, READY
A INFO: 2019/06/17 16:22:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
A INFO: 2019/06/17 16:22:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
A INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY
@mikesun: the last thing I can think of is to make sure you are not using a grpc-go version between #2740 (82fdf27, 21 days ago) and #2862 (a1d4c28, 10 days ago). We had a bug in that range that could potentially cause problems like this.
If that's not it, then the problem is most likely not on the client side. Server-side or proxy-side logs would be needed to debug further. Also, since your problems are not related to this issue, could you file a new issue if you suspect the problem is in grpc-go?
I've got an issue with a gRPC service. I'm using grpc-web in my client-side Angular application.
I constantly receive a CORS error:
has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
Sometimes I also receive the following message on the gRPC server:
transport: loopyWriter.run returning. connection error: desc = "transport is closing"
and the logging shows that the gRPC request was received but no response was sent back to the client app.
Here are my configs:
Consul LDS:
[{
  "name": "listener_0",
  "address": {
    "socket_address": {
      "address": "0.0.0.0",
      "port_value": 8080
    }
  },
  "filter_chains": [{
    "filters": [{
      "name": "envoy.http_connection_manager",
      "config": {
        "codec_type": "auto",
        "stat_prefix": "ingress_http",
        "route_config": {
          "name": "local_route",
          "virtual_hosts": [{
            "name": "local_service",
            "domains": [
              "*"
            ],
            "routes": [{
              "match": {
                "prefix": "/App.Grpc.MyApp1"
              },
              "route": {
                "cluster": "main_service",
                "max_grpc_timeout": "0s"
              }
            }],
            "cors": {
              "allow_origin": [
                "*"
              ],
              "allow_methods": "GET, PUT, DELETE, POST, OPTIONS",
              "allow_headers": "keep-alive, user-agent, cache-control, content-type, content-transfer-encoding, Authorization, x-ip, x-accept-content-transfer-encoding, x-accept-response-streaming, x-user-agent, x-grpc-web, grpc-timeout",
              "max_age": "1728000",
              "expose_headers": "grpc-status, grpc-message",
              "enable": true
            }
          }]
        },
        "http_filters": [{
          "name": "envoy.cors"
        },
        {
          "name": "envoy.grpc_web"
        },
        {
          "name": "envoy.router"
        }]
      }
    }]
  }]
}]
Consul CDS:
[{
  "name": "MyApp1",
  "connect_timeout": "0.25s",
  "type": "logical_dns",
  "lb_policy": "round_robin",
  "http2_protocol_options": {},
  "hosts": [{
    "socket_address": {
      "address": "host.docker.internal", # also tried docker service name my_app
      "port_value": 50053
    }
  }]
}]
When I run Envoy and the Go service without Pilot, everything works fine with exactly the same configs; I'm trying to figure out how to update Envoy dynamically. Any help would be appreciated!