Error when trying to connect (bad healthcheck status)

@axcairns

I have installed ONLYOFFICE 4.1.4 and Community Document Server 0.1.3 on Nextcloud 18.0.0 running in a Docker container (official image) on Ubuntu 19.10 x64.

When I go to the ONLYOFFICE settings, the "Document Editing Service address" and "Server address for internal requests from the Document Editing Service" are prefilled with the address of my Nextcloud instance, but the "Document Editing Service address for internal requests from the server" is empty.

If I hit save on the settings page I get an error toast with the message "Error when trying to connect (Bad healthcheck status)". There are no ONLYOFFICE options in either the create file menu or the file context menu.
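The error toast points at the connector's healthcheck request failing rather than at the settings fields themselves. A quick way to probe it by hand is sketched below; the assumption (not confirmed in this thread) is that the Community Document Server answers the usual ONLYOFFICE healthcheck path under its app URL with the literal string true.

  import urllib.request

  # Replace with your own instance; the "/healthcheck" suffix is an assumption
  # based on the standard ONLYOFFICE Document Server endpoint.
  base = "https://cloud.example.com/index.php/apps/documentserver_community/"

  with urllib.request.urlopen(base + "healthcheck", timeout=10) as resp:
      # A healthy server is expected to answer: 200 true
      print(resp.status, resp.read().decode().strip())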

@axcairns

Still not working with 0.1.5.

I noticed this in my Nextcloud logs (via settings > Logging) —

[PHP] Error: fclose(): supplied resource is not a valid stream resource at /var/www/html/custom_apps/documentserver_community/lib/Document/FontManager.php#66

GET /index.php/apps/update/documentserver_community
from 59.167.111.65 by allan at 2020-02-04T12:59:10+00:00

… and this in my docker logs for the nextcloud container —

[Tue Feb 04 20:59:38.166593 2020] [authz_core:error] [pid 125] [client 59.167.111.65:0] AH01630: client denied by server configuration: /var/www/html/config

@SenchoPoro42

@Wdavery

Stuck at this point too, also behind a nginx reverse proxy.

@klaask

Similar setup and healthcheck error on our side:
nginx reverse proxy & SSL termination (no Docker)

To fix this error, we set a different subdomain as the "Document Editing Service address" and the "Server address for internal requests from the Document Editing Service", and added it as a trusted_domain in config.php.

@SenchoPoro42

I fixed my issue by removing only the ONLYOFFICE application and then installing it again. The correct server address was already filled in; I simply avoided pressing save, and it then connected and worked correctly.

@LordMort

The following fixed my issue on an NC 19.0.6 installation with OO 0.1.8 DSCE.

  1. On the OO settings page I simply removed ALL server entries and REFRAINED from clicking the save button.
  2. After that I switched to another settings page, "Overview" in my case, and returned to the OO settings without changing anything else.
  3. The OO settings had been auto-filled with working entries, see below. Clicking the save button at that point successfully enabled OO in my NC installation.

The entry that had been auto-filled was:
https://<MY.DOMAIN.TLD>/index.php/apps/documentserver_community/

@TheSimu

@LordMort: Thanks! You're the hero of the day with that URL. It did the trick on v21 and the current OO as of today.

@MaoMaoCake

The following fixed my issue on an NC 19.0.6 installation with OO 0.1.8 DSCE.

  1. On the OO settings page I simply removed ALL server entries and REFRAINED from clicking the save button.
  2. After that I switched to another settings page, "Overview" in my case, and returned to the OO settings without changing anything else.
  3. The OO settings had been auto-filled with working entries, see below. Clicking the save button at that point successfully enabled OO in my NC installation.

The entry that had been auto-filled was: https://<MY.DOMAIN.TLD>/index.php/apps/documentserver_community/

If this doesn't work, try adding /index.php/apps/documentserver_community/ to the end of the URL.
For example, if your server is at https://<MY.DOMAIN.TLD>/, fill in https://<MY.DOMAIN.TLD>/index.php/apps/documentserver_community/
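Purely as an illustration of that rule (the host is a placeholder), the value to paste into the settings field can be derived from the Nextcloud base URL like this:

  from urllib.parse import urljoin

  # Placeholder host; keep the trailing slash on the base URL, otherwise a
  # Nextcloud installed under a sub-path (e.g. https://host/nextcloud/) would
  # lose that path segment when joining.
  nextcloud_base = "https://MY.DOMAIN.TLD/"
  document_server = urljoin(nextcloud_base, "index.php/apps/documentserver_community/")
  print(document_server)  # https://MY.DOMAIN.TLD/index.php/apps/documentserver_community/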

@iamhakeembey

The following fixed my issue on an NC 19.0.6 installation with OO 0.1.8 DSCE.

  1. On the OO settings page I simply removed ALL server entries and REFRAINED from clicking the save button.
  2. After that I switched to another settings page, "Overview" in my case, and returned to the OO settings without changing anything else.
  3. The OO settings had been auto-filled with working entries, see below. Clicking the save button at that point successfully enabled OO in my NC installation.

The entry that had been auto-filled was: https://<MY.DOMAIN.TLD>/index.php/apps/documentserver_community/

In my case I had to click "Save", but simply changing the URL manually works the same. Just ensure that it includes the "index.php" part. Also, it's important that both "Nextcloud Office" and "Collabora Online - Built-in CODE Server" are disabled. Otherwise, documents will stall at the loading screen, as [at least one of] the two apps appears to interfere with ONLYOFFICE somehow. Once they were disabled, everything worked fine for me.

Comments

@w-le

I just had an instance of rest-api v1.17.11 get into this state:

  • api responses were OK for almost all requests made by backoffice (except ones that involve redis)
  • healthcheck was passing
  • However there must have been a redis connection issue — I believe the redis pod restarted on a different IP a few hours before this rest-api issue was reported (about 2021-03-15T23:00:00Z).
    redis pod previous status info is here:
terminated
Reason: Completed - exit code: 0
Started at: 2021-03-15T01:32:47Z
Finished at: 2021-03-15T10:02:06Z
  • api returns this error when attempting to list the Commands of any Module:
ERROR: Socket::ConnectError: Error connecting to 'redis-headless.redis:6379': Address not available (Redis::CannotConnectError)
  from app/lib/redis/src/redis/socket_wrapper.cr:10:5 in 'connect'
  from app/lib/redis-cluster/src/cluster/bootstrap.cr:47:7 in 'redis'
  from app/lib/redis-cluster/src/redis.cr:69:13 in 'connect!'
  from usr/share/crystal/src/indexable.cr:269:9 in '???'
  from app/lib/placeos-driver/src/placeos-driver/proxy/system.cr:44:5 in 'get__api_engine_v2_systems___sys_id_functions__module_slug'
  from usr/share/crystal/src/primitives.cr:255:3 in 'call'
  from usr/share/crystal/src/http/server/handler.cr:28:7 in 'call'
  from usr/share/crystal/src/http/server/handler.cr:28:7 in 'call'
  from usr/share/crystal/src/http/server/request_processor.cr:50:11 in 'process'
  from usr/share/crystal/src/http/server.cr:498:5 in '->'
  from usr/share/crystal/src/primitives.cr:255:3 in 'run'
  from ???

Recreating the api pod immediately resolved the issue.

Can we change the api healthcheck so that it fails if there is any current connection issue with Redis? Or is that already the healthcheck behaviour in the current version? That way Kubernetes would automatically restart api so that it connects to the new Redis IP.
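This is not the rest-api implementation (that code is Crystal); it is just a minimal Python sketch of the suggested behaviour, assuming a plain redis-py client: the liveness endpoint only reports healthy when Redis answers a PING, so Kubernetes restarts the pod once the connection is lost. The port 8080 and the endpoint shape are assumptions.

  from http.server import BaseHTTPRequestHandler, HTTPServer
  import redis  # third-party redis-py package, assumed available

  r = redis.Redis(host="redis-headless.redis", port=6379, socket_connect_timeout=2)

  class Health(BaseHTTPRequestHandler):
      def do_GET(self):
          try:
              r.ping()                 # fails fast when Redis is unreachable
              self.send_response(200)
          except redis.exceptions.RedisError:
              self.send_response(503)  # liveness probe fails -> kubelet restarts the pod
          self.end_headers()

  HTTPServer(("", 8080), Health).serve_forever()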

Alternatively, how often is the IP resolved for the Redis hostname? If it is currently resolved only once, during the first redis cluster bootstrap, then resolving it again on every cluster bootstrap would ensure the api pod never gets into this state and would not even need to be restarted to reconnect to Redis.
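Again only an illustration of the re-resolution idea, not the actual redis-cluster client: look the hostname up again on every reconnect attempt instead of reusing an address cached at startup.

  import socket
  import time

  def connect_with_fresh_dns(host="redis-headless.redis", port=6379, retries=30):
      """Retry the TCP connection, resolving the hostname on every attempt."""
      for _ in range(retries):
          try:
              # create_connection resolves the name on each call, so a Redis pod
              # that came back up on a new IP is picked up by the next attempt.
              return socket.create_connection((host, port), timeout=2)
          except OSError:
              time.sleep(1)
      raise RuntimeError(f"could not reach {host}:{port}")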

@w-le




Unfortunately the same issue occurred again today.

Interestingly, Core had no issues reconnecting to Redis. Hey @caspiano, are there any differences between the Redis client implementations in core and rest-api that might point to an easy change in rest-api, so that it can reconnect to Redis seamlessly when Redis is briefly unavailable?

@caspiano

There isn't much difference; all interactions with redis go through the same PlaceOS::Driver::Storage interface or its underlying client.

Looking into this.

@w-le




More info that may or may not be helpful:

I tried to reproduce the issue by restarting redis in a similar k8s environment (same client, UAT instead of prod), but in that case rest-api was able to reconnect without issues once redis was back up.

In this controlled test, during the brief redis downtime, rest-api output the following repeatedly, as expected, and then connected fine afterwards:
https://gist.github.com/w-le/8f424d1371fc7d9505789497967a41c9

So I'm not sure why the behaviour in this controlled test differs from what is described in the OP. In fact, we did not even see the same specific error, "Address not available (Redis::CannotConnectError)", in the test, even though that error was output both times the issue occurred in production (Redis exited unexpectedly with code 0).

@w-le




Unfortunately this happened again today, causing a third outage in the same prod instance.
I'm not yet sure why it can get stuck in this state, while at other times redis can restart and rest-api will reconnect to it just fine.

Maybe it's related to the number of open redis connections/files within the api container exceeding a maximum? I have previously seen a redis error stating "too many connections" in this instance (a quick way to check descriptor usage is sketched after the log below).

I'll try something like this to increase the max open files limit to see if it helps: https://stackoverflow.com/a/63700455
I will deploy to UAT today and to prod tonight.

level=DEBUG time=2021-03-30T02:13:16Z source=place_os.api.session ws_request_id=1 sys_id=sys-GhPzxGRnmFJ module_name=Bookings index=1 name=bookings message=Session (bind)
level=WARN time=2021-03-30T02:13:16Z source=place_os.api.session sys_id=sys-GhPzxGRnmFJ module_name=Bookings index=1 name=bookings message=websocket binding could not find system
Socket::ConnectError: Error connecting to 'redis-headless.redis:6379': Address not available (Redis::CannotConnectError)
from app/lib/redis/src/redis/socket_wrapper.cr:10:5 in 'connect'
from app/lib/redis-cluster/src/cluster/bootstrap.cr:47:7 in 'redis'
from app/lib/redis-cluster/src/redis.cr:69:13 in 'connect!'
from usr/share/crystal/src/indexable.cr:269:9 in '???'
from app/lib/placeos-driver/src/placeos-driver/subscriptions/indirect_subscription.cr:44:5 in 'perform_subscribe'
from app/lib/placeos-driver/src/placeos-driver/subscriptions.cr:72:7 in 'bind'
from app/src/placeos-rest-api/session.cr:664:24 in '__send__'
from app/src/placeos-rest-api/session.cr:563:7 in '->'
from usr/share/crystal/src/primitives.cr:255:3 in 'run'
from app/src/placeos-rest-api/controllers/systems.cr:11:3 in '->'
from usr/share/crystal/src/primitives.cr:255:3 in 'process'
from usr/share/crystal/src/http/server.cr:498:5 in '->'
from usr/share/crystal/src/primitives.cr:255:3 in 'run'
from ???
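To check the "too many connections" / descriptor-limit hypothesis, a quick generic check (nothing PlaceOS-specific) is to compare the number of open descriptors inside the api container with its soft limit:

  import os
  import resource

  # Compare current file-descriptor usage with the soft RLIMIT_NOFILE limit
  # (Linux only; run inside the api container, e.g. via kubectl exec).
  soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
  open_fds = len(os.listdir("/proc/self/fd"))
  print(f"open fds: {open_fds}, soft limit: {soft}, hard limit: {hard}")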

@caspiano

I’ll add the healthcheck now.

We have had a file descriptor issue due to the creation of redis clients before.
@stakach could you investigate the redis connection issue?

@stakach
