Server selection error: server selection timeout, current topology


@justintilson

I’ve mostly copied your docker-compose from here: https://github.com/rwynn/monstache/blob/rel6/docker/test/docker-compose.test.yml and scripts from https://github.com/rwynn/monstache/tree/rel6/docker/test/mongodb/scripts. I’ve included my docker-compose.yml at the end of this post.

docker run --rm --env-file .env --network ptrac --name monstache rwynn/monstache:6.7.1

I’ve tried the root user, admin user, and app user with the same results:

ERROR 2020/11/25 20:46:11 Unable to connect to MongoDB using URL mongodb://REDACTED@mongo:27017/?replicaSet=rs0: server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: 127.0.0.1:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection() : dial tcp 127.0.0.1:27017: connect: connection refused }, ] }

The docker run command appears to be reading the .env file correctly. Its contents are below:

MONSTACHE_MONGO_URL=mongodb://root:password@mongo:27017/?replicaSet=rs0

Mongo is up and running. When I log in to the container and then run mongo, it looks like the replica set is configured correctly:

$ docker exec -it mongo /bin/bash
root@mongo:/scripts# mongo
MongoDB shell version v4.4.1
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("2348a2c4-60d3-4d4c-bebc-bb46ac63ba4f") }
MongoDB server version: 4.4.1
rs0:PRIMARY> rs.status()
{
	"set" : "rs0",
	"date" : ISODate("2020-11-25T21:08:34.890Z"),
	"myState" : 1,

My docker-compose.yml

version: '3.7'
services:
  mongo:
    image: mongo
    hostname: mongo
    logging:
      driver: none
    container_name: mongo
    command: /scripts/mongo-run.sh
    working_dir: /scripts
    volumes:
      - ./scripts/mongo:/scripts
      - ./data/mongo:/data/db
    environment:
      MONGO_REPLICA_SET_NAME: rs0
      MONGO_REPLICAS: mongo:27017
      MONGO_REPLICA_SET_MEMBERS: "[{'_id':0,'host':'mongo:27017','priority':1}]"
      MONGO_USER_ROOT_NAME: root
      MONGO_USER_ROOT_PASSWORD: password
      MONGO_AUTH_SOURCE: admin
      MONGO_BIND_IP: "0.0.0.0"
      MONGO_DB_NAME: ptrac
    ports:
      - "27017:27017"
    networks:
      - ptrac
    healthcheck:
      test: "[ -f /data/health.check ] && exit 0 || exit 1"
      interval: 1s
      timeout: 30s
      retries: 300
    restart: unless-stopped
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.0
    container_name: elasticsearch
    hostname: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - ./data/elasticsearch:/usr/share/elasticsearch/data
      - ./config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      - ptrac
volumes:
  data:
  scripts:
networks:
  ptrac:
    external: true

@rwynn

Hi @justintilson is it possible that monstache is not ending up on the same docker network as the elasticsearch and mongodb services for some reason? Can you verify with docker inspect <containername>?

I know that docker compose will prefix network names, but it shouldn’t be doing that for external type networks like you have afaik.

@justintilson

Thx for getting back to me Ryan. I’m blocked on this for a few days with hardware issues on my dev machine. I’ll be working again this weekend and will follow up then.

@DSamuylov

Hi @rwynn, I am having a similar issue. I basically start MongoDB, Elasticsearch and Monstache with separate commands, and I do not specify any network in docker-compose.yml. So the 3 services are running in different networks.

I can connect to MongoDB and Elasticsearch:

  1. from Safari I can see messages on the default ports
  2. from Python client using the same connection string that I provide to Monstache:
import pymongo

client = pymongo.MongoClient(
    "mongodb://admin:test@localhost:27017/?replicaSet=my-mongo-cluster"
)

However I get the error:

Attaching to monstache
monstache            | ERROR 2021/04/28 20:25:03 Unable to connect to MongoDB using URL mongodb://REDACTED@localhost:27017/?replicaSet=my-mongo-cluster: server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: localhost:27017, Type: Unknown, Average RTT: 0, Last error: connection() error occured during connection handshake: dial tcp 127.0.0.1:27017: connect: connection refused }, ] }

It is especially strange to see "Type: ReplicaSetNoPrimary", whereas I can see using rs.status() that one of the members is PRIMARY:

...
  members: [
    {
      _id: 0,
      name: 'localhost:27017',
      health: 1,
      state: 1,
      stateStr: 'PRIMARY',
...

Is it a hard requirement that the containers be in the same network? What if I would like to deploy monstache on a separate server from Elasticsearch and MongoDB?


@luharprashant

Can you pls share the solution? Below is my docker-compose file, and I’m trying to connect like:
clientOptions := options.Client().ApplyURI("mongodb://datastore:27017")

version: '3'
services:
  datastore:
    image: mongo
    restart: always
    ports:
      - 27017:27017
    volumes:
      - ./mongodata/db:/data/db
      - ./mongodata/shared:/usr/games
  web:
    restart: always
    build: .
    ports: ["50051:50051"]
    depends_on:
      - datastore
#  mongo-express:
#    image: mongo-express
#    restart: always
#    ports:
#      - 8081:8081

@povesteam

You are using the Go driver, so you can try forcing a direct connection with the connect=direct option, so that the driver ignores the replica set hosts and uses only the provided seed.

Your connection URL could look like this:

mongodb://user:password@mongo:27017/?connect=direct

You can also try the directConnection=true option, if your Go MongoDB driver is newer than 1.4.

Relevant links:

  • go documentation
  • mongo documentation
  • mongo go driver source code

@okikechinonso

> Can you pls share the solution, below is my docker file and I’m trying to connect like clientOptions := options.Client().ApplyURI("mongodb://datastore:27017") (docker-compose quoted above)

version: "3.9"  # optional since v1.27.0

services:
  countries-api:
    build:
      context: .
      dockerfile: dockerfile
    ports:
      - "8081:8081"
    networks:
      - api
    volumes:
      - .:/app
      - $GOPATH/pkg/mod:/go/pkg/mod
    depends_on:
      - db
  db:
    image: "mongo:latest"
    ports:
        - "44004:44004"
    volumes:
        - database-data:/data/db
    networks:
      - api

volumes:
  database-data:
    driver: local
networks:
  api:
    driver: bridge

mongos 3.4 refuses to connect to mongods with maxWireVersion < 5

The read preference tag_sets parameter is an ordered list of tag sets used to restrict the eligibility of servers, such as for data center awareness.

Clients MUST raise an error if a non-empty tag set is given in tag_sets and the mode field is ‘primary’.

A read preference tag set ( T ) matches a server tag set ( S ) – or equivalently a server tag set ( S ) matches a read preference tag set ( T ) — if T is a subset of S (i.e. T ⊆ S ).

For example, the read preference tag set { dc: ‘ny’, rack: ‘2’ } matches a secondary server with tag set { dc: ‘ny’, rack: ‘2’, size: ‘large’ }.

A tag set that is an empty document matches any server, because the empty tag set is a subset of any tag set. This means the default tag_sets parameter ( [{}] ) matches all servers.

Tag sets are applied after filtering servers by mode and maxStalenessSeconds , and before selecting one server within the latency window.

Eligibility MUST be determined from tag_sets as follows (see the sketch after this list):

  • If the tag_sets list is empty then all candidate servers are eligible servers. (Note, the default of [{}] means an empty list probably won’t often be seen, but if the client does not forbid an empty list, this rule MUST be implemented to handle that case.)
  • If the tag_sets list is not empty, then tag sets are tried in order until a tag set matches at least one candidate server. All candidate servers matching that tag set are eligible servers. Subsequent tag sets in the list are ignored.
  • If the tag_sets list is not empty and no tag set in the list matches any candidate server, no servers are eligible servers.
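A minimal Python sketch of these eligibility rules (the server objects with a tags dict are an assumption of the sketch, not something the spec defines):

def tag_set_matches(t, s):
    # T matches S iff T is a subset of S; the empty tag set matches any server.
    return all(s.get(k) == v for k, v in t.items())

def eligible_servers(tag_sets, candidates):
    # Empty tag_sets list: every candidate is eligible.
    if not tag_sets:
        return list(candidates)
    # Try tag sets in order; the first one matching at least one candidate wins,
    # and later tag sets are ignored.
    for t in tag_sets:
        matching = [srv for srv in candidates if tag_set_matches(t, srv.tags)]
        if matching:
            return matching
    # No tag set matched any candidate: no eligible servers.
    return []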

The read preference hedge parameter is a document that configures how the server will perform hedged reads. It consists of the following keys:

  • enabled : Enables or disables hedging

Hedged reads are automatically enabled in MongoDB 4.4+ when using a nearest read preference. To explicitly enable hedging, the hedge document must be passed. An empty document uses server defaults to control hedging, but the enabled key may be set to true or false to explicitly enable or disable hedged reads.

Drivers MAY allow users to specify an empty hedge document if they accept documents for read preference options. Any driver that exposes a builder API for read preference objects MUST NOT allow an empty hedge document to be constructed. In this case, the user MUST specify a value for enabled , which MUST default to true . If the user does not call a hedge API method, drivers MUST NOT send a hedge option to the server.

Drivers MUST allow users to configure a default read preference on a MongoClient object. Drivers MAY allow users to configure a default read preference on a Database or Collection object.

A read preference MAY be specified as an object, document or individual mode , tag_sets , and maxStalenessSeconds parameters, depending on what is most idiomatic for the language.

If more than one object has a default read preference, the default of the most specific object takes precedence. I.e. Collection is preferred over Database , which is preferred over MongoClient .

Drivers MAY allow users to set a read preference on queries on a per-operation basis similar to how hint or batchSize are set. E.g., in Python:
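(The spec’s original snippet did not survive this copy; with PyMongo, a per-operation override looks roughly like this, with illustrative database/collection names:)

from pymongo import MongoClient, ReadPreference

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
coll = client.test.restaurants

# Derive a handle with its own read preference for this operation only;
# the client/database/collection defaults are left untouched.
secondary_coll = coll.with_options(read_preference=ReadPreference.SECONDARY)
doc = secondary_coll.find_one({})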

If a server of type Mongos or LoadBalancer is selected for a read operation, the read preference is passed to the selected mongos through the use of $readPreference (as a Global Command Argument for OP_MSG or a query modifier for OP_QUERY) and, for OP_QUERY only, the SecondaryOk wire protocol flag, according to the following rules.

  • For mode ‘primary’, drivers MUST NOT set $readPreference
  • For all other read preference modes (i.e. ‘secondary’, ‘primaryPreferred’, etc.), drivers MUST set $readPreference

If the read preference contains only a mode parameter and the mode is ‘primary’ or ‘secondaryPreferred’, for maximum backwards compatibility with older versions of mongos, drivers MUST only use the value of the SecondaryOk wire protocol flag (i.e. set or unset) to indicate the desired read preference and MUST NOT use a $readPreference query modifier.

Therefore, when sending queries to a mongos or load balancer, the following rules apply (condensed into the sketch after this list):

  • For mode ‘primary’, drivers MUST NOT set the SecondaryOk wire protocol flag and MUST NOT use $readPreference
  • For mode ‘secondary’, drivers MUST set the SecondaryOk wire protocol flag and MUST also use $readPreference
  • For mode ‘primaryPreferred’, drivers MUST set the SecondaryOk wire protocol flag and MUST also use $readPreference
  • For mode ‘secondaryPreferred’, drivers MUST set the SecondaryOk wire protocol flag. If the read preference contains a non-empty tag_sets parameter, maxStalenessSeconds is a positive integer, or the hedge parameter is non-empty, drivers MUST use $readPreference ; otherwise, drivers MUST NOT use $readPreference
  • For mode ‘nearest’, drivers MUST set the SecondaryOk wire protocol flag and MUST also use $readPreference
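The same rules, condensed into a small sketch (the attribute names on pref are illustrative):

def mongos_read_pref_options(pref):
    # Returns (set_secondary_ok_flag, send_read_preference) for a query to mongos.
    if pref.mode == "primary":
        return (False, False)
    if pref.mode == "secondaryPreferred":
        send = bool(pref.tag_sets) or bool(pref.hedge) or (
            pref.max_staleness_seconds is not None and pref.max_staleness_seconds > 0)
        return (True, send)
    # 'secondary', 'primaryPreferred' and 'nearest' always send $readPreference.
    return (True, True)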

The $readPreference query modifier sends the read preference as part of the query. The read preference field tag_sets is represented in a $readPreference document using the field name tags .

When sending a read operation via OP_QUERY and any $ modifier is used, including the $readPreference modifier, the query MUST be provided using the $query modifier like so:
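(The example document itself was lost in this copy; schematically, rendered here as a Python dict with illustrative field values, the wrapped query looks like:)

wrapped_query = {
    "$query": {"status": "A"},            # the original query
    "$readPreference": {
        "mode": "secondaryPreferred",     # camel-case mode name
        "tags": [{"dc": "ny"}],           # tag_sets, renamed to "tags"
    },
}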

A valid $readPreference document for mongos or load balancer has the following requirements:

The mode field MUST be present exactly once with the mode represented in camel case:

  • ‘primary’
  • ‘secondary’
  • ‘primaryPreferred’
  • ‘secondaryPreferred’
  • ‘nearest’

If the mode field is ‘primary’, the tags , maxStalenessSeconds , and hedge fields MUST be absent.

Otherwise, for other mode values, the tags field MUST either be absent or be present exactly once and have an array value containing at least one document. It MUST contain only documents, no other type.

The maxStalenessSeconds field MUST either be absent or be present exactly once with an integer value.

The hedge field MUST either be absent or be a document.

A mongos or service receiving a query with $readPreference SHOULD validate the mode , tags , maxStalenessSeconds , and hedge fields according to rules 1 and 2 above, but SHOULD ignore unrecognized fields for forward-compatibility rather than throwing an error.

Because some commands are used for writes, deployment-changes or other state-changing side-effects, the use of read preference by a driver depends on the command and how it is invoked:

Write commands: insert , update , delete , findAndModify

Write commands are considered write operations and MUST follow the corresponding Rules for server selection for each topology type.

Generic command method: typically command or runCommand

The generic command method MUST act as a read operation for the purposes of server selection.

The generic command method has a default read preference of mode ‘primary’. The generic command method MUST ignore any default read preference from client, database or collection configuration. The generic command method SHOULD allow an optional read preference argument.

If an explicit read preference argument is provided as part of the generic command method call, it MUST be used for server selection, regardless of the name of the command. It is up to the user to use an appropriate read preference, e.g. not calling renameCollection with a mode of ‘secondary’.

N.B.: "used for server selection" does not supersede rules for server selection on "Standalone" topologies, which ignore any requested read preference.

Command-specific helper: methods that wrap database commands, like count , distinct , listCollections or renameCollection .

Command-specific helpers MUST act as read operations for the purposes of server selection, with read preference rules defined by the following three categories of commands:

"must-use-primary": these commands have state-modifying effects and will only succeed on a primary. An example is renameCollection .

These command-specific helpers MUST use a read preference mode of ‘primary’, MUST NOT take a read preference argument and MUST ignore any default read preference from client, database or collection configuration. Languages with dynamic argument lists MUST throw an error if a read preference is provided as an argument.

Clients SHOULD rely on the server to return a "not writable primary" or other error if the command is "must-use-primary". Clients MAY raise an exception before sending the command if the topology type is Single and the server type is not "Standalone", "RSPrimary" or "Mongos", but the identification of the set of "must-use-primary" commands is out of scope for this specification.

"should-use-primary": these commands are intended to be run on a primary, but would succeed — albeit with possibly stale data — when run against a secondary. An example is listCollections .

These command-specific helpers MUST use a read preference mode of ‘primary’, MUST NOT take a read preference argument and MUST ignore any default read preference from client, database or collection configuration. Languages with dynamic argument lists MUST throw an error if a read preference is provided as an argument.

Clients MUST NOT raise an exception if the topology type is Single.

"may-use-secondary": these commands run against primaries or secondaries, according to users’ read preferences. They are sometimes called "query-like" commands.

The current list of "may-use-secondary" commands includes:

  • aggregate without a write stage (e.g. $out , $merge )
  • collStats
  • count
  • dbStats
  • distinct
  • find
  • geoNear
  • geoSearch
  • group
  • mapReduce where the out option is inline
  • parallelCollectionScan

Associated command-specific helpers SHOULD take a read preference argument and otherwise MUST use the default read preference from client, database, or collection configuration.

For pre-5.0 servers, an aggregate command is "must-use-primary" if its pipeline contains a write stage (e.g. $out , $merge ); otherwise, it is "may-use-secondary". For 5.0+ servers, secondaries can execute an aggregate command with a write stage and all aggregate commands are "may-use-secondary". This is discussed in more detail in Read preferences and server selection in the CRUD spec.

If a client provides a specific helper for inline mapReduce, then it is "may-use-secondary" and the regular mapReduce helper is "must-use-primary". Otherwise, the mapReduce helper is "may-use-secondary" and it is the user’s responsibility to specify inline output when running mapReduce on a secondary.

New command-specific helpers implemented in the future will be considered "must-use-primary", "should-use-primary" or "may-use-secondary" according to the specifications for those future commands. Command helper specifications SHOULD use those terms for clarity.

Server selection is a process which takes an operation type (read or write), a ClusterDescription, and optionally a read preference and, on success, returns a ServerDescription for an operation of the given type.

Server selection varies depending on whether a client is multi-threaded/asynchronous or single-threaded because a single-threaded client cannot rely on the topology state being updated in the background.

Multi-threaded drivers and single-threaded drivers with serverSelectionTryOnce set to false MUST enforce a timeout for the server selection process. The timeout MUST be computed as described in Client Side Operations Timeout: Server Selection.

A driver that uses multi-threaded or asynchronous monitoring MUST unblock waiting operations as soon as server selection completes, even if not all servers have been checked by a monitor. Put differently, the client MUST NOT block server selection while waiting for server discovery to finish.

For example, if the client is discovering a replica set and the application attempts a read operation with mode ‘primaryPreferred’, the operation MUST proceed immediately if a suitable secondary is found, rather than blocking until the client has checked all members and possibly discovered a primary.

The number of threads allowed to wait for server selection SHOULD be either (a) the same as the number of threads allowed to wait for a connection from a pool; or (b) governed by a global or client-wide limit on number of waiting threads, depending on how resource limits are implemented by a driver.

Multi-threaded or async drivers MUST keep track of the number of operations that a given server is currently executing (the server’s operationCount ). This value MUST be incremented once a server is selected for an operation and MUST be decremented once that operation has completed, regardless of its outcome. Where this value is stored is left as an implementation detail of the driver; some example locations include the Server type that also owns the connection pool for the server (if there exists such a type in the driver’s implementation) or on the pool itself. Incrementing or decrementing a server’s operationCount MUST NOT wake up any threads that are waiting for a topology update as part of server selection. See operationCount-based selection within the latency window (multi-threaded or async) for the rationale behind the way this value is used.

For multi-threaded clients, the server selection algorithm is as follows (steps 6-8 are sketched in code after the list):

  1. Record the server selection start time
  2. If the topology wire version is invalid, raise an error
  3. Find suitable servers by topology type and operation type
  4. Filter the suitable servers by calling the optional, application-provided server selector.
  5. If there are any suitable servers, filter them according to Filtering suitable servers based on the latency window and continue to the next step; otherwise, goto Step #9.
  6. Choose two servers at random from the set of suitable servers in the latency window. If there is only 1 server in the latency window, just select that server and goto Step #8.
  7. Of the two randomly chosen servers, select the one with the lower operationCount . If both servers have the same operationCount , select arbitrarily between the two of them.
  8. Increment the operationCount of the selected server and return it. Do not go onto later steps.
  9. Request an immediate topology check, then block the server selection thread until the topology changes or until the server selection timeout has elapsed
  10. If server selection has timed out, raise a server selection error
  11. Goto Step #2
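A sketch of the power-of-two-choices portion (steps 6-8), assuming a minimal Server type that tracks operationCount:

import random
from dataclasses import dataclass

@dataclass
class Server:
    address: str
    operation_count: int = 0

def choose_in_window(servers_in_window):
    # Step 6: with a single candidate, take it; otherwise pick two at random.
    if len(servers_in_window) == 1:
        chosen = servers_in_window[0]
    else:
        a, b = random.sample(servers_in_window, 2)
        # Step 7: prefer the server with the lower operationCount
        # (ties are broken arbitrarily).
        chosen = a if a.operation_count <= b.operation_count else b
    # Step 8: increment before returning; decremented when the operation completes.
    chosen.operation_count += 1
    return chosen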

Single-threaded drivers do not monitor the topology in the background. Instead, they MUST periodically update the topology during server selection as described below.

When serverSelectionTryOnce is true, server selection timeouts have no effect; a single immediate topology check will be done if the topology starts stale or if the first selection attempt fails.

When serverSelectionTryOnce is false, then the server selection loops until a server is successfully selected or until the selection timeout is exceeded.

Therefore, for single-threaded clients, the server selection algorithm is as follows (a condensed sketch follows the list):

  1. Record the server selection start time
  2. Record the maximum time as start time plus the computed timeout
  3. If the topology has not been scanned in heartbeatFrequencyMS milliseconds, mark the topology stale
  4. If the topology is stale, proceed as follows:
    • record the target scan time as last scan time plus minHeartbeatFrequencyMS
    • if serverSelectionTryOnce is false and the target scan time would exceed the maximum time, raise a server selection error
    • if the current time is less than the target scan time, sleep until the target scan time
    • do a blocking immediate topology check (which must also update the last scan time and mark the topology as no longer stale)
  5. If the topology wire version is invalid, raise an error
  6. Find suitable servers by topology type and operation type
  7. Filter the suitable servers by calling the optional, application-provided server selector.
  8. If there are any suitable servers, filter them according to Filtering suitable servers based on the latency window and return one at random from the filtered servers; otherwise, mark the topology stale and continue to step #9.
  9. If serverSelectionTryOnce is true and the last scan time is newer than the selection start time, raise a server selection error; otherwise, goto Step #4
  10. If the current time exceeds the maximum time, raise a server selection error
  11. Goto Step #4
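A condensed, illustrative sketch of that loop; the topology object (last_scan_ms, a stale flag, a blocking rescan()) and the suitable() callback, which stands in for steps 5-8 (wire version check, filters, latency window), are assumptions of the sketch:

import random
import time

class ServerSelectionError(Exception):
    pass

def select_server_single_threaded(topology, suitable, try_once, timeout_ms,
                                  heartbeat_frequency_ms, min_heartbeat_frequency_ms):
    start_ms = time.monotonic() * 1000          # step 1
    max_time_ms = start_ms + timeout_ms         # step 2
    while True:
        now_ms = time.monotonic() * 1000
        if now_ms - topology.last_scan_ms > heartbeat_frequency_ms:
            topology.stale = True               # step 3
        if topology.stale:                      # step 4
            target_ms = topology.last_scan_ms + min_heartbeat_frequency_ms
            if not try_once and target_ms > max_time_ms:
                raise ServerSelectionError("target scan time exceeds the maximum time")
            if now_ms < target_ms:
                time.sleep((target_ms - now_ms) / 1000)  # wait until the target scan time
            topology.rescan()   # blocking check; updates last_scan_ms, clears stale
        servers = suitable(topology)            # steps 5-8
        if servers:
            return random.choice(servers)
        topology.stale = True
        if try_once:                            # step 9
            if topology.last_scan_ms > start_ms:
                raise ServerSelectionError("no suitable server after one scan")
        elif time.monotonic() * 1000 > max_time_ms:   # step 10
            raise ServerSelectionError("server selection timed out")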

Before using a socket to the selected server, drivers MUST check whether the socket has been used in socketCheckIntervalMS milliseconds. If the socket has been idle for longer, the driver MUST update the ServerDescription for the selected server. After updating, if the server is no longer suitable, the driver MUST repeat the server selection algorithm and select a new server.

Because single-threaded selection can do a blocking immediate check, the server selection timeout is not a hard deadline. The actual maximum server selection time for any given request can vary from the timeout minus minHeartbeatFrequencyMS to the timeout plus the time required for a blocking scan.

Single-threaded drivers MUST document that when serverSelectionTryOnce is true, selection may take up to the time required for a blocking scan, and when serverSelectionTryOnce is false, selection may take up to the timeout plus the time required for a blocking scan.

When a deployment has topology type "Unknown", no servers are suitable for read or write operations.

A deployment of topology type Single contains only a single server of any type. Topology type Single signifies a direct connection intended to receive all read and write operations.

Therefore, read preference is ignored during server selection with topology type Single. The single server is always suitable for reads if it is available. Depending on server type, the read preference is communicated to the server differently:

  • Type Mongos: the read preference is sent to the server using the rules for Passing read preference to mongos and load balancers.
  • Type Standalone: clients MUST NOT send the read preference to the server
  • For all other types, using OP_QUERY: clients MUST always set the SecondaryOk wire protocol flag on reads to ensure that any server type can handle the request.
  • For all other types, using OP_MSG: If no read preference is configured by the application, or if the application read preference is Primary, then $readPreference MUST be set to { "mode": "primaryPreferred" } to ensure that any server type can handle the request. If the application read preference is set otherwise, $readPreference MUST be set following Document structure.

The single server is always suitable for write operations if it is available.

During command construction, drivers MUST add a $readPreference field to the command when required by Passing read preference to mongos and load balancers; see the Load Balancer Specification for details.

A deployment with topology type ReplicaSetWithPrimary or ReplicaSetNoPrimary can have a mix of server types: RSPrimary (only in ReplicaSetWithPrimary), RSSecondary, RSArbiter, RSOther, RSGhost, Unknown or PossiblePrimary.

For the purpose of selecting a server for read operations, the same rules apply to both ReplicaSetWithPrimary and ReplicaSetNoPrimary.

To select from the topology a server that matches the user’s Read Preference (the rules below are also combined into a code sketch after them):

If mode is ‘primary’, select the primary server.

If mode is ‘secondary’ or ‘nearest’:

  1. Select all secondaries if mode is ‘secondary’, or all secondaries and the primary if mode is ‘nearest’.
  2. From these, filter out servers staler than maxStalenessSeconds if it is a positive number.
  3. From the remaining servers, select servers matching the tag_sets .
  4. From these, select one server within the latency window.

If mode is ‘secondaryPreferred’, attempt the selection algorithm with mode ‘secondary’ and the user’s maxStalenessSeconds and tag_sets . If no server matches, select the primary.

If mode is ‘primaryPreferred’, select the primary if it is known, otherwise attempt the selection algorithm with mode ‘secondary’ and the user’s maxStalenessSeconds and tag_sets .
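A sketch combining these rules, reusing eligible_servers from earlier and the latency-window filter sketched further below; the topology and pref shapes (with pref as a namedtuple-like object) are assumptions of the sketch:

def servers_for_read(topology, pref):
    if pref.mode == "primary":
        return [topology.primary] if topology.primary else []
    if pref.mode in ("secondary", "nearest"):
        # Step 1: all secondaries, plus the primary for 'nearest'.
        candidates = list(topology.secondaries)
        if pref.mode == "nearest" and topology.primary:
            candidates.append(topology.primary)
        # Step 2: drop servers staler than maxStalenessSeconds, if positive.
        if pref.max_staleness_seconds and pref.max_staleness_seconds > 0:
            candidates = [s for s in candidates
                          if s.staleness_seconds <= pref.max_staleness_seconds]
        # Steps 3-4: tag sets, then the latency window.
        candidates = eligible_servers(pref.tag_sets, candidates)
        return filter_latency_window(candidates, topology.local_threshold_ms)
    if pref.mode == "secondaryPreferred":
        found = servers_for_read(topology, pref._replace(mode="secondary"))
        return found or ([topology.primary] if topology.primary else [])
    if pref.mode == "primaryPreferred":
        if topology.primary:
            return [topology.primary]
        return servers_for_read(topology, pref._replace(mode="secondary"))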

For all read preferences modes except ‘primary’, clients MUST set the SecondaryOk wire protocol flag (OP_QUERY) or $readPreference global command argument (OP_MSG) to ensure that any suitable server can handle the request. If the read preference mode is ‘primary’, clients MUST NOT set the SecondaryOk wire protocol flag (OP_QUERY) or $readPreference global command argument (OP_MSG).

If the topology type is ReplicaSetWithPrimary, only an available primary is suitable for write operations.

If the topology type is ReplicaSetNoPrimary, no servers are suitable for write operations.

A deployment of topology type Sharded contains one or more servers of type Mongos or Unknown.

For read operations, all servers of type Mongos are suitable; the mode , tag_sets , and maxStalenessSeconds read preference parameters are ignored for selecting a server, but are passed through to mongos. See Passing read preference to mongos and load balancers.

For write operations, all servers of type Mongos are suitable.

If more than one mongos is suitable, drivers MUST select a suitable server within the latency window (see Filtering suitable servers based on the latency window).

For every available server, clients MUST track the average RTT of server monitoring hello or legacy hello commands.

An Unknown server has no average RTT. When a server becomes unavailable, its average RTT MUST be cleared. Clients MAY implement this idiomatically (e.g. nil, -1, etc.).

When there is no average RTT for a server, the average RTT MUST be set equal to the first RTT measurement (i.e. the first hello or legacy hello command after the server becomes available).

After the first measurement, average RTT MUST be computed using an exponentially-weighted moving average formula, with a weighting factor ( alpha ) of 0.2. If the prior average is denoted old_rtt , then the new average ( new_rtt ) is computed from a new RTT measurement ( x ) using the following formula:

new_rtt = alpha * x + (1 - alpha) * old_rtt

A weighting factor of 0.2 was chosen to put about 85% of the weight of the average RTT on the 9 most recent observations.
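In code, with the first measurement seeding the average as described above:

ALPHA = 0.2  # weighting factor from the spec

def update_average_rtt(old_rtt, x):
    if old_rtt is None:   # no average yet: seed with the first measurement
        return x
    return ALPHA * x + (1 - ALPHA) * old_rtt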

Server selection results in a set of zero or more suitable servers. If more than one server is suitable, a server MUST be selected from among those within the latency window.

The localThresholdMS configuration parameter controls the size of the latency window used to select a suitable server.

The shortest average round trip time (RTT) from among suitable servers anchors one end of the latency window ( A ). The other end is determined by adding localThresholdMS ( B = A + localThresholdMS ).

A server MUST be selected from among suitable servers that have an average RTT ( RTT ) within the latency window (i.e. A ≤ RTT ≤ B ). In other words, the suitable server with the shortest average RTT is always a possible choice. Other servers could be chosen if their average RTTs are no more than localThresholdMS more than the shortest average RTT.
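A sketch of the window filter (the avg_rtt_ms attribute on the suitable-server objects is an assumption):

def filter_latency_window(suitable, local_threshold_ms):
    a = min(s.avg_rtt_ms for s in suitable)  # shortest average RTT anchors the window
    b = a + local_threshold_ms
    return [s for s in suitable if s.avg_rtt_ms <= b]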

See either Single-threaded server selection or Multi-threaded or asynchronous server selection for information on how to select a server from among those within the latency window.

Only for single-threaded drivers.

If a server is selected that has an existing connection that has been idle for socketCheckIntervalMS, the driver MUST check the connection with the "ping" command. If the ping succeeds, use the selected connection. If not, set the server’s type to Unknown and update the Topology Description according to the Server Discovery and Monitoring Spec, and attempt once more to select a server.

The logic is expressed in this pseudocode. The algorithm for the "getServer" function is suggested below, in Single-threaded server selection implementation:
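(The pseudocode is not preserved in this copy; a minimal sketch of the idle-socket check, with illustrative pool and connection helpers:)

class RetrySelection(Exception):
    pass  # signal: repeat server selection once

SOCKET_CHECK_INTERVAL_MS = 5000  # illustrative value

def checked_connection(server, now_ms):
    conn = server.pool.get_connection()
    if now_ms - conn.last_used_ms > SOCKET_CHECK_INTERVAL_MS:
        try:
            conn.command("admin", {"ping": 1})      # verify the socket is still open
        except ConnectionError:
            server.description_type = "Unknown"     # update topology per the SDAM spec
            raise RetrySelection                    # caller re-runs getServer once
    return conn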

The prior read preference specification included the concept of a "request", which pinned a server to a thread for subsequent, related reads. Requests and pinning are now deprecated. See What happened to pinning? for the rationale for this change.

Drivers with an existing request API MAY continue to provide it for backwards compatibility, but MUST document that pinning for the request does not guarantee monotonic reads.

Drivers MUST NOT automatically pin the client or a thread to a particular server without an explicit start_request (or comparable) method call.

Outside a legacy "request" API, drivers MUST use server selection for each individual read operation.

The single-threaded reference implementation is the Perl master branch (work towards v1.0.0). The multi-threaded reference implementation is TBD.

These are suggestions. As always, driver authors should balance cross-language standardization with backwards compatibility and the idioms of their language.

Modes (‘primary’, ‘secondary’, etc.) are constants declared in whatever way is idiomatic for the programming language. The constant values may be ints, strings, or whatever. However, when attaching modes to $readPreference camel case must be used as described above in Passing read preference to mongos and load balancers.

‘primaryPreferred’ is equivalent to selecting a server with read preference mode ‘primary’ (without tag_sets or maxStalenessSeconds ), or, if that fails, falling back to selecting with read preference mode ‘secondary’ (with tag_sets and maxStalenessSeconds , if provided).

‘secondaryPreferred’ is the inverse: selecting with mode ‘secondary’ (with tag_sets and maxStalenessSeconds ) and falling back to selecting with mode ‘primary’ (without tag_sets or maxStalenessSeconds ).

Depending on the implementation, this may result in cleaner code.

The term ‘nearest’ is unfortunate, as it implies a choice based on geographic locality or absolute lowest latency, neither of which are true.

Instead, and unlike the other read preference modes, ‘nearest’ does not favor either primaries or secondaries; instead all servers are candidates and are filtered by tag_sets and maxStalenessSeconds .

To always select the server with the lowest RTT, users should use mode ‘nearest’ without tag_sets or maxStalenessSeconds and set localThresholdMS to zero.

To distribute reads across all members evenly regardless of RTT, users should use mode ‘nearest’ without tag_sets or maxStalenessSeconds and set localThresholdMS very high so that all servers fall within the latency window.

In both cases, tag_sets and maxStalenessSeconds could be used to further restrict the set of eligible servers, if desired.

Tag set lists can be configured in the driver in whatever way is natural for the language.

The following example uses a single lock for clarity. Drivers are free to implement whatever concurrency model best suits their design.

Drivers should use server descriptions and their error attributes (if set) to return useful error messages.

For example, when there are no members matching the ReadPreference:

  • "No server available for query with ReadPreference primary"
  • "No server available for query with ReadPreference secondary"
  • "No server available for query with ReadPreference " + mode + ", tag set list " + tag_sets + ", and maxStalenessSeconds " + maxStalenessSeconds

Or, if authentication failed:

  • "Authentication failed: [specific error message]"

Here is a sketch of some pseudocode for handling error reporting when errors could be different across servers:
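(That pseudocode is also missing from this copy; a minimal sketch of the idea, assuming server descriptions carry an error attribute:)

def no_server_message(topology, pref):
    msg = "No server available for query with ReadPreference " + pref.mode
    per_server = ["%s: %s" % (s.address, s.error) for s in topology.servers if s.error]
    if per_server:
        # Surface each server's last error so failures are diagnosable from logs.
        msg += " (" + "; ".join(per_server) + ")"
    return msg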

Cursor operations OP_GET_MORE and OP_KILL_CURSORS do not go through the server selection process. Cursor operations must be sent to the original server that received the query and sent the OP_REPLY. For exhaust cursors, the same socket must be used for OP_GET_MORE until the cursor is exhausted.

Operations that are part of a sharded transaction (after the initial command) do not go through the server selection process. Sharded transaction operations MUST be sent to the original mongos server on which the transaction was started.

Note: As of MongoDB 2.6, mongos doesn’t distribute the "text" command to secondaries, see SERVER-10947.

However, the "text" command is deprecated in 2.6, so this command-specific helper may become deprecated before this is fixed.

The server selection test plan is given in a separate document that describes the tests and supporting data files: Server Selection Tests

The prior version of the read preference spec had only a loose definition of server or topology types. The Server Discovery and Monitoring spec defines these terms explicitly and they are used here for consistency and clarity.

In order to ensure that behavior is consistent regardless of topology type, read preference behaviors are limited to those that mongos can proxy.

For example, mongos ignores read preference ‘secondary’ when a shard consists of a single server. Therefore, this spec calls for topology type Single to ignore read preferences for consistency.

The spec has been written with the intention that it can apply to both drivers and mongos and the term «client» has been used when behaviors should apply to both. Behaviors that are specific to drivers are largely limited to those for communicating with a mongos.

Because this does not apply only to secondaries and does not limit absolute latency, the name secondaryAcceptableLatencyMS is misleading.

The mongos name localThreshold misleads because it has nothing to do with locality. It also doesn’t include the MS units suffix for consistency with other time-related configuration options.

However, given a choice between the two, localThreshold is a more general term. For drivers, we add the MS suffix for clarity about units and consistency with other configuration options.

When more than one server is judged to be suitable, the spec calls for random selection to ensure a fair distribution of work among servers within the latency window.

It would be hard to ensure a fair round-robin approach given the potential for servers to come and go. Making newly available servers either first or last could lead to unbalanced work. Random selection has a better fairness guarantee and keeps the design simpler.

As operation execution slows down on a node (e.g. due to degraded server-side performance or increased network latency), checked-out pooled connections to that node will begin to remain checked out for longer periods of time. Assuming at least constant incoming operation load, more connections will then need to be opened against the node to service new operations that it gets selected for, further straining it and slowing it down. This can lead to runaway connection creation scenarios that can cripple a deployment ("connection storms"). As part of DRIVERS-781, the random choice portion of multi-threaded server selection was changed to more evenly spread out the workload among suitable servers in order to prevent any single node from being overloaded. The new steps achieve this by approximating an individual server’s load via the number of concurrent operations that node is processing (operationCount) and then routing operations to servers with less load. This should reduce the number of new operations routed towards nodes that are busier and thus increase the number routed towards nodes that are servicing operations faster or are simply less busy. The previous random selection mechanism did not take load into account and could assign work to nodes that were under too much stress already.

As an added benefit, the new approach gives preference to nodes that have recently been discovered and are thus more likely to be alive (e.g. during a rolling restart). The narrowing to two random choices first ensures new servers aren’t overly preferred however, preventing a "thundering herd" situation. Additionally, the maxConnecting provisions included in the CMAP specification prevent drivers from crippling new nodes with connection storms.

This approach is based on the "Power of Two Random Choices with Least Connections" load balancing algorithm.

An alternative approach to this would be to prefer selecting servers that already have available connections. While that approach could help reduce latency, it does not achieve the benefits of routing operations away from slow servers or of preferring newly introduced servers. Additionally, that approach could lead to the same node being selected repeatedly rather than spreading the load out among all suitable servers.

In server selection, there is a race condition that could exist between what a selected server type is believed to be and what it actually is.

The SecondaryOk wire protocol flag solves the race problem by communicating to the server whether a secondary is acceptable. The server knows its type and can return a "not writable primary" error if SecondaryOk is false and the server is a secondary.

However, because topology type Single is used for direct connections, we want read operations to succeed even against a secondary, so the SecondaryOk wire protocol flag must be sent to mongods with topology type Single.

(If the server type is Mongos, follow the rules for Passing read preference to mongos and load balancers, even for topology type Single.)

The list of commands that can go to secondaries changes over time and depends not just on the command but on parameters. For example, the mapReduce command may or may not be able to be run on secondaries depending on the value of the out parameter.

It significantly simplifies implementation for the general command method always to go to the primary unless an explicit read preference is set, and to rely on users of the general command method to provide a read preference appropriate to the command.

The command-specific helpers will need to implement a check of read preferences against the semantics of the command and its parameters, but keeping this logic close to the command rather than in a generic method is a better design than either delegating this check to the generic method, duplicating the logic in the generic method, or coupling both to another validation method.

Using an exponentially-weighted moving average avoids having to store and rotate an arbitrary number of RTT observations. All observations count towards the average. The weighting makes recent observations count more heavily while smoothing volatility.

Error messages should be sufficiently verbose to allow users and/or support engineers to determine the reasons for server selection failures from log or other error messages.

Single-threaded drivers in languages like PHP and Perl are typically deployed as many processes per application server. Each process must independently discover and monitor the MongoDB deployment.

When no suitable server is available (due to a partition or misconfiguration), it is better for each request to fail as soon as its process detects a problem, instead of waiting and retrying to see if the deployment recovers.

Minimizing response latency is important for maximizing request-handling capacity and for user experience (e.g. a quick fail message instead of a slow web page).

However, when a request arrives and the topology information is already stale, or no suitable server is known, making a single attempt to update the topology to service the request is acceptable.

A user of a single-threaded driver who prefers resilience in the face of topology problems, rather than short response times, can turn the "try once" mode off. Then the driver rescans the topology every minHeartbeatFrequencyMS until a suitable server is found or the timeout expires.

Single-threaded clients need to make a compromise: if they check servers too frequently it slows down regular operations, but if they check too rarely they cannot proactively avoid errors.

Errors are more disruptive for single-threaded clients than for multi-threaded. If one thread in a multi-threaded process encounters an error, it warns the other threads not to use the disconnected server. But single-threaded clients are deployed as many independent processes per application server, and each process must throw an error until all have discovered that a server is down.

The compromise specified here balances the cost of frequent checks against the disruption of many errors. The client preemptively checks individual sockets that have not been used in the last socketCheckIntervalMS, which is more frequent by default than heartbeatFrequencyMS defined in the Server Discovery and Monitoring Spec.

The client checks the socket with a "ping" command, rather than "hello" or legacy hello, because it is not checking the server’s full state as in the Server Discovery and Monitoring Spec; it is only verifying that the connection is still open. We might also consider a select or poll call to check if the socket layer considers the socket closed, without requiring a round-trip to the server. However, this technique usually will not detect an uncleanly shutdown server or a network outage.

In general, backwards breaking changes have been made in the name of consistency with mongos and avoiding misleading users about monotonicity.

  • Automatic pinning (see What happened to pinning?)
  • Auto retry (replaced by the general server selection algorithm)
  • mongos "high availability" mode (effectively, mongos pinning)

Other features and behaviors have changed explicitly

  • Ignoring read preferences for topology type Single
  • Default read preference for the generic command method

Changes with grandfather clauses

  • Alternate names for localThresholdMS
  • Pinning for legacy request APIs

Internal changes with little user-visibility

The prior read preference spec, which was implemented in the versions of the drivers and mongos released concomitantly with MongoDB 2.2, stated that a thread / client should remain pinned to an RS member as long as that member matched the current mode, tags, and acceptable latency. This increased the odds that reads would be monotonic (assuming no rollback), but had the following surprising consequence:

  1. Thread / client reads with mode ‘secondary’ or ‘secondaryPreferred’, gets pinned to a secondary
  2. Thread / client reads with mode ‘primaryPreferred’, driver / mongos sees that the pinned member (a secondary) matches the mode (which allows for a secondary) and reads from secondary, even though the primary is available and preferable

The old spec also had the opposite problem, reading from the primary with ‘secondaryPreferred’, except for mongos, which was changed at the last minute before release with SERVER-6565.

This left application developers with two problems:

  1. ‘primaryPreferred’ and ‘secondaryPreferred’ acted surprisingly and unpredictably within requests
  2. There was no way to specify a common need: read from a secondary if possible with ‘secondaryPreferred’, then from primary if possible with ‘primaryPreferred’, all within a request. Instead an application developer would have to do the second read with ‘primary’, which would unpin the thread but risk unavailability if only secondaries were up.

Additionally, mongos 2.4 introduced the releaseConnectionsAfterResponse option (RCAR), mongos 2.6 made it the default and mongos 2.8 will remove the ability to turn it off. This means that pinning to a mongos offers no guarantee that connections to shards are pinned. Since we can’t provide the same guarantees for replica sets and sharded clusters, we removed automatic pinning entirely and deprecated "requests". See SERVER-11956 and SERVER-12273.

Regardless, even for replica sets, pinning offers no monotonicity because of the ever-present possibility of rollbacks. Through MongoDB 2.6, secondaries did not close sockets on rollback, so a rollback could happen between any two queries without any indication to the driver.

Therefore, an inconsistent feature that doesn’t actually do what people think it does has no place in the spec and has been removed. Should the server eventually implement some form of "sessions", this spec will need to be revised accordingly.

Mongos HA has similar problems with pinning, in that one can wind up pinned to a high-latency mongos even if a lower-latency mongos later becomes available.

Selection within the latency window avoids this problem and makes server selection exactly analogous to having multiple suitable servers from a replica set. This is easier to explain and implement.

The old auto-retry mechanism was closely connected to server pinning, which has been removed. It also mandated exactly three attempts to carry out a query on different servers, with no way to disable or adjust that value, and only for the first query within a request.

To the extent that auto-retry was trying to compensate for unavailable servers, the Server Discovery and Monitoring spec and new server selection algorithm provide a more robust and configurable way to direct all queries to available servers.

After a server is selected, several error conditions could still occur that make the selected server unsuitable for sending the operation, such as:

  • the server could have shut down the socket (e.g. a primary stepping down),
  • a connection pool could be empty, requiring new connections; those connections could fail to connect or could fail the server handshake

Once an operation is sent over the wire, several additional error conditions could occur, such as:

  • a socket timeout could occur before the server responds
  • the server might send an RST packet, indicating the socket was already closed
  • for write operations, the server might return a "not writable primary" error

This specification does not require nor prohibit drivers from attempting automatic recovery for various cases where it might be considered reasonable to do so, such as:

  • repeating server selection if, after selection, a socket is determined to be unsuitable before a message is sent on it
  • for a read operation, after a socket error, selecting a new server meeting the read preference and resending the query
  • for a write operation, after a «not writable primary» error, selecting a new server (to locate the primary) and resending the write operation

Driver-common rules for retrying operations (and configuring such retries) could be the topic of a different, future specification.

The intention of read preference’s list of tag sets is to allow a user to prefer the first tag set but fall back to members matching later tag sets. In order to know whether to fall back or not, we must first filter by all other criteria.

Say you have two secondaries:

  • Node 1, tagged { ‘tag’: ‘value1’ }, estimated staleness 5 minutes
  • Node 2, tagged { ‘tag’: ‘value2’ }, estimated staleness 1 minute

And a read preference:
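(The read preference document was lost in extraction; for the conclusion below to hold, it must have used mode ‘secondary’ with both tag sets in order and a maxStalenessSeconds between the two staleness estimates, e.g., with illustrative values:)

mode: 'secondary'
maxStalenessSeconds: 120
tag_sets: [ { 'tag': 'value1' }, { 'tag': 'value2' } ]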

If tag sets were applied before maxStalenessSeconds, we would select Node 1 since it matches the first tag set, then filter it out because it is too stale, and be left with no eligible servers.

The user’s intent in specifying two tag sets was to fall back to the second set if needed, so we filter by maxStalenessSeconds first, then tag_sets, and select Node 2.


We have a 3-member MongoDB replica set. When one member is disconnected from the replica set by a DNS outage, the mongodb-exporter of the isolated member doesn’t provide metrics.

Our setup:

MongoDB nodes are installed natively on separate Ubuntu virtual machines, each in a different data center, together forming one replica set.
On each virtual machine, mongodb-exporter runs in a Docker container and is connected to the MongoDB instance running on the same machine. Config files and start commands are included below.

Missing metrics description:

In a normal network condition, all metrics are exposed on a /metrics endpoint without any problems.

When DNS in one datacenter goes down, the replica set splits into two parts because the domain names of the nodes become unresolvable:

 First part:  primary (mongo1) + secondary (mongo2) 

 Second part:  isolated secondary node (mongo3).

Metrics on the first part are fine; however, the mongodb_exporter in the isolated part with one node (mongo3):

  •   doesn’t provide MongoDB metrics at all,
  •   or provides complete metrics, but with a huge scrape duration (more than an hour).

Why we suppose this is a bug:

  • the mongo3 node is up and running
  • the mongo3 node is reachable with ping from inside the mongodb-exporter container
  • the mongo3 node can provide data via the PyMongo driver (although data is served very slowly and the connection sometimes even times out), which uses the same Docker network as the mongodb-exporter
  • the following errors appear in the mongodb-exporter logs; they occur regularly, but in a rather random manner: not always on the same Go-driver call, but on various calls, sometimes while collecting the server version, sometimes while collecting collection metrics, etc.

Start command of the mongodb-exporter:

sudo docker run --name mongodb-exporter-debug-bitnami -d --restart=always --add-host=mongo.xxx.cz:`ip addr show docker0 | grep -Po 'inet \K[\d.]+'` -p 9220:9216 bitnami/mongodb-exporter:0.11.0 --mongodb.uri=mongodb:' --log.level="debug" --collect.database --collect.collection --collect.topmetrics

Logs with errors  from the mongodb-exporter:

mongo3:~$ sudo docker logs mongodb-exporter-debug-bitnami
time="2020-06-03T20:47:32Z" level=info msg="Starting mongodb_exporter (version=, branch=, revision=)" source="mongodb_exporter.go:80"
time="2020-06-03T20:47:32Z" level=info msg="Build context (go=go1.13.3, user=, date=19700101-00:00:00)" source="mongodb_exporter.go:81"
time="2020-06-03T20:48:02Z" level=error msg="Could not get MongoDB BuildInfo: server selection error: server selection timeoutncurrent topology: Type: SinglenServers:nAddr: mongo.xxx.cz:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection(mongo.xxx.cz:27017[-3]) connection is closedn!" source="connection.go:84"
time="2020-06-03T20:48:02Z" level=error msg="Problem gathering the mongo server version: server selection error: server selection timeoutncurrent topology: Type: SinglenServers:nAddr: mongo.xxx.cz:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection(mongo.xxx.cz:27017[-3]) connection is closedn" source="mongodb_collector.go:195"
time="2020-06-03T20:48:02Z" level=info msg="Starting HTTP server for http: source="server.go:140"
time="2020-06-03T20:48:28Z" level=debug msg="Connected to: mongodb: source="mongodb_collector.go:208"
time="2020-06-03T20:48:28Z" level=debug msg="Collecting Server Status" source="mongodb_collector.go:261"
time="2020-06-03T20:48:38Z" level=debug msg="Connected to: mongodb: source="mongodb_collector.go:208"
time="2020-06-03T20:48:38Z" level=debug msg="Collecting Server Status" source="mongodb_collector.go:261"
time="2020-06-03T20:48:38Z" level=debug msg="Collecting Database Status From Mongod" source="mongodb_collector.go:268"
time="2020-06-03T20:48:38Z" level=debug msg="Collecting Collection Status From Mongod" source="mongodb_collector.go:276"
time="2020-06-03T20:48:48Z" level=debug msg="Collecting Database Status From Mongod" source="mongodb_collector.go:268"
time="2020-06-03T20:48:58Z" level=debug msg="Collecting Collection Status From Mongod" source="mongodb_collector.go:276"
time="2020-06-03T20:52:49Z" level=warning msg="server selection error: server selection timeoutncurrent topology: Type: SinglenServers:nAddr: mongo.xxx.cz:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection(mongo.xxx.cz:27017[-38]) connection is closedn. Collection stats will not be collected for this collection. This log message will be suppressed from now." source="collections_status.go:168"
time="2020-06-03T20:53:03Z" level=error msg="Could not get MongoDB BuildInfo: server selection error: server selection timeoutncurrent topology: Type: SinglenServers:nAddr: mongo.xxx.cz:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection(mongo.xxx.cz:27017[-40]) connection is closedn!" source="connection.go:84"
time="2020-06-03T20:53:03Z" level=error msg="Problem gathering the mongo server version: server selection error: server selection timeoutncurrent topology: Type: SinglenServers:nAddr: mongo.xxx.cz:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection(mongo.xxx.cz:27017[-40]) connection is closedn" source="mongodb_collector.go:195"

The error looks like this:

time="2020-06-03T20:52:49Z" level=warning msg="server selection error: server selection timeout\ncurrent topology: Type: Single\nServers:\nAddr: mongo.xxx.cz:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection(mongo.xxx.cz:27017[-38]) connection is closed\n. Collection stats will not be collected for this collection. This log message will be suppressed from now." source="collections_status.go:168"

Corresponding metrics from the /metrics endpoint:

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.6116e-05
go_gc_duration_seconds{quantile="0.25"} 1.9563e-05
go_gc_duration_seconds{quantile="0.5"} 3.3891e-05
go_gc_duration_seconds{quantile="0.75"} 0.000204658
go_gc_duration_seconds{quantile="1"} 0.000233363
go_gc_duration_seconds_sum 0.000612237
go_gc_duration_seconds_count 8
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 14
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.13.3"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 2.258424e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.6415664e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.447154e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 272796
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 9.38559778626198e-06
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 2.38592e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 2.258424e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.1571072e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 4.653056e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 19264
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.1423616e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6224128e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.591217542954138e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 292060
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 13888
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 74120
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 98304
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.194304e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.754118e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 884736
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 884736
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.2810744e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 13
# HELP mongodb_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which mongodb_exporter was built.
# TYPE mongodb_exporter_build_info gauge
mongodb_exporter_build_info{branch="",goversion="go1.13.3",revision="",version=""} 1
# HELP mongodb_exporter_last_scrape_duration_seconds Duration of the last scrape of metrics from MongoDB.
# TYPE mongodb_exporter_last_scrape_duration_seconds gauge
mongodb_exporter_last_scrape_duration_seconds 30.000872336
# HELP mongodb_exporter_last_scrape_error Whether the last scrape of metrics from MongoDB resulted in an error (1 for error, 0 for success).
# TYPE mongodb_exporter_last_scrape_error gauge
mongodb_exporter_last_scrape_error 1
# HELP mongodb_exporter_scrape_errors_total Total number of times an error occurred scraping a MongoDB.
# TYPE mongodb_exporter_scrape_errors_total counter
mongodb_exporter_scrape_errors_total 2
# HELP mongodb_exporter_scrapes_total Total number of times MongoDB was scraped for metrics.
# TYPE mongodb_exporter_scrapes_total counter
mongodb_exporter_scrapes_total 4
# HELP mongodb_up Whether MongoDB is up.
# TYPE mongodb_up gauge
mongodb_up 0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.83
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 11
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.8292736e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.59121725123e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.21561088e+08
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes -1
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 3
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 0
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

PyMongo testing script:

import pymongo

print('mongo test')

# The connection URI was cut off in the original post; the primary seen in
# the output below is mongo1.xxx.cz:27017, so that host is assumed here.
client = pymongo.MongoClient("mongodb://mongo1.xxx.cz:27017")
print(client.server_info())
print(client.admin.command('ismaster'))
print('buildInfo')
print(client.admin.command('buildInfo'))
print('mongo test end')

Script output:

python ./mongo-test 
mongo test
{u'storageEngines': [u'biggie', u'devnull', u'ephemeralForTest', u'wiredTiger'], u'maxBsonObjectSize': 16777216, u'ok': 1.0, u'bits': 64, u'modules': [], u'openssl': {u'compiled': u'OpenSSL 1.0.2g 1 Mar 2016', u'running': u'OpenSSL 1.0.2g 1 Mar 2016'}, u'javascriptEngine': u'mozjs', u'version': u'4.2.1', u'gitVersion': u'edf6d45851c0b9ee15548f0f847df141764a317e', u'versionArray': [4, 2, 1, 0], u'debug': False, u'$clusterTime': {u'clusterTime': Timestamp(1591220369, 1), u'signature': {u'keyId': 6780243778163703811L, u'hash': Binary('\xca\x9bm\xa0\xc9\xfb(?z\xb4+>\xdcO\xdcU\x9fc\xe0\x00', 0)}}, u'buildEnvironment': {u'cxxflags': u'-Woverloaded-virtual -Wno-maybe-uninitialized -fsized-deallocation -std=c++17', u'cc': u'/opt/mongodbtoolchain/v3/bin/gcc: gcc (GCC) 8.2.0', u'linkflags': u'-pthread -Wl,-z,now -rdynamic -Wl,--fatal-warnings -fstack-protector-strong -fuse-ld=gold -Wl,--build-id -Wl,--hash-style=gnu -Wl,-z,noexecstack -Wl,--warn-execstack -Wl,-z,relro', u'distarch': u'x86_64', u'cxx': u'/opt/mongodbtoolchain/v3/bin/g++: g++ (GCC) 8.2.0', u'ccflags': u'-fno-omit-frame-pointer -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -Werror -O2 -Wno-unused-local-typedefs -Wno-unused-function -Wno-deprecated-declarations -Wno-unused-const-variable -Wno-unused-but-set-variable -Wno-missing-braces -fstack-protector-strong -fno-builtin-memcmp', u'target_arch': u'x86_64', u'distmod': u'ubuntu1604', u'target_os': u'linux'}, u'sysInfo': u'deprecated', u'operationTime': Timestamp(1591220369, 1), u'allocator': u'tcmalloc'}
{u'me': u'mongo1.xxx.cz:27017', u'ismaster': True, u'maxWriteBatchSize': 100000, u'ok': 1.0, u'setName': u'rs0', u'readOnly': False, u'maxWireVersion': 8, u'connectionId': 847787, u'primary': u'mongo1.xxx.cz:27017', u'$clusterTime': {u'clusterTime': Timestamp(1591220369, 1), u'signature': {u'keyId': 6780243778163703811L, u'hash': Binary('\xca\x9bm\xa0\xc9\xfb(?z\xb4+>\xdcO\xdcU\x9fc\xe0\x00', 0)}}, u'logicalSessionTimeoutMinutes': 30, u'hosts': [u'mongo1.xxx.cz:27017', u'mongo2.xxx.cz:27017', u'mongo3.xxx.cz:27017'], u'maxMessageSizeBytes': 48000000, u'localTime': datetime.datetime(2020, 6, 3, 21, 39, 36, 858000), u'minWireVersion': 0, u'electionId': ObjectId('7fffffff0000000000000677'), u'maxBsonObjectSize': 16777216, u'lastWrite': {u'lastWriteDate': datetime.datetime(2020, 6, 3, 21, 39, 29), u'majorityWriteDate': datetime.datetime(2020, 6, 3, 21, 39, 29), u'opTime': {u'ts': Timestamp(1591220369, 1), u't': 1655L}, u'majorityOpTime': {u'ts': Timestamp(1591220369, 1), u't': 1655L}}, u'operationTime': Timestamp(1591220369, 1), u'setVersion': 4, u'secondary': False}
buildInfo
{u'storageEngines': [u'biggie', u'devnull', u'ephemeralForTest', u'wiredTiger'], u'maxBsonObjectSize': 16777216, u'ok': 1.0, u'bits': 64, u'modules': [], u'openssl': {u'compiled': u'OpenSSL 1.0.2g 1 Mar 2016', u'running': u'OpenSSL 1.0.2g 1 Mar 2016'}, u'javascriptEngine': u'mozjs', u'version': u'4.2.1', u'gitVersion': u'edf6d45851c0b9ee15548f0f847df141764a317e', u'versionArray': [4, 2, 1, 0], u'debug': False, u'$clusterTime': {u'clusterTime': Timestamp(1591220369, 1), u'signature': {u'keyId': 6780243778163703811L, u'hash': Binary('\xca\x9bm\xa0\xc9\xfb(?z\xb4+>\xdcO\xdcU\x9fc\xe0\x00', 0)}}, u'buildEnvironment': {u'cxxflags': u'-Woverloaded-virtual -Wno-maybe-uninitialized -fsized-deallocation -std=c++17', u'cc': u'/opt/mongodbtoolchain/v3/bin/gcc: gcc (GCC) 8.2.0', u'linkflags': u'-pthread -Wl,-z,now -rdynamic -Wl,--fatal-warnings -fstack-protector-strong -fuse-ld=gold -Wl,--build-id -Wl,--hash-style=gnu -Wl,-z,noexecstack -Wl,--warn-execstack -Wl,-z,relro', u'distarch': u'x86_64', u'cxx': u'/opt/mongodbtoolchain/v3/bin/g++: g++ (GCC) 8.2.0', u'ccflags': u'-fno-omit-frame-pointer -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -Werror -O2 -Wno-unused-local-typedefs -Wno-unused-function -Wno-deprecated-declarations -Wno-unused-const-variable -Wno-unused-but-set-variable -Wno-missing-braces -fstack-protector-strong -fno-builtin-memcmp', u'target_arch': u'x86_64', u'distmod': u'ubuntu1604', u'target_os': u'linux'}, u'sysInfo': u'deprecated', u'operationTime': Timestamp(1591220369, 1), u'allocator': u'tcmalloc'}
mongo test end

The MongoDB log contains entries for long-running commands:

COMMAND [conn1962634] command admin.$cmd command: isMaster { ismaster: 1, client: { driver: { name: "PyMongo", version: "3.10.1" }, os: { type: "Linux", name: "Linux", architecture: "x86_64", version: "4.15.0-58-generic" }, platform: "CPython 3.6.10.final.0" }, $db: "admin" } numYields:0 reslen:747 locks:{} protocol:op_query 10360ms

And also these warnings:

2020-06-04T00:28:44.592+0200 W COMMAND [conn1966041] Unable to gather storage statistics for a slow operation due to lock aquire timeout
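
An isMaster round trip of 10360ms is far longer than the heartbeat and handshake timeouts drivers typically allow during server selection, which would explain the exporter timing out even though the host stays reachable. A hedged way to check whether the server itself is stalling, reusing the redacted host names from the output above (add -u/-p if auth is enabled):

# Watch basic server load; sustained queues or page faults line up with slow handshakes:
mongostat --host mongo1.xxx.cz:27017 5

# List operations that have been running for more than 5 seconds:
mongo --host mongo1.xxx.cz --eval 'db.currentOp({ secs_running: { $gt: 5 } })'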

I'm trying to insert several JSON files into MongoDB collections using a shell script like the following:

#!/bin/bash

# Import 50,000 JSON-array files, one file per target collection.
NUM=50000
for ((i=0;i<NUM;i++))
do
    mongoimport --host localhost --port 27018 -u 'admin' -p 'password' --authenticationDatabase 'admin' -d random_test -c tri_${i} /home/test/json_files/json_${i}.csv --jsonArray
done

After several successful imports, these errors were shown in the terminal:

Failed: connection(localhost:27017[-3]), incomplete read of message header: EOF
error connecting to host: could not connect to server: 
server selection error: server selection timeout, 
current topology: { Type: Single, Servers: 
[{ Addr: localhost:27017, Type: Unknown, 
State: Connected, Average RTT: 0, Last error: connection() : 
dial tcp [::1]:27017: connect: connection refused }, ] }

And below are the error messages from mongo.log, which say too many open files. Can I somehow limit the thread number? Or what should I do to fix it? Thanks a lot!

2020-07-21T11:13:33.613+0200 E  STORAGE  [conn971] WiredTiger error (24) [1595322813:613873][53971:0x7f7c8d228700], WT_SESSION.create: __posix_directory_sync, 151: /home/mongodb/bin/data/db/index-969--7295385362343345274.wt: directory-sync: Too many open files Raw: [1595322813:613873][53971:0x7f7c8d228700], WT_SESSION.create: __posix_directory_sync, 151: /home/mongodb/bin/data/db/index-969--7295385362343345274.wt: directory-sync: Too many open files
2020-07-21T11:13:33.613+0200 E  STORAGE  [conn971] WiredTiger error (-31804) [1595322813:613892][53971:0x7f7c8d228700], WT_SESSION.create: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1595322813:613892][53971:0x7f7c8d228700], WT_SESSION.create: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic
2020-07-21T11:13:33.613+0200 F  -        [conn971] Fatal Assertion 50853 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 414
2020-07-21T11:13:33.613+0200 F  -        [conn971]

***aborting after fassert() failure
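
This looks less like a thread problem than a file-descriptor limit: each of the 50,000 target collections adds at least one data file and one index file that WiredTiger can hold open, so mongod eventually exhausts its open-files limit. The usual fix is to raise that limit for the mongod process (MongoDB suggests 64000) rather than to cap threads; a sketch, with the dbpath taken from the log above:

# See the limit the running mongod actually has:
grep 'open files' /proc/$(pgrep -x mongod)/limits

# When starting mongod by hand, raise the limit in the same shell first:
ulimit -n 64000
mongod --dbpath /home/mongodb/bin/data/db

# When mongod runs under systemd, set LimitNOFILE=64000 in the service unit
# (assuming it is named mongod.service) and restart the service.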

Posted By: Anonymous

I run MongoDB in a docker container like this

docker run --name mongo -d -p 27017:27107 mongo

Checking with docker ps shows

77f1a11295c3 mongo "docker-entrypoint.s…" 20 minutes ago Up 20 minutes 27017/tcp, 0.0.0.0:27017->27107/tcp mongo

so it’s running with the port mapped correctly.

When I try to set up a connection via IntelliJ

[IntelliJ connection settings screenshot]

it fails with

com.mongodb.MongoTimeoutException: Timed out after 10000 ms while waiting to connect. Client view of cluster state is {type=UNKNOWN, servers=[{address=localhost:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadException: Exception receiving message}, caused by {java.net.SocketException: Connection reset}}].

When I try to connect from a golang web server

clientOptions := options.Client().ApplyURI("mongodb://localhost:27017")
client, err := mongo.Connect(context.TODO(), clientOptions)

I get this error and it shuts down the web server:

server selection error: server selection timeout, current topology: { Type: Unknown, Servers: [{ Addr: localhost:27017, Type: Unknown, Average RTT: 0, Last error: connection() error occured during connection handshake: connection(localhost:27017[-64]) incomplete read of message header: read tcp 127.0.0.1:40700->127.0.0.1:27017: read: connection reset by peer }, ] }

Is this a bug in the MongoDB docker image, or is there something else I need to do?

Solution

I think you made a typo while creating the container: 27017 vs. 27107.

docker run --name mongo -d -p 27017:27017 mongo

This is clearly visible in the docker ps output above: 0.0.0.0:27017->27107/tcp means host port 27017 is forwarded to container port 27107, where nothing is listening, since mongod listens on 27017 inside the container.

Can you try that and see if it solves the problem?
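
If it helps, a quick way to verify after recreating the container (a sketch, reusing the container name from above):

docker rm -f mongo
docker run --name mongo -d -p 27017:27017 mongo
docker port mongo   # should now print: 27017/tcp -> 0.0.0.0:27017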

Answered By: Anonymous

Disclaimer: This content is shared under the Creative Commons license CC BY-SA 3.0 and is generated from the StackExchange website network.
