Socket error 146 - Fixing errors and finding optimal solutions to problems

Life has many great mysteries. For the MySQL DBA, one of life’s greatest mysteries is your question. It’s been this way going on 12 years. In fact, I found a bug report that was resolved when the disappearance of mysql.sock was caused by trying to start mysqld with mysqld already running (a.k.a. Shooting mysqld in the Foot).

I have personally addressed workarounds to this situation many times on StackExchange:

Jul 02, 2013 : MySQL crash. Unknown cause. Signal 11 (ServerFault)
Apr 22, 2013 : /usr/libexec/mysqld: Normal shutdown, but my team doesn’t do that?
Mar 06, 2013 : How to properly kill MySQL?
Feb 28, 2013 : mysql restart won’t kill child processes on CentOS
Dec 14, 2012 : Percona-server time out on /etc/init.d/mysql start (ServerFault)
May 08, 2012 : How to properly stop MySQL server on Mac OS X?

The mystery has to do with mysql.sock (the MySQL Socket file) disappearing. Once this happens, you cannot connect to mysqld using root@localhost. In fact, no user defined with host='localhost' in the mysql.user table can connect. TCP/IP connections can still be used. The workaround is to connect to mysqld using TCP/IP.

Try creating the user root@'127.0.0.1'. Then connect from the mysql client like this:

mysql -uroot -p -h127.0.0.1 --protocol=tcp

The only way to bring back mysql.sock is to restart mysqld WHEN mysqld IS NOT RUNNING.

You should make sure you explicitly define the mysql.sock file in /etc/my.cnf

[mysqld]
socket=/var/run/mysqld/mysql.sock

Of course, go create the folder

mkdir /var/run/mysqld
chmod 777 /var/run/mysqld

Give it a Try !!!

UPDATE 2013-12-06 10:17 EST

Based on your comments…

I included the log, please let me know if you know how to increase the detail

You are using MySQL 5.0.27, which means you are using a very old mysqld (with a truck load of bugs) and an old InnoDB Storage Engine. There may not be additional detail available. I also noted you are using a Source Distribution rather than RPM installed binaries.

bring back mysql.sock is to restart mysqld. This does not happen for me. I never have to do anything. The problem lasts a second and goes away.

If the problem remains for only a second or two, I can see what is happening. When you run service mysql start, you are actually running mysqld_safe. At the bottom of mysqld_safe, there is an infinite loop. In the infinite loop, it calls mysqld and waits for a return value. Certain return values from mysqld will make mysqld_safe terminate (a normal shutdown, or a condition where mysqld can’t start again). Other return values will cause mysqld_safe to loop and try mysqld again.

SUGGESTION

You should upgrade to MySQL 5.6 to get the latest version of mysqld and the InnoDB Storage Engine. If the problem still persists, at least there will be less baggage due to better quality MySQL binaries. That way, you can rule out mysqld and look for other external factors. Also, you should download RPM binaries rather than compiling from source, because the RPM versions usually have more optimizations in place.

UPDATE 2013-12-06 10:39 EST

If you look back at the two links at the top of my answer

LINK#1 to Bug Report
LINK#2 to Bug Report

you will learn that the first link references MySQL 4.0.20, and the second link was from a bug report thread that said the version in question was MySQL 3.22.32 (the final release of MySQL 3 was 3.23.58)

Issues with mysql.sock have been around for 13 years now, due to shooting yourself in the foot (starting mysqld even though it is already running), internal factors (a very old or clunky version of MySQL), or some external condition (lack of RAM or other OS resources).

Then, it looks like you found an internal factor. To verify that mysqld is correcting itself, run SHOW GLOBAL STATUS LIKE 'Uptime'; (Uptime is a status variable, so it is not listed by SHOW GLOBAL VARIABLES). If the number continues increasing and never resets, then you can rightly blame the clunky MySQL binary for not being more elaborate. MySQL AB was probably solving some internal issue and may have left some debug statements or trace points in the binary.

UPDATE 2013-12-06 11:24 EST

Just looked inside mysqld_safe. There is code to print mysqld ended and mysqld restarted. If mysqld did indeed terminate, mysqld_safe would have logged it. If mysqld was restarted by mysqld_safe, the fresh/new instance of mysqld would print out the log information including version number. Since the version number was printed only once, it is clear that mysqld was never ended and never restarted. – George Bailey 13 mins ago

Just FYI, I found out that 146 is ECONNREFUSED, I edited it in to my question. – George Bailey 11 mins ago

Then, it looks like you found an internals issue. The source-compiled MySQL 5.0.27 mysqld may just be printing debug messages after an internal recovery. To verify mysqld is OK in terms of being up, just run SHOW GLOBAL STATUS LIKE 'Uptime'; (Uptime is a status variable, so it is not listed by SHOW GLOBAL VARIABLES). If the number does not reset and just keeps increasing, then mysqld is correcting itself from within. Given the old build you are using and the way it was compiled, I would not expect mysqld to be more forthcoming on its own.

You should see if mysqld was started with debug enabled. If not, try setting debug in my.cnf and see if it can get more verbose about its internals. I hope this reveals what the problem may be.


The numbers in parentheses are almost certainly system error numbers, normally reported via errno, the definitions for which are found via #include <errno.h> though on Solaris the numbers are usually in /usr/include/sys/errno.h (but can be in other places, especially on Linux and Mac OS X). You could write a simple program to see the 3 errors.

#include <stdio.h>
#include <string.h>

int main(void)
{
    puts(strerror(2));
    puts(strerror(95));
    puts(strerror(146));
    return 0;
}

Conjecture:
2 is probably ENOENT, no such file or directory; 95 may be ENOTSOCK (not a socket); 146 might be ENOTSUPP (operation not supported).

George Bailey confirms:

On my system, the answer was in /usr/include/sys/errno.h:

2=ENOENT

95=ENOTSOCK

146=ECONNREFUSED

Note that error numbers up to the mid-twenties tend to be consistent across systems as the error codes existed in 7th Edition Unix. Higher numbers diverge. For example, on Mac OS X 10.9:

2 (ENOENT): No such file or directory
95 (EMULTIHOP): Reserved
errno: no message for errno = 146
ENOTSOCK (38): Socket operation on non-socket
ECONNREFUSED (61): Connection refused

On SuSE (SLES 10 SP2 — antique, but these numbers don’t change much):

2 (ENOENT): No such file or directory
95 (EOPNOTSUPP): Operation not supported on transport endpoint
errno: no message for errno = 146
ENOTSOCK (88): Socket operation on non-socket
ECONNREFUSED (111): Connection refused

These answers were obtained via a program errno that reports on error numbers and names. It has to be compiled for each different system.
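The source of that errno program is not shown here. A minimal sketch of the idea, assuming hypothetical helpers named describe_errno and print_known_errnos and covering only the three names discussed above, might look like the following. Because it uses the symbolic constants from <errno.h>, it has to be recompiled on each system to pick up that system's numbers.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats "NAME (number): message" into buf.
   Returns the snprintf() result. */
int describe_errno(const char *name, int number, char *buf, size_t size)
{
    assert(name != NULL && buf != NULL);
    return snprintf(buf, size, "%s (%d): %s", name, number, strerror(number));
}

/* Prints the three names discussed above with the numbers and messages
   that this particular system assigns to them. */
void print_known_errnos(void)
{
    char line[256];

    describe_errno("ENOENT", ENOENT, line, sizeof line);
    puts(line);
    describe_errno("ENOTSOCK", ENOTSOCK, line, sizeof line);
    puts(line);
    describe_errno("ECONNREFUSED", ECONNREFUSED, line, sizeof line);
    puts(line);
}
```

Calling print_known_errnos() from main() prints one line per name. The numbers and message wording differ between systems, which is the point of the comparison above.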

Note that there is a consistent MySQL-provided component to the messages:

Can't connect to local MySQL server through socket '/dev/null' (95)

roughly as if the format string for the printf() statement was:

"Can't connect to local MySQL server through socket '%s' (%d)\n"

The name of the ‘socket’ file is being provided — very helpful — and (educated guess) the system error number, collected at some point from errno. However, errno is volatile — almost any library function may set it to a non-zero value — so you need to preserve a specific value (copy it) before doing much in the way of error reporting work, such as reading message files to get the correct translation of the format string.
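The advice about preserving errno can be sketched as follows. This is not MySQL's client code. It is an illustration under stated assumptions: try_unix_connect and report_connect_error are hypothetical names, and the message text only imitates the format discussed above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: tries to connect to a Unix-domain socket path.
   Returns 0 on success, otherwise the errno value captured immediately
   after the failing call. */
int try_unix_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd, saved;

    assert(path != NULL);
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0) {
        close(fd);
        return 0;
    }
    saved = errno;  /* copy first: close() or stdio calls may overwrite errno */
    close(fd);
    return saved;
}

/* Hypothetical reporter: uses the saved copy, never errno itself. */
void report_connect_error(const char *path, int saved)
{
    fprintf(stderr, "Can't connect to local server through socket '%s' (%d)\n",
            path, saved);
}
```

A caller would pass the value returned by try_unix_connect() straight to report_connect_error(), so any work done in between (loading message catalogs, logging) cannot change the number that ends up in the parentheses.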


0 = Success
1 = Operation not permitted
2 = No such file or directory
3 = No such process
4 = Interrupted system call
5 = Input/output error
6 = No such device or address
7 = Argument list too long
8 = Exec format error
9 = Bad file descriptor
10 = No child processes
11 = Resource temporarily unavailable
12 = Cannot allocate memory
13 = Permission denied
14 = Bad address
15 = Block device required
16 = Device or resource busy
17 = File exists
18 = Invalid cross-device link
19 = No such device
20 = Not a directory
21 = Is a directory
22 = Invalid argument
23 = Too many open files in system
24 = Too many open files
25 = Inappropriate ioctl for device
26 = Text file busy
27 = File too large
28 = No space left on device
29 = Illegal seek
30 = Read-only file system
31 = Too many links
32 = Broken pipe
33 = Numerical argument out of domain
34 = Numerical result out of range
35 = Resource deadlock avoided
36 = File name too long
37 = No locks available
38 = Function not implemented
39 = Directory not empty
40 = Too many levels of symbolic links
41 = Unknown error 41
42 = No message of desired type
43 = Identifier removed
44 = Channel number out of range
45 = Level 2 not synchronized
46 = Level 3 halted
47 = Level 3 reset
48 = Link number out of range
49 = Protocol driver not attached
50 = No CSI structure available
51 = Level 2 halted
52 = Invalid exchange
53 = Invalid request descriptor
54 = Exchange full
55 = No anode
56 = Invalid request code
57 = Invalid slot
58 = Unknown error 58
59 = Bad font file format
60 = Device not a stream
61 = No data available
62 = Timer expired
63 = Out of streams resources
64 = Machine is not on the network
65 = Package not installed
66 = Object is remote
67 = Link has been severed
68 = Advertise error
69 = Srmount error
70 = Communication error on send
71 = Protocol error
72 = Multihop attempted
73 = RFS specific error
74 = Bad message
75 = Value too large for defined data type
76 = Name not unique on network
77 = File descriptor in bad state
78 = Remote address changed
79 = Can not access a needed shared library
80 = Accessing a corrupted shared library
81 = .lib section in a.out corrupted
82 = Attempting to link in too many shared libraries
83 = Cannot exec a shared library directly
84 = Invalid or incomplete multibyte or wide character
85 = Interrupted system call should be restarted
86 = Streams pipe error
87 = Too many users
88 = Socket operation on non-socket
89 = Destination address required
90 = Message too long
91 = Protocol wrong type for socket
92 = Protocol not available
93 = Protocol not supported
94 = Socket type not supported
95 = Operation not supported
96 = Protocol family not supported
97 = Address family not supported by protocol
98 = Address already in use
99 = Cannot assign requested address
100 = Network is down
101 = Network is unreachable
102 = Network dropped connection on reset
103 = Software caused connection abort
104 = Connection reset by peer
105 = No buffer space available
106 = Transport endpoint is already connected
107 = Transport endpoint is not connected
108 = Cannot send after transport endpoint shutdown
109 = Too many references: cannot splice
110 = Connection timed out
111 = Connection refused
112 = Host is down
113 = No route to host
114 = Operation already in progress
115 = Operation now in progress
116 = Stale NFS file handle
117 = Structure needs cleaning
118 = Not a XENIX named type file
119 = No XENIX semaphores available
120 = Is a named type file
121 = Remote I/O error
122 = Disk quota exceeded
123 = No medium found
124 = Wrong medium type


Contents

How Socket Error Codes Depend on Runtime and Operating System
Digging into the problem
SocketErrorCode
NativeErrorCode
ErrorCode
Writing cross-platform socket error handling
Overview of the native error codes
connect() — Unix, Linux System Call
SYNOPSIS
DESCRIPTION
RETURN VALUE
ERRORS
CONFORMING TO
socket() — Unix, Linux System Call
SYNOPSIS
DESCRIPTION

How Socket Error Codes Depend on Runtime and Operating System

Rider consists of several processes that send messages to each other via sockets. To ensure the reliability of the whole application, it’s important to properly handle all the socket errors. In our codebase, we had the following code which was adopted from Mono Debugger Libs and helps us communicate with debugger processes:

In the case of a failed connection because of a “ConnectionRefused” error, we are retrying the connection attempt. It works fine with .NET Framework and Mono. However, once we migrated to .NET Core, this method no longer correctly detects the “connection refused” situation on Linux and macOS. If we open the SocketException documentation, we will learn that this class has three different properties with error codes:

SocketError SocketErrorCode : Gets the error code that is associated with this exception.
int ErrorCode : Gets the error code that is associated with this exception.
int NativeErrorCode : Gets the Win32 error code associated with this exception.

What’s the difference between these properties? Should we expect different values on different runtimes or different operating systems? Which one should we use in production? Why do we have problems with ShouldRetryConnection on .NET Core? Let’s figure it all out!

Digging into the problem

If we run it on Windows, we will get the same value on .NET Framework, Mono, and .NET Core:

SocketErrorCode	ErrorCode	NativeErrorCode
.NET Framework	10061	10061	10061
Mono	10061	10061	10061
.NET Core	10061	10061	10061

10061 corresponds to the code of the connection refused socket error code in Windows (also known as WSAECONNREFUSED ). Now let’s run the same program on Linux:

SocketErrorCode	ErrorCode	NativeErrorCode
Mono	10061	10061	10061
.NET Core	10061	111	111

As you can see, Mono returns Windows-compatible error codes. The situation with .NET Core is different: it returns a Windows-compatible value for SocketErrorCode (10061) and a Linux-like value for ErrorCode and NativeErrorCode (111). Finally, let’s check macOS:

SocketErrorCode	ErrorCode	NativeErrorCode
Mono	10061	10061	10061
.NET Core	10061	61	61

Here, Mono is completely Windows-compatible again, but .NET Core returns 61 for ErrorCode and NativeErrorCode . In the IBM Knowledge Center, we can find a few more values for the connection refused error code from the Unix world (also known as ECONNREFUSED ):

AIX: 79
HP-UX: 239
Solaris: 146
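These differing numbers are the reason portable code should not hard-code 111, 61, or 146. As an illustration outside .NET, a C program can compare against the symbolic constant ECONNREFUSED from <errno.h>, which the compiler resolves to the right number on each system. The sketch below is only an analogy to the retry check described in this article: the function name, the attempt counter, and the retry limit are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical predicate: retry only when the saved errno value means
   "connection refused" on this platform and attempts remain.
   ECONNREFUSED is 111 on Linux, 61 on macOS and 146 on Solaris, so the
   symbolic name is used instead of any literal number. */
bool should_retry_connection(int err, int attempt, int max_attempts)
{
    assert(max_attempts > 0);
    return err == ECONNREFUSED && attempt < max_attempts;
}
```

For example, should_retry_connection(saved_errno, 1, 3) is true only when saved_errno holds this system's value of ECONNREFUSED.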

For a better understanding of what’s going on, let’s check out the source code of all the properties.

SocketErrorCode

These values correspond to the Windows Sockets Error Codes.

NativeErrorCode

In .NET Core, the native code is calculated in the constructor (see SocketException.cs#L20):

The Windows implementation of GetNativeErrorForSocketError is trivial (see SocketException.Windows.cs):

The Unix implementation is more complicated (see SocketException.Unix.cs):

TryGetNativeErrorForSocketError should convert SocketError to the native Unix error code. Unfortunately, there exists no unequivocal mapping between Windows and Unix error codes. As such, the .NET team decided to create a Dictionary that maps error codes in the best possible way (see SocketErrorPal.Unix.cs):

Once we have an instance of Interop.Error , we call interopErr.Info().RawErrno . The implementation of RawErrno can be found in Interop.Errors.cs:

Here we are jumping to the native function SystemNative_ConvertErrorPalToPlatform that maps Error to the native integer code that is defined in errno.h. You can get all the values using the errno util. Here is a typical output on Linux:

Note that errno may not be available by default in your Linux distro. For example, on Debian, you should call sudo apt-get install moreutils to get this utility. Here is a typical output on macOS:

Hooray! We’ve finished our fascinating journey into the internals of socket error codes. Now you know where .NET is getting the native error code for each SocketException from!

ErrorCode

Writing cross-platform socket error handling

There was a lot of work involved in tracking down the error code to check against, but in the end, our code is much more readable now. Adding to that, this method is now also completely cross-platform, and works correctly on any runtime.

Overview of the native error codes

We executed this program on Windows, Linux, and macOS. Here are the aggregated results:


connect() — Unix, Linux System Call

connect — initiate a connection on a socket

SYNOPSIS

DESCRIPTION

The connect() system call connects the socket referred to by the file descriptor sockfd to the address specified by serv_addr. The addrlen argument specifies the size of serv_addr. The format of the address in serv_addr is determined by the address space of the socket sockfd; see socket(2) for further details.

If the socket sockfd is of type SOCK_DGRAM then serv_addr is the address to which datagrams are sent by default, and the only address from which datagrams are received. If the socket is of type SOCK_STREAM or SOCK_SEQPACKET, this call attempts to make a connection to the socket that is bound to the address specified by serv_addr.

Generally, connection-based protocol sockets may successfully connect() only once; connectionless protocol sockets may use connect() multiple times to change their association. Connectionless sockets may dissolve the association by connecting to an address with the sa_family member of sockaddr set to AF_UNSPEC.

RETURN VALUE

If the connection or binding succeeds, zero is returned. On error, -1 is returned, and errno is set appropriately.

ERRORS

The following are general socket errors only. There may be other domain-specific error codes.

Error Code	Description
EACCES	For Unix domain sockets, which are identified by pathname: Write permission is denied on the socket file, or search permission is denied for one of the directories in the path prefix. (See also path_resolution(2).)
EACCES, EPERM	The user tried to connect to a broadcast address without having the socket broadcast flag enabled or the connection request failed because of a local firewall rule.
EADDRINUSE	Local address is already in use.
EAFNOSUPPORT	The passed address didn't have the correct address family in its sa_family field.
EADDRNOTAVAIL	Non-existent interface was requested or the requested address was not local.
EALREADY	The socket is non-blocking and a previous connection attempt has not yet been completed.
EBADF	The file descriptor is not a valid index in the descriptor table.
ECONNREFUSED	No one listening on the remote address.
EFAULT	The socket structure address is outside the user's address space.
EINPROGRESS	The socket is non-blocking and the connection cannot be completed immediately. It is possible to select(2) or poll(2) for completion by selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed here, explaining the reason for the failure).
EINTR	The system call was interrupted by a signal that was caught.
EISCONN	The socket is already connected.
ENETUNREACH	Network is unreachable.
ENOTSOCK	The file descriptor is not associated with a socket.
ETIMEDOUT	Timeout while attempting connection. The server may be too busy to accept new connections. Note that for IP sockets the timeout may be very long when syncookies are enabled on the server.
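The EINPROGRESS entry describes a sequence of calls that is easier to follow in code. The following is a minimal sketch of that sequence, not a production implementation: connect_with_timeout and connect_loopback are hypothetical names, select(2) is not retried on EINTR, and the descriptor is assumed to be below FD_SETSIZE.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking connect with a timeout in seconds.
   Returns a connected (still non-blocking) descriptor, or -1 with errno set. */
int connect_with_timeout(const struct sockaddr *addr, socklen_t len, int timeout_sec)
{
    int fd, flags, rc, err = 0;
    socklen_t errlen = sizeof err;
    fd_set wfds;
    struct timeval tv;

    assert(addr != NULL && timeout_sec >= 0);
    fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        err = errno;
        goto fail;
    }

    if (connect(fd, addr, len) == 0)
        return fd;                      /* completed immediately */
    if (errno != EINPROGRESS) {
        err = errno;
        goto fail;
    }

    /* Wait until the socket becomes writable or the timeout expires. */
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    rc = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (rc == 0) { err = ETIMEDOUT; goto fail; }
    if (rc < 0)  { err = errno;     goto fail; }

    /* Writability alone does not mean success: SO_ERROR holds the outcome. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0) {
        err = errno;
        goto fail;
    }
    if (err != 0)
        goto fail;
    return fd;

fail:
    close(fd);
    errno = err;
    return -1;
}

/* Usage sketch: connect to 127.0.0.1 on the given port. */
int connect_loopback(unsigned short port, int timeout_sec)
{
    struct sockaddr_in a;

    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_port = htons(port);
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return connect_with_timeout((const struct sockaddr *)&a, sizeof a, timeout_sec);
}
```

If nothing is listening on the port, the value read from SO_ERROR would normally be ECONNREFUSED, which is the error this page is about.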

CONFORMING TO

SVr4, 4.4BSD (the connect() function first appeared in 4.2BSD).

The third argument of connect() is in reality an int (and this is what 4.x BSD and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t, also used by glibc. See also accept(2).

Unconnecting a socket by calling connect() with an AF_UNSPEC address is not yet implemented.


socket() — Unix, Linux System Call

SYNOPSIS

DESCRIPTION

The domain parameter specifies a communication domain; this selects the protocol family which will be used for communication. These families are defined in <sys/socket.h>. The currently understood formats include:

Name	Purpose	Man page
Local communication
IPv4 Internet protocols
IPv6 Internet protocols
IPX — Novell protocols
Kernel user interface device
ITU-T X.25 / ISO-8208 protocol
Amateur radio AX.25 protocol
Access to raw ATM PVCs
Appletalk
Low level packet interface

The socket has the indicated type, which specifies the communication semantics. Currently defined types are:

Tag	Description
SOCK_STREAM
Provides sequenced, reliable, two-way, connection-based byte streams. An out-of-band data transmission mechanism may be supported.
SOCK_DGRAM
Supports datagrams (connectionless, unreliable messages of a fixed maximum length).
SOCK_SEQPACKET
Provides a sequenced, reliable, two-way connection-based data transmission path for datagrams of fixed maximum length; a consumer is required to read an entire packet with each read system call.
SOCK_RAW
Provides raw network protocol access.
SOCK_RDM
Provides a reliable datagram layer that does not guarantee ordering.
SOCK_PACKET
Obsolete and should not be used in new programs; see packet(7).

Some socket types may not be implemented by all protocol families; for example, SOCK_SEQPACKET is not implemented for AF_INET.

The protocol specifies a particular protocol to be used with the socket. Normally only a single protocol exists to support a particular socket type within a given protocol family, in which case protocol can be specified as 0. However, it is possible that many protocols may exist, in which case a particular protocol must be specified in this manner. The protocol number to use is specific to the communication domain in which communication is to take place; see protocols(5). See getprotoent(3) on how to map protocol name strings to protocol numbers.

Sockets of type SOCK_STREAM are full-duplex byte streams, similar to pipes. They do not preserve record boundaries. A stream socket must be in a connected state before any data may be sent or received on it. A connection to another socket is created with a connect(2) call. Once connected, data may be transferred using read(2) and write(2) calls or some variant of the send(2) and recv(2) calls. When a session has been completed a close(2) may be performed. Out-of-band data may also be transmitted as described in send(2) and received as described in recv(2).

The communications protocols which implement a SOCK_STREAM ensure that data is not lost or duplicated. If a piece of data for which the peer protocol has buffer space cannot be successfully transmitted within a reasonable length of time, then the connection is considered to be dead. When SO_KEEPALIVE is enabled on the socket the protocol checks in a protocol-specific manner if the other end is still alive. A SIGPIPE signal is raised if a process sends or receives on a broken stream; this causes naive processes, which do not handle the signal, to exit. SOCK_SEQPACKET sockets employ the same system calls as SOCK_STREAM sockets. The only difference is that read(2) calls will return only the amount of data requested, and any data remaining in the arriving packet will be discarded. Also all message boundaries in incoming datagrams are preserved.

SOCK_DGRAM and SOCK_RAW sockets allow sending of datagrams to correspondents named in sendto(2) calls. Datagrams are generally received with recvfrom(2), which returns the next datagram along with the address of its sender.
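The difference between byte streams and datagrams can be observed with a Unix-domain socket pair. The sketch below is only an illustration: first_read_size is a hypothetical name, and it uses socketpair(2) so that no server is needed.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: writes "ab" and then "cd" into one end of a
   Unix-domain socket pair of the given type, and returns how many bytes
   the first read() on the other end delivers, or -1 on error. */
ssize_t first_read_size(int type)
{
    int sv[2];
    char buf[16];
    ssize_t n;

    assert(type == SOCK_STREAM || type == SOCK_DGRAM);
    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;

    if (write(sv[0], "ab", 2) != 2 || write(sv[0], "cd", 2) != 2) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    n = read(sv[1], buf, sizeof buf);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With SOCK_DGRAM the first read returns exactly the first 2-byte datagram, because message boundaries are preserved. With SOCK_STREAM the two writes are usually merged and the read returns 4 bytes, although a stream read may legitimately return fewer, so callers should loop until they have what they need.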

SOCK_PACKET is an obsolete socket type to receive raw packets directly from the device driver. Use packet(7) instead.

An fcntl(2) F_SETOWN operation can be used to specify a process or process group to receive a SIGURG signal when the out-of-band data arrives or SIGPIPE signal when a SOCK_STREAM connection breaks unexpectedly. This operation may also be used to set the process or process group that receives the I/O and asynchronous notification of I/O events via SIGIO. Using F_SETOWN is equivalent to an ioctl(2) call with the FIOSETOWN or SIOCSPGRP argument.

When the network signals an error condition to the protocol module (e.g., using an ICMP message for IP) the pending error flag is set for the socket. The next operation on this socket will return the error code of the pending error. For some protocols it is possible to enable a per-socket error queue to retrieve detailed information about the error; see IP_RECVERR in ip(7).

The operation of sockets is controlled by socket level options. These options are defined in <sys/socket.h>. The functions setsockopt(2) and getsockopt(2) are used to set and get options, respectively.
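As a small illustration of setting and reading a socket level option, the sketch below turns on SO_KEEPALIVE (mentioned earlier in this description) and reads it back. The helper name enable_keepalive is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: enables SO_KEEPALIVE on fd and reads the option back.
   Returns 1 if the option is now enabled, 0 if not, -1 on error. */
int enable_keepalive(int fd)
{
    int on = 1;
    int value = 0;
    socklen_t len = sizeof value;

    assert(fd >= 0);
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

The same pattern applies to other SOL_SOCKET options: pass a pointer to the value and its size to setsockopt(2), and a pointer to a length variable to getsockopt(2).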
