Life has many great mysteries. For the MySQL DBA, one of life’s greatest mysteries is your question. It has been this way for going on 12 years. In fact, I found a bug report in which the disappearance of mysql.sock turned out to be caused by trying to start mysqld while mysqld was already running (a.k.a. Shooting mysqld in the Foot).
I have personally addressed workarounds to this situation many times on StackExchange:
- Jul 02, 2013: MySQL crash. Unknown cause. Signal 11 (ServerFault)
- Apr 22, 2013: /usr/libexec/mysqld: Normal shutdown, but my team doesn’t do that?
- Mar 06, 2013: How to properly kill MySQL?
- Feb 28, 2013: mysql restart won’t kill child processes on CentOS
- Dec 14, 2012: Percona-server time out on /etc/init.d/mysql start (ServerFault)
- May 08, 2012: How to properly stop MySQL server on Mac OS X?
The mystery has to do with mysql.sock (the MySQL socket file) disappearing. Once this happens, you cannot connect to mysqld as root@localhost. In fact, no user defined with host='localhost' in the mysql.user table can connect. TCP/IP connections can still be used, so the workaround is to connect to mysqld over TCP/IP.
Try creating the user root@'127.0.0.1'. Then connect with the mysql client like this:
mysql -uroot -p -h127.0.0.1 --protocol=tcp
The only way to bring back mysql.sock is to restart mysqld WHEN mysqld IS NOT RUNNING.
You should make sure you explicitly define the mysql.sock file in /etc/my.cnf:
[mysqld]
socket=/var/run/mysqld/mysql.sock
Of course, go create the folder:
mkdir /var/run/mysqld
chmod 777 /var/run/mysqld
Give it a Try !!!
UPDATE 2013-12-06 10:17 EST
Based on your comments…
“I included the log, please let me know if you know how to increase the detail”
You are using MySQL 5.0.27, which means you are using a very old mysqld (with a truckload of bugs) and an old InnoDB Storage Engine. There may not be additional detail available. I also noted you are using a Source Distribution rather than RPM installed binaries.
“‘bring back mysql.sock is to restart mysqld.’ This does not happen for me. I never have to do anything. The problem lasts a second and goes away.”
If the problem lasts for only a second or two, I can see what is happening. When you run service mysql start, you are actually running mysqld_safe. At the bottom of mysqld_safe, there is an infinite loop. Inside that loop, it calls mysqld and waits for a return value. Certain return values from mysqld make mysqld_safe terminate (a normal shutdown, or a condition where mysqld cannot start again). Other return values cause mysqld_safe to loop and try mysqld again.
SUGGESTION
You should upgrade to MySQL 5.6 to get the latest version of mysqld and the InnoDB Storage Engine. If the problem still persists, at least there will be less baggage thanks to better-quality MySQL binaries. That way, you can rule out mysqld and look for other external factors. Also, you should download RPM binaries rather than compiling from source, because the RPM builds usually have more optimizations in place.
UPDATE 2013-12-06 10:39 EST
If you look back at the two links at the top of my answer
- LINK#1 to Bug Report
- LINK#2 to Bug Report
you will learn that the first link references MySQL 4.0.20, and the second link was from a bug report thread that said the version in question was MySQL 3.22.32 (the final release of MySQL 3 was 3.23.58).
Issues with mysql.sock have been around for 13 years now, due to shooting yourself in the foot (starting mysqld even though it is already running), internal factors (a very old or clunky version of MySQL) or some external condition (lack of RAM or other OS resources).
Then, it looks like you found an internal factor. To verify that mysqld is correcting itself, run SHOW GLOBAL STATUS LIKE 'Uptime'; (Uptime is a status variable, so SHOW GLOBAL VARIABLES will not return it). If the number keeps increasing and never resets, then you can rightly blame the clunky MySQL binary for not reporting more detail. MySQL AB was probably solving some internal issue and may have left some debug statements or trace points in the binary.
UPDATE 2013-12-06 11:24 EST
Just looked inside mysqld_safe. There is code to print mysqld ended and mysqld restarted. If mysqld did indeed terminate, mysqld_safe would have logged it. If mysqld was restarted by mysqld_safe, the fresh/new instance of mysqld would print out the log information including version number. Since the version number was printed only once, it is clear that mysqld was never ended and never restarted. – George Bailey 13 mins ago
Just FYI, I found out that 146 is ECONNREFUSED, I edited it in to my question. – George Bailey 11 mins ago
Then, it looks like you found an internals issue. The source-compiled MySQL 5.0.27 mysqld may just be printing debug messages after an internal recovery. To verify mysqld is OK in terms of being up, just run SHOW GLOBAL STATUS LIKE 'Uptime'; (Uptime is a status variable, not a system variable). If the number does not reset and just keeps increasing, then mysqld is correcting itself from within. Given the old build you are using and the way it was compiled, I would not expect mysqld to be more forthcoming on its own.
You should see if mysqld was started with debug enabled. If not, try setting debug in my.cnf and see if it can get more verbose about its internals. I hope this reveals what the problem may be.
The numbers in parentheses are almost certainly system error numbers, normally reported via errno, the definitions for which are found via #include <errno.h>, though on Solaris the numbers are usually in /usr/include/sys/errno.h (but can be in other places, especially on Linux and Mac OS X). You could write a simple program to see the 3 errors:
#include <stdio.h>
#include <string.h>

/* Print the local system's message for each error number from the question. */
int main(void)
{
    puts(strerror(2));
    puts(strerror(95));
    puts(strerror(146));
    return 0;
}
Conjecture:
2 is probably ENOENT, no such file or directory; 95 may be ENOTSOCK (not a socket); 146 might be ENOTSUPP (operation not supported).
George Bailey confirms:
On my system, the answer was in /usr/include/sys/errno.h:
- 2=ENOENT
- 95=ENOTSOCK
- 146=ECONNREFUSED
Note that error numbers up to the mid-twenties tend to be consistent across systems as the error codes existed in 7th Edition Unix. Higher numbers diverge. For example, on Mac OS X 10.9:
- 2 (ENOENT): No such file or directory
- 95 (EMULTIHOP): Reserved
- errno: no message for errno = 146
- ENOTSOCK (38): Socket operation on non-socket
- ECONNREFUSED (61): Connection refused
On SuSE (SLES 10 SP2 — antique, but these numbers don’t change much):
- 2 (ENOENT): No such file or directory
- 95 (EOPNOTSUPP): Operation not supported on transport endpoint
- errno: no message for errno = 146
- ENOTSOCK (88): Socket operation on non-socket
- ECONNREFUSED (111): Connection refused
These answers were obtained via a program errno that reports on error numbers and names. It has to be compiled for each different system.
Note that there is a consistent MySQL-provided component to the messages:
Can't connect to local MySQL server through socket '/dev/null' (95)
roughly as if the format string for the printf() statement was:
"Can't connect to local MySQL server through socket '%s' (%d)\n"
The name of the ‘socket’ file is being provided (very helpful) and, as an educated guess, the system error number, collected at some point from errno. However, errno is volatile: almost any library function may set it to a non-zero value. So you need to preserve a specific value (copy it) before doing much in the way of error reporting work, such as reading message files to get the correct translation of the format string.
Contents
- How Socket Error Codes Depend on Runtime and Operating System
- Digging into the problem
- SocketErrorCode
- NativeErrorCode
- ErrorCode
- Writing cross-platform socket error handling
- Overview of the native error codes
- connect() — Unix, Linux System Call
- SYNOPSIS
- DESCRIPTION
- RETURN VALUE
- ERRORS
- CONFORMING TO
- socket() — Unix, Linux System Call
- SYNOPSIS
- DESCRIPTION
How Socket Error Codes Depend on Runtime and Operating System
Rider consists of several processes that send messages to each other via sockets. To ensure the reliability of the whole application, it’s important to properly handle all the socket errors. In our codebase, we had the following code which was adopted from Mono Debugger Libs and helps us communicate with debugger processes:
In the case of a failed connection because of a “ConnectionRefused” error, we are retrying the connection attempt. It works fine with .NET Framework and Mono. However, once we migrated to .NET Core, this method no longer correctly detects the “connection refused” situation on Linux and macOS. If we open the SocketException documentation, we will learn that this class has three different properties with error codes:
- SocketError SocketErrorCode : Gets the error code that is associated with this exception.
- int ErrorCode : Gets the error code that is associated with this exception.
- int NativeErrorCode : Gets the Win32 error code associated with this exception.
What’s the difference between these properties? Should we expect different values on different runtimes or different operating systems? Which one should we use in production? Why do we have problems with ShouldRetryConnection on .NET Core? Let’s figure it all out!
Digging into the problem
If we run it on Windows, we will get the same value on .NET Framework, Mono, and .NET Core:
Runtime | SocketErrorCode | ErrorCode | NativeErrorCode |
---|---|---|---|
.NET Framework | 10061 | 10061 | 10061 |
Mono | 10061 | 10061 | 10061 |
.NET Core | 10061 | 10061 | 10061 |
10061 corresponds to the code of the connection refused socket error code in Windows (also known as WSAECONNREFUSED ). Now let’s run the same program on Linux:
Runtime | SocketErrorCode | ErrorCode | NativeErrorCode |
---|---|---|---|
Mono | 10061 | 10061 | 10061 |
.NET Core | 10061 | 111 | 111 |
As you can see, Mono returns Windows-compatible error codes. The situation with .NET Core is different: it returns a Windows-compatible value for SocketErrorCode (10061) and a Linux-like value for ErrorCode and NativeErrorCode (111). Finally, let’s check macOS:
Runtime | SocketErrorCode | ErrorCode | NativeErrorCode |
---|---|---|---|
Mono | 10061 | 10061 | 10061 |
.NET Core | 10061 | 61 | 61 |
Here, Mono is completely Windows-compatible again, but .NET Core returns 61 for ErrorCode and NativeErrorCode . In the IBM Knowledge Center, we can find a few more values for the connection refused error code from the Unix world (also known as ECONNREFUSED ):
- AIX: 79
- HP-UX: 239
- Solaris: 146
For a better understanding of what’s going on, let’s check out the source code of all the properties.
SocketErrorCode
These values correspond to the Windows Sockets Error Codes.
NativeErrorCode
In .NET Core, the native code is calculated in the constructor (see SocketException.cs#L20):
The Windows implementation of GetNativeErrorForSocketError is trivial (see SocketException.Windows.cs):
The Unix implementation is more complicated (see SocketException.Unix.cs):
TryGetNativeErrorForSocketError should convert SocketError to the native Unix error code. Unfortunately, there exists no unequivocal mapping between Windows and Unix error codes. As such, the .NET team decided to create a Dictionary that maps error codes in the best possible way (see SocketErrorPal.Unix.cs):
Once we have an instance of Interop.Error , we call interopErr.Info().RawErrno . The implementation of RawErrno can be found in Interop.Errors.cs:
Here we are jumping to the native function SystemNative_ConvertErrorPalToPlatform that maps Error to the native integer code that is defined in errno.h. You can get all the values using the errno util. Here is a typical output on Linux:
Note that errno may not be available by default in your Linux distro. For example, on Debian, you should call sudo apt-get install moreutils to get this utility. Here is a typical output on macOS:
Hooray! We’ve finished our fascinating journey into the internals of socket error codes. Now you know where .NET is getting the native error code for each SocketException from!
ErrorCode
Writing cross-platform socket error handling
There was a lot of work involved in tracking down the error code to check against, but in the end, our code is much more readable now. Adding to that, this method is now also completely cross-platform, and works correctly on any runtime.
Overview of the native error codes
We executed this program on Windows, Linux, and macOS. Here are the aggregated results:
connect() — Unix, Linux System Call
connect — initiate a connection on a socket
SYNOPSIS
#include <sys/types.h>
#include <sys/socket.h>
int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);
DESCRIPTION
The connect() system call connects the socket referred to by the file descriptor sockfd to the address specified by serv_addr. The addrlen argument specifies the size of serv_addr. The format of the address in serv_addr is determined by the address space of the socket sockfd; see socket(2) for further details.
If the socket sockfd is of type SOCK_DGRAM then serv_addr is the address to which datagrams are sent by default, and the only address from which datagrams are received. If the socket is of type SOCK_STREAM or SOCK_SEQPACKET, this call attempts to make a connection to the socket that is bound to the address specified by serv_addr.
Generally, connection-based protocol sockets may successfully connect() only once; connectionless protocol sockets may use connect() multiple times to change their association. Connectionless sockets may dissolve the association by connecting to an address with the sa_family member of sockaddr set to AF_UNSPEC.
RETURN VALUE
If the connection or binding succeeds, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
The following are general socket errors only. There may be other domain-specific error codes.
Error Code | Description |
---|---|
EACCES | For Unix domain sockets, which are identified by pathname: Write permission is denied on the socket file, or search permission is denied for one of the directories in the path prefix. (See also path_resolution(2).) |
EACCES, EPERM | The user tried to connect to a broadcast address without having the socket broadcast flag enabled or the connection request failed because of a local firewall rule. |
EADDRINUSE | Local address is already in use. |
EAFNOSUPPORT | The passed address didn't have the correct address family in its sa_family field. |
EADDRNOTAVAIL | Non-existent interface was requested or the requested address was not local. |
EALREADY | The socket is non-blocking and a previous connection attempt has not yet been completed. |
EBADF | The file descriptor is not a valid index in the descriptor table. |
ECONNREFUSED | No one listening on the remote address. |
EFAULT | The socket structure address is outside the user's address space. |
EINPROGRESS | The socket is non-blocking and the connection cannot be completed immediately. It is possible to select(2) or poll(2) for completion by selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed here, explaining the reason for the failure). |
EINTR | The system call was interrupted by a signal that was caught. |
EISCONN | The socket is already connected. |
ENETUNREACH | Network is unreachable. |
ENOTSOCK | The file descriptor is not associated with a socket. |
ETIMEDOUT | Timeout while attempting connection. The server may be too busy to accept new connections. Note that for IP sockets the timeout may be very long when syncookies are enabled on the server. |
CONFORMING TO
SVr4, 4.4BSD (the connect() function first appeared in 4.2BSD).
The third argument of connect() is in reality an int (and this is what 4.x BSD and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t, also used by glibc. See also accept(2).
Unconnecting a socket by calling connect() with an AF_UNSPEC address is not yet implemented.
socket() — Unix, Linux System Call
SYNOPSIS
#include <sys/types.h>
#include <sys/socket.h>
int socket(int domain, int type, int protocol);
DESCRIPTION
The domain parameter specifies a communication domain; this selects the protocol family which will be used for communication. These families are defined in <sys/socket.h>. The currently understood formats include:
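Name | Purpose | Man page |
---|---|---|
AF_UNIX, AF_LOCAL | Local communication | unix(7) |
AF_INET | IPv4 Internet protocols | ip(7) |
AF_INET6 | IPv6 Internet protocols | ipv6(7) |
AF_IPX | IPX - Novell protocols | |
AF_NETLINK | Kernel user interface device | netlink(7) |
AF_X25 | ITU-T X.25 / ISO-8208 protocol | x25(7) |
AF_AX25 | Amateur radio AX.25 protocol | |
AF_ATMPVC | Access to raw ATM PVCs | |
AF_APPLETALK | Appletalk | ddp(7) |
AF_PACKET | Low level packet interface | packet(7) |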
The socket has the indicated type, which specifies the communication semantics. Currently defined types are:
Tag | Description |
---|---|
SOCK_STREAM | Provides sequenced, reliable, two-way, connection-based byte streams. An out-of-band data transmission mechanism may be supported. |
SOCK_DGRAM | Supports datagrams (connectionless, unreliable messages of a fixed maximum length). |
SOCK_SEQPACKET | Provides a sequenced, reliable, two-way connection-based data transmission path for datagrams of fixed maximum length; a consumer is required to read an entire packet with each read system call. |
SOCK_RAW | Provides raw network protocol access. |
SOCK_RDM | Provides a reliable datagram layer that does not guarantee ordering. |
SOCK_PACKET | Obsolete and should not be used in new programs; see packet(7). |
Some socket types may not be implemented by all protocol families; for example, SOCK_SEQPACKET is not implemented for AF_INET.
The protocol specifies a particular protocol to be used with the socket. Normally only a single protocol exists to support a particular socket type within a given protocol family, in which case protocol can be specified as 0. However, it is possible that many protocols may exist, in which case a particular protocol must be specified in this manner. The protocol number to use is specific to the communication domain in which communication is to take place; see protocols(5). See getprotoent(3) on how to map protocol name strings to protocol numbers.
Sockets of type SOCK_STREAM are full-duplex byte streams, similar to pipes. They do not preserve record boundaries. A stream socket must be in a connected state before any data may be sent or received on it. A connection to another socket is created with a connect(2) call. Once connected, data may be transferred using read(2) and write(2) calls or some variant of the send(2) and recv(2) calls. When a session has been completed a close(2) may be performed. Out-of-band data may also be transmitted as described in send(2) and received as described in recv(2).
The communications protocols which implement a SOCK_STREAM ensure that data is not lost or duplicated. If a piece of data for which the peer protocol has buffer space cannot be successfully transmitted within a reasonable length of time, then the connection is considered to be dead. When SO_KEEPALIVE is enabled on the socket the protocol checks in a protocol-specific manner if the other end is still alive. A SIGPIPE signal is raised if a process sends or receives on a broken stream; this causes naive processes, which do not handle the signal, to exit. SOCK_SEQPACKET sockets employ the same system calls as SOCK_STREAM sockets. The only difference is that read(2) calls will return only the amount of data requested, and any data remaining in the arriving packet will be discarded. Also all message boundaries in incoming datagrams are preserved.
SOCK_DGRAM and SOCK_RAW sockets allow sending of datagrams to correspondents named in sendto(2) calls. Datagrams are generally received with recvfrom(2), which returns the next datagram along with the address of its sender.
SOCK_PACKET is an obsolete socket type to receive raw packets directly from the device driver. Use packet(7) instead.
An fcntl(2) F_SETOWN operation can be used to specify a process or process group to receive a SIGURG signal when the out-of-band data arrives or SIGPIPE signal when a SOCK_STREAM connection breaks unexpectedly. This operation may also be used to set the process or process group that receives the I/O and asynchronous notification of I/O events via SIGIO. Using F_SETOWN is equivalent to an ioctl(2) call with the FIOSETOWN or SIOCSPGRP argument.
When the network signals an error condition to the protocol module (e.g., using an ICMP message for IP) the pending error flag is set for the socket. The next operation on this socket will return the error code of the pending error. For some protocols it is possible to enable a per-socket error queue to retrieve detailed information about the error; see IP_RECVERR in ip(7).
The operation of sockets is controlled by socket level options. These options are defined in <sys/socket.h>. The functions setsockopt(2) and getsockopt(2) are used to set and get options, respectively.