NamedPipeClientStream.Connect causes CPU Hang. Is there a workaround?
PROBLEM DEFINITION
Under normal usage the code as written can cause livelock, where the CPU usage goes to 50% or higher per outstanding connection request — I have observed usage rates approaching 100%, depending on how many simultaneous connection attempts were in progress.
The snippet shown here is the relevant portion of the code in System.IO.Pipes (from Reflector).
public void Connect(int timeout)
{
// … other code here
int startTickCount = Environment.TickCount;
int elapsed = 0;
while ((timeout == -1) || ((elapsed = Environment.TickCount — startTickCount) < timeout))
{
if ( !UnsafeNativeMethods.WaitNamedPipe(this.m_normalizedPipePath, timeout — elapsed) )
{
int errorCode = Marshal.GetLastWin32Error();
if (errorCode == 2 ) // pipe does not exist
continue;
else if (errorCode == 0 ) // timeout occurred in WaitNamedPipe
break; // done
__Error.WinIOError(errorCode, string.Empty);
}
//… other code here…
return;
}
throw new TimeoutException();
}
CODE ANALYSIS
The source of the problem is the call to WaitNamedPipe(). The Win32 docuemntation states that it returns when an instance of the pipe becomes available or a timoeut occurs. However, the remarks section have an additional comment: «If no instances of the specified pipe exist, the WaitNamedPipe function returns immediately, regardless of time-out value.»
In other words, if a named pipe instance exists and a client attempts to connect, the method will wait until either the server accepts the connection request or it times out, so that a timeout will occur only if the server takes too long to wake up and accept the connection — this is the expected behavior.
But when the pipe does not exist at all WaitNamedPipe does not wait, it just returns immediately with an error code; perhaps the server has not started yet, or is not at a point where it can create a pipe.
When this occurs, the NamedPipeClientStream.Connect code goes into a tight loop that spins around calls to WaitNamedPipe; it continually polls WaitNamedPipe until the elapsed time exceeds the timeout value and the method throws a timeout exception; or if the timeout is specified as -1, it will spin until the process exits or the thread is aborted. This causes livelock/CPU hang.
The developer who wrote this obviously knew this could occur since the code explicitly handles that case, but it was not documented that it could behave this way.
Bottom line — perhaps I am mistaken, but from my perspective, when the API is used as-intended it produces undesirable results.
SOLUTIONS?
So, as to workarounds and other issues….
I’d prefer a solution where the connecting thread blocks (waits on an event) until either a pipe is available or a timeout occurs.
My current workaround is to call Connect with a short timeout period and then if Connect fails with a Timeout exception, to sleep for a longer period and then retry. This reduces the livelock problem to a shorter time but it does not eliminate it.
Is there a way to determine if a named pipe exists? If there was, then I could check if it existed before calling Connect, and if it did not, I could sleep for a while and check again. Since named pipes are implemented as a file system, perhaps a call to OpenFile? The ideal solution would be to block until the pipe (file) was created, and then wake the thread.
Another possibility is to call WaitNamedPipe with a timeout of 0 and if the return value is 0 use this to indicate that the pipe had not been created.
This might only work when the named pipe is on the same machine as the client, but since that’s my use case it’s the one I am most interested in.
Does anyone have any advice here?
As a suggestion to MSFT, I’d also prefer some other implementation options,
First, use an algorithm that does not cause livelock.
Second, if the pipe does not exist then return immediately with a suitable error code, or an exception, that makes it obvious what the problem is. This is a different case then the server existing but not responding in a timely manner.
Third, an option for returning a value of false when it times out rather than throw an exception, and return true if it succeeds. Timeouts here are not necessarily errors and should not throw exceptions.
Thanks
Below is a partial list of more common Linux and Windows operating system error codes.
The perror tool can be used to find the error message which is associated with a given error code.
Number | Error Code | Description |
---|---|---|
1 | EPERM | Operation not permitted |
2 | ENOENT | No such file or directory |
3 | ESRCH | No such process |
4 | EINTR | Interrupted system call |
5 | EIO | I/O error |
6 | ENXIO | No such device or address |
7 | E2BIG | Argument list too long |
8 | ENOEXEC | Exec format error |
9 | EBADF | Bad file number |
10 | ECHILD | No child processes |
11 | EAGAIN | Try again |
12 | ENOMEM | Out of memory |
13 | EACCES | Permission denied |
14 | EFAULT | Bad address |
15 | ENOTBLK | Block device required |
16 | EBUSY | Device or resource busy |
17 | EEXIST | File exists |
18 | EXDEV | Cross-device link |
19 | ENODEV | No such device |
20 | ENOTDIR | Not a directory |
21 | EISDIR | Is a directory |
22 | EINVAL | Invalid argument |
23 | ENFILE | File table overflow |
24 | EMFILE | Too many open files |
25 | ENOTTY | Not a typewriter |
26 | ETXTBSY | Text file busy |
27 | EFBIG | File too large |
28 | ENOSPC | No space left on device |
29 | ESPIPE | Illegal seek |
30 | EROFS | Read-only file system |
31 | EMLINK | Too many links |
32 | EPIPE | Broken pipe |
33 | EDOM | Math argument out of domain of func |
34 | ERANGE | Math result not representable |
35 | EDEADLK | Resource deadlock would occur |
36 | ENAMETOOLONG | File name too long |
37 | ENOLCK | No record locks available |
38 | ENOSYS | Function not implemented |
39 | ENOTEMPTY | Directory not empty |
40 | ELOOP | Too many symbolic links encountered |
42 | ENOMSG | No message of desired type |
43 | EIDRM | Identifier removed |
44 | ECHRNG | Channel number out of range |
45 | EL2NSYNC | Level 2 not synchronized |
46 | EL3HLT | Level 3 halted |
47 | EL3RST | Level 3 reset |
48 | ELNRNG | Link number out of range |
49 | EUNATCH | Protocol driver not attached |
50 | ENOCSI | No CSI structure available |
51 | EL2HLT | Level 2 halted |
52 | EBADE | Invalid exchange |
53 | EBADR | Invalid request descriptor |
54 | EXFULL | Exchange full |
55 | ENOANO | No anode |
56 | EBADRQC | Invalid request code |
57 | EBADSLT | Invalid slot |
59 | EBFONT | Bad font file format |
60 | ENOSTR | Device not a stream |
61 | ENODATA | No data available |
62 | ETIME | Timer expired |
63 | ENOSR | Out of streams resources |
64 | ENONET | Machine is not on the network |
65 | ENOPKG | Package not installed |
66 | EREMOTE | Object is remote |
67 | ENOLINK | Link has been severed |
68 | EADV | Advertise error |
69 | ESRMNT | Srmount error |
70 | ECOMM | Communication error on send |
71 | EPROTO | Protocol error |
72 | EMULTIHOP | Multihop attempted |
73 | EDOTDOT | RFS specific error |
74 | EBADMSG | Not a data message |
75 | EOVERFLOW | Value too large for defined data type |
76 | ENOTUNIQ | Name not unique on network |
77 | EBADFD | File descriptor in bad state |
78 | EREMCHG | Remote address changed |
79 | ELIBACC | Can not access a needed shared library |
80 | ELIBBAD | Accessing a corrupted shared library |
81 | ELIBSCN | .lib section in a.out corrupted |
82 | ELIBMAX | Attempting to link in too many shared libraries |
83 | ELIBEXEC | Cannot exec a shared library directly |
84 | EILSEQ | Illegal byte sequence |
85 | ERESTART | Interrupted system call should be restarted |
86 | ESTRPIPE | Streams pipe error |
87 | EUSERS | Too many users |
88 | ENOTSOCK | Socket operation on non-socket |
89 | EDESTADDRREQ | Destination address required |
90 | EMSGSIZE | Message too long |
91 | EPROTOTYPE | Protocol wrong type for socket |
92 | ENOPROTOOPT | Protocol not available |
93 | EPROTONOSUPPORT | Protocol not supported |
94 | ESOCKTNOSUPPORT | Socket type not supported |
95 | EOPNOTSUPP | Operation not supported on transport endpoint |
96 | EPFNOSUPPORT | Protocol family not supported |
97 | EAFNOSUPPORT | Address family not supported by protocol |
98 | EADDRINUSE | Address already in use |
99 | EADDRNOTAVAIL | Cannot assign requested address |
100 | ENETDOWN | Network is down |
101 | ENETUNREACH | Network is unreachable |
102 | ENETRESET | Network dropped connection because of reset |
103 | ECONNABORTED | Software caused connection abort |
104 | ECONNRESET | Connection reset by peer |
105 | ENOBUFS | No buffer space available |
106 | EISCONN | Transport endpoint is already connected |
107 | ENOTCONN | Transport endpoint is not connected |
108 | ESHUTDOWN | Cannot send after transport endpoint shutdown |
109 | ETOOMANYREFS | Too many references: cannot splice |
110 | ETIMEDOUT | Connection timed out |
111 | ECONNREFUSED | Connection refused |
112 | EHOSTDOWN | Host is down |
113 | EHOSTUNREACH | No route to host |
114 | EALREADY | Operation already in progress |
115 | EINPROGRESS | Operation now in progress |
116 | ESTALE | Stale NFS file handle |
117 | EUCLEAN | Structure needs cleaning |
118 | ENOTNAM | Not a XENIX named type file |
119 | ENAVAIL | No XENIX semaphores available |
120 | EISNAM | Is a named type file |
121 | EREMOTEIO | Remote I/O error |
122 | EDQUOT | Quota exceeded |
123 | ENOMEDIUM | No medium found |
124 | EMEDIUMTYPE | Wrong medium type |
125 | ECANCELED | Operation Canceled |
126 | ENOKEY | Required key not available |
127 | EKEYEXPIRED | Key has expired |
128 | EKEYREVOKED | Key has been revoked |
129 | EKEYREJECTED | Key was rejected by service |
130 | EOWNERDEAD | Owner died |
131 | ENOTRECOVERABLE | State not recoverable |
Number | Error Code | Description |
---|---|---|
1 | ERROR_INVALID_FUNCTION | Incorrect function. |
2 | ERROR_FILE_NOT_FOUND | The system cannot find the file specified. |
3 | ERROR_PATH_NOT_FOUND | The system cannot find the path specified. |
4 | ERROR_TOO_MANY_OPEN_FILES | The system cannot open the file. |
5 | ERROR_ACCESS_DENIED | Access is denied. |
6 | ERROR_INVALID_HANDLE | The handle is invalid. |
7 | ERROR_ARENA_TRASHED | The storage control blocks were destroyed. |
8 | ERROR_NOT_ENOUGH_MEMORY | Not enough storage is available to process this command. |
9 | ERROR_INVALID_BLOCK | The storage control block address is invalid. |
10 | ERROR_BAD_ENVIRONMENT | The environment is incorrect. |
11 | ERROR_BAD_FORMAT | An attempt was made to load a program with an incorrect format. |
12 | ERROR_INVALID_ACCESS | The access code is invalid. |
13 | ERROR_INVALID_DATA | The data is invalid. |
14 | ERROR_OUTOFMEMORY | Not enough storage is available to complete this operation. |
15 | ERROR_INVALID_DRIVE | The system cannot find the drive specified. |
16 | ERROR_CURRENT_DIRECTORY | The directory cannot be removed. |
17 | ERROR_NOT_SAME_DEVICE | The system cannot move the file to a different disk drive. |
18 | ERROR_NO_MORE_FILES | There are no more files. |
19 | ERROR_WRITE_PROTECT | The media is write protected. |
20 | ERROR_BAD_UNIT | The system cannot find the device specified. |
21 | ERROR_NOT_READY | The device is not ready. |
22 | ERROR_BAD_COMMAND | The device does not recognize the command. |
23 | ERROR_CRC | Data error (cyclic redundancy check). |
24 | ERROR_BAD_LENGTH | The program issued a command but the command length is incorrect. |
25 | ERROR_SEEK | The drive cannot locate a specific area or track on the disk. |
26 | ERROR_NOT_DOS_DISK | The specified disk or diskette cannot be accessed. |
27 | ERROR_SECTOR_NOT_FOUND | The drive cannot find the sector requested. |
28 | ERROR_OUT_OF_PAPER | The printer is out of paper. |
29 | ERROR_WRITE_FAULT | The system cannot write to the specified device. |
30 | ERROR_READ_FAULT | The system cannot read from the specified device. |
31 | ERROR_GEN_FAILURE | A device attached to the system is not functioning. |
32 | ERROR_SHARING_VIOLATION | The process cannot access the file because it is being used by another process. |
33 | ERROR_LOCK_VIOLATION | The process cannot access the file because another process has locked a portion of the file. |
34 | ERROR_WRONG_DISK | The wrong diskette is in the drive. Insert %2 (Volume Serial Number: %3) into drive %1. |
36 | ERROR_SHARING_BUFFER_EXCEEDED | Too many files opened for sharing. |
38 | ERROR_HANDLE_EOF | Reached the end of the file. |
39 | ERROR_HANDLE_DISK_FULL | The disk is full. |
87 | ERROR_INVALID_PARAMETER | The parameter is incorrect. |
112 | ERROR_DISK_FULL | The disk is full. |
123 | ERROR_INVALID_NAME | The file name, directory name, or volume label syntax is incorrect. |
1450 | ERROR_NO_SYSTEM_RESOURCES | Insufficient system resources exist to complete the requested service. |