Error reading ssh protocol banner paramiko - Исправление ошибок и поиск оптимальных решений проблем

Recently, I made a code that connect to work station with different usernames (thanks to a private key) based on paramiko.

I never had any issues with it, but today, I have that : SSHException: Error reading SSH protocol banner

This is strange because it happens randomly on any connections. Is there any way to fix it ?

asked Sep 1, 2014 at 15:36

It depends on what you mean by «fix». The underlying cause, as pointed out in the comments, are congestion/lack of resources. In that way, it’s similar to some HTTP codes. That’s the normal cause, it could be that the ssh server is returning the wrong header data.

429 Too Many Requests, tells the client to use rate limiting, or sometimes APIs will return 503 in a similar way, if you exceed your quota. The idea being, to try again later, with a delay.

You can attempt to handle this exception in your code, wait a little while, and try again. You can also edit your transport.py file, to set the banner timeout to something higher. If you have an application where it doesn’t matter how quickly the server responds, you could set this to 60 seconds.

EDIT:
Editing your transport file is no longer needed
as per Greg’s answer. When you call connect, you can pass a banner_timeout (which solves this issue), a timeout (for the underlying TCP), and an auth_timeout (waiting for authentication response). Greg’s answer has a code example with banner_timeout that you can directly lift.

answered Mar 24, 2015 at 4:57

TinBaneTinBane

85711 silver badges19 bronze badges

Adding to TinBane’s answers, suggesting to edit transport.py: you don’t have to do that anymore.

Since Paramiko v. 1.15.0, released in 2015, (this PR, to be precise) you can configure that value when creating Paramiko connection, like this:

client = SSHClient()
client.connect('ssh.example.com', banner_timeout=200)

In the current version of Paramiko as of writing these words, v. 2.7.1, you have 2 more timeouts that you can configure when calling connect method, for these 3 in total (source):

banner_timeout — an optional timeout (in seconds) to wait for the SSH banner to be presented.
timeout — an optional timeout (in seconds) for the TCP connect
auth_timeout — an optional timeout (in seconds) to wait for an authentication response.

answered Dec 23, 2019 at 10:37

Greg DubickiGreg Dubicki

5,4172 gold badges54 silver badges64 bronze badges

When changing the timeout value (as TinBane mentioned) in the transport.py file from 15 to higher the issue resolved partially. that is at line #484:

self.banner_timeout = 200 # It was 15

However, to resolve it permanently I added a static line to transport.py to declare the new higher value at the _check_banner(self): function.

Here is specifically the change:

It was like this:


 def _check_banner(self):
        for i in range(100):
            if i == 0:
                timeout = self.banner_timeout
            else:
                timeout = 2

After the permanent change became like this:


 def _check_banner(self):
        for i in range(100):
            if i == 0:
                timeout = self.banner_timeout
                timeout = 200 # <<<< Here is the explicit declaration 
            else:
                timeout = 2

answered Nov 25, 2019 at 21:03

paramiko seems to raise this error when I pass a non-existent filename to kwargs>key_filename. I’m sure there are other situations where this exception is raised nonsensically.

answered Jun 16, 2021 at 21:53

jberrymanjberryman

16.2k5 gold badges42 silver badges81 bronze badges

I had this issue with 12 parallel (12 threads) connections via single bastion.
As I had to solve it «quick and dirty» I’ve added a sleep time.

for target in targets:
    deployer.deploy_target(target, asynchronous=True)

Changed to:

for target in targets:
    deployer.deploy_target(target, asynchronous=True)
    time.sleep(5)

This works for me.
As well I’ve added a banner_timeout as was suggested above to make it more reliable.

client.connect(bastion_address, banner_timeout=60)

answered Oct 5, 2021 at 9:02

HoreyHorey

212 bronze badges

I’m very new to this so I doubt I’m really qualified to answer anyone’s questions, however I feel I may offer a simple solution to this issue.

After i migrated from 1 machine to another, all my scripts that worked perfectly prior all stopped working with the following error after the move:

Exception: Error reading SSH protocol banner
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 2211, in _check_banner

I tried as many people suggested above of manually amending the transport.py file but all that happened for me was it took 60 seconds to timeout rather than the default 15.

Anyway i noticed that my new machine was running a slightly older version on paramiko so I simply upgraded this and it worked.

pip3 -U paramiko

answered Aug 27, 2021 at 10:34

Well, I was also getting this with one of the juniper devices. The timeout didn’t help at all. When I used pyex with this, it created multiple ssh/netconf sessions with juniper box. Once I changed the «set system services ssh connection-limit 10» From 5, it started working.

answered Apr 14, 2022 at 11:55

In my case, to speed up the connection download rate I have created a multiprocessing Pool of 10 processes, so 10 new ssh paramiko connections were active at the same time and downloading the data at the same time.
When I increased this number to 20, I started getting this error.
So, probably some congestion and too many connections active on one machine.

answered Jun 7, 2022 at 8:43

anicicnanicicn

1836 silver badges15 bronze badges

Источник

@ktbyers, I gave your solution a try but that doesn’t seem to solve my problem. Thanks to pkapp on IRC I was able to debug a bit further what’s going on.

I started by activating the debug logs but paramiko isn’t very chatty about what it does under the hood unfortunately.

import logging
logging.basicConfig(level=logging.DEBUG)

These are the only things paramiko sends me back before throwing the traceback at me.

DEBUG:paramiko.transport:starting thread (client mode): 0xb4c74668
DEBUG:paramiko.transport:Local version/idstring: SSH-2.0-paramiko_1.16.0
ERROR:paramiko.transport:Exception: Error reading SSH protocol banner

I also learned that the %h %p won’t be automatically used by paramiko when passing them as a string to a ProxyCommand. (Even though it does seem to be working on my system, that may be the problem) Also, the nc approach looks like it works better than the OpenSSH -W flag. So my actual ProxyCommand then looked like this :

cmd = "ssh {}@{} nc {} 22".format(host_cfg.get('user'), host_cfg.get('hostname'), destination_ip)
# cmd is now "ssh root@jump_ip nc dest_ip 22" where jump_ip and dest_ip are valid IPs
sock = ProxyCommand(cmd)

Still getting the same error though, so it didn’t come from there. I added a time.sleep right after the call to ProxyCommand, checked my proxy’s logs and the returned stdout and stderr from the subprocess like this :

sock = ProxyCommand(cmd)
print(sock.process.poll())
print(sock.process.stdout.read())
print(sock.process.stderr.read())

This code yells the following output :

None
b'SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u1rn'
b''

While the time.sleep or the reading of stdout/stderr is active, the connection on my proxy is kept open. (Of course I removed these before going any further because I’m not supposed to read directly from the process) What I don’t understand is why that _check_banner function fails although the stdout of the socket is clearly beginning with SSH-…
On the other hand, as soon as the client.connect(...) is called, the connection is immediatly destroyed on my proxy. I now need a way to investigate why the connection fails this way.

For those who wants more information, here is the line in paramiko that causes that error : transport.py:1858

(Thanks again pkapp for all the help on IRC o/)

Источник

Содержание

Paramiko Error reading SSH protocol banner : SSH to a Cisco IOS Device #965
Comments
Exception: Error reading SSH protocol banner #1507
Comments
Error reading SSH protocol banner with paramiko with eventlet.spawn #337
Comments
Error reading banner from freshly started ssh server #1192
Comments

I’m trying to use ansible to run a very basic script which is to SSH to a switch and run a show version.
I’m using the standard ios_command module which is from ansible.

by default it uses Paramiko as ssh agent and I’m getting the following Error :

paramiko.transport starting thread (client mode): 0x78c7f250L
paramiko.transport Local version/idstring: SSH-2.0-paramiko_2.1.2
paramiko.transport Banner: C6509#SSH-2.0-paramiko_2.1.2
paramiko.transport Banner: Translating «SSH-2.0-paramiko_2.1.2»
paramiko.transport Banner: % Unknown command or computer name, or unable to find computer address
paramiko.transport Banner: C6509#
paramiko.transport Exception: Error reading SSH protocol banner
paramiko.transport Traceback (most recent call last):
paramiko.transport File «/usr/local/lib/python2.7/site-packages/paramiko/transport.py», line 1749, in run
paramiko.transport self._check_banner()
paramiko.transport File «/usr/local/lib/python2.7/site-packages/paramiko/transport.py», line 1897, in _check_banner
paramiko.transport raise SSHException(‘Error reading SSH protocol banner’ + str(e))
paramiko.transport SSHException: Error reading SSH protocol banner

I have tried to do some chnages and troubleshooting on paramiko packet.py and transport.py but couldnt get that working.

can anyone help me what is this SSH protocol banner checking is and how to solve this?

Thanks
Reza Toghraee

The text was updated successfully, but these errors were encountered:

I know I’ve seen this before but cannot find the ticket offhand ; you may want to try searching the other tickets. That said, this (targeting non POSIX ssh servers) is a common issue that we don’t have a lot of resources for fixing unfortunately. Leaving open for now but may close as dupe later.

Don’t have time to test this out right now, but it looks like the TCP connection remote side is just a plain telnet connection, not consistent with the opening protocol exchange for an ssh transport, where both client and server expect to send/receive a plaintext SSH version exchange (see RFC4253).

It looks like the remote side is issuing an input prompt («C6509#») and taking the text sent by the client as a command to invoke, sending an error message and reprompting («C6509#») that the paramiko client side can’t find anything looking like a SSH version string, and aborting the connection.

Can you double check that you can successfully use a standard ssh client to connect to the hostname (and port) that you are using under Ansible? Does the Ansible ios_command module have support for telnet connections, in addition to ssh connections?

What I found today was that I have the same issue and same debug log as sent by Kaag on my Cisco Catalyst 6500 , running IOS s72033-advipservicesk9-mz.151-2.SY10

However, today I tried to connect a Cisco 3850 running cat3k_caa-universalk9.16.03.02.SPA
interestingly it worked on Cisco 3850.

I captured both paramiko logs on 6500 and 3850 to compare :

============6500============
paramiko.transport: Connected (version 2.0, client Cisco-1.25)
paramiko.transport: kex algos:[u’diffie-hellman-group-exchange-sha1′, u’diffie-hellman-group14-sha1′, u’diffie-hellman-group1-sha1′] server key:[u’ssh-rsa’] client encrypt:[u’aes128-cbc’, u’3des-cbc’, u’aes192-cbc’, u’aes256-cbc’] server encrypt:[u’aes128-cbc’, u’3des-cbc’, u’aes192-cbc’, u’aes256-cbc’] client mac:[u’hmac-sha1′, u’hmac-sha1-96′, u’hmac-md5′, u’hmac-md5-96′] server mac:[u’hmac-sha1′, u’hmac-sha1-96′, u’hmac-md5′, u’hmac-md5-96′] client compress:[u’none’] server compress:[u’none’] client lang:[u»] server lang:[u»] kex follows?False
paramiko.transport: Kex agreed: diffie-hellman-group1-sha1
paramiko.transport: Cipher agreed: aes128-cbc
paramiko.transport: MAC agreed: hmac-md5
paramiko.transport: Compression agreed: none
paramiko.transport: kex engine KexGroup1 specified hash_algo
paramiko.transport: Switch to new keys .
paramiko.transport: Trying key 542f7f11dcaafae42ec947dbf96bac97 from /root/.ssh/id_rsa
paramiko.transport: userauth is OK
paramiko.transport: Exception: Illegal info request from server
paramiko.transport: Traceback (most recent call last):
paramiko.transport: File «/usr/local/lib/python2.7/site-packages/paramiko/transport.py», line 1800, in run
paramiko.transport: self.auth_handler._handler_table[ptype](self.auth_handler, m)
paramiko.transport: File «/usr/local/lib/python2.7/site-packages/paramiko/auth_handler.py», line 575, in _parse_userauth_info_request
paramiko.transport: raise SSHException(‘Illegal info request from server’)
paramiko.transport: SSHException: Illegal info request from server
paramiko.transport:
paramiko.transport: Trying SSH agent key 542f7f11dcaafae42ec947dbf96bac97
paramiko.transport: Trying discovered key 542f7f11dcaafae42ec947dbf96bac97 in /root/.ssh/id_rsa

============3850============
paramiko.transport: starting thread (client mode): 0xdeb37810L
paramiko.transport: Local version/idstring: SSH-2.0-paramiko_2.1.2
paramiko.transport: Remote version/idstring: SSH-2.0-Cisco-1.25
paramiko.transport: Connected (version 2.0, client Cisco-1.25)
paramiko.transport: kex algos:[u’diffie-hellman-group-exchange-sha1′, u’diffie-hellman-group14-sha1′] server key:[u’ssh-rsa’] client encrypt:[u’aes128-ctr’, u’aes192-ctr’, u’aes256-ctr’, u’aes128-cbc’, u’3des-cbc’, u’aes192-cbc’, u’aes256-cbc’] server encrypt:[u’aes128-ctr’, u’aes192-ctr’, u’aes256-ctr’, u’aes128-cbc’, u’3des-cbc’, u’aes192-cbc’, u’aes256-cbc’] client mac:[u’hmac-sha1′, u’hmac-sha1-96′] server mac:[u’hmac-sha1′, u’hmac-sha1-96′] client compress:[u’none’] server compress:[u’none’] client lang:[u»] server lang:[u»] kex follows?False
paramiko.transport: Kex agreed: diffie-hellman-group14-sha1
paramiko.transport: Cipher agreed: aes128-ctr
paramiko.transport: MAC agreed: hmac-sha1-96
paramiko.transport: Compression agreed: none
paramiko.transport: kex engine KexGroup14 specified hash_algo
paramiko.transport: Switch to new keys .
paramiko.transport: Trying key 542f7f11dcaafae42ec947dbf96bac97 from /root/.ssh/id_rsa
paramiko.transport: userauth is OK
paramiko.transport: Authentication (publickey) successful!
paramiko.transport: EOF in transport thread

The difference between the 2 is :

3850 : Kex agreed: diffie-hellman-group14-sha1
6500: Kex agreed: diffie-hellman-group1-sha1

3850: Cipher agreed: aes128-ctr
6500: Cipher agreed: aes128-cbc

3550: MAC agreed: hmac-sha1-96
6500: MAC agreed: hmac-md5

This seems to be related to Cisco IOS implementation of SSH on 6500 or 3560

6509#sh ip ssh
SSH Enabled — version 2.0
Authentication timeout: 120 secs; Authentication retries: 3
Minimum expected Diffie Hellman key size : 1024 bits

3850#sh ip ssh
SSH Enabled — version 2.0
Authentication methods:publickey,keyboard-interactive,password
Authentication Publickey Algorithms:x509v3-ssh-rsa,ssh-rsa
Hostkey Algorithms:x509v3-ssh-rsa,ssh-rsa
Encryption Algorithms:aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,aes192-cbc,aes256-cbc
MAC Algorithms:hmac-sha1,hmac-sha1-96
KEX Algorithms:diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1
Authentication timeout: 120 secs; Authentication retries: 3
Minimum expected Diffie Hellman key size : 2048 bits

cat test_paramiko_cisco.py
import logging
import paramiko

logging.getLogger(«paramiko»).setLevel(logging.DEBUG)
ssh = paramiko.SSHClient()
ssh.load_system_host_keys(‘/root/.ssh/known_hosts’)
ssh.set_missing_host_key_policy(paramiko.WarningPolicy())
paramiko.util.log_to_file(«/root/paramiko.log»)
ssh.connect(‘SWITCH_IP’, username=’USERNAME’,password=’KEY_PASS_PHRASE’, key_filename=’/root/.ssh/id_rsa’, allow_agent=’False’)
remote_conn=ssh.invoke_shell()
remote_conn.send(«show runn»)
output=remote_conn.recv(5000)
print output

I’m still looking forward to see how to fix the issue with IOS / 6500 .

Источник

I BELIEVE THIS IS A KNOWN «ISSUE» / OCCURRENCE — BUT I COULD NOT FIND A THREAD WITH THE SOLUTION / WORK AROUND IN.
I have a simple script that the aim is to ssh remote access all the devices in a simple device file — if I am able to reach Privilege Exec mode. I print a «Login Successful!» message, if not. I have the simple Error Handling option (for now).
The first 2 devices are known working devices. and the script returns my printed message.
However, the 3rd devices is a device I know I cannot SSH to (this is to test my Error Handling). and the 4th device is another device that is working as expected.
Below is the output I get from my script —
For the 3rd device. I can see my Error Handling working («error message. «), and the script moves onto the 4th device. however I still get the traceback error above the output.
I was wondering if this is bug / not possible to remove. or if there is a way to remove this traceback error??
(below the output is a copy of the full script)

Connecting to Device .

Connecting to Device 1.1.1.1.

Exception: Error reading SSH protocol banner
Traceback (most recent call last):
File «/usr/lib/python3.6/site-packages/paramiko/transport.py», line 2138, in _check_banner
buf = self.packetizer.readline(timeout)
File «/usr/lib/python3.6/site-packages/paramiko/packet.py», line 367, in readline
buf += self._read_timeout(timeout)
File «/usr/lib/python3.6/site-packages/paramiko/packet.py», line 576, in _read_timeout
raise socket.timeout()
socket.timeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File «/usr/lib/python3.6/site-packages/paramiko/transport.py», line 1966, in run
self._check_banner()
File «/usr/lib/python3.6/site-packages/paramiko/transport.py», line 2143, in _check_banner
«Error reading SSH protocol banner» + str(e)
paramiko.ssh_exception.SSHException: Error reading SSH protocol banner

error message.
Connecting to Device .

/.ssh/config’ > try: net_connect = ConnectHandler(**ios_device) net_connect.enable() print(‘Login Successful! ‘ + ‘n’) net_connect.disconnect() ##— Error Handling except Exception: print(‘error message. ‘) continue»>

The text was updated successfully, but these errors were encountered:

Источник

I’ve got SSHException: Error reading SSH protocol banner exception when I tried to spawn ssh connection with eventlet. You can run the following code to repeat this error:

The text was updated successfully, but these errors were encountered:

I’m seeing this with when using a ProxyCommand socket. Paramiko hangs at this line until the timeout is reached, then I see the SSH protocol banner error:
https://github.com/paramiko/paramiko/blob/752507a29/paramiko/transport.py#L1700

This also works:

If there are any workarounds, please let me know.

In the failure cases above, the CPU spikes during the 15 seconds until the banner timeout. During that time, paramiko is stuck in a loop between Packetizer._read_timeout() and ProxyCommand.recv() :

No idea why I marked this as «needs investigation» before — we have never supported eventlet/greenlets 🙁 That’s not to say we wouldn’t like to at some point but it’s more of a feature request than an actual bug in my opinion.

I’ve opened a new proper feature ticket for this (so hopefully anybody searching will see that it’s an unimplemented feature) and will link the two together for posterity.

Greenthread/eventlet/gevent has never really been supported by paramiko. Since paramiko is written to use threads as its concurrency model there might be several weird raises with how eventlet interacts with our threading model.

Источник

I have a paramiko client which reboots a linux server which is then reprovisioned using IPXE. I then create a new client, and poll the host with a new client waiting for the server to come back up. Just after the ssh server on my reprovisioned server starts. I noticed that I get an exception, which I which I can’t catch because it’s in a different thread, printing out on the logger that: SSHException(‘Error reading SSH protocol banner[WinError 10053] An established connection was aborted by the software in your host machine’,)

This correlates to the _check_banner function inside of the ssh transport class.

Of course I catch an issue with the client later on and then retry the connection to the host which eventually works after all services on the server are up and running (maybe 1 or 2 minutes later). It also works flawlessly after making a stable connection. This is more of an issue while connecting to a new server while it’s booting up.

I have ran through the code several times (It’s a race condition), but can’t tell if the client.connect call which generates the error succeeds, but I do know that subsequent test calls to ensure that I have a valid connection fail.
Correct me if I am wrong, but if the transport has started, it means that a connection was established to the ssh server, correct?
If this is indeed the case, there may be some additional checks from the client connection code that needs to be added so that an exception is raised in the connect method before t.start_client is called in the connect method for the client.
I would make the changes myself at this point and submit a merge request, but my paramiko-FU is still way too low to know what to check in code and how to check it.

NOTE: This is a linux server that I am connecting to. Not a Cisco switch and not a weird implementation of a windows ssh server.

P.S. Thank you in advance.

The text was updated successfully, but these errors were encountered:

Maybe this is just a matter of suppressing errors on a connection we know is failing or going to fail?

Correct me if I am wrong, but if the transport has started, it means that a connection was established to the ssh server, correct?

Once the socket connection is made, the client and server start the protocol negotiation, where both sides send a short plaintext «banner» consisting of «SSH-» + (protocol version) + «-» + (software version) + (optional comment) + «» + «», then each side can send their preferred lists of compression/ciphers/mac/keys, etc., followed by the formal setup of the secure encrypted transport. What seems to be happening is that the socket connection is accepted by the server, but the server side never sends any data (at least for 15 seconds). My guess is that the server provisioning is stalling on generating new host keys when the entropy pool for the necessary crypto-suitable random number generator is basically empty.

Paramiko handles the transport send/recv control on a separate thread, so your main thread is not blocking during this wait (good), but that makes it harder to determine exactly if/when that timeout occurs from the «main» thread (bad). It does not look like there is any type of usable event for the completion of the protocol negotiation like for most other paramiko cross-thread notifications. After a certain point, the client is going to give up waiting on the server to say something/anything.

it looks like you could try a couple of different workarounds:

Set the transport.banner_timeout higher than 15 seconds before calling connect, in hopes that the socket connection eventually is sent data.
Set the transport.banner_timeout lower, and poll for the transport.remote_version getting populated by the Transport thread when it does get the initial text line.
Test the connection using a plain socket.create_connection() and recv() to pre-screen that the SSH server is in a chatty state. If that test succeeds, then reconnect with paramiko.SSHClient() or paramiko.Transport() .
Tweak the server provisioning to ensure the host key(s) and whatever missing bits are present prior to starting sshd actively accepting socket connections.

Out of curiosity, can you test the behavior of the OpenSSH command line client ssh -vv when it connects to a freshly reprovisioned server? That’s usually considered the reference for determining «proper» handling of screwy situations like this.

Источник

Python Paramiko module is a Python-based SSH remote secure connection module, it is used for SSH remote command execution, file transfer, and other functions. The Paramiko module is not a python built-in module, so you need to run the command pip3 install Paramiko to install it manually.

1. Install Python Paramiko Module.

$ pip3 install Paramiko
Collecting Paramiko
  Downloading paramiko-2.7.2-py2.py3-none-any.whl (206 kB)
     |████████████████████████████████| 206 kB 215 kB/s 
Collecting cryptography>=2.5
  Downloading cryptography-3.3.1-cp36-abi3-macosx_10_10_x86_64.whl (1.8 MB)
     |████████████████████████████████| 1.8 MB 45 kB/s 
Collecting pynacl>=1.0.1
  Downloading PyNaCl-1.4.0-cp35-abi3-macosx_10_10_x86_64.whl (380 kB)
     |████████████████████████████████| 380 kB 41 kB/s 
Collecting bcrypt>=3.1.3
  Downloading bcrypt-3.2.0-cp36-abi3-macosx_10_9_x86_64.whl (31 kB)
Collecting six>=1.4.1
  Downloading six-1.15.0-py2.py3-none-any.whl (10 kB)
Collecting cffi>=1.12
  Downloading cffi-1.14.4-cp39-cp39-macosx_10_9_x86_64.whl (177 kB)
     |████████████████████████████████| 177 kB 26 kB/s 
Collecting pycparser
  Downloading pycparser-2.20-py2.py3-none-any.whl (112 kB)
     |████████████████████████████████| 112 kB 22 kB/s 
Installing collected packages: six, pycparser, cffi, cryptography, pynacl, bcrypt, Paramiko
Successfully installed Paramiko-2.7.2 bcrypt-3.2.0 cffi-1.14.4 cryptography-3.3.1 pycparser-2.20 pynacl-1.4.0 six-1.15.0

2. Use Python Paramiko To Upload File By SFTP Source Code.

import paramiko

"""
    Upload file, can not upload directory.
    :param host: sftp server host name or ip.
    :param port: sftp server listening port nubmer.
    :param user: sftp user name
    :param password: sftp account password
    :param server_path: remote server file path，for example：/root/test/test.txt
    :param local_path: local file path (c:/test.txt)
    :param timeout: upload connection timeout number ( an integer value, default is 10 )
    :return: bool
"""
def sftp_upload_file(host, port, user, password, server_path, local_path, timeout=10):
    try:
        # create transport object.
        t = paramiko.Transport((host, port))
        
        # set connection timeout number.
        t.banner_timeout = timeout
        
        # connect to remote sftp server
        t.connect(username=user, password=password)
        
        # get the SFTP client object.
        sftp = paramiko.SFTPClient.from_transport(t)
        
        # upload local file to remote server path.
        sftp.put(local_path, server_path)

        # close the connection.
        t.close()
        return True
    except Exception as e:
        print(e)
        return False

3. Use Python Paramiko To Download File By SFTP Source Code.

"""
   Download file, do not support download directory.
   :param host: SFTP server host name or ip address.
   :param port: SFTP server port number, the default port number is 22.
   :param user: SFTP server user name.
   :param password: SFTP server password.
   :param server_path: Download file path on server side.
   :param local_path: Local file path, download file saved as.
   :param timeout: Connection time out must be an integer number, defautl value is 10.
   :return: bool
"""

def sftp_down_file(host, port=22, user, password, server_path, local_path, timeout=10):

    try:
        # Create the transport object
        t = paramiko.Transport((host,port))

        # Set connection timeout value.
        t.banner_timeout = timeout

        # Connect to the SFTP server use username and password.
        t.connect(username=user,password=password)

        # Get SFTP client object.
        sftp = paramiko.SFTPClient.from_transport(t)

        # Download file from server side and save it to local path.
        sftp.get(server_path, local_path)

        # Close SFTP server connection.
        t.close()
        return True
    except Exception as e:
        print(e)
        return False

4. Use Python Paramiko To Implement Remote Command Execution.

"""
   Use SSH to connect to remote server and execute command
   :param host: server host name or ip address
   :param user: user name
   :param password: password
   :param cmd: the command to execute.
   :param seconds: execution timeout time, should be an integer number.
   :return: dict
"""

def ssh_exec_command(host,user,password, cmd,timeout=10):

    # return result is a python dictionary object. 'status' - value is 1 means success. 'data' - the command output data.
    result = {'status': 1, 'data': None}  
    try:
        # Create a new SSHClient instance.
        ssh = paramiko.SSHClient()  

        # Set connection timeout value.
        ssh.banner_timeout = timeout

        # Set host key, if no relevant information is stored in "known_hosts", the default behavior of SSHClient is to deny the connection, it will prompt yes/no.
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())

        # Connect to a remote server with a timeout of 1 second.
        ssh.connect(host, 22, user, password, timeout=timeout)  

        # Execute the command and return a list.
        stdin, stdout, stderr = ssh.exec_command(cmd,get_pty=True,timeout=timeout)  
        
        # Get execution result, the readlines() method will return a list object.
        out = stdout.readlines()    
       
        # Command execution status, 0 means success，1 means fail.
        channel = stdout.channel
        status = channel.recv_exit_status()

        # Close ssh connection.
        ssh.close()

        # Modify the returned result object.
        result['status'] = status
        result['data'] = out

        # return result to outer invoker.
        return result
    except Exception as e:
        print(e)
        print("Error, connection to server or command execution timeout!!! ip: {} command: {}".format(ip,cmd))
        return False

5. How To Fix EllipticCurvePublicKey.public_bytes Error.

When you run the above source code, you may encounter the below error. This is because Paramiko(2.4.2) relies on cryptography module, and the latest cryptography(2.5) has some deprecated APIs.

Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.

So uninstall cryptography(2.5) and install cryptography(2.4.2) will fix this error.

# Uninstall current cryptography 2.5 version.
pip uninstall cryptography

# Install cryptography 2.4.2 version.
pip install cryptography==2.4.2

6. How To Fix Error reading SSH protocol banner Error.

To solve this problem, we need to increase the response waiting time of python Paramiko module. Modifying the value of self.banner_timeout in the paramiko/transport.py file to 300 or some other longer value will solve this problem.

Источник

Bug Description

Revision history for this message

Huh, I see this in the n-net logs:

2014-07-27 20:11:47.967 DEBUG nova.network.manager [req-802f7e4b-3989-4343-94d0-849cefdb64aa TestVolumeBootPattern-32554776 TestVolumeBootPattern-422744072] [instance: 5ba6082f-5742-447a-9d56-bb52ae8634fb] Allocated fixed ip None on network 27dd907f-ec5f-4e9e-b369-a5a3b6bd13fa allocate_fixed_ip /opt/stack/new/nova/nova/network/manager.py:925

Notice the None, that seems odd…

I do see this later:

2014-07-27 20:12:16.240 DEBUG nova.network.manager [req-94127694-71f3-46d2-a62c-118a4d1556cb TestVolumeBootPattern-32554776 TestVolumeBootPattern-422744072] [instance: 5ba6082f-5742-447a-9d56-bb52ae8634fb] Network deallocation for instance deallocate_for_instance /opt/stack/new/nova/nova/network/manager.py:561
2014-07-27 20:12:16.279 DEBUG nova.network.manager [req-94127694-71f3-46d2-a62c-118a4d1556cb TestVolumeBootPattern-32554776 TestVolumeBootPattern-422744072] [instance: 5ba6082f-5742-447a-9d56-bb52ae8634fb] Deallocate fixed ip 10.1.0.3 deallocate_fixed_ip /opt/stack/new/nova/nova/network/manager.py:946

So when was the fixed IP actually allocated, or is that just a logging bug?

Revision history for this message

Maybe bug 1349590 is related, that’s a nova-network issue with floating IPs.

Revision history for this message

Reviewed: https://review.openstack.org/110384
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a4c580ff03f4abb03970dd6de315ca0ba6849617
Submitter: Jenkins
Branch: master

commit a4c580ff03f4abb03970dd6de315ca0ba6849617
Author: Matt Riedemann <email address hidden>
Date: Tue Jul 29 10:18:13 2014 -0700

Add trace logging to allocate_fixed_ip

    The address is being logged as None in some cases
    that are failing in grenade jobs so this adds more
    trace logging to the base network manager’s
    allocate_fixed_ip method so we can see which paths
    are being taken in the code and what the outputs
    are.

Change-Id: I37de4b3bbb9e51b57eb4d048e05fc00382eed23d
Related-Bug: #1349617

Revision history for this message

Download full text (12.3 KiB)

I hit a similar issue, but a bit different, in http://logs.openstack.org/53/76053/16/check/check-grenade-dsvm-partial-ncpu/5a53b07/console.html#_2014-08-18_16_36_31_962 . Seems sometime it will fail to connected, sometime it fail to get the banner.

2014-08-18 16:36:31.962 | 2014-08-18 16:33:03,400 8863 INFO [tempest.common.ssh] Creating ssh connection to ‘172.24.4.1’ as ‘cirros’ with public key authentication
2014-08-18 16:36:31.962 | 2014-08-18 16:33:03,412 8863 INFO [paramiko.transport] Connected (version 2.0, client OpenSSH_6.6.1p1)
2014-08-18 16:36:31.962 | 2014-08-18 16:33:03,589 8863 INFO [paramiko.transport] Authentication (publickey) failed.
2014-08-18 16:36:31.962 | 2014-08-18 16:33:03,591 8863 WARNING [tempest.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.1 (Authentication failed.). Number attempts: 1. Retry after 2 seconds.
2014-08-18 16:36:31.962 | 2014-08-18 16:33:06,101 8863 INFO [paramiko.transport] Connected (version 2.0, client OpenSSH_6.6.1p1)
2014-08-18 16:36:31.962 | 2014-08-18 16:33:06,273 8863 INFO [paramiko.transport] Authentication (publickey) failed.
2014-08-18 16:36:31.962 | 2014-08-18 16:33:06,276 8863 WARNING [tempest.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.1 (Authentication failed.). Number attempts: 2. Retry after 3 seconds.
2014-08-18 16:36:31.962 | 2014-08-18 16:33:09,786 8863 INFO [paramiko.transport] Connected (version 2.0, client OpenSSH_6.6.1p1)
2014-08-18 16:36:31.962 | 2014-08-18 16:33:09,961 8863 INFO [paramiko.transport] Authentication (publickey) failed.
2014-08-18 16:36:31.963 | 2014-08-18 16:33:09,963 8863 WARNING [tempest.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.1 (Authentication failed.). Number attempts: 3. Retry after 4 seconds.
2014-08-18 16:36:31.963 | 2014-08-18 16:33:14,475 8863 INFO [paramiko.transport] Connected (version 2.0, client OpenSSH_6.6.1p1)
2014-08-18 16:36:31.963 | 2014-08-18 16:33:14,645 8863 INFO [paramiko.transport] Authentication (publickey) failed.
2014-08-18 16:36:31.963 | 2014-08-18 16:33:14,649 8863 WARNING [tempest.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.1 (Authentication failed.). Number attempts: 4. Retry after 5 seconds.
2014-08-18 16:36:31.963 | 2014-08-18 16:33:20,161 8863 INFO [paramiko.transport] Connected (version 2.0, client OpenSSH_6.6.1p1)
2014-08-18 16:36:31.963 | 2014-08-18 16:33:20,331 8863 INFO [paramiko.transport] Authentication (publickey) failed.
2014-08-18 16:36:31.963 | 2014-08-18 16:33:20,335 8863 WARNING [tempest.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.1 (Authentication failed.). Number attempts: 5. Retry after 6 seconds.
2014-08-18 16:36:31.963 | 2014-08-18 16:33:26,847 8863 INFO [paramiko.transport] Connected (version 2.0, client OpenSSH_6.6.1p1)
2014-08-18 16:36:31.963 | 2014-08-18 16:33:27,018 8863 INFO [paramiko.transport] Authentication (publickey) failed.
2014-08-18 16:36:31.964 | 2014-08-18 16:33:27,020 8863 WARNING [tem…

Changed in neutron:
importance:	Undecided → High
assignee:	nobody → Salvatore Orlando (salvatore-orlando)
importance:	High → Critical
milestone:	none → juno-3

Changed in neutron:
importance:	Critical → High

Revision history for this message

Just noticed a similar SSH time outs with «check-grenade-dsvm-partial-ncpu'» test job[1] from test ‘tempest/scenario/test_snapshot_pattern.py’:

————-
.
.
2014-08-27 08:28:47.776 | 2014-08-27 08:28:41,120 9490 INFO [tempest.common.debug] Host ns list[]
2014-08-27 08:28:47.777 | 2014-08-27 08:28:41,121 9490 ERROR [tempest.scenario.test_snapshot_pattern] Initializing SSH connection failed
2014-08-27 08:28:47.777 | 2014-08-27 08:28:41.121 9490 TRACE tempest.scenario.test_snapshot_pattern Traceback (most recent call last):
2014-08-27 08:28:47.777 | 2014-08-27 08:28:41.121 9490 TRACE tempest.scenario.test_snapshot_pattern File «tempest/scenario/test_snapshot_pattern.py», line 52, in _ssh_to_server
2014-08-27 08:28:47.777 | 2014-08-27 08:28:41.121 9490 TRACE tempest.scenario.test_snapshot_pattern return self.get_remote_client(server_or_ip)
2014-08-27 08:28:47.778 | 2014-08-27 08:28:41.121 9490 TRACE tempest.scenario.test_snapshot_pattern File «tempest/scenario/manager.py», line 332, in get_remote_client
2014-08-27 08:28:47.778 | 2014-08-27 08:28:41.121 9490 TRACE tempest.scenario.test_snapshot_pattern linux_client.validate_authentication()
2014-08-27 08:28:47.778 | 2014-08-27 08:28:41.121 9490 TRACE tempest.scenario.test_snapshot_pattern File «tempest/common/utils/linux/remote_client.py», line 54, in validate_authentication
2014-08-27 08:28:47.779 | 2014-08-27 08:28:41.121 9490 TRACE tempest.scenario.test_snapshot_pattern self.ssh_client.test_connection_auth()
2014-08-27 08:28:47.779 | 2014-08-27 08:28:41.121 9490 TRACE tempest.scenario.test_snapshot_pattern File «tempest/common/ssh.py», line 151, in test_connection_auth
2014-08-27 08:28:47.779 | 2014-08-27 08:28:41.121 9490 TRACE tempest.scenario.test_snapshot_pattern connection = self._get_ssh_connection()
2014-08-27 08:28:47.780 | 2014-08-27 08:28:41.121 9490 TRACE tempest.scenario.test_snapshot_pattern File «tempest/common/ssh.py», line 88, in _get_ssh_connection
2014-08-27 08:28:47.780 | 2014-08-27 08:28:41.121 9490 TRACE tempest.scenario.test_snapshot_pattern password=self.password)
2014-08-27 08:28:47.780 | 2014-08-27 08:28:41.121 9490 TRACE tempest.scenario.test_snapshot_pattern SSHTimeout: Connection to the 172.24.4.1 via SSH timed out.
2014-08-27 08:28:47.781 | 2014-08-27 08:28:41.121 9490 TRACE tempest.scenario.test_snapshot_pattern User: cirros, Password: None
2014-08-27 08:28:47.781 | 2014-08-27 08:28:41.121 9490 TRACE tempest.scenario.test_snapshot_pattern
.
.
————-

[1] http://logs.openstack.org/04/117104/2/check/check-grenade-dsvm-partial-ncpu/d3829fe/console.html

Revision history for this message

Here is what I’ve gathered so far. I looked through a few failed builds and focused on one [0] that uses the metadata service rather than config drive as it gives more clues.

1. The messages about “userdata” in the guest console don’t seem related to the failure i.e. the guest console only shows up in the logs if the build fails. I think it always says «/run/cirros/datasource/data/user-data was not ‘#!’ or executable» or “no userdata for datasource» if no “userdata” is being used, and none is. The ssh keys are part of the metadata in these tests, not the userdata portion of the metadata.

2. In the metadata service log [1], there are zero calls to e.g. «GET /2009-04-04/meta-data/user-data HTTP/1.1» further supporting no userdata relationship.

3. Ssh keys are added to the metadata in nova/api/metadata.py by nova itself, so it appears unlikely there is anything wrong there, or at least I didn’t see anything unusual. The key is created by a POST to nova [2] and nova creates the key. The key content then appears several times in the log messages of the metadata service (it seems fine, uncorrupted).

4. The error “Exception: Error reading SSH protocol banner[Errno 104] Connection reset by peer” implies a corruption of some kind (being that it seems communication wasn’t a problem otherwise, there’s a route) — this seems consistent with too low of an mtu and data getting truncated “occasionally.” In the log [3], the attempt to connect begins with connection refused (before sshd starts), then changes to authentication failure (likely before the guest has tried to pull the key from the metadata service), then changes to the ssh protocol banner read error. Which sounds like the key was retrieved but it’s corrupted (truncated?).

5. Web search for the same error yielded others having problems with mtu setting in the guest, where they can ping but not ssh with key pair, openstack [4] and cirros [5].

Is it at all possible that there’s an issue with the mtu of the guest sometimes? It would explain the randomness and the protocol banner errors, if data is getting truncated sometimes. I’m not sure where to go from here, I didn’t think anything like this would show up in the guest kernel logs.

[0] http://logs.openstack.org/38/115938/6/check/check-tempest-dsvm-neutron-pg-full-2/8833a83
[1] http://logs.openstack.org/38/115938/6/check/check-tempest-dsvm-neutron-pg-full-2/8833a83/logs/screen-q-meta.txt.gz
[2] http://logs.openstack.org/38/115938/6/check/check-tempest-dsvm-neutron-pg-full-2/8833a83/console.html#_2014-08-28_18_39_33_546
[3] http://logs.openstack.org/38/115938/6/check/check-tempest-dsvm-neutron-pg-full-2/8833a83/console.html#_2014-08-28_18_39_33_659
[4] https://ask.openstack.org/en/question/32958/unable-to-ssh-with-key-pair/
[5] https://bugs.launchpad.net/cirros/+bug/1301958

summary:

— test_volume_boot_pattern fails in grenade with «SSHException: Error
— reading SSH protocol banner[Errno 104] Connection reset by peer»
+ test_volume_boot_pattern fails with «SSHException: Error reading SSH
+ protocol banner[Errno 104] Connection reset by peer»

Changed in nova:
status:	New → Confirmed
importance:	Undecided → Critical

Revision history for this message

I think we should focus on two aspects:
1) Ping works otherwise we won’t get to SSH test
2) SSH connections shows always authentication failures before ‘SSH protocol banner’ errors.

I don’t know about the MTU possibility, but I wouldn’t expect it to happen on single host tests.

Revision history for this message

I was thinking maybe the auth failure might happen before the guest reads the public key from metadata, then after it reads a corrupted key, it keeps sending back a truncated or otherwise invalid data in response to the SSH connection request. I read more about the paramiko error «Error reading SSH protocol banner[Errno 104]» and it can also mean the remote host didn’t send a banner at all (not responding at all, like Salvatore mentioned in comment #10).

I combed logs some more and didn’t find anything useful so I’m now going to try to reproduce the issue locally using devstack. I’d like to see the logs inside the guest (sshd logs, etc) after this happens. Which makes me wonder if we could add something to tempest to mount the guest disk if ssh failure like this happens and capture some of the guest logs for debugging.

Revision history for this message

Melanie,

we have been discussing this issue in openstack-qa.
since we too have been unable to find any evidence regarding issues with user data, we’re going to validate the MTU hypothesis you made.

I’m going to push a patch to match it to cirros’ MTU in the gate.
On the other hand a new patches cirros build with the fix for the bug you pointed out will be released soon.

Revision history for this message

Salvatore,

Okay. I agree MTU seems unlikely to be the issue but I’m glad if we can rule it out for sure.

Do you think we could do a verbose ssh in the tempest test (like ssh -vvv) to see the details of the exchange when the failure happens?

Revision history for this message

I don’t think paramiko allow us to do that. Bypassing paramiko in tempest is too much code churn I think.

I will try to reproduce in a local environment. it should not be too hard as I can intercept this failure also on VMware NSX-CI.

Revision history for this message

Thanks for the pointer melanie. I’ll see locally first how hard it would be and whether it requires changes on the infrastructure side. This is debugging info worth having (unlike the pedant namespace info we dump which I never find useful).

Revision history for this message

Cool. I’m trying some things locally in tempest too to see what happens when I call the log_to_file function. If I get something working in tempest, I’ll put up a patch (if you haven’t already found a way).

summary:

— test_volume_boot_pattern fails with «SSHException: Error reading SSH
— protocol banner[Errno 104] Connection reset by peer»
+ SSHException: Error reading SSH protocol banner[Errno 104] Connection
+ reset by peer

Changed in neutron:
milestone:	juno-3 → juno-rc1

Revision history for this message

Thanks melanie — that’s good stuff to have.
I have a few local repro environments locally when I’m running a tweaked tempest that will not destroy the vm to which the SSH connection failed.

Revision history for this message

I reproduced the failure and I can confirm I have no authorized_keys file in the failing instance.
To reproduce the failure it is sufficient to start an instance with 4 cores and 8GB of memory, launch devstack with a localrc very similar to that of the full neutron test, and then keep running scenario tests.

A tweak for not removing the instance where ssh fails helps a lot: http://paste.openstack.org/show/105982/

Revision history for this message

Awesome Salvatore, thanks for sharing that patch.

So it’s running the latest Cirros 0.3.2 which I see fixed some bugs related to getting metadata [1]. Do you see anything interesting in /var/log/cloud-init.log in the VM?

[1] https://launchpad.net/cirros/trunk/0.3.2

Changed in tempest:
assignee:	nobody → Salvatore Orlando (salvatore-orlando)

Revision history for this message

Download full text (5.4 KiB)

So this is what I found out.

Instance log from a failing istance [1]. The important bit there is «cirros-apply-local already run per instance», and not «no userdata for datasource» as initially thought. That was just me being stupid and thinking the public key was part of user data. That was really silly.

«cirros-apply-local already run per instance» seems to appear in the console log for all SSH protocol banner failures [2]. the presence of duplicates makes it difficult to prove correlation with SSH protocol banner failures.
However, they key here is that local testing revealing that when the SSH connection fails there is no authorized_keys file in /home/cirros/.ssh. This obviously explains the authentication failure. Whether the subsequent SSH protocol banner errors are due to the cited MTU problems or else it has to be clarified yet.
What is certain is that cirros processes the data source containing the public SSH key before starting sshd. So the auth failures cannot be due to the init process not being yet complete.

The cirros initialization process executes a set of steps on an instance basis. This steps include setting public ssh keys.
«On an instance basis » means that these steps are not executed at each boot but once per instance.

cirros-apply local [3] is the step which processes, among other things, ssh public keys.
It is called by the cirros-per scripts [4], which at the end of its execution writes a marker file [5]. The cirros-per process will terminate if when executed the marker file is already present [6]

During the failing test it has been observed the following:

from the console log:
[ 3.696172] rtc_cmos 00:01: setting system clock to 2014-09-04 19:05:27 UTC (1409857527)

from the cirros-apply marker directory:
$ ls -le /var/lib/cirros/sem/
total 3
-rw-r—r— 1 root root 35 Thu Sep 4 13:06:28 2014 instance.197ce1ac-e2df-4d3a-b392-4803383ddf74.check-version
-rw-r—r— 1 root root 22 Thu Sep 4 13:05:07 2014 instance.197ce1ac-e2df-4d3a-b392-4803383ddf74.cirros-apply-local
-rw-r—r— 1 root root 24 Thu Sep 4 13:06:31 2014 instance.197ce1ac-e2df-4d3a-b392-4803383ddf74.userdata

as cirros defaults to MDT (UTC -6), this means the apply-local marker has been applied BEFORE instance boot.
This is consistent with the situation we’re seeing where the failure always occur after events such as resize or stop.
The ssh public key should be applied in the first boot of the VM. When it’s restarted the process is skipped as the key should already be there. Unfortunately the key isn’t there, which is a bit of a mystery, especially since the instance is powered off in a graceful way thanks to [7].

Nevertheless when an instance receives a shutdown signal it sends a TERM signal to all processes. Meaning that the apply-local spawned by cirros-per at [4] can be killed before it actually writes the key.
However, cirros-per even if it retrieves the return code it writes the marker in any case [5].
This creates the conditions for a situation where the marker can be present without having actually completed the apply-local phase. As a result it is possible to have guests without SSH …

Changed in tempest:
assignee:	Salvatore Orlando (salvatore-orlando) → Joe Gordon (jogo)
status:	New → In Progress
assignee:	Joe Gordon (jogo) → Matthew Treinish (treinish)

Changed in neutron:
status:	New → Incomplete
Changed in nova:
status:	Confirmed → Incomplete
Changed in grenade:
status:	New → Incomplete

Changed in tempest:
assignee:	Matthew Treinish (treinish) → Joe Gordon (jogo)

Changed in neutron:
milestone:	juno-rc1 → none
assignee:	Salvatore Orlando (salvatore-orlando) → nobody

Changed in tempest:
assignee:	Joe Gordon (jogo) → nobody
status:	In Progress → New
Changed in nova:
milestone:	none → juno-rc1

Changed in nova:
status:	Incomplete → Fix Released
assignee:	nobody → Augustina Ragwitz (auggy)