Launching the agent fails after Jenkins is restarted. If you retry the launch several times, it eventually works.
ssh rules: no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty
Node config.xml
<?xml version='1.1' encoding='UTF-8'?>
<slave>
  <name>cac-slave-01</name>
  <description></description>
  <remoteFS>/app/var/lib/jenkins/slave_node</remoteFS>
  <numExecutors>5</numExecutors>
  <mode>NORMAL</mode>
  <retentionStrategy class="hudson.slaves.RetentionStrategy$Always"/>
  <launcher class="hudson.plugins.sshslaves.SSHLauncher" plugin="ssh-slaves@1.29.1">
    <host>jenkins-slave.example.com</host>
    <port>22</port>
    <credentialsId>jenkins-master-ssh-key</credentialsId>
    <launchTimeoutSeconds>210</launchTimeoutSeconds>
    <maxNumRetries>10</maxNumRetries>
    <retryWaitTime>15</retryWaitTime>
    <sshHostKeyVerificationStrategy class="hudson.plugins.sshslaves.verifiers.KnownHostsFileKeyVerificationStrategy"/>
  </launcher>
  <label></label>
  <nodeProperties/>
</slave>
Error log 1:
SSHLauncher{host='jenkins-slave.example.com', port=22, credentialsId='jenkins-master-ssh-key', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.KnownHostsFileKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[11/27/18 16:46:51] [SSH] Opening SSH connection to jenkins-slave.example.com:22.
ERROR: null
java.util.concurrent.CancellationException
at java.util.concurrent.FutureTask.report(FutureTask.java:121)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:902)
at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[11/27/18 16:50:21] Launch failed - cleaning up connection
[11/27/18 16:50:21] SSH Launch of cac-slave-01 on jenkins-slave.example.com failed in 210,000 ms
Error log 2:
SSHLauncher{host='jenkins-slave.example.com', port=22, credentialsId='jenkins-master-ssh-key', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.KnownHostsFileKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true} [11/27/18 16:18:08] [SSH] Opening SSH connection to jenkins-slave.example.com:22. [11/27/18 16:18:29] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed. [11/27/18 16:18:29] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed. [11/27/18 16:18:29] [SSH] Authentication successful. [11/27/18 16:18:29] [SSH] Authentication successful. [11/27/18 16:18:29] [SSH] The remote user's environment is: [11/27/18 16:18:29] [SSH] The remote user's environment is: BASH=/usr/bin/bash BASHOPTS=cmdhist:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath BASH_ALIASES=() BASH_ARGC=() BASH_ARGV=() BASH_CMDS=() BASH_EXECUTION_STRING=set BASH_LINENO=() BASH_SOURCE=() BASH_VERSINFO=([0]="4" [1]="2" [2]="46" [3]="2" [4]="release" [5]="x86_64-redhat-linux-gnu") BASH_VERSION='4.2.46(2)-release' DIRSTACK=() EUID=1000 GROUPS=() HOME=/var/lib/jenkins_slave HOSTNAME=jenkins-slave.example.com HOSTTYPE=x86_64 ID=1000 IFS=$' tn' LANG=en_US.UTF-8 LESSOPEN='||/usr/bin/lesspipe.sh %s' LOGNAME=jenkins_slave_local MACHTYPE=x86_64-redhat-linux-gnu MAIL=/var/mail/jenkins_slave_local OPTERR=1 OPTIND=1 OSTYPE=linux-gnu PATH=/usr/local/bin:/usr/bin PIPESTATUS=([0]="0") PPID=15365 PS4='+ ' PWD=/var/lib/jenkins_slave SHELL=/bin/bash SHELLOPTS=braceexpand:hashall:interactive-comments SHLVL=1 SSH_CLIENT='165.115.33.181 36584 22' SSH_CONNECTION='165.115.33.181 36584 165.115.33.182 22' TERM=dumb TMOUT=1800 UID=1000 USER=jenkins_slave_local XDG_RUNTIME_DIR=/run/user/1000 XDG_SESSION_ID=3293 _=/etc/bashrc [11/27/18 16:18:29] [SSH] Checking java version of 
/app/var/lib/jenkins/slave_node/jdk/bin/java BASH=/usr/bin/bash BASHOPTS=cmdhist:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath BASH_ALIASES=() BASH_ARGC=() BASH_ARGV=() BASH_CMDS=() BASH_EXECUTION_STRING=set BASH_LINENO=() BASH_SOURCE=() BASH_VERSINFO=([0]="4" [1]="2" [2]="46" [3]="2" [4]="release" [5]="x86_64-redhat-linux-gnu") BASH_VERSION='4.2.46(2)-release' DIRSTACK=() EUID=1000 GROUPS=() HOME=/var/lib/jenkins_slave HOSTNAME=jenkins-slave.example.com HOSTTYPE=x86_64 ID=1000 IFS=$' tn' LANG=en_US.UTF-8 LESSOPEN='||/usr/bin/lesspipe.sh %s' LOGNAME=jenkins_slave_local MACHTYPE=x86_64-redhat-linux-gnu MAIL=/var/mail/jenkins_slave_local OPTERR=1 OPTIND=1 OSTYPE=linux-gnu PATH=/usr/local/bin:/usr/bin PIPESTATUS=([0]="0") PPID=15366 PS4='+ ' PWD=/var/lib/jenkins_slave SHELL=/bin/bash SHELLOPTS=braceexpand:hashall:interactive-comments SHLVL=1 SSH_CLIENT='165.115.33.181 36878 22' SSH_CONNECTION='165.115.33.181 36878 165.115.33.182 22' TERM=dumb TMOUT=1800 UID=1000 USER=jenkins_slave_local XDG_RUNTIME_DIR=/run/user/1000 XDG_SESSION_ID=3292 _=/etc/bashrc [11/27/18 16:18:29] [SSH] Checking java version of /app/var/lib/jenkins/slave_node/jdk/bin/java Couldn't figure out the Java version of /app/var/lib/jenkins/slave_node/jdk/bin/java bash: /app/var/lib/jenkins/slave_node/jdk/bin/java: No such file or directory[11/27/18 16:18:29] [SSH] Checking java version of java Couldn't figure out the Java version of /app/var/lib/jenkins/slave_node/jdk/bin/java bash: /app/var/lib/jenkins/slave_node/jdk/bin/java: No such file or directory[11/27/18 16:18:29] [SSH] Checking java version of java [11/27/18 16:18:29] [SSH] java -version returned 1.8.0_181. [11/27/18 16:18:29] [SSH] Starting sftp client. [11/27/18 16:18:29] [SSH] java -version returned 1.8.0_181. [11/27/18 16:18:29] [SSH] Starting sftp client. [11/27/18 16:18:29] [SSH] Copying latest remoting.jar... [11/27/18 16:18:29] [SSH] Copying latest remoting.jar... 
[11/27/18 16:18:29] [SSH] Copied 776,265 bytes. Expanded the channel window size to 4MB [11/27/18 16:18:29] [SSH] Starting agent process: cd "/app/var/lib/jenkins/slave_node" && java -jar remoting.jar -workDir /app/var/lib/jenkins/slave_node [11/27/18 16:18:29] [SSH] Copied 776,265 bytes. Expanded the channel window size to 4MB [11/27/18 16:18:29] [SSH] Starting agent process: cd "/app/var/lib/jenkins/slave_node" && java -jar remoting.jar -workDir /app/var/lib/jenkins/slave_node Nov 27, 2018 4:18:30 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir INFO: Using /app/var/lib/jenkins/slave_node/remoting as a remoting work directory Both error and output logs will be printed to /app/var/lib/jenkins/slave_node/remoting Nov 27, 2018 4:18:30 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir INFO: Using /app/var/lib/jenkins/slave_node/remoting as a remoting work directory Both error and output logs will be printed to /app/var/lib/jenkins/slave_node/remoting <===[JENKINS REMOTING CAPACITY]===>channel started <===[JENKINS REMOTING CAPACITY]===>channel started Remoting version: 3.25 Remoting version: 3.25 This is a Unix agent This is a Unix agent Evacuated stdout Evacuated stdout ERROR: Unexpected error in launching a agent. This is probably a bug in Jenkins. 
java.lang.IllegalStateException: Already connected at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:678) at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:432) at hudson.plugins.sshslaves.SSHLauncher.startAgent(SSHLauncher.java:1032) at hudson.plugins.sshslaves.SSHLauncher.access$500(SSHLauncher.java:128) at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:866) at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:831) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) [11/27/18 16:18:33] Launch failed - cleaning up connection [11/27/18 16:18:33] [SSH] Connection closed. Connection terminated channel stopped Connection terminated ERROR: Unexpected error in launching a agent. This is probably a bug in Jenkins. java.lang.NullPointerException at org.jenkinsci.modules.systemd_slave_installer.SlaveInstallerFactoryImpl.createIfApplicable(SlaveInstallerFactoryImpl.java:33) at org.jenkinsci.modules.slave_installer.SlaveInstallerFactory.createIfApplicable(SlaveInstallerFactory.java:29) at org.jenkinsci.modules.slave_installer.SlaveInstallerFactory.createFor(SlaveInstallerFactory.java:46) at org.jenkinsci.modules.slave_installer.impl.ComputerListenerImpl.onOnline(ComputerListenerImpl.java:30) at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:693) at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:432) at hudson.plugins.sshslaves.SSHLauncher.startAgent(SSHLauncher.java:1032) at hudson.plugins.sshslaves.SSHLauncher.access$500(SSHLauncher.java:128) at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:866) at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:831) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) [11/27/18 16:18:33] Launch failed - cleaning up connection [11/27/18 16:18:33] [SSH] Connection closed. SSHLauncher{host='jenkins-slave.example.com', port=22, credentialsId='jenkins-master-ssh-key', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.KnownHostsFileKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true} [11/27/18 16:19:03] [SSH] Opening SSH connection to jenkins-slave.example.com:22. [11/27/18 16:19:03] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed. [11/27/18 16:19:03] [SSH] Authentication successful. [11/27/18 16:19:03] [SSH] The remote user's environment is: BASH=/usr/bin/bash BASHOPTS=cmdhist:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath BASH_ALIASES=() BASH_ARGC=() BASH_ARGV=() BASH_CMDS=() BASH_EXECUTION_STRING=set BASH_LINENO=() BASH_SOURCE=() BASH_VERSINFO=([0]="4" [1]="2" [2]="46" [3]="2" [4]="release" [5]="x86_64-redhat-linux-gnu") BASH_VERSION='4.2.46(2)-release' DIRSTACK=() EUID=1000 GROUPS=() HOME=/var/lib/jenkins_slave HOSTNAME=jenkins-slave.example.com HOSTTYPE=x86_64 ID=1000 IFS=$' tn' LANG=en_US.UTF-8 LESSOPEN='||/usr/bin/lesspipe.sh %s' LOGNAME=jenkins_slave_local MACHTYPE=x86_64-redhat-linux-gnu MAIL=/var/mail/jenkins_slave_local OPTERR=1 OPTIND=1 OSTYPE=linux-gnu PATH=/usr/local/bin:/usr/bin PIPESTATUS=([0]="0") PPID=15645 PS4='+ ' PWD=/var/lib/jenkins_slave SHELL=/bin/bash SHELLOPTS=braceexpand:hashall:interactive-comments SHLVL=1 SSH_CLIENT='165.115.33.181 37012 22' SSH_CONNECTION='165.115.33.181 37012 165.115.33.182 22' TERM=dumb TMOUT=1800 UID=1000 USER=jenkins_slave_local 
XDG_RUNTIME_DIR=/run/user/1000 XDG_SESSION_ID=3294 _=/etc/bashrc [11/27/18 16:19:03] [SSH] Checking java version of /app/var/lib/jenkins/slave_node/jdk/bin/java Couldn't figure out the Java version of /app/var/lib/jenkins/slave_node/jdk/bin/java bash: /app/var/lib/jenkins/slave_node/jdk/bin/java: No such file or directory[11/27/18 16:19:03] [SSH] Checking java version of java [11/27/18 16:19:03] [SSH] java -version returned 1.8.0_181. [11/27/18 16:19:03] [SSH] Starting sftp client. [11/27/18 16:19:03] [SSH] Copying latest remoting.jar... [11/27/18 16:19:03] [SSH] Copied 776,265 bytes. Expanded the channel window size to 4MB [11/27/18 16:19:03] [SSH] Starting agent process: cd "/app/var/lib/jenkins/slave_node" && java -jar remoting.jar -workDir /app/var/lib/jenkins/slave_node Nov 27, 2018 4:19:03 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir INFO: Using /app/var/lib/jenkins/slave_node/remoting as a remoting work directory Both error and output logs will be printed to /app/var/lib/jenkins/slave_node/remoting <===[JENKINS REMOTING CAPACITY]===>channel started Remoting version: 3.25 This is a Unix agent Evacuated stdout Agent successfully connected and online
Troubleshooting
Common Pitfalls
Login profile files
When the SSH Build Agents plugin connects to an agent, it does not run an interactive shell. Instead it does the equivalent of running ssh agenthost command... a few times, eventually running ssh agenthost java -jar ... . Exactly what happens on the agent as a result depends on the SSHD implementation; OpenSSH runs this with bash -c command ... (or whatever your login shell is).
This means some of the login profiles that set up your environment are not read by your shell. See this post for more details.
If your login shell does not understand the command syntax used (e.g. the fish shell), use the advanced options Prefix Start agent Command and Suffix Start agent Command to wrap the agent command in e.g. sh -c " and ".
Example: To load ~/.bash_profile on macOS agents, set Prefix Start agent Command to source ~/.bash_profile && (the whitespace at the end is intentional).
Make sure to reconnect the agent after changing the agent commands.
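Why the trailing whitespace matters can be shown with a small sketch. It assumes the prefix and suffix are concatenated verbatim around the launch command; the function name build_launch_command is hypothetical, and the command layout is copied from the agent logs on this page rather than from the plugin source.

```python
def build_launch_command(workdir, prefix="", suffix="", java="java"):
    """Return the command line a launcher might send over SSH (illustration only)."""
    # Layout taken from the "[SSH] Starting agent process:" log lines.
    core = f'cd "{workdir}" && {java} -jar remoting.jar -workDir {workdir}'
    # Assumption: prefix and suffix are glued on as-is, so any separating
    # whitespace has to be part of the configured values themselves.
    return f"{prefix}{core}{suffix}"


print(build_launch_command("/home/jenkins", prefix="source ~/.bash_profile && "))
```

Without the space after &&, the prefix would run straight into cd, which is why the configured value ends with whitespace.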
Overall recommendations:
- Use the same major JDK version on the Jenkins instance and the agents, preferably with a close minor version.
- Tune the TCP stack of the Jenkins instance and agents
- Linux
- Windows
- Mac
- Check for hs_err_pid error files in the root filesystem of the agent http://www.oracle.com/technetwork/java/javase/felog-138657.html#gbwcy
- Check the logs in the root filesystem of the agent
- Disable energy-saving options that suspend or hibernate the host
- If you experience Out of Memory issues on the remoting process, try fixing the memory of the remoting process to at least 128MB (JVM options -Xms<MEM_SIZE> and -Xmx<MEM_SIZE>).
- Avoid using slow network filesystems (<100MB/s) for the agent work directory. This impacts performance.
- If you connect several Jenkins nodes to the same host, use a different user and work directory for each one to avoid concurrency issues.
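The hs_err_pid check from the list above can be scripted. This is a minimal sketch that assumes the HotSpot default file name hs_err_pid<pid>.log and only looks directly inside the directory you pass in; the function name find_hs_err_files is hypothetical.

```python
from pathlib import Path


def find_hs_err_files(agent_root):
    """Return HotSpot fatal error logs found directly under agent_root, newest first."""
    root = Path(agent_root)
    # hs_err_pid<pid>.log is the default name the JVM uses for fatal error logs.
    files = [p for p in root.glob("hs_err_pid*.log") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling find_hs_err_files with the agent's remote root directory returns an empty list when the JVM has not crashed there, or when the JVM was configured to write its error file elsewhere.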
Common info needed to troubleshoot a bug
To try to replicate an issue reported in Jira, we need the following info. Also keep in mind that the Jenkins Jira is not a support site; see How to report an issue.
- Jenkins core version
- OS you use on your SSH agents
- OpenSSH version installed on your SSH agents
- Did you check the SSHD service logs on your agent? See Enable verbose SSH Server log output
- Attach the agent connection log (http://jenkins.example.com/computer/NODENAME/log)
- Attach the logs inside the remoting folder (see remoting work directory)
- Attach the agent configuration file (http://jenkins.example.com/computer/NODENAME/config.xml)
- Attach the exception in the Jenkins logs associated with the failure
- Attach the exception in the build logs associated with the failure
- Attach a thread dump captured while the issue is occurring (see Obtaining a thread dump)
- Are your SSH agents static or provisioned by a cloud plugin (k8s, Mesos, Docker, EC2, Azure, …)?
- Does it happen only on the SSH agents?
- Does it happen on all SSH agents or only on a few? Is there something in common between those SSH agents?
- Does it always happen with the same job or type of job?
Force disconnection
In some cases the agent appears as connected but is not, and the disconnect button is not present. In those cases you can force the disconnection of the agent by using a URL like this one: http://jenkins.example.com/jenkins/computer/NODE_NAME/doDisconnect
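The per-node URLs mentioned on this page (connection log, config.xml, and doDisconnect) all share the same shape, so they can be built with a small helper. This is only a sketch: node_urls is a hypothetical name, and the base URL must include any context path your installation uses (such as /jenkins). Depending on the Jenkins version and security settings, doDisconnect may need an authenticated POST request instead of a plain browser visit.

```python
from urllib.parse import quote


def node_urls(base_url, node_name):
    """Build the per-node URLs used on this page for a given Jenkins base URL."""
    base = base_url.rstrip("/")
    # Percent-encode the node name so spaces and slashes do not break the path.
    prefix = f"{base}/computer/{quote(node_name, safe='')}"
    return {
        "log": f"{prefix}/log",
        "config": f"{prefix}/config.xml",
        "disconnect": f"{prefix}/doDisconnect",
    }


print(node_urls("http://jenkins.example.com/jenkins", "NODE_NAME")["disconnect"])
```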
Enable SSH keepAlive traffic
One common issue is that agents disconnect after a period of inactivity. If those disconnections happen because there is no traffic between the Jenkins instance and the agents, you can fix the issue by enabling the keepAlive setting in the SSH service or in the TCP stack of your OS.
To configure keepAlive traffic in the SSH service on the agent you have two options:
- Change the SSH server config by setting ClientAliveInterval or TCPKeepAlive on the SSH server (/etc/ssh/sshd_config), see sshd_config
- Change the SSH client config by setting the ServerAliveInterval or TCPKeepAlive options for the user connection (/etc/ssh/ssh_config or ~/.ssh/config), see ssh_config
To tune your TCP stack to send a keepAlive packet every 2 minutes or so:
- Linux see Using TCP keepalive under Linux
sysctl -w net.ipv4.tcp_keepalive_time=120
sysctl -w net.ipv4.tcp_keepalive_intvl=30
sysctl -w net.ipv4.tcp_keepalive_probes=8
sysctl -w net.ipv4.tcp_fin_timeout=30
- Windows see TCP/IP and NetBT configuration parameters for Windows 2000 or Windows NT
KeepAliveInterval = 30
KeepAliveTime = 120
TcpMaxDataRetransmissions = 8
TcpTimedWaitDelay = 30
- macOS see how-to-configure-tcp-keepalive-under-mac-os-x
net.inet.tcp.keepidle=120000
net.inet.tcp.keepintvl=30000
net.inet.tcp.keepcnt=8
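As a worked example of what these values mean: with TCP keepalive, the first probe is sent after the idle time, and the connection is declared dead once the configured number of probes goes unanswered. The sketch below computes that worst-case detection time for the Linux and macOS values above; the function name is hypothetical, and the macOS values are in milliseconds.

```python
def keepalive_detection_seconds(idle_time, probe_interval, probe_count):
    """Worst-case time from the last traffic until a dead peer is detected."""
    # Idle period before the first probe, then probe_count unanswered probes.
    return idle_time + probe_interval * probe_count


# Linux: tcp_keepalive_time=120, tcp_keepalive_intvl=30, tcp_keepalive_probes=8
linux_seconds = keepalive_detection_seconds(120, 30, 8)

# macOS: keepidle=120000 ms, keepintvl=30000 ms, keepcnt=8
macos_seconds = keepalive_detection_seconds(120000, 30000, 8) / 1000

print(linux_seconds, macos_seconds)  # 360 360.0
```

So a probe goes out after 2 idle minutes, and a peer that has silently gone away is noticed after roughly 6 minutes.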
Enable verbose SSH Server log output
Often the only way to know what is really happening in the SSH connection is to enable verbose logs in the SSH server. To increase the verbosity, set LogLevel VERBOSE or LogLevel DEBUG1 in your /etc/ssh/sshd_config file. See Logging_and_Troubleshooting for details.
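Before changing anything, it can help to see which values the SSH server currently has for the options discussed on this page. The following is a simplified sketch, not a full sshd_config parser: it assumes the plain "Keyword value" form, relies on the documented rule that the first value found for a keyword wins, and stops at the first Match block. It ignores Include directives and the "Keyword=value" form, and the function name is hypothetical.

```python
def effective_sshd_options(config_text,
                           keywords=("LogLevel", "ClientAliveInterval", "TCPKeepAlive")):
    """Return the first global value for each requested sshd_config keyword."""
    wanted = {k.lower(): k for k in keywords}
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        key = parts[0].lower()  # keywords are case-insensitive
        if key == "match":
            break  # conditional blocks are out of scope for this sketch
        if key in wanted and wanted[key] not in found and len(parts) == 2:
            found[wanted[key]] = parts[1].strip()
    return found


sample = "# example\nLogLevel VERBOSE\nClientAliveInterval 60\n"
print(effective_sshd_options(sample))
```

A keyword that is missing from the result is not set in the global section, so the server falls back to its built-in default for it.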
Threads stuck at CredentialsProvider.trackAll
If you detect an abnormal number of threads and the thread dump shows a thread for each offline agent stuck waiting for a lock like this:
at hudson.XmlFile.write(XmlFile.java:186)
at hudson.model.Fingerprint.save(Fingerprint.java:1301)
at hudson.model.Fingerprint.save(Fingerprint.java:1245)
locked hudson.model.Fingerprint@40e3a4f1
at hudson.BulkChange.commit(BulkChange.java:98)
at com.cloudbees.plugins.credentials.CredentialsProvider.trackAll(CredentialsProvider.java:1533)
at com.cloudbees.plugins.credentials.CredentialsProvider.track(CredentialsProvider.java:1478)
at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:856)
locked hudson.plugins.sshslaves.SSHLauncher@57dc4a8a
...
You may want to disable credentials tracking by setting the property -Dhudson.plugins.sshslaves.SSHLauncher.trackCredentials=false in the Jenkins properties. It can also be set at runtime by executing the following code in the Jenkins script console, but the change is not permanent.
System.setProperty("hudson.plugins.sshslaves.SSHLauncher.trackCredentials","false");
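To judge whether a thread dump matches this pattern, you can count how many threads have the trackAll frame on their stack. This sketch assumes a jstack-style text dump in which threads are separated by blank lines; the function name is hypothetical.

```python
import re

TRACK_ALL = "com.cloudbees.plugins.credentials.CredentialsProvider.trackAll"


def count_threads_in_frame(thread_dump, frame=TRACK_ALL):
    """Count thread blocks in a text thread dump whose stack contains the frame."""
    # Assumption: each thread's stack is a block separated by blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", thread_dump) if b.strip()]
    return sum(1 for block in blocks if frame in block)
```

A count close to the number of offline agents is consistent with the situation described above.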
1.29.0 Breaks compatibility with Cloud plugins that do not use trilead-api plugin as dependency
SSH Build Agents Plugin no longer uses the trilead-ssh2 module from the Jenkins core, so plugins that depend on SSH Build Agents Plugin must include the trilead-api plugin as a dependency until all of them have switched to this dependency. If you find this issue with one of your cloud plugins, please report it and downgrade SSH Build Agents Plugin to <1.28.1 until the dependency is added to your cloud plugin.
SSHLauncher{host='192.168.1.100', port=22, credentialsId='b6a4fe2c-9ba5-4052-b91c-XXXXXXXXX', jvmOptions='-Xmx256m', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.ManuallyTrustedKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=false}
[11/20/18 00:29:56] [SSH] Opening SSH connection to 192.168.1.100:22.
[11/20/18 00:29:57] [SSH] SSH host key matches key seen previously for this host. Connection will be allowed.
ERROR: Unexpected error in launching a agent. This is probably a bug in Jenkins.
java.lang.NoClassDefFoundError: com/trilead/ssh2/Connection
at com.cloudbees.jenkins.plugins.sshcredentials.impl.TrileadSSHPasswordAuthenticator$Factory.supports(TrileadSSHPasswordAuthenticator.java:194)
at com.cloudbees.jenkins.plugins.sshcredentials.impl.TrileadSSHPasswordAuthenticator$Factory.newInstance(TrileadSSHPasswordAuthenticator.java:181)
at com.cloudbees.jenkins.plugins.sshcredentials.SSHAuthenticator.newInstance(SSHAuthenticator.java:216)
at com.cloudbees.jenkins.plugins.sshcredentials.SSHAuthenticator.newInstance(SSHAuthenticator.java:170)
at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1213)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:846)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:833)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[11/20/18 00:29:57] Launch failed - cleaning up connection
[11/20/18 00:29:57] [SSH] Connection closed.
After upgrade to ssh-slaves 1.28+ Failed to connect using SSH key credentials from files
SSH Build Agents Plugin versions newer than 1.28 use ssh-credentials 1.14. That version deprecated the «From the Jenkins controller ~/.ssh» and «From a file on Jenkins controller» SSH credential types because of SECURITY-440. The ssh-credentials plugin should migrate these deprecated credentials to the «Enter directly» type on restart, but it seems there are some cases where this fails or is not possible.
The issue is related to ssh-credentials and a deprecated type of credentials. The workaround is to recreate the credential with the same ID using «Enter directly» for the key; simply saving the credential again will probably migrate it as well.
For more details see JENKINS-54746.
Agent log
SSHLauncher{host='HOSTNAME', port=22, credentialsId='XXXXXX', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.ManuallyTrustedKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[11/21/18 09:40:05] [SSH] Opening SSH connection to HOSTNAME:22.
[11/21/18 09:40:05] [SSH] SSH host key matches key seen previously for this host. Connection will be allowed.
[11/21/18 09:40:05] [SSH] Authentication failed.
Authentication failed.
[11/21/18 09:40:05] Launch failed - cleaning up connection
[11/21/18 09:40:05] [SSH] Connection closed.
Manage old data
ConversionException: Could not call com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey$UsersPrivateKeySource.readResolve() : anonymous is missing the Overall/RunScripts permission : Could not call com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey$UsersPrivateKeySource.readResolve() : anonymous is missing the Overall/RunScripts permission ---- Debugging information ---- message : Could not call com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey$UsersPrivateKeySource.readResolve() : anonymous is missing the Overall/RunScripts permission cause-exception : com.thoughtworks.xstream.converters.reflection.ObjectAccessException cause-message : Could not call com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey$UsersPrivateKeySource.readResolve() : anonymous is missing the Overall/RunScripts permission class : com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey$UsersPrivateKeySource required-type : com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey$UsersPrivateKeySource converter-type : hudson.util.RobustReflectionConverter path : /com.cloudbees.plugins.credentials.SystemCredentialsProvider/domainCredentialsMap/entry/java.util.concurrent.CopyOnWriteArrayList/com.cloudbees.jenkins.plugins.sshcredentials.impl.BasicSSHUserPrivateKey/privateKeySource line number : 21 -------------------------------
Selenium Grid agents failed to connect
Agent → Controller Access Control was introduced in Jenkins Core 2.89.3, and it can cause an issue with Selenium Grid agents. To fix this problem, disable the «Enable Agent → Controller Access Control» option under «Manage Jenkins» > «Configure Global Security» (as described in the Jenkins documentation).
You can see more details at JENKINS-49118.
https://wiki.jenkins.io/display/JENKINS/Slave+To+Master+Access+Control
«On the other hand, if all your agents are trusted to the same degree as your master, then it is safe to leave this subsystem off»
Apr 03, 2019 9:46:01 AM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
WARNING: Attempt to (de-)serialize anonymous class hudson.plugins.selenium.configuration.DirectJsonInputConfiguration$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
Apr 03, 2019 9:46:06 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
INFO: I/O error in channel channel
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Corrupt agent workdir folder
If you experience an immediate disconnection without any clear trace, it could be related to a corrupt file in the agent workdir folder.
Apr 17, 2019 2:16:23 PM INFO hudson.remoting.SynchronousCommandTransport$ReaderThread run
When attempting to connect an agent using "Launch Agent via SSH", you get the following error:
[04/17/19 16:27:27] [SSH] Checking java version of /home/jenkins/jdk/bin/java
[04/17/19 16:27:28] [SSH] /home/jenkins/jdk/bin/java -version returned 1.8.0_191.
[04/17/19 16:27:28] [SSH] Starting sftp client.
[04/17/19 16:27:28] [SSH] Copying latest remoting.jar...
[04/17/19 16:27:28] [SSH] Copied 776,717 bytes.
Expanded the channel window size to 4MB
[04/17/19 16:27:28] [SSH] Starting agent process: cd "/home/jenkins" && /home/jenkins/jdk/bin/java -jar remoting.jar -workDir /home/jenkins
Apr 17, 2019 4:27:28 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/remoting as a remoting work directory
Both error and output logs will be printed to /home/jenkins/remoting
<===[JENKINS REMOTING CAPACITY]===>channel started
Remoting version: 3.27
This is a Unix agent
Evacuated stdout
Agent JVM has not reported exit code. Is it still running?
[04/17/19 16:27:33] Launch failed - cleaning up connection
[04/17/19 16:27:33] [SSH] Connection closed.
ERROR: Connection terminated
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2678)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3153)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:861)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:357)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Try to connect the agent via «command on the master». If you see the following error, wipe out the workdir and the issue should be resolved.
Unable to launch the agent for *************
java.io.IOException: Invalid encoded sequence encountered: 3D 3D 5B 4A 45 4E 4B 49 4E 53 20 52 45 4D 4F 54 49 4E 47 20 43 41 50 41 43 49 54 59 5D 3D 3D 3D 3E 72 4F 30 41 42 58 4E 79 41 42 70 6F 64 57 52 7A 62 32 34 75 63 6D 56 74 62 33 52 70 62 6D 63 75 51 32 46 77 59 57 4A 70 62 47 6C 30 65 51 41 41 41 41 41 41 41 41 41 42 41 67 41 42 53 67 41 45 62 57 46 7A 61 33 68 77 41 41 41 41 41 41 41 41 41 66 34
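The hex bytes in this error are plain ASCII, and decoding the start of the sequence shows what the reader actually received. The sketch below decodes only the first 41 bytes from the message above. It does not explain why the stream was rejected, only what the bytes contain.

```python
import base64

# First 41 bytes of the "Invalid encoded sequence" from the error above.
HEX_PREFIX = (
    "3D 3D 5B 4A 45 4E 4B 49 4E 53 20 52 45 4D 4F 54 49 4E 47 20 "
    "43 41 50 41 43 49 54 59 5D 3D 3D 3D 3E 72 4F 30 41 42 58 4E 79"
)

decoded = bytes.fromhex(HEX_PREFIX).decode("ascii")
print(decoded)  # ==[JENKINS REMOTING CAPACITY]===>rO0ABXNy

# What follows the '>' is Base64; its first bytes are the Java
# serialization stream header (magic 0xACED, version 0x0005).
marker, _, payload = decoded.partition(">")
raw = base64.b64decode(payload)
print(raw[:4].hex())  # aced0005
```

In other words, the rejected bytes are the remoting capacity banner seen in the agent logs, followed by a Base64-encoded serialized Java object.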
Use a Remote root directory on a drive other than C:
The default configuration assumes the Remote root directory is on the C: drive, so the agent launch command will fail if the Remote root directory is on another drive. You can change the drive by using Prefix Start Agent Command: if you set Prefix Start Agent Command to cd /d D: && the shell changes to drive D: before entering the Remote root directory.
Issue
I have configured a docker cloud dynamic agent creation with the following settings:
and the agent config as follows:
but they fail to start. Been googling around but since I have little to nothing to go on with the logs, I don’t know how to fix this.
Connecting to docker container 2169d7b0f9de955c89916da421dc6e04f41104423d0fdcd28796162225cf491f, running command java -jar //remoting-4.11.2.jar -noReconnect -noKeepAlive -agentLog //agent.log
HTTP/1.1 101 UPGRADED
Content-Type: application/vnd.docker.raw-stream
Connection: Upgrade
Upgrade: tcp
Api-Version: 1.41
Docker-Experimental: false
Ostype: linux
Server: Docker/20.10.12 (linux)
ERROR: Unexpected error in launching an agent. This is probably a bug in Jenkins
Also: java.lang.Throwable: launched here
at hudson.slaves.SlaveComputer._connect(SlaveComputer.java:282)
at hudson.model.Computer.connect(Computer.java:440)
at hudson.slaves.SlaveComputer.doLaunchSlaveAgent(SlaveComputer.java:796)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:393)
at org.kohsuke.stapler.Function$InstanceFunction.invoke(Function.java:405)
at org.kohsuke.stapler.interceptor.RequirePOST$Processor.invoke(RequirePOST.java:77)
at org.kohsuke.stapler.PreInvokeInterceptedFunction.invoke(PreInvokeInterceptedFunction.java:26)
at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:208)
at org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:141)
at org.kohsuke.stapler.MetaClass$11.doDispatch(MetaClass.java:536)
at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
The last part of the logs says:
java.io.EOFException: unexpected stream termination
at hudson.remoting.ChannelBuilder.negotiate(ChannelBuilder.java:464)
at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:409)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:432)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:399)
at io.jenkins.docker.connector.DockerComputerAttachConnector$DockerAttachLauncher.launch(DockerComputerAttachConnector.java:321)
at hudson.slaves.DelegatingComputerLauncher.launch(DelegatingComputerLauncher.java:63)
at io.jenkins.docker.connector.DockerDelegatingComputerLauncher.launch(DockerDelegatingComputerLauncher.java:37)
at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:293)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
And in the Jenkins logs I see:
96a29780b14e340c1667f2 for node Jenkins JDK11 agent-00000a8utz442 from image: jenkins/agent:alpine
2022-01-27 22:56:32.628+0000 [id=188] INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: stderr from Jenkins JDK11 agent-00000a8utz442 (4611c8568d1b73e6a8b7e54613f7ef06c79210f03a96a29780b14e340c1667f2): Jan 27, 2022 10:56:32 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Using /agent.log as an agent error log destination; output log will not be generated
2022-01-27 22:56:32.628+0000 [id=188] INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: stderr from Jenkins JDK11 agent-00000a8utz442 (4611c8568d1b73e6a8b7e54613f7ef06c79210f03a96a29780b14e340c1667f2): Exception in thread "main"
2022-01-27 22:56:32.629+0000 [id=188] INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: stderr from Jenkins JDK11 agent-00000a8utz442 (4611c8568d1b73e6a8b7e54613f7ef06c79210f03a96a29780b14e340c1667f2): java.io.FileNotFoundException: /agent.log (Permission denied)
2022-01-27 22:56:32.629+0000 [id=188] INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: stderr from Jenkins JDK11 agent-00000a8utz442 (4611c8568d1b73e6a8b7e54613f7ef06c79210f03a96a29780b14e340c1667f2): at java.base/java.io.FileOutputStream.open0(Native Method)
2022-01-27 22:56:32.629+0000 [id=188] INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: stderr from Jenkins JDK11 agent-00000a8utz442 (4611c8568d1b73e6a8b7e54613f7ef06c79210f03a96a29780b14e340c1667f2): at java.base/java.io.FileOutputStream.open(Unknown Source)
2022-01-27 22:56:32.629+0000 [id=188] INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: stderr from Jenkins JDK11 agent-00000a8utz442 (4611c8568d1b73e6a8b7e54613f7ef06c79210f03a96a29780b14e340c1667f2): at java.base/java.io.FileOutputStream.<init>(Unknown Source)
Which is actually not that much info either.
Solution
In the agent configuration, set the Remote File System Root field to /home/jenkins. I had the same problem and this fixed it for me.
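The log above shows the agent failing to create /agent.log, which suggests it was working from a directory the container user cannot write to. As a rough way to check a candidate directory before saving the configuration, here is a minimal sketch. The helper name check_remote_root is made up for this example and is not part of Jenkins or the Docker plugin; it only tests whether the directory exists and is writable by the current user.

```shell
#!/bin/sh
# Hypothetical helper (not part of Jenkins): report whether a directory
# exists and is writable by the user running this script.
check_remote_root() {
    root="$1"
    if [ -d "$root" ] && [ -w "$root" ]; then
        echo "writable: $root"
    else
        echo "not writable: $root"
        return 1
    fi
}
```

Run inside the agent container as the agent user, `check_remote_root /home/jenkins` should report the directory as writable if the fix above applies to your image.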
Answered By — Alan de Almeida
Answer Checked By — Robin (JavaFixing Admin)
Prepare Java runtime
1. Download Java
2. Configure the Java environment variables on Windows
JAVA_HOME=C:\Program Files\Java\jdk1.8.0_201
CLASSPATH=.;%JAVA_HOME%\lib;%JAVA_HOME%\jre\lib
Create Node
1. From the Jenkins home page, go to Manage Nodes -> New Node and enter a node name, such as window-build-machine
2. Fill in the Windows agent settings
Items | Settings |
---|---|
Name | window-build-machine |
Description | used for windows build |
# of executors | 1 |
Remote root directory | C:\agent |
Labels | windows, build |
Usage | Use this node as much as possible |
Launch method | Let Jenkins control this Windows slave as a Windows service |
Administrator user name | .\Administrator |
Password | mypassword |
Host | 192.168.1.111 |
Run service as | Use Administrator account given above |
Availability | Keep this agent online as much as possible |
3. Save then Connect
[2019-05-11 01:32:50] [windows-slaves] Connecting to 192.168.1.111
Checking if Java exists
java -version returned 1.8.0.
[2019-05-11 01:32:50] [windows-slaves] Copying jenkins-slave.xml
[2019-05-11 01:32:50] [windows-slaves] Copying slave.jar
[2019-05-11 01:32:50] [windows-slaves] Starting the service
[2019-05-11 01:32:50] [windows-slaves] Waiting for the service to become ready
[2019-05-11 01:32:55] [windows-slaves] Connecting to port 52,347
<===[JENKINS REMOTING CAPACITY]===>Remoting version: 3.29
This is a Windows agent
Agent successfully connected and online
Troubleshooting
Below are the issues I ran into and how I fixed them.
1. ERROR: Message not found for errorCode: 0xC00000AC
You need to install a JDK and configure the Java environment variables.
2. Error when adding a Windows node as a Windows service
See JENKINS-16418.
3. org.jinterop.dcom.common.JIException: Message not found for errorCode: 0x00000005
Fix the permissions on the following registry keys:
- HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Wow6432Node\CLSID\{72C24DD5-D70A-438B-8A42-98424B88AFB8}
- HKEY_CLASSES_ROOT\CLSID\{76A64158-CB41-11D1-8B02-00600806D9B6}
Steps to fix it
- Open ‘regedit’ (as Administrator) and find (Ctrl+F) the registry key “{72C24DD5-D70A-438B-8A42-98424B88AFB8}” under HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Wow6432Node\CLSID
- Right-click it and select ‘Permissions’, then change the owner to the Administrators group (Advanced…).
- Change the permissions for the Administrators group: grant Full Control.
- Change the owner back to TrustedInstaller (the user is “NT Service\TrustedInstaller” on the local machine)
Repeat the above steps to fix the permissions for HKEY_CLASSES_ROOT\CLSID\{76A64158-CB41-11D1-8B02-00600806D9B6}
Finally, restart the Remote Registry service (Administrative Tools / Services).
4. ERROR: Unexpected error in launching an agent
This is probably a bug in Jenkins.
- Log in to the remote machine, open Services, and find jenkinsslave-C__agent
- Set Startup type to Automatic
- On the Log On tab, select This account and enter the correct account name and password
- Start jenkinsslave-C__agent
5. Caused by: org.jinterop.dcom.common.JIRuntimeException: Message not found for errorCode: 0x800703FA
If your slave is running under a domain account and you get error code 0x800703FA, change a group policy:
- Open the group policy editor (gpedit.msc)
- Go to Computer Configuration -> Administrative Templates -> System -> User Profiles and open “Do not forcefully unload the user registry at user logoff”
- Change the setting from “Not Configured” to “Enabled”, which disables the new User Profile Service feature (‘DisableForceUnload’ is the value added to the registry)
6. ERROR: Message not found for errorCode: 0xC0000001 Caused by: jcifs.smb.SmbException: Failed to connect: 0.0.0.0<00>/10.xxx.xxx.xxx
You need to enable SMB1 on the Windows machine:
- Search in the start menu for ‘Turn Windows features on or off’ and open it.
- Find ‘SMB1.0/CIFS File Sharing Support’ in the list of optional features that appears, and select the checkbox next to it.
- Click OK and Windows will add the selected feature.
You’ll be asked to restart your computer as part of this process.
7. .NET Framework 2.0 or later is required on this computer to run a Jenkins agent as a Windows service
You need to upgrade the .NET Framework on the agent machine.
8. More problems connecting a Jenkins agent on Windows
Please refer to this link https://github.com/jenkinsci/windows-slaves-plugin/blob/master/docs/troubleshooting.adoc
Unaltered stdin, unaltered stdout
I believe something in your script is tampering with stdin.
Your script must pass the stdin stream, whole and unmodified, to the Jenkins agent process.
Generic solution
The OP's command for establishing the Jenkins session differs from mine, but you should still split your launch script into 3 main parts:
Set-up: do not touch stdin or stdout in this part.
Establish the Jenkins session: java -jar jenkins-cli.jar …
Tear-down: do not touch stdin or stdout in this part.
#!/bin/bash
function set_up {
# your set-up code here
: # no-op placeholder; bash rejects a function body that holds only comments
}
function tear_down {
# your tear-down code here
: # no-op placeholder
}
function main {
# set-up (no stdin, no stdout)
set_up "$@" < /dev/null > /dev/null || exit $?
# establish Jenkins session
java -jar jenkins-cli.jar -blah -blah -blah
# tear-down (no stdin, no stdout)
tear_down "$@" < /dev/null > /dev/null || exit $?
}
main "$@"
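To try the three-part layout without a Jenkins installation, here is a small sketch that replaces the `java -jar jenkins-cli.jar` line with `cat`, which simply copies stdin to stdout. The bodies of set_up and tear_down are invented for illustration: set_up deliberately calls `read`, which would eat the first line of the stream if its stdin were not redirected. The function is named launch rather than main so that it can be called more than once, and it uses `return` instead of `exit` for the same reason.

```shell
#!/bin/sh
set_up() {
    # Without the redirect at the call site, this read would consume
    # the first line of the caller's stdin.
    read -r _ignored || true
    echo "set-up noise"
}

tear_down() {
    echo "tear-down noise"
}

launch() {
    # set-up (no stdin, no stdout)
    set_up < /dev/null > /dev/null || return $?
    # stand-in for the Jenkins session: copy stdin to stdout unchanged
    cat
    # tear-down (no stdin, no stdout)
    tear_down < /dev/null > /dev/null || return $?
}
```

Piping two lines into `launch` should give the same two lines back, with none of the set-up or tear-down output mixed in.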
But why?
The job of your launch script is to establish a clear communication channel (via stdin and stdout) between the master and the build agent.
                 +------------+
   "Hello Agent" |            |
      _ _    ----+            +----
       v     Hello Agent ->
             ----+            +----
                 |            |
                 |            |   "Hello Master"
             ----+            +----     _ _
                <- Hello Master          v
             ----+            +----
                 |            |
                 +------------+
                     launch
                     script
If this communication channel is altered, Jenkins will not work.
                 +------------+
   "Hello Agent" |            |
      _ _    ----+            +-----------
       v     Hel PLZ SEND HELP!! t ->
             ----+            +-----------
                 |            |
                 |            |
             ----+            +----     | |
                                         ^
             ----+            +----
                 |            |
                 +------------+
                     launch
                     script
Some Unix commands will "swallow" your launch script's stdin if you do not pipe anything into them, and so "corrupt" the communication channel. Consider the following script.
#!/bin/bash
function keep_stdin_intact {
printf 'I do not consume any stdin, ' >&2
echo 'and I do not alter the original stdout.' >&2
}
function swallow_stdin {
echo 'I swallow stdin. Did you see any hexdump below?' >&2
read yn # read consumed some stdin
}
echo 'yes' | { keep_stdin_intact; cat -; } | xxd
echo 'yes' | { swallow_stdin; cat -; } | xxd
echo "no you can't now :P" | { swallow_stdin < /dev/null; cat -; } | xxd
- The first yes was piped through and hex-dumped, because keep_stdin_intact did not touch stdin, which in this case is the "yes" stream.
- The second yes disappeared because swallow_stdin consumed it, so cat had nothing to copy and xxd had nothing to dump.
- By connecting /dev/null to the stdin-swallowing command, we protected our own stdin.
What about ssh?
ssh is one of those evil commands that swallow your stdin.
Say you want to delete some files on the build agent before launching agent.jar. Naively, you might be tempted to write:
ssh $OPTIONS "$remote" 'sudo rm -rf /var/log/nginx/*'
ssh $OPTIONS "$remote" 'cd $HOME && java -jar agent.jar'
^ But this is wrong! The first ssh command will swallow your stdin, and the Jenkins session will have nothing left to read.
The first ssh has to be "muted". Give it /dev/null as its stdin.
ssh $OPTIONS "$remote" 'sudo rm -rf /var/log/nginx/*' < /dev/null
ssh $OPTIONS "$remote" 'cd $HOME && java -jar agent.jar'
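When a launch script runs several preparatory commands, it is easy to forget the redirect on one of them. One possible way to apply it uniformly is a small wrapper; the name no_stdin is made up for this sketch. For ssh specifically, the -n option also redirects stdin from /dev/null.

```shell
#!/bin/sh
# Hypothetical wrapper: run any command with its stdin detached,
# so it cannot consume the stream meant for the Jenkins agent.
no_stdin() {
    "$@" < /dev/null
}

# Possible use in a launch script (OPTIONS and remote as in the example above):
#   no_stdin ssh $OPTIONS "$remote" 'sudo rm -rf /var/log/nginx/*'
#   ssh $OPTIONS "$remote" 'cd $HOME && java -jar agent.jar'
```

Only the preparatory commands go through the wrapper. The final ssh that starts agent.jar must keep the original stdin, because that is the channel the master talks over.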
user9587350, August 12, 2018, 06:59