Opsfire: Recovering Jenkins after Complete Failure
Hold on to your seats
Cause this is going to be a bit of a ride. This is a tale of explosions, defeat, perseverance, and ultimate victory.
I'm giving you a spoiler for the ultimate victory because, like all situations of prolonged pain, there were several points where I didn't think we were going to get there.
Hooked? Here's how it began
We were having an issue where some of our Jenkins jobs were hanging on the git clone. This is something relatively new that started happening this week, so after manually killing and kicking a couple of jobs until they Just Worked (already forgetting the HTML/PEM file lesson) I decided to do a rollback.
This wasn't too huge of a deal, really. We use the weekly Jenkins builds, and I keep the previous week's build handy, so:
sudo service jenkins stop
sudo cp ~/backup/jenkins-v2.103.war /usr/lib/jenkins/jenkins.war
sudo service jenkins start
Everything came up normally, except for the fact that the jobs still hung on git clone. Oh well. Rinse and repeat the above, replacing jenkins-v2.103.war with jenkins-v2.104.war. Everything came back live, nothing to see here.
A few of the plugins were upgraded this week as well. Since one of them was the Github plugin, and the issue was with git clone, I figured I'd try rolling that back first.
Feel free to click so you can view this insanity in all its glory.
Source: Giphy, Firefly
But wait, what?
It's worth now noting what fired through my brain in rapid succession:
- We use Github Oauth
- This appears to be something with Github core
- How could a plugin rollback do this
- I knew this box was fragile, I should have snapshotted it before
- Is there a snapshot?
- ...Of course the last one is from freaking December.
What to do: Act 1, Scenes 1-3
My first instinct, as any battle hardened person will tell you, was
to
everything.
Filtering out the seemingly unending volumes of advice about how to roll plugin versions forward and backward, using the UI thank you very much for that, I found some advice about how to install plugins using their CLI.
YIL that Jenkins has a CLI.
I went to find where to download it, but lo: you need a working version of Jenkins to download the CLI. The version of the CLI depends on the Jenkins release version as well, and you can only download it from your actual Jenkins install, e.g.
wget -O /desired/path/to/jenkins-cli.jar https://${JENKINS_URL}/jnlpJars/jenkins-cli.jar
You can also go to the latter path in your browser if your installation is up and running.
Which mine was not.
Oh! I know! I'll make a fresh Jenkins box with the same version and download the CLI from there!
I'm not going to detail this part for you (yet), but stay tuned. I got the CLI, but primary Jenkins was so hosed that pointing the CLI at it threw a Java exception. Even if this hadn't happened, though, I was reminded when I created a sandbox Jenkins that I probably still would have run into difficulties without the exception since Github auth was still enabled and I would have needed to disable it or create a token for the CLI. Which I would have needed access to the UI to do. (Jenkins is really reliant on having that UI up and running.)
It's worth mentioning that while all that was going on I had concurrently attempted to spin up a new instance using an AMI I had made from the December image, but when I tried to start Jenkins on that instance it also died and threw exceptions. Not in the web browser, though: there I was just presented with a lovely site unreachable error. I ssh'ed into the instance to see Jenkins' logs (/var/log/jenkins/jenkins.log) and there were Java exceptions everywhere. Also, importantly, errors referencing missing jobs.
Source: Gunnerkrigg Court. It's an awesome web comic.
It was at this point I practiced some deep breathing.
What to do: Act 2
So I was half hoping that at least some of the exceptions could be handled by copying over the jobs from the dying dying dead Jenkins to the Dec Jenkins. I did this by shutting off the dead one's EC2 instance, detaching the volume, and attaching it to the December Jenkins instance. Amazon has how to attach a volume in the console documented very well. It's worth noting you use essentially the same process, when the instances in question are powered off, to detach the volume.
I think it's important to clarify again, since you may encounter instructions for other ways to unmount disk volumes while a system is powered on, e.g. this EBS doc by Amazon, that in this case those instructions will not work, as you can't (safely) detach the root and only volume from a running system.
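The same stop, detach, attach sequence can also be scripted. Here is a minimal sketch with the AWS CLI, not the console steps described above. The instance and volume IDs are placeholders, run and move_volume are hypothetical helper names, and the /dev/sdf device name is an assumption (Amazon Linux typically exposes it as /dev/xvdf). DRY_RUN defaults to 1, so the sketch only prints the commands until you turn that off.

```shell
#!/bin/bash
# Sketch: move an EBS volume from a stopped instance to another instance.
# run and move_volume are illustrative helper names, not part of the AWS CLI.

# Print the command when DRY_RUN=1 (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

move_volume() {
  local old_instance="$1"
  local volume="$2"
  local new_instance="$3"
  local device="${4:-/dev/sdf}"   # assumed device name

  # The root volume can only be detached safely once the instance is stopped.
  run aws ec2 stop-instances --instance-ids "${old_instance}"
  run aws ec2 wait instance-stopped --instance-ids "${old_instance}"
  run aws ec2 detach-volume --volume-id "${volume}"
  run aws ec2 wait volume-available --volume-ids "${volume}"
  run aws ec2 attach-volume --volume-id "${volume}" \
    --instance-id "${new_instance}" --device "${device}"
}

# Usage (placeholder IDs):
#   move_volume i-0123456789abcdef0 vol-0123456789abcdef0 i-0fedcba9876543210
#   DRY_RUN=0 move_volume i-0123456789abcdef0 vol-0123456789abcdef0 i-0fedcba9876543210
```

Printing first is a deliberate choice here: detaching the wrong volume is not something you want to discover after the fact.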
Anywho, after the volume is attached then just create the mount point, get volume list, mount volume:
→ sudo mkdir /jenkins-defunct
→ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 512G 0 disk
└─xvda1 202:1 0 512G 0 part /
xvdf 202:2 0 512G 0 disk
└─xvdf1 202:3 0 512G 0 part /
→ sudo mount /dev/xvdf1 /jenkins-defunct
Breathing. Ok.
Now it's time to rsync.
→ sudo mv /var/lib/jenkins/jobs{,--bkp}
→ sudo rsync -a /jenkins-defunct/var/lib/jenkins/jobs /var/lib/jenkins/jobs
And now we wait.
And wait.
*** Skipping any contents from this failed directory ***
rsync: recv_generator: mkdir "/var/lib/jenkins/jobs/jobs/${SOME_PLACEHOLDER_JOB}/workspace" failed: No space left on device (28)
*** Skipping any contents from this failed directory ***
rsync: mkstemp "/var/lib/jenkins/jobs/jobs/${SOME_PLACEHOLDER_JOB}/.config.xml.dovJOF" failed: No space left on device (28)
rsync: mkstemp "/var/lib/jenkins/jobs/jobs/${SOME_PLACEHOLDER_JOB}/.disk-usage.xml.xnYWQ8" failed: No space left on device (28)
rsync: mkstemp "/var/lib/jenkins/jobs/jobs/${SOME_PLACEHOLDER_JOB}/.github-polling.log.RVYdTB" failed: No space left on device (28)
rsync: mkstemp "/var/lib/jenkins/jobs/jobs/${SOME_PLACEHOLDER_JOB}/.nextBuildNumber.EtLAV4" failed: No space left on device (28)
rsync: recv_generator: mkdir "/var/lib/jenkins/jobs/jobs/${SOME_PLACEHOLDER_JOB}/workspace@tmp" failed: No space left on device (28)
*** Skipping any contents from this failed directory ***
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
After a couple hours of seemingly silent, copying bliss I was abruptly introduced to a humanly uncountable number of lines like that.
But wait, I filled up the whole drive? With jobs?
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 504G 276G 229G 55% /
devtmpfs 7.9G 64K 7.9G 1% /dev
tmpfs 7.9G 0 7.9G 0% /dev/shm
/dev/xvdf1 504G 258G 247G 52% /jenkins-defunct
I am using... half the drive. What.
One of my coworkers offered to ssh in at this point to see if he saw anything.
But he could not; he was given an out of disk error.
I looked at CloudWatch metrics for the alarm and thankfully df didn't lie there: using half the disk.
What gives?
inodes
The short version, since they aren't the focus here, is that inodes store file and directory metadata on *nix filesystems. To the passerby, they become important when you have a ton of tiny files, as Jenkins clearly does, and you run out:
$ df -ih
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/xvda1 32M 32M 0 100% /
devtmpfs 2.0M 477 2.0M 1% /dev
tmpfs 2.0M 1 2.0M 1% /dev/shm
/dev/xvdf1 32M 27M 5.7M 83% /jenkins-defunct
(If you'd like to read more on what inodes are, please check out this Linux Magazine article.)
Of course I ran out of inodes.
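For anyone who wants to catch this before it bites, here is a small sketch of two helpers: one that flags mount points with high inode usage from df -i style output, and one that counts filesystem entries per subdirectory to show where the inodes went. The function names and the default threshold of 90 are my own illustrative choices, and the entry count is only an approximation of inode usage (hard links, for example, are counted more than once).

```shell
#!/bin/bash
# Sketch: helpers for spotting inode exhaustion. Names are illustrative.

# Reads `df -i` style output on stdin and prints every mount point whose
# IUse% column is at or above the threshold (default: 90).
mounts_over_inode_threshold() {
  local threshold="${1:-90}"
  awk -v limit="${threshold}" '
    NR > 1 && $5 ~ /^[0-9]+%$/ {
      use = $5
      sub(/%/, "", use)
      if (use + 0 >= limit + 0) print $6
    }
  '
}

# Counts entries (files, directories, symlinks) under each immediate
# subdirectory of the given directory, largest first.
count_entries_by_subdir() {
  local top_dir="$1"
  local sub_dir
  for sub_dir in "${top_dir}"/*/; do
    [ -d "${sub_dir}" ] || continue
    # -xdev keeps find from wandering onto other mounted filesystems.
    printf '%s %s\n' "$(find "${sub_dir}" -xdev | wc -l | tr -d ' ')" "${sub_dir%/}"
  done | sort -rn
}

# Usage:
#   df -Pi | mounts_over_inode_threshold 90
#   count_entries_by_subdir /var/lib/jenkins/jobs | head
```

Fed the df -ih output above, the first helper would report / at a threshold of 90, and both / and /jenkins-defunct at a threshold of 80.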
The quickest way to address this problem is to resize the disk. This is the option I went with, since this is production Jenkins and we're now several hours into this outage.
Since this is a short term solution, I stopped the instance, resized the root volume to 2 TB, rebooted, and restarted the rsync.
By the way, here is an example of why to rsync rather than cp: the latter would just completely start over and write over what it had already done as needed, taking more time, whereas rsync will pick up where it left off.
That said, it still took another hour or two for the sync to complete.
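One more rsync detail, since the error output earlier shows paths under /var/lib/jenkins/jobs/jobs: when the source has no trailing slash, rsync copies the directory itself into the destination, which adds a level. With a trailing slash on the source, only the contents are copied. The sketch below demonstrates both in a throwaway temp directory; the function name and the example-job and config.xml contents are made up for the demo.

```shell
#!/bin/bash
# Sketch: where files land with and without a trailing slash on the rsync
# source. Everything happens under the directory passed in as $1.
rsync_trailing_slash_demo() {
  local demo_root="$1"
  mkdir -p "${demo_root}/src/jobs/example-job" "${demo_root}/nested" "${demo_root}/flat"
  echo "<project/>" > "${demo_root}/src/jobs/example-job/config.xml"

  # No trailing slash: the jobs directory itself is copied INTO the target,
  # producing nested/jobs/jobs/example-job/config.xml
  rsync -a "${demo_root}/src/jobs" "${demo_root}/nested/jobs"

  # Trailing slash: only the contents of jobs are copied,
  # producing flat/jobs/example-job/config.xml
  rsync -a "${demo_root}/src/jobs/" "${demo_root}/flat/jobs/"

  # List where config.xml ended up in each case.
  (cd "${demo_root}" && find nested flat -name config.xml | sort)
}

# Usage:
#   rsync_trailing_slash_demo "$(mktemp -d)"
```

If you want the destination to mirror the source directory exactly, the trailing-slash form is usually the one you are after.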
While this was going on, I was pondering my next move. Also: getting really tired as it was late now.
What to do: Act 3, all the climactic scenes
I read a bit more on Jenkins admin. Since everything should be hypothetically recoverable from JENKINS_HOME, I decided to try re-installing all the plugins on a new Jenkins instance (don't ask me how many I have running now), and then copy that instance's /var/lib/jenkins/plugins directory back to the original.
This ultimately ended in failure, but I ended up with a very useful shell script that allowed me to install plugins very quickly.
Another spoiler: vim tricks are super handy.
Spinning up Jenkins on a Linux AMI
I'm going to go a little detailed here for those new to installing Jenkins.
The instance configuration:
- Instance Type: m5.xlarge
- Security: security group with ports 22, 80, 8080, 443, and 4443 open / accessible on our VPC and via VPN.
- EBS type: 512 GB GP2
  - In hindsight, though, I recommend using IO1. Cost is similar and would help speed up rsync later.
- AMI: Amazon Linux AMI 2017.09.1 (HVM)
ssh into the instance and:
- Update
- Install some basic tools
- Create a group
- Use visudo to enable passwordless sudo
- Create your user
- Add your user to the group
- Switch to the user account
- Install a handy prompt and useful rc files
Here we go.
$ sudo yum upgrade -y
$ sudo yum install git tmux tree htop ack unzip -y
$ sudo groupadd admin
$ sudo useradd quintessence
$ sudo usermod -aG admin quintessence
$ sudo EDITOR=vim visudo
$ sudo su - quintessence
Last login: Fri Feb 2 16:49:01 UTC 2018 on pts/0
[quintessence@ip-███-███-███-███ ~]$ sudo ls /etc/
acpi blkid csh.cshrc
{{{ snip }}}
[quintessence@ip-███-███-███-███ ~]$ git clone https://github.com/jhunt/env
Cloning into 'env'...
remote: Counting objects: 713, done.
remote: Total 713 (delta 0), reused 0 (delta 0), pack-reused 713
Receiving objects: 100% (713/713), 128.96 KiB | 18.42 MiB/s, done.
Resolving deltas: 100% (419/419), done.
[quintessence@ip-███-███-███-███ ~]$ cd env/
[quintessence@ip-███-███-███-███ env]$ ./install
setting up dot files in ~
configuring vim...
copying in ~/bin scripts...
installing jq...
installing spruce (v1.8.2)...
configuring git...
setting up ~/.bashrc...
hostname: No address associated with name
[quintessence@ip-███-███-███-███ env]$ cd ..
[quintessence@ip-███-███-███-███ ~]$ vim .host
[quintessence@ip-███-███-███-███ ~]$ source ~/.bashrc
+033+16:52:54:8:0 ███.███.███.███/20 quintessence@jenkins ~
→
For visudo, here's the magic line to allow admin group passwordless sudo:
%admin ALL=(ALL) NOPASSWD: ALL
As a quick aside: the .host file is used by the prompt to display the hostname if none is set, which is the case for this dev instance. Right now it just has jenkins in it and that's what you see in the prompt above. For the remainder of the blog post when you see →, that's just part of my prompt.
Now for the Jenkins install
I'm going to use yum to install the last stable release of Jenkins to make sure that loads. Here's the needful, spaced out with output:
→ sudo wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat-stable/jenkins.repo
--2018-02-02 16:53:25-- http://pkg.jenkins-ci.org/redhat-stable/jenkins.repo
Resolving pkg.jenkins-ci.org (pkg.jenkins-ci.org)... 52.202.51.185
Connecting to pkg.jenkins-ci.org (pkg.jenkins-ci.org)|52.202.51.185|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 85
Saving to: ‘/etc/yum.repos.d/jenkins.repo’
/etc/yum.repos.d/jenkins.repo 100%[===================================================================================================================================================================>] 85 --.-KB/s in 0s
2018-02-02 16:53:26 (30.9 MB/s) - ‘/etc/yum.repos.d/jenkins.repo’ saved [85/85]
→ sudo rpm --import http://pkg.jenkins-ci.org/redhat-stable/jenkins-ci.org.key
→ sudo yum install jenkins -y
Loaded plugins: priorities, update-motd, upgrade-helper
amzn-main | 2.1 kB 00:00:00
amzn-updates | 2.5 kB 00:00:00
jenkins | 2.9 kB 00:00:00
jenkins/primary_db | 23 kB 00:00:00
Resolving Dependencies
--> Running transaction check
---> Package jenkins.noarch 0:2.89.3-1.1 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================================================================================================================================================================================================================
Package Arch Version Repository Size
================================================================================================================================================================================================================================================================================
Installing:
jenkins noarch 2.89.3-1.1 jenkins 71 M
Transaction Summary
================================================================================================================================================================================================================================================================================
Install 1 Package
Total download size: 71 M
Installed size: 71 M
Downloading packages:
jenkins-2.89.3-1.1.noarch.rpm | 71 MB 00:00:01
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : jenkins-2.89.3-1.1.noarch 1/1
Verifying : jenkins-2.89.3-1.1.noarch 1/1
Installed:
jenkins.noarch 0:2.89.3-1.1
Complete!
When I ran sudo service jenkins start to start Jenkins, I received the following because Jenkins needs Java 8 and apparently Amazon Linux is shipping with Java 7:
→ sudo service jenkins start
Starting Jenkins Jenkins requires Java8 or later, but you are running 1.7.0_161-mockbuild_2017_12_19_23_46-b00 from /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.161.x86_64/jre
java.lang.UnsupportedClassVersionError: 51.0
at Main.main(Main.java:124)
[ OK ]
I'm going to remove Java 7 and install Java 8 (output not included for the yum commands):
→ sudo yum remove java-1.7.0-openjdk -y
→ sudo yum install java-1.8.0 -y
→ sudo service jenkins start
Starting Jenkins [ OK ]
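If you'd rather check the Java version before starting Jenkins than find out from the stack trace, something like the following sketch could work. It assumes the usual java -version output with the version in double quotes, handles both the old 1.x numbering (1.7.0_161, 1.8.0_151) and the newer scheme (11.0.2), and uses helper names I made up for illustration.

```shell
#!/bin/bash
# Sketch: turn a Java version string into its major version number.
# "1.7.0_161" -> 7, "1.8.0_151" -> 8 (old scheme), "11.0.2" -> 11 (new scheme).
java_major_version() {
  local version="$1"
  local first="${version%%.*}"
  if [ "${first}" = "1" ]; then
    # Old scheme: the major version is the second component.
    local rest="${version#1.}"
    echo "${rest%%.*}"
  else
    # New scheme: the major version is the first component; drop any suffix.
    echo "${first%%[!0-9]*}"
  fi
}

# Read the quoted version string from whichever java is first on the PATH.
# java -version writes to stderr, hence the redirect.
installed_java_version() {
  java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}

# Usage:
#   major="$(java_major_version "$(installed_java_version)")"
#   [ "${major}" -ge 8 ] || echo "Jenkins needs Java 8 or later, found ${major}" >&2
```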
I'm also going to add the jenkins service to start on boot:
→ sudo chkconfig jenkins on
Ok, now that I have all of that: does bare Jenkins load?
Yes.
This is the first moment of relief that I've had in hours.
Source: IconArchive
As a result, I breezed through the next bit:
- Unlocked Jenkins by grabbing the initial admin password as indicated on the splash page
- Selected to have Jenkins Install the Recommended Plugins
- Set up my Jenkins username + password (also part of the setup wizard)
- Made sure my Jenkins username matched my Github username to prevent redundancy when hooking up Github Oauth
- Made sure Jenkins loaded
This is the happy time. So happy, I'm gonna use that emoji again:
Ok, now to stop the service, update to the latest weekly build so it matches the desired environment, and then restart the bare Jenkins. Not anticipating any issues since this is still a bare environment. Skipping the step where I download the war file:
→ ls
bin code env jenkins-v2.104.war
→ sudo mv /usr/lib/jenkins/jenkins.war{,_old}
→ sudo cp jenkins-v2.104.war /usr/lib/jenkins/jenkins.war
→ sudo ls /usr/lib/jenkins/
jenkins.war jenkins.war_old
→ sudo service jenkins restart
On restart this looks mostly the same:
With a crucial difference:
At this point, I think we can upgrade to a full on smile:
Source: IconArchive
Plugin installs
I discovered very quickly that installing these plugins via the UI was going to be a nightmare because their search feature is a challenge. And not in the "what doesn't kill you makes you stronger" way. It was more like this:
Source: This LifeHacker AUS article. I'll admit to not verifying the quote because it fits my needs here.
I'll show you what I mean, and then I'll show you how I used the Jenkins CLI (remember that?) to get around it.
Searching ... Searching ... Searching ...
You can see the names of the plugins, as Jenkins understands them, in /var/lib/jenkins/plugins. One of the plugins we have is cloudbees-folder so let's try to find that with the UI and install it.
Searching for "folder" by itself was really generic, so I tried to search for "cloudbees". It was also unhelpful. I did happen to notice, though, the URL for the plugins is actually linked to the download directory:
This is somewhat helpful as it shows me to go to http://updates.jenkins-ci.org/download/plugins/ for the download list. Here, the plugins appear the same as they do once installed rather than the "friendly" or "long" names that they are given.
Which is great and all, and sure this is a lot easier to manage than that hot mess of a search, but how to install them?
Source: Emojipedia
Jenkins CLI to save the day
It is at this point that I remember the only thing stopping me from using the CLI before was that Jenkins was so hosed it wouldn't even talk to it.
That isn't the case now though. So:
→ wget -O ~/jenkins-cli.jar https://${JENKINS_PUBLIC_URL}/jnlpJars/jenkins-cli.jar
Swap out your Jenkins instance's route or public IP for ${JENKINS_PUBLIC_URL} and you're in business.
Once I have the CLI, I run it against the local Jenkins and just supply help to see if it has a help page:
→ java -jar jenkins-cli.jar -s http://127.0.0.1:8080/ help
ERROR: You must authenticate to access this Jenkins.
Jenkins CLI
Usage: java -jar jenkins-cli.jar [-s URL] command [opts...] args...
Options:
...
Ah, ok. I need to auth. This Jenkins doesn't have Github Oauth enabled, so I'm still using a username and password. This actually makes my CLI life easy, so I'm going to leave that alone and just run it with my username and password like so:
→ java -jar jenkins-cli.jar --username quintessence --password █████████████ -s http://127.0.0.1:8080/ help
Neither -s nor the JENKINS_URL env var is specified.
Jenkins CLI
Usage: java -jar jenkins-cli.jar [-s URL] command [opts...] args...
Options:
-s URL : the server URL (defaults to the JENKINS_URL env var)
{{{ SNIP }}}
→ export JENKINS_URL=http://127.0.0.1:8080/
→ java -jar jenkins-cli.jar help --username quintessence --password █████████████
add-job-to-view
Adds jobs to view.
build
Builds a job, and optionally waits until its completion.
cancel-quiet-down
Cancel the effect of the "quiet-down" command.
clear-queue
{{{ SNIP }}}
Now to test with cloudbees-folder:
→ java -jar jenkins-cli.jar install-plugin cloudbees-folder --username quintessence --password █████████████
Installing cloudbees-folder from update center
→ sudo service jenkins restart
Shutting down Jenkins [ OK ]
Starting Jenkins [ OK ]
→ sudo ls /var/lib/jenkins/plugins/cloud*
/var/lib/jenkins/plugins/cloudbees-folder.jpi
/var/lib/jenkins/plugins/cloudbees-folder:
images META-INF WEB-INF
→ sudo cat /var/lib/jenkins/plugins/cloudbees-folder/META-INF/MANIFEST.MF
Manifest-Version: 1.0
Archiver-Version: Plexus Archiver
Created-By: Apache Maven
Built-By: jglick
Build-Jdk: 1.8.0_151
Extension-Name: cloudbees-folder
Specification-Title: This plugin allows users to create "folders" to o
rganize jobs. Users can define custom taxonomies (like
by project type, organization type etc). Folders are nestable and
you can define views within folders. Maintained by CloudBees, Inc.
Implementation-Title: cloudbees-folder
Implementation-Version: 6.3
Group-Id: org.jenkins-ci.plugins
Short-Name: cloudbees-folder
Long-Name: Folders Plugin
Url: https://wiki.jenkins.io/display/JENKINS/CloudBees+Folders+Plugin
Compatible-Since-Version: 5.2
Plugin-Version: 6.3
Hudson-Version: 2.60.3
Jenkins-Version: 2.60.3
Plugin-Dependencies: credentials:2.1.11;resolution:=optional
Plugin-Developers:
You can click the image to view full size, note that the path is now to the plugin doc page.
Here we learn a few things:
- Plugin is installed
- "cloudbees-folder" is what is called the "short name"
- "Folders Plugin" is what's called the "long name", which is what the search feature uses. Not helpful if you only know this name after you find your plugin though.
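Since the short name and long name both live in each plugin's MANIFEST.MF, a quick way to map one to the other is to pull those fields out directly. This is a minimal sketch: the function name is made up, and it assumes the Short-Name, Long-Name, and Plugin-Version headers each fit on one line, as they do in the manifest above. It also strips carriage returns, since manifests can use CRLF line endings.

```shell
#!/bin/bash
# Sketch: print "short-name<TAB>long-name<TAB>version" for one plugin manifest.
plugin_names_from_manifest() {
  local manifest="$1"
  tr -d '\r' < "${manifest}" | awk -F': ' '
    # Everything after "Header: ", even if the value itself contains ": ".
    function value() { return substr($0, length($1) + 3) }
    $1 == "Short-Name"     { short_name = value() }
    $1 == "Long-Name"      { long_name = value() }
    $1 == "Plugin-Version" { plugin_version = value() }
    END { printf "%s\t%s\t%s\n", short_name, long_name, plugin_version }
  '
}

# Usage (run as a user who can read the Jenkins home):
#   for manifest in /var/lib/jenkins/plugins/*/META-INF/MANIFEST.MF; do
#     plugin_names_from_manifest "${manifest}"
#   done
```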
It works! Hooray!
Now for the rest.
First get the complete list of plugins from the /var/lib/jenkins/plugins directory of the defunct Jenkins instance.
→ cd /jenkins-defunct/var/lib/jenkins/plugins
→ ls -d */
ace-editor/ blueocean-github-pipeline/ cloudbees-folder/ git-changelog/ jenkins-multijob-plugin/ parameterized-trigger/ resource-disposer/ warnings/
amazon-ecr/ blueocean-git-pipeline/ cobertura/ git-client/ jira/ performance/ run-condition/ windows-slaves/
analysis-core/ blueocean-i18n/ codedeploy/ github/ jira-ext/ phabricator-plugin/ runscope/ workflow-aggregator/
ansicolor/ blueocean-jira/ command-launcher/ github-api/ jquery-detached/ pipeline-build-step/ saferestart/ workflow-api/
ant/ blueocean-jwt/ conditional-buildstep/ github-branch-source/ jsch/ pipeline-github-lib/ sauce-ondemand/ workflow-basic-steps/
antisamy-markup-formatter/ blueocean-personalization/ credentials/ github-oauth/ junit/ pipeline-graph-analysis/ schedule-build/ workflow-cps/
apache-httpcomponents-client-4-api/ blueocean-pipeline-api-impl/ credentials-binding/ github-organization-folder/ ldap/ pipeline-input-step/ scm-api/ workflow-cps-global-lib/
authentication-tokens/ blueocean-pipeline-editor/ cvs/ github-pr-comment-build/ liquibase-runner/ pipeline-milestone-step/ script-security/ workflow-durable-task-step/
aws-credentials/ blueocean-pipeline-scm-api/ disk-usage/ github-pullrequest/ mailer/ pipeline-model-api/ slack/ workflow-job/
aws-java-sdk/ blueocean-rest/ display-url-api/ git-server/ mapdb-api/ pipeline-model-declarative-agent/ sse-gateway/ workflow-multibranch/
BlazeMeterJenkinsPlugin/ blueocean-rest-impl/ docker-commons/ git-userContent/ matrix-auth/ pipeline-model-definition/ ssh/ workflow-scm-step/
blueocean/ blueocean-web/ docker-workflow/ greenballs/ matrix-project/ pipeline-model-extensions/ ssh-agent/ workflow-step-api/
blueocean-autofavorite/ bouncycastle-api/ durable-task/ handlebars/ maven-plugin/ pipeline-rest-api/ ssh-credentials/ workflow-support/
blueocean-bitbucket-pipeline/ branch-api/ envinject/ handy-uri-templates-2-api/ memegen/ pipeline-stage-step/ ssh-slaves/ ws-cleanup/
blueocean-commons/ build-environment/ envinject-api/ htmlpublisher/ mercurial/ pipeline-stage-tags-metadata/ structs/
blueocean-config/ build-monitor-plugin/ external-monitor-job/ icon-shim/ metrics/ pipeline-stage-view/ subversion/
blueocean-core-js/ build-timeout/ favorite/ jackson2-api/ momentjs/ plain-credentials/ token-macro/
blueocean-dashboard/ built-on-column/ feature-branch-notifier/ jacoco/ multi-branch-project-plugin/ port-allocator/ translation/
blueocean-display-url/ chucknorris/ ghprb/ javadoc/ multiple-scms/ postbuild-task/ variant/
blueocean-events/ cloudbees-bitbucket-branch-source/ git/ jenkins-design-language/ pam-auth/ pubsub-light/ violations/
Don't forget to scroll sideways...
At this point you may also realize "Oh great, all I need to do is run this":
java -jar jenkins-cli.jar install-plugin ${PLUGIN_NAME} --username quintessence --password █████████████
....for all of these?
Source: IconArchive
Itty Bitty Shell Script Saves the Day
Well, as they say: play to your strengths.
I don't know about you, but one of my strengths is using shell scripts of any size to save my sanity. Even better in this case, as minimal typing is required and, true to magical fashion, only a little vim wizardry is needed.
To get started, copy all of the plugins on the above list into a new file in vim and run three commands:
:%s/\//\r/g
:%s/^\s\+//e
:g/^$/d
These commands will:
- Change all of the trailing / characters to new lines (in a vim replacement, \r inserts a line break)
- Replace all the whitespace (\s) at the beginning of each line (^) with nothing, eliminating it.
- Delete all the lines that only have a start and end of line, i.e. the empty ones.
If you used my plugin list to practice your vim-fu, you, like me at this point, realize that I have 154 plugins to install. 153, if you don't count the one that's already there.
Keep that file open and run these three commands:
:%s/^/"/
:%s/$/" /
:%s/\n//
These commands will:
- Add a " to the beginning of each line
- Add a " to the end of each line (don't neglect the space here!)
  - But if you do, run :%s/$/ / which will add a space to the end of each line
- This last one deletes the newline character, so you get a nice handy blob
You can use that blob to make an array, and then loop through that array like follows:
#!/bin/bash
PLUGIN_LIST=( "ace-editor" "blueocean-github-pipeline" "cloudbees-folder" "git-changelog" "jenkins-multijob-plugin" "parameterized-trigger" "resource-disposer" "warnings" "amazon-ecr" "blueocean-git-pipeline" "cobertura" "git-client" "jira" "performance" "run-condition" "windows-slaves" "analysis-core" "blueocean-i18n" "codedeploy" "github" "jira-ext" "phabricator-plugin" "runscope" "workflow-aggregator" "ansicolor" "blueocean-jira" "command-launcher" "github-api" "jquery-detached" "pipeline-build-step" "saferestart" "workflow-api" "ant" "blueocean-jwt" "conditional-buildstep" "github-branch-source" "jsch" "pipeline-github-lib" "sauce-ondemand" "workflow-basic-steps" "antisamy-markup-formatter" "blueocean-personalization" "credentials" "github-oauth" "junit" "pipeline-graph-analysis" "schedule-build" "workflow-cps" "apache-httpcomponents-client-4-api" "blueocean-pipeline-api-impl" "credentials-binding" "github-organization-folder" "ldap" "pipeline-input-step" "scm-api" "workflow-cps-global-lib" "authentication-tokens" "blueocean-pipeline-editor" "cvs" "github-pr-comment-build" "liquibase-runner" "pipeline-milestone-step" "script-security" "workflow-durable-task-step" "aws-credentials" "blueocean-pipeline-scm-api" "disk-usage" "github-pullrequest" "mailer" "pipeline-model-api" "slack" "workflow-job" "aws-java-sdk" "blueocean-rest" "display-url-api" "git-server" "mapdb-api" "pipeline-model-declarative-agent" "sse-gateway" "workflow-multibranch" "BlazeMeterJenkinsPlugin" "blueocean-rest-impl" "docker-commons" "git-userContent" "matrix-auth" "pipeline-model-definition" "ssh" "workflow-scm-step" "blueocean" "blueocean-web" "docker-workflow" "greenballs" "matrix-project" "pipeline-model-extensions" "ssh-agent" "workflow-step-api" "blueocean-autofavorite" "bouncycastle-api" "durable-task" "handlebars" "maven-plugin" "pipeline-rest-api" "ssh-credentials" "workflow-support" "blueocean-bitbucket-pipeline" "branch-api" "envinject" "handy-uri-templates-2-api" "memegen" 
"pipeline-stage-step" "ssh-slaves" "ws-cleanup" "blueocean-commons" "build-environment" "envinject-api" "htmlpublisher" "mercurial" "pipeline-stage-tags-metadata" "structs" "blueocean-config" "build-monitor-plugin" "external-monitor-job" "icon-shim" "metrics" "pipeline-stage-view" "subversion" "blueocean-core-js" "build-timeout" "favorite" "jackson2-api" "momentjs" "plain-credentials" "token-macro" "blueocean-dashboard" "built-on-column" "feature-branch-notifier" "jacoco" "multi-branch-project-plugin" "port-allocator" "translation" "blueocean-display-url" "chucknorris" "ghprb" "javadoc" "multiple-scms" "postbuild-task" "variant" "blueocean-events" "cloudbees-bitbucket-branch-source" "git" "jenkins-design-language" "pam-auth" "pubsub-light" "violations" )
for PLUGIN in "${PLUGIN_LIST[@]}"
do
  # echo "Plugin name: ${PLUGIN}"
  java -jar jenkins-cli.jar install-plugin "${PLUGIN}" --username quintessence --password █████████████
done
exit 0
Note that I only needed to add a few lines around the big blob of text; most of our savings here are vim manipulations to change a directory list to a useful blob. The echo line is for you to test printing out the plugin names if you would like - just comment out the jenkins-cli line.
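If vim isn't your thing, the same list can come straight from the plugins directory. The sketch below is an alternative to the vim steps, not what I ran: it assumes each installed plugin has an exploded subdirectory named after its short name (as in the listing above), and list_plugin_short_names is a name I made up.

```shell
#!/bin/bash
# Sketch: list plugin short names from a plugins directory, one per line.
# Only subdirectories are listed; the .jpi files next to them are skipped.
list_plugin_short_names() {
  local plugins_dir="$1"
  local plugin_path
  for plugin_path in "${plugins_dir}"/*/; do
    [ -d "${plugin_path}" ] || continue
    # basename also drops the trailing slash left by the glob.
    basename "${plugin_path}"
  done
}

# Usage: feed the names into the same install loop as above.
#   list_plugin_short_names /jenkins-defunct/var/lib/jenkins/plugins |
#   while IFS= read -r PLUGIN; do
#     java -jar jenkins-cli.jar install-plugin "${PLUGIN}" --username "${JENKINS_USER}" --password "${JENKINS_PASSWORD}"
#   done
# JENKINS_USER and JENKINS_PASSWORD are placeholder variables.
```

This skips the copy and paste entirely, at the cost of needing read access to the old plugins directory from wherever the script runs.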
Now for the mass plugin install. Fingers crossed.
→ chmod +x plugin-install.sh
→ ./plugin-install.sh
Installing ace-editor from update center
Installing blueocean-github-pipeline from update center
Installing cloudbees-folder from update center
Installing git-changelog from update center
Installing jenkins-multijob-plugin from update center
Installing parameterized-trigger from update center
Installing resource-disposer from update center
Installing warnings from update center
Installing amazon-ecr from update center
Installing blueocean-git-pipeline from update center
Installing cobertura from update center
...
Source: IconArchive
The Moment of Truth
As you recall me mentioning more than once, rsyncing these directories was a time consuming affair. On the order of hours. But I've had a moment of inspiration: what if I just put the jobs and workspace directories into the working Jenkins instead? After all, copying the plugins directory into borked Jenkins didn't de-bork it.
To test this a little faster and saner, since I have the jenkins-defunct volume attached and mounted to the new Jenkins, I decided to create symlinks to the defunct Jenkins' jobs and workspace directories.
Important note: This is not production ready, please do not do this in production. This is a drill.
Now to continue: I'm going to back up both the empty jobs directory of the bare Jenkins install and its whole HOME directory so, if all else fails, I can swiftly get the bare install back. Then I'm going to make the symlinks.
→ sudo mkdir JENKINS_BARE_v2.104_BKP_WITH_PLUGINS
→ sudo cp -r /var/lib/jenkins JENKINS_BARE_v2.104_BKP_WITH_PLUGINS/
→ sudo service jenkins stop
Shutting down Jenkins [ OK ]
→ sudo mv /var/lib/jenkins/jobs{,--bkp}
→ sudo ln -s /jenkins-defunct/var/lib/jenkins/jobs /var/lib/jenkins/jobs
→ sudo ls -lh /var/lib/jenkins/job*
lrwxrwxrwx 1 root root 36 Feb 2 19:21 /var/lib/jenkins/jobs -> /jenkins-defunct/var/lib/jenkins/jobs
/var/lib/jenkins/jobs--bkp:
total 0
→ sudo ln -s /jenkins-defunct/var/lib/jenkins/workspace /var/lib/jenkins/workspace
→ sudo ls -lh /var/lib/jenkins/work*
lrwxrwxrwx 1 root root 41 Feb 2 19:22 /var/lib/jenkins/workspace -> /jenkins-defunct/var/lib/jenkins/workspace
Note: there was no existing workspace directory, as that involves a plugin that we use / that was just installed.
Now.
to.
Restart.
Jenkins.
Source: IconArchive
MOMENT OF TRUTH
Source: IconArchive
As a bit of a throwback: those two hanging jobs are what, if you still remember the top of this post, inspired the rollback and caused all this drama.
What to do: Resolution
Of course, since this works, I need to unlink the symlinks and rsync the actual data where it belongs. This is a short section by word count, but it took its 2-3 hours to do. To unlink:
→ sudo unlink /var/lib/jenkins/jobs
→ sudo unlink /var/lib/jenkins/workspace
And now for the rsync. I didn't mention it directly before, but the reason I was able to get up and walk away, open other sessions with ease, etc. is because I was using tmux sessions. You may have noticed that was one of the "packages I like to install" above. This is why.
→ tmux new -s rsync
Here's a tmux cheatsheet if you're new to tmux. If you're using the env I cloned above, there is a tmux configuration in there, and to create new windows you'll use control+a, then c. If you're not using that env, the default is control+b, then c.
In one window:
sudo rsync -a /jenkins-defunct/var/lib/jenkins/jobs/ /var/lib/jenkins/jobs/
And in the other:
sudo rsync -a /jenkins-defunct/var/lib/jenkins/workspace/ /var/lib/jenkins/workspace/
And then you wait.
As a quick tip: I mentioned above that I had initially made this instance with a GP2 type SSD in AWS. In hindsight, it would have been nice to have had IO1, and then it would have made more sense to up the instance type to something beefier, at least just for the transfer, so it'd have been less slow. I didn't think there was a way to change a volume from GP2 to IO1 at the time, so I expected to need a snapshot and a new instance (EBS Elastic Volumes can change the volume type in place on current-generation instances, which would have been worth checking). Alas.
I can verify that after the rsync completed, Jenkins booted up successfully. Let's take another look at that sweet, sweet image.
Source: IconArchive
Github Oauth Note
When I was flipping the routes around to swap the new Jenkins to production, I noticed it kept trying to preserve the old route. There are a couple of ways to add the Jenkins route. One is in the UI, if it's working. To do that go to Manage Jenkins -> Configure System and scroll to the Jenkins location section:
The other place to do it is by editing /var/lib/jenkins/jenkins.model.JenkinsLocationConfiguration.xml.
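For the file route, here is a hedged sketch of reading and updating the URL from the shell. It assumes the file stores the URL in a single-line jenkinsUrl element and that the new URL contains no | or & characters (they would confuse the sed expression). The function names are made up, and I would stop Jenkins before editing and start it again afterward so the change is picked up.

```shell
#!/bin/bash
# Sketch: read and update the <jenkinsUrl> element in
# jenkins.model.JenkinsLocationConfiguration.xml. Names are illustrative.

# Print the currently configured URL.
get_jenkins_url() {
  local config_file="$1"
  sed -n 's:.*<jenkinsUrl>\(.*\)</jenkinsUrl>.*:\1:p' "${config_file}"
}

# Replace the configured URL. Writes through a temp file and copies the
# result back with cat so the original file keeps its owner and permissions.
set_jenkins_url() {
  local config_file="$1"
  local new_url="$2"
  local tmp_file
  tmp_file="$(mktemp)"
  sed "s|<jenkinsUrl>.*</jenkinsUrl>|<jenkinsUrl>${new_url}</jenkinsUrl>|" \
    "${config_file}" > "${tmp_file}" && cat "${tmp_file}" > "${config_file}"
  rm -f "${tmp_file}"
}

# Usage (run as a user who can write the file; the URL is a placeholder):
#   get_jenkins_url /var/lib/jenkins/jenkins.model.JenkinsLocationConfiguration.xml
#   set_jenkins_url /var/lib/jenkins/jenkins.model.JenkinsLocationConfiguration.xml https://jenkins.example.com/
```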
I verified that the URL was correctly set in both of these places; however, Jenkins kept bouncing back to the jenkins-dev.example.com route I had made for it and it also popped a notification that I had a broken reverse proxy. So what gives?
Apparently the culprit was Github Oauth. When you configure the Oauth app in Github it looks like this:
The fields indicated with arrows were using the old route, so Jenkins kept redirecting due to auth.
Post Mortem: Adding Resiliency and What Not
So one of the reasons I ended up in this mess is the lack of backups for a single point of Jenkins-shaped failure. There were also some undocumented dependencies and a few other pain points that were uncovered as we did the first round of jobs. There are actually enough points here that I'm splitting this portion into its own post, which will be released shortly.
Documented on my frequently used assets page.
Sources for header: Jenkins logo and Jenkins art: Fire from Jenkins site, Health Potion by adorabless @ DeviantArt, and a curved arrow from FreePik. Fiery background is from Shutterstock user Bernatskaya Oxana.