Opsfire: Recovering Jenkins after Complete Failure

Hold on to your seats

'Cause this is going to be a bit of a ride. This is a tale of explosions, defeat, perseverance, and ultimate victory.

I'm giving you a spoiler for the ultimate victory because, like all situations of prolonged pain, there were several points where I didn't think we were going to get there.

Hooked? Here's how it began

We were having an issue where some of our Jenkins jobs were hanging on the git clone. This is something relatively new that started happening this week, so after manually killing and kicking a couple of jobs until they Just Worked (already forgetting the HTML/PEM file lesson) I decided to do a rollback.

This wasn't too huge of a deal, really. We use the weekly Jenkins builds, and I keep the previous week's build handy, so:

sudo service jenkins stop
sudo cp ~/backup/jenkins-v2.103.war /usr/lib/jenkins/jenkins.war
sudo service jenkins start

Everything came up normally, except for the fact that the jobs still hung on git clone. Oh well. Rinse repeat the above, replace jenkins-v2.103.war with jenkins-v2.104.war. Everything came back live, nothing to see here.
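Since I ended up swapping war files back and forth, here is a minimal sketch of that copy step wrapped in a function that keeps a timestamped copy of whatever is currently live before overwriting it. The function name swap_war and the .bak naming are my own assumptions, not anything Jenkins provides; stopping and starting the service stays outside the function.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the live war with another build, keeping a
# timestamped copy of the current one first. Both paths are arguments.
swap_war() {
    local new_war="$1" live_war="$2"
    local backup

    # Refuse to do anything if the replacement build is not there.
    [ -f "$new_war" ] || { echo "missing: $new_war" >&2; return 1; }

    if [ -f "$live_war" ]; then
        backup="${live_war}.$(date +%Y%m%d%H%M%S).bak"
        cp -p "$live_war" "$backup" || return 1
        echo "backed up $live_war to $backup"
    fi
    cp "$new_war" "$live_war"
}
```

With Jenkins stopped, running swap_war ~/backup/jenkins-v2.103.war /usr/lib/jenkins/jenkins.war from a root shell would do the same copy as above while leaving the previous war next to it.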

A few of the plugins were upgraded this week as well. Since one of them was the Github plugin, and the issue was with git clone, I figured I'd try rolling that one back first.

fire-horizontal-rule

Github OAuth Puke
Feel free to click so you can view this insanity in all its glory.

fire-horizontal-rule

Oh no, oh dear god in heaven
Source: Giphy, Firefly

But wait, what?

It's worth now noting what fired through my brain in rapid succession:

  1. We use Github Oauth
  2. This appears to be something with Github core
  3. How could a plugin rollback do this
  4. I knew this box was fragile, I should have snapshotted it before
  5. Is there a snapshot?
  6. ...Of course the last one is from freaking December.

What to do: Act 1, Scenes 1-3

My first instinct, as any battle hardened person will tell you, was

to

Google

everything.

Filtering out the seemingly unending volumes of advice about how to roll plugin versions forward and backward, using the UI thank you very much for that, I found some advice about how to install plugins using their CLI.

YIL that Jenkins has a CLI.

I went to find where to download it, but lo: you need a working version of Jenkins to download the CLI. And the version of the CLI is dependent on the Jenkins release version as well. And you can only download it from your actual Jenkins install, e.g.

wget -O /desired/path/to/jenkins-cli.jar https://${JENKINS_URL}/jnlpJars/jenkins-cli.jar

You can also go to the latter path in your browser if your installation is up and running.

Which mine was not.

Oh! I know! I'll make a fresh Jenkins box with the same version and download the CLI from there!

I'm not going to detail this part for you (yet), but stay tuned. I got the CLI, but primary Jenkins was so hosed that pointing the CLI at it threw a Java exception. Even if this hadn't happened, though, I was reminded when I created a sandbox Jenkins that I probably still would have run into difficulties without the exception since Github auth was still enabled and I would have needed to disable it or create a token for the CLI. Which I would have needed access to the UI to do. (Jenkins is really reliant on having that UI up and running.)

It's worth mentioning that while all that was going on I had concurrently attempted to spin up a new instance using an AMI I had made from the December image, but when I tried to start Jenkins on that instance it also died and threw exceptions. Not in the web browser, though: there I was just presented with a lovely site unreachable error. I sshed into the instance to look at Jenkins' logs (/var/log/jenkins/jenkins.log) and there were Java exceptions everywhere. Also, importantly, errors referencing missing jobs.
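When a log is wall-to-wall stack traces, a quick tally of which exception classes show up can make it easier to see what is actually wrong. This is a minimal sketch, assuming the class names appear in plain text in the log; the function name summarize_exceptions is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count each distinct Java exception/error class name
# that appears in a log file, most frequent first.
summarize_exceptions() {
    local log="$1"
    grep -oE '[A-Za-z_][A-Za-z0-9_.$]*(Exception|Error)' "$log" \
        | sort | uniq -c | sort -rn
}
```

Pointed at /var/log/jenkins/jenkins.log (read as root if needed), it prints one line per class with a count in front.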

annie-gunnerkrigg-cool-beans
Source: Gunnerkrigg Court. It's an awesome web comic.

It was at this point I practiced some deep breathing.

What to do: Act 2

So I was half hoping that at least some of the exceptions could be handled by copying over the jobs from the dying dying dead Jenkins to the Dec Jenkins. I did this by shutting off the dead one's EC2 instance, detaching the volume, and attaching it to the December Jenkins instance. Amazon has how to attach a volume in the console documented very well. It's worth noting you use essentially the same process, when the instances in question are powered off, to detach the volume.

I think it's important to clarify again, since you may encounter instructions of other ways to unmount disk volumes while a system is powered on, e.g. this EBS doc by Amazon, that in this case those instructions will not work as you can't (safely) detach the root and only volume from a running system.

Anywho, after the volume is attached then just create the mount point, get volume list, mount volume:

→ sudo mkdir /jenkins-defunct

→ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0  512G  0 disk
└─xvda1 202:1    0  512G  0 part /
xvdf    202:2    0  512G  0 disk
└─xvdf1 202:3    0  512G  0 part /

→ sudo mount /dev/xvdf1 /jenkins-defunct

Breathing. Ok.

Now it's time to rsync.

→ sudo mv /var/lib/jenkins/jobs{,--bkp}
→ sudo rsync -a /jenkins-defunct/var/lib/jenkins/jobs /var/lib/jenkins/jobs

And now we wait.

And wait.

fire-horizontal-rule

*** Skipping any contents from this failed directory ***
rsync: recv_generator: mkdir "/var/lib/jenkins/jobs/jobs/${SOME_PLACEHOLDER_JOB}/workspace" failed: No space left on device (28)
*** Skipping any contents from this failed directory ***
rsync: mkstemp "/var/lib/jenkins/jobs/jobs/${SOME_PLACEHOLDER_JOB}/.config.xml.dovJOF" failed: No space left on device (28)
rsync: mkstemp "/var/lib/jenkins/jobs/jobs/${SOME_PLACEHOLDER_JOB}/.disk-usage.xml.xnYWQ8" failed: No space left on device (28)
rsync: mkstemp "/var/lib/jenkins/jobs/jobs/${SOME_PLACEHOLDER_JOB}/.github-polling.log.RVYdTB" failed: No space left on device (28)
rsync: mkstemp "/var/lib/jenkins/jobs/jobs/${SOME_PLACEHOLDER_JOB}/.nextBuildNumber.EtLAV4" failed: No space left on device (28)
rsync: recv_generator: mkdir "/var/lib/jenkins/jobs/jobs/${SOME_PLACEHOLDER_JOB}/workspace@tmp" failed: No space left on device (28)
*** Skipping any contents from this failed directory ***
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]

After a couple hours of seemingly silent, copying bliss I was abruptly introduced to a humanly uncountable number of lines like that.

But wait, I filled up the whole drive? With jobs?

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      504G  276G  229G  55% /
devtmpfs        7.9G   64K  7.9G   1% /dev
tmpfs           7.9G     0  7.9G   0% /dev/shm
/dev/xvdf1      504G  258G  247G  52% /jenkins-defunct

I am using... half the drive. What.

One of my coworkers offered to ssh in at this point to see if he saw anything.

But he could not: he was given an out of disk error.

I looked at CloudWatch metrics for the alarm and thankfully df didn't lie there: using half disk.

What gives?

inodes

The short version, since they aren't the focus here, is that inodes store file and directory metadata on *nix filesystems. To the passerby, they become important when you have a ton of tiny files, as Jenkins clearly does, and you run out:

$ df -ih
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/xvda1        32M   32M     0  100% /
devtmpfs         2.0M   477  2.0M    1% /dev
tmpfs            2.0M     1  2.0M    1% /dev/shm
/dev/xvdf1        32M   27M  5.7M   83% /jenkins-defunct

(If you'd like to read more on what inodes are, please check out this Linux Magazine article.)

Of course I ran out of inodes.
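Two small helpers would have saved me some head scratching here. This is a minimal sketch: inode_alert reads df -i style output and prints any mount point at or above a threshold, and inode_hogs counts filesystem entries under each immediate subdirectory so you can see where the inodes went. Both function names are my own, and the parsing assumes mount points without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting inode exhaustion.

# Read `df -i`-style text on stdin and print "mountpoint IUse%" for every
# filesystem whose IUse% is at or above the threshold (default 90).
inode_alert() {
    local threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        pct = $5; sub(/%/, "", pct)
        if (pct ~ /^[0-9]+$/ && pct + 0 >= t) print $6, $5
    }'
}

# Count filesystem entries under each immediate subdirectory of a path,
# largest first. Each count includes the subdirectory itself.
inode_hogs() {
    local root="$1" dir
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue      # glob matched nothing
        printf '%s %s\n' "$(find "$dir" -xdev | wc -l | tr -d ' ')" "${dir%/}"
    done | sort -rn
}
```

For example, df -i | inode_alert 90 flags the full filesystem, and inode_hogs /var/lib/jenkins (from a root shell) shows which directory is holding all the tiny files.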

The quickest way to address this problem is to resize the disk. This is the option I went with, since this is production Jenkins and we're now several hours into this outage.

Since this is a short term solution, I stopped the instance, resized the root volume to 2 TB, rebooted, and restarted the rsync.

By the way, here is an example of why to rsync rather than cp: the latter would just completely start over and write over what it had already done as needed, taking more time, whereas rsync will pick up where it left off.

That said, it still took another hour or two for the sync to complete.

While this was going on, I was pondering my next move. Also: getting really tired as it was late now.

zzz-horizontal-rule

What to do: Act 3, all the climactic scenes

I read a bit more on Jenkins admin. Since everything should hypothetically be recoverable from JENKINS_HOME, I decided to try re-installing all the plugins on a new Jenkins instance (don't ask me how many I have running now), and then copy that instance's /var/lib/jenkins/plugins directory back to the original.

This ultimately ended in failure, but I ended up with a very useful shell script that allowed me to install plugins very quickly.

Another spoiler: vim tricks are super handy.

Spinning up Jenkins on a Linux AMI

I'm going to go a little detailed here for those new to installing Jenkins.

The instance configuration:

  • Instance Type: m5.xlarge
  • Security: security group with ports 22, 80, 8080, 443, and 4443 open and accessible on our VPC and via VPN.
  • EBS type: 512 GB GP2
    • In hindsight, though, I recommend using IO1. Cost is similar and would help speed up rsync later.
  • AMI: Amazon Linux AMI 2017.09.1 (HVM)

ssh into the instance and:

  • Update
  • Install some basic tools
  • Create a group
  • Use visudo to enable passwordless sudo
  • Create your user
  • Add your user to the group
  • Switch to the user account
  • Install a handy prompt and useful rc files

Here we go.

$ sudo yum upgrade -y
$ sudo yum install git tmux tree htop ack unzip -y
$ sudo groupadd admin
$ sudo useradd quintessence
$ sudo usermod -aG admin quintessence
$ sudo EDITOR=vim visudo
$ sudo su - quintessence
Last login: Fri Feb  2 16:49:01 UTC 2018 on pts/0
[quintessence@ip-███-███-███-███ ~]$ sudo ls /etc/
acpi               blkid                      csh.cshrc
{{{ snip }}}

[quintessence@ip-███-███-███-███ ~]$ git clone https://github.com/jhunt/env
Cloning into 'env'...
remote: Counting objects: 713, done.
remote: Total 713 (delta 0), reused 0 (delta 0), pack-reused 713
Receiving objects: 100% (713/713), 128.96 KiB | 18.42 MiB/s, done.
Resolving deltas: 100% (419/419), done.
[quintessence@ip-███-███-███-███ ~]$ cd env/
[quintessence@ip-███-███-███-███ env]$ ./install
setting up dot files in ~
configuring vim...
copying in ~/bin scripts...
  installing jq...
  installing spruce (v1.8.2)...
configuring git...
setting up ~/.bashrc...
hostname: No address associated with name
[quintessence@ip-███-███-███-███ env]$ cd ..
[quintessence@ip-███-███-███-███ ~]$ vim .host
[quintessence@ip-███-███-███-███ ~]$ source ~/.bashrc
+033+16:52:54:8:0 ███.███.███.███/20 quintessence@jenkins ~
→  

For visudo, here's the magic line to allow admin group passwordless sudo:

%admin        ALL=(ALL)       NOPASSWD: ALL

As a quick aside: the .host file is used by the prompt to display the hostname if none is set, which is the case for this dev instance. Right now it just has jenkins in it and that's what you see in the prompt above. For the remainder of the blog post, when you see →, that's just part of my prompt.

Now for the Jenkins install

I'm going to use yum to install the last stable release of Jenkins to make sure that loads. Here's the needful, spaced out with output:

→  sudo wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat-stable/jenkins.repo
--2018-02-02 16:53:25--  http://pkg.jenkins-ci.org/redhat-stable/jenkins.repo
Resolving pkg.jenkins-ci.org (pkg.jenkins-ci.org)... 52.202.51.185
Connecting to pkg.jenkins-ci.org (pkg.jenkins-ci.org)|52.202.51.185|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 85
Saving to: ‘/etc/yum.repos.d/jenkins.repo’

/etc/yum.repos.d/jenkins.repo                                       100%[===================================================================================================================================================================>]      85  --.-KB/s    in 0s

2018-02-02 16:53:26 (30.9 MB/s) - ‘/etc/yum.repos.d/jenkins.repo’ saved [85/85]



→  sudo rpm --import http://pkg.jenkins-ci.org/redhat-stable/jenkins-ci.org.key



→  sudo yum install jenkins -y
Loaded plugins: priorities, update-motd, upgrade-helper
amzn-main                                                                                                                                                                                                                                                | 2.1 kB  00:00:00
amzn-updates                                                                                                                                                                                                                                             | 2.5 kB  00:00:00
jenkins                                                                                                                                                                                                                                                  | 2.9 kB  00:00:00
jenkins/primary_db                                                                                                                                                                                                                                       |  23 kB  00:00:00
Resolving Dependencies
--> Running transaction check
---> Package jenkins.noarch 0:2.89.3-1.1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================================================================================================================================================================================================================
 Package                                                          Arch                                                            Version                                                                Repository                                                        Size
================================================================================================================================================================================================================================================================================
Installing:
 jenkins                                                          noarch                                                          2.89.3-1.1                                                             jenkins                                                           71 M

Transaction Summary
================================================================================================================================================================================================================================================================================
Install  1 Package

Total download size: 71 M
Installed size: 71 M
Downloading packages:
jenkins-2.89.3-1.1.noarch.rpm                                                                                                                                                                                                                            |  71 MB  00:00:01
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : jenkins-2.89.3-1.1.noarch                                                                                                                                                                                                                                    1/1
  Verifying  : jenkins-2.89.3-1.1.noarch                                                                                                                                                                                                                                    1/1

Installed:
  jenkins.noarch 0:2.89.3-1.1

Complete!

When I ran sudo service jenkins start to start Jenkins, I received the following because Jenkins needs Java 8 and apparently Amazon Linux is shipping with Java 7:

→  sudo service jenkins start
Starting Jenkins Jenkins requires Java8 or later, but you are running 1.7.0_161-mockbuild_2017_12_19_23_46-b00 from /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.161.x86_64/jre
java.lang.UnsupportedClassVersionError: 51.0
        at Main.main(Main.java:124)
                                                           [  OK  ]

I'm going to remove Java 7 and install Java 8 (output not included for the yum commands):

→  sudo yum remove java-1.7.0-openjdk -y
→  sudo yum install java-1.8.0 -y
→  sudo service jenkins start
Starting Jenkins                                           [  OK  ]
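If you would rather check the Java version before starting Jenkins than find out from the init script, here is a minimal sketch of turning a Java version string into its major version. The function name java_major is hypothetical; it only handles the two naming schemes shown in the comments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a Java version string into its major version.
# Old scheme: "1.7.0_161" -> 7, "1.8.0_151" -> 8. Newer scheme: "11.0.2" -> 11.
java_major() {
    local v="$1"
    case "$v" in
        1.*) v="${v#1.}" ;;      # drop the leading "1." of the old scheme
    esac
    echo "${v%%[._-]*}"          # keep everything before the first . _ or -
}
```

The version string itself can be pulled out of the first line of java -version (which prints to stderr), for example with java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}', and then compared against 8.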

I'm also going to add the jenkins service to start on boot:

→  sudo chkconfig jenkins on

Ok, now that I have all of that: does bare Jenkins load?

Bare Jenkins Landing Page

Yes.

This is the first moment of relief that I've had in hours.

relief-emoji
Source: IconArchive

As a result, I breezed through the next bit:

  • Unlocked Jenkins by grabbing the initial admin password as indicated on the splash page
  • Selected to have Jenkins Install the Recommended Plugins
  • Set up my Jenkins username + password (also part of the setup wizard)
    • Made sure my Jenkins username matched my Github username to prevent redundancy when hooking up Github Oauth
  • Made sure Jenkins loaded

Jenkins v2.89.3

This is the happy time. So happy, I'm gonna use that emoji again:

relief-emoji

Ok, now to stop the service, update to the latest weekly build so it matches the desired environment, and then restart the bare Jenkins. Not anticipating any issues since this is still a bare environment. Skipping the step where I download the war file:

→  ls
bin  code  env  jenkins-v2.104.war

→  sudo mv /usr/lib/jenkins/jenkins.war{,_old}

→  sudo cp jenkins-v2.104.war /usr/lib/jenkins/jenkins.war

→  sudo ls /usr/lib/jenkins/
jenkins.war  jenkins.war_old

→  sudo service jenkins restart

On restart this looks mostly the same:

Jenkins v2.104

With a crucial difference:

Version confirmation

At this point, I think we can upgrade to a full on smile:

smile-flush-emoji
Source: IconArchive

Plugin installs

I discovered very quickly that installing these plugins via the UI was going to be a nightmare because their search feature is a challenge. And not in the "what doesn't kill you makes you stronger" way. It was more like this:

Don't waste time on fake work
Source: This LifeHacker AUS article. I'll admit to not verifying the quote because it fits my needs here.

I'll show you what I mean, and then I'll show you how I used the Jenkins CLI (remember that?) to get around it.

Searching ... Searching ... Searching ...

You can see the names of the plugins, as Jenkins understands them, in /var/lib/jenkins/plugins. One of the plugins we have is cloudbees-folder so let's try to find that with the UI and install it.

No results for cloudbees-folder

Searching for "folder" by itself was really generic, so I tried to search for "cloudbees". It was also unhelpful. I did happen to notice, though, the URL for the plugins is actually linked to the download directory:

Take a look at the plugin path

This is somewhat helpful as it shows me to go to http://updates.jenkins-ci.org/download/plugins/ for the download list. Here, the plugins appear the same as they do once installed rather than the "friendly" or "long" names that they are given.

Which is great and all, and sure this is a lot easier to manage than that hot mess of a search, but how to install them?

thinking-emoji
Source: Emojipedia

Jenkins CLI to save the day

It is at this point that I remember the only thing stopping me from using the CLI before was that Jenkins was so hosed it wouldn't even talk to it.

That isn't the case now though. So:

→  wget -O ~/jenkins-cli.jar  https://${JENKINS_PUBLIC_URL}/jnlpJars/jenkins-cli.jar

Swap out your Jenkins instance's route or public IP for ${JENKINS_PUBLIC_URL} and you're in business.

Once I have the CLI, I run it against the local Jenkins and just supply help to see if it has a help page:

→  java -jar jenkins-cli.jar -s http://127.0.0.1:8080/ help

ERROR: You must authenticate to access this Jenkins.
Jenkins CLI
Usage: java -jar jenkins-cli.jar [-s URL] command [opts...] args...
Options:
...

Ah, ok. I need to auth. This Jenkins doesn't have Github Oauth enabled, so I'm still using a username and password. This actually makes my CLI life easy, so I'm going to leave that alone and just run it with my username and password like so:

→  java -jar jenkins-cli.jar --username quintessence --password █████████████ -s http://127.0.0.1:8080/ help
Neither -s nor the JENKINS_URL env var is specified.
Jenkins CLI
Usage: java -jar jenkins-cli.jar [-s URL] command [opts...] args...
Options:
-s URL       : the server URL (defaults to the JENKINS_URL env var)
{{{ SNIP }}}

→  export JENKINS_URL=http://127.0.0.1:8080/

→  java -jar jenkins-cli.jar help --username quintessence --password █████████████
  add-job-to-view
    Adds jobs to view.
  build
    Builds a job, and optionally waits until its completion.
  cancel-quiet-down
    Cancel the effect of the "quiet-down" command.
  clear-queue
  {{{ SNIP }}}

Now to test with cloudbees-folder:

→  java -jar jenkins-cli.jar install-plugin cloudbees-folder --username quintessence --password █████████████
Installing cloudbees-folder from update center

→  sudo service jenkins restart
Shutting down Jenkins                                      [  OK  ]
Starting Jenkins                                           [  OK  ]

→  sudo ls /var/lib/jenkins/plugins/cloud*
/var/lib/jenkins/plugins/cloudbees-folder.jpi

/var/lib/jenkins/plugins/cloudbees-folder:
images  META-INF  WEB-INF

→  sudo cat /var/lib/jenkins/plugins/cloudbees-folder/META-INF/MANIFEST.MF
Manifest-Version: 1.0
Archiver-Version: Plexus Archiver
Created-By: Apache Maven
Built-By: jglick
Build-Jdk: 1.8.0_151
Extension-Name: cloudbees-folder
Specification-Title: This plugin allows users to create "folders" to o
 rganize jobs. Users can define custom taxonomies (like
     by project type, organization type etc). Folders are nestable and
  you can define views within folders. Maintained by CloudBees, Inc.
Implementation-Title: cloudbees-folder
Implementation-Version: 6.3
Group-Id: org.jenkins-ci.plugins
Short-Name: cloudbees-folder
Long-Name: Folders Plugin
Url: https://wiki.jenkins.io/display/JENKINS/CloudBees+Folders+Plugin
Compatible-Since-Version: 5.2
Plugin-Version: 6.3
Hudson-Version: 2.60.3
Jenkins-Version: 2.60.3
Plugin-Dependencies: credentials:2.1.11;resolution:=optional
Plugin-Developers:

The plugin is installed
Note that the path is now to the plugin doc page.

Here we learn a few things:

  • Plugin is installed
  • "cloudbees-folder" is what is called the "short name"
  • "Folders Plugin" is what's called the "long name", which is what the search feature uses. Not helpful if you only know this name after you find your plugin though.

It works! Hooray!
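Since the short name and long name both live in the manifest, it is possible to build your own lookup table from an existing plugins directory. This is a minimal sketch, assuming each value fits on one line of MANIFEST.MF (no continuation lines) and contains no ": " of its own; the function name plugin_names is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "short-name<TAB>long name" for one plugin
# manifest. Assumes both values fit on a single line.
plugin_names() {
    local manifest="$1"
    awk -F': ' '
        { sub(/\r$/, "") }                 # tolerate CRLF line endings
        $1 == "Short-Name" { sname = $2 }
        $1 == "Long-Name"  { lname = $2 }
        END { if (sname != "") printf "%s\t%s\n", sname, lname }
    ' "$manifest"
}
```

Looping it over /var/lib/jenkins/plugins/*/META-INF/MANIFEST.MF gives a list you can grep when all you have is one of the two names.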

Now for the rest.

First get the complete list of plugins from the /var/lib/jenkins/plugins directory of the defunct Jenkins instance.

→  cd /jenkins-defunct/var/lib/jenkins/plugins
→  ls -d */
ace-editor/                          blueocean-github-pipeline/          cloudbees-folder/         git-changelog/               jenkins-multijob-plugin/      parameterized-trigger/             resource-disposer/  warnings/
amazon-ecr/                          blueocean-git-pipeline/             cobertura/                git-client/                  jira/                         performance/                       run-condition/      windows-slaves/
analysis-core/                       blueocean-i18n/                     codedeploy/               github/                      jira-ext/                     phabricator-plugin/                runscope/           workflow-aggregator/
ansicolor/                           blueocean-jira/                     command-launcher/         github-api/                  jquery-detached/              pipeline-build-step/               saferestart/        workflow-api/
ant/                                 blueocean-jwt/                      conditional-buildstep/    github-branch-source/        jsch/                         pipeline-github-lib/               sauce-ondemand/     workflow-basic-steps/
antisamy-markup-formatter/           blueocean-personalization/          credentials/              github-oauth/                junit/                        pipeline-graph-analysis/           schedule-build/     workflow-cps/
apache-httpcomponents-client-4-api/  blueocean-pipeline-api-impl/        credentials-binding/      github-organization-folder/  ldap/                         pipeline-input-step/               scm-api/            workflow-cps-global-lib/
authentication-tokens/               blueocean-pipeline-editor/          cvs/                      github-pr-comment-build/     liquibase-runner/             pipeline-milestone-step/           script-security/    workflow-durable-task-step/
aws-credentials/                     blueocean-pipeline-scm-api/         disk-usage/               github-pullrequest/          mailer/                       pipeline-model-api/                slack/              workflow-job/
aws-java-sdk/                        blueocean-rest/                     display-url-api/          git-server/                  mapdb-api/                    pipeline-model-declarative-agent/  sse-gateway/        workflow-multibranch/
BlazeMeterJenkinsPlugin/             blueocean-rest-impl/                docker-commons/           git-userContent/             matrix-auth/                  pipeline-model-definition/         ssh/                workflow-scm-step/
blueocean/                           blueocean-web/                      docker-workflow/          greenballs/                  matrix-project/               pipeline-model-extensions/         ssh-agent/          workflow-step-api/
blueocean-autofavorite/              bouncycastle-api/                   durable-task/             handlebars/                  maven-plugin/                 pipeline-rest-api/                 ssh-credentials/    workflow-support/
blueocean-bitbucket-pipeline/        branch-api/                         envinject/                handy-uri-templates-2-api/   memegen/                      pipeline-stage-step/               ssh-slaves/         ws-cleanup/
blueocean-commons/                   build-environment/                  envinject-api/            htmlpublisher/               mercurial/                    pipeline-stage-tags-metadata/      structs/
blueocean-config/                    build-monitor-plugin/               external-monitor-job/     icon-shim/                   metrics/                      pipeline-stage-view/               subversion/
blueocean-core-js/                   build-timeout/                      favorite/                 jackson2-api/                momentjs/                     plain-credentials/                 token-macro/
blueocean-dashboard/                 built-on-column/                    feature-branch-notifier/  jacoco/                      multi-branch-project-plugin/  port-allocator/                    translation/
blueocean-display-url/               chucknorris/                        ghprb/                    javadoc/                     multiple-scms/                postbuild-task/                    variant/
blueocean-events/                    cloudbees-bitbucket-branch-source/  git/                      jenkins-design-language/     pam-auth/                     pubsub-light/                      violations/

Don't forget to scroll sideways...

At this point you may also realize "Oh great, all I need to do is run this":

java -jar jenkins-cli.jar install-plugin ${PLUGIN_NAME} --username quintessence --password █████████████

....for all of these?

sweat-smile-icon
Source: IconArchive

Itty Bitty Shell Script Saves the Day

Well, as they say: play to your strengths.
Harry Potter: Play to your strengths

I don't know about you, but one of my strengths is using shell scripts of any size to save my sanity. Even better, in this case minimal typing is needed and, true to magical fashion, only a little vim wizardry is required.

To get started, copy all of the plugins on the above list into a new file in vim and run three commands:

:%s/\//\r/g
:%s/^\s\+//e
:g/^$/d

These commands will:

  • Change all of the trailing / characters to new lines (in a vim replacement, \r inserts a line break)
  • Remove any whitespace (\s) at the beginning of each line (^).
  • Delete every empty line, i.e. every line that matches ^$.

If you used my plugin list to practice your vim fu then you, like me at this point, will realize that I have 154 plugins to install. 153, if you don't count the one that's already there.

Keep that file open and run these three commands:

:%s/^/"/
:%s/$/" /
:%s/\n//

These commands will:

  • Add a " to the beginning of each line
  • Add a " to the end of each line (don't neglect the space here!)
    • But if you do, run :%s/$/ /, which will add a space to the end of each line
  • This last one deletes the newline character, so you get a nice handy blob
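If vim isn't your thing, the same list can come straight from the shell. This is a minimal sketch, not what I actually ran: the function name list_plugin_dirs is my own, and it simply prints each immediate subdirectory name on its own line with the trailing slash stripped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every immediate subdirectory of a
# plugins directory, one per line, with no trailing slash.
list_plugin_dirs() {
    local plugins_dir="$1" d
    for d in "$plugins_dir"/*/; do
        [ -d "$d" ] || continue      # glob matched nothing
        d="${d%/}"                   # strip the trailing slash
        printf '%s\n' "${d##*/}"     # keep only the last path component
    done
}
```

Redirect the output to a file, e.g. list_plugin_dirs /jenkins-defunct/var/lib/jenkins/plugins > plugins.txt, and in bash 4 or later mapfile -t PLUGIN_LIST < plugins.txt loads it into an array with no quoting gymnastics.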

You can use that blob to make an array, and then loop through that array as follows:

#!/bin/bash

PLUGIN_LIST=( "ace-editor" "blueocean-github-pipeline" "cloudbees-folder" "git-changelog" "jenkins-multijob-plugin" "parameterized-trigger" "resource-disposer" "warnings" "amazon-ecr" "blueocean-git-pipeline" "cobertura" "git-client" "jira" "performance" "run-condition" "windows-slaves" "analysis-core" "blueocean-i18n" "codedeploy" "github" "jira-ext" "phabricator-plugin" "runscope" "workflow-aggregator" "ansicolor" "blueocean-jira" "command-launcher" "github-api" "jquery-detached" "pipeline-build-step" "saferestart" "workflow-api" "ant" "blueocean-jwt" "conditional-buildstep" "github-branch-source" "jsch" "pipeline-github-lib" "sauce-ondemand" "workflow-basic-steps" "antisamy-markup-formatter" "blueocean-personalization" "credentials" "github-oauth" "junit" "pipeline-graph-analysis" "schedule-build" "workflow-cps" "apache-httpcomponents-client-4-api" "blueocean-pipeline-api-impl" "credentials-binding" "github-organization-folder" "ldap" "pipeline-input-step" "scm-api" "workflow-cps-global-lib" "authentication-tokens" "blueocean-pipeline-editor" "cvs" "github-pr-comment-build" "liquibase-runner" "pipeline-milestone-step" "script-security" "workflow-durable-task-step" "aws-credentials" "blueocean-pipeline-scm-api" "disk-usage" "github-pullrequest" "mailer" "pipeline-model-api" "slack" "workflow-job" "aws-java-sdk" "blueocean-rest" "display-url-api" "git-server" "mapdb-api" "pipeline-model-declarative-agent" "sse-gateway" "workflow-multibranch" "BlazeMeterJenkinsPlugin" "blueocean-rest-impl" "docker-commons" "git-userContent" "matrix-auth" "pipeline-model-definition" "ssh" "workflow-scm-step" "blueocean" "blueocean-web" "docker-workflow" "greenballs" "matrix-project" "pipeline-model-extensions" "ssh-agent" "workflow-step-api" "blueocean-autofavorite" "bouncycastle-api" "durable-task" "handlebars" "maven-plugin" "pipeline-rest-api" "ssh-credentials" "workflow-support" "blueocean-bitbucket-pipeline" "branch-api" "envinject" "handy-uri-templates-2-api" "memegen" 
"pipeline-stage-step" "ssh-slaves" "ws-cleanup" "blueocean-commons" "build-environment" "envinject-api" "htmlpublisher" "mercurial" "pipeline-stage-tags-metadata" "structs" "blueocean-config" "build-monitor-plugin" "external-monitor-job" "icon-shim" "metrics" "pipeline-stage-view" "subversion" "blueocean-core-js" "build-timeout" "favorite" "jackson2-api" "momentjs" "plain-credentials" "token-macro" "blueocean-dashboard" "built-on-column" "feature-branch-notifier" "jacoco" "multi-branch-project-plugin" "port-allocator" "translation" "blueocean-display-url" "chucknorris" "ghprb" "javadoc" "multiple-scms" "postbuild-task" "variant" "blueocean-events" "cloudbees-bitbucket-branch-source" "git" "jenkins-design-language" "pam-auth" "pubsub-light" "violations" )

for PLUGIN in "${PLUGIN_LIST[@]}"
do
#    echo "Plugin name: ${PLUGIN}"
    java -jar jenkins-cli.jar install-plugin "${PLUGIN}" --username quintessence --password █████████████
done

exit 0

Note that I only needed to add a few lines around the big blob of text; most of the time savings here came from the vim manipulations that turned a directory listing into a usable blob. The echo line is there so you can test printing out the plugin names if you would like: uncomment it and comment out the jenkins-cli line.
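If you would rather skip the vim step, the list can probably be built straight from the old plugins directory. This is a minimal sketch, not what I actually ran: it assumes each plugin sits in that directory as a `.jpi` or `.hpi` archive named after its short name, and `list_plugin_names` is a made-up helper name.

```shell
# Print the short name of every plugin archive (*.jpi / *.hpi) found in a
# directory, one per line, sorted and de-duplicated.
# Assumption: archives are named <short-name>.jpi or <short-name>.hpi.
list_plugin_names() {
    local plugin_dir="$1"
    local file name
    for file in "${plugin_dir}"/*.jpi "${plugin_dir}"/*.hpi; do
        [ -e "${file}" ] || continue    # skip a glob that matched nothing
        name="$(basename "${file}")"
        printf '%s\n' "${name%.*}"      # drop the .jpi / .hpi extension
    done | sort -u
}
```

In bash 4 or newer, something like `mapfile -t PLUGIN_LIST < <(list_plugin_names /path/to/old/plugins)` would then fill the array, where the path is whatever your old plugins directory happens to be.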

Now for the mass plugin install. Fingers crossed.

→  chmod +x plugin-install.sh

→  ./plugin-install.sh
Installing ace-editor from update center
Installing blueocean-github-pipeline from update center
Installing cloudbees-folder from update center
Installing git-changelog from update center
Installing jenkins-multijob-plugin from update center
Installing parameterized-trigger from update center
Installing resource-disposer from update center
Installing warnings from update center
Installing amazon-ecr from update center
Installing blueocean-git-pipeline from update center
Installing cobertura from update center
...
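With this many plugins it is easy to miss one that quietly failed. Here is a rough sketch of a sanity check that compares the old plugins directory against the new one. It assumes both directories hold `.jpi` or `.hpi` archives named after the plugin; `missing_plugins` is a hypothetical helper, not part of the Jenkins CLI.

```shell
# Print the names of plugins that have an archive in old_dir but no
# matching archive (either extension) in new_dir.
# Assumption: archives are named <short-name>.jpi or <short-name>.hpi.
missing_plugins() {
    local old_dir="$1" new_dir="$2"
    local file name
    for file in "${old_dir}"/*.jpi "${old_dir}"/*.hpi; do
        [ -e "${file}" ] || continue    # skip a glob that matched nothing
        name="$(basename "${file}")"
        name="${name%.*}"
        if [ ! -e "${new_dir}/${name}.jpi" ] && [ ! -e "${new_dir}/${name}.hpi" ]; then
            printf '%s\n' "${name}"
        fi
    done | sort -u
}
```

Empty output means every plugin from the old directory has a counterpart in the new one. It says nothing about versions, only names.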

starry-eyed-icon
Source: IconArchive

The Moment of Truth

As you'll recall me mentioning more than once, rsyncing these directories was a time-consuming affair, on the order of hours. But I've had a moment of inspiration: what if I just put the jobs and workspace directories into the working Jenkins instead? After all, copying the plugins directory into the borked Jenkins didn't de-bork it.

To test this a little faster and saner, since I have the jenkins-defunct volume attached and mounted on the new Jenkins, I decided to create symlinks to the defunct Jenkins' jobs and workspace directories.

Important note: This is not production ready. Please do not do this in production. This is a drill.

Now to continue: I'm going to back up both the empty jobs directory of the bare Jenkins install and its whole HOME directory so that, if all else fails, I can swiftly get the bare install back. Then I'm going to make the symlinks.

→  sudo mkdir JENKINS_BARE_v2.104_BKP_WITH_PLUGINS

→  sudo cp -r /var/lib/jenkins JENKINS_BARE_v2.104_BKP_WITH_PLUGINS/

→  sudo service jenkins stop
Shutting down Jenkins                                      [  OK  ]

→  sudo mv /var/lib/jenkins/jobs{,--bkp}

→  sudo ln -s /jenkins-defunct/var/lib/jenkins/jobs /var/lib/jenkins/jobs

→  sudo ls -lh /var/lib/jenkins/job*
lrwxrwxrwx 1 root    root      36 Feb  2 19:21 /var/lib/jenkins/jobs -> /jenkins-defunct/var/lib/jenkins/jobs

/var/lib/jenkins/jobs--bkp:
total 0

→  sudo ln -s /jenkins-defunct/var/lib/jenkins/workspace /var/lib/jenkins/workspace

→  sudo ls -lh /var/lib/jenkins/work*
lrwxrwxrwx 1 root    root      41 Feb  2 19:22 /var/lib/jenkins/workspace -> /jenkins-defunct/var/lib/jenkins/workspace

Note: there was no existing workspace directory to move aside, as it comes from a plugin that we use and that had only just been installed.
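The move-aside-then-link steps above can be wrapped up so that it is harder to fat-finger a path. This is only a sketch of the same idea, with a hypothetical function name (`swap_in_symlink`), and it carries the same "not production ready" warning as the manual version.

```shell
# Replace a directory with a symlink to another directory.
# If something already exists at the link path it is renamed to
# "<link>--bkp" first; nothing is deleted.
swap_in_symlink() {
    local target="$1"    # directory on the old volume
    local link="$2"      # path inside the new Jenkins home
    if [ ! -d "${target}" ]; then
        echo "target ${target} is not a directory" >&2
        return 1
    fi
    if [ -L "${link}" ]; then
        echo "${link} is already a symlink, leaving it alone" >&2
        return 1
    fi
    if [ -e "${link}" ]; then
        if [ -e "${link}--bkp" ]; then
            echo "${link}--bkp already exists, refusing to overwrite" >&2
            return 1
        fi
        mv "${link}" "${link}--bkp"
    fi
    ln -s "${target}" "${link}"
}
```

Since the real paths live under /var/lib/jenkins, this would need to run in a root shell (or be saved as a script and run with sudo). Afterwards `readlink /var/lib/jenkins/jobs` should print the path on the old volume.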

Now.

to.

Restart.

Jenkins.

sweat-smile-icon
Source: IconArchive

MOMENT OF TRUTH

JOBS JOBS JOBS
JOBS JOBS JOBS

starry-eyed-icon
Source: IconArchive

As a bit of a throwback: those two hanging jobs are what, if you still remember the top of this post, inspired the rollback and caused all this drama.

What to do: Resolution

Of course, since this works, I now need to unlink the symlinks and rsync the actual data to where it belongs. This is a short section by word count, but it took 2-3 hours to do. To unlink:

→  sudo unlink /var/lib/jenkins/jobs

→  sudo unlink /var/lib/jenkins/workspace

And now for the rsync. I didn't mention it directly before, but the reason I was able to get up and walk away, open other sessions with ease, etc. is that I was using tmux sessions. You may have noticed that was one of the "packages I like to install" above. This is why.

→  tmux new -s rsync

Here's a tmux cheatsheet if you're new to tmux. If you're using the env I cloned above, there is a tmux configuration in there, and to create new windows you'll use control+a, then c. If you're using the default configuration rather than that env, then I believe the default is control+b, then c.

In one window:

sudo rsync -a /jenkins-defunct/var/lib/jenkins/jobs/ /var/lib/jenkins/jobs/

And in the other:

sudo rsync -a /jenkins-defunct/var/lib/jenkins/workspace/ /var/lib/jenkins/workspace/

And then you wait.
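If staring at a silent rsync for hours makes you twitchy, a crude progress check from another tmux window can help. The sketch below just compares how many top-level entries (for jobs, roughly one per job) exist on each side. It is an approximation, since a directory that exists may still be mid-copy, and both function names are made up for illustration.

```shell
# Count the entries directly inside a directory (0 if it does not exist yet).
count_entries() {
    [ -d "$1" ] || { echo 0; return 0; }
    find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d '[:space:]'
}

# Print a rough "copied so far" line for a source and destination directory.
copy_progress() {
    local src="$1" dst="$2"
    printf '%s of %s top-level entries copied\n' \
        "$(count_entries "${dst}")" "$(count_entries "${src}")"
}
```

For example, `copy_progress /jenkins-defunct/var/lib/jenkins/jobs /var/lib/jenkins/jobs`, run with enough permissions to read both directories.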

As a quick tip: I mentioned above that I had initially made this instance with a GP2 type SSD in AWS. In hindsight, it would have been nice to have had IO1, and it would also have made sense to bump the instance type up to something beefier, at least for the transfer, so it would have been less slow. As far as I knew there was no way to change a volume from GP2 to IO1 in place, though, so I would have needed to snapshot and recreate with a new instance. Alas.

I can verify that, after the rsync completed, Jenkins booted up successfully. Let's take another look at that sweet, sweet image.

JOBS JOBS JOBS
JOBS JOBS JOBS

starry-eyed-icon
Source: IconArchive

Github Oauth Note

When I was flipping the routes around to swap the new Jenkins into production, I noticed it kept trying to preserve the old route. There are a couple of ways to set the Jenkins route. One is in the UI, if it's working. To do that, go to Manage Jenkins -> Configure System and scroll to the Jenkins Location section:

Navigation Menu

Jenkins URL

The other way to do it is by editing /var/lib/jenkins/jenkins.model.JenkinsLocationConfiguration.xml directly.
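For the file route, here is a hedged sketch of doing the edit with sed instead of by hand. It assumes the file stores the route in a single-line `<jenkinsUrl>` element; `set_jenkins_url` is a hypothetical helper and `https://jenkins.example.com/` is a placeholder URL. I would only do this with Jenkins stopped, so the running process cannot write its old value back.

```shell
# Rewrite the <jenkinsUrl> element of a JenkinsLocationConfiguration.xml file.
# The file is overwritten in place (via cat) so its owner and permissions
# stay as they were.
set_jenkins_url() {
    local config_file="$1" new_url="$2"
    local tmp_file status
    tmp_file="$(mktemp)" || return 1
    # '|' is the sed delimiter so the slashes in the URL need no escaping.
    # Assumption: the URL itself contains no '|' or '&' characters.
    sed "s|<jenkinsUrl>.*</jenkinsUrl>|<jenkinsUrl>${new_url}</jenkinsUrl>|" \
        "${config_file}" > "${tmp_file}" \
        && cat "${tmp_file}" > "${config_file}"
    status=$?
    rm -f "${tmp_file}"
    return "${status}"
}
```

Usage would look like `set_jenkins_url /var/lib/jenkins/jenkins.model.JenkinsLocationConfiguration.xml https://jenkins.example.com/`, run as a user that can write the file, followed by a `grep jenkinsUrl` on it to confirm.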

I verified that the URL was correctly set in both of these places; however, Jenkins kept bouncing back to the jenkins-dev.example.com route I had made for it and it also popped a notification that I had a broken reverse proxy. So what gives?

Apparently the culprit was Github Oauth. When you configure the Oauth app in Github it looks like this:

Github Oauth App Configuration

The fields indicated with arrows were using the old route, so Jenkins kept redirecting due to auth.

Post Mortem: Adding Resiliency and What Not

So one of the reasons I ended up in this mess is the lack of backups for a single point of Jenkins-shaped failure. There were also some undocumented dependencies and a few other pain points that were uncovered as we did the first round of jobs. There are actually enough points here that I'm splitting this portion into its own post, which will be released shortly.

OpsFire Badge

Documented on my frequently used assets page.


Sources for header: Jenkins logo and Jenkins art: Fire from Jenkins site, Health Potion by adorabless @ DeviantArt, and a curved arrow from FreePik. Fiery background is from Shutterstock user Bernatskaya Oxana.