Devops4Scala - a JDK Docker image with SBT and Ammonite

In this post I will examine how to build a Docker image for the Java Development Kit on Alpine Linux, using the SBT build tool and Ammonite, the Scala scripting shell.

In a previous post I started a series, “Devops4Scala”, describing how to use Scala for DevOps tasks. The first post was about building Docker images using SBT and the sbt-docker plugin. That plugin is pretty powerful, but it does not let me do everything I need for complex image builds.

To perform the required tasks I will introduce another plugin I wrote, sbt-mosaico, which extends sbt-docker with configuration files, automated downloads, Ammonite scripting and more. I advise reading the previous blog post about sbt-docker first, because this post takes for granted that you know how to build Docker images with SBT.

Since I am building a set of Docker images for Scala applications, a base image with the Java Development Kit is an absolute must. Scala is nowadays primarily a JVM language, although it can be successfully compiled to JavaScript, and a nascent native compiler also exists.

Furthermore, I want to use Alpine Linux as the basis for my images, because it is very lightweight and has become the de facto standard for Docker images.

There are some “official” JDK images on Docker Hub built by Oracle, but they are not based on Alpine Linux, and they are also pretty big. There are also some images based on Alpine Linux with the Oracle JDK, but they are not official. Last but not least, there are OpenJDK-based Alpine Linux images. What I want to build is an image based on Alpine Linux and the Oracle JDK that is as small as possible.

Such an image cannot be legally distributed on Docker Hub, because that would violate the Oracle JDK distribution agreement. However, it is certainly legal to build one in-house. So this blog post is about how to build such an optimized image for internal use. I believe the scripts can be handy in many cases.

All the scripts described in this post can be found on GitHub.

Use cases for building a JDK image

In this post I will cover the following use cases you may face when building a JDK image:

  • downloading software from the Internet outside Docker
  • creating a separate configuration file to parametrize the build
  • processing the downloaded software with ad-hoc scripts

These use cases cannot be covered with sbt-docker alone. For this reason I developed an extension plugin, named sbt-mosaico, which I will use in this post to build more complex images.

Let’s go through them in order.

Downloading software for Docker images

Dockerfiles usually download the required software as part of the build. You can do it in two ways. The first is to use the ADD instruction followed by a URL. The second is to use a RUN instruction executing wget or curl to actually download the file.

Both solutions are problematic. If you use ADD with a URL, by design it downloads the file at every build, because a layer created this way is not cached. The rationale behind this behavior is explained in the linked issue: Docker by design does not trust HTTP timestamps, so it ends up downloading every time. Unfortunately this slows rebuilds down a lot, and it is a major issue in the build process, especially if you run those builds in a continuous integration chain.

The alternative suggested in the ticket itself is to use RUN wget, which works a bit better. However, this way the downloaded archive is left inside the image, eating precious space: this technique can almost double the size of the image. The wget layer is cached, so the software is not downloaded again on rebuilds, but when you drop the images and rebuild them from scratch, you have to download it again.
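To make the two approaches concrete, here is roughly how they look with the sbt-docker Dockerfile DSL used in the previous post. This is just an illustrative sketch: the URL and base image tag are made up, one would normally pick only one of the two approaches, and I am assuming the addRaw and runRaw helpers, which emit raw ADD and RUN instructions.

new Dockerfile {
  from("alpine:3.4")
  // approach 1: ADD from a URL, re-downloaded at every build
  addRaw("http://example.com/some-archive.tgz", "/tmp/archive.tgz")
  // approach 2: RUN wget, cached but the archive stays inside the layer
  runRaw("wget -O /tmp/archive.tgz http://example.com/some-archive.tgz")
}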

The solution is of course to download the software before building the image, automating the download step with SBT. So let’s see how to do it.

Enabling download in SBT

First, you need the sbt-mosaico plugin, which is available on Maven Central. All you need to do is add this to your project/plugins.sbt:

addSbtPlugin("com.sciabarra" % "sbt-mosaico" % "0.2")

and enable it with:

enablePlugins(MosaicoDockerPlugin)

Note that this plugin depends on the sbt-docker plugin and will automatically download and enable it.

Now you have a few extra tasks. The one we are interested in here is download. It is an interactive command, so you can use it straight from the SBT prompt. For example, you can download the Ammonite Scala shell with just:

download https://git.io/vioDM amm

Note: you do not actually need to download Ammonite to run the scripts I will describe later, since Ammonite is already integrated in sbt-mosaico and will be downloaded by SBT as a dependency.

Download the requirements

Now that we have the download task, we can use it to build a Docker image. Let’s write an SBT script for it. The full script is here. We will of course need the URL to download the JDK, so let’s define a value for it:

val jdkUrl = "http://download.oracle.com/otn-pub/java/jdk/8u101-b13/jdk-8u101-linux-x64.tar.gz"

However, if we download this archive and try to run it in an Alpine image, we are in for a nasty surprise: it does not work, since it is dynamically linked against the standard glibc, while Alpine Linux uses musl libc. So we have no choice but to also download and install the glibc shared libraries for Alpine. Luckily, those are available on the net, so let’s declare another URL for this additional requirement:

val glibcUrl = "https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-2.23-r3.apk"

We can now download glibc with:

download.toTask(s" $glibcUrl glibc.apk").value

This code, placed in an SBT build file, transforms the input task (the interactive download task) into a task that can be executed automatically: during the build it downloads the URL and stores the result in the local glibc.apk file. It also downloads only once: if the file is already there, it is not downloaded again. The task returns the File object pointing to the local copy.

We can use the download task also for the JDK, but the JDK requires an additional step: it is protected by a cookie that signals acceptance of the license. The download task covers this case too, allowing you to specify additional headers for the request. So we can download the JDK with:

val oraCookie = "Cookie: oraclelicense=accept-securebackup-cookie"
download.toTask(s" $jdkUrl jdk.tgz $oraCookie").value

Now that we have downloaded the files, the rest is pretty standard. This script, however, is only a first version. There is more to say.
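To give an idea of how the downloaded files fit into the build, here is a minimal sketch of what the first-version dockerfile definition could look like with the sbt-docker DSL. It is not the exact script from the repository: the base image tag, the install commands and the JDK directory name are assumptions of mine.

dockerfile in docker := {
  // run the downloads (cached after the first time) and get the local files
  val glibc = download.toTask(s" $glibcUrl glibc.apk").value
  val jdk   = download.toTask(s" $jdkUrl jdk.tgz $oraCookie").value
  new Dockerfile {
    from("alpine:3.4")
    copy(glibc, "/tmp/glibc.apk")
    copy(jdk, "/tmp/jdk.tgz")
    // install glibc, unpack the JDK and drop the archives in a single layer
    runRaw("apk add --allow-untrusted /tmp/glibc.apk && " +
           "mkdir -p /usr/lib/jvm && tar xzf /tmp/jdk.tgz -C /usr/lib/jvm && " +
           "rm /tmp/glibc.apk /tmp/jdk.tgz")
    env("JAVA_HOME", "/usr/lib/jvm/jdk1.8.0_101")
    env("PATH", "/usr/lib/jvm/jdk1.8.0_101/bin:$PATH")
  }
}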

Parametrizing the build script

Now we have a build script with a number of constants hard-coded inside it. It is useful to move those constants into a separate, simple, non-code configuration file, so that administrators and non-programmers can change values easily. This is especially useful for something as volatile as a URL. Normally this is not possible in SBT out of the box. Luckily, the MosaicoDockerPlugin has a feature to read a property file.

When you enable the plugin, you get a prp setting that returns a Scala map. By default, it looks for a file named mosaico.properties in the current directory, but you can customize the location of the property files. Indeed, this is the first thing we do now.

I created a jdk2 project where I added this setting at the beginning:

prpLookup += baseDirectory.value.getParentFile -> "alpine"

This way the build will look for an alpine.properties file in the parent directory. I share this configuration file among the different subprojects. The configuration file is the following:

alpine.jdk2=devops4scala/alpine-jdk2:1
alpine.jdk3=devops4scala/alpine-jdk3:1
jdk.url=http://download.oracle.com/otn-pub/java/jdk/8u101-b13/jdk-8u101-linux-x64.tar.gz
glibc.url=https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.23-r3/glibc-2.23-r3.apk

I placed there the names of the images I am going to build and the URLs of the files to download.

To use a value from the configuration file I just need prp.value("key.name"). Remember, since prp is an SBT task, you need .value to evaluate it. I get a map, from which I can extract a value. If a key looked up in the configuration file is missing, I get an exception in the build. That is what I want: if I forget a key, I should get an error. If, instead, I want a default for a missing key, I can use prp.value.getOrElse("key.name", "default value"): if key.name does not exist, I get the default value.

We can see it in action in this code in the jdk2/build.sbt:

imageNames in docker := Seq(ImageName(prp.value("alpine.jdk2")))
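If, for instance, I preferred a fallback instead of a build failure when the key is missing, the same setting could use the getOrElse form (the default value here is just a hypothetical example):

imageNames in docker := Seq(ImageName(prp.value.getOrElse("alpine.jdk2", "devops4scala/alpine-jdk2:latest")))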

Using properties in SBT input tasks

We have a problem: unfortunately, in SBT we cannot use dynamic values in generic input tasks. When you invoke an SBT input task as a task in a build script (with the toTask method), you cannot use other tasks or settings in its arguments. I won’t discuss why here (there is a complicated technical reason); I will simply point out the solution. The download task (and the other sbt-mosaico tasks) performs configuration replacement internally. In practice this means you cannot write ${prp.value("glibc.url")} in the download argument; instead you have to write @glibc.url. For example, the download code in the new project looks like this:

Def.sequential(
  download.toTask(s" @glibc.url glibc.apk"),
  download.toTask(s" @jdk.url jdk.tgz $oraCookie")
).value

The notations @glibc.url and @jdk.url will be replaced with the corresponding configuration values directly inside the input task. Everything starting with @ is replaced with the value of the corresponding key in the configuration. Note that $oraCookie is instead just Scala string interpolation.

Incidentally, note that I placed the two tasks in a sequential block to be sure they are executed in order: by default SBT runs the tasks a build step depends on in parallel.

Profile support

Actually, I lied when I said the plugin looks for property files with just a .properties extension. It actually looks for three extensions, in this order: first .dist.properties, then .properties and finally .local.properties. Values loaded later override values loaded earlier.

The idea is that you place your distribution defaults in the .dist.properties file, your actual configuration in the .properties file, and finally you can have a local override in a .local.properties file.

Furthermore, it is possible to set the system property profile, and its value will be used as an additional extension (plus .properties). For example, if I invoke SBT with -Dprofile=devel, property files with the extension .devel.properties will also be loaded and used, last of all; with the alpine prefix above, the plugin would thus read alpine.dist.properties, alpine.properties, alpine.local.properties and then alpine.devel.properties.

Removing extra files in JDK

So far so good: we built the image. However, if you check the size of the JDK image, you will see it is 377 MB! It is smaller than the official Java image (currently at 600 megabytes!), but it is still pretty large.

Actually, the JDK includes a lot of files that are not necessary for running most applications. For example, we can remove the source code of the Java library, Java DB, and tuning tools like VisualVM and Mission Control.

Unfortunately, just removing those files in the Docker build does not make the image smaller, because the deleted files are still present in the earlier layers. What we need is the ability to trim the JDK before actually placing it in the image. Thankfully, we are building images with a build tool, so it is just a matter of adding another build step. The problem is: how do we write this build step?

A portable scripting solution: Ammonite

We can perform build activities in many different ways, generally by invoking a script. The most obvious solution is to write a Bash shell script. However, the cost of using Bash is introducing a dependency on the system used to build our Docker images: shell scripts do not run on Windows systems out of the box.

Depending on the shell is a problem, because I want the kit to be usable on multiple systems, to build images for development as well as for production. I need to support all the development environments where images can be built: the build should be usable, for example, by developers running Windows, web designers using a Mac, and production servers running Linux. Last but not least, since we are writing Scala SBT build scripts for our Dockerfiles, I would like to write the supporting code in Scala as well. The most obvious way would be writing SBT extensions, but that is not always the simplest solution: in certain cases SBT can be awkward to use, because it is a build tool, not a scripting tool.

For those reasons I included support for Ammonite, a portable scripting shell based on Scala. The sbt-mosaico plugin includes the ability to run Ammonite scripts. You need to enable it, so in your project definition you have to add this code:

enablePlugins(MosaicoDockerPlugin, MosaicoAmmonitePlugin)

Once done, you get a new interactive task amm that can execute Scala scripts. If you have such a script in the current directory (we will show one later), for example named trimjdk.sc, you can run amm trimjdk to execute it; in a build you can use amm.toTask(s" trimjdk") instead. Once we have a trimjdk.sc script, the code to download and trim the JDK looks like this:

Def.sequential(
  download.toTask(s" @glibc.url glibc.apk"),
  download.toTask(s" @jdk.url jdk.tgz $oraCookie"),
  amm.toTask(s" trimjdk")
).value

Here the sequential block is necessary because the trim step must run after the downloads have completed; without it, SBT would try to execute the amm task before the downloads are done. You can check the final build file here.

An Ammonite script

Let’s examine the Ammonite script (you can find it here).

The script is written in Scala, and you do not need anything else: the only dependency is SBT. Required tools, like the library used to extract the tar.gz, are downloaded on demand. The scripts were tested on Windows 10 and Mac OS X, and they run just fine on both systems. Let’s take a look at the script:

import $ivy.`org.rauschig:jarchivelib:0.7.0`
import org.rauschig.jarchivelib._
import java.io._

I am using a library named jarchivelib (available on Maven Central) to extract the .tar.gz. The strange $ivy import is the Ammonite syntax for declaring a dependency on a library. Once that is done, the rest is plain Scala.

val exclude = "jdk1.8.0_\\d+/(src\\.zip|javafx-src\\.zip.*|db/.*|man/.*|include/.*|lib/(missioncontrol|visualvm)/.*|jre/lib/desktop/.*)".r

This is a regular expression matching the files to exclude.
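To see what it matches, you can test it on a couple of sample entry names (hypothetical paths, just for illustration):

println(exclude.findFirstIn("jdk1.8.0_101/src.zip"))  // Some(...): excluded
println(exclude.findFirstIn("jdk1.8.0_101/bin/java")) // None: kept and extracted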

val infile = new File("jdk.tgz")
val base = pwd.toIO
val outdir = (pwd/"usr").toIO
outdir.mkdirs
// open the tar.gz archive with jarchivelib and walk its entries
val arc = ArchiverFactory.createArchiver(ArchiveFormat.TAR, CompressionType.GZIP)
val str = arc.stream(infile)
var ent = str.getNextEntry
while (ent != null) {
  val curr = ent.getName
  exclude.findFirstIn(curr) match {
    case Some(_) => print("-")   // excluded entry: skip it
    case None =>
      print("+")                 // kept entry: extract it under usr/
      ent.extract(outdir)
  }
  ent = str.getNextEntry
}

This code just extracts the files, skipping the ones I am not interested in.
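To complete the picture, the jdk3 dockerfile definition can then copy the trimmed tree extracted under usr/ into the image, instead of the full archive. Here is a minimal sketch (again, not the exact file from the repository: the base image tag, the install command and the JDK paths are assumptions of mine):

dockerfile in docker := {
  Def.sequential(
    download.toTask(s" @glibc.url glibc.apk"),
    download.toTask(s" @jdk.url jdk.tgz $oraCookie"),
    amm.toTask(s" trimjdk")
  ).value
  val base = baseDirectory.value
  new Dockerfile {
    from("alpine:3.4")
    copy(base / "glibc.apk", "/tmp/glibc.apk")
    runRaw("apk add --allow-untrusted /tmp/glibc.apk && rm /tmp/glibc.apk")
    // the trimmed JDK tree produced by trimjdk.sc
    copy(base / "usr", "/usr")
    env("JAVA_HOME", "/usr/jdk1.8.0_101")
    env("PATH", "/usr/jdk1.8.0_101/bin:$PATH")
  }
}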

Now we can build the image. The build, after downloading the files, extracts only the files we are interested in, and finally places them inside the image. The size of the image built with the jdk3 build is only 248 MB. Success!

Conclusions

Our SBT build kit is now more powerful, since we can also download, configure and script our builds. We have not finished yet: there are still more complicated tasks to execute, like compiling binary code (not supported by SBT). Stay tuned for more installments of Devops4Scala.