What changes with Docker for Developers?

Last week I was talking with a new friend (@rdiazconcha) about Docker, and one of the topics on our virtual table was: “What are the implications of adopting Docker for developers? Do we keep doing the same things we’ve been doing for the last years (or months), or should we start considering other things too?” In this article I’ll give my perspective on what changed for me and for the good group of developers around me.

But first, I don’t think this is a complete list of the topics you need to consider (I’m assuming you already have some basic knowledge of Docker). In fact, I think some of them could need more than one article to discuss, so feel free to add more in the comments.

It’s not just about having the same environments

We always read that the common phrase “it works on my machine” is dead with Docker, but let me tell you that this isn’t always true. Why? Because many other components are needed for our app to work, for example: redis, elasticsearch, mysql, memcached, etc.

We often have different environments too, where all tests need to be green (everybody tests their code, right?). The most common ones are local, dev, qa, staging and production (to name a few). The idea of Docker is that we can “build once, run anywhere”, so we need to keep in mind that the app won’t run only on my machine but also on every other team member’s (yes, treat servers like team members, lol). The usual flow is like this: I create my feature branch, make the changes, test them, push the code and open a merge request; my team lead approves it and it gets merged to master; then the CI tool is triggered to build the image, and the CD tool deploys it for the first time to the dev environment. After this we can just deploy the same image to the rest of the environments … yes, once all tests have passed.

Sounds good and easy, right? Not always; sometimes there are inconsistencies that cause a lot of pain. For example, one of the most common problems we had was that the endpoints for the resources weren’t always defined in the same way. For redis, some used the server IP (which sometimes changed for whatever reason) and others used the domain name. Then we realized we needed a standard, so we changed the domains again, and we always forgot to change something: even though the tests were green, when we shut down an old server we started noticing that some things were still pointing there. So, even with a flow similar to the one described above, we always had some problems between environments.

We never knew exactly what was happening (everything worked on our machines, remember?), and we weren’t taking full advantage of something very useful: the debugger. Sometimes we go as far as saying: “We don’t have the same environment as in production, so we will need to debug there”, and we end up with ugly debug messages that are sometimes exposed to our customers. No, no, no, we can’t keep working that way. We have a better way now, because it’s possible to debug our code even inside a container. For example, if you’re a .NET developer, Visual Studio 2017 has this already integrated: for existing projects you just need to “Add Docker Support” and it will build everything necessary to run and debug your app locally. It uses a base structure and, according to the version of your code, creates a proper Dockerfile. But hey, as Scott Hanselman said in the VS launch: “You can use this even to learn Docker” (you can do this with Java too: [here]).

Testing my app can be easily automated

We can’t just test (and/or debug) locally. If we have full control of the environment we’ll always find the hack that makes our app work, and as the app grows, more and more tests will be needed, which means more and more time, and this might start to slow you down.

Yesterday, I found out that the SQL Server team uses Docker to automate testing (link), and Yelp says that they run millions of tests every day (link). That’s amazing! These are two examples of big companies showing how we can take advantage of containers to test our app to the limit, and even to run those types of tests locally. The approach you’ve been using to test your app needs to change because, like the SQL Server team, you now have the ability to start the database in a desired state as the first step, run the tests in several containers at the same time and wait some time for the results (you’ll need more resources, but we can lean on the cloud for that, right?).

Imagine that you can run your whole test suite after you’ve deployed the app to dev (or another environment) with this approach. You’ll gain confidence the more you practice, and you’ll reduce not just lead time but also risk when deploying something new to production.

Storage is ephemeral

You need to know that all changes that happen in a container while it’s running might be lost forever, because when you start again a new container is created (unless you relaunch the same container). Once it’s stopped there’s no guarantee that your files will stay there. These can be logs, files that were missing from the deployment and uploaded manually, new files uploaded through the app, missing components that were installed by hand, etc. I definitely recommend reading this article [here] to learn more about how this works.

Why is this important? Like I said before, when you have control of the environment you will always find a way to make your app work, and as humans (especially with deadlines) we tend to forget what we did to fix it. That’s why our app stops working in other environments. It also means that you can’t depend on keeping state inside a container; the container should always be stateless.

Let me give you an example. Let’s say a user uploads their profile pic with your app and you store that pic in the container. The user is happy because the profile pic is being displayed, but then something happens to the container and it’s terminated. Boom, the profile pic is not there anymore, and at scale it’s annoying to go and restart a stopped container. In this specific case you should instead move the image to permanent storage like S3, Azure Blob or whatever you prefer.
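One way to keep the upload code independent of the container’s disk is to hide the storage behind a small interface. The Python sketch below is only an illustration: BlobStore, InMemoryStore and save_profile_pic are hypothetical names, and the in-memory class is a stand-in for tests, not a real S3 or Azure Blob client.

```python
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Anything that can persist bytes outside the container (hypothetical interface)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> bytes:
        ...


class InMemoryStore(BlobStore):
    """Stand-in for local tests only; a real implementation would wrap S3, Azure Blob, etc."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


def save_profile_pic(store: BlobStore, user_id: str, image: bytes) -> str:
    """Store the picture through the interface and return the key it was saved under."""
    key = f"profile-pics/{user_id}"  # key layout is an assumption for this sketch
    store.put(key, image)
    return key
```

The app only ever talks to the interface, so nothing it needs later is written to the container’s own filesystem.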

Environment variables

Now, this is one of the most important things, and you’ll see significant gains if you apply it correctly. This was the “aha” moment for “run anywhere” for me, because by using environment variables you just need to “inject” the values required by the environment where the container will run. Let’s use the database endpoint as an example: with an environment variable like DB_HOST, I can start changing my app to take the endpoint from the environment. Then I need to add that variable to my computer … right? … Ahmm, you can do that, but you won’t be changing things that much; you’ll just make everything more complex, because whatever you did on your computer you’ll need to repeat in the rest of the environments.

So how can this be achieved? You need to pass the environment variables when running the container; this way you’ll be injecting the desired values. This recommendation is based on the 12 Factor App guide, specifically the config part. You can find more info [here], and I also wrote a post about using docker env vars with .NET Core [here].
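To make the DB_HOST idea concrete, here is a minimal Python sketch of an app reading its configuration from the environment. Only DB_HOST comes from the text above; DB_PORT, the default values and the load_config helper are assumptions made up for the illustration.

```python
import os


def load_config(env=None):
    """Build the app configuration from environment variables.

    DB_HOST is the variable used in the article; DB_PORT and the
    defaults below are hypothetical, chosen only for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "db_host": env.get("DB_HOST", "localhost"),
        "db_port": int(env.get("DB_PORT", "3306")),
    }


if __name__ == "__main__":
    config = load_config()
    print(f"Connecting to {config['db_host']}:{config['db_port']}")
```

The same image can then be started with different values per environment, for example with “docker run -e DB_HOST=db.dev.example.com my-app” (the host name and the image name are placeholders).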

Logs, logs, logs

Data might get lost, remember? So how do we know what happened after a container crashed? Logs. This is crucial in production, so you need to be prepared by making sure you have logging across all environments (even on your machine, yes, to verify it’s working). You need to know that by default Docker collects whatever the container writes to standard output (/dev/stdout) and standard error (/dev/stderr). Take a look, for example, at an nginx container: /var/log/nginx/access.log is a symbolic link to /dev/stdout. This way you can use the “docker logs” command to get the information without logging in to the container, or you can make use of other logging drivers and have a centralized logging server where all containers push their logs. As the 12 Factor App says, “treat logs as event streams” [here]. But you have to choose just one: if you decide to go with the logging driver approach, you may no longer be able to get logs with the “docker logs” command (this depends on the driver and the Docker version), and I should mention that you can only have one logging driver per container, at least at the moment (that sucks).

How does this affect you as a developer? You need to know that if you send logs anywhere other than stdout or stderr (in practice, anything other than your app’s console writes), your logs might get lost (unless you send them to a shared volume, in which case the logs are kept on the host, but there’s still a chance of losing them). So keep this in mind.
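As a small illustration of “write to the console, not to a file”, this is a minimal Python sketch that sends the app’s logs to stdout using the standard logging module. The logger name, the message format and the build_logger helper are assumptions for the example.

```python
import logging
import sys


def build_logger(name="app", stream=None):
    """Return a logger that writes to stdout (or to the given stream).

    The logger name and the message format are assumptions for this sketch.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep output on our handler only
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("application started")
```

Because nothing is written to a file inside the container, whatever collects the container’s standard output (the “docker logs” command or a logging driver) sees every message.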

Port binding

You should make sure this is not an impediment for your app, especially if you would like to scale out containers on the same host. The application inside your container will expose a port, but the host is not necessarily going to expose the same port (there is a default range, but as a developer you don’t need to know this; if you’re curious go [here]).

This is also a characteristic of a 12 Factor App; you can find more info [here]. But why is this important? With this option, you don’t need to worry about where the container will be living: on your local computer it could be port 8080, but in development it could be port 32888. You might start thinking that this will make things more complex for you, because you need to know which port the container is running on, and it gets more complicated when you run more than one container to make a cluster fault tolerant.

You also have some options to work with this, and here they are:

  1. You can use docker compose and link the containers, more info [here]
  2. Use a load balancer. It will obviously expose a port that is always the same and that you will use to call a container; the difference is that it proxies the calls to the host ports of the containers registered with it. There are several ways to do this and here are some. All the orchestrators have this feature in some way, but as a developer you don’t necessarily need to know how to implement it.
  3. Ignore this section and continue working with a fixed port number; the downside is that you will only be able to have one container per host
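From the app’s side, the simplest way to stay out of the way of port binding is to not hard-code the port. The Python sketch below reads it from a PORT environment variable and falls back to 8080; the variable name PORT, the default and the helper names are assumptions for the illustration, not something Docker requires.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny handler so the sketch has something to serve."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def resolve_port(env=None, default=8080):
    """Read the listening port from PORT (a hypothetical variable name)."""
    env = os.environ if env is None else env
    return int(env.get("PORT", default))


def make_server(port):
    """Bind on all interfaces so the port can be published by the host."""
    return HTTPServer(("0.0.0.0", port), HelloHandler)
```

To actually serve requests you would call make_server(resolve_port()).serve_forever(). The port the app listens on inside the container stays the same everywhere, while the host port it is published on can differ per environment.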

Tooling to have my local environment

Now, you don’t need to learn a bunch of tools to start but there are some things that you should consider when beginning this journey:

  1. Docker installed, kind of obvious but anyway, more info [here]
  2. In the case of Windows, you might need to share the drive where you have the code so you can debug or use volumes in that drive
  3. In some cases, make sure you’re giving enough resources to the docker daemon, e.g. to run SQL Server containers you need to have at least 3.5 GB of memory reserved
  4. A script to start the container app and all its dependencies (e.g. a local redis). This could be a topic for another table; just take note that you’ll need to use compose to ease the work. This way, when a new developer comes in, they will just need to pull the code, read the README.md, follow some instructions maybe, run docker compose and start hacking. This sounds easy, and it can get complicated to reach this point, but it will be worth it.

Create the Dockerfile with the operations team

You need to have this scripted; it will make everyone’s life easier, trust me. You could launch a container from a base image (even FROM scratch), install what you need, copy your app and test it, and once you are happy with everything just commit the container, create a new image tag, push the image to the repo, and that’s all. In other words, you could treat the container like a virtual machine. Don’t do that, please.

Instead, make use of the Dockerfile. You can start with what you know your app is going to need and then ask the operations team for a review; they might have something to add or remove, and they will know what the container has installed. The same thing applies if another person is building the image: you would like to know what’s inside, right? Maybe something in there is what is making your app crash or run slowly.

A few things to consider here: avoid using one Dockerfile to build locally, another for development and another one for production, because it will become a mess; use multi-stage builds instead, more info [here]. Another thing is the image size. Remember that the idea is to have only what you need to run your app, nothing else, so ideally you’ll try to keep your image as small as possible. For example, you can achieve this by using base images like Alpine or CoreOS (create your own with LinuxKit?) instead of the Ubuntu one. Why is this important? Well, when you scale out and a new server is launched, the image needs to be pulled, and if the image is several GB in size, it will take some time before you can launch your first container. Your users might not even notice it, but it can affect someone.

Summary

I hope you now have a good idea of what changes you might need to implement in your existing applications. But don’t try to make all the changes at once; start little by little, and you’ll always get some benefit. Right from the very beginning you’ll be able to have a decent level of consistency between deployments. You might not be able to get the full advantages of Docker, but at least take the first step: containerize your app :)