Docker in Production: An Update

23 February 201727 August 2017 thehftguy73 Comments

The previous article Docker in Production: A History of Failure was quite a hit.

After long discussions, hundreds of feedbacks, thousands of comments, meetings with various individuals and major players, more experimentation and more failures, it’s time for an update on the situation.

We’ll go over the lessons learned from all the recent interactions and articles, but first, a reminder and a bit of context.

Disclaimer: Intended Audience

The large amount of comments made it clear that the world is divided in 10 kind of people:

1) The Amateur

Running mostly test and side projects with no real users. May think that using Ubuntu beta is the norm and call anything “stable” obsolete.

I dont always make workin code but when I do it works on my machine — Can’t blame him. It worked on his machine.

2) The Professional

Running critical systems for a real business with real users, definitely accountable, probably get a phone call when shit hits the fan.

one-does-not-simply-say-well-it-worked-on-my-machine.jpg — Didn’t work on the machine that served his 586 million customers.

What Audience Are You?

There is a fine line between these worlds and they clash pretty hard when they ever meet. Obviously, they have very different standards and expectations.

One of the reason I love finance is because that it has a great culture of risk. It doesn’t mean to be risk-averse contrary to a popular belief. It means to evaluate potential risks and potential gains and weight them against each other.

You should take a minute to think about your standards. What do you expect to achieve with Docker? What do you have to lose if it crashes all systems it’s running on and corrupt the mounted volumes? These are important factor to drive your decisions.

What pushed me to publish the last article was a conversation with a guy from a random finance company, just asking my thoughts about Docker, because he was considering to consider it. Among other things, this company -and this guy in particular- manages systems that handle trillions of dollars, including the pensions of millions of Americans.

Docker is nowhere ready to handle my mother’s pension, how could anyone ever think that??? Well, it seemed the Docker experience wasn’t documented enough.

What Do You Need to Run Docker?

As you should be aware by know, Docker is highly sensitive to the kernel, the host and the filesystem it’s using. Pick the wrong combination and you’re talking kernel panic, filesystem corruption, Docker daemon lock down, etc…

I had time to collect feedback on various operating conditions and test a couple more myself.

We’ll go over the results of the research, what has been registered to work, not work, experience intermittent failures, or blow up entirely in epic proportions.

Spoiler Alert: There is nothing with or around Docker that’s guaranteed to work.

Disclaimer: Understand the Risks and the Consequences

I am biased toward my own standards (as a professional who has to handle real money) and following the feedback I got (with a bias toward reliable sources known for operating real world systems).

For instance, if a combination of operating system and filesystem is marked as “no-go: registered catastrophic filesystem failure with full volume data loss“. It is not production ready (for me) but it is good enough for a student who has to do a one-off exercise in a vagrant virtual machine.

You may or may not experience the issues mentioned. Either way, they are mentioned because they are certified to be present in the wild as confirmed by the people who hit them. If you try an environment that is similar enough, you are on the right path to become the next witness.

The worst that can -and usually- happen with Docker is that it seems okay during the proof of concepts and you’ll only begin to notice and understand issues far down the line, when you cannot easily move away from it.

CoreOS

CoreOS is an operating that can only run containers and is exclusively intended to run containers.

Last article, the conclusion was that it might be the only operating system that may be able to run Docker. This may or may not be accurate.

We abandoned the idea of running CoreOS.

First, the main benefit of Docker is to unify dev and production. Having a separate OS in production only for containers totally ruins this point.

Second, Debian (we were on Debian) announced the next major release for Q1 2017. It takes a lot of effort to understand and migrate everything to CoreOS, with no guarantee of success. It’s wiser to just wait for the next Debian.

CentOS/RHEL

CentOS/RHEL 6

Docker on CentOS/RHEL 6 is no-go: known filesystem failures, full volume data loss

Various known issues with the devicemapper driver.
Critical issues with LVM volumes in combination with devicemapper causing data corruption, container crash, and docker daemon freeze requiring hard reboot to fix.
The Docker packages are not maintained on this distribution. There are numerous critical bug fixes that were released in the CentOS/RHEL 7 packages but were not back ported to the CentOS/RHEL 6 packages.

ship crash shipt it revert — The only sane way to migrate to Docker in a big company still running on RHEL 6 => Don’t do it!

CentOS/RHEL 7

Originally running the kernel 3, RedHat has been back porting the kernel 4 features into it, which is mandatory for running Docker.

It caused problems at time because Docker failed to detect the custom kernel version and the available features on it, thus it cannot set proper system settings and fails in various mysterious ways. Every time this happens, this can only be resolved by Docker publishing a fix on feature detection for specific kernels, which is neither a timely nor systematic process..

There are various issues with the usage of LVM volumes, depends on the version.

Otherwise, it’s a mixed bag. Your mileage may vary.

As of CentOS 7.0, RedHat recommended some settings but I can’t find the page on their website anymore. Anyway, there are a tons of critical bugfixes in later version so you MUST update to the latest version.

As of CentOS 7.2, RedHat recommends and supports exclusively XFS and they give special flags for the configuration. AUFS doesn’t exist, OverlayFS is officially considered unstable, BTRFS is beta (technology preview).

The RedHat employees are admitting themselves that they struggle pretty hard to get docker working in proper conditions, which is a major problem because they gotta resell it as part of their OpenShift offering. Try making a product on an unstable core.

If you like playing with fire, it looks like that’s the OS of choice.

Note that for once, it is a case where you surely wants to have RHEL and not CentOS, meaning timely updates and helpful support at your disposal.

Debian

Debian 8 jessie (stable)

A major cause of the issues we experienced was because our production OS was Debian stable, as explained in the previous article.

Basically, Debian froze the kernel to a version that doesn’t support anything Docker needs and the few components that are present are rigged with bugs.

Docker on Debian is major no-go: There is a wide range of bugs in the AUFS driver (but not only), usually crashing the host, potentially corrupting the data, and that’s just the tip of the iceberg.

Docker is 100% guaranteed suicide on Debian 8 and it’s been since the inception of Docker a few years ago. It’s killing me no one ever documented this earlier.

I wanted to show you a graph of AWS instances going down like dominoes but I didn’t have a good monitoring and drawing tool to do that, so instead I’ll illustrate with a piano chart that looks the same.

docker-crash-illustrated — Typical docker cascade failure in our test systems.

Typical Docker cascading failure on our test systems. A test slave crashes… the next one retries two minutes later… and dies too. This specific cascade took 6 tries to go past the bug, slightly more than usual, but nothing fancy.

You should have CloudWatch alarms to restart dead hosts automatically and send a crash notifications.

Fancy: You can also have a CloudWatch alarm to automatically send a customized issue report to your regulator whenever there is an issue persisting more than 5 minutes.

Not to brag but we got quite good at containing Docker. Forget about Chaos Monkey, that’s child play, try running trading systems handling billions of dollars on Docker [1].

[1] Please don’t do that. That’s a terrible idea.

Debian 9 stretch

Debian stretch is planned to become the stable edition in 2017. (Note: might be released as I write and edit this article).

It will feature the kernel 4.10 which is the latest LTS, published simultaneously.

At the time of release, Debian Stretch will be the most up to date stable operating system and it will allegedly have all the shiny things necessary to run Docker (until the Docker requirements change again).

It may resolve a lot of the issues and it may make a tons of new ones. We’ll see how it goes.

Ubuntu

~~Ubuntu has always been more up to date than the regular server distributions.~~

Sadly, I am not aware of any serious companies than run on Ubuntu. This has been a source of much misunderstanding in the docker community because dev and amateur bloggers try things on the latest Ubuntu (not even the LTS [1]) yet it’s utterly non representative of production systems in the real world (RHEL, CentOS, Debian or one of the exotic Unix/BSD/Solaris).

~~I cannot comment on the LTS 16 as I do not use it. It’s the only distribution to have Overlay2 and ZFS available, that gives some more options to be tried and maybe find something working?~~

~~The LTS 14 is a definitive no-go: Too old, don’t have the required components.~~

[1] I received quite a few comments and unfriendly emails of people saying to “just” use the latest Ubuntu beta. As if migrating all live systems, changing distribution and running on a beta platform that didn’t even exist at the time was an actual solution.

Update: I said I’m never coming back to Docker and certainly not to spend an hour on digging up references but I guess I have to now that they are handed to me in spectacular ways.

I received a quite insulting email from a guy who is clearly in the amateur league to say that “any idiot can run Docker on Ubuntu” then proceed to give a list of software packages and advanced system tweaks that are mandatory to run Docker on Ubuntu, that allegedly “anyone could have found in 5 seconds with Google“.

At the heart of his mail is this bug report, which is indeed the first Google result for “Ubuntu docker not working” and “Ubuntu docker crash“: Ubuntu 16.04 install for 1.11.2 hangs.

This bug report, published on June 2016 highlights that the Ubuntu installer simply doesn’t work at all because it doesn’t install some dependencies which are required by Docker to run, then it’s a see of comments, user workarounds and not-giving-a-fuck #WONTFIX by Docker developers.

The last answer is given by an employee 5 months later to say that the Ubuntu installer will never be fixed, however the next major version of Docker may use something completely different that won’t be affected by this issue.

A new major version (v1.13) just got released (8 months since the report), it is not confirmed whether it is affected by the bug or not (but it is confirmed to come with breaking changes).

It’s fairly typical of what to expect from Docker. Checklist:

Is everything broken to the point Docker can’t run at all? YES.
Is it broken for all users, of say a major distribution? YES.
Is there a timely reply to acknowledge the issue? NO.
Is it confirmed that the issue is present and how severe it is? NO.
Is there any fix planned? NO.
Is there a ton of workarounds of various danger and complexity? YES.
Will it ever be fixed? Who knows.
Will the fix, if it ever comes, be backported? NEVER.
Is the ultimate answer to everything to just update to latest? Of course.

AWS Container Service

AWS has an AMI dedicated to running Docker. It is based on an Ubuntu.

As confirmed by internal sources, they experienced massive troubles to get Docker working in any decent condition

Ultimately, they released am AMI for it, running a custom OS with a custom docker package with custom bug fixes and custom backports. They went and are still going through extensive efforts and testing to keep things together.

If you are locked-in on Docker and running on AWS, your only salvation might be to let AWS handles it for you.

Google Container Service

Google offers containers as a service. Google merely exposes a Docker interface, the containers are run on internal google containerization technologies, that cannot possibly suffer from all the Docker implementation flaws.

Don’t get me wrong. Containers are great as a concept, the problem is not the theoretical aspect, it’s the practical implementation and tooling we have (i.e. Docker) which are experimental at best.

If you really want to play with Docker (or containers) and you are not operating on AWS, that leaves Google as the single strongest choice, better yet, it comes with Kubernetes for orchestration, making it a league of its own.

That should still be considered experimental and playing with fire. It just happens that it’s the only thing that may deliver the promises and also the only thing that comes with containers AND orchestration.

OpenShift

It’s not possible to build a stable product on a broken core, yet RedHat is trying.

From the feedback I had, they are both struggling pretty hard to mitigate the Docker issues, with variable success. Your mileage may vary.

Considering that they both appeal to large companies, who have quite a lot to lose, I’d really question the choice of going for that route (i.e. anything build on top of Docker).

You should try the regular clouds instead: AWS or Google or Azure. Using virtual machines and some of the hosted services will achieve 90% of what Docker does, 90% of what Docker doesn’t do, and it’s dependable. It’s also a better long-term strategy.

Chances are that you want to do OpenShift because you can’t do public cloud. Well, that’s a tough spot to be in. (Good luck with that. Please write a blog in reply to talk about your experience).

Summary

CentOS/RHEL: Russian roulette
Debian: Jumping off a plane naked
Ubuntu: ~~Not sure~~ Update: LOL.
CoreOS: Not worth the effort

AWS Containers: Your only salvation if you are locked-in with Docker and on AWS
Google Containers: The only practical way to run Docker that is not entirely insane.
OpenShift: Not sure. Depends how good the support and engineers can manage?

A Business Perspective

Docker has no business model and no way to monetize. It’s fair to say that they are releasing to all platforms (Mac/Windows) and integrating all kind of features (Swarm) as a desperate move to 1) not let any competitor have any distinctive feature 2) get everyone to use docker and docker tools 3) lock customers completely in their ecosystem 4) publish a ton of news, articles and releases in the process, increasing hype 5) justify their valuation.

It is extremely tough to execute an expansion both horizontally and vertically to multiple products and markets. (Ignoring whether that is an appropriate or sustainable business decision, which is a different aspect).

In the meantime, the competitors, namely Amazon, Microsoft, Google, Pivotal and RedHat all compete in various ways and make more money on containers than Docker does, while CoreOS is working an OS (CoreOS) and competing containerization technology (Rocket).

That’s a lot of big names with a lot of firepower directed to compete intensively and decisively against Docker. They have zero interest whatsoever to let Docker locks anyone. If anything, they individually and collectively have an interest in killing Docker and replacing it with something else.

Let’s call that the war of containers. We’ll see how it plays out.

Currently, Google is leading the way, they are replacing Docker and they are the only one to provide out of the box orchestration (Kubernetes).

Conclusion

Did I say that Docker is an unstable toy project?

Invariably some people will say that the issues are not real or in the past. They are not in the past, the challenges and the issues are very current and very real. There is definite proof and documentation that Docker has suffered from critical bugs making it plain unusable on ALL major distributions, bugs that ran rampant for years, some still present as of today.

If you look for any combination of “docker + version + filesystem + OS” on Google, you’ll find a trail of issues with various impact going back all the way to docker birth. It’s a mystery how something could fail that bad for that long and no one writes about it. (Actually, there are a few articles, they were just lost under the mass of advertisement and quick evaluations). The last software to achieve that level of expectation with that level of failure was MongoDB.

I didn’t manage to find anyone on the planet using Docker seriously AND successfully AND without major hassle. The experiences mentioned in this article were acquired by blood, the blood of employees and companies who learned Docker the hard way while every second of downtime was a $1000 loss.

Hopefully, you can learn from our past, as to not repeat it.

mistake - it could be that the purpose of your life is only to serve as a warning to others

If you were wondering whether you should have adopted docker years ago => The answer is hell no, you dodged a bullet. You can tell that to your boss. (It’s still not that much useful today if you don’t proper have orchestration around it, which is itself an experimental subject).

If you are wondering whether you should adopt it now… while what you run is satisfactory and you have any considerations for quality => The reasonable answer is to wait until RHEL 8 and Debian 10. No rush. Things need to mature and the packages ain’t gonna move faster than the distributions you’ll run them on.

If you like to play with fire => Full-on Google Container Engine on Google Cloud. Definitive high risk, probable high reward.

Would this article have more credibility if I linked numerous bug reports, screenshots of kernel panics, personal charts of system failures over the day, relevant forum posts and disclosed private conversations? Probably.

Do I want to spend yet-another hundred hours to dig that off, once again? Nope. I’d rather spend my evening on Tinder than Docker. Bye bye Docker.

Moving On

Back to me. My action plan to lead the way on Containers and Clouds had a major flaw I missed out, the average tenure in tech companies is still not counted in yearS, thus the year 2017 began by being poached.

Bad news: No more cloud and no more Docker where I am going. Meaning no more groundbreaking news. you are on your own to figure it out.

Good news: No more toying around with billions dollars of other people’s money… since I am moving up by at least 3 orders of magnitude! I am moderately confident that my new immediate playground may include the pensions of a few millions of Americans, including a lot of people who read this blog.

docker your pension fund 100% certified not dockeri — Rest assured: Your pension is in good hands! =D

73 thoughts on “Docker in Production: An Update”

Steven says:

23 February 2017 at 14:50

Containers in production on a battle tested substrate you say ? SmartOS / Joyent Triton.

LikeLiked by 2 people

Reply
- Khalil Gibran says:
  
  26 February 2017 at 10:09
  
  Or, the oldest and the first – BSD Jails on FreeBSD 🙂
  
  LikeLike
  
  Reply
stuffandthings says:

23 February 2017 at 17:50

CoreOS supports rkt and lxc as well. The statement “CoreOS is an operating that can only run Docker and is exclusively intended to run Docker” isn’t entirely accurate

LikeLike

Reply
- thehftguy says:
  
  23 February 2017 at 19:38
  
  Edited to “container” instead of “docker”.
  
  LikeLike
  
  Reply
AnthonyC says:

23 February 2017 at 17:55

We are currently using Docker in production for monitoring; we architect the applications with named-volumes so that that each containers or even the entire stack can be restarted with no real consequences.

So far we have no issues, the docker hosts are running Ubuntu 16.04 LTS.

LikeLike

Reply
justme says:

23 February 2017 at 18:00

You missed one group.

The money other)
Riding the wave of buzzwords for money making.

LikeLiked by 1 person

Reply
nwildner says:

23 February 2017 at 18:15

Maybe Arch Linux with LTS kernel + Debian images inside?

https://wiki.archlinux.org/index.php/Docker
https://wiki.archlinux.org/index.php/Docker#Images
https://www.archlinux.org/packages/core/x86_64/linux-lts/

You could use the default kernel, but it updates almost weekly, but it has all the shiny new features of it.

LikeLiked by 2 people

Reply
stochasticresonance says:

23 February 2017 at 19:12

All these problems are found in the wild and more but this not that different than the issues one might face in the early days of virtualization or the Linux kernel for that matter. The mismatch might be the level of hype against expectation. I appreciate that this was framed with different types of people to help balance the perspective. For the record, Cloud Foundry does not use Docker. That doesn’t mean there aren’t other container related issues but they are mitigated by not trying to do everything everywhere. Happy to go into more detail if you are into that. I’d also be up for doing a podcast sometime soon. You should have my email.

LikeLiked by 1 person

Reply
- thehftguy says:
  
  23 February 2017 at 19:57
  
  A few years of strong hype and advertisement and we reached a place where people at VISA* are considering Docker for their next development, because they don’t realize what they are getting into.
  
  Think of it as a simple re-framing of where Docker stands on the maturity curve. That is needed.
  
  * Feel free to replace VISA by any company that shouldn’t touch Docker with a 10 foot pole.
  
  LikeLike
  
  Reply
- creedio says:
  
  23 February 2017 at 22:33
  
  I’d be interested.
  
  The context is so important though – I’d also like to learn whether there are good or bad application profiles to run within docker. Is it the IO, CPU, RAM, network load, short va long running processes and even the application types e.g. Java vs ruby. I’ve had problems in the past with JVM apps running okay in one Hypervisor and core dumping on another.
  
  LikeLike
  
  Reply
Mustafa Akın says:

23 February 2017 at 19:19

I think, you are having too much drama. We run Hadoop/Kafka/Flink on Docker for months that hits 80k inserts per second. We never had a problem due to Docker on Ubuntu 14.04 even with AUFS. And I really question the credibility of your opininons because the CoreOS is the easiest one among the other ones, your decisions seem to based on emotions, not quantifiable facts.

LikeLiked by 2 people

Reply
- thehftguy says:
  
  23 February 2017 at 20:00
  
  Emotions based on thousands of hours of debugging of ever more mysterious bugs and crashes plus being called overnight at times because Docker fucked up.
  
  Remove the first two words of the previous sentence and you got your facts.
  
  LikeLike
  
  Reply
dayo says:

23 February 2017 at 20:46

small correction: there is no Ubuntu 15 LTS, only even years: 12.04, 14.04, 16.04

LikeLike

Reply
Hans says:

23 February 2017 at 20:47

While CloudFoundry accepts Docker image containers, the runtime is not based on Docker. Your post is entierly inaccurate there.

LikeLike

Reply
VigRoco says:

23 February 2017 at 21:23

I love it when the tech decisions I make at my company are validated by real-world experience higher up in the tech chain.

LikeLike

Reply
- thehftguy says:
  
  23 February 2017 at 22:16
  
  Glad I could help you to stay away from Docker 😀
  
  LikeLiked by 1 person
  
  Reply
pulse says:

23 February 2017 at 21:57

hi, i have two questions
do you have in plans testing rocket?
do you know what engine is google using for containers?

LikeLike

Reply
- thehftguy says:
  
  23 February 2017 at 22:15
  
  No, as I said I am retiring from containers.
  Not sure.
  
  LikeLike
  
  Reply
  - pulse says:
    
    23 February 2017 at 22:25
    
    can you share infromation in what language, apps that you are taking care of, are written? how do host them?
    
    LikeLike
    
    Reply
Dave says:

23 February 2017 at 23:25

I implemented application setup using docker on Ubuntu while working at HP Enterprise – it currently powers https://marketplace.saas.hpe.com/ and https://saas.hpe.com/. Development, builds, deployment, and production all revolve around docker and it actually works VERY smoothly and is stable.

My secret? Don’t use docker’s infrastructure automation tools. Use only the core runtime and image ecosystem (on a private registry). Pick your favorite infrastructure automation tool and marry the two. I used puppet.

LikeLiked by 2 people

Reply
roberthubbard2013 says:

24 February 2017 at 02:31

Openshift now is kube and cloud foundry has never been docker. It has its own container system called Diego. Oh and the Diego containers run Ubuntu. And cloud foundry is run by plenty of serious companies. 🙂

LikeLike

Reply
- thehftguy says:
  
  24 February 2017 at 18:55
  
  Yes, OpenShift is a commercial packaging of Kubernetes nowadays.
  
  LikeLike
  
  Reply
Justin King says:

24 February 2017 at 05:49

At my company our entire production line runs on Ubuntu Xenial/16.04.1 and kubernetes. With Google releasing it to the public i think the container war may already be over.

LikeLike

Reply
afasdfa says:

24 February 2017 at 09:10

Did you look at Joyent Triton? That’s the only sane solution I’ve seen

LikeLiked by 1 person

Reply
- thehftguy says:
  
  24 February 2017 at 18:55
  
  It’s not a Linux.
  
  Both Solaris, BSD and their children have had working containers for around a decade. Too bad they lost the war of the operating systems.
  
  LikeLike
  
  Reply
  - Khalil Gibran says:
    
    26 February 2017 at 10:14
    
    Can you describe “lost”, what I see is they are alive and have their users, the numbers might not be as much as GNU/Linux, but they are production ready.
    
    LikeLike
    
    Reply
    - thehftguy says:
      
      26 February 2017 at 16:20
      
      Lost = around 1% market share, going down consistently year over year.
      
      It’s not something you migrate to, it’s something you migrate away from.
      
      LikeLike
      
      Reply
      - Khalil Gibran says:
        
        15 March 2017 at 18:03
        
        Any references? I mean Intel recently announced support for FreeBSD including more participation in development of drivers. https://twitter.com/michaeldexter/status/840423816249589760
        I feel it could be lesser than 1%, one of the cause is rapid adoption of GNU/Linux compared to BSDs.
        Just like how people are flocking to the new hype – Docker 😉
        
        Next, the more I deal with docker the more I feel the need to move to BSD jails. In fact the other worthy alternative is OpenVZ, but can’t use it as it uses a custom distribution. Solaris Zones – not sure about the level of hardware support OpenIndiana can offer.
        
        LikeLike
  - alasdairlumsden says:
    
    21 March 2017 at 12:51
    
    Joyent’s SmartOS/Triton runs Linux software inside “Linux branded zones” – system call translation that allows Linux software to run on top of the OpenSolaris-derived kernel. That’s what their docker implementation leverages.
    
    It’s pretty entertaining that SmartOS can run Linux software and docker better than Linux can.
    
    LikeLike
    
    Reply
    - Curt J Sampson says:
      
      22 March 2017 at 00:55
      
      Are you sure that this is “Linux” software and not “Unix” software? There’s not a huge amount of Linux-specific software out there, and there have always been various applications that will run better on one Unix variant than another.
      
      LikeLike
      
      Reply
Rickard says:

24 February 2017 at 13:18

“AWS has an AMI dedicated to running Docker. It is based on an Ubuntu.”
Untrue, it’s based on CentOS/RHEL but has evolved quite much.

“Google merely exposes a Docker interface, all the containers are run on internal google containerisation technologies, that cannot possibly suffer from all the Docker implementation flaws.”
You got this mixed up also. Google Container Engine launches GCE instance which either runs a customised Debian 7 image with Docker or the new Container Optimised OS with Docker.

LikeLiked by 1 person

Reply
- thehftguy says:
  
  24 February 2017 at 19:29
  
  @Amazon
  It seems the Linux Image AMI was based on CentOS, while the ECS AMI was based on Ubuntu.
  Can’t confirm though in the absence of an AWS account and the fact that Amazon erased all official references to a base operating system.
  
  @Google
  Nope. A Google employee messaged me to deny that their run on internal technologies without Docker.
  Then I asked what they use and he proceed to list a series of operating systems and software that are not only internal Google technologies, but purposefully NOT packaged for use by the rest of the world (that should use CoreOS) by his own world.
  
  If there is anything mixed up, the aforementioned companies are free to take an official stance, instead of going to great length to hide their stack and ensure it doesn’t exist outside of their platform.
  
  LikeLike
  
  Reply
rickardvonessen says:

24 February 2017 at 13:20

“AWS has an AMI dedicated to running Docker. It is based on an Ubuntu.”
Untrue, it’s based on CentOS/RHEL but has evolved quite much.

“Google merely exposes a Docker interface, all the containers are run on internal google containerisation technologies, that cannot possibly suffer from all the Docker implementation flaws.”
You got this mixed up also. Google Container Engine launches GCE instance which either runs a customised Debian 7 image with Docker or the new Container Optimised OS with Docker.

LikeLiked by 1 person

Reply
filippmm says:

24 February 2017 at 14:11

You really need to try illumos/SmartOS. 🙂

LikeLiked by 2 people

Reply
Meir says:

25 February 2017 at 07:36

This article was brought to you by VMWare

LikeLiked by 1 person

Reply
- thehftguy says:
  
  25 February 2017 at 17:01
  
  Ahah. Not at all.
  
  But good reminder, VmWare is a way more solid contender than Docker if ones wanna make a private cloud.
  A decade of service and it’s always been flawless.
  
  LikeLike
  
  Reply
  - mtomczak says:
    
    26 February 2017 at 10:16
    
    I wouldn’t call VMware flawless – especially recently. They started releasing products that are not tested and they cannot fix major issues within expected time. Still – good product – but not flawless.
    
    LikeLike
    
    Reply
  - Ben Corrie says:
    
    27 February 2017 at 16:29
    
    And you may note that VMware is building a product that uses the Docker grammar, image format, client and APIs but targets vSphere directly, thus avoiding the majority of Linux issues identified here. http://github.com/vmware/vic.
    
    LikeLike
    
    Reply
Richard Jacquier (@rjbris) says:

26 February 2017 at 23:45

I’m in the amateur group for Arch Linux (using ext4 + overlay2) and I’ve not had any issues. Manjaro (net version) may be a safer option, as there is some level of testing from the upstream Arch repo before it makes it to the Manjaro repo.

At work (now wearing my ‘Professional’ hat) we use RHEL 7.2, and we’ve definitely had issues with XFS + Overlay (not overlay*2* to be clear).

Other than that I’ve never seen/experienced a kernel panic – websites, services, database engines (using volumes for data storage).

Question for the Author: what type of operations is your container’s application doing to cause a kernel panic. Are you performing low level OS operations?

LikeLike

Reply
- thehftguy says:
  
  27 February 2017 at 00:09
  
  Some web apps and support services. Nothing fancy really. (The trading systems, as the title of the blog suggests, are not dockerized for good).
  
  The kernel panics are a specialty of AUFS on Debian stable. It seems like a race condition, an host can run fine for a month as well as crash 3 times in a single day. All build and tests systems -which can build a couple of containers in parallel at times- are especially unstable.
  
  On CentOS/RHEL, various versions, I personally tested and witnessed dead containers, corrupted volumes and docker daemon lockdown, when using device mapper and LVM volumes, but it’s limited to screwing one container at a time, not the entire host.
  
  LikeLiked by 1 person
  
  Reply
- brisalta says:
  
  12 April 2017 at 05:47
  
  If using XFS make sure to create the XFS file system with ftype=1 or you may run into problems with running an Overly file system on top of it.
  
  LikeLike
  
  Reply
Richard Jacquier (@rjbris) says:

26 February 2017 at 23:47

I’ve had a ‘play’ with RancherOS, but nothing serious – has anyone got any comments about RancherOS?

LikeLike

Reply
- Yun Zhi Lin says:
  
  3 March 2017 at 04:26
  
  For the last 3 years I’ve implemented Rancher orchestration + Rancher OS + AWS in production for 2 successful startups and now doing the same at a major telco with more advanced autoscaling and AMIs. I currently serve over 1M customers with a large number containerised Microservices running 24×7 across multiple product domains, I haven’t experienced anywhere near the amount of instability described here.
  
  The only issue we had was our infrastructure team tried AmazonLinux and it didn’t supporting AUFS at the time, so we decided to stick with RancherOS. It’s a simple matter of picking the right tools for the job.
  
  LikeLiked by 1 person
  
  Reply
  - thehftguy says:
    
    3 March 2017 at 18:57
    
    https://github.com/rancher/os/releases/tag/v0.0.1 => First commit is dated Feb 2015.
    
    It’s indigenous to say you’ve been using Rancher OS for 3 years considering it didn’t exist 3 years ago.
    
    I won’t even get into how an initial commit is hardly a full-featured production-ready battle-tested release and you call that picking the right tool for the job.
    
    LikeLike
    
    Reply
    - Yun Zhi Lin says:
      
      6 March 2017 at 10:13
      
      Sorry I generalised the dates a bit and left out some details off the top of my head. I should have said I’ve been using Docker for 3 years in production with mostly minimal Docker-based distros: 2014 to early 2015 was CoreOS+Deis followed Ubuntu+Tutum. After Tutum became Docker Cloud in late 2015 I moved to RancherOS+Rancher Cattle from 0.4 release onwards and has been upgrading ever since.
      
      I’m not suggesting anyone to use RancherOS v0.0.1 nor am I saying RancherOS is a silver bullet. I’m just sharing my experiences with it. Compared to CoreOS, RancherOS has the advantage of “system” containers being segregated from the “user” containers (think sudo but for containers), it’s much more stable because your individual user containers can’t bring down entire machine (and things gets real funky when that machine is running fleetctl).
      
      I’ve read your previous article and gained some context. Interestingly enough we also have similar number of servers and actually apply most of the work arounds your mentioned: 3+ node HA, autoscaling groups, stateless, and no docker DB. I agree in your scenario docker may not be right tool, regardless of distros.
      
      LikeLike
      
      Reply
Hassan says:

28 February 2017 at 16:40

Rancher is horribly unstable.

LikeLike

Reply
- thehftguy says:
  
  28 February 2017 at 19:54
  
  It would be appreciate for you to comment on the bugs and failures you encounter with it.
  
  LikeLiked by 1 person
  
  Reply
Conan Kudo (ニール・ゴンパ) says:

1 March 2017 at 09:56

I wonder if SUSE Linux Enterprise 12 might be a good option. Like Ubuntu, it has updated kernels (As of SP2, it runs kernel 4.4) and has all the new feature stuff and an up to date Docker. Unlike Ubuntu, it has a usefully long life-cycle (13 years).

LikeLike

Reply
Jan says:

2 March 2017 at 08:43

Using docker successful on debian Jessie for years on a variety of servers. It runs all sorts of software. Databases, Webservices, Webservers, compute clusters, …

I had never any panic or corruption. I’m using btrfs and the stable kernel (3.16).

The only downtime I had was when I moved from 1.5 to 1.12. but even that was rolling and did not result in unavailable services.

I find your post uninformed and without any proof of your points not really credible. Of course, docker have its flaws and the way it’s developed as a fast moving target causes a lot new problems. But it is often worth he effort – and not quite that problematic to deploy.

I’m sorry for you that you experienced so much trouble. But if you investigate technology as you write blog posts it’s nothing to wonder about. Im a bit worried that you manage huge quantities of money with this attitude. Maybe try to understand concepts and technologies before you try o deploy them. Understanding and knowledge helps a lot in the field of IT.

LikeLiked by 1 person

Reply
- Jan says:
  
  2 March 2017 at 08:47
  
  Btw: aufs was never part of the mainline kernel because of a reason. It was part of debian for its live-system-support. Even that was quite controversial. If you use debian stable I suggest btrfs.
  
  LikeLike
  
  Reply
- thehftguy says:
  
  3 March 2017 at 19:00
  
  You picked an exotic filesystem and it worked for you. Happy for you =)
  
  LikeLike
  
  Reply
  - Jan says:
    
    8 March 2017 at 19:32
    
    I think aufs is the exotic FS, as its not even contained in the mainline kernel (and it never was). BTRFS is even the default-FS for SLES these days. It might no existed a few decades ago but calling it exotic is somewhat ridiculous.
    
    LikeLike
    
    Reply
“Docker has no business model and no way to monetize” https://thehf… | Dr. Roy Schestowitz (罗伊) says:

4 March 2017 at 20:55

[…] has no business model and no way to monetize" https://thehftguy.com/2017/02/23/docker-in-production-an-update/ not until days ago, when #docker added proprietary addons […]

LikeLike

Reply
- thehftguy says:
  
  5 March 2017 at 14:34
  
  Still correct. The monetization is flimsy, at best.
  
  LikeLike
  
  Reply
Rickard says:

8 March 2017 at 18:00

“@Google
Nope. A Google employee messaged me to deny that their run on internal technologies without Docker.
Then I asked what they use and he proceed to list a series of operating systems and software that are not only internal Google technologies, but purposefully NOT packaged for use by the rest of the world (that should use CoreOS) by his own world.”

You are mixing what Google run them self (borg, linux with cgroups) and what “Google offers containers as a service” (GKE) which if you launch a GKE cluster with gci image you currently get Docker Engine v1.11.2 (4dc5990).

LikeLike

Reply
Dan says:

18 March 2017 at 22:01

Your article has successfully warned me off of Docker, thank you so much for that. Wondering if you have any opinions on LXC+LXD? Is it any better than Docker?

My use case is to simulate a large number of small instances by using LXC containers on a handful of Digital Ocean droplets, spread over two or three regions, but when traffic hits be prepared to rapidly replace each LXC container with a full VM instance, first on Digital Ocean and eventually to GCE. (Before discovering your blog, the plan had called for AWS but I will also take your warning and plan to scale into GCE instead.)

The app is to be an SaaS with microservice architecture, so we have at least 12 or 15 different Ansible roles to deploy (one per microservice, plus web, db, caching, lb, monitoring and all the rest). At scale, (one or more) vm instance per role becomes affordable and necessary, but until then, we have to cram multiple roles onto each of a handful of Droplets. With Docker out of contention, the main options I see are LXC or, if all else fails, stuff multiple Ansible roles directly onto each Droplet. The LXC option strikes me as much preferable for management, for more realistic testing, and for smoother scaling with minimal changes to the Ansible code. But, that hinges on whether LXC/LXD actually work as advertised.

Further info: all instances will be Ubuntu 16.04, planned DevOps stack includes Ansible (or possibly Saltstack, opinions on that also welcome), GitLab and GitLab CI, Terraform and Consul. The app stack itself will run on Python plus some web framework, PostgreSQL, Nginx and/or Varnish, HAProxy, and whatever else needed for caching, monitoring, etc.

Any advice welcome. Thank you.

LikeLike

Reply
- thehftguy says:
  
  19 March 2017 at 13:24
  
  No opinion on LXC/LXD, never used them directly.
  
  I don’t know why you’d want to simulate small instances. Buy small instances if you want small instances, buy larger instance if you want larger instances. There is no automation tool that will allow to change easily on the fly.
  
  You seem quite cash constrained, contrary to the clients I work with. I’d advise to look at the price of GCE instances and network and disk, before you switch to DO. Google Cloud is sure half of AWS, yet they are both significantly more than DO.
  
  I’ve been at places with ansible roles to deploy services, it’s great!
  You can deploy many roles on the same host, as far as they don’t have conflicting dependencies, people have been doing that for decades, that makes Docker looks like snake oil. Use virtualenv for python and manage your requirements properly.
  
  Not familiar with SaltStack. They’ve been the two tools of choice recently: Ansible vs Salt. Not sure which is ahead.
  
  LikeLiked by 1 person
  
  Reply
alp says:

5 April 2017 at 08:13

> GKE runs on internal google technology, not Docker

Seriously this is quite misleading. You should consider amending your blog post. “Google containers” is not a thing as you can easily search on the web (including the double quotes).

LikeLiked by 1 person

Reply
- thehftguy says:
  
  5 April 2017 at 19:21
  
  edited.
  
  LikeLike
  
  Reply
skies says:

12 April 2017 at 12:37

CoreOS uses rkt (Rocket) container-runtime which in my opinion is better designed from a architecturall standpoint. It doesn’t have a daemon/server process like Docker that is fragile and is a SPOF.

I do run quantitative analysis work loads on Docker, but these are batch jobs that fetches and crunches data from database and can be rerun up on failure. I would agree that real-time/semi-realtime transactional systems that handles large amounts of money, container tech is not quite ready at the moment. Linux container tech is still fairly new and not yet mature. Give it another 2-3 years to mature and things may be different.

LikeLike

Reply
tdg5 says:

17 April 2017 at 12:48

Great round up, thanks!

After reading your article I did some research on Debian Stretch and discovered that it will ship with v4.10 of the Linux kernel, FYI: https://lwn.net/Articles/678770/

LikeLike

Reply
- thehftguy says:
  
  20 April 2017 at 22:32
  
  corrected
  
  LikeLike
  
  Reply
goldyfruit says:

5 June 2017 at 15:42

Thanks for the post, full of feedback 🙂

I started to play with Docker nice months ago when I left Red Hat for an another company and since we had issues almost every months !

We are using Docker to run OpenStack on Ubuntu 16.04.2, the day to day is pretty OK until… Docker failed for an unexpected reason and then it’s Armageddon…

I tried Docker with Ubuntu 14.04/16.04, Debian 8.x/9, every time it’s a new issue.

The latest: https://github.com/moby/moby/issues/33497

LikeLike

Reply
Joakim Karlsson says:

1 August 2017 at 07:56

Rancher is complete and utter s*it. avoid at all cost and deploy Kubernetes directly on bare-metall instead

LikeLike

Reply
- thehftguy says:
  
  1 August 2017 at 15:11
  
  Care to elaborate?
  
  LikeLike
  
  Reply
- kjhkjhkhlkjhkjh says:
  
  3 August 2017 at 20:47
  
  yeah please elaborate, since uh… we are going to use rancher next (we abandoned docker enterprise edition). kubernetes, are you joking?
  
  LikeLike
  
  Reply
Roman Gaufman says:

14 August 2017 at 22:34

If you are spinning up cloud services, then I think you have a point, I struggle to see a need for Docker. However where I am considering it is on embedded decides such as Raspberry Pi and similar, where you have thousands of them and the only other alternative appears to be either traditional embedded linux firmware updates or something like Ansible/Chef. What are your thoughts on Docker in this use case?

LikeLike

Reply
- thehftguy says:
  
  15 August 2017 at 18:41
  
  Docker has nothing to do on any embedded device.
  
  What do you do with thousands of PI?
  
  LikeLike
  
  Reply
Is Docker highly sensitive to the kernel, the host and the filesystem it’s using? – program faq says:

9 January 2018 at 14:59

[…] in developing and deploying Python applications. In my research, I found a widely circulated blog post that […]

LikeLike

Reply
Docker Struggles – Systems Lab says:

18 February 2018 at 03:03

[…] I was originally excited when docker was going to be included in the next release of unraid, the concept behind it was solid and sounded like it would make management of my server easier. This was the case for months before docker started acting up. Now I’ve been working on a way to remove any need of docker on my NAS, moving it to a VM or another server due to its instabilities. Issues I’ve run into include it not being able to stop running containers, start stopped containers, create new containers, and preventing Linux from shutting down. I could live with all of the above except the shutdown bug. It doesn’t just prevent shutdown from running, but it prevents the kernel from shutting down at all, and well after the user shells are all offline, so there’s no way to manually kill docker to allow the system to shut down safely. This is exceptionally frustrating and has caused unclean shutdowns when I’ve lost power and even when I’m just doing maintenance, since the only way to restart when docker does this is to do a hard reset. I’m not giving up hope on containers, just going to be a bit more careful around docker, they seem to advertise quite well compared to issues people have had with their software. […]

LikeLike

Reply
Gary Mort says:

8 June 2018 at 18:52

I don’t understand what your problem with Docker is. Docker is a great system. I use it for development all the time. Properly configured per project, it provides a great balance between local dev performance and emulation of the production environment. Without Docker, close emulation of the production environment for dozens of different projects/environments results in either an overloaded local system, or a “single size fits none” dev environment which results in constant release issues.

Docker is simply wonderful for development…. oh wait, you were talking about production? 🙂

Thanks for your great posts on real world experiences. I find so many posts about people who just reused systems that worked fine for development, and then walked away a month later before things fell apart. Or staked their reputation on it and were not able to admit to any problems.

I find it telling that companies currently selling Docker as “the answer” for web servers to the masses, turn around and sell their enterprise customers dedicated systems.

LikeLike

Reply
steevy says:

13 March 2019 at 20:01

Hi there,

Are you planned to make a 2019 update of this article ?
I would be please to read id 🙂

Best regards

LikeLike

Reply
- thehftguy says:
  
  14 March 2019 at 19:42
  
  I would love to. That depends where my job leads me.
  
  LikeLike
  
  Reply