Alert! This is another rant blog post! I promise the next one will be more technical :-)
Rocket launch me some opinions
For the past few days I’ve been playing around with Rocket, the new container runtime by CoreOS. Quite a few people have asked for my opinion so I figured I would put it into a blog post. I hadn’t planned on publishing it any time soon, but a few blog posts I have come across recently have prompted me to put off the original post and write a different one first. So before I finalise it I need to get something off my chest. Anyhow, let’s get to it!
First of all, let’s start by setting up some rules. For those who are comparing Docker with Rocket I would advise you to either stop right now or at least do proper research. SERIOUSLY. If you compare software which has been developed by thousands of open source developers around the world over the past 1+ years or so, software which has been and is being deployed and tested in real production environments, with a “fresh from the oven” kit, then surely there is a big likelihood that the more mature software - if it’s not a total pile of shite - will come out as the winner. Don’t you think?
A much fairer comparison would be to compare Docker 0.1.1 with the latest pre-release of Rocket. Furthermore, Brandon Philips has said a few times already that some of the features that are available in Docker are not planned to be implemented in Rocket - not right now and maybe never. Why? Because the focus of the Rocket project - at least where it stands now - is not to reimplement Docker. The focus of the project is simply not the same as the focus of Docker Inc. Rocket is an implementation of the App Container specification, hence a much better comparison would be between the App Container specification and the original Docker manifest or some other specification Docker implements.
Ok, now that we have COMMON SENSE out of the way, let’s move on.
One of the most discussed topics around Rocket is systemd-nspawn, or systemd in general. CoreOS have picked systemd as the init system for their Linux distro. They could’ve gone with init scripts (oh my!) or upstart (ouch!), but they went with what is de facto going to be the standard init system on all mainstream Linux distributions in the near future. If I have to be honest, I don’t have a proper opinion about systemd. I’ve read a few of Lennart’s blog posts and a few other ones on the internet. I liked some things and disliked others, but if only for the reason mentioned above, picking systemd as the init system of a new Linux distribution was really a no-brainer for CoreOS. Plus, CoreOS have earned my confidence so I trust them with this one without blinking an eye. You might argue by throwing the choice of btrfs at me, but even that might go away soon. I like that: if shit does not work, it either has to be fixed or, if the cost of fixing it is too high to pay, it should be replaced by a reasonably reliable alternative. After all, stability is more valuable than features.
Ok, let’s get back to the point. One of the many things systemd does, or rather is capable of, is process supervision. You might argue that the way systemd approaches this problem somewhat blurs the concepts of service supervision and process supervision, and unnecessarily overcomplicates things which would probably be better done outside of PID 1. But it has this capability, and since CoreOS is an extensive user of systemd it would be quite surprising for the guys to come up with something else or use some other tool for the job.
Furthermore, Brandon said that the reason why the initial implementation of Rocket uses systemd-nspawn is that they wanted to use systemd, and systemd-nspawn was already implemented and was doing what they intended to do, so it helped to kick off the project. I don’t remember him saying they’ll be sticking to it forever - they might, they might not. Who knows. And frankly, at this point, with the 0.1.1 release, I don’t really care. Remember, Rocket is designed to be pluggable. If you don’t like systemd-nspawn you can use your own implementation of stage1. Rocket already provides the CLI arguments for it, so go nuts and start hacking.
Another important point some people are getting totally wrong is the claim that if you want to use Rocket you need to run systemd. Wrong! Rocket does NOT require systemd as your init system at all. It should work with any other init system like SysV or upstart. When I played around with it I was testing it on upstart and did not encounter any issues. Rocket merely “reuses” systemd and systemd-nspawn to handle stage1 and stage2.
It seems that some people probably haven’t given the App Container specification a proper read. They keep crying about systemd blah blah and saying how awesome it is that Docker does not need systemd to run processes in containers. Docker handles - or rather is supposed to handle - the one-process container supervision via the Docker daemon. Now, let’s face it. The Docker daemon has not been written as a process supervisor and that is already kinda showing when your containers die. You end up hacking around that by using some kind of process supervisor on the host or even in the container. If you have not experienced this you have probably either been lucky or haven’t run lots of containers in production.
So here we are. We need a process supervisor sometimes. Or some sort of functionality process supervisors provide. Wait a second, it gets better. If you read the App Container specification properly you will come across this point. I’m going to copy paste it here for reference:
“A container executes one or more apps with shared PID namespace, network namespace, mount namespace, IPC namespace and UTS namespace. Each app will start pivoted (i.e. chrooted) into its own unique read-write rootfs before execution. The definition of the container is a list of apps that should be launched together, along with isolators that should apply to the entire container.”
The above snippet clearly mentions several processes sharing Linux namespaces. In other words, App Container’s definition of a container suspiciously resembles a Kubernetes Pod. I don’t know if it has anything to do with CoreOS being actively involved in Kubernetes development, but that’s a fact. They seem like the same thing, but they are really not. A Kubernetes Pod is a set of containers as opposed to a set of multiple processes running in one container. That gives you the benefit of composing a Pod from multiple Docker images. This is where the App Container spec “jumps” in again and defines a concept of dependencies, so your container can depend on other containers, so the final container runtime manifest created by stage0 might look like this.
Now that you have several processes running in the container and some of them can be daemons - yep, you’ve guessed it - they might need to be supervised. So, you need a process supervisor. Now, arguably one of the best supervisors I’ve ever used is runit, but again, why would you want to write runit scripts or use any other supervisor when you have already amassed a lot of experience with systemd? If I were in the same position I would make the same decision as CoreOS did - go with systemd.
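To make “supervised” a bit more concrete, here is a minimal sketch of the core thing a process supervisor does: start a process, wait for it, and restart it when it dies. This is NOT how systemd, runit or Rocket do it. The supervise function, its parameters and the restart policy are all made up for illustration.

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, backoff_seconds=0.1):
    """Run argv and restart it whenever it exits with a non-zero code.

    Hypothetical helper for illustration only. Returns the list of exit
    codes observed, one per run.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        proc = subprocess.Popen(argv)
        code = proc.wait()  # block until the child exits
        exit_codes.append(code)
        if code == 0:
            break  # clean exit, nothing left to supervise
        time.sleep(backoff_seconds)  # crude backoff before the restart
    return exit_codes
```

Calling supervise with a command that always fails gives you one exit code per attempt, and a command that exits cleanly is run exactly once. A real supervisor adds signal handling, restart limits over time, dependencies and logging on top of this loop, which is exactly why you don’t want to write your own.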
Docker and namespaces
Now let’s go back to Docker. Docker have been advocating the concept of running preferably ONE single process per container. In general I agree with this, as Linux containers are really just isolated processes on the host, and running one process per container gives you, or is supposed to give you, more flexibility and composability, independent updates and rollbacks etc., i.e. an easier life for an operator. However, there is a subtlety hiding here. When you create a new Docker container, or in fact an LXC container, the container is allocated a fresh set of ALL new Linux namespaces provided by libcontainer. I find this kinda wasteful and unnecessary if I have to be honest. With Docker it kinda makes sense: you are supposed to run a single process in the container, and if you want a generic process environment that covers the majority of use cases, you might need to provide a large chunk of all available namespaces, which is effectively what libcontainer does.
By creating a new set of namespaces for just one single process you are giving the kernel one extra bit of work to keep an eye on. If you run loads of containers on your host you might start hitting some kernel limits. You might argue the overhead is small, but why would you want to keep the kernel unnecessarily busy anyway? I remember hitting these kernel limits when I deliberately ran hundreds of LXC containers on my host a while ago, trying to push the kernel as far as possible.
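If you want to see this for yourself, on Linux the kernel exposes the namespaces a process lives in as symlinks under /proc/<pid>/ns. Below is a rough sketch, assuming a Linux host with /proc mounted, that reads them and tells you which namespaces two processes have in common. The helper names namespaces and shared_namespaces are mine, not part of Docker, LXC or Rocket.

```python
import os


def namespaces(pid="self"):
    """Map namespace name -> identifier for a process (Linux /proc only).

    Returns an empty dict when /proc/<pid>/ns is not readable, e.g. on a
    non-Linux host or for a process you have no permission to inspect.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result
    for name in names:
        try:
            # Each entry is a symlink whose target identifies the namespace
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return result


def shared_namespaces(a, b):
    """Return the names of namespaces whose identifiers match in both maps."""
    return {name for name in a if name in b and a[name] == b[name]}
```

Compare shared_namespaces(namespaces(), namespaces(some_pid)) for a process inside a container and you should see little or nothing in common with your shell, while two ordinary processes on the host typically share everything.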
So to save the overhead, you can share namespaces between processes. The problem though is that if you start sharing namespaces and the container which created them dies, it takes down all the other processes it shares the namespaces with. This is one of the issues the Kubernetes guys were dealing with when designing Pods. A Kubernetes Pod is allocated an IP address from some internal /24 subnet, and that address is assigned to the first created container. The IP address is then shared between all containers within the Pod, which share the same network (and other) namespaces.
You can see the subtlety here. Whenever the “network” creating container died it took down all the other containers in the Pod with it, as there was NO namespace to share any more, so they would have to be cleaned up by the replication controller and a new Pod would be created from scratch and assigned a new IP … or not. That’s not really important. What is worse, your links die too. So when your container dies and you can’t restart it automatically, i.e. it’s not properly supervised, all linked containers are effectively useless and you need to restart them. I’m sure there are some solutions based on watching Docker events for dying containers, but let’s face it, that’s just a hack. So if you want multiple processes running inside a Docker container you need……drum rolls…a process supervisor.
Lastly, logging. Reliable logging, that is. A process supervisor is supposed to take care of this. Normally process supervisors do this by capturing the process’s stdout/stderr and routing it into a log. Now read what the App Container spec says about logging:
“Apps should log to stdout and stderr. The container executor is responsible for capturing and persisting the output.”
Again, systemd seems like an obvious choice to go with for CoreOS. Runit is pretty awesome at this, too, but again this comes down to what I’ve already mentioned above. Expertise, unnecessary extra work etc.
The state of Docker logging is documented in an almost 6-month-old Logging plugin proposal. It has yet to be implemented and, truth be told, at this time I’ve already forgotten what it was about, but given it was proposed by Michael Crosby I have big confidence and trust in it.
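For those who have never looked at what “capturing and persisting the output” means in practice, here is a tiny sketch of the idea: the parent starts the app with its stdout and stderr wired to a pipe and appends every line to a log file. It is only an illustration of the principle, not how systemd’s journal or runit’s svlogd work. The run_and_log function and the log line format are assumptions of mine.

```python
import subprocess
import sys
from datetime import datetime, timezone


def run_and_log(argv, log_path):
    """Run argv, capture its stdout and stderr, append each line to log_path.

    Hypothetical helper for illustration only. Returns the exit code.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
    )
    with open(log_path, "a", encoding="utf-8") as log:
        for line in proc.stdout:  # reads until the child closes its output
            stamp = datetime.now(timezone.utc).isoformat()
            log.write(f"{stamp} {line.rstrip()}\n")
    return proc.wait()
```

The app itself knows nothing about log files, rotation or shipping. It just writes to stdout and stderr, and whoever executes it owns the rest, which is exactly the split the spec describes.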
Experience the user and Use the experience
Ok, now that we have the important bits of the App Container spec covered, one last point people seem to be crying about is the user experience provided by Rocket. Again. Please: GIMME A BREAK. The latest Rocket release at the time of this writing is 0.1.1. Have you seen what the first car looked like? It looked something like this. That was not the greatest of user experiences. Nope. It was however the first step towards Ferraris and Porsches.
But AGAIN and most importantly, Rocket is an implementation of the App Container specification. It is totally up to you to write your own improved implementation of Dockerfiles, Docker daemons and whatever else your imagination gives you. You can implement the other niceties you like when using Docker. Rocket does not enforce anything on you. Not even Stage1! You want Docker as a supervisor? Hack on it and pass it as an argument to rkt run as a Stage1 process. Boom! I can imagine CoreOS might implement some nifty tools in the future and even improve the user experience for Ops and Devs, but I’m guessing those tools will be created as separate projects and not as part of the core of Rocket. At this point I would argue it’s important to get the project hacked to some stability.
Switching off the rant
To close this rant off, I’d like to invite you to hack on both Docker and Rocket. Hell, if you like coding in C, you should get involved in LXC or the like, too. There is so much work to be done and such a tremendous opportunity for you to be part of something awesome happening in our industry at the moment.
Lastly I want to make absolutely clear that I love Docker. I love hacking on it every now and then. I love the community around it and chatting to the awesome people on various Docker IRC channels. This rant was not about Docker or Rocket being bad or good. It was about adding my 5 cents to the topic of the Docker vs Rocket comparison and the questionable opinions I’ve come across recently, mostly from people who probably didn’t even bother to read the App Container spec. I’ll be covering Rocket on this blog more and more in the near future, so stay tuned if you’re interested.
Now, let’s hack and keep on shipping Dockers, launching Rockets, executing LXCs…