September 10, 2017

An introduction to end-to-end encryption in Matrix and Riot

21:10 -0400

Disclaimer

End-to-end encryption in Matrix and in Riot is in beta and may be subject to change.

I have made every effort to ensure the accuracy of the information in this post, but this should not be viewed as an official guide to end-to-end encryption in Matrix or Riot.

Introduction

End-to-end encryption is one of the main features of the Matrix communications protocol and of Riot, a glossy client for Matrix. This post provides a high-level overview of what end-to-end encryption is, how it works in Matrix and Riot, and how to use it. It is intended to be understandable to people starting with little to no knowledge of encryption, while remaining as accurate as possible. The goal is to help people understand end-to-end encryption in Matrix well enough to use it more securely and effectively.

What is end-to-end encryption?

Encryption is a way of ensuring that unauthorized people cannot view information that is not intended for them. Encryption takes the information and, using an encryption key, scrambles it in such a way that it cannot be read without the corresponding decryption key. (In some encryption systems, the encryption and decryption keys are the same, whereas in others, they are different.)
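
To make the idea concrete, here is a minimal sketch of symmetric encryption (the case where the encryption and decryption keys are the same) in Python. It assumes the third-party cryptography package is installed, and it only illustrates the concept; it is not what Matrix itself does.

from cryptography.fernet import Fernet

# Generate a random symmetric key; whoever holds it can both encrypt and decrypt.
key = Fernet.generate_key()
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"meet me at noon")  # unreadable without the key
plaintext = cipher.decrypt(ciphertext)           # recovers b"meet me at noon"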

In some communication systems that involve a server, the connection between each user and the server is encrypted so that anyone who taps into that connection cannot read any messages. By default, all communication in Matrix is encrypted in this way. However, this still allows messages to be read by server administrators, or anyone who manages to gain access to the server.

End-to-end encryption (sometimes abbreviated as e2e encryption, or simply e2e or e2ee) means that messages are encrypted by the sender in such a way that only the people they are communicating with can read them — none of the servers in between can read the messages.

Why do I need end-to-end encryption?

Whether it's our credit card or banking details, health records, corporate strategy, or even plans for a surprise party, we all have things that we would prefer not to be made public. End-to-end encryption helps maintain your privacy.

Using end-to-end encryption even for messages that don't need to be secret also helps increase the security of messages that do need to be secret, as it prevents someone from determining which messages have sensitive information and which ones don't.

Are all conversations in Matrix end-to-end encrypted?

End-to-end encryption can be enabled on each room individually. While encryption is still in beta, all rooms are unencrypted by default. Once encryption is out of beta, then private rooms will be encrypted by default.

If you have sufficient privileges (normally moderator or admin permissions) in a room, you can go to the room settings and enable encryption. Note that once encryption is enabled in a room, it cannot be disabled again.

Riot indicates encrypted rooms with a locked icon next to the message input box, and unencrypted rooms with an unlocked icon.

Why won't all rooms be encrypted?

There are several reasons why some rooms will not be encrypted even after encryption is out of beta. In brief: encryption interferes with certain types of integrations (including the bots and bridges hosted by matrix.org); it prevents people from reading messages sent before they joined the room (which is useful in some rooms, such as those used as support forums); it can slow down sending messages (this should not be noticeable in small rooms, but could be quite significant in large ones); and it is of questionable value in a room that anyone can join and read.

What's the deal with all these devices?

Matrix encrypts messages to devices rather than to users. This allows for greater flexibility and privacy. For example, if your phone gets stolen, then you can tell your contacts to blacklist your phone, and whoever has your phone will not be able to decrypt any future conversations, without affecting any of your other devices.

Why does Riot complain about "unknown devices" when I send a message in an encrypted chat?

When you try to communicate with someone, Riot will fetch the list of that person's devices from the server, including an encryption key for each device that can be used to encrypt messages so that they can be read on that device. However, Riot has no way of determining whether the key is legitimate or whether it was planted or altered by someone trying to snoop on your conversations, so it warns you when it encounters a device that it hasn't seen before.

Riot gives you the option to send messages even to devices that you haven't verified, to verify the key and tell Riot that it is trusted, or to blacklist the device and tell Riot that it should never encrypt messages to that device.

How do I verify devices?

Note that the current device verification process is only temporary and in the future will be replaced by something that's easier to use.

In order to verify someone's device, you need to have some reasonably secure way to communicate with them. It doesn't have to be secret (if someone listens in on the key verification process, it won't make it any less secure), but it has to be something that won't allow someone else to impersonate you or the device's owner. For example, if you know the device owner's voice, you can phone them, or even start a video call with them in Riot. You can also verify someone's devices if you meet them in person.

When you're ready to verify someone's devices, you can click on their avatar in any conversation that you have with them, and Riot will show you a list of their devices. Find the device that you want to verify, and click the "Verify" button under it. This will show the device's name, ID and key.

The other person will then have to go to their user settings on the device that you want to verify, and find the device key there. You can then compare the keys, and if they match, then you can click the button saying so, and their device is now verified.

Repeat this for all of their devices that you want to verify.

This may seem like a lot of work, and it is, but there are plans to improve this in the future, before end-to-end encryption leaves Beta. For example, in the future your devices may be able to vouch for each other so that others will only have to verify one of your devices.

How does encryption work in Matrix?

Conceptually, when you first send a message in an encrypted room, your Riot client generates a random key to encrypt your message, sends the encrypted message to the server, and then sends the decryption key to all the devices in the room that should be allowed to decrypt the message. Of course, the decryption key is sent encrypted (based on¹ the device's unique key, which you verified above) so that it cannot be intercepted. The recipient then fetches the message decryption key and the encrypted message and decrypts the message.
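
As a rough illustration of that flow (and only an illustration; the real implementation uses the Olm and Megolm cryptographic ratchets), here is a Python sketch using the third-party cryptography package, in which each device key is modelled as a simple symmetric key:

from cryptography.fernet import Fernet

# Hypothetical device keys for two recipient devices. (In Matrix these are
# asymmetric keys; symmetric keys are used here only to keep the sketch short.)
device_keys = {"ALICE_PHONE": Fernet.generate_key(),
               "BOB_LAPTOP": Fernet.generate_key()}

# 1. Generate a random key for this message and encrypt the message with it.
message_key = Fernet.generate_key()
encrypted_message = Fernet(message_key).encrypt(b"hello, encrypted room!")

# 2. Encrypt (wrap) the message key separately for each device in the room.
wrapped_keys = {device: Fernet(key).encrypt(message_key)
                for device, key in device_keys.items()}

# 3. A recipient device unwraps the message key with its own device key,
#    then decrypts the message itself.
unwrapped = Fernet(device_keys["ALICE_PHONE"]).decrypt(wrapped_keys["ALICE_PHONE"])
print(Fernet(unwrapped).decrypt(encrypted_message))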

In order to avoid having to re-send decryption keys to every device for every message you send, Matrix's encryption system includes a method for generating a new key based on an old key. So for the next message you send, your Riot client will use that method on your previous encryption key to generate a new key, and the recipients will use the same method and generate the same key, so that when you send a message encrypted using the new key, the recipients can decrypt the message without any extra key exchange. The new key will only need to be sent to any new devices that showed up in between when the first message was sent and when the second message was sent.
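
A heavily simplified sketch of that "new key from old key" idea might look like the following; the real Megolm ratchet is more involved, but the principle of both sides independently deriving the next key is the same.

import hashlib

def next_key(old_key: bytes) -> bytes:
    """Derive the key for the next message from the previous one (illustrative only)."""
    return hashlib.sha256(old_key).digest()

# Sender and recipients share the key for the first message...
key_0 = b"session key from the initial key exchange"
# ...so each side can compute the following keys without any further key exchange.
key_1 = next_key(key_0)
key_2 = next_key(key_1)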

Riot will occasionally start from scratch, generating a new random key and sending it to all the devices in the room. This happens, for example, whenever someone leaves a room, after you have sent a certain number of messages, or after a certain amount of time.

As a result of how encryption is done in Matrix, there are several encryption and decryption keys being used. The main ones that you may need to be aware of are the device keys and the message decryption keys. The message decryption keys allow you to decrypt encrypted messages, and device keys allow you to send the decryption keys securely to other devices. Device keys are unique to each device and cannot be copied from one device to another, whereas decryption keys may be sent from one device to another, or exported from one device and imported to another, in order to allow you to read older messages.

¹ The decryption key is not encrypted directly with the device's key, but uses a more complicated method to improve security.

Help! I can't read some encrypted messages!

There are a few main possible reasons for not being able to decrypt a message.

The first possible reason is that you were not a member of the chat when the message was sent. In this case, it is by design that you cannot decrypt the message; decryption keys for messages are only sent to the users that were in the room when the message was sent.

Another possible reason is that your device was not registered at the time the message was sent. When a message is sent, the sender only sends the decryption key to devices that it knows about; when you log into a new device, that device has not yet received the decryption key for the message, and so cannot decrypt it. (Note that when you log out and log in again, your new session is considered a new device from Riot's perspective.) There are two ways around this. One way is to export the decryption keys from another device that is able to decrypt the message, and import them into the new device. Another way is to verify your new device from another device: when Riot encounters a message that it cannot decrypt, it will ask your other devices for the decryption keys for that message, and if you have verified the new device from those other devices, they will send it the decryption keys. Recent versions of Riot may automatically prompt you to verify new devices.

The final reason that you might not be able to decrypt a message is that you have encountered a bug. If you are interested in the technical details, you can see the tracking issue for encryption bugs, but the short story is that developers are aware of most (if not all) of the bugs and are working on fixing them. Some bugs can be worked around by the sender clearing Riot's cache and reloading (in their user settings), or by leaving a room and rejoining. Other bugs can only be worked around by logging out and logging back in. However, note that this will create a new device that will need to be re-verified by others, and you will probably want to export your decryption keys before logging out and import them after you log back in so that you can read old messages.

When will encryption be out of beta?

Before encryption is out of beta, the developers need to fix some of the remaining bugs that prevent people from decrypting messages that they should be able to decrypt, and to make the device verification process more usable. It is difficult to estimate when this work will be completed as the developers are working on other issues as well.

July 14, 2017

Matrix community roundup

08:34 -0400

Last week, the Matrix team put out a call to arms for the community to support Matrix financially, and the community response has been great. Response in online forums such as reddit and Hacker News has been extremely positive, and in the first 24 hours, the community pledged well over $1000/month through Patreon and Liberapay. Although it has slowed down since then, it is now over $1700/month, getting close to the first goal of supporting one developer working half time on Matrix.

While last week was the first time that the Matrix community was able to contribute hard dollars towards the development of Matrix, the community has been supporting Matrix in other ways for quite a while. Since I started following Matrix a little less than a year ago, I've seen that the Matrix community has been quite active in testing, filing and triaging bugs, contributing code to core projects, writing bots and bridges, providing support in the Matrix rooms, and more.

Here's a quick roundup of some of the new things that the community has come up with since the Matrix Holiday Special. I have undoubtedly missed some projects, so apologies in advance to everyone whose work I've overlooked.

Voyager

TravisR has been experimenting a lot with Matrix, and one of the unique things he has come up with is a bot that maps out how Matrix rooms are related to each other by noting when one room is mentioned in another room. The results are then mapped in a massive graph.

Linux distributions

Gentoo: PureTryOut has created an overlay for Gentoo for installing some Matrix-related software.

Debian: Synapse and the Matrix plugin for Pidgin have been packaged for Debian and are included in Debian unstable, and other Matrix-related software has been packaged and submitted for inclusion. I have also been working with others on forming a Matrix packaging team.

Integrations

One of Matrix's main features is the ability to bridge with different networks, and while the core Matrix team has had their hands full maintaining the IRC, Gitter, and Slack bridges, the community has been writing bridges to other networks.

Puppeting bridges: The Matrix Hacks group has added quite a few new bridges this year so far. Looking at their GitHub account, they have bridges for Hangouts, Slack, Skype, Signal, and GroupMe, in addition to the previously-announced iMessage bridge. Discussion and support for these bridges take place in #matrix-puppet-bridge:matrix.org.

Email: Two email integrations have been written, which work in different ways. Max's bridge allows email users to participate in Matrix rooms, while TravisR's bot sends messages to rooms when it receives an email.

Discord: Half-Shot has also written a Discord bridge.

Clients

Riot: While the core Riot team was busy working on creating an improved experience for new users, the community implemented some of Riot Web/Desktop's most requested features, resulting in the release of Riot Web/Desktop 0.10, in which all of the major new features were initiated by the community. I think that's an achievement the community can be very proud of, and that the core Riot team can be proud of as well, for fostering such an active community.

Nheko: As good as Riot is, it isn't for everyone. Development on other clients such as Quaternion has continued, and a new one, Nheko, was recently started and already seems quite promising.

Matrix Recorder: Although Matrix keeps all history on the server, some people want to keep their own copy of history. Alex created Matrix Recorder, which saves history to a local SQLite database. Matrix Recorder even supports saving history from end-to-end encrypted rooms.

e2e crypto: While end-to-end cryptography is still in beta in Riot, some brave souls have been experimenting with it in other clients. In addition to Matrix Recorder's support of e2e rooms as mentioned above, penguin42 has done work on adding e2e to the Matrix Pidgin plugin and davidar has added e2e support to his Hubot adapter.

SHA2017 badge: One of the most intriguing projects is the badge for the SHA2017 camp, which reportedly contains a Matrix client. I don't know what they're using Matrix for, so I hope they do a write up at some point.

Documentation/Talks

The Matrix community has been busy writing documentation and blog posts, and doing talks about Matrix. Coffee has been collecting helpful information about Matrix into a machine-readable knowledge base. Some guides for Riot have been written, including usage basics by muppeth, maxigaz's guide, and an introduction from an IRC perspective. CryptoAUSTRALIA recently had a workshop for setting up Synapse and Riot, and published a tutorial online. And PureTryOut did a Matrix talk at the Dutch Linux User's Group a few months ago. There have certainly been other people from the community doing talks about Matrix that I am not aware of.

Server list

A federated communications protocol is less valuable if users can't find servers to join. Since there is no official list yet, Alex set up a list of Matrix servers. Though calling it just a list of servers is an understatement: it includes statistics on each server such as uptime, response times from various locations, and SSL test scores, so that users can make a more informed choice of servers. If you are running a Matrix server, whether public or private, please consider submitting it to the list to improve its visibility and to strengthen the federation.

GSoC

Matrix was again accepted as a mentoring organization for Google Summer of Code and has three students. Two of them are working on iOS-related projects, and since I don't have an iOS device, I haven't been following their progress. However, Michael (a.k.a. t3chguy), in addition to improving Riot, has been working on creating a search engine-friendly view of public rooms, which will be helpful for Matrix rooms that are used as support forums.


Matrix would not be what it is today without the support of the community, and I'm looking forward to seeing what the community will develop in the future. Last week, the community was invited to contribute financially towards Matrix's development. For those who are unable to contribute in this way but still want to support Matrix, or for those who have pledged money and still want to do more, hopefully this list gives some ideas for how you can help out, either by supporting an existing project or starting your own.

Addendum (July 19, 2017)

Some projects that I missed:

Max has written an alternative Identity Server implementation called mxisd. Identity Servers haven't been getting as much attention as homeservers, application services, or clients, so it's great that someone has been working on an alternative implementation.

TravisR has also been working on Dimension, an alternative implementation of Riot's integration manager.

April 1, 2017

An alternate transport for the Matrix Client-Server API

00:00 -0400

Matrix is an open communications protocol that has many great features. However, one flaw is that the baseline specification is based on long-polling HTTP requests, which is not very efficient. In order to address this deficiency, I've created a spec for an alternative transport for the Matrix Client-Server API that uses a protocol designed for real-time communications instead of HTTP.

March 13, 2017

The latest additions to my init.el

11:00 -0400

Inspired by xkcd (but using Alt-mousewheel):

;; Note: `redo' is not built into Emacs; it comes from a package such as redo+
;; (undo-tree users would bind `undo-tree-redo' instead).
(global-set-key (kbd "<M-mouse-5>") 'undo)
(global-set-key (kbd "<M-mouse-4>") 'redo)

And, since I sometimes need to paste from an HTTP request into a buffer:

(require 'url)

(defun insert-from-url (url)
  "Fetch URL with a GET request and insert the response body at point."
  (interactive "MURL: ")
  (let ((url-request-method "GET")
        (dest (current-buffer))
        (src (url-retrieve-synchronously url)))
    (set-buffer src)
    (goto-char (point-min))
    ;; The response body starts after the blank line that ends the headers.
    (search-forward "\n\n")
    (set-buffer dest)
    (insert-buffer-substring src (match-end 0))))
January 25, 2017

On transparency

21:01 -0500

I've written briefly before about the value of companies being open and transparent. Back then, I wrote that the way that companies react when things go wrong is a good way to differentiate between them. No matter what company you deal with, things will go wrong at one point or another. Some companies try to avoid responsibility, or only tell you that something has happened if you ask them. Other companies are much more open about what happened.

Matrix.org (and the associated Riot.im) is an example of a team that falls into the latter category, and last night's incident shows it. Their post-mortem blog post is a great example for others to follow: it gives a detailed timeline of what happened and why the outage occurred, and it finishes off with the steps that they will take to prevent future incidents.

Kudos to the Matrix.org team for their transparency.

December 1, 2016

Let's Encrypt for Kubernetes

21:08 -0500

A while ago, I blogged about automatic Let's Encrypt certificate renewal with nginx. Since then, I've also set up renewal in our Kubernetes cluster.

Like with nginx, I'm using acme-tiny to do the renewals. For Kubernetes, I created a Docker image. It reads the Let's Encrypt secret key from /etc/acme-tiny/secrets/account.key, and CSR files from /etc/acme-tiny/csrs/{name}.csr. In Kubernetes, these can be set up by mounting a Secrets store and a ConfigMap, respectively. It also reads the current certificates from /etc/acme-tiny/certs/{name}, which should also be set up by mounting a ConfigMap (called certificates), since that is where the container will put the new certificates.

Starting an acme-tiny pod will start an nginx server to serve the .well-known directory for the ACME challenge. Running /opt/acme-tiny-utils/renew in the pod will renew the certificate if it will expire within 20 days (running it with the -f option will disable the check). Of course, we want the renewal to be automated, so we want to set up a sort of cron task. Kubernetes has had cron jobs since 1.4, but at the time I was setting this up, we were still on 1.3. Kubernetes also does cron jobs by creating a new pod, whereas the way I want this to work is to run a program in an existing pod (though it could be set up to work the other way too). So I have another cron Docker image, which I have set up to run

kubectl exec `kubectl get pods --namespace=lb -l role=acme -o name | cut -d / -f 2` --namespace=lb ./renew sbscalculus.com

every day. That command finds the acme-tiny pod and runs the renew command, telling it to renew the sbscalculus.com certificate.

Now in order for the ACME challenge to work, HTTP requests to /.well-known/acme-challenge/ need to be routed to acme-tiny rather than to the regular pods serving those services. Our services are behind our HAProxy image. So I have a 0acmetiny entry (the 0 causes it to be sorted before all other entries) in the services ConfigMap for HAProxy that reads:

    {
      "namespace": "lb",
      "selector": {
        "role": "acme"
      },
      "hostnames": ["^.*"],
      "path": "^/\\.well-known/acme-challenge/.*$",
      "ports": [80]
    }

This causes HAProxy to send all the ACME challenges to the acme-tiny pod, while leaving all the other requests alone.

And that's how we have our certificates automatically renewed from Let's Encrypt.

September 6, 2016

Buildbot latent build slaves

12:28 -0400

I've blogged before about using Buildbot to build our application server. One problem with it is that the build (and testing) process can be memory intensive and can sometimes exceed the memory that we have available in our Kubernetes cluster. I could add another worker node, but that would be a waste, since we do builds infrequently.

Fortunately, the Buildbot developers have already built a solution to this: latent buildslaves. A latent buildslave is a virtual server that is provisioned on demand. That means that when a build isn't active, we don't have to pay for an extra server to be running; we only have to pay for the compute time that we actually need for builds (plus a bit of storage space).

I chose to use AWS EC2 as the basis of our buildslave. Buildbot also supports OpenStack, so I could have just used DreamCompute, which we already use for our Kubernetes cluster, but with AWS EC2, we can take advantage of spot instances and save even more money if we needed to. In any event, the setup would have been pretty much the same.

Setting up a latent buildslave on AWS is pretty straightforward. First, create an EC2 instance in order to build a base image for the buildslave. I started with a Debian image. Then, install any necessary software for the buildslave. For us, that included the buildslave software itself (Debian package buildbot-slave), git, tinc, npm, and Docker. Most of our build process happens inside of Docker containers, so we don't need anything else. We use tinc to build a virtual network with our Kubernetes cluster, so that we can push Docker images to our own private Docker repository.

After installing the necessary software, we need to configure it. It's configured just like a normal buildslave would be configured: I configured tinc, added an ssh key so that it could check out our source code, configured Docker so that it could push to our repository, and of course configured the Buildbot slave itself. Once it was configured, I cleaned up the image a bit (truncated logs, cleared bash history, etc.), and then took a snapshot in the AWS control panel, giving it a name so that it would show up as an AMI.

Finally, I added the latent buildslave in our Buildbot master configuration, giving it the name of the AMI that was created. Once set up, it ran pretty much as expected. I pushed out a change, Buildbot master created a new EC2 instance, built our application server, pushed and deployed it to our Kubernetes cluster, and after a short delay (to make sure there are no other builds), deleted the EC2 instance. In all, the EC2 instance ran for about 20 minutes. Timings will vary, of course, but it will run for less than an hour. If we were paying full price for a t2.micro instance in us-east-1, each build would cost just over 1 cent. We also need to add in the storage cost for the AMI which, given that I started with an 8GB image, will cost us at most 80 cents per month (since EBS snapshots don't store empty blocks, it should be less than that). We probably average about two builds a month, giving us an average monthly cost of at most 83 cents, which is not too bad.
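
For reference, the relevant part of the master configuration is just a latent buildslave entry. The sketch below is approximate (import paths and parameter names vary between Buildbot versions, and the AMI ID, credentials, and names are placeholders), but it shows the general shape.

# master.cfg (excerpt) -- approximate; adjust to your Buildbot version.
from buildbot.buildslave.ec2 import EC2LatentBuildSlave

c['slaves'] = [
    EC2LatentBuildSlave(
        'ec2-builder', 'SLAVE_PASSWORD',
        instance_type='t2.micro',
        ami='ami-xxxxxxxx',               # the AMI snapshotted from the base image
        region='us-east-1',
        identifier='AWS_ACCESS_KEY_ID',
        secret_identifier='AWS_SECRET_ACCESS_KEY',
        keypair_name='buildbot',
        security_name='buildbot',
        build_wait_timeout=600,           # idle seconds before the instance is shut down
    ),
]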

June 22, 2016

Load balancing Kubernetes pods

10:10 -0400

At work, we recently switched from Tutum (now Docker Cloud) to Kubernetes. Part of that work was building up a load balancer. Kubernetes has built-in load balancing capabilities, but it only works with Google Compute or AWS, which we are not using. It also requires a public IP address for each service, which usually means extra (unnecessary) costs.

Having previously worked with Tutum's HAProxy image, I figured I could do the same thing with Kubernetes. A quick web search didn't turn up any existing project, so I quickly wrote my own. Basically, we have HAProxy handling all incoming HTTP(S) connections and passing them off to different services based on the Host header. There's a watcher that watches Kubernetes for relevant changes, such as new or deleted pods, and updates the HAProxy configuration so that it always sends requests to the right place. I also improved the setup by adding in a Varnish cache for some of our services. Here's how it all works.

We have two sets of pods: a set of HAProxy pods and a set of Varnish pods. Each pod has a Python process that watches etcd for Kubernetes changes, updates the appropriate (HAProxy or Varnish) configuration, and tells HAProxy/Varnish about the new configuration. Why do we watch etcd instead of using the Kubernetes API directly? Because, as far as I can tell, the Kubernetes API only lets you watch one type of object (pods, ConfigMaps, Secrets, etc.) at a time, whereas we need to watch multiple types at once; using the Kubernetes API would mean making multiple simultaneous watch requests, which would just make things more complicated.
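
The watcher itself is essentially a long-polling loop against etcd's HTTP API. A stripped-down sketch (assuming the etcd v2 API that Kubernetes used at the time, objects stored under /registry, and etcd reachable on localhost) looks something like this:

import requests

ETCD = "http://127.0.0.1:2379"

def watch(prefix="/registry", index=None):
    """Long-poll etcd for changes under PREFIX, yielding each change as it happens."""
    while True:
        params = {"wait": "true", "recursive": "true"}
        if index is not None:
            params["waitIndex"] = str(index)
        change = requests.get(ETCD + "/v2/keys" + prefix, params=params, timeout=None).json()
        index = change["node"]["modifiedIndex"] + 1
        yield change

for change in watch():
    # In the real watcher this is where the HAProxy/Varnish configuration
    # would be regenerated and reloaded.
    print(change["action"], change["node"]["key"])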

Unlike Tutum's HAProxy image, which only allows you to change certain settings using environment variables, our entire configuration is generated from Jinja2 templates. This gives us a lot more flexibility, including being able to plug in Varnish fairly easily without having to make any code changes to the HAProxy configurator. Also, configuration variables for services are stored in their own ConfigMap, rather than as environment variables in the target pods, which allows us to make configuration changes without restarting the pods.
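
Rendering the configuration itself is the easy part. A minimal sketch (with a made-up template and service entry; the real template lives in a ConfigMap) is just:

from jinja2 import Template

# A tiny stand-in for the real HAProxy configuration template.
template = Template("""
frontend http-in
    bind *:80
{% for svc in services %}
    use_backend {{ svc.name }} if { hdr(host) -i {{ svc.host }} }
{% endfor %}
""")

config = template.render(services=[{"name": "calculus", "host": "sbscalculus.com"}])
# The rendered text is written to HAProxy's config file and HAProxy is reloaded.
print(config)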

When combining HAProxy and Varnish, one question to ask is how to arrange them: HAProxy in front of Varnish, or Varnish in front of HAProxy? We are using a setup similar to the one recommended in the HAProxy blog. In that setup, HAProxy handles all requests and passes non-cacheable requests directly to the backend servers. Cacheable requests are, of course, passed to Varnish. If Varnish has a cache miss, then it passes the request back to HAProxy, which then hands off the request to the backend server. As the article points out, in the event of a cache miss, there are a lot of requests, but cache misses should be very infrequent since Varnish only sees cacheable content. One main difference between the setup we have and the one in the article is that in the article, HAProxy listens on two IP addresses: one for requests coming from the public, and one for requests coming from Varnish. In our setup, we don't have two IP addresses for HAProxy to use. Instead, Varnish adds a request header that indicates that the request is coming from it, and HAProxy checks for that header.

At first, I set the Python process as the pod's command (the pod's PID 1), but ran into a slight issue. HAProxy reloads its configuration by, well, not reloading its configuration; it starts a new set of processes with the new configuration, which means that we ended up with a lot of zombie processes. To fix this, I could have changed the Python process to reap the zombies, but it was easier to just use Yelp's dumb-init instead.

We have the HAProxy pods managed as a DaemonSet, so it's running on every node, and the pods are set to use host networking for better performance. HAProxy itself is small enough that, at least with our current traffic, it doesn't affect the nodes much, so it isn't a problem for us right now to run it on every node. If we get enough traffic that it does make a difference, we can dedicate a node to it without much problem. One thing about this setup is that, even though it uses Kubernetes, HAProxy and Varnish don't need to be managed by Kubernetes. It just needs to be able to talk to etcd. So if we ever need a dedicated load balancer, we can spin up a node (or nodes) that just runs HAProxy and/or Varnish, say, using a DaemonSet and nodeSelector. Varnish is managed as a normal Kubernetes deployment and uses the normal container networking, so there's a bit of overhead there, but is fine for now. Again, if we have more concerns about performance, we can change our configuration easily enough.

It all seems to be working fairly well so far. There are some configuration tweaks that I'll have to go make, and there's one strange issue where Varnish doesn't like one of our services and just returns an empty response. But other than that, Varnish and HAProxy are just doing what they're supposed to do.

All the code is available on GitHub (HAProxy, Varnish).

June 9, 2016

Kubernetes vs Docker Cloud

09:42 -0400

Note: this is not a comprehensive comparison of Kubernetes and Docker Cloud. It is just based on my own experiences. I am also using Tutum and Docker Cloud more or less interchangeably, since Tutum became Docker Cloud.

At work, we used to use Tutum for orchestrating our Docker containers for our Calculus practice problems site. While it was in beta, Tutum was free, but Tutum has now become Docker Cloud and costs about $15 per managed node per month, on top of server costs. Although we got three free nodes since we were Tutum beta testers, we still felt the pricing was a bit steep, since the management costs would be more than the hosting costs. Even more so since we would have needed more private Docker repositories than what would have been included.

So I started looking for self-hosted alternatives. The one I settled on was Kubernetes, which originated from Google. Obviously, if you go self-hosted, you need to have enough system administration knowledge to do it, whereas with Docker Cloud, you don't need to know anything about system administration. It's also a bit more time consuming to set up — it took me about a week to set up Kubernetes (though most of that time was scripting the process so that we could do it again more quickly next time), whereas with Tutum, it took less than a day to get up and running.

Kubernetes will require at least one server for itself — if you want to ensure high availability, you'll want to run multiple masters. We're running on top of CoreOS, and a 512MB node seems a bit tight for the master in our setup. A 1GB node was big enough that, although they recommend not to, I allowed the master to also schedule other pods.

Kubernetes seems to have a large-ish overhead on the worker nodes (a.k.a. minions). Running top, the system processes take up at least 200MB, which means that on a 512MB node, you'd only have about 300MB to run your own pods unless you have swap space. I have no idea what the overhead on a Tutum/Docker cloud node was, since I didn't have access to check.

Previously, under Tutum, we were running on 5*512MB nodes, each of which had 512MB swap space. Currently, we're running on 3*1GB worker nodes plus 1*1GB master node (which also serves as a worker), no swap. (We'll probably need to add another worker in the near future (or maybe another combined master/worker) though under Tutum, we would have probably needed another node with the changes that I'm planning anyways.) Since we also moved from DigitalOcean to DreamHost (affiliate link) and their new DreamCompute service (which just came out of Beta as we were looking into self-hosting), our new setup ended up costing $1 less per month.

Under Tutum, the only way to pass in configuration (other than baking it into your Docker image, or unless you run your own configuration server) is through environment variables. With Kubernetes, you have more options, such as ConfigMaps and Secrets. That gives you more flexibility and allows (depending on your setup) changing configuration on the fly. For example, I created an auto-updating HAProxy configuration that allows you to specify a configuration template via a ConfigMap. When you update the ConfigMap, HAProxy gets immediately reconfigured with almost no downtime. This is in contrast to the Tutum equivalent, in which a configuration change (via environment variables) would require a restart and hence more downtime.

The other configuration methods also allow the configuration to be more decoupled. For example, with Tutum's HAProxy, the configuration for a service, such as its virtual host names, is specified using the target container's environment variables, which means that if you want to change the set of virtual hosts or the SSL certificate, you would need to restart your application containers. Since our application server takes a little while to restart, we want to avoid having to do that. On the other hand, if the configuration were set in HAProxy's environment, then it would be lost to other services that might want to use it (such as monitoring software that might use the HTTP_CHECK variable). With a ConfigMap, however, the configuration does not need to belong to one side or the other; it can stand on its own, so it doesn't interfere with the application container and can be accessed by other pods.

Kubernetes can be configured entirely using YAML (or JSON) files, which means that everything can be version controlled. Under Tutum, things are primarily configured via the web interface, though they do have a command-line tool that you could use as well. However, the command-line tool uses a different syntax for creating versus updating, whereas with Kubernetes, you can just "kubectl apply -f"; so even if you use the Tutum CLI and keep a script under version control for creating your services, it's easy to forget to update the script after you've changed a service.

There are a few things that Tutum does that Kubernetes doesn't do. For example, Tutum has built-in node management (if you use AWS, DigitalOcean, or one of the other providers that it is made to work with), whereas with Kubernetes, you're responsible for setting up your own nodes. Though there are apparently tools built on top of Kubernetes that do similar things, I never really looked into them, since we currently don't need to bring up/take down nodes very frequently. Tutum also has more deployment strategies (such as "emptiest node" and "high availability"), which was not that important for us, but might be more important for others.

Based on my experience so far, Kubernetes seems to be a better fit for us. For people who are unable/unwilling to administer their own servers, Docker Cloud would definitely be the better choice, and starting with Tutum definitely gave me time to look around in the Docker ecosystem before diving into a self-hosted solution.

April 29, 2016

Let's encrypt errata

10:06 -0400

Back in February, I posted about Automatic Let's Encrypt certificates on nginx. One of the scripts had a problem in that it downloaded the Let's Encrypt X1 intermediate certificate. Let's Encrypt recently switched to using their X3 intermediate, which means that Firefox was unable to reach sites using the generated certificates, and Chrome/IE/Safari needed to make an extra download to verify the certificate.

Of course, instead of just changing the script to download the X3 certificate, it's best to automatically download the right certificate. So I whipped up a quick Python script, cert-chain-resolver-py (inspired by the Go version), that checks a certificate and downloads the other certificates in the chain.
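
The idea is straightforward: read the certificate's Authority Information Access extension, download the issuer certificate that it points to, and repeat until there is nothing left to fetch. A rough sketch of that approach (not the actual cert-chain-resolver-py code) using the Python cryptography and requests packages:

import requests
from cryptography import x509
from cryptography.hazmat.backends import default_backend
from cryptography.x509.oid import ExtensionOID, AuthorityInformationAccessOID

def issuer_url(cert):
    """Return the CA Issuers URL from the cert's AIA extension, if any."""
    try:
        aia = cert.extensions.get_extension_for_oid(
            ExtensionOID.AUTHORITY_INFORMATION_ACCESS).value
    except x509.ExtensionNotFound:
        return None
    for desc in aia:
        if desc.access_method == AuthorityInformationAccessOID.CA_ISSUERS:
            return desc.access_location.value
    return None

def resolve_chain(pem_bytes):
    """Follow AIA links from a leaf certificate and return the certificate chain."""
    chain = [x509.load_pem_x509_certificate(pem_bytes, default_backend())]
    while True:
        url = issuer_url(chain[-1])
        if url is None:
            break
        der = requests.get(url).content          # issuer certs are usually DER-encoded
        chain.append(x509.load_der_x509_certificate(der, default_backend()))
    return chain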

I've updated my original blog post. The changed script is /usr/local/sbin/letsencrypt-renew, and of course you'll need to install cert-chain-resolver-py (the script expects it to be in /opt/cert-chain-resolver-py).
