Encrypted USB drives with eCryptfs

I’ve got several USB flash drives that I carry around with me on a regular basis, and it’s nice to be able to use those for small backups of important things, in addition to my usual system-wide backups that get dumped onto a couple external hard drives. This way, even if something happens to my computer and the backup drives, (the house burns down, my computer and hard drives are stolen, or an EMP bomb goes off in the garage) I still have the important things with me. Being me, I like these important things to be encrypted so that if I lose the USB drive, it’s not cause for panic.

That said, I also use my USB drives for more mundane things, like transferring files between two computers and carrying around my music. Since I’m a Linux geek surrounded by people using Microsoft and Apple products, it’s nice to have something that works with both. After all, it’s hard to sing the praises of Linux effectively when they’re asking questions about why your USB drives won’t work.

Today I drew up a list of features that my ideal USB drive would have.

  1. Must be capable of storing encrypted data in such a way that it can be mounted as a filesystem. Backup software shouldn’t need to worry about the encryption.
  2. The USB drive must be capable of being read and written by Linux, Windows, and Mac OSX. The encrypted data only needs to be read on my personal computers, which all run Linux.
  3. If I have free space left over after backing up, I should be able to use it to transfer files, store music, or whatever else I want to use the USB drive for.

The second requirement means that the USB drive will have to have at least part of it formatted as a filesystem that everyone can read and write to. I could format it as NTFS, which is what Windows uses now, but I don’t believe Mac OSX allows writes to NTFS filesystems, and I have previously had issues with using NTFS on Linux myself. It may be the case that I could work around any Linux-NTFS issues, but I don’t want to have to convince Mac users to install things or tweak system settings to make my USB drive work. I can’t use Mac’s normal filesystem, HFS+,  either, because Windows users wouldn’t be able to use it without installing new drivers. Finally, I can’t use something like ext4 or Btrfs, because then it would be limited to Linux users only without extra software. FAT32 is not my favourite filesystem, but it’s supported on all the target platforms with no extra software needed, so that will be my filesystem of choice, guaranteeing that whatever systems I’m likely to encounter will be compatible with my USB drive.

Turning to the first requirement, I need to be able to store encrypted data such that it can be mounted and used like a regular filesystem. In Linux, there are two basic ways of going about this, you can use block level encryption like dm-crypt/LUKS, which I use for encrypting my hard disk or you can use filesystem-level encryption, like eCryptfs, which I also previously mentioned. To figure out which of these two solutions to use, I’ll look at the final requirement, which is the ability to dynamically resize to avoid wasting space.

I could partition my USB drive and format the first part as FAT32 with the rest encrypted using dm-crypt with whatever filesystem I like (encrypted data only needs to be usable in Linux) on top of the encrypted block device. This is not very nice however, as I’m pre-allocating the space before I know what I’m storing. If I don’t store much in my backup area, then the space is wasted and I can’t use it for other things. On the other hand, if I suddenly need more backup space, I’d have to manually repartition, expand the filesystem, and so on. That’s annoying, so I won’t go that route.

The second alternative, eCryptfs, is a lot better suited to this dynamically resizable storage problem. I can format the entire drive as FAT32, then create a directory to use for eCryptfs backing files. If I then mount an eCryptfs filesystem with that directory as the backing directory, all the files I write into the mount are encrypted before being written onto my USB stick. Now I can just use that as output for my backups, and I have the dynamically resizable encryption scheme that I wanted. It only takes up as much space as the encrypted files, since they are just files stored in the FAT32 filesystem. If I create a new file in the eCryptfs mount, one new file is created in FAT32, and if I delete something in eCryptfs, the file goes away for FAT32 as well.

So the final solution is to format the whole USB drive as FAT32, then stack eCryptfs on top of that. Now I can safely carry around my backups and still have room should I need to use a USB drive like a normal person, or should normal people want to use my USB drive.

What I’ve been up to recently

As part of my degree requirements, I’ve got to complete a large project within a group of four people. The project goal is self selected, but must be useful and related to software engineering. I’ll be working with Michael Chang, Zameer Manji, and Alvin Tran until early 2014 to add integrity protection to eCryptfs, a cryptographic file system that can be used on Linux. In this post, I’ll present some concepts related to the project along with some details about the project itself.

Confidentiality and Integrity

Two important concepts in computer security are those of confidentiality and integrity. There is also usually a third concept mentioned alongside these, that of availability, but I’m only mentioning it here for completeness.

Confidentiality protection attempts to prevent certain parties from reading information while allowing other parties to access the information. In the case of a cryptographic file system like eCryptfs, this is done by encrypting the files before writing them to disk, and decrypting the files when they are needed later. This could be done manually, but it is much easier and less error prone to have the file system handle this sort of thing than to try to encrypt all sensitive information by hand.

Integrity protection attempts to ensure that information has not been unintentionally changed. This might entail actually trying to prevent modifications to the information, or it may simply indicate when the information has been changed. Cryptographically, this can be done using a Message Authentication Code (MAC), which is a short binary string that can be easily calculated with a file and a key, but cannot be calculated without both. Additionally, if the file changes then the calculated MAC will be different. Anyone knowing the key and having access to the file can calculate the MAC and compare it to one that was calculated and stored earlier, and if the two are different, then the file must have been changed.

Current state of eCryptfs

The eCryptfs file system is a stacking file system, which means that it relies on a lower file system to handle stuff like I/O and buffering, and just manages file encryption and decryption. Currently, that is all it manages, as it does not include any integrity protection. The contents of files are made unreadable to anyone without the correct key, but it is still possible to modify those files in partly predictable ways, as presented below.

There is already a wide user base for eCryptfs, with Ubuntu and it’s derivatives using it to provide the encrypted home directory feature, and within Google’s ChromeOS.

Attack against CBC mode

Cipher Block Chaining (CBC) is one of the most common modes of operation for block ciphers, and is used currently by eCryptfs. In this mode of operation, each block of plaintext is XORed with the previous ciphertext block before encryption. This ensures that the same block of plaintext won’t encrypt to the same ciphertext, unless the previous ciphertext block is the same as well. A one-block initialization vector stands in for the previous ciphertext block during the first encryption. CBC decryption just reverses the process, first decrypting the ciphertext block, then XORing it with the previous ciphertext block.

Operations can be expressed in the following way (Taken from Wikipedia)

Encryption: C_i = E_K(P_i \oplus C_{i-1}), C_0 = IV

Decryption: P_i = D_K(C_i) \oplus C_{i-1}, C_0 = IV

Now let’s perform the attack. Let’s say we want to change a certain plaintext block P_n into a different plaintext {P_n}' by flipping some bits. We’ll denote this change as \Delta. That is, {P_n}' = P_n \oplus \Delta

It turns out that if we don’t care what happens to the previous plaintext block, P_{n-1}, all we have to do is replace C_{n-1} with {C_{n-1}}' = C_{n-1} \oplus \Delta

We can substitute this into the decryption formula above to see what will happen.

{P_n}' = D_K(C_n) \oplus {C_{n-1}}'

{P_n}' = D_K(C_n) \oplus {C_{n-1}} \oplus \Delta

{P_n}' = P_n \oplus \Delta

This is an integrity issue, as an attacker can now modify files without ever knowing the key used to encrypt them. It’s also not guaranteed that this modification is detectable, depending on whether the previous block can be checked for validity. If it can be checked, great, but that’s just another form of integrity protection, and the project I’m working on aims to implement integrity protection regardless of the data stored. If it can’t be checked for correctness, or is ignored (maybe it’s a different record in a database) then the modification will go unnoticed.

Galois Counter Mode

Galois Counter Mode (GCM) is another mode of operation for block ciphers, but in addition to encryption, also produces a piece of data known as an authentication tag. This tag acts as a MAC taken over the data that was encrypted. An attacker could still modify the ciphertext, but now the resultant changes to the plaintext will invalidate the tag, making them detectable. The attacker cannot modify the tag so that it validates the new data, because calculating the tag requires the cryptographic key that was used to encrypt the data, and the attacker does not know this key.

Another benefit to GCM is speed. It’s true that the same effect on security could be had by encrypting the data and calculating a MAC separately, but that requires two passes of the file, one for each operation. GCM does both in one pass over the file, speeding things up. This is important in a file system, as you’d rather have access to your files quickly.

The project aims to implement GCM as the mode of operation for eCryptfs, thus providing both integrity and confidentiality protection. Integrity protection was something the original developers wanted to have from the beginning, but didn’t have the time to implement. I’m proud to be helping to create the first widely used integrity protected cryptographic file system.