Registering dissectors for unknown MIME types in Wireshark

I came across a situation today where I was looking at some packets in Wireshark that used HTTP with a JSON payload. However, the Content-Type header was one unknown to Wireshark, so it simply displayed the content unparsed. Since Wireshark has included a JSON dissector for some Content-Types, such as “application/json”, since version 1.6, I suspected it was possible to get it to recognize the Content-Type header I was seeing as one indicating a JSON payload.

Unfortunately, my usual approach of using the “Decode as…” entry in the right click menu wasn’t working, as a MIME type is not a network protocol and doesn’t appear there. I eventually found what I needed in the Dissector tables, accessible via the Internals menu. There, one can find the “Internet media type” table, which appears to map MIME types to the appropriate dissector. Now I knew what I wanted to add and where; I just needed to figure out how.

Enter Lua, the scripting language used by Wireshark. Chapter 11 of the Wireshark manual gives information and examples, but the specifics I needed were in section 10. The following line will register the JSON dissector as the one to be used when encountering a media type of “application/vnd.myapp+json”. To enter it, navigate to “Tools>Lua>Evaluate”, enter the code below, substituting whatever values you need for your particular situation, and click the evaluate button.

DissectorTable.get("media_type"):add("application/vnd.myapp+json", Dissector.get("json"))

For the curious, this gets a reference to the “Internet media type” dissector table and calls its add method with two arguments: the content type string and a reference to the desired dissector. You can check that the dissector was registered successfully by navigating to “Internals>Dissector tables” and checking the “Internet media type” dissector table.


Removing AVG Secure Search

I just spent the better part of two hours removing what can only be described as malware from a Windows computer owned by a friend of mine. This is the AVG secure search toolbar, proudly marketed by AVG and stuffed down the throat of my unfortunate friend, who doesn’t even use AVG products.

It manifests itself as a browser plugin for Firefox (apparently Chrome and IE as well), and worms its way into every part of the system to hinder removal. It also disables the “remove” button in the Firefox extensions list, changes all your Firefox settings to use its own search plugin, and runs background processes that “update” it each day (read: reinstall it without your permission if you delete its files). The uninstall instructions from AVG are useless rubbish, and just result in it reinstalling itself the next day.

First, disable both the extension and the plugin in Firefox, and reset your settings using these uninstall instructions. Next, open up your control panel, and uninstall the “AVG Secure Search” program. You may also need to remove vToolbarupdater from your startup services. Go to the task manager and kill “vToolbarupdaterXXXX”, where XXXX is some version number. Also kill “loggingserver.exe” and the process named “vprot.exe” at this point, as they are also associated with AVG. Now in your services list, stop vToolbarUpdater, and set the startup type to “disabled” in its properties window.

It also hides in the following locations. You will need to view hidden files to see some of these.

  1. C:\Program Files (x86)\Common Files\AVG Secure Search\
  2. C:\ProgramData\AVG Secure Search\
  3. C:\windows\system32\drivers\avgtpx64.sys

More browser extension rubbish is found in your AppData folder, something like C:\Users\yourusername\AppData\. Look through your Mozilla and Mozilla Firefox folders in your Local and Roaming folders, especially under the extensions folders in your Firefox profile. Also check the Temp folder for things like “FireFoxSearchBar.xml”, “toolbar_log.txt” and anything starting with “avg”. I found a few folders named something like “avg_a02000” and another text file that was named “AVGsecuresearch_log.txt”.

Finally, go to the Windows registry editor and search for “AVG”. This should turn up a lot of hits. Other than the Windows “dfrg” one, which appears to be legit, most of the hits had something to do with the toolbar. Delete these, and change the IE homepage one back to something sensible ( for me, but you can choose another home page).

Now restart your computer, and pray that AVG never does this again. If you’ve got any of their software running, I’d nuke that too, but that’s a topic that I don’t have the expertise for covering here. Hopefully this post will help anyone who wants to get rid of this crap, but I don’t claim any particular knowledge or expertise in this area, so your mileage may vary.

Edited to add:

This blog post was helpful, and someone has helpfully posted the browser settings that AVG changes here so that you don’t need to visit AVG’s site to get them.

Encrypted USB drives with eCryptfs

I’ve got several USB flash drives that I carry around with me on a regular basis, and it’s nice to be able to use those for small backups of important things, in addition to my usual system-wide backups that get dumped onto a couple of external hard drives. This way, even if something happens to my computer and the backup drives (the house burns down, my computer and hard drives are stolen, or an EMP bomb goes off in the garage), I still have the important things with me. Being me, I like these important things to be encrypted so that if I lose the USB drive, it’s not cause for panic.

That said, I also use my USB drives for more mundane things, like transferring files between two computers and carrying around my music. Since I’m a Linux geek surrounded by people using Microsoft and Apple products, it’s nice to have something that works with both. After all, it’s hard to sing the praises of Linux effectively when they’re asking questions about why your USB drives won’t work.

Today I drew up a list of features that my ideal USB drive would have.

  1. Must be capable of storing encrypted data in such a way that it can be mounted as a filesystem. Backup software shouldn’t need to worry about the encryption.
  2. The USB drive must be capable of being read and written by Linux, Windows, and Mac OSX. The encrypted data only needs to be read on my personal computers, which all run Linux.
  3. If I have free space left over after backing up, I should be able to use it to transfer files, store music, or whatever else I want to use the USB drive for.

The second requirement means that the USB drive will have to have at least part of it formatted as a filesystem that everyone can read and write to. I could format it as NTFS, which is what Windows uses now, but I don’t believe Mac OSX allows writes to NTFS filesystems, and I have previously had issues with using NTFS on Linux myself. It may be the case that I could work around any Linux-NTFS issues, but I don’t want to have to convince Mac users to install things or tweak system settings to make my USB drive work. I can’t use Mac’s normal filesystem, HFS+,  either, because Windows users wouldn’t be able to use it without installing new drivers. Finally, I can’t use something like ext4 or Btrfs, because then it would be limited to Linux users only without extra software. FAT32 is not my favourite filesystem, but it’s supported on all the target platforms with no extra software needed, so that will be my filesystem of choice, guaranteeing that whatever systems I’m likely to encounter will be compatible with my USB drive.

Turning to the first requirement, I need to be able to store encrypted data such that it can be mounted and used like a regular filesystem. In Linux, there are two basic ways of going about this: you can use block-level encryption like dm-crypt/LUKS, which I use for encrypting my hard disk, or you can use filesystem-level encryption like eCryptfs, which I also previously mentioned. To figure out which of these two solutions to use, I’ll look at the final requirement, which is the ability to dynamically resize to avoid wasting space.

I could partition my USB drive and format the first part as FAT32 with the rest encrypted using dm-crypt with whatever filesystem I like (encrypted data only needs to be usable in Linux) on top of the encrypted block device. This is not very nice however, as I’m pre-allocating the space before I know what I’m storing. If I don’t store much in my backup area, then the space is wasted and I can’t use it for other things. On the other hand, if I suddenly need more backup space, I’d have to manually repartition, expand the filesystem, and so on. That’s annoying, so I won’t go that route.

The second alternative, eCryptfs, is a lot better suited to this dynamically resizable storage problem. I can format the entire drive as FAT32, then create a directory to use for eCryptfs backing files. If I then mount an eCryptfs filesystem with that directory as the backing directory, all the files I write into the mount are encrypted before being written onto my USB stick. Now I can just use that as output for my backups, and I have the dynamically resizable encryption scheme that I wanted. It only takes up as much space as the encrypted files, since they are just files stored in the FAT32 filesystem. If I create a new file in the eCryptfs mount, one new file is created in FAT32, and if I delete something in eCryptfs, the file goes away for FAT32 as well.

So the final solution is to format the whole USB drive as FAT32, then stack eCryptfs on top of that. Now I can safely carry around my backups and still have room should I need to use a USB drive like a normal person, or should normal people want to use my USB drive.

What I’ve been up to recently

As part of my degree requirements, I’ve got to complete a large project within a group of four people. The project goal is self selected, but must be useful and related to software engineering. I’ll be working with Michael Chang, Zameer Manji, and Alvin Tran until early 2014 to add integrity protection to eCryptfs, a cryptographic file system that can be used on Linux. In this post, I’ll present some concepts related to the project along with some details about the project itself.

Confidentiality and Integrity

Two important concepts in computer security are those of confidentiality and integrity. There is also usually a third concept mentioned alongside these, that of availability, but I’m only mentioning it here for completeness.

Confidentiality protection attempts to prevent certain parties from reading information while allowing other parties to access the information. In the case of a cryptographic file system like eCryptfs, this is done by encrypting the files before writing them to disk, and decrypting the files when they are needed later. This could be done manually, but it is much easier and less error prone to have the file system handle this sort of thing than to try to encrypt all sensitive information by hand.

Integrity protection attempts to ensure that information has not been unintentionally changed. This might entail actually trying to prevent modifications to the information, or it may simply indicate when the information has been changed. Cryptographically, this can be done using a Message Authentication Code (MAC), which is a short binary string that can be easily calculated with a file and a key, but cannot be calculated without both. Additionally, if the file changes then the calculated MAC will be different. Anyone knowing the key and having access to the file can calculate the MAC and compare it to one that was calculated and stored earlier, and if the two are different, then the file must have been changed.
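To make the MAC idea concrete, here is a minimal sketch using HMAC-SHA256 from Python's standard library. HMAC is just one possible MAC construction, picked here for illustration; the function names and sample data are made up for this example and are not part of eCryptfs.

```python
import hashlib
import hmac
import os


def compute_mac(key: bytes, data: bytes) -> bytes:
    """Return an HMAC-SHA256 tag over data. Both the key and the data are needed."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify_mac(key: bytes, data: bytes, stored_tag: bytes) -> bool:
    """Recalculate the tag and compare it to the stored one in constant time."""
    return hmac.compare_digest(compute_mac(key, data), stored_tag)


key = os.urandom(32)                      # secret key, unknown to the attacker
original = b"pay Alice 10 dollars"        # stand-in for a file's contents
stored_tag = compute_mac(key, original)   # calculated and stored earlier

tampered = b"pay Alice 90 dollars"        # the file after an unwanted change

print(verify_mac(key, original, stored_tag))  # unchanged file: tag matches
print(verify_mac(key, tampered, stored_tag))  # changed file: tag does not match
```

The tag stays the same length no matter how large the file is, so storing it costs very little.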

Current state of eCryptfs

The eCryptfs file system is a stacking file system, which means that it relies on a lower file system to handle stuff like I/O and buffering, and just manages file encryption and decryption. Currently, that is all it manages, as it does not include any integrity protection. The contents of files are made unreadable to anyone without the correct key, but it is still possible to modify those files in partly predictable ways, as presented below.

There is already a wide user base for eCryptfs: Ubuntu and its derivatives use it to provide the encrypted home directory feature, and it is also used within Google’s ChromeOS.

Attack against CBC mode

Cipher Block Chaining (CBC) is one of the most common modes of operation for block ciphers, and is used currently by eCryptfs. In this mode of operation, each block of plaintext is XORed with the previous ciphertext block before encryption. This ensures that the same block of plaintext won’t encrypt to the same ciphertext, unless the previous ciphertext block is the same as well. A one-block initialization vector stands in for the previous ciphertext block during the first encryption. CBC decryption just reverses the process, first decrypting the ciphertext block, then XORing it with the previous ciphertext block.

Operations can be expressed in the following way (Taken from Wikipedia)

Encryption: C_i = E_K(P_i \oplus C_{i-1}), C_0 = IV

Decryption: P_i = D_K(C_i) \oplus C_{i-1}, C_0 = IV

Now let’s perform the attack. Let’s say we want to change a certain plaintext block P_n into a different plaintext {P_n}' by flipping some bits. We’ll denote this change as \Delta. That is, {P_n}' = P_n \oplus \Delta

It turns out that if we don’t care what happens to the previous plaintext block, P_{n-1}, all we have to do is replace C_{n-1} with {C_{n-1}}' = C_{n-1} \oplus \Delta

We can substitute this into the decryption formula above to see what will happen.

{P_n}' = D_K(C_n) \oplus {C_{n-1}}'

{P_n}' = D_K(C_n) \oplus {C_{n-1}} \oplus \Delta

{P_n}' = P_n \oplus \Delta

This is an integrity issue, as an attacker can now modify files without ever knowing the key used to encrypt them. It’s also not guaranteed that this modification is detectable, depending on whether the previous block can be checked for validity. If it can be checked, great, but that’s just another form of integrity protection, and the project I’m working on aims to implement integrity protection regardless of the data stored. If it can’t be checked for correctness, or is ignored (maybe it’s a different record in a database) then the modification will go unnoticed.
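The substitution above can be tried out with a short sketch. Python's standard library has no AES, so this assumes a toy 16-byte Feistel cipher built from SHA-256 as a stand-in for the real block cipher; it is not secure and is not what eCryptfs uses. The attack only depends on the CBC structure, not on the cipher underneath. All names and sample records are hypothetical.

```python
import hashlib
import os

BLOCK = 16          # block size in bytes
HALF = BLOCK // 2


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of the toy Feistel cipher (illustration only, NOT secure).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:HALF]


def encrypt_block(key: bytes, block: bytes) -> bytes:   # plays the role of E_K
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        left, right = right, xor(left, _round(key, i, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:   # plays the role of D_K
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        left, right = xor(right, _round(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = encrypt_block(key, xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(xor(decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)


key, iv = os.urandom(16), os.urandom(BLOCK)
plaintext = b"record one......" + b"pay Alice $0010."   # two 16-byte blocks
ciphertext = cbc_encrypt(key, iv, plaintext)

# The attacker never touches the key. Delta is just the set of bits to flip
# in the second plaintext block; it is XORed into the previous ciphertext block.
delta = xor(b"pay Alice $0010.", b"pay Alice $9010.")
forged = xor(ciphertext[:BLOCK], delta) + ciphertext[BLOCK:]

recovered = cbc_decrypt(key, iv, forged)
print(recovered[BLOCK:])   # the targeted block now carries the flipped bits
print(recovered[:BLOCK])   # the previous block decrypts to garbage
```

The second block decrypts to exactly the attacker's chosen value, while the first block is scrambled, which matches the trade-off described above.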

Galois Counter Mode

Galois Counter Mode (GCM) is another mode of operation for block ciphers that, in addition to encrypting the data, also produces a piece of data known as an authentication tag. This tag acts as a MAC taken over the data that was encrypted. An attacker could still modify the ciphertext, but now the resultant changes to the plaintext will invalidate the tag, making them detectable. The attacker cannot modify the tag so that it validates the new data, because calculating the tag requires the cryptographic key that was used to encrypt the data, and the attacker does not know this key.

Another benefit to GCM is speed. It’s true that the same effect on security could be had by encrypting the data and calculating a MAC separately, but that requires two passes over the file, one for each operation. GCM does both in one pass over the file, speeding things up. This is important in a file system, as you’d rather have access to your files quickly.

The project aims to implement GCM as the mode of operation for eCryptfs, thus providing both integrity and confidentiality protection. Integrity protection was something the original developers wanted to have from the beginning, but didn’t have the time to implement. I’m proud to be helping to create the first widely used integrity protected cryptographic file system.

Password Restrictions Really Bug Me

Warning, rant ahead.

Maybe it’s just me, but when I hit restrictions on what I can use as my password, I get annoyed. Lower limits, such as “at least 6 characters long” are fine, but there are several things that I can see no reason for, that make me doubt the competence of the programmers involved in the system. When I realised that my bank’s website did all of the things mentioned in this post, I was really annoyed. Thousands of dollars of my money are sitting there, just one string of characters away from an attacker getting it all, and of course, reading the fine print of their security agreement reveals that it’s not their responsibility if my password or reset questions are compromised.

Character restrictions
Passwords are passwords, not HTML, not shell scripts, not anything that needs to be parsed by machines. They should be treated as opaque sequences of bytes, and the only thing that should be done with them while logging in is salting and hashing. When I see restrictions like, “To preserve online security, your information cannot contain unacceptable symbols or words (for example, “%”, “<”, “{”, “www.”, “ftp”, “https”, etc.)”, I’m astounded that they let these people touch code at all. There is no reason at all that they should need to check for that sort of thing in passwords. Passwords should never be displayed, not on their website, not in email, not anywhere. There should be no cause to worry about XSS attacks, SQL injection, or any other sort of incomplete mediation attack via passwords if they’re properly handled as opaque data.

I’m not complaining about entropy figures here. A 12 character string composed of random alphanumeric characters would have approximately 71 bits of entropy. Adding in the printable punctuation and whitespace characters on a standard US keyboard only adds about 7 bits of entropy to that figure for a random 12 character string. The reason I’m annoyed by these restrictions is that they hint at deeper problems about how the password is handled by the system. They also impede those of us who do want to use “special” characters in our passwords for whatever reason, from non-ASCII characters in a preferred language to an obsession over password entropy.
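The arithmetic behind those entropy figures is easy to check. This small sketch assumes 62 alphanumeric characters and 95 printable ASCII characters (the alphanumerics plus 32 punctuation marks and the space); the function name is made up for the example.

```python
import math
import string


def random_string_entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a string of uniformly random characters: length * log2(alphabet)."""
    return length * math.log2(alphabet_size)


alphanumeric = len(string.ascii_letters + string.digits)   # 62 characters
printable = alphanumeric + len(string.punctuation) + 1     # plus punctuation and space: 95

alnum_bits = random_string_entropy_bits(alphanumeric, 12)
printable_bits = random_string_entropy_bits(printable, 12)

print(f"alphanumeric only: {alnum_bits:.2f} bits")
print(f"all printable:     {printable_bits:.2f} bits")
print(f"difference:        {printable_bits - alnum_bits:.2f} bits")
```

Each extra character of length adds about 6 bits on its own, so a slightly longer password buys nearly as much as the whole punctuation set.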

Upper limit on length
I would have thought that the days of small fixed size strings were behind us, but apparently not. My bank puts an upper limit of 12 characters on password length, which precludes using an easily memorable passphrase. They put a similarly low limit of 25 characters on the password reset questions and answers. One of the few things they did right is allowing me to write my own security question, as I definitely didn’t want to use the default questions. I then got cut off halfway through writing a short sentence by this low character limit. Is this due to some aspiring database administrator learning that CHARs were faster than VARCHARs, and deciding to speed up logging in by a few milliseconds? Is the process of logging into an account, or resetting a password really where the bottlenecks are? If the problem doesn’t lie with the storage, but with the login system itself, then it’s time the programmers learned about dynamic allocation.

In addition to making it impossible to use longer passwords, this upper length limit also hints at improper handling of the passwords, as properly salted and hashed passwords would be constant length, and the length of the original password would be completely irrelevant to the storage requirements.
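As a rough illustration of why neither length nor character set should matter to storage, here is a sketch using PBKDF2 from Python's standard library. The choice of PBKDF2-HMAC-SHA256, the iteration count and the sample passwords are assumptions for the example, not a description of what any bank actually does.

```python
import hashlib
import os


def hash_password(password: str, salt: bytes) -> bytes:
    # The password is treated as opaque bytes: nothing is parsed, filtered or truncated.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


salt = os.urandom(16)   # in a real system, a fresh random salt per account

short_hash = hash_password("hunter2", salt)
long_hash = hash_password("correct horse battery staple <%{ www. ftp https " * 20, salt)

# Both values take the same amount of storage, whatever went in.
print(len(short_hash), len(long_hash))
```

The database column only ever has to hold the fixed-size hash and the salt, so a 12 character cap on the input buys nothing.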

Required character classes
This sort of attempt to increase security is what leads to users choosing “Password1” instead of “password” to protect their life savings. No system is idiot proof, so rather than treating the symptoms by attempting to programmatically enforce good passwords, try to treat the problem by educating the idiots on how to choose a good password, and explaining why they should care. Suggest using a diceware passphrase, or use my password suggestions. Of course, this is only effective if users actually can use good passwords, so fixing the first two issues is a priority.

While I’m ranting about programmatically enforcing password strength, I should mention that the only checks I consider valid are checking against a list of common passwords, and checking against information like the account name or other public details associated with the account. These are going to be among the first things a social engineering attacker would try, and common workarounds for required character classes are not going to stop them, making that method of enforcement worthless.
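A minimal sketch of those two checks might look like the following. The tiny password list and the function name are placeholders for illustration; a real deployment would load a much larger list of known common passwords.

```python
# Placeholder list; a real one would contain many thousands of entries.
COMMON_PASSWORDS = {"password", "password1", "123456", "qwerty", "letmein"}


def is_acceptable(password, account_name, public_details=()):
    """Reject common passwords and passwords containing public account details."""
    lowered = password.lower()
    if lowered in COMMON_PASSWORDS:
        return False
    for detail in (account_name, *public_details):
        # Skip empty details, otherwise every password would "contain" them.
        if detail and detail.lower() in lowered:
            return False
    return True


print(is_acceptable("Password1", "alice"))                      # on the common list
print(is_acceptable("alice2013!", "alice"))                     # contains the account name
print(is_acceptable("correct horse battery staple", "alice"))   # passes both checks
```

Note that “Password1” satisfies a typical character class rule and is still rejected here, which is the point of checking against a list instead.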

Default password reset questions
Not really a password restriction, but it’s related, and it ticks me off. The usual culprits, along the lines of “what was your first [job, school, pet’s name, car]”, just train people to think this information is unguessable. After all, if so many places use the same questions, there must be a good reason, right? I did a quick experiment where I looked at the public information about some of my friends on Google+, searched through their post history, and if I could find a link to their website, looked at that as well. In many cases, I could find answers to at least one of those questions just from these sources of information. The current system might as well be an all-you-can-eat buffet for social engineers.

Perhaps more obscure questions could be used by default, or perhaps websites should start using alternate methods of authentication, for example, resetting via SMS or OpenID.  Approached from an alternate perspective, why should we even need to give answers to these questions in the first place? What business do random websites have knowing trivia about me, and more importantly, why is it the same default trivia that protects my bank account?


Killing Machines

Automated systems are already doing much of our work for us, making everyday decisions to remove the burden from humans. Examples range from Google’s licensed self driving car navigating the roads alongside human-guided vehicles to the computers doing stock market trading on behalf of investors. Both of these are technologies that do what humans can do, but better, faster, or more reliably. Humans suffer lapses of concentration and fatigue, and do so unpredictably, whereas computers don’t. Computers have their own set of interesting problems, like the priority inversion bug that crashed Mars Pathfinder, but those can be found and removed from systems.

The question that springs to mind for me is “Is there anything we shouldn’t let a computer decide, even if it could make that decision faster or more reliably than a human?” My answer is that a machine should not be allowed to decide whether to kill a human. I’m not against computers aiding humans in acts of war, that’s just technological progression, and it’s happening already. Modern fighter jets are extremely unstable, to aid in maneuverability, so much so that a human pilot could not possibly stabilize the aircraft, so the jets are stabilized by computer. The human pilot still has the decision of whether or not to fire the weapons, and at what targets.

I’d also like to point out that I’m not mentioning anything about sentient computers when I say “decision”. Computers make decisions every time they execute conditional statements, without any sort of capacity for sentience or consciousness. A machine could be programmed to identify humans and kill some subset of them without human intervention, and it would be making a decision whether or not to kill humans. A machine that identified humans and then asked whether or not to kill them would not be making the decision to kill a human, as it would be passed off to a human operator.

By giving a machine the decision to kill a human being, we have created something capable of autonomously waging war. Most people consider war to be something to be avoided and minimized if possible. It is also known that the more indirect the method of killing, the easier it will be for a person to rationalize it, and the less aversion to performing the killing they will have. Psychologically, it is much easier to kill someone by pressing a button that launches a missile than to shoot someone that you can see, and shooting someone is psychologically easier than stabbing someone to death. If we remove the decision entirely from humans, the killing is now out of sight, out of mind, making it much easier for mass killings to take place without psychological consequence for those waging war.

One problem with this answer is how to define “deciding to kill”. Through various means, computers can control much of how we view the world. For example, information people view on the Internet is managed by computers. If the search engine ratings on some website are low, fewer people will be impacted by that website. If the search engines raise the rating, more people are likely to see it. By controlling what we see, computers could indirectly control what we think, and could theoretically manipulate one human into killing another. As such, it is likely an exercise in futility to try to prove that a computer could not decide to kill a human, even though that scenario seems to be very unlikely at this point in time.

SSH X Forwarding

Recently I was messing about with X forwarding through SSH, and I realized something that perhaps should have been obvious, but caught me off guard, so I’ll share it here.

The scenario involves 2 computers and a fairly resource intensive graphical application that could take advantage of 3D acceleration. The computers involved were a desktop computer with a reasonable graphics card, and a netbook with whatever graphics capabilities were built into the motherboard, but no 3D acceleration. I wanted to take advantage of the netbook screen, effectively using it as a second monitor, but running the application on the more powerful desktop. I figured I should be able to forward the X session over SSH, and end up with the netbook displaying it, but all the hard work done on my desktop. Unfortunately, when I set this up, I found the application could not take advantage of 3D acceleration anymore.

The reason for this is that when an X session is forwarded, only the X traffic is transferred across the network. This traffic consists of things like “draw a rectangle here” or “draw this bitmap there”. Where I had gone wrong was assuming that my graphical application would have the 3D acceleration done on the desktop’s fully capable graphics card, then have the resulting bitmap sent over the network. In reality, the netbook was doing all the graphical rendering, leading to a lack of 3D acceleration capability.

Despite the fact that I couldn’t get my 3D acceleration, this is actually a much smarter way of doing things, as it significantly reduces the network traffic involved. My netbook’s screen size is 1024×768, and let’s assume I wanted 30fps and 32 bit colours. The resultant network traffic would be (1024×768 pixels/frame) × (30 frames/second) × (32 bits/pixel), coming out to a little over 750Mb/s just for the image going one way. If I recall correctly, the actual network load while I was doing this was a little over 100Mb/s.
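The bandwidth estimate for sending fully rendered frames can be reproduced in a few lines. This sketch assumes uncompressed frames and decimal megabits (1 Mb = 1,000,000 bits); the function name is made up for the example.

```python
def raw_frame_bitrate(width: int, height: int, fps: int, bits_per_pixel: int) -> int:
    """Bits per second needed to send every frame as an uncompressed bitmap."""
    return width * height * fps * bits_per_pixel


bits_per_second = raw_frame_bitrate(1024, 768, 30, 32)

print(bits_per_second)                        # 754974720
print(f"{bits_per_second / 1e6:.0f} Mb/s")    # 755 Mb/s
```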

The lesson learned here is that hardware acceleration is done on the display end (the X server) rather than the client end of an X connection.