This is another tutorial for Level 3 of InfoSecInstitute’s CTF challenge. This level involves decoding (not decrypting) some data to retrieve the flag. The challenge can be found at http://ctf.infosecinstitute.com.
Disclaimer: Capturing the flag for this level takes around 30 seconds with online tools, and is basically a no-brainer. Since this is about education and learning, and not blindly using tools, let’s dig into what’s actually going on here.
The message for this level is encoded with two different methods. The first method we have to decode is obviously the QR code, but some simple QR code readers will fail, since the encoded data is not a URL (hint).
This article will touch upon the basics of manual QR code decoding, but is far from being a comprehensive source of information on the subject. For the basics, I’ll refer you to QR Code Essentials, a PDF document published by the creators of the QR code. If you’re looking for information on implementing the scheme, or want to explore the subject in-depth, I highly recommend thonky’s tutorial on the subject. It’s a very thorough article aimed at programmers and is invaluable in working with QR codes in software.
QR Code Basics
QR codes were developed as an alternative to the 2D barcode schemes (commonly used on driver’s licenses and such). They have several advantages over those schemes and are much more resilient to human error and physical damage than other barcoding schemes. If you’ve ever tried to implement a barcode scanning system, you’re no doubt familiar with the headaches that come with it.
A QR code stores information in binary format – each “pixel” represents a 1 or 0. In the interest of specificity, each square can obviously be much larger than a true pixel, and the black and white coloration doesn’t exactly correlate to 1 and 0, for reasons that will be explained shortly. In the QR code world, each of these squares is referred to as a module (light modules and dark modules).
For starters, let’s take a look at the anatomy of a QR code:
One of the major strengths of QR codes is their error correction, which allows creative use and distortion of the code while maintaining its ability to be scanned.
The encoder also applies a bit mask to the data, which skews the correlation between dark/light modules and 1/0 values. Masking is a separate step from error correction; its job is to keep the pattern easy to scan.
Essentially, the encoder chooses the bit mask by scoring each candidate mask for scanner-unfriendly features, such as areas of consecutive matching squares (the penalty score), and keeping the one with the lowest score. Since large areas of black/white space could make automatic decoding unpredictable or impossible, a mask is used to achieve a more uniform distribution of dark/light modules.
Note that the bit mask applies only to the data and error correction areas, and not any of the reserved ones. For this reason, it’s sometimes referred to as the D/E mask.
Determining the Version
First, we’ll need to determine the version. The version, along with the error correction level, determines how much data can be stored in the QR code. Note that this is not a sequential version (like most things in the IT world), but rather an indicator of the dimensions.
There are 40 possible QR code versions, which have dimensions ranging from 21×21 (v.1) to 177×177 (v.40) modules. The version can be inferred by the width of the overall QR code (in modules) using the formula:
width = (version * 4) + 17
version = (width - 17) / 4
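The two formulas above can be sketched in Python. This is a minimal illustration; the function names are my own and the range check simply reflects the 21 to 177 module limits mentioned earlier.

```python
def width_from_version(version):
    """Return the width (in modules) of a QR code of the given version."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    return version * 4 + 17


def version_from_width(width):
    """Return the version of a QR code that is `width` modules wide."""
    if not 21 <= width <= 177 or (width - 17) % 4 != 0:
        raise ValueError("not a valid QR code width: %d" % width)
    return (width - 17) // 4


if __name__ == "__main__":
    # The code in this challenge is 29 modules wide.
    print(version_from_width(29))
```

Counting the modules along one edge of the image (quiet zone excluded) is all the input this needs.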
In this case, the width of the QR code is 29, which makes this a version 3 QR code. Additionally, each version uses one of four possible error correction values:
- L – 7%
- M – 15%
- Q – 25%
- H – 30%
These error correction values indicate the amount of redundant data, or the maximum theoretical damage/defacement that can occur and still scan accurately. (A 40L QR code holds the most data and a 1H contains the least.)
Determining the Mask
Since the encoding type is masked (bottom four pixels), we have to determine the mask first. This is done by looking at the format pattern, which is masked by a different mask. The format mask is a specific 15-bit mask 101010000010010 (decimal: 21522) used only for this purpose. Of these 15 bits, we’re only interested in the first 5 (for now).
In this specific case, it’s safe to assume dark=1 and light=0, which yields a binary value of 11101. Therefore, we’ll XOR the actual value 11101 with the first 5 bits of the format mask:
  11101
^ 10101
-------
  01000
This means the error-correction level is 01 (L) and the mask ID is 000. Error correction level L can recover from around 7% damage, and the redundancy it adds comes with a corresponding decrease in the amount of information that can be stored. This makes sense, after all, since those bits can no longer be used to store “real” data.
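The XOR step above can be sketched in a few lines of Python. The function name and the lookup table are my own; the bit patterns for levels M, Q and H are not discussed in this article and are taken from the QR code specification as I understand it.

```python
FORMAT_MASK = 0b101010000010010  # the fixed 15-bit format mask (decimal 21522)

# Two-bit error correction indicators. Only "01 = L" appears in this
# article; the other three are assumed from the QR specification.
EC_LEVELS = {0b01: "L", 0b00: "M", 0b11: "Q", 0b10: "H"}


def parse_format_prefix(raw_bits):
    """Unmask the first five format bits (given as a string such as '11101').

    Returns a tuple of (error correction level, mask ID).
    """
    if len(raw_bits) != 5 or set(raw_bits) - {"0", "1"}:
        raise ValueError("expected exactly five binary digits")
    # Keep only the top five bits of the 15-bit format mask.
    unmasked = int(raw_bits, 2) ^ (FORMAT_MASK >> 10)
    ec_level = EC_LEVELS[unmasked >> 3]  # first two bits
    mask_id = unmasked & 0b111           # last three bits
    return ec_level, mask_id


if __name__ == "__main__":
    print(parse_format_prefix("11101"))
```

The remaining ten format bits are error correction for the format field itself, which is why only the first five matter here.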
Content-Encoding & Maximum Data Length
We’re about to apply the mask, but let’s make some predictions first. Specifically, I’ll walk you through how to determine the encoding type and character count.
Using the tables available on this page, along with the version and error correction values, we can determine the maximum length of the data contained in the code:
We can expect up to 440 bits, which works out to 127 digits (0-9), 77 alphanumeric characters (A-Z, 0-9 + special chars), or 53 bytes of binary data. (Kanji encoding is used for special Japanese characters. There’s also an additional ECI mode used for other character sets, but we won’t be using either.)
It’s worth keeping in mind that every mode is ultimately stored as data bits, and the number of bits per character depends on the encoding type. Byte mode uses 8 bits per character (plain ASCII only needs 7 of them), while alphanumeric mode packs two characters into 11 bits. Multibyte encodings can use more, depending on the character set (hence the drop in numbers for Kanji characters). Various encoding tricks can be employed here, but that’s outside the scope of this article.
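As a rough sanity check on those capacities, here is a small sketch that counts how many bits a message needs in each mode. The function name is my own, and the per-mode bit costs (including the 10/9/8-bit character count fields for versions 1 through 9) are taken from the QR specification as I understand it rather than from the tables linked above.

```python
DATA_BITS_3L = 440  # data capacity of a version 3, level L code


def bits_needed(mode, length):
    """Bits required for `length` characters in `mode` (versions 1-9 only).

    Includes the 4-bit mode indicator and the character count field.
    """
    if mode == "numeric":
        # Three digits fit in 10 bits; leftovers take 4 or 7 bits.
        leftover = {0: 0, 1: 4, 2: 7}[length % 3]
        return 4 + 10 + 10 * (length // 3) + leftover
    if mode == "alphanumeric":
        # Two characters fit in 11 bits; a single leftover takes 6 bits.
        return 4 + 9 + 11 * (length // 2) + 6 * (length % 2)
    if mode == "byte":
        return 4 + 8 + 8 * length
    raise ValueError("unsupported mode: %s" % mode)


if __name__ == "__main__":
    for mode, length in [("numeric", 127), ("alphanumeric", 77), ("byte", 53)]:
        print(mode, length, bits_needed(mode, length) <= DATA_BITS_3L)
```

Each of the three maximums fits within 440 bits, and adding one more character to any of them does not.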
Data Encoding Type & Content Length
We need to determine the encoding type in order to determine the maximum data length. The encoding type and data length are encoded in the first 13 bits of the message, starting from the bottom-right. Bits in a QR code are read in a zig-zag pattern, two columns at a time:
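The zig-zag reading order can be sketched as a generator of (row, column) coordinates. This is a simplified illustration with names of my own choosing: it skips only the vertical timing column (column 6), so a real decoder would still have to filter out every other reserved module (finder patterns, alignment patterns, format areas) before reading bits.

```python
def zigzag_coords(size):
    """Yield (row, col) pairs in QR reading order for a size x size symbol.

    Starts at the bottom-right module and moves in two-column strips,
    alternating between upward and downward passes. Reserved areas other
    than the vertical timing column are NOT skipped here.
    """
    col = size - 1
    upward = True
    while col > 0:
        if col == 6:
            # The vertical timing pattern occupies column 6; step past it.
            col -= 1
        rows = range(size - 1, -1, -1) if upward else range(size)
        for row in rows:
            yield (row, col)      # right-hand module of the strip first
            yield (row, col - 1)  # then the left-hand module
        upward = not upward
        col -= 2


if __name__ == "__main__":
    # First 13 positions of a version 3 (29 x 29) code: mode + length bits.
    print(list(zigzag_coords(29))[:13])
```

For this challenge the first 13 positions all fall in the bottom-right corner, where there are no reserved modules to worry about.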
The four encoding methods (mentioned above) have the following mode indicators, which are stored in the first four bits. Again, the values in the above image are masked, and will not correlate with these binary values (yet):
- Numeric – 0001
- Alphanumeric – 0010
- Bytes – 0100
- Kanji – 1000
After this comes the character count. For alphanumeric mode in versions 1 through 9 (which covers our code), it is a 9-digit (binary) value. It’s padded on the left with zeros to ensure a consistent length. For example, the binary value 1 would be encoded as 000000001.
We can take a quick peek at the data by applying the mask to the bottom corner. I’ll discuss this next, so you’ll have to take my word on it for now. Also, note that the black/white values have changed from above, due to the image processing.
Starting from the bottom right, this gives us a result of:
The first four bits are the encoding type (alphanumeric), and the remaining nine bits (1001010 after removing padding) convert to 74 in decimal. We should get 74 alphanumeric characters once this is fully decoded.
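Splitting those 13 bits can be sketched as follows. The bit string in the example is reconstructed from the description above (mode indicator 0010 followed by 1001010 padded to nine bits), and the function name is my own. The fixed 9-bit count is an assumption that only holds for alphanumeric mode in versions 1 through 9.

```python
MODE_INDICATORS = {
    "0001": "numeric",
    "0010": "alphanumeric",
    "0100": "byte",
    "1000": "kanji",
}


def parse_header(bits):
    """Split the first 13 unmasked bits into (mode name, character count).

    Assumes a 9-bit character count, which is only correct for
    alphanumeric mode in QR versions 1 through 9.
    """
    if len(bits) < 13:
        raise ValueError("need at least 13 bits")
    mode = MODE_INDICATORS[bits[:4]]
    count = int(bits[4:13], 2)
    return mode, count


if __name__ == "__main__":
    print(parse_header("0010" + "001001010"))
```

Other modes use a different count width, so a general decoder would need to pick the field size after reading the mode indicator.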
Applying the Mask
The image on the left is the original QR code with reserved areas removed. On the right is mask 000, which we’ve determined to be the appropriate mask for this code.
For contrast, I’ve changed the mask to red before layering it on the original.
I’ve applied the mask visually, using the “difference” filter in GIMP. This essentially applies an XOR to the two layers, but some additional inversion is required. I won’t explain the process behind it, but you can see it applied for an OTP cipher here. The grids were generated in OpenOffice Calc, and I’ve included a link to the original at the end of this post.
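The same XOR can be done in code instead of GIMP. The sketch below assumes the QR specification's definition of mask 000, which flips every module where (row + column) is even; that formula is not stated in this article. The `is_data` callback is a hypothetical hook for excluding reserved areas, and by default it treats every module as data.

```python
def mask_000(row, col):
    """Return True where mask pattern 000 flips a module."""
    return (row + col) % 2 == 0


def unmask(modules, is_data=lambda row, col: True):
    """XOR a grid of 0/1 modules with mask 000 and return a new grid.

    `modules` is a list of rows, each a list of 0/1 values.
    `is_data(row, col)` should return False for reserved modules,
    which are left untouched.
    """
    return [
        [
            bit ^ 1 if is_data(r, c) and mask_000(r, c) else bit
            for c, bit in enumerate(row)
        ]
        for r, row in enumerate(modules)
    ]


if __name__ == "__main__":
    for row in unmask([[1, 1, 1], [1, 1, 1], [1, 1, 1]]):
        print(row)
```

Because XOR is its own inverse, running `unmask` twice returns the original grid, which is a handy check when experimenting.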
At long last, the unmasked data:
Next, the bits shown above need to be decoded according to their respective schemes. I’m going to skip the discussion on character encoding, since there’s enough room for at least an entire other article on the subject.
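For the curious, alphanumeric mode is compact enough to sketch briefly. This is not a full treatment: it assumes the 45-character table and the 11-bits-per-pair packing from the QR specification, takes the data bits as a string that starts right after the 13-bit header, and uses names of my own choosing.

```python
# The 45-character alphanumeric table; a character's index is its value.
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def decode_alphanumeric(bits, count):
    """Decode `count` alphanumeric characters from a string of '0'/'1'.

    Characters are stored in pairs as an 11-bit value (first * 45 + second);
    a single leftover character is stored in 6 bits.
    """
    chars = []
    pos = 0
    while count >= 2:
        value = int(bits[pos:pos + 11], 2)
        chars.append(ALPHANUMERIC[value // 45])
        chars.append(ALPHANUMERIC[value % 45])
        pos += 11
        count -= 2
    if count == 1:
        chars.append(ALPHANUMERIC[int(bits[pos:pos + 6], 2)])
    return "".join(chars)


if __name__ == "__main__":
    # Encode "AC" by hand (A = 10, C = 12) and decode it again.
    print(decode_alphanumeric(format(10 * 45 + 12, "011b"), 2))
```

Note that the table includes the period, hyphen and space, which becomes relevant in the next section.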
The Short Answer
As I mentioned at the beginning of this post, there are several online tools that can decode this easily. In fact, there are tons of tools available to do the decoding. Since no article of this nature would be complete without a Python code example, you can also use the python-qrtools package (Ubuntu).
Encoding #2 – Morse Code
It turns out that the QR code was encoding a series of periods, hyphens and spaces (Morse code).
.. -. ..-. --- ... . -.-. ..-. .-.. .- --. .. ... -- --- .-. ... .. -. --.
Now that we’ve got the Morse code, we simply need to decode it:
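If you would rather not reach for an online decoder, a lookup table is all it takes. This is a minimal sketch covering letters only, which is enough for this message; the function name is my own.

```python
# International Morse code for the letters A-Z.
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z",
}


def decode_morse(message):
    """Decode space-separated Morse symbols into a string of letters."""
    return "".join(MORSE[symbol] for symbol in message.split())


if __name__ == "__main__":
    encoded = (".. -. ..-. --- ... . -.-. ..-. .-.. .- --. "
               ".. ... -- --- .-. ... .. -. --.")
    print(decode_morse(encoded))
```

Unknown symbols raise a `KeyError`, which is a reasonable failure mode for a one-off script like this.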