This is a walkthrough of InfoSec Institute’s CTF challenge, Level 12.
As I mentioned in some of the other walkthroughs, the first step is to look through the source code for anything that’s out of place. After that, I typically evaluate the headers and other responses (with Chrome’s developer tools) and proceed from there. Anything that the site loads will be revealed in the “Network” tab, so it’s a pretty good source of information that’s always available.
In this level, the file “design.css” was out of place. Viewing the contents showed an invalid CSS statement:
In CSS, colors are typically specified with their hexadecimal value. (There are a couple of other acceptable formats, but that’s irrelevant for now.)
Load that string into a Python interpreter, and use the built-in “decode” function. Pretty intuitive, yeah?
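For anyone on Python 3, where strings no longer have a hex "decode" shortcut, here is a minimal sketch of the same idea. The hex value below is a made-up placeholder, not the one from design.css.

```python
# Hypothetical placeholder, NOT the value found in design.css.
hex_value = "696e666f7365635f666c616769735f6578616d706c65"

# bytes.fromhex() turns each pair of hex digits into one byte;
# decoding those bytes as ASCII gives readable text.
decoded = bytes.fromhex(hex_value).decode("ascii")
print(decoded)
```

Swap in the string from the stylesheet to get the real flag.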
This is a walkthrough for InfoSec Institute’s CTF Challenge, Level 11.
The only immediate difference between this and level 10 is the addition of a grainy PHP logo.
Grainy images are one indicator of steganography, so I proceeded along the route of checking for readable strings. Using the strings command again revealed the flag instantly (but read on!). However, opening it in emacs revealed that it was in the header of the image file.
This is hardly the same as steganography, which hides the message in the image data. This flag is hidden in the image’s EXIF (Exchangeable Image File Format) data, which provides metadata about the image. If you have exiftool installed (on Debian/Ubuntu it is typically packaged as libimage-exiftool-perl), you can get the same information:
exiftool php-logo-virus.jpg | grep -i infosec
The “document name” field contains the string, plus two additional bytes. If you had trouble viewing the image properties, it was likely because the viewer wasn’t prepared for the extra bytes at the end of the string. Remember, the “strings” command only reveals printable characters.
Depending on whether or not the flag includes the extra bytes, there are two options:
Bytes 240 and 206 are outside of the printable ASCII range, and together they don’t form a valid UTF-8 sequence either. The control characters correspond to the end of the field, which is NUL-terminated. These characters are present because of the way emacs is forced to display something, but you can see the true value with a hex editor. I used hexedit in the following screenshot:
The additional unprintable characters are in red and NUL bytes in yellow. As far as any EXIF parser is concerned, the unprintable bytes are part of the field. That leaves us with bytes A0 86 01 unaccounted for.
Note: Your raw data may differ from mine, due to endianness. If your hex editor displayed this, it’s correct, but you’ll notice each pair is switched.
Before we cross over the threshold into text-encoding hell, extended ASCII sets, control characters, and all the levels dedicated specifically to Unicode, let’s take a step back. (Sorry guys, I only led you down this road to take a look at the actual contents of the field!)
Occam’s razor says the field was obfuscated to prevent what I’ll dub a “View Properties Attack”, and we’re dealing with a printable-characters-only string. This is a n00bs challenge, after all.
Although it lacks the characteristic ending equals signs of a base64-encoded string, the flag we have does contain a valid base64 string. (The ending == signs are for padding, and not always present.)
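Some decoders refuse base64 input whose padding has been stripped. A minimal sketch of putting the padding back before decoding follows; the helper name and the sample value are my own placeholders, not the flag from the image.

```python
import base64

def b64decode_unpadded(text):
    """Decode base64 text, restoring any '=' padding that was stripped."""
    # Base64 length must be a multiple of 4; add however many '=' are missing.
    missing = -len(text) % 4
    return base64.b64decode(text + "=" * missing)

# Hypothetical sample: encode a short value, then strip its padding.
sample = base64.b64encode(b"example").decode("ascii").rstrip("=")
print(sample, "->", b64decode_unpadded(sample))
```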
This is a walkthrough for InfoSec Institute’s CTF challenge, Level 9.
The challenge presents with a login screen for a Cisco Intrusion Detection System (IDS). I tried a few typical username/password combinations (root/root, admin/password, etc) before googling “Cisco IDS default password”.
Sure enough, ‘root’/‘attack’ worked, and the flag was given in a popup box:
At first glance, this looks like the string presented in their Level 4 CTF challenge, but the character spacing is all wrong. We already determined that they’re using the format “infosec_flagis_?????”, and they’re unlikely to change the grouping since it helps identify the flag in a CTF event.
The flag is presented in plaintext, but reversed. To undo this, use the “rev” command in Linux, which reverses a string passed into it:
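If you don’t have “rev” handy, the same reversal is a one-liner in Python. The string below is a hypothetical stand-in for the text in the popup.

```python
# Hypothetical stand-in for the reversed text shown in the popup.
reversed_text = "elpmaxe_sigalf_cesofni"

# A slice with a step of -1 walks the string backwards.
restored = reversed_text[::-1]
print(restored)
```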
This is a walkthrough on Level 8 of InfoSec Institute’s CTF challenge. The challenge begins by asking if you’d like to download “app.exe”. Since I’m not about to run an untrusted *.exe file (and I’m on Linux anyway), I decided to open it up in emacs. The flags follow a common format, so performing a string search can’t hurt:
Well, that was easy.
This can also be done with the strings command, which prints strings of printable characters. Binary files do have quite a few readable characters, so combining strings with grep shouldn’t hurt (the -i flag means case-insensitive search):
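For the curious, here is a rough Python sketch of what strings piped into a case-insensitive grep is doing. It is an approximation, not a reimplementation of either tool, and the sample bytes are invented for illustration.

```python
import re

def find_strings(data, needle, min_len=4):
    """Return runs of printable ASCII (at least min_len long) containing needle."""
    # Printable ASCII runs from space (0x20) to tilde (0x7e).
    runs = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    # Case-insensitive match, like grep -i.
    return [run for run in runs if needle.lower() in run.lower()]

# Invented sample: binary junk with one readable, flag-shaped string inside.
blob = b"\x00\x01MZ\x90\x00junk\xff\xfeinfosec_flagis_example\x00\x00more"
print(find_strings(blob, b"INFOSEC"))
```

For the real file, read it with open("app.exe", "rb").read() and pass the result in as data.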
This challenge is linked directly to a file called “404.php”, that serves up the following content:
f00 not found
Something is not right here???
This is intentional, and not an accidental 404, given the level-specific bounty and the fact that it’s linked directly in the menu. Let’s try http://ctf.infosecinstitute/levelseven.php, since that’s what all the other levels are. Sure enough, it works. Kind of.
The page is blank, but instead of a 404 status code, we get 200. Well, not really:
The HTTP status is 200, but the status text should be “OK”, so let’s see what it actually says:
Ahh, another base64-encoded string. We came across that in level 2, so we’ll just use that atob() function again:
Easy enough! But why does this work? Did they hack the internet?!
The HTTP status code is separate from the status text – they’re just commonly used together. We can generate the same effect with PHP’s header function.
die(header("HTTP/1.0 404 Just kidding, it's here."));
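To make the code/text split concrete, here is a small Python sketch that pulls a status line apart. The function name is my own; the sample line is the one produced by the PHP call above.

```python
def parse_status_line(line):
    """Split an HTTP status line into (version, numeric code, reason text)."""
    # Only split on the first two spaces; the reason text may contain spaces.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason

version, code, reason = parse_status_line("HTTP/1.0 404 Just kidding, it's here.")
print(code)    # the part most software acts on
print(reason)  # free-form text; the server can put anything here
```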
It’s important to note that some software (crawlers, for example) may only look at the status code. Generating random HTTP statuses because you can is generally not a useful thing to do in real life ;)
This is a writeup on Level 4 of InfoSecInstitute’s CTF challenge.
As discussed in Level 2, we begin with a general survey of the page. After checking over the page and not finding anything that stood out, I began inspecting the HTTP headers. (The hint, after all, is “HTTP stands for Hypertext transfer protocol”.)
The response looks about the same, with the exception of an additional cookie that wasn’t present before:
Note: If you’ve been browsing around their site, and came across this level before, the cookie WILL be set on all headers, since it’s set for all pages on the domain ctf.infosecinstitute.com. You can see this for yourself by looking at the cookie file. (In Chrome, visit chrome://settings/cookies)
Since this is clearly encoded, encrypted, or otherwise obfuscated, I made a few (correct) assumptions to find a starting point.
A low level (Level 4) challenge would use a classical cipher, rather than true symmetric/asymmetric encryption (à la PGP/AES).
A digraph (“bb”) and two groupings ending with “v(r)f” indicate to me that it’s likely a monoalphabetic substitution cipher.
Note that these assumptions come from having a bit of experience with cryptography, and are simply a starting point. Since we have a short ciphertext, it’s difficult to be sure about anything until we actually attempt decryption. While it could possibly be a Vigenère, Playfair, or something more complex, starting off with a simple monoalphabetic substitution cipher seemed like the best bet.
The most well-known of all the classical ciphers is probably the Caesar Shift cipher (or ROT-13). I don’t think I’ve ever met anyone who wasn’t familiar with it, but it’s not a question I typically ask. To be precise, the Caesar cipher is a specific key setting for a shift cipher, in which the letters of the alphabet are ROTated by a certain number. The ROT-13 cipher rotates the alphabet by 13 characters, resulting in the following shift:
To solve, simply decode the value of the cookie. (Note that the ‘=’ sign is not part of the allowed message space, so the values will have to be decoded separately.)
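A minimal Python sketch of that decoding step, using the standard library’s rot_13 codec. The cookie below is a hypothetical example I made up, not the one the site sets.

```python
import codecs

# Hypothetical cookie, NOT the real one from the challenge.
cookie = "rknzcyr=vasbfrp_syntvf_rknzcyr"

# Split on '=' and decode each half on its own, as described above.
name, value = cookie.split("=", 1)
decoded_name = codecs.decode(name, "rot_13")
decoded_value = codecs.decode(value, "rot_13")
print(decoded_name, "=", decoded_value)
```

ROT-13 is its own inverse, so running the same call on the output gives back the ciphertext.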
If it wasn’t a key of 13, we could try each possible combination, looking for (hopefully English) text that makes sense. This is known as a brute-force attack, and would be the next logical step before moving on to additional cipher types. In fact, if you used an online tool or other program to solve it, it likely worked by generating each of the 26 possible decryptions, then checking which one(s) had one (or both) of the following:
A character frequency distribution approaching that of the expected plaintext language.
Common words in the expected plaintext language
Method 1 is the most accurate, given “enough” ciphertext. It’s a statistical comparison between how often certain letters appear in a language, and how often they appear in the ciphertext. When the ciphertext is short, however, this method can fail since there’s not enough data to accurately compare it.
The second method is especially useful in situations like this, where we know some of the plaintext (the words “infosec” and “flag”). In fact, this makes it easier to write the script on-the-fly if you’re not familiar with calculating frequency distributions, or are otherwise “not a math person”. By scanning each decrypted (deciphered) set of characters for these words, we can rapidly narrow down our search for the correct cipher and key. (I wouldn’t waste time searching for “is”, because it’s too short to be significant.)
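Here is a short sketch of that known-word approach: try all 26 shifts and keep only the candidates containing “infosec” or “flag”. The function names and the sample ciphertext are my own placeholders.

```python
import string

def shift(text, key):
    """Rotate each ASCII letter forward by key places; leave other characters alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

def crack(ciphertext, cribs=("infosec", "flag")):
    """Return (key, plaintext) for every shift whose output contains a known word."""
    hits = []
    for key in range(26):
        candidate = shift(ciphertext, key)
        if any(crib in candidate.lower() for crib in cribs):
            hits.append((key, candidate))
    return hits

# Hypothetical ciphertext, not the real cookie value.
print(crack("vasbfrp_syntvf_rknzcyr"))
```

With a longer list of known words this scales to other shift-cipher puzzles, though very short words will start producing false positives.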
Disclaimer: Capturing the flag for this level takes around 30 seconds with online tools, and is basically a no-brainer. Since this is about education and learning, and not blindly using tools, let’s dig into what’s actually going on here.
The message for this level is encoded with two different methods. The first method we have to decode is obviously the QR code, but some simple QR code readers will fail, since the encoded data is not a URL (hint).
This article will touch upon the basics of manual QR code decoding, but is far from being a comprehensive source of information on the subject. For the basics, I’ll refer you to QR Code Essentials, a PDF document published by the creators of the QR code. If you’re looking for information on implementing the scheme, or want to explore the subject in-depth, I highly recommend thonky’s tutorial on the subject. It’s a very thorough article aimed at programmers and is invaluable in working with QR codes in software.
QR Code Basics
QR codes were developed as an alternative to other barcode schemes (such as those commonly used on driver’s licenses). They have several advantages over those schemes and are much more resilient to human error and physical damage. If you’ve ever tried to implement a barcode scanning system, you’re no doubt familiar with the headaches that come with it.
A QR code stores information in binary format – each “pixel” represents a 1 or 0. In the interest of specificity, each square can obviously be much larger than a true pixel, and the black and white coloration doesn’t exactly correlate to 1 and 0, for reasons that will be explained shortly. In the QR code world, each of these squares is referred to as a module (light modules and dark modules).
For starters, let’s take a look at the anatomy of a QR code:
One of the major strengths of QR codes is their error correction, which allows creative use and distortion of the code while maintaining its ability to be scanned.
It does this in part by applying a bit mask to the data, which results in skewing the correlation between dark/light modules and 1/0 values.
Essentially, the bit mask is determined by looking for areas of consecutive matching squares (penalty score). Since large areas of black/white space could make automatic decoding unpredictable or impossible, a mask is used to achieve a more uniform distribution of dark/light modules.
Note that the bit mask applies only to the data and error correction areas, and not any of the reserved ones. For this reason, it’s sometimes referred to as the D/E mask.
Determining the Version
First, we’ll need to determine the version. The version, along with the error correction level, determines how much data can be stored in the QR code. Note that this is not a sequential version (like most things in the IT world), but rather an indicator of the dimensions.
There are 40 possible QR code versions, which have dimensions ranging from 21×21 (v.1) to 177×177 (v.40) modules. The version can be inferred by the width of the overall QR code (in modules) using the formula:
width = (version * 4) + 17
version = (width - 17) / 4
In this case, the width of the QR code is 29, which makes this a version 3 QR code. Additionally, each version uses one of four possible error correction values:
L – 7%
M – 15%
Q – 25%
H – 30%
These error correction values indicate the amount of redundant data, or the maximum theoretical damage/defacement that can occur and still scan accurately. (A 40L QR code holds the most data and a 1H contains the least.)
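The width/version formula above is easy to sanity-check in code. This is just a sketch of the arithmetic; the function names are mine.

```python
def width_from_version(version):
    """Width in modules for a given QR version (1 to 40)."""
    return version * 4 + 17

def version_from_width(width):
    """Invert the formula, rejecting widths that no QR version produces."""
    if not 21 <= width <= 177 or (width - 17) % 4 != 0:
        raise ValueError("not a valid QR code width: %d" % width)
    return (width - 17) // 4

print(version_from_width(29))  # the code in this level is 29 modules wide
```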
Determining the Mask
Since the encoding type is masked (bottom four pixels), we have to determine the mask first. This is done by looking at the format pattern, which is masked by a different mask. The format mask is a specific 15-bit mask 101010000010010 (decimal: 21522) used only for this purpose. Of these 15 bits, we’re only interested in the first 5 (for now).
In this specific case, it’s safe to assume dark=1 and light=0, which yields a binary value of 11101. Therefore, we’ll XOR the actual value 11101 with the first 5 bits of the format mask:
This means the error-correction level is 01 (L) and the mask ID is 000. Error correction level L allows for around 7% correction, as well as a corresponding decrease in the amount of information that can be stored. This makes sense, after all, since those bits can no longer be used to store “real” data.
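The XOR step can be sketched in a few lines of Python. Only the L = 01 pairing comes from this walkthrough; the other three entries in the lookup table are the standard QR values as I understand them, so treat them as an assumption.

```python
FORMAT_MASK = 0b101010000010010  # the 15-bit format mask (decimal 21522)

# Two-bit error correction indicators. L = 01 is from the text above;
# the rest are assumed from the QR specification.
EC_LEVELS = {0b01: "L", 0b00: "M", 0b11: "Q", 0b10: "H"}

def decode_format_prefix(raw5):
    """Unmask the first 5 format bits; return (EC level, mask ID)."""
    unmasked = raw5 ^ (FORMAT_MASK >> 10)  # top 5 bits of the mask: 10101
    ec_level = EC_LEVELS[unmasked >> 3]    # first two bits
    mask_id = unmasked & 0b111             # last three bits
    return ec_level, mask_id

print(decode_format_prefix(0b11101))
```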
Content-Encoding & Maximum Data Length
We’re about to apply the mask, but let’s make some predictions first. Specifically, I’ll walk you through how to determine the encoding type and character count.
Using the tables available on this page, along with the version and error correction values, we can determine the maximum length of the data contained in the code:
We can expect up to 440 bits, 127 digits (0-9), 77 characters (A-Z, 0-9 + special chars), or 53 bytes. (Kanji encoding is used for special Japanese characters. There’s also an additional ECI mode used for other character sets, but we won’t be using either.)
It’s worth keeping in mind that alphanumeric digits can also be represented as data bits. The number of bits per character depends on the encoding type, with byte mode using 8 bits per character. Unicode encoding (multibyte) can use more, depending on the character set (hence the drop in numbers for Kanji characters). Various encoding tricks can be employed here, but that’s outside the scope of this article.
Data Encoding Type & Content Length
We need to determine the encoding type in order to determine the maximum data length. The encoding type and data length are encoded in the first 13 bits of the message, starting from the bottom-right. Bits in a QR code are read in a zig-zag pattern, two columns at a time:
The four encoding methods (mentioned above) have the following mode indicators, which are stored in the first four bits. Again, the values in the above image are masked, and will not correlate with these binary values (yet):
Numeric – 0001
Alphanumeric – 0010
Bytes – 0100
Kanji – 1000
After this is a 9-digit (binary) value representing the character length. It’s padded on the left with zeros to ensure a consistent length. For example, the binary value 1 would be encoded as 000000001.
We can take a quick peek at the data by applying the mask to the bottom corner. I’ll discuss this next, so you’ll have to take my word on it for now. Also, note that the black/white values have changed from above, due to the image processing.
Starting from the bottom right, this gives us a result of:
The first four bits are the encoding type (alphanumeric), and the remaining nine bits (1001010 after removing padding) convert to 74 in decimal. We should get 74 alphanumeric characters once this is fully decoded.
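Reading those 13 bits can be sketched as follows. The bit string is reconstructed from the values described above (mode 0010 followed by 74 as a 9-bit number), and the 9-bit count length is the one this walkthrough uses.

```python
MODES = {
    "0001": "numeric",
    "0010": "alphanumeric",
    "0100": "byte",
    "1000": "kanji",
}

def parse_header(bits):
    """Split the first 13 unmasked bits into (mode name, character count)."""
    mode = MODES[bits[:4]]       # 4-bit mode indicator
    count = int(bits[4:13], 2)   # 9-bit count, zero-padded on the left
    return mode, count

print(parse_header("0010" + "001001010"))
```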
Applying the Mask
The image on the left is the original QR code with reserved areas removed. On the right is mask 000, which we’ve determined to be the appropriate mask for this code.
For contrast, I’ve changed the mask to red before layering it on the original.
I’ve applied the mask visually, using the “difference” filter in GIMP. This essentially applies an XOR to the two layers, but some additional inversion is required. I won’t explain the process behind it, but you can see it applied for an OTP cipher here. The grids were generated in OpenOffice Calc, and I’ve included a link to the original at the end of this post.
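If you’d rather not do this in an image editor, the same XOR can be sketched in Python. I’m assuming the usual definition of mask 000, which flips every module where (row + column) is even; that rule comes from the QR specification rather than from anything shown above, and the tiny 3×3 grid is made up for illustration.

```python
def apply_mask_000(modules, reserved):
    """XOR mask 000 onto a grid of 0/1 modules, skipping reserved cells."""
    result = []
    for r, row in enumerate(modules):
        new_row = []
        for c, bit in enumerate(row):
            # Assumed rule for mask 000: flip where (row + column) is even.
            flip = (r + c) % 2 == 0 and not reserved[r][c]
            new_row.append(bit ^ 1 if flip else bit)
        result.append(new_row)
    return result

# Made-up 3x3 example; the top-left cell is marked as reserved.
grid = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]
keep = [[True, False, False], [False, False, False], [False, False, False]]
print(apply_mask_000(grid, keep))
```

Because XOR is its own inverse, applying the mask a second time restores the original grid.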
At long last, the unmasked data:
Next, the bits shown above need to be decoded according to their respective schemes. I’m going to skip the discussion on character encoding, since there’s enough room for at least an entire other article on the subject.
The Short Answer
As I mentioned at the beginning of this post, there are several online tools that can decode this easily. In fact, there are tons of tools available to do the decoding. Since no article of this nature would be complete without a Python code example, you can also use the python-qrtools package (Ubuntu).
Encoding #2 – Morse Code
It turns out that the QR code was encoding a series of periods, hyphens and spaces (Morse code).
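Decoding the Morse by hand gets old quickly, so here is a minimal Python sketch. It assumes the symbols for each letter are separated by whitespace, and the sample message is made up rather than taken from the QR code.

```python
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E", "..-.": "F",
    "--.": "G", "....": "H", "..": "I", ".---": "J", "-.-": "K", ".-..": "L",
    "--": "M", "-.": "N", "---": "O", ".--.": "P", "--.-": "Q", ".-.": "R",
    "...": "S", "-": "T", "..-": "U", "...-": "V", ".--": "W", "-..-": "X",
    "-.--": "Y", "--..": "Z",
    "-----": "0", ".----": "1", "..---": "2", "...--": "3", "....-": "4",
    ".....": "5", "-....": "6", "--...": "7", "---..": "8", "----.": "9",
}

def decode_morse(message):
    """Translate whitespace-separated Morse symbols; unknown symbols become '?'."""
    return "".join(MORSE.get(symbol, "?") for symbol in message.split())

# Made-up sample, not the message from the challenge.
print(decode_morse("..-. .-.. .- --."))
```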
This article is a solution to Level 2 of InfoSecInstitute’s CTF challenge. The challenge can be found here.
Although the solution is actually very simple, I’m going to describe a number of other steps to consider. If you just want the final answer, feel free to skip to the end.
Step 1 – Static Source Code Analysis
The first step is to look at the underlying source code. I’ll typically do a quick scan to see if anything is out of place or poorly-hidden. Incomplete or shoddy cover-ups can serve as a red flag to identify areas that need further investigation.
The next chunk of code is a generic asynchronous loader function for the domain pardot.com. Oh, it’s just some marketing code from a (probably legitimate) company. I’ve never heard of them, but I’m not immediately suspicious.
There are a couple of unaccounted-for variables in the second snippet, but they appear to be consistent across pageviews, so I’m assuming it’s a unique identifier of some sort. I’ll mark a note to return here later, if needed.
Finally, there’s this:
The “leveltwo.jpeg” image is inside the “lvlone” div. This could certainly be a simple typo, but typographical errors break things, too. Since there’s nothing super solid to go on, and the message references the image file, I’ll start the next step with this.
Step 2 – Checking Dependencies
Any number of scripts (or other resources) can be included in the request. Although it’s not at all uncommon for people to host their own libraries versus using a content delivery network (CDN), seeing local paths to scripts and CSS files for common libraries should be investigated further.
If we trust the CDN, we can trust that the content it delivers is the actual jQuery library or Bootstrap CSS file, and not a malicious script that someone simply named “bootstrap.min.css”.
This doesn’t account for MITM-style vulnerabilities, where someone listening on your network could supply an altered copy of these libraries.
The local files could be verified with an md5 hash, but I’m more interested in the image at this point. I’ll save that for later if nothing else turns up.
Open up the developer console (F12 in Chrome), and view the “Network” tab. This tab shows all of the individual components required to render the page.
The most interesting thing to me is the 200 (OK) HTTP response from the server. This indicates that the image does exist, but it’s still not showing up. If the image didn’t exist, I’d expect a 404 (Not Found) response, which is what is returned with this image tag:
You’ll see the following requests for this page:
Note: If you receive a 304 (Not Modified) response instead of a 200, it simply indicates that the image is already saved in your browser’s cache. The server is telling your browser there’s no need to download a new copy, since the image hasn’t changed. Ctrl+F5 will force the browser to get the image from the server.
This leaves us with a missing image that isn’t actually missing. Since the image can’t be right-clicked on and saved, we can rule out a fake image of a missing image, like this one:
Open the image in a new tab by right-clicking on it and selecting “Open in New Tab/Window”. Developer tools confirms that the image exists, since we get another 200/304 response:
The response headers are curiously missing a Content-Type header, which is usually present. Specifically, I’d expect:
Again, this is nothing super-concerning, just another subtle clue that something is amiss. It could be a simple oversight or server misconfiguration that’s causing the image to not be displayed, or it could be a corrupt image.
By clicking on the “Response” tab, we can view the raw data received from the server, which is:
This is an easily-identifiable base64-encoded string, which is a valid way of sending images on the web. If you’re not familiar with this method, open the following link and inspect the Response.
Notice the difference in the string lengths? There’s not enough data in the leveltwo.jpeg file to be a valid image (probably). Instead, it looks like an encoded string (text).
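One way to back up that hunch is to decode the response and check whether it starts with a known image signature. The sketch below only knows three common formats, and both sample bodies are made up.

```python
import base64

# Leading "magic" bytes for a few common image formats.
MAGIC = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF8": "gif",
}

def sniff(body):
    """Base64-decode a response body; return (guessed type, raw bytes)."""
    raw = base64.b64decode(body)
    for magic, kind in MAGIC.items():
        if raw.startswith(magic):
            return kind, raw
    return "unknown", raw

# Made-up samples: a PNG-looking payload and a plain text one.
fake_png = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8).decode("ascii")
plain = base64.b64encode(b"just text").decode("ascii")
print(sniff(fake_png)[0], sniff(plain))
```

An “unknown” result whose bytes are all readable text is a strong hint that the payload is an encoded message rather than an image.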
The image above (apple.png) isn’t actually an image file. It’s a PHP script that outputs a base64-encoded image.
I’ve used an .htaccess rewrite to make it appear as an image at first glance, and for all intents and purposes, it is an image!
Remember that missing Content-Type header? That’s the missing piece that tells the browser to interpret it as image data, and not a bunch of garbled characters. Here’s the same script, but without the Content-Type header:
Here’s a comparison of the two, with and without the Content-Type header:
You can find the source code for these test files on GitHub.
That’s all there is to it! Now, go find the string and decode it!
(If you’re too lazy for that, click the image below)