Update: Added a colorize function.
Here’s a quick Python script to visualize binary data. In the grayscale example, each pixel's brightness is the value of the corresponding byte (0x00 – 0xFF). The same method is used for colorization, except that the byte value supplies both the hue and the value components in the HSV colorspace (saturation is fixed at 0.99).
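To make the mapping concrete, here is a minimal sketch of the two conversions using only the standard-library colorsys module. It assumes Python 3, and the function names gray_pixel and color_pixel are made up for illustration; the actual script may scale the byte differently.

```python
import colorsys


def gray_pixel(byte):
    """Map one byte (0-255) to an RGB triple with equal channels."""
    return (byte, byte, byte)


def color_pixel(byte, saturation=0.99):
    """Map one byte to RGB, using it for both hue and value in HSV.

    Assumption: the byte is scaled linearly to the 0.0-1.0 range that
    colorsys expects; saturation stays fixed at 0.99.
    """
    level = byte / 255.0
    r, g, b = colorsys.hsv_to_rgb(level, saturation, level)
    return (round(r * 255), round(g * 255), round(b * 255))


if __name__ == "__main__":
    for value in (0x00, 0x40, 0x80, 0xFF):
        print(hex(value), gray_pixel(value), color_pixel(value))
```

Because the byte drives the value component as well as the hue, low bytes come out dark in both modes, so zero-filled regions still show up as black patches in the colorized image.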
The cols parameter is the width of the image to be generated (in pixels). By default, the script generates a couple of different sizes. The height is calculated based on the width. Patterns tend to be clearer when the column width is a multiple of 8 (16, 32, 64, 128…), though that could depend on the format and type of data in the file.
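The height calculation can be sketched as a ceiling division of the file size by the width. This is an illustration rather than the script's own code: image_size and to_rows are hypothetical names, and padding the last row with zero bytes is an assumption.

```python
def image_size(num_bytes, cols):
    """Return (width, height) for num_bytes laid out cols pixels per row."""
    rows = (num_bytes + cols - 1) // cols  # ceiling division
    return cols, rows


def to_rows(data, cols, pad=0):
    """Split data into rows of cols bytes, padding the final row.

    Assumption: missing pixels in the last row are filled with `pad`.
    """
    _, rows = image_size(len(data), cols)
    padded = bytes(data) + bytes([pad]) * (rows * cols - len(data))
    return [padded[r * cols:(r + 1) * cols] for r in range(rows)]


if __name__ == "__main__":
    data = bytes(range(256))
    # Illustrative widths only; multiples of 8 tend to line up with
    # word-aligned structures in the file.
    for cols in (16, 32, 64, 128):
        print(cols, image_size(len(data), cols))
```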
As an example, here are some images from a 256-byte file generated with the following Python program:
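with open('foo.txt', 'wb') as fd:
    # Write every byte value 0x00..0xFF exactly once.
    # bytearray works on both Python 2 and Python 3 (chr(i) fails on 3).
    fd.write(bytearray(range(256)))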
./process_dir.py <dirname> <cols>
The program will generate images for each of the binaries in the specified directory, create an “index.html” file and attempt to launch it in the browser.
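The index step might look roughly like the following. This is a sketch under assumptions: build_index and write_index are hypothetical helpers, the page layout is invented, and the image files are assumed to already exist next to the index. Only standard-library calls are used.

```python
import html
import os
import pathlib
import webbrowser


def build_index(image_names, title="Binary visualizations"):
    """Return a small HTML page that shows each image under its name."""
    items = "\n".join(
        '<p>{0}<br><img src="{0}" alt="{0}"></p>'.format(
            html.escape(name, quote=True))
        for name in image_names
    )
    return (
        "<!DOCTYPE html>\n<html><head><title>{}</title></head>\n"
        "<body>\n{}\n</body></html>\n"
    ).format(html.escape(title), items)


def write_index(directory, image_names, launch=False):
    """Write index.html into directory and optionally open it."""
    path = os.path.join(directory, "index.html")
    with open(path, "w", encoding="utf-8") as fd:
        fd.write(build_index(image_names))
    if launch:
        # webbrowser.open may silently do nothing on headless systems.
        webbrowser.open(pathlib.Path(path).resolve().as_uri())
    return path
```

Calling write_index(".", ["foo_16.png", "foo_32.png"], launch=True) would write the page into the current directory and try to open it; the file names here are placeholders.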
The generated image on the left is from a PNG file. A dark patch at the beginning, followed by a mostly uniform distribution, is consistent with a file header followed by image data.
The image to the right is an OpenOffice Writer file. The striped area indicates a repeating pattern of bytes, which often separates the metadata header and content in word processor files. The example screenshot shows an image generated from a compiled binary.
This can also be used to visually approximate the amount of entropy in a file. A high-entropy file has a nearly uniform byte distribution, so its image uses all of the available colorspace. I’ll include a histogram function later, which would also show the frequency distribution of the bytes.
Compare the outputs of the following files:
- An MP3 file
- A TrueCrypt container (AES with RIPEMD-160)
- A plain text file
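To put a number on the visual impression of entropy, one option is to count byte frequencies and compute the Shannon entropy in bits per byte. The sketch below is not the histogram function mentioned above, just one way it could be done; byte_histogram and shannon_entropy are illustrative names.

```python
import math
from collections import Counter


def byte_histogram(data):
    """Return a list of 256 counts, one per possible byte value."""
    counts = Counter(data)
    return [counts.get(b, 0) for b in range(256)]


def shannon_entropy(data):
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    entropy = 0.0
    for count in Counter(data).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    import sys

    # Usage: pass one or more file paths to compare them.
    for path in sys.argv[1:]:
        with open(path, "rb") as fd:
            data = fd.read()
        print(path, round(shannon_entropy(data), 3))
```

Encrypted or compressed data should score close to 8 bits per byte, while plain text typically scores noticeably lower because it uses only a small part of the byte range.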