Default parameter values in bash

Since it’s often easier to understand with an example than with a detailed explanation, here are a couple of examples illustrating how to handle default variable values in Bash. In addition, it’s often useful to be able to use environment variables (e.g., to specify the path to a binary in a build script), so I’ve included that as well. All of the code is available on GitHub Gists.

#1 – Specifying a default value for a Bash variable

Here’s a quick and easy method to provide default values for command-line arguments in Bash. It relies on Bash’s syntax for accepting default variable values, which is ${VARNAME:-"default"}. As far as I can tell, the double quotes allow anything that normal variable expansion allows.
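
A minimal sketch of the pattern (the original gist isn’t embedded here; the variable name and default are illustrative):

#!/bin/bash
# Use the first CLI argument if given, otherwise fall back to "admin"
USERNAME=${1:-"admin"}
echo "Creating user: $USERNAME"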

#2 – Specifying a default value in a Bash function

This is really no different than the above, but illustrates how the same syntax works inside functions. In this example, the interface name ($iface) can be specified as the first parameter. Each of the functions then uses the same method to gather its arguments, falling back to the “global” defaults (the CLI args) if not specified. (Note that in Bash, variables are global in scope by default; to override this behavior, use the local keyword.)
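
A sketch of the pattern described (the function body and names are illustrative, not the original gist):

#!/bin/bash
iface=${1:-"eth0"}  # "global" default, taken from the CLI args

show_ip() {
    # Fall back to the "global" $iface if no argument was passed
    local dev=${1:-$iface}
    ip addr show "$dev"
}

show_ip        # uses the CLI arg or default
show_ip wlan0  # explicit override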

#3 – Command output as default variable values

It’s also simple to use the output of an evaluated expression as the default value. This is great for getting system information (username, current working directory, etc.) or information that is easily generated on the command line — date constructs, random passwords, etc.
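
For instance (a sketch; the exact expressions in the original gist may differ):

#!/bin/bash
# Default to the current user, working directory, and today's date
USERNAME=${1:-$(whoami)}
WORKDIR=${2:-$(pwd)}
DATESTAMP=${3:-$(date +%Y-%m-%d)}
echo "$USERNAME is working in $WORKDIR on $DATESTAMP"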

#4 – Override default values with environment variables

The following script uses the htpasswd and openssl binaries, which are usually specified by their full paths (the output of ‘which htpasswd’). By writing the definition as ${ENV_VAR-$(which htpasswd)} instead of hard-coding the path, you can now override the default value with an export statement.

The script also takes an optional first and second parameter, which default to the current user and a random password, respectively. If a password wasn’t specified, the script shows the generated password to the user (otherwise, it doesn’t display raw password info).
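
A sketch of the script (the original gist isn’t embedded here; the .htpasswd path is illustrative):

#!/bin/bash
# Override with, e.g., "export HTPASSWD=/opt/bin/htpasswd" before running
HTPASSWD=${HTPASSWD-$(which htpasswd)}
OPENSSL=${OPENSSL-$(which openssl)}

USERNAME=${1:-$(whoami)}
PASSWORD=${2:-$($OPENSSL rand -base64 12)}

$HTPASSWD -b /etc/nginx/.htpasswd "$USERNAME" "$PASSWORD"

# Only display the password if we generated it ourselves
if [ -z "$2" ]; then
    echo "Generated password for $USERNAME: $PASSWORD"
fi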

#5 – Just Because

Just a shorter, harder-to-read version.
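
Something along these lines (again, a sketch rather than the original gist):

htpasswd -b /etc/nginx/.htpasswd "${1:-$(whoami)}" "${2:-$(openssl rand -base64 12)}"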

#6 – Exit with an error if parameter is empty

Sometimes the input must come from the user, and the script needs to terminate if the user hasn’t specified the correct arguments. This can be done by using a question mark instead of a default value:
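
For example, a two-line foo.sh (a reconstruction matching the output below):

#!/bin/bash
USERNAME=${1:?"You must specify a username"}
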
This results in output like:

./foo.sh: line 2: 1: You must specify a username

#7 – Exit with an error if binary not found
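
A sketch of the statement in question (the original gist isn’t embedded here; reconstructed from the description below):

ifconfig=${IFCONFIG:-$(which ifconfig)}
# "return 1" works when sourced; when executed, it fails and the final OR exits
[ -n "$ifconfig" ] || { echo "Error: ifconfig not found" >&2; return 1; } || exit 1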

This could probably be made shorter, but it works. The statement tries to fill the value of $ifconfig with either $IFCONFIG or the output of which ifconfig. If both are empty, the boolean OR (||) is triggered, which echoes an error and returns 1. Still unsatisfied, the final OR is triggered, causing the script to exit with status 1. Structuring your exit codes like this allows the script to be used the same way inside other scripts or crontabs.


BlackBag Tool – A Framework for Rapid Information Discovery

Last Update: 14-Nov-2014

I’ve decided to pick the BlackBagTool project back up. BlackBagTool is a program/framework for finding interesting information on a mounted hard drive. The end goal is an application that gives an investigator a two-minute summary of the information on a drive and acts as a springboard for the overall investigation. This post is an attempt at nailing down a spec.

Architecture

The layout consists of a series of Python modules and small scripts (installed to /usr/bin) that can be used in conjunction with each other. I’m debating whether or not to include an optional prefix on the command names for namespacing reasons.

The small, individual scripts can then be piped together or included in shell scripts to automate the discovery process. The Python modules can also be imported into scripts or used in the REPL.

I’m also aiming to build an application around this set of tools that fully automates the following steps:

  1. Take the mount directory as an argument
  2. Determine the operating system (based on files/paths/etc)
  3. Gather relevant OS files (/etc/shadow, ~/.bash_history, recent documents, etc)*
  4. Determine what applications are installed, and possibly which versions
  5. Gather relevant application data (recent files, configuration/settings, history, cookies, etc)
  6. Parse data according to known formats and process fields against known patterns (dates, email addresses, etc)

* e.g., an email address in a page’s <title> tag: interesting email addresses can often be found in the Title fields of browser history entries.

Components:

  • dbxplorer – A module for automatically gathering information about databases on a computer (db files, tables, raw data). Working on support for MySQL and SQLite now.
  • fsxplorer – A module for filesystem scanning.
  • bbtutils – A utility module for gathering information in a consistent way
  • skypedump – A utility for dumping skype information (contacts, chat history, etc)
  • chromedump – A utility for dumping browser information from Google Chrome (history, downloads, favorites, cookies, autofill data, etc)
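
For example, the individual utilities could eventually be piped together like so (a purely hypothetical invocation; these flags don’t exist yet):

# Hypothetical: dump Chrome history from a mounted drive and look for webmail activity
chromedump --history /mnt/evidence | grep -i webmail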

Extract one table from a mysqldump file

I recently had to restore a MySQL table from a nightly database backup. Given the size of the dumpfile and the fact that only one table needed to be restored, I ended up using sed to extract the table:

sed -n '/CREATE TABLE.*table/,/UNLOCK TABLES/p' full_database_backup.sql > table.sql

The -n flag is an alias for --quiet, which suppresses output other than what sed is explicitly told to print. The p at the end of the expression tells sed to print the matching lines to the screen.
I’ve created a bash script to handle this, and placed it in /bin/dbextract. It’s intended to be used the same way as mysqldump itself, in that output is directed to stdout (you’ll want to redirect it with “> outfile”).
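
A minimal version of the script might look like this (a sketch, not the original; note the backticks around table names in mysqldump output):

#!/bin/bash
# Usage: dbextract <dumpfile> <table>   (output goes to stdout)
dumpfile=${1:?"You must specify a dump file"}
table=${2:?"You must specify a table name"}
sed -n "/CREATE TABLE.*\`$table\`/,/UNLOCK TABLES/p" "$dumpfile"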


MySQL datadir on different partition

This writeup will walk you through moving the MySQL data directory to a separate partition. Although a fresh install is pretty straightforward, we ran into some quirks when moving the data directory on an existing installation. For this tutorial, I’ll be using an otherwise-fresh Ubuntu 14.04 install with MySQL already installed.

The default MySQL data directory (where the database files are stored) is in /var/lib/mysql. I’ll be moving this to a disk mounted at /mnt/SAN for the purpose of freeing up disk space on the VM. (I’m not going to discuss the benefits and drawbacks of doing so, as that’s beyond the scope of this article. I assume that if you’re here, you’ve already determined a need to mount the data directory on another filesystem.)

There are a couple of steps involved in this:

  1. Create the new directory
  2. Stop the MySQL service
  3. Copy the files to the new location
  4. Edit /etc/mysql/my.cnf
  5. Edit the AppArmor profile
  6. Reload the AppArmor profile and restart MySQL

The new data directory will be located at /mnt/SAN/mysql, which will have to be created. When creating this directory, ensure it’s owned by the mysql group and user, and set permissions to 700.

sudo mkdir -p /mnt/SAN/mysql
sudo chown mysql:mysql /mnt/SAN/mysql
sudo chmod 700 /mnt/SAN/mysql

Next, stop the MySQL service:

sudo service mysql stop

or

sudo /etc/init.d/mysql stop

Once you’ve set up the new data directory on your mounted partition, copy the files over:

cp -dpR /var/lib/mysql/* /mnt/SAN/mysql/

The -dpR flags do the following:

-d prevents symlinks from being followed
-p preserves ownership, timestamps and permissions
-R copies recursively

Once the files have been copied, ensure the permissions match those of the original data directory (/var/lib/mysql/), and double-check the ownership and permissions of the new mysql directory itself.

At this point, a directory listing of /mnt/SAN/mysql should match /var/lib/mysql exactly.
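
One quick way to check (any differences are worth investigating):

diff <(sudo ls -lA /var/lib/mysql) <(sudo ls -lA /mnt/SAN/mysql)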

Now, we’ll edit the MySQL config file, located at /etc/mysql/my.cnf. I recommend backing this file up first!

sudo cp /etc/mysql/my.cnf /etc/mysql/my.cnf.bak
sudo emacs /etc/mysql/my.cnf

Look for the datadir parameter, which should be set to the default value of /var/lib/mysql:

[mysqld]
#
# * Basic Settings
#
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking

Change this value to your new mysql data directory (/mnt/SAN/mysql) and save the file.
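
The line should now read:

datadir = /mnt/SAN/mysql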

If you try to start the MySQL service now, it’ll likely fail because AppArmor sees mysqld accessing a directory it isn’t supposed to touch. dmesg will show errors like this:

init: mysql main process ended, respawning
init: mysql post-start process (14005) terminated with status 1
apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/mysqld" pid=14020 comm="apparmor_parser"
init: mysql main process (14032) terminated with status 1
init: mysql respawning too fast, stopped

In order to correct this, we’ll have to tell AppArmor to allow mysql to read/write to the new data directory. Open up the MySQL AppArmor profile:

sudo emacs /etc/apparmor.d/usr.sbin.mysqld

Comment out the lines pertaining to the old data directory, and add the new data directory to the AppArmor profile:

...
#/var/lib/mysql/ r,
#/var/lib/mysql/** rwk,
/mnt/SAN/mysql/ r,
/mnt/SAN/mysql/** rwk,
...

Once this is done, reload the AppArmor profile:

sudo apparmor_parser -r /etc/apparmor.d/usr.sbin.mysqld

If all the permissions are correct, the mysql service should now start:

sudo service mysql start

or

sudo /etc/init.d/mysql start

If you’re still running into issues, make sure that:

  • The new data directory has the correct permissions
  • The AppArmor profile is correct
  • You’ve started the mysql service (mysqld)

IP Address Validation Without Regular Expressions

Validating an IP address is pretty simple, but requires an obnoxious regular expression in order to account for the possible values. Most examples I’ve seen resort to a regular expression to solve the task, but using a simple [0-9]{1,3} pattern isn’t enough. For example, it won’t prevent an IP like 444.555.666.777 from getting past the filter, so it has to be a little more complex:

((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?![\\d])

Expressions like this suck, so I decided to go about writing my own function to validate an IPv4 address. In Python, it can be done in one line:

is_valid = ip.count('.') == 3 and (False if False in [int(i) in range(0,256) for i in ip.split('.')] else True)

This statement starts by counting the occurrences of the ‘.’ character in the `ip` variable (a string). The `and` short-circuits here if the number of octets is incorrect, preventing iteration over clearly-invalid IPv4 addresses like “192.168.1” or “192.168..1.1”. The second half of the statement uses a list comprehension to determine whether each octet is a number between 0 and 255 (inclusive), resulting in a list of boolean values. If False is in this list, the statement evaluates to False.

timeit shows this statement is ever-so-slightly slower than compiling the regular expression above: the regex takes between 0.000015s and 0.000025s, while the statement has been consistently around 0.000025s.
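
For reference, the comparison can be reproduced with something like:

python -m timeit -s "ip = '192.168.1.1'" "ip.count('.') == 3 and (False if False in [int(i) in range(0,256) for i in ip.split('.')] else True)"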

Another variant of the function (using a similar methodology) in PHP looks like this:

function validate_ip($ip) {
    $i = 0;
    foreach (explode('.', $ip) as $part) {
        // ctype_digit() rejects empty or non-numeric octets, which (int)
        // would otherwise silently cast to 0 (e.g., "a.b.c.d" or "1.2.3.")
        if (ctype_digit($part) && (int)$part < 256) {
            $i++;
        }
    }
    return ($i === 4);
}

While newer versions of PHP support generators, this function was written for an older version, hence the difference in formatting. Unfortunately, I don’t have any benchmarks for this function.


MySQL Database Backup With mysqldump + netcat

I ran into a situation recently where I had to copy a database, but didn’t have the disk space for a full dump. Instead of rsync or scp (which can be done over netcat as well), I opted to pipe the output of mysqldump to netcat and transfer the data directly to the other server.

My setup was Ubuntu Server 12.04 (the server) and Linux Mint 16 (the client). First, start netcat on the client machine on an available port (e.g., 1234) and redirect the output to the destination file:

nc -l 1234 > backup.sql.gz

On the server, we’ll route the mysqldump output through a gzip wrapper and into netcat. In this example, the destination machine (above) is 172.21.1.2, and should already be listening on port 1234. (It’s worth noting that you should supply the MySQL password in the command itself, rather than just using the bare “-p” option: if mysqldump stops to prompt for a password, the netcat session will end before any data is sent. Security-conscious users can enter a space before the command to keep it out of bash history.)

mysqldump -u root -pP@$$w0rd db_name | gzip | nc -w1 172.21.1.2 1234
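
Once the transfer completes, decompress and restore the dump on the client:

# the db_name database must already exist on the destination server
gunzip backup.sql.gz
mysql -u root -p db_name < backup.sql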


Free Windows Test Virtual Machines

If you’re a Linux user looking to run Windows in a virtual machine, but not looking to pay for a copy of Windows, you can download free VM images directly from Microsoft. (Also featured on that page is a link to a game by Microsoft titled “Escape from XP”.)

Windows 8.1 (IE 11) Batch Download
wget -i https://az412801.vo.msecnd.net/vhd/VMBuild_20140402/VirtualBox/IE11_Win8.1/Linux/IE11.Win8.1.For.LinuxVirtualBox.txt

Windows 8 (IE 10) Batch Download
wget -i https://az412801.vo.msecnd.net/vhd/VMBuild_20131127/VirtualBox/IE10_Win8/Linux/IE10.Win8.For.LinuxVirtualBox.txt

Windows 7 (IE 11)
wget -i https://az412801.vo.msecnd.net/vhd/VMBuild_20131127/VirtualBox/IE11_Win7/Linux/IE11.Win7.ForLinuxVirtualBox.txt

Windows 7 (IE 10)
wget -i https://az412801.vo.msecnd.net/vhd/VMBuild_20131127/VirtualBox/IE10_Win7/Linux/IE10.Win7.For.LinuxVirtualBox.txt

Windows 7 (IE 9)
wget -i https://az412801.vo.msecnd.net/vhd/VMBuild_20131127/VirtualBox/IE9_Win7/Linux/IE9.Win7.For.LinuxVirtualBox.txt

Windows 7 (IE 8)
wget -i https://az412801.vo.msecnd.net/vhd/VMBuild_20131127/VirtualBox/IE8_Win7/Linux/IE8.Win7.For.LinuxVirtualBox.txt

Windows Vista (IE 7)
wget -i https://az412801.vo.msecnd.net/vhd/VMBuild_20131127/VirtualBox/IE7_Vista/Linux/IE7.Vista.For.LinuxVirtualBox.txt

Windows XP (IE 8)
wget -i https://az412801.vo.msecnd.net/vhd/VMBuild_20131127/VirtualBox/IE8_WinXP/Linux/IE8.WinXP.For.LinuxVirtualBox.txt

Windows XP (IE 6)
wget -i https://az412801.vo.msecnd.net/vhd/VMBuild_20131127/VirtualBox/IE6_WinXP/Linux/IE6.WinXP.For.LinuxVirtualBox.txt


Email Address Validation in PHP

Of all the input to validate, e-mail addresses seem to be one of the trickiest. At first glance, you might try validating the address with a simple regular expression, based on the usual requirements of an email provider. Let’s say two or more characters followed by an ‘@’ sign, followed by two or more characters, then a period and two or more characters. Two characters seems to be a good lower limit, because of addresses like ‘me@domain.com’ or ‘admin@site.co.uk’. But here’s where problems start to crop up.

[a-z0-9_]{2,}\@[a-z0-9]{2,}\.[a-z0-9]{2,}

If we specified only alphanumeric characters, plus maybe an underscore, a domain like “.co.uk” would fail, or only return “user@site.co”. We could add an optional part to the TLD regex to allow domains like that, but then it looks like we’ve forgotten about users like “user.name@mail.com”. So maybe we should go back and expand the username portion as well. While we’re at it, we might as well incorporate all of the RFC spec, which results in something like this. While most mail providers (Gmail, Hotmail, etc.) won’t let you register an address containing a plus sign, plenty of users rely on plus-addressing (firstname+lastname@mail.com) for various reasons. When your validation is too strict, chances are you’ve overlooked something.

I won’t continue to drag on about the inadequacy of regular expressions in validating email addresses. If you’ve visited the RFC-compliant regular expression (which is a bit overkill, but illustrates the point nicely), you get the message. Time to move on.

So how can we make sure the user is entering a valid email address? Well, the most practical way would be to simply send them an email. Sanitize the input, and fire off a validation email with a “confirm account” link. No worries about regular expressions, no frustrated users with odd email addresses, and no fake emails. If you’re on a shared host that limits the amount of emails you can send, you could try stripping the domain off of the email and validating the domain before sending. This should stop emails like “asdf@asdf.com” from getting through, but will pass along “aksjdflljklasdflj@gmail.com”. You can do this with the following bit of code:

$is_valid = (filter_var($email, FILTER_VALIDATE_EMAIL)) ? checkdnsrr(substr(strrchr($email, "@"), 1),"MX") : false;


A Backwards Robots.txt File

When a web crawler such as GoogleBot creeps around the web, it starts sucking up information and reporting it back to the search engine. In an effort to keep bots out of certain parts of a website (for whatever reason), a guy by the name of Martijn Koster came up with an idea:

Put a file in the root directory of the site that tells robots what not to look at!

From there, the Robots Exclusion Standard was born. Basically, you create a text file named robots.txt in your root directory (example.com/robots.txt), and it tells crawlers which parts of your website to stay away from. You can read about it in more detail here or with a quick Google search.

What’s the problem?

A sample robots.txt file might look something like this:

User-Agent: *
Disallow: /images/
Disallow: /cgi-bin/

In this instance, the file is telling all bots (via the * wildcard character) that they’re not allowed to look in the /images/ or /cgi-bin/ folders. This is reasonable enough, and most legitimate web crawlers follow the robots.txt file. However, you can plainly view the file in your browser (see: http://www.facebook.com/robots.txt), and this does nothing to prevent malicious or poorly-coded bots from ignoring your wishes. The robots.txt file is essentially a sign that reads “I have data in these folders that I don’t want anyone to know about. Please don’t look there and please don’t tell anyone.”

[Do not throw stones at this sign.]

If I’m snooping around a website, one of the first things I look at is the robots.txt file. It’s usually a huge list of things that people don’t want you to look at – which, of course, makes me all the more interested in looking for them. Here’s an example:

User-agent: *
Disallow: /admin/
Disallow: /members/
Disallow: /webmail/
Disallow: /personaldata/

I hope you see the problem.

Originally, the robots.txt standard only allowed a Disallow directive, but lots of search engines are now incorporating an Allow directive, as well as some basic pattern matching.

I leveraged the Allow directive to write a “backwards” robots.txt:

User-agent: *
Disallow: /*
Allow: /$
Allow: /articles/
Allow: /files/
Allow: /txt/
Allow: /tor/
Allow: /tools/

Allow: /about
Allow: /anon-sopa
Allow: /cards
Allow: /computers
Allow: /crypto
Allow: /cryptographic-hashes
Allow: /documents
Allow: /ems-home
Allow: /ems-videos
Allow: /index
Allow: /links
Allow: /medicine
Allow: /misc
Allow: /software
Allow: /voynich
Allow: /zombies

To break this down line-by-line:

  • User-agent: * tells all bots that they should follow these rules
  • Disallow: /* tells the bot not to crawl the entire site
  • Allow: /$ makes use of Googlebot’s pattern matching, and allows http://cmattoon.com/ to be crawled, as the URI ends in a slash. (The $ marks the end of the URI.) This overrides the Disallow: /* directive on the line before it.
  • As you can see, the file goes on to grant permission for the public parts of the site, rather than announcing the parts I want to remain hidden.

The big question becomes whether to Disallow a directory (in my case, the entire site), then grant explicit permission (General => Specific), or whether to Allow files before issuing a Disallow for the directory. I can’t find a solid answer on this, so I’m modeling mine based on Google’s robots.txt (I’ve heard they know a thing or two about search engines). Google follows the (logical) General => Specific pattern, which was my first intuition. Mark the calendar: I did something right on the first try!

As a warning, this could easily cause a conflict with any of the myriad crawlers out there. There is no uniform standard, and nobody (including you!) is required to adhere to the recommendations that do exist.

That being said, a quick test of my site with the new backwards robots.txt (conducted using this tool) showed that it works for the major search engines. I’m not very concerned about my search engine ranking, so I’d rather be a geek and play with the file than fret over my page rank. If page rank and SEO are important to you, this may not be the best way to go.

Finally, for the people that are really worried about this, I recommend looking into using metadata, or playing with things like the x-robots-tag. There’s also an article on .htaccess and SEO that discusses the canonicalization of HTTPS vs HTTP versions of your site.
