Install R Packages on Shiny Server Pro

If you’ve installed Shiny Server Pro as a user other than shiny, you might have experienced difficulty adding R packages. This is because Shiny Server Pro runs R as the shiny user, while running R -e "install.packages('foo')" as yourself typically installs packages into your own user library, where the shiny user can’t see them.

The solution to this is to su to the shiny user:

su - shiny

And run

R -e "install.packages('foo', repos='http://cran.rstudio.com')"

Alternatively, this script will parse an R file looking for require statements and install the necessary packages. It isn’t very smart, so be careful.
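
The script itself isn’t reproduced here. As a rough sketch of the same idea, the shell function below lists the package names found in library() and require() calls. The name r_packages and the file app.R are my own placeholders, not part of the original script, and it assumes one plain package name per call, so it will miss anything fancier.

```shell
# Hypothetical helper: print the unique package names that appear in
# library(...) or require(...) calls in the R file given as $1.
r_packages() {
    grep -oE "(library|require)\([\"']?[A-Za-z][A-Za-z0-9.]*" "$1" \
        | sed -E "s/^(library|require)\([\"']?//" \
        | sort -u
}

# Usage (run as the shiny user; app.R is a placeholder file name):
#   for pkg in $(r_packages app.R); do
#       R -e "install.packages('$pkg', repos='http://cran.rstudio.com')"
#   done
```

Like the original, this isn’t very smart: it will happily pick up calls that sit inside comments or strings, so check the list before installing anything.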

20 Essential Linux Commands

This post provides examples of everyday Linux commands. I wrote this as a quick orientation to the CLI for newcomers. Keep in mind that there are usually several ways to accomplish a task, and other command combinations or programs might be better suited to your needs. Don’t be afraid to google it.

Overview

  • apropos, man
  • ls, ll
  • pwd, cd
  • mv, cp
  • rm, shred
  • head, tail
  • more, less
  • wget, curl
  • cat, strings
  • grep, find

Get Help With Commands By Using MAN

The examples on this page represent a small fraction of the possible uses for each command. Many commands have tons of flags and arguments that allow them to adapt to many scenarios. Since it’s impossible to remember how each program works, its arguments, etc., most come with a manual page, or man page. The man command retrieves the manual for the program and displays it on the screen. While sometimes verbose, man pages are typically your best source of initial information. This is the “Manual” in “RTFM”.

The man system has several different sections, each providing documentation on a specific aspect of the program:

  1. General Commands
  2. System Calls
  3. Library functions (particularly the C standard library)
  4. Special files and drivers (Typically in /dev)
  5. File Formats & Conventions
  6. Games and Screensavers
  7. Miscellaneous
  8. System Administration Commands and Daemons

Typing man <command> will generally display section 1, if it exists. Calling man on something that is only documented elsewhere (for example, the C function pthread_join) displays the first section that has an entry, in this case section 3. To view a specific section, type man <section> <command>, e.g. man 3 printf. Note that not all programs have manual pages, and of those that do, most have an entry in only one section. On some systems, you can type “man <section> ” and press TAB to view a list of pages available for that section.

To view the manual page for man itself, type man man.

Finding the right Linux command with apropos

The Unix tools philosophy favors small tools that each serve a specific purpose and can be chained together, which results in a seemingly endless variety of tools that can accomplish the same job. The apropos command searches the manual page names and descriptions for a term and returns a list of possible commands. This “search” isn’t full-featured and works mostly on keyword matching.

    apropos ftp
apt-ftparchive (1) - Utility to generate index files
ftp (1) - Internet file transfer program
netkit-ftp (1) - Internet file transfer program
netrc (5) - user configuration for ftp
pam_ftp (8) - PAM module for anonymous access module
pftp (1) - Internet file transfer program
sftp (1) - secure file transfer program
smbclient (1) - ftp-like client to access SMB/CIFS resources on servers

Sometimes the number of results can be unwieldy (try “apropos user” or “apropos ip”). Since the apropos command doesn’t do well with terms like “add user” or “ftp upload”, it’s sometimes useful to filter the output with grep. You can also pipe the output to more or less.

    apropos user | grep add
adduser.conf (5) - configuration file for adduser(8) and addgroup(8) .
addgroup (8) - add a user or group to the system
adduser (8) - add a user or group to the system
pam_issue (8) - PAM module to add issue file to user prompt
useradd (8) - create a new user or update default new user information

List directories with ls

ls is the command for listing a directory. Useful flags include:

  • -a (--all), which shows dotfiles
  • -l, which provides a long listing that includes file size and type
  • -h (--human-readable), which shows file sizes in readable units instead of bytes (1.2G instead of 1234921293)
  • -S, which sorts by file size

Useful combinations:

Default ll command, plus human-readable sizes

    alias ll="ls -alh"

List all tar files in current directory:

    ls *.tar

List tar files in long format:

    ls -l *.tar

List the 10 largest files in a directory with human-readable sizes (the first line of a long listing is a “total” line, so ask head for 11 lines):

    ls -lhS | head -n 11

 

Navigate the filesystem with CD and MC

Navigating around the filesystem is done with the change directory (cd) command. Instead of merely listing the contents of the /tmp directory (ls -l /tmp), you can move into the /tmp directory and list the contents of the current directory:

cd /tmp && ls -l

To return to your home directory, simply type cd with no arguments. You can also use the shortcut ~ to refer to files from your home directory. The two paths below describe the same location on the filesystem, assuming that the second command is run by the cmattoon user.

/home/cmattoon/Desktop/foo.py

~/Desktop/foo.py

If you aren’t sure which directory you’re in, you can use the pwd (print working directory) command to tell you.

Midnight Commander (mc) is a third-party application that some people find useful for navigating the filesystem, copying and moving files, etc. You can find more information on their site.

Move, Rename and Copy files with mv and cp

Copy a config file to a backup:

    cp config.ini config.ini.bak

Or:

cp config.ini{,.bak}
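
The second form relies on bash brace expansion: config.ini{,.bak} expands to two words, config.ini and config.ini.bak, before cp ever runs. A quick way to preview an expansion is to put echo in front of the command. The sketch below goes through bash -c only to make it explicit that this is a bash feature and not something every POSIX shell supports.

```shell
# Preview what bash would actually run; nothing is copied here.
expanded=$(bash -c 'echo cp config.ini{,.bak}')
echo "$expanded"
```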

Copy the entire config directory and its contents:

    cp -r config/ config-backup

Remove files with rm and shred

The rm command removes files basically forever, so be careful.

There is no “undo” command for rm. Some people choose to edit their ~/.bashrc or ~/.bash_aliases and add the following:

alias rm="rm -i"  # Ask for confirmation before deleting files.

Linux makes the process of deleting a file forever deceptively simple:

    rm DELETEME.txt

To indiscriminately remove everything in the “/tmp” directory:

    rm -rf /tmp/*

Note: Either “rm -rf /tmp/” or “rm -rf /tmp” would delete the “/tmp” directory itself.
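
If you want to see the difference between the two forms without risking the real /tmp, here is a minimal sketch that plays it out in a throwaway directory. The file names are made up for the demo, and the ${sandbox:?} form is just a guard I added so the glob can never expand against an empty variable.

```shell
# Work in a throwaway sandbox rather than the real /tmp.
sandbox=$(mktemp -d)
touch "$sandbox/a.txt" "$sandbox/.hidden"
mkdir "$sandbox/sub"

# The glob removes the visible contents but keeps the directory itself.
# Note that * does not match dotfiles, so .hidden survives.
rm -rf "${sandbox:?}"/*
remaining=$(ls -A "$sandbox")
echo "left behind: $remaining"

# Without the glob, the directory itself is removed.
rm -rf "$sandbox"
```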

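To remove a file whose name contains a space, escape the space with a backslash (or quote the whole name):

rm My\ Document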

For private information, you might consider using shred.

cp ~/Downloads/ImportantDocument.pdf /mnt/backup/ && shred ~/Downloads/ImportantDocument.pdf

The -s (--size) flag takes a size (e.g., “100K”, “1M”, “1G”) and shreds only that many bytes, and -u (--remove) deletes the file after it’s done shredding.

Get first and last n lines from file with head and tail

The head and tail commands retrieve the first and last n lines from a file or stdin. Pipe to either of these to pass output to other commands:

Shows first 10 lines of README.md

    head README.md

Print the last 10 lines of /var/log/apache/error.log, then keep printing new lines as they are appended (-f, --follow):

    tail -f /var/log/apache/error.log

Stores the last line of the log to $LAST_ENTRY

    LAST_ENTRY=$(tail -n 1 /var/log/app/actions.log)
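
Since both commands read stdin, they also combine nicely. Here is a small sketch on a generated file (the “entry N” lines are invented for the demo) that grabs the first lines, the last line, and a slice from the middle.

```shell
# Build a 30-line sample log in a temp file.
log=$(mktemp)
seq 1 30 | sed 's/^/entry /' > "$log"

first_three=$(head -n 3 "$log")      # entry 1 .. entry 3
LAST_ENTRY=$(tail -n 1 "$log")       # entry 30
# Lines 11-15: take the first 15, then the last 5 of those.
middle=$(head -n 15 "$log" | tail -n 5)

echo "$LAST_ENTRY"
rm -f "$log"
```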

Read large files one screen at a time with more and less

More and less are two programs that filter text output. They’re commonly used to page through large files, but can also be used to buffer output from programs. It’s a helpful habit to read config files with more or less, rather than open them with a text editor. In both programs, you can find help by pressing h and exit by pressing the q key.

    cat README.md | less

Page through the output of an install script, including error messages (2>&1 sends stderr to the same place as stdout):

    ./install.sh 2>&1 | less
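
The 2>&1 part matters because a pipe only carries stdout; without it, error messages skip the pager and land directly on the terminal. A minimal sketch of the difference, using a made-up fake_install function as a stand-in for ./install.sh:

```shell
# Hypothetical stand-in for ./install.sh: one line on stdout, one on stderr.
fake_install() {
    echo "unpacking files"
    echo "warning: config missing" >&2
}

# Capturing (or piping) sees stdout only; stderr is discarded here so it
# does not clutter the terminal during the demo.
stdout_only=$(fake_install 2>/dev/null)

# With 2>&1, stderr is merged into stdout and both lines are captured.
both=$(fake_install 2>&1)

echo "$both" | wc -l
```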

Although both programs are very similar, less is the newer one with more features. Specifically, less allows forward and backward navigation (via arrow keys and PgUp/PgDn) and doesn’t have to read the entire file before it starts displaying it. This makes it more efficient on large files than its predecessor, more.

Use the slash key (/) to begin a search. While a search is in progress, the “n” key will move to the next result.

Download files with curl and wget

Since both curl and wget support HTTP/HTTPS and FTP, they are especially useful for interacting with web-based services like APIs and HTML forms. Both programs use the HTTP GET method by default, but are capable of others as well (POST, HEAD, PUT, etc.), and both support SSL. cURL supports even more protocols, including Telnet, SCP, SFTP, POP3, IMAP, SMTP and LDAP, and has a number of other features.

Generally speaking, I prefer wget for downloading files and cURL for

Note: Ubuntu comes with wget, but you’ll need to install curl. CentOS and OS X are the opposite. You’ll probably need to download one or the other.

To download a file with wget:

    wget <URL>

If the URL contains special characters, or is pointing to a script, it’s sometimes better to wrap the URL in quotes and use the -O flag to specify an output file.

    wget "http://example.com/get_image.php?id=1234&size=130" -O image.png

Without the -O flag, wget would save the file as “get_image.php?id=1234&size=130” – which is unlikely to work as an image in any capacity.

While wget saves the file to the current directory (or the path specified by -O), curl’s default action is to write the output to stdout. To echo your current public IP address, you can run:

curl icanhazip.com

To download a file with curl, redirect stdout to a file (or use curl’s -o flag to name the output file):

    curl "http://example.com/get_image.php?id=1234&size=130" > image.png

Curl and wget both have “quiet” or “silent” modes that suppress output. This mode is particularly useful for scripts and cron jobs where you don’t want extra output cluttering the screen.

curl -s "http://example.com/installer-x86_64-0.1.0a-rc1.tgz" > installer.tgz
wget -q "http://example.com/installer-x86_64-0.1.0a-rc2.tgz" -O installer.tgz

If you still want to see error output, but no progress bar, you can use -sS in curl. The lowercase -s is for silent mode, the uppercase -S for “show errors”.

For more details, type “man wget” or “man curl”.

Check disk usage with du and df

To check the amount of disk space available, use the df command (think “disk free space”). The du command will show you the amount of disk space used in the specified directory. Like the ls command, both df and du can output the human-readable filesize by using the -h flag.

Running df -h shows the disk space for all mounted filesystems:

    df -h

To see how much space the current directory is taking up, use du -sh. The -s (--summarize) flag prints a single total for the directory, subdirectories included. Without the -s flag, du reports on each subdirectory separately, which is useful for finding the largest directories. The following command lists the 10 largest entries under the current directory (the first is the current directory itself, since its total includes everything beneath it). Piping the output of du into sort (-h sorts by human-readable size, -r reverses the order) orders the directories from largest to smallest, and head keeps only the first 10 lines.

    du -h . | sort -rh | head

Of course, you could pipe this output to more or less and peruse the entire list of directories, but there’s already a better tool for this: ncdu. (The “nc” alludes to the ncurses library used to render the user interface.) As you can see in the screenshot below, ncdu provides an easy way of tracking down large files.

[Screenshot: ncdu]

 

Get file contents with cat and strings

The cat and strings commands are used to write file contents to stdout. The cat command will dump the raw file contents (in whatever form), while strings will print only printable characters. This feature makes the strings command a useful choice in identifying a file format or other initial discovery tasks.

strings vs. cat – Output of “strings /bin/true” is on the left; output of “cat /bin/true” is on the right.

Print raw binary data of /bin/true to stdout:

    cat /bin/true

See all human-readable strings in the “true” binary:

    strings /bin/true

Zero a file with cat:

    cat /dev/null > README.md

Find what you’re looking for with grep and find

Grep (Globally search a Regular Expression and Print) is useful for finding strings in files (or stdout). The find utility is used for searching by file name, size, etc.

To find all PHP files with the string “@todo” (case insensitive) in the src/ directory:

    grep -i "@todo" src/*.php

Recursively search the src/ directory for files containing the string “@todo” (case-insensitive):

    grep -ri "@todo" src/

This uses the -r (--recursive) and -i (--ignore-case) flags. As you may suspect, the --recursive flag searches the directory recursively, while the --ignore-case flag ignores the difference between uppercase and lowercase characters.

Grep is also useful to filter output from commands or stdout:

cat /var/log/apache2/error.log | grep -i "fatal error"

Watch the error log for lines containing the IP address “132.45.67.89” (-F matches the text literally, so the dots aren’t treated as regex wildcards):

tail -f /var/log/apache2/error.log | grep -F "132.45.67.89"

grep prints only the lines that match, so if a log entry spans several lines you’ll see just the matching one. To see the lines on either side of a match, use -A (--after-context) or -B (--before-context). For example, consider grepme.txt, a file containing “This is Line #n” for n from 0 to 30. Both commands produce the same output:

    grep 20 grepme.txt -A 5 -B 3
    cat grepme.txt | grep "20" -A 5 -B 3
Output:
    This is Line #17
    This is Line #18
    This is Line #19
    This is Line #20
    This is Line #21
    This is Line #22
    This is Line #23
    This is Line #24
    This is Line #25
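
If you’d like to try this yourself, the sketch below recreates grepme.txt in a temp directory and runs the same kind of search. I anchored the pattern as '#20$' to be explicit about which line is the target, and added -C, which is shorthand for the same amount of context on both sides.

```shell
# Recreate grepme.txt as described: "This is Line #n" for n = 0..30.
workdir=$(mktemp -d)
for n in $(seq 0 30); do
    echo "This is Line #$n"
done > "$workdir/grepme.txt"

# 3 lines before the match, the match itself, and 5 lines after it.
context=$(grep -B 3 -A 5 '#20$' "$workdir/grepme.txt")
echo "$context"

# -C N asks for N lines of context on both sides of the match.
around=$(grep -C 1 '#20$' "$workdir/grepme.txt")
rm -rf "$workdir"
```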

Other useful flags include -v, which inverts the match, -l, which prints only the names of files that contain a match, and -L, which prints only the names of files that don’t.

Follow the access log, hiding lines that include “Googlebot” (-i ignores case):

tail -f /var/log/apache2/access.log | grep -vi googlebot

Show the names of files in the current directory (and subdirectories) that don’t have “@license” in them:

    grep -riL "@license" .

Show the names of files that have “@todo” in them:

    grep -ril "@todo" .

Show all lines with “@todo” in the current directory (recursive). Exclude the “img” and “templates” directories from the search.

grep -ri "@todo" . --exclude-dir="templates" --exclude-dir="img"

The find command is useful for finding files based on filename, size, type, or other attributes. In its simplest form, the find command searches for a filename:

find ~/Downloads -name '*.tgz'

The above command searches the ~/Downloads directory for files matching the pattern ‘*.tgz’. Since no -type is specified, it’ll search for files or directories.

Let’s look for files (only) in ~/Downloads that are over 100 MB in size:

find ~/Downloads/ -type f -size +100M

To find files smaller than 100 MB:

find ~/Downloads/ -type f -size -100M

To search /var/log for files older than 30 days and delete them:

find /var/log -type f -mtime +30 -exec rm -f {} \;

You can also use the built-in -delete flag:

find /var/log -type f -mtime +30 -delete
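
Since deleting is permanent, I’d suggest a dry run first: run the same expression with -print, check the list, and only then switch to -delete. A minimal sketch in a sandbox directory (old.log and new.log are made-up names):

```shell
# Build a sandbox with one old and one fresh file.
logs=$(mktemp -d)
touch -t 200001010000 "$logs/old.log"   # modification time: 1 Jan 2000
touch "$logs/new.log"

# Dry run: -print shows what the age test matches without deleting anything.
matches=$(find "$logs" -type f -mtime +30 -print)
echo "$matches"

# Once the list looks right, swap -print for -delete.
find "$logs" -type f -mtime +30 -delete
survivors=$(ls "$logs")
rm -rf "$logs"
```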

 

Google SWE Interview Preparation

I interviewed at Google Pittsburgh a while back (as a result of Google FooBar), and while I signed an NDA regarding the interview questions, I can provide a brief overview of the process. Ultimately, I did not receive an offer, so take this for what it’s worth.

Preparation

Google will email you some official interview preparation materials, which you should obviously review. They outline the process very thoroughly, as well as provide an outline of possible material. If you’ve prepared for technical interviews before, much of this content is not a surprise, but it would be foolish not to review everything they’ve sent.

How does their interview process work?

Typically, there are phone interviews, then an on-site interview. I skipped the phone interview stage because of FooBar, and went directly to the on-site interviews.
If you are selected after submitting an application, or re-apply, you’ll be asked to do a phone interview first.
Since I can’t offer guidance here, I’ll refer you to Google’s Interviews page for specific details.

How much time should I allot to studying?

This answer depends on how comfortable you are with your CS fundamentals. Most people dedicate at least a month, possibly more. A recruiter told me they’re not able to schedule interviews more than 30 days ahead, but you have the option of contacting them later to schedule. From every interaction I’ve had (a couple of recruiters, and the engineers on-site), they genuinely want you to be at the top of your game when you come in. Take your time. They’re almost too cool about making sure you’re prepared for the interview process.

On-Site Interview

The on-site interview can be done over one day or two. I’m not sure what game theory says here, but I went for the one-day interview. This consisted of five interviews, about 45 minutes each. (You’ll also meet up with another engineer for lunch, which isn’t really part of the interview process.) They’ve even put up an example interview on YouTube.

Do not expect them to ask about your past projects, resume, etc. I saw a lot of complaining on Glassdoor about this (mostly from people who didn’t get an offer).

They’re less interested in your specific background and accomplishments than your ability to solve the problems presented, which seems to offend a lot of people. Furthermore, everyone I met was super friendly, except for one interviewer who really didn’t seem interested in stepping away from work to interview someone. I’m told this happens most frequently in phone interviews, though.

Generally speaking, the problems I was presented had a brute-force solution and an elegant solution or two. If you reach a working solution, they’ll likely ask a few cursory questions about Big-O notation or what data structure you’re using, then ask you to iterate on your code to meet additional requirements, consume fewer resources, or otherwise refine your solution. While they might appear to be tricky questions, they’re really not out to get you. The problems are very much in line with the TopCoder Division I problems, and I’m told that being comfortable with solving those types of problems correlates with success at Google.

I was able to solve two of the problems relatively easily, had difficulty with the third, and did not reach a working solution for two other problems. You are not necessarily penalized for not reaching a solution, but it obviously helps. I’m told they’re more interested in your thought process and approach than getting a working solution.

Review Comp-sci fundamentals

You should be comfortable discussing the various types of sorting algorithms, BFS/DFS, tree and graph manipulation, etc. You will be expected to talk intelligently about Big-O notation and discuss the running time and space constraints of the algorithms you design. You should be able to digest the problem and find the most appropriate data structure (array, stack, linked list, graph, etc). I did not have any problems that involved crazy complex algorithms or cutting-edge research.

Data Structures

  • Stacks & Queues
  • Binary Trees
  • Tries
  • Graphs

Algorithms

  • Sorting
  • Tree insertion, manipulation, and search
  • Stack/Queue problems

Practice, practice, practice

Commit to doing at least one practice problem each day. You will be expected to do one interview in a compiled language (C++, Java, or Go), but are permitted to do the rest in a common language of your choosing (e.g., Python). I’d venture to guess that nearly all Google engineers are polyglots, and as long as you’re not using Lisp or Prolog or something, you should be fine. Talk with your recruiter, or attend the prep session for answers to specific questions like these.

What Libraries Are Permitted?

Neither I nor my interviewers were aware of a specific list of libraries that are allowed, but I was permitted to use common Python and C++ libraries (bisect, std::vector, etc.), as long as they didn’t solve the problem outright (e.g., Python’s sorted() function). You are not expected to implement everything from scratch either – they want to see modern, idiomatic programming.

Example Google Interview Questions

The internet has some specific interview questions that others have asked, but obviously Google’s engineers aren’t dumb, and Google itself is uniquely aware of what content people are searching for. They routinely change up the questions, and I’m told their validated question pool is sufficiently large that you can’t study the test. That being said, the questions I’ve seen online accurately reflect the difficulty level of the problems I had, but my problems were 100% unrelated. Be able to apply the basics.

Summary

As someone without a CS degree, the questions that I had weren’t entirely outside my grasp. I could almost see the right solution, but wasn’t quite able to implement some of them. More preparation would have definitely helped.

Many of the questions were related to problems I’d solved while practicing. The best I can do in describing them is this: they’re standard comp-sci problems, with a twist. They’re close enough to standard problems that they’ll expect you to use the appropriate algorithms and/or data structures, but modified slightly so that you’ll have to actually understand what’s going on. Rote memorization of quicksort, mergesort, etc. won’t do.

Find Your MySQL Username/Password in WordPress

If you need to manually manage your MySQL database associated with a WordPress installation, you’ll need to get the proper credentials first. Database connection information usually consists of:

  • Username (DB_USER)
  • Password (DB_PASSWORD)
  • Database name (DB_NAME)
  • Database host (DB_HOST)
  • Database port (WordPress assumes MySQL’s default port of 3306)

This information can be found in your wp-config.php. To show all lines of wp-config.php that have “DB_” in them, run the following command from the terminal:

grep 'DB_' wp-config.php
define('DB_NAME', 'wordpress');
define('DB_USER', 'username');
define('DB_PASSWORD', '********');
define('DB_HOST', 'localhost');
define('DB_CHARSET', 'utf8');
define('DB_COLLATE', '');
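
If you need one of these values in a script, a small sed wrapper can pull it out. This is only a sketch: wp_db_value is a name I made up, and it assumes each define() sits on its own line with a value that contains no quote characters.

```shell
# Hypothetical helper: print the value of one define() constant.
#   $1 = constant name (e.g. DB_NAME), $2 = path to wp-config.php
wp_db_value() {
    sed -n "s/^[[:space:]]*define([[:space:]]*['\"]$1['\"][[:space:]]*,[[:space:]]*['\"]\(.*\)['\"][[:space:]]*).*/\1/p" "$2"
}

# Usage:
#   wp_db_value DB_NAME wp-config.php
#   mysql -u "$(wp_db_value DB_USER wp-config.php)" -p "$(wp_db_value DB_NAME wp-config.php)"
```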

This information can now be used to log in to MySQL’s command-line interface:

mysql -u username -p

Leaving the “-p” parameter empty will trigger MySQL to prompt you for a password. On a *NIX server, it will look like you’re not typing anything; this is by design. While you may specify the password on the same line, this can leave your plaintext password in your command history, which is easily readable. If you want to use this format anyway (e.g., in a script), note that you cannot put a space between the “-p” flag and your password:

mysql -u username -ppassword

Once you’ve logged in, you can view available databases with the show databases; command. To use your wordpress database, take the value from DB_NAME (above) and use the use command: use wordpress;. To see available tables in the selected database, run show tables;.

Enable Scrolling on Logitech TrackMan Marble Mouse (Linux Mint 17)

The TrackMan mouse has four (physical) buttons: a large left and right button (1, 3) that serve as the primary mouse buttons, and two smaller left and right buttons (8, 9) that trigger your browser’s “back” and “forward” actions. To make the small left button enable scrolling instead (hold it and roll the trackball), insert the following lines in your ~/.bashrc (or anywhere else that runs commands when you log in):

xinput set-button-map "Logitech USB Trackball" 1 2 3 4 5 6 7 8 9
xinput set-int-prop "Logitech USB Trackball" "Evdev Wheel Emulation Button" 8 8
xinput set-int-prop "Logitech USB Trackball" "Evdev Wheel Emulation" 8 1
xinput set-int-prop "Logitech USB Trackball" "Evdev Wheel Emulation Axes" 8 6 7 4 5
xinput set-int-prop "Logitech USB Trackball" "Evdev Wheel Emulation X Axis" 8 6
xinput set-int-prop "Logitech USB Trackball" "Evdev Drag Lock Buttons" 8 9

Then, run “source ~/.bashrc”, and you should be able to scroll by pressing the small left button and moving the trackball.

A look at RedStar OS 3.0 – North Korea’s Operating System

I recently stumbled upon a copy of RedStar OS, which appears to be a RHEL-based server distribution developed by North Korea. Version 2.5 was initially purchased and reviewed by a Russian student studying abroad, and a user by the name of slipstream uploaded version 3.0 (server) to TPB in mid-2014.

Several reports portray it as a tool for the regime to monitor web usage, and while I don’t doubt that, it seems unnecessary to repackage an operating system to do so. It seems more likely that it’s a symbol of sovereignty and independence from Windows (made in USA). Since North Korea’s internet is effectively one giant intranet on private class A address space (10.76.1.0/22), any reporting software would likely try to report to an otherwise “internal” network. For example, the browser packaged with the OS has its homepage set to 10.76.1.11. A quick Wireshark analysis didn’t reveal anything immediately suspicious, but I’ve yet to dig into that fully.

On the surface, it’s a pretty hollow clone of RHEL using KDE desktop. The directory structure is a cross between OSX and *nix, as is the overall feel of the desktop environment.

Applications

It comes with a couple of standard applications – a calculator, notepad, contact book, etc., as well as QuickTime and Naenara Browser (a Firefox clone). Since Naenara (“my country”) is also the name of the official web portal, and most citizens can’t access the “international internet”, the two might as well be synonymous.

You can see the public-facing Naenera at http://www.naenara.com.kp/en/, but be aware that they’ve been known to inject malware on some of their public-facing sites.

[Screenshot: Naenara browser]

It’s also interesting to note there’s a CHM (compiled HTML) viewer. This is typically used for software documentation, and very well may be the case here. I’d be interested to see if this is utilized for something akin to Cuba’s Paquetes, downloading parts of the Kwangmyong, or something altogether different. (There is an empty “Sites” folder in the user’s home directory)

[Screenshot: CHM viewer]

There’s an OpenOffice clone, called Sogwang Office.

[Screenshot: Sogwang Office]

It also has this music composition program, UnBangUI:

[Screenshot: UnBangUI]

The mail program doesn’t have any clear way to add an email account, but does prevent you from checking mail until you’ve added one.

[Screenshot: mail program]

The software center only allows importing from /media. There is a repository of extra applications that’s offered on a second CD (the Russian site says the extra CD costs about twice what the original OS costs), and I haven’t started to dig through that yet.

[Screenshot: software center]

In the “System Update” area, the Settings dialog shows a location for a URL and proxy, but I’m not sure it’s usable.

[Screenshot: System Update settings dialog]

Getting Root

Interestingly, the user isn’t added to sudoers and the root account is disabled. Fortunately, this is trivial to bypass, since someone “overlooked” the permissions in /etc/udev/rules.d. If you’re looking for a terminal shortcut, you won’t find it – you’ll have to press Alt+F2, then run konsole to get a shell.

How convenient!

Once you’ve done that, fire up vi and create /tmp/freedom, or whatever you’d like to call it.

[Screenshot: /tmp/freedom]

 

Now, open up that file in /etc/udev/rules.d and call /tmp/freedom via a RUN expression:

Don’t forget to “chmod +x /tmp/freedom”!

Now that that’s taken care of, you’ll need to refresh the udev rules. In VirtualBox, this worked simply by taking a snapshot, but you might have to reboot.

Enabling English on RedStar OS

Once you’re back up and running, you’ll likely want to enable a language other than Korean. While some reports state that Korean is the only language on the system, this isn’t true. It’s just that Korean is selected by default. Now that you have root superpowers, this can be done easily with sed (for a language other than US English, use the appropriate locale code):

sed -i 's/ko_KP/en_US/g' /etc/sysconfig/i18n

sed -i 's/ko_KP/en_US/g' /usr/share/config/kdeglobals

Log out, and you should see the login screen in English:

[Screenshot: login screen in English]

That’s it! You should now be able to browse around the OS relatively easily. I’ll post some interesting findings later, once I’ve had an opportunity to dig through it more.

 

Puppet: Error 400 on SERVER: undefined method `empty?' for nil:NilClass

I received this error after making some changes to a Hiera config and the referenced “dev-server” role.

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Error from DataBinding 'hiera' while looking up 'role::dev-server::use_ssl': undefined method `empty?' for nil:NilClass on node servername.local

Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

It turns out this is a vague syntax error. Checking the following has worked for me:

  • Ensuring the syntax of your Hiera YAML or JSON file is correct. Check for trailing commas in JSON, or misplaced colons. (“foo:bar”, “foo::bar:”, “foo:::bar”, etc.)
  • The variable name is unique. In one case, “dev-server::use_ssl” was configuring a child resource with the same “use_ssl” property/param/variable.
  • There are no empty YAML or JSON files in your hieradata directory. I think I’ve had a similar issue with temp files (*~)
  • If you’ve modified your hiera.yaml to add a new hierarchy or something, restart Puppet.
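
The empty-file and temp-file checks are easy to script. As a rough sketch, the function below lists files that are empty or look like editor backups; hiera_suspects is a made-up name, and the directory you pass in is whatever your hieradata path happens to be.

```shell
# Hypothetical helper: list files under a hieradata directory ($1) that are
# empty or look like editor backups (names ending in ~).
hiera_suspects() {
    find "$1" -type f \( -empty -o -name '*~' \) | sort
}

# Usage (the path is an assumption; adjust to your setup):
#   hiera_suspects /etc/puppet/hieradata
```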

CryptoPHP – A WordPress backdoor in social.png

Summary

This is a series of posts on CryptoPHP, a PHP backdoor used for spamming and blackhat SEO. It seems to come bundled with certain copies of WordPress themes from unofficial sites and resides in a file named “social.png”. It comes installed with a list of email addresses and domains to contact and communicates with a C2 server using cURL and OpenSSL for encryption. Its main purpose appears to be to facilitate the display of links and other content, sent from the C2 server. When the script determines that a web crawler (e.g., GoogleBot), and not a real user, is viewing the site, it injects links to third-party sites in hopes of being indexed.

Symptoms

CryptoPHP communicates with external servers, requiring multiple external requests. You may see the following symptoms:

  • WordPress is slow to load, especially during the first pageview
  • Error messages in your server log, possibly due to failed requests.
  • Error messages from IDS/IPS or other security software (e.g., Suhosin) indicating that someone is making calls to exec and eval.

Discovery

A few days ago, I noticed that a WordPress installation was running extremely slowly. After enabling xhprof and profiling the index page, I noticed that a single method (RoQfzgyhgTpMgdUIktgNdYvKE) was taking around 160 seconds to run. The method name (others in the stack were similarly named) and the 23 calls to curl_exec came off as immediately suspicious. I used grep to search for the file and found it under the themes folder as images/social.png.

This file was included at the bottom of a theme file, causing it to be executed on each page load.

<?php include_once('images/social.png'); ?>

Opening social.png in a text editor reveals obfuscated and minified code. While it looks like a mess, it’s simply renamed variables and functions with whitespace removed, and can be undone rather easily with the “Find/Replace All” feature of your favorite text editor.

[Screenshot: obfuscated CryptoPHP code]

 

How to Remove CryptoPHP or social.png

In the limited tests that I’ve done, the offending file – social.png – is the only file that is malicious. It seems to be added to the images/ directory in themes downloaded from unofficial sources. Another line in the main theme files (index.php, header.php or footer.php) includes the file.
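
To check a server for this, one approach is to look for any social.png that isn’t actually an image. Real PNG files begin with the byte 0x89 followed by the letters “PNG”, so PHP source saved under a .png name fails that test. The sketch below is my own, not taken from any official removal tool; the function names and the wp-content/themes path are assumptions.

```shell
# PNG files start with the bytes 0x89 'P' 'N' 'G'; checking bytes 2-4 is
# enough to tell a real image from PHP source saved with a .png name.
is_png() {
    [ "$(head -c 4 "$1" | tail -c 3)" = "PNG" ]
}

# Hypothetical helper: print every social.png under $1 that is not a PNG.
find_fake_social_png() {
    find "$1" -type f -name 'social.png' | while IFS= read -r f; do
        is_png "$f" || echo "$f"
    done
}

# Usage from the WordPress root (path is an assumption about the layout):
#   find_fake_social_png wp-content/themes
#   grep -rn "social.png" wp-content/themes --include='*.php'
```

The grep line in the usage comment finds the theme file that includes the backdoor, so you can remove that line along with the file itself.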

While nothing in the file itself indicates that personal or sensitive data is being transmitted back to the server, the file allows its controllers to send commands to it. These commands are then executed by the eval and exec commands in PHP. It is theoretically possible for content, account information, etc. to be transmitted back to the controlling server.

Since the WordPress instance I was using was running on localhost, it would have been unreachable by the controlling servers. It could still phone home and download commands, but could not be controlled directly.  However, due to the possibility of sensitive data being stolen, and the evidence of storing information in the database, I’d recommend a complete re-install of WordPress and changing your admin password(s).

Coming Soon

  • Encryption methods (including a script to decrypt database contents)
  • Detailed/technical review

What is Google foo.bar?

A week or two ago, the following popped up on my screen during a search for a Python-related topic:

You're speaking our language. Up for a challenge?

I had seen this before, after our CTO got the same mysterious message a few months ago. We initially thought it was another one of Google’s Easter eggs, but a quick search revealed that everyone from HN and Reddit to Business Insider seems to think it’s a recruiting move by the search giant. (A similar program was rumored to be a search for cryptanalysts, but turned out to be related to The Imitation Game, so who knows?)

Update: it is a recruiting portal. Both of us were contacted by Google and interviewed on-site. The actual interview is under NDA, but I’ll post more about the interview process itself later.

The first time around, we discovered that replicating the query doesn’t necessarily trigger an invite, and visiting the URL without an invite doesn’t work. It was suggested that the invites are sent to a subset of users who have enabled search history. When I got the invite a week or two ago, I registered and then hit the “Back” button. The query string was preserved, so we tried an experiment: Is the invite based on a tagged query string, or the result of some back-end processing? I sent the URL to a couple of coworkers who had searched the same query without receiving an invite, and they tried accessing the URL directly. We learned two things:

  1. Both of them subsequently received an invite.
  2. One of them hit “refresh” as the animation began to show the box, and no invite was shown upon refresh. Opening the link in an Incognito window gave him a second chance.

The most likely scenario is that certain queries redirect to the results page with a query string, which triggers the message. Since neither of the other developers writes much Python, but both still got an invite after visiting the link, it’s likely that Google doesn’t validate invitee status. I doubt this is a simple oversight; it more likely indicates one of two things:

  1. Invitees are not on some sort of pre-selected list; and/or
  2. Google isn’t worried about additional invitees.

The latter was proven when the program displayed a “refer a friend” link. Assuming the recruitment theory is correct, it’s likely that Google is operating under the assumption that high-quality developers will refer other high-quality developers. I don’t know for sure, but this is probably a valid assumption.

To clarify some of the speculation, I was asked if I’d like a Google recruiter to contact me after completing the first six challenges.

Well, there goes that theory.

Others have asked Google directly about the program, and received a Python snippet that prints “glhf” in response – essentially “no comment”.

A Quick Tour

The pseudo-terminal responds to *nix commands like ls, cat and less and features its own editor. Listing the directory shows a text file:

Contents of start_here.txt

The help menu offers several possible commands:

help

The levels consist of at least 5! challenges, split into 5 levels where each level n has challenges. Challenges fall into one of five categories, or tags.
Google Foobar Tags

Unfortunately, there has only been one crypto challenge available so far, and I haven’t been able to score a low_level challenge. Most of the challenges I’ve completed so far involve one-off applications of computer science problems, like whiteboard interview questions with a twist. Additionally, there are constraints on execution time and memory use, which prevent some naive implementations from passing the test cases. This speaks to the needs of a company like Google, which requires, or at least desires, efficient implementations rather than generic Algorithms 101 approaches.

I’ll be posting my solutions to GitHub shortly, along with some explanations here.


Visual Binary File Analysis with Python

Update: Added a colorize function:

With colorize

Here’s a quick Python script to visualize binary data. In the grayscale example, each pixel’s shade is the value of the corresponding byte (0x00 – 0xFF). The same method is used for colorization, except the byte value is used to provide the hue and value components in the HSV colorspace (saturation is fixed at 0.99).
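
To make the mapping concrete, here is a minimal sketch of the two per-byte conversions using only the standard library. It is not lifted from the script: the function names are hypothetical, and I’m assuming that hue and value both scale linearly with the byte (Python 3).

```python
import colorsys


def gray_pixel(byte):
    """Grayscale: the byte value is used directly for all three channels."""
    return (byte, byte, byte)


def color_pixel(byte, saturation=0.99):
    """Map one byte (0-255) to an (R, G, B) tuple via HSV.

    Assumption: hue and value both scale linearly with the byte value,
    while saturation stays fixed.
    """
    scaled = byte / 255.0
    r, g, b = colorsys.hsv_to_rgb(scaled, saturation, scaled)
    return (int(round(r * 255)), int(round(g * 255)), int(round(b * 255)))
```

With this mapping, low bytes come out dark and high bytes bright, with the hue sweeping around the color wheel in between.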

The cols parameter is the width of the image to be generated (in pixels). By default, the script generates a couple of different sizes. The height is calculated based on the width. Patterns tend to be clearer when the column width is a multiple of 8 (16, 32, 64, 128…), though that could depend on the format and type of data in the file.
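
The height calculation is a ceiling division of the file size by cols, with the last row padded out. The sketch below shows that layout step and writes the result as a binary PGM (P5) file so that it needs no imaging library. The PGM output, the zero padding and the function names are assumptions for illustration, not what the script itself does (Python 3).

```python
def image_height(num_bytes, cols):
    """Rows needed to show num_bytes at cols pixels per row (ceiling division)."""
    return (num_bytes + cols - 1) // cols


def to_rows(data, cols, pad=0):
    """Split data into rows of exactly cols byte values, padding the last row."""
    values = list(bytearray(data))
    height = image_height(len(values), cols)
    values.extend([pad] * (height * cols - len(values)))
    return [values[i * cols:(i + 1) * cols] for i in range(height)]


def write_pgm(path, data, cols):
    """Write data as a binary PGM (P5) grayscale image, one byte per pixel."""
    rows = to_rows(data, cols)
    with open(path, "wb") as handle:
        # PGM header: magic number, width and height, maximum gray value
        handle.write(("P5\n%d %d\n255\n" % (cols, len(rows))).encode("ascii"))
        for row in rows:
            handle.write(bytearray(row))
```

For example, a 256-byte file at cols=16 gives a 16x16 image, while cols=100 gives three rows with the last one mostly padding.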

As an example, here are some images from a 256-byte file generated with the following Python program:

with open('foo.txt', 'wb') as fd:
    for i in range(256):
        # bytearray([i]) writes the single byte i on both Python 2 and 3
        fd.write(bytearray([i]))

Bytes in range(0,256)

Usage

./process_dir.py <dirname> <cols>

The program will generate images for each of the binaries in the specified directory, create an “index.html” file and attempt to launch it in the browser.

PNG-ODT

The generated image on the left is from a PNG file. A dark patch in the beginning with a mostly-uniform distribution is consistent with file headers followed by image data.

The image to the right is an OpenOffice Writer file. The striped area indicates a repeating pattern of bytes, which often separates the metadata header and content in word processor files. The example screenshot shows an image generated from a compiled binary.

This can also be used to visually approximate the amount of entropy in a file. A high-entropy file would have a uniform byte distribution, thus occupying all of the available colorspace. I’ll include a histogram function later. This would show the frequency distribution of the bytes as well.
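
Until then, here is a rough sketch of what a histogram and an entropy estimate could look like, using the standard Shannon entropy formula over byte frequencies. The function names are placeholders of mine and none of this is part of the script yet (Python 3).

```python
import math
from collections import Counter


def byte_histogram(data):
    """Return a 256-entry list with the count of each byte value."""
    counts = Counter(bytearray(data))
    return [counts.get(value, 0) for value in range(256)]


def shannon_entropy(data):
    """Shannon entropy in bits per byte: 0.0 (one repeated byte) to 8.0 (uniform)."""
    total = len(data)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in byte_histogram(data):
        if count:
            p = count / float(total)
            entropy -= p * math.log2(p)
    return entropy
```

Encrypted or random data should score close to 8 bits per byte, while plain text usually lands well below that.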

Compare the outputs of the following files:

  • An MP3 file
  • /dev/urandom
  • A TrueCrypt container (AES with RIPEMD-160)
  • A plain text file

MP3 file
Data from /dev/urandom
TrueCrypt container
Text file
