Capturing Shell Output in R and Python

Sometimes I spend significant time in R or Python trying to do something that is trivial in bash. Shelling out is especially useful when I'm working with very large files that would take a long time to read in. Why read in an entire file to get the last line when I could just use tail -n 1? Or if I want the line count, why read the file in when wc -l will get the job done faster?

It turns out that it’s not too complicated to capture shell output in R or Python. Here’s how I do it.

Python

If you use Python 3, capturing shell output is pretty simple (if you're still on Python 2, the tides are turning! It's time to make the change!). You can use subprocess.check_output from the subprocess module to get the output as bytes, then decode and parse it.

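import shlex
import subprocess

## Get the last line of the file 'fname'
## shlex.quote() protects file names containing spaces or shell metacharacters
last_line = subprocess.check_output("tail -n 1 " + shlex.quote(fname), shell=True)
## check_output returns bytes: convert to string and strip the trailing newline
## 'UTF-8' is a common encoding, but you may need to use something else
last_line = last_line.decode('UTF-8').strip()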
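The same pattern covers the line-count case from the introduction. The sketch below is one way to do it, assuming a Unix-like system with wc on the PATH; the helper name count_lines is hypothetical. Redirecting the file into wc (wc -l < file) makes it print only the number, without the file name, so parsing reduces to a single int() call. Note that wc -l counts newline characters, so a final line without a trailing newline is not counted.

```python
import shlex
import subprocess


def count_lines(fname: str) -> int:
    """Count the newline characters in fname with `wc -l` (hypothetical helper).

    Reading the file from stdin (`< file`) makes wc print only the count,
    with no file name after it, which keeps the parsing trivial.
    """
    # check_output raises CalledProcessError if the shell exits non-zero,
    # e.g. when the file does not exist
    out = subprocess.check_output("wc -l < " + shlex.quote(fname), shell=True)
    # The output is bytes like b'42\n' (some platforms pad it with spaces);
    # int() ignores surrounding whitespace, so decode and convert directly
    return int(out.decode("UTF-8"))
```

For example, count_lines("big_file.csv") returns the count without loading the file into Python.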

R

R makes this process easy too. You may have used system() before to submit shell commands. It turns out that if you set the argument intern = TRUE, you'll get the output as a character vector with one element per line of output, so you don't even have to deal with encoding! The output may still need some parsing, but the stringr package is good for that.

library(stringr)
## Get the last line of the file 'fname'
## shQuote() protects file names containing spaces or shell metacharacters
lastLine = system(str_c("tail -n 1 ", shQuote(fname)), intern = TRUE)
## strip leading/trailing whitespace
lastLine = str_trim(lastLine)
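For comparison, Python can return the same one-string-per-line shape that intern = TRUE gives you in R. This is a sketch assuming Python 3.7 or later (for capture_output); the helper name shell_lines is hypothetical. Passing the command as a list runs it directly without a shell, so no quoting is needed, though shell features such as pipes and redirection are then unavailable.

```python
import subprocess
from typing import List


def shell_lines(cmd: List[str]) -> List[str]:
    """Run cmd and return its stdout as a list of lines (hypothetical helper),
    similar in shape to the character vector from R's system(..., intern = TRUE)."""
    result = subprocess.run(
        cmd,
        capture_output=True,  # collect stdout/stderr instead of printing them
        encoding="UTF-8",     # decode the output to str, as in the example above
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    # splitlines() drops the newline characters: one string per output line
    return result.stdout.splitlines()
```

For example, shell_lines(["head", "-n", "3", fname]) returns the first three lines of fname as a list of strings.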

This has saved me from reinventing the wheel many times since I learned it. Hopefully it helps you too!

Done At: Mar 31, 2016

Posted with: R, Python
