Automated Data Collection with R Blog

A considerable share of Twitter accounts is not actually run by humans. According to a recent release by Twitter, "up to approximately 8.5%" of the active users are bots or third-party software that automatically aggregates tweets. Bots can follow other users, retweet content or post content on their own. What they say is essentially generated by scripts.

Take @TwoHeadlines, for example. The bot, hosted by Darius Kazemi, scrapes headlines from Google News and replaces one of the nouns with another trending noun, which generates hilarious and sometimes thought-provoking tweets. Twitter bots can be more than just gadgets for nerds: a notable example is the congress-edits bot, which tracks and posts edits to Wikipedia that are made from IP addresses located inside the US Congress. It is fascinating to see how programmers use their creativity to repurpose Twitter's range and popularity.

If you are familiar with R, such projects are well within your reach. In this post, I give a little demonstration of how to program your own Twitter bot using R. The goal is to create a nerve-racking bot that reminds PhD students of their primary duty, that is, to work on the dissertation. You can check out the results here.

Step 1: Create content for the bot's tweets.

The PhD whipping bot is inspired by @indiewhipbot, an equally tedious contemporary who pushes indie game developers back to work by shouting orders and closing with a mild insult. For my bot, I start by setting up three little databases stored in Excel (.xlsx) sheets. You can find all of them at the end of this post. The first stores (de-)motivating phrases:

library(XLConnect)
shoutings <- readWorksheet(loadWorkbook("phdwhipbot-shoutings.xlsx"), sheet=1, header=FALSE, simplify=TRUE)
shoutings[1:5]
## [1] "why aren't you working?"                                    
## [2] "i've got a phd in professional whipping. what have you got?"
## [3] "why are you checking Twitter again?"                        
## [4] "let me whip you back to your table."                        
## [5] "procrastinating again?" 

The second and third databases contain a list of animal names that were scraped from Wikipedia and a short list of negative attributes:

animals <- readWorksheet(loadWorkbook("phdwhipbot-animals.xlsx"),    sheet=1, header=FALSE, simplify=TRUE)
attribs <- readWorksheet(loadWorkbook("phdwhipbot-attributes.xlsx"), sheet=1, header=FALSE, simplify=TRUE)
sample(animals, 5)
## [1] "Ostrich" "Ape"     "Pelican" "Dove"    "Okapi" 
sample(attribs, 5)
## [1] "gullible"    "aloof"       "impatient"   "quarrelsome" "finicky"    

I use them to add some random mockery to the bot's shoutings. Pasting it together into one random tweet (everything in capital letters for an extra pinch of annoyance) works as follows:

library(stringr)
toupper(str_c(sample(shoutings, 1), " ", sample(attribs, 1), " ", sample(animals, 1), "."))
## [1] "THE THESIS DOESN'T WRITE ITSELF. FUSSY ORYX."

Motivating, indeed.
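
Since the same expression is needed again when the bot actually tweets, it may be convenient to wrap it in a small function. The following is only a sketch: make_whip_tweet() is a name I made up for illustration, the three vectors are short stand-ins for the spreadsheet contents, and base R's paste0() takes the place of str_c() so that the snippet runs without any packages.

```r
# Sketch only: make_whip_tweet() is a hypothetical helper, not part of the
# original script. The vectors are short stand-ins for the spreadsheets.
shoutings <- c("why aren't you working?", "procrastinating again?")
attribs   <- c("gullible", "aloof", "finicky")
animals   <- c("Ostrich", "Okapi", "Dove")

make_whip_tweet <- function(shoutings, attribs, animals) {
  # Draw one element from each vector and glue them together in capitals
  toupper(paste0(sample(shoutings, 1), " ",
                 sample(attribs,   1), " ",
                 sample(animals,   1), "."))
}

set.seed(42)  # only to make the draw reproducible while experimenting
tweettxt <- make_whip_tweet(shoutings, attribs, animals)
print(tweettxt)
nchar(tweettxt)  # quick check that the text stays within the length limit
```

With the real data loaded from the spreadsheets, the same call should work unchanged, as long as the three objects are plain character vectors.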

Step 2: Get connected to Twitter.

In order to tweet these random whiplashes using R, we first register a new app on apps.twitter.com to obtain OAuth credentials, which we then use to log into our Twitter account from R with the twitteR package. We have elaborated on this procedure in more detail in a previous post. In short, we load the twitteR package and connect to Twitter's REST API via OAuth, using credentials previously stored as environment variables. These live in the .Renviron file in your home directory, which you can locate by entering normalizePath("~/") in the console:

devtools::install_github("geoffjentry/twitteR")
library(twitteR)
api_key             <- Sys.getenv("twitter_api_key")
api_secret          <- Sys.getenv("twitter_api_secret")
access_token        <- Sys.getenv("twitter_access_token")
access_token_secret <- Sys.getenv("twitter_access_token_secret")
setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)
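
If the connection fails, a likely culprit is a variable that never made it into .Renviron (the file expects one NAME=value line per variable, and R has to be restarted after editing it). Sys.getenv() silently returns an empty string for unknown names, so a quick check before connecting can save some head-scratching. This is a minimal sketch in base R; missing_credentials() is a hypothetical helper name, and the variable names are simply the ones used above.

```r
# Sketch only: missing_credentials() is a hypothetical helper.
# The variable names are the ones read with Sys.getenv() above.
required <- c("twitter_api_key", "twitter_api_secret",
              "twitter_access_token", "twitter_access_token_secret")

missing_credentials <- function(vars) {
  # Sys.getenv() returns "" for variables that are not set
  values <- unname(Sys.getenv(vars, unset = ""))
  vars[values == ""]
}

missing <- missing_credentials(required)
if (length(missing) > 0) {
  message("Not set in .Renviron: ", paste(missing, collapse = ", "))
}
```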

Now, we can let the bot tweet a random entry with the tweet() function:

tweettxt <- toupper(str_c( sample(shoutings, 1), " ", 
                           sample(attribs,   1), " ", 
                           sample(animals,   1), "."     ))
tweet(tweettxt)
## [1] phdwhipbot: "SLACKING OFF AGAIN? NARROW-MINDED CHINCHILLA."

Step 3: Let your bot tweet on a regular basis.

Naturally, it would be cumbersome if we had to operate the bot manually. Fortunately, there are means to execute an R script on a regular basis without manual input. As I happen to use a Windows machine as a server, I demonstrate how to schedule R tasks in Windows. We illustrate how to do this on a Linux/Mac OS machine in our book.

On Windows platforms, the Windows Task Scheduler is the native tool for scheduling tasks. You'll find the Scheduler (on Windows 8) by right-clicking on Start > Computer Management > Task Scheduler. To set up a new task, click on Create Task. We are presented with a window with five tabs: General, Triggers, Actions, Conditions, and Settings. Under General we can provide a name for the task. Here I enter R PhD Whipping Bot as a descriptive title. In the Triggers tab, we can add one or more triggers for starting the task. There are schedule triggers, which start the task every day, week, or month, as well as triggers that refer to events such as the startup of the computer or the computer going idle, and many more. After having set an execution interval, we should make sure that the start date and time of our task still lie in the future when we are done specifying the schedule.

Next, we have to tell the Scheduler what to execute at the specified time. This is defined in the Actions tab. We choose Start a program as the action and use the browse button to select the location of Rscript.exe, which should be found under, e.g., C:\Program Files\R\R-3.1.2\bin\x64\. Next, we enter phdwhipbot.r in the Add arguments field and type the directory where the script is located into the Start in field.
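
For those who prefer not to click through the dialog, Windows also ships a command-line tool, schtasks, that can register a comparable task. The sketch below only builds and prints the command from R and does not run it. build_schtasks_command() is a hypothetical helper, and the script location and start time are placeholders I made up. As far as I know, schtasks has no counterpart to the Start in field, so the sketch passes the full path to the script instead.

```r
# Sketch only: build_schtasks_command() is a hypothetical helper, and the
# script path and start time used below are placeholders.
build_schtasks_command <- function(task_name, rscript, script, start_time) {
  # Quote the program and its argument separately, then quote the pair
  # again so that it is passed as the single value of /TR
  run <- paste(shQuote(rscript, type = "cmd"), shQuote(script, type = "cmd"))
  paste("schtasks /Create",
        "/TN", shQuote(task_name, type = "cmd"),
        "/TR", shQuote(run, type = "cmd"),
        "/SC DAILY",
        "/ST", start_time)
}

cmd <- build_schtasks_command(
  task_name  = "R PhD Whipping Bot",
  rscript    = "C:\\Program Files\\R\\R-3.1.2\\bin\\x64\\Rscript.exe",
  script     = "C:\\bots\\phdwhipbot.r",  # placeholder location
  start_time = "09:00"                     # placeholder time
)
cat(cmd, "\n")
```

The printed line can be pasted into a command prompt. It is worth checking the resulting task in the Task Scheduler afterwards, since the nested quoting is easy to get wrong.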

If you want to log all bot tweets in one common file, you can do so by adding the following to the script:

line <- paste(as.character(Sys.time()), tweettxt, sep="\t")
write(line, file="tweets.log", append=TRUE)
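
Because each entry is a timestamp and a tweet separated by a tab, the log can later be read back into a data frame. Here is a small round trip as a sketch: it writes to a temporary file instead of tweets.log, the two entries are made-up stand-ins for real bot output, and the column names time and tweet are my own choice.

```r
# Sketch only: the log goes to a temporary file here, and the two entries
# are made-up stand-ins for real bot output.
log_file <- tempfile(fileext = ".log")
for (tweettxt in c("PROCRASTINATING AGAIN? ALOOF OKAPI.",
                   "WHY AREN'T YOU WORKING? FINICKY DOVE.")) {
  line <- paste(as.character(Sys.time()), tweettxt, sep = "\t")
  write(line, file = log_file, append = TRUE)
}

# quote = "" switches off quote handling so the tweet text is read verbatim
tweet_log <- read.delim(log_file, header = FALSE, quote = "",
                        col.names = c("time", "tweet"),
                        stringsAsFactors = FALSE)
print(tweet_log$tweet)
```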

Done! I hope this little bot helps you to stay on track. You can find the full R script as well as the related data here.

P.S.: I recently discovered the Bot Weekly newsletter, a wonderful entry point into bot-land.

P.P.S.: If you've created your own Twitter bot with R, I'd love to hear about it! Feel free to share it with me. Or me. COWARDLY ELK.