Automated Data Collection with R

The Book

The rapid growth of the World Wide Web has opened many opportunities for collecting, sharing and publishing data of all kinds. This book shows how to collect and post-process this data with R, the most popular and easy-to-use statistical programming language. It provides a hands-on guide to web scraping and text mining for both beginners and experienced users, featuring examples throughout that explain each of the techniques presented. Fundamental concepts of the main architecture of the Web and databases are discussed along with coverage of HTTP, HTML, XML, JSON, JavaScript and SQL.

  • Presents a practical guide to web scraping and text mining for both beginners and experienced users of R.
  • Explores basic techniques to query web documents and data sets (XPath and regular expressions) as well as technologies to gather information from dynamic HTML.
  • Demonstrates how to connect to web services/web APIs and collect data in a regular manner.
  • Provides a practical perspective on the workflow of data scraping and management, from choosing the right method to optimizing code and maintaining scrapers.
  • Features case studies throughout along with examples for each technique presented.
  • Provides a multitude of exercises to guide the reader through each technique.
  • R code and answers to questions posed in the text are featured on this website.

Available from: Wiley, Amazon (in English), Amazon (in Chinese)

The Blog

Visit our blog to read more about all things data collection with R. We post regularly on text manipulation, databases, web technologies, web scraping, and further topics. In R, of course. We also tweet regularly under RDataCollection on a wide spectrum of topics specific to R and data collection.