This is an exciting post! In my last post, I described how I took a magical journey through a WordPress blog and found all these little gems along the way. In this post, I’m going to elaborate on one of the gems that I found: rvest and web scraping.
In the first post I linked to last time, the author walked through an example of using the R package rvest to scrape data from the IMDb page for The Lego Movie.
I thought, “Well hey! I have a web scraping project I am working on. This looks really useful! I must learn more…”
So to start, I tried to install the rvest package in R. But I got an error: “package ‘rvest’ is available as a source package but not as a binary” and “package ‘rvest’ is not available (for R version 3.0.2)”. Aw man!
So I googled the problem, which led me to the GitHub page of the creator of rvest, Hadley Wickham. He’s young! And he looks to be a contributor (author? I’m still trying to understand how to read the different roles in GitHub) to many major R packages, including ones I’ve used like dplyr and ggplot2. It’s so cool to have such easy access to all these great people (and content!).
From there I installed the package directly from the GitHub site. I had to first install the devtools package (oooh what other goodies are in this package? I can be a developer!), then I could use the install_github function to install rvest from GitHub.
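For anyone following along, the steps above looked roughly like this. This is a minimal sketch, not a definitive recipe: the repository path "hadley/rvest" is my assumption based on the package living under Hadley Wickham's GitHub account at the time, so check the package's GitHub page for the current location before running it.

```r
# Install devtools from CRAN (only needed once per R installation).
install.packages("devtools")

# Install rvest straight from GitHub instead of CRAN.
# ASSUMPTION: "hadley/rvest" is the repository path; verify it on GitHub.
devtools::install_github("hadley/rvest")

# Load the package to confirm the install worked.
library(rvest)
```

If `library(rvest)` loads without complaint, the install worked.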
And it worked! Sweet! Now to test it…