Part 2 Caught in a Web Scraping Maze: xgoogle Python module

During my investigation of web scraping methods in Python, I came across this Stack Overflow discussion that used the Python module xgoogle to scrape Google search results while also building in a wait time between searches. It looked like a promising method, so I tried it out.

Unfortunately, as is the case with many young programmers, at every step I ran into trouble.

First, I tried to import the module into IPython. That was silly; I hadn’t installed it yet! So then I tried to install it using commands in the Terminal (similar to what I would do in R). But unlike R (it seems), Python has no way of knowing where to pull the installation files or what I’m asking it. So it kept telling me, “Sorry I have no idea what you’re trying to do.” Eventually I did manage to set it up (described later in this post), but first I have to share something very strange that happened.

In the process of trying to install xgoogle, I noticed something in my Terminal that horrified me. Instead of the Terminal prompt showing my computer’s name as “Alyssa-Fus-Macbook-Air”, it was showing “jims-iphone”. What the!? Who in the world is this Jim, and what is his iPhone doing on my computer!?

Jim's iPhone infiltrating my laptop

I’ll admit, I panicked. I’m very new to working in the Terminal and I was convinced that somehow I had been hacked. But like any calm and composed programmer, I swallowed my fear and turned to the one thing that could save my computer from inevitable doom: Google.

After starting with several vague and terrified search terms (“WHO IS INVADING MY COMPUTER AND HOW DO I STOP THEM?!” …just kidding, I didn’t search anything that histrionic), I finally whittled down my search to something specific: “terminal name different from user name”.

The links I found helped me to investigate the problem and in the end, solve it. First I looked into how far this “name change” went. Was I somehow accessing Jim’s iPhone or was this a superficial name change and I was still accessing my own computer and files? So I changed the directory to my Desktop, and I checked what was in it. (I suppose I could have just looked in the directory itself, but I wanted to see what would happen if I went “deeper” into “Jim’s iPhone”). This helped me confirm that though my laptop was taken over by the evil iPhone of Jim, at least everything seemed to be where it was supposed to be.

Jim can take my laptop's name, but he can't take my laptop's contents!

So then I checked to see if my computer name was still the same, or if Jim had taken that too.

Sorry Jimmy Jim, you can take my name, but not my name name!

Okay, so I’m starting to calm down. Then I found this article discussing this problem, and I focused on the second response about computer sharing. I looked at my Sharing Preferences and was shocked to see Jim had infiltrated at this level.

Jim just wants to share

Why, Jim, why?! What did I ever do to you…

So at this point I’m wondering, when I installed all those pandas and numpy modules and anaconda whatsits did I accidentally download something that changed my HostName to Jim? Or maybe since I’m always connecting to foreign networks (in cafes, in the library, in hotels), is that where I picked up a little friend named Jim?

This result suggests the latter explanation is more likely. “In short, the Mac will pick up a host name from the DHCP server. This does not affect your computer’s name as you have assigned it. This will only affect what you see at the command prompt.” Ah haaaa, that’s plausible and no cause for alarm (I’m sure). (What’s the purpose of this though?)
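For anyone who would rather check these names from Python than from the Terminal, here is a minimal sketch. It assumes Python 3.7 or later; the helper names `current_hostname` and `mac_names` are made up for this illustration. `socket.gethostname()` reports the host name the system is using right now (roughly what the prompt shows), and on a Mac the `scutil --get` command can report the three names macOS keeps. On other systems, or when a name is not set, the sketch simply returns None for it.

```python
import shutil
import socket
import subprocess


def current_hostname():
    """Host name the OS reports right now (roughly what the prompt shows)."""
    return socket.gethostname()


def mac_names():
    """Ask scutil for the three names macOS keeps; None where unavailable."""
    names = {}
    scutil = shutil.which("scutil")  # scutil normally exists only on macOS
    for key in ("ComputerName", "LocalHostName", "HostName"):
        if scutil is None:
            names[key] = None
            continue
        proc = subprocess.run(
            [scutil, "--get", key], capture_output=True, text=True
        )
        # A non-zero exit code means the name is not set
        names[key] = proc.stdout.strip() if proc.returncode == 0 else None
    return names


if __name__ == "__main__":
    print("host name in use:", current_hostname())
    for key, value in mac_names().items():
        print(key, "=", value)
```

If the host name in use differs from ComputerName, that matches the DHCP explanation quoted above: the name at the prompt changed, but the name assigned to the computer did not.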

And then this result gave a suggestion for how to change the HostName. I have read several cautionary tales about using “sudo”, which runs a command with administrator privileges, but this seemed harmless enough. So I ran the command, and once I restarted my Terminal, everything was right as rain. Whew!

sudo scutil --set HostName Alyssa-Fus-Macbook-Air

Right as rain

Alright! Now that that snafu has been fixed, let’s return to the real task at hand: installing xgoogle.

Eventually, after several failed attempts, I found this Stack Overflow answer and used it to successfully install the module. I just had to download all the scripts from the xgoogle GitHub page, put the folder in my directory, change my directory to that folder, then run the script. And it worked beautifully!

Installing xgoogle
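A quick way to confirm that an install like this actually worked, before trying to import anything, is to ask Python whether it can find the module. This is only a sketch using the standard library; `is_installed` is a hypothetical helper name, and "json" is included just as a module that is always present for comparison.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module by this name."""
    return importlib.util.find_spec(module_name) is not None


for name in ("json", "xgoogle"):
    status = "found" if is_installed(name) else "not found"
    print(name, status)
```

If the module shows up as not found, the usual suspects are an install that went to a different Python than the one IPython is running, or an install that never finished.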

Alright alright alright! Let’s put this puppy into action.

I ran the script straight (without the wait time) to see what I would get.

GoogleSearch results
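For reference, the "wait time" idea is just a pause between one search and the next. The sketch below is not xgoogle's API: `run_searches` and `fake_search` are hypothetical names, and `fake_search` stands in for whatever function really performs a search. The wait bounds of 5 to 15 seconds are an arbitrary assumption.

```python
import random
import time


def run_searches(queries, search_fn, min_wait=5.0, max_wait=15.0, sleep=time.sleep):
    """Call search_fn on each query, pausing a random interval between calls."""
    results = {}
    for i, query in enumerate(queries):
        if i > 0:
            # Wait before every search except the first one
            sleep(random.uniform(min_wait, max_wait))
        results[query] = search_fn(query)
    return results


def fake_search(query):
    """Hypothetical stand-in for a real search call; returns a dummy result."""
    return [query.upper()]


# Tiny waits here so the demo finishes quickly
print(run_searches(["python", "pandas"], fake_search, min_wait=0.0, max_wait=0.01))
```

Randomizing the pause, rather than sleeping a fixed number of seconds, is a common choice because it makes the requests look less mechanical.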

Unfortunately what I got were… 0 results. 0 results! How is that possible?

This response on GitHub suggested it was a bug that needed a patch, available here as well as in the comments. …but I’ll admit, I had no idea how to apply the patch. I decided to run the code straight, because I figured this would replace the old function, and then I could rerun my code. But the patch code kept getting stuck.
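One thing worth knowing here, assuming the patch is a replacement function rather than a diff to apply to the module's file: running a new function definition in the session does not by itself replace a method that an installed class already has. The new function has to be attached to the class explicitly. The sketch below uses a made-up class `Searcher` and a made-up method `parse`; neither comes from xgoogle.

```python
class Searcher:
    """Hypothetical stand-in for a class inside an installed module."""

    def parse(self):
        return []  # the "buggy" behaviour: always empty


def parse(self):
    """A corrected version, defined in the session as a loose function."""
    return ["result"]


s = Searcher()
before = s.parse()      # still the old method; the loose function is unrelated
Searcher.parse = parse  # explicitly attach the new function to the class
after = s.parse()       # now the corrected version runs
print(before, after)
```

If the patch is instead a diff, the more direct route is usually to edit the module's source file and reinstall or reload it.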

When I copy-pasted from the comments (made by the creator of xgoogle himself), I got an Indented Block error:

Indented block error

So then I tried the RAW Paste Data from the pastebin site linked above:

Surprise! Unexpected indent

Another indent error. I tried pasting them into TextWrangler to double-check the indents and reran the code. This time, there was no error for the patch code (omg, maybe it’s working?!) — I held my breath and ran the GoogleSearch script again…. still 0 results. Bah. Dejection…
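Indent errors like these often come from code that was copied with extra leading whitespace, or with the body of a function flattened to the left margin. A small sketch for checking a pasted snippet before running it, using only the standard library (`indent_problem` is a hypothetical helper name): `compile` reports the indentation problem without executing anything, and `textwrap.dedent` strips whitespace that is common to every line.

```python
import textwrap


def indent_problem(source):
    """Return the indentation error message for source, or None if it compiles."""
    try:
        compile(source, "<pasted>", "exec")
    except IndentationError as err:
        return err.msg
    return None


# Every line carries the same extra leading spaces, as copied code often does
pasted = "    x = 1\n    y = x + 1\n"

print(indent_problem(pasted))                   # reports an indentation problem
print(indent_problem(textwrap.dedent(pasted)))  # None once the common indent is gone
```

Note that dedent only helps when the whole snippet is shifted over by the same amount. Mixed tabs and spaces, or a body that lost its indent, still need fixing by hand in an editor.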

PS. I just checked my afp address and now it looks like a perfectly normal random set of numbers. Whew! Good-bye Jim’s iPhone! It was fun, but I’m happy you are no longer on my computer.
