Python is a simple and powerful language that can be used for data analysis, which is why it is one of the most important tools a data scientist should know. The primary statistical and plotting libraries include numpy, pandas, scikit-learn, and matplotlib. (I’m excited to learn pandas; it sounds cute. ^^ Oh, and extremely useful for statistical analysis, of course.)
When I first started learning Python, it was through the Codecademy course. While the language seemed pretty easy to pick up, I was frustrated because I wasn’t sure how it connected to data analysis.
So then I picked up the book Python for Data Analysis, by Wes McKinney. I had good luck with Practical Data Science with R, so I was optimistic about this book. These books, however, rely on some previous knowledge of the language, of which, in the case of Python, I had very little. For example, from the start, the book discussed how data frames in Python are similar to lists of dicts and tuples, but I had no idea what tuples were (it sounded like a mix of triples and doubles to me, what does that even mean!?), and I only vaguely remembered what dicts were. Even the Appendix at the end of the book that covered Python basics was a little beyond me.
Then I found the Google Python class. I watched the first day videos and some of the second day videos. From the very beginning, the instructor was using dicts and tuples, and now I finally understand what they are! (Tuples are immutable lists created with parentheses; dicts are made up of key-value pairs.)
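Here is a minimal sketch of those two ideas in plain Python. The names and values are made up for illustration, and the list of dicts at the end is only a rough stand-in for the table-like data the book compares to a data frame, not the book's own example.

```python
# A tuple is written with parentheses and cannot be changed once created.
point = (3, 4)
try:
    point[0] = 10  # item assignment is not allowed on a tuple
except TypeError:
    tuple_is_immutable = True

# A dict is made up of key-value pairs (hypothetical names and ages).
ages = {"alice": 30, "bob": 25}
ages["carol"] = 41  # unlike a tuple, a dict can be changed after creation

# A list of dicts can act like the rows of a small table:
# each dict is one row, and each key is a column name.
rows = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 25},
]
names = [row["name"] for row in rows]
```

Trying to assign to `point[0]` raises a `TypeError`, which is what "immutable" means in practice, while adding a key to `ages` works fine.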
I also learned that Python is already installed on Mac OS and can be easily accessed from the Terminal. (In the Python book, they start with installing Xcode for Mac, and every version I tried to install was incompatible with my OS, Mountain Lion. In the end I did install the Xcode command line tools, which I think have worked? I discuss this a little later in the post.)
I also downloaded some free books on Python, including A Byte of Python, Non-Programmer’s Tutorial for Python 2.6, and Python Programming. It was useful to read different approaches to Python. A Byte of Python is more prose-y, but I was dissatisfied with its explanations: they would define the terms using the same words as the terms. For example, they say a literal constant is “called a literal because it is literal – you use its value literally.” O_O
I liked the Python Programming book better. It was more matter of fact and just went through different terms and ideas one at a time. It almost felt like a more detailed dictionary, but I liked it.
After trying the Google class and reading through the beginning of these books, terms and concepts like lists, dicts, tuples, indexes, and object calls started to take form in my head. In fact, I returned to the Python book a few days ago, and when I reread the introductory examples, they actually made sense! I was over the moon. It was an amazing experience to go from reading text that seemed like gibberish to revisiting it and finding that I could understand its meaning and picture what the author was describing in Python. It was awesome!
Armed with this new knowledge, I turned to installing the libraries pandas and numpy. I went through several suggestions for installing these packages, such as using Anaconda and/or Miniconda, suggested here. It didn’t seem to work though.
Eventually I think what worked was installing the Xcode command line tools (not sure where they went on my computer), and then installing pandas with a command in the Terminal. At first I thought it wasn’t working because the test the Python book recommended (creating a plot) resulted in an error. It turned out I had just spelled the command wrong (arange, not arrange, sigh). Once I fixed that error, it worked! I got a plot! It was a very exciting day (though I can’t wait until I can actually plot my real data).
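A minimal sketch of that typo, assuming numpy is installed and imported under the usual alias np. This is not the book's exact test, and the plotting step is left out; it only shows that the one-r spelling works and the two-r spelling does not exist.

```python
import numpy as np

# np.arange(10) builds an array of the integers 0 through 9
# (think "a range", with a single r).
values = np.arange(10)

# The misspelled name is not part of numpy, so looking it up fails
# with an AttributeError before anything can be plotted.
try:
    np.arrange(10)
except AttributeError as error:
    message = str(error)
```

The error message names the attribute that could not be found, which is usually the quickest hint that the problem is a spelling mistake and not a broken install.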
After installing the libraries, I looked into setting up my Python environment. IPython was useful, but I was doing everything in the Terminal. I wanted a setup where I could save my scripts and run them line by line.
I tried using PyCharm, but I was having trouble getting it to work with IPython. I tried sending code to the Terminal from TextWrangler, but it wasn’t connecting. I revisited the Python book and saw they recommended going from a text editor to the Terminal by copy-pasting. They had some tips about how to paste whole blocks of text (if you just paste multiple lines straight, they enter the Terminal line by line). This is now the method I’m using.
So things are working out pretty well! I’m revisiting the Python course on Codecademy to remind myself of what they taught. I’m kind of amazed at how much of it overlaps with what I am currently learning. And now I can put what they are teaching into context and see how I would use it for data analysis. It’s cool!
So I feel like I’ve gotten a foot in the door and that I’m setting up the foundation for figuring out what I can do in Python. I’ve been doing a mix of studying, learning, reading, Googling, and just trying stuff out. It’s been fun! Check out my future posts for updates on the specific tasks I’ve been doing/learning!