Working with Nested Dictionaries in Python

In my first few posts, I described how to pull data from an API, convert JSON data for Python, and combine data into a table. The data I used was basic (short) user data from League of Legends including summoner ID, name, and profile icon. Very simple and not all that interesting.

Now I want to pull real game data to analyze trends in game play (and what predicts a win!).

First, I went to LoL’s API Reference page and selected the match history: https://developer.riotgames.com/api/methods#!/966/3312

I retrieved a summoner ID to enter into the summonerId field (luckily I had a handy dandy table where my IDs were nicely listed). This request returns the last 10 matches the person has played, along with game data for each one: number of assists, champion level achieved, number of deaths, which items were bought, damage taken. Everything!

When you execute the request, you get a page that looks like this (remember, you have to insert your own API key into the URL):

Match History from API


More JSON data! With much more complicated nested dictionaries…

I imported the data into Python (using the same steps I mentioned in my last post) and tried to use the same DataFrame call. This is the result I got:

pd.DataFrame(data)


O.O Each match ended up as one whole dictionary inside a single “matches” column, instead of its keys being spread across separate columns.
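A minimal sketch of why this happens, using a tiny made-up stand-in for the API response (the values, and every key other than "matches", are placeholders chosen to mirror the shape described above). pd.DataFrame turns each top-level key of a dictionary into a column, so the whole list under "matches" lands in one column, with one dictionary per row.

```python
import pandas as pd

# Hypothetical, heavily trimmed stand-in for the match history JSON.
data = {
    "matches": [
        {"matchId": 101, "participants": [{"stats": {"assists": 7}}]},
        {"matchId": 102, "participants": [{"stats": {"assists": 2}}]},
    ]
}

df = pd.DataFrame(data)

# One column named after the top-level key; each cell holds a whole match dict.
print(df.columns.tolist())  # ['matches']
print(type(df.loc[0, "matches"]))  # <class 'dict'>
```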

I found a JSON normalize function that seemed to do partly what I wanted (reference the web page discussing this option here), which got me to this:

json_normalize(data['matches'])


Close, but no cigar. Some of the columns still contain nested data.
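The reason some columns stay nested: json_normalize flattens nested dictionaries into dotted column names, but it leaves lists alone. Here is a small sketch with made-up data ("queueInfo" is an invented nested dictionary used only for illustration, not a real API field). It uses pd.json_normalize, the top-level function in pandas 1.0 and later; older versions imported it from pandas.io.json instead.

```python
import pandas as pd

# Hypothetical stand-in for data['matches']; "queueInfo" is invented.
matches = [
    {
        "matchId": 101,
        "queueInfo": {"type": "ranked"},
        "participants": [{"stats": {"assists": 7}}],
    },
    {
        "matchId": 102,
        "queueInfo": {"type": "normal"},
        "participants": [{"stats": {"assists": 2}}],
    },
]

flat = pd.json_normalize(matches)

# The nested dict became a dotted column; the list column is left as-is.
print(sorted(flat.columns))  # ['matchId', 'participants', 'queueInfo.type']
print(type(flat.loc[0, "participants"]))  # <class 'list'>
```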

Here was another promising suggestion: http://pandas.pydata.org/pandas-docs/stable/io.html#normalization. But my different attempts still didn’t work. (I’ll have to figure out why later.)

json_normalize(data, ['matches', 'participantIdentities']) got me the data within participantIdentities, but the player data was still a nested dictionary.

json_normalize(data, ['matches', 'participantIdentities'])


json_normalize(data, 'matches', ['matches', 'participantIdentities']) generated an error, even though, as far as I can tell, it matches the example.

json_normalize(data, 'matches', ['matches', 'participantIdentities'])

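A likely explanation, based on a reading of the pandas documentation rather than a confirmed diagnosis: meta paths are looked up on the object that contains the record_path, not inside each record. With record_path 'matches', the meta fields are read from the top level of data, and 'participantIdentities' lives inside each match rather than at the top level, so pandas cannot find it. One way around this is to pass data['matches'] as the input, point record_path at 'participantIdentities', and pull match-level fields in through meta. A minimal sketch with made-up data (matchId and summonerName are assumed key names; all values are placeholders):

```python
import pandas as pd

# Hypothetical, trimmed stand-in for the match history JSON.
data = {
    "matches": [
        {
            "matchId": 101,
            "participantIdentities": [
                {"participantId": 0, "player": {"summonerName": "PlayerOne"}}
            ],
        },
        {
            "matchId": 102,
            "participantIdentities": [
                {"participantId": 0, "player": {"summonerName": "PlayerOne"}}
            ],
        },
    ]
}

# Start from the list of matches: each match contributes its
# participantIdentities records, tagged with that match's matchId.
identities = pd.json_normalize(
    data["matches"], record_path="participantIdentities", meta=["matchId"]
)
print(identities[["matchId", "participantId"]])
```

Whether the nested 'player' dictionary comes out flattened into dotted columns may depend on the pandas version, so this sketch only relies on the matchId and participantId columns.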

json_normalize(data, ['matches', 'participantIdentities', ['player']]) got me the indexes I needed for the data, but no data!

json_normalize(data, ['matches', 'participantIdentities', ['player']])


So then I decided to take a different approach and figure out how to access the nested data directly. This method actually generated something usable!

First I tried to get the 'participants' key from inside 'matches'.

data['matches']['participants']


Nope! Error! It didn’t like that I used a string, even though I’ve seen plenty of examples where people seemed to access the data through a key name and it worked fine. For example, in the post from the link above, the person wrote:

    for result in data['results']:
        result[u'lat']=result[u'location'][u'lat']
        result[u'lng']=result[u'location'][u'lng']
        del result[u'location']

which made me think I could use data['matches']['participants'].

But then I started to think more about what this for loop was doing and what it was looping over: the elements of the list data['results'], one at a time. That is also what the error message, “list indices must be integers, not str”, suggests: data['matches'] is a list, so it has to be indexed with an integer, not a string. I also just noticed that the output from data['matches'] lists integer indexes, which should have been a clear giveaway for how to access the data. Silly me! So then I tried data['matches'][0].

data['matches'][0]


Hoorah! It seems to have worked! It returned the details of the first match. I swear, when I tried getting the first element of data['matches'] before, it didn’t work, but maybe I was trying data[0] instead.
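A small sketch of the difference, using made-up data shaped like the match history (the values are placeholders and matchId is an assumed key name). Indexing a list with a string raises a TypeError (Python 3 words it as "list indices must be integers or slices, not str"), while looping over the list, as in the quoted for loop, hands over one element at a time, and each element is a dictionary that can be indexed by key.

```python
# Hypothetical, trimmed stand-in for the parsed match history.
data = {
    "matches": [
        {"matchId": 101, "participants": [{"stats": {"assists": 7}}]},
        {"matchId": 102, "participants": [{"stats": {"assists": 2}}]},
    ]
}

matches = data["matches"]  # a list of match dicts, not a dict

caught = None
try:
    matches["participants"]  # indexing a list with a string fails
except TypeError as err:
    caught = err
    print("TypeError:", err)

# Looping over the list yields each element (one dict per match),
# and each of those dicts can be indexed by key.
match_ids = []
for match in matches:
    match_ids.append(match["matchId"])
print(match_ids)  # [101, 102]

# An integer index works directly on the list.
print(matches[0]["matchId"])  # 101
```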

data[0]

But why doesn’t data[0] work? Add this to the list of things to figure out later!
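A probable answer, assuming data came from json.loads as in the earlier posts: the top level of the response is a JSON object, which Python loads as a dictionary. Square brackets on a dictionary mean "give me the value stored under this key", not "give me the first item", and there is no key equal to the integer 0, so Python raises a KeyError. A sketch with a made-up, trimmed response:

```python
import json

# Hypothetical, trimmed response text standing in for the real API output.
raw = '{"matches": [{"matchId": 101}, {"matchId": 102}]}'
data = json.loads(raw)

print(type(data))  # <class 'dict'>
print(list(data.keys()))  # ['matches']

missing = None
try:
    data[0]  # looks up a key equal to the integer 0, which does not exist
except KeyError as err:
    missing = err.args[0]
    print("KeyError:", missing)

# The list lives under the 'matches' key, so index that instead.
print(data["matches"][0])  # {'matchId': 101}
```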

So then I kept playing around with the call and eventually got this:

data['matches'][0]['participants'][0]['stats']


and this:

data['matches'][0]['participants'][0]['stats']['assists']

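The same chain of lookups on a small made-up example (values are placeholders; "deaths" and "kills" are assumed key names). Each step peels off one layer, alternating between a key for a dictionary and an index for a list. The dictionary method get is a handy extra: it returns a default instead of raising a KeyError when a key is missing.

```python
# Hypothetical, trimmed stand-in for the parsed match history.
data = {
    "matches": [
        {
            "matchId": 101,
            "participants": [{"stats": {"assists": 7, "deaths": 3}}],
        }
    ]
}

# key -> index -> key -> index -> key
stats = data["matches"][0]["participants"][0]["stats"]
assists = stats["assists"]
print(assists)  # 7

# "kills" is not in this toy stats dict, so get falls back to the default.
kills = stats.get("kills", 0)
print(kills)  # 0
```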

Sweet!! I’m learning something new about calling data from dictionaries and nested dictionaries. This is pretty kickass awesome!

I also figured out one clue that tells me whether I need to access the data by index or by key.

When I call data[‘matches’], this is what I get:

data['matches']


Here the output starts with a square bracket, [, whereas the output for data['matches'][0]['participants'][0]['stats'] (pictured above) starts with a curly brace, {. I wasn’t sure at first what that meant, but the square bracket marks a list (JSON calls it an array, and Python’s json module loads it as a list), so it has to be accessed by integer index, whereas ['stats'] is a dictionary, so its values can be accessed by key. ['participants'] also starts with a square bracket, [, so I used [0] to get the dictionary inside it, even though it was the only element in ['participants']. EDIT: I reached the part in the Python Codecademy course where they get a list value stored under a key, and they used an index. So where I said array, list is the right Python term. Either way, I’m excited that I figured something out that I’m also learning!
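The bracket rule can be checked directly: json.loads turns a JSON array ([ ... ]) into a Python list and a JSON object ({ ... }) into a Python dict. The helper below, describe, is a hypothetical name written just for this sketch; it uses isinstance to report how to reach inside a value. The data is a made-up, trimmed example.

```python
import json

# JSON array -> Python list; JSON object -> Python dict.
print(type(json.loads("[1, 2, 3]")))  # <class 'list'>
print(type(json.loads('{"assists": 7}')))  # <class 'dict'>


def describe(value):
    """Hypothetical helper: say how to reach inside a parsed JSON value."""
    if isinstance(value, list):
        return "list: use an integer index, e.g. [0]"
    if isinstance(value, dict):
        return "dict: use a key, e.g. ['stats']"
    return "plain value: nothing nested inside"


# Made-up, trimmed response shaped like the match history.
data = json.loads('{"matches": [{"participants": [{"stats": {"assists": 7}}]}]}')

print(describe(data["matches"]))
print(describe(data["matches"][0]["participants"]))
print(describe(data["matches"][0]["participants"][0]["stats"]))
```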

Okay! So now I have a way of pulling data out of the nested structure in 'matches'. After that it was smooth sailing to write a for loop that combines the match history data (at this point, just the stats).

Creating data frame

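The loop itself only appears in the screenshot, so here is a minimal sketch of the same idea, not necessarily the exact code: walk through every match, take the stats dictionary of its single participant, and collect those dictionaries into a list that pd.DataFrame can turn into one row per match. The data is made up, and "deaths" is an assumed key name.

```python
import pandas as pd

# Hypothetical, trimmed stand-in for the parsed match history.
data = {
    "matches": [
        {"matchId": 101, "participants": [{"stats": {"assists": 7, "deaths": 3}}]},
        {"matchId": 102, "participants": [{"stats": {"assists": 2, "deaths": 5}}]},
    ]
}

rows = []
for match in data["matches"]:
    # Each match lists a single participant here, so [0] picks it out.
    rows.append(match["participants"][0]["stats"])

# A list of dicts becomes one row per dict, one column per key.
stats_df = pd.DataFrame(rows)
print(stats_df)
```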

Awesome!! I’m going to stop here because I have other things I need to work on, but it’s not a bad start! I still need to add the match ID to this data frame, as well as all the other match data in 'matches'. It shouldn’t be too hard, but I’ll have to play around with it another time.
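One possible way to handle that next step, sketched under assumptions: the match ID is assumed to sit under a "matchId" key on each match, and "matchDuration" is an invented example of another match-level field. The idea is to start each row from a copy of the participant's stats and then copy over every match-level field that is a plain value, skipping nested lists and dictionaries such as 'participants'.

```python
import pandas as pd

# Hypothetical stand-in data; "matchDuration" is an invented scalar field.
data = {
    "matches": [
        {
            "matchId": 101,
            "matchDuration": 1800,
            "participants": [{"stats": {"assists": 7, "deaths": 3}}],
        },
        {
            "matchId": 102,
            "matchDuration": 2100,
            "participants": [{"stats": {"assists": 2, "deaths": 5}}],
        },
    ]
}

rows = []
for match in data["matches"]:
    # Copy the stats so the parsed JSON itself is left untouched.
    row = dict(match["participants"][0]["stats"])
    # Add every match-level field that is a plain value (skip lists/dicts).
    for key, value in match.items():
        if not isinstance(value, (list, dict)):
            row[key] = value
    rows.append(row)

stats_df = pd.DataFrame(rows)
print(stats_df)
```

If a stats key ever shared a name with a match-level key, the match-level value would overwrite it in this sketch, so prefixing one set of keys would be a safer choice for real data.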

It’s funny: when I look back at what I wrote, what I should have done seems completely obvious, but it wasn’t at the time. I hope I don’t sound too noob and incompetent (I’m sure half the terms I use aren’t right), but for now I’m happy with what I’ve accomplished!
