Merging Data in R and the Power of a List

Okay, last post of the day, but I wanted to document a really cool breakthrough I made today in my understanding of lists in R.

In another one of my datasets, I assessed people’s recollection of situations in which their parents told them what to do. I collected data last quarter and this quarter, so I have two datasets for the same study. I like to start a fresh data collection each quarter: sometimes it’s good to account for which quarter you ran the study in, and it’s easier for me to keep track of a study’s progress when I focus on one quarter at a time.

First, I set my working directory, then I created variables for the names of each of my files.

setwd() and data file name
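
The original screenshot is not reproduced here, so this is only a minimal sketch of what this step might look like. The folder (a temporary directory standing in for the real project folder) and the two file names are hypothetical placeholders, not the actual ones used in the study.

```r
# Hypothetical working directory: tempdir() stands in for the real project folder
setwd(tempdir())

# One variable per quarter, each holding that quarter's file name (names are assumed)
fall14   <- "fall14.csv"
winter15 <- "winter15.csv"
```

Keeping each file name in its own variable means the names only have to be typed once, at the top of the script.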

Then I created a for loop to read each dataset into R and check its length (for a data frame, the number of columns).

for loop - Read in data and print length
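
As a rough sketch of this kind of loop (not the original code), the example below first writes two tiny made-up CSV files so that it runs on its own. The file names, column names, and the n_cols helper are all assumptions for illustration.

```r
# Toy stand-ins for the two exports (hypothetical file names and columns)
fall14   <- file.path(tempdir(), "fall14.csv")
winter15 <- file.path(tempdir(), "winter15.csv")
write.csv(data.frame(id = 1:3, age = c(19, 20, 21)), fall14, row.names = FALSE)
write.csv(data.frame(id = 4:5, age = c(22, 23)), winter15, row.names = FALSE)

# Loop over the file names; inside the loop, i is the stored file path,
# not the word "fall14" or "winter15"
n_cols <- integer(0)
for (i in c(fall14, winter15)) {
  d <- read.csv(i)
  print(length(d))               # length() of a data frame is its number of columns
  n_cols <- c(n_cols, length(d)) # keep the counts so they can be compared later
}
```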

Next I wanted to name each dataset after the variable that holds its file name. As you can see in the example above, when I call “i”, I don’t get “fall14” or “winter15”; I get the value that is stored in that variable.

So I played around with some ways to get at the variable name, and what I eventually discovered from this link is that I can name the elements when I build the list with the list function, which lets me look up each name from the list (seems similar to a dictionary, eh?).

for loop - names(datasets())
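
Here is a small sketch of the named-list idea, assuming the list is called datasets and using placeholder file names. Looping over names(datasets) gives access to both the label and the stored value.

```r
# A named list works a bit like a dictionary: name -> value
# (the file names here are hypothetical)
datasets <- list(fall14 = "fall14.csv", winter15 = "winter15.csv")

for (nm in names(datasets)) {
  # nm is the label ("fall14"); datasets[[nm]] is the value stored under it
  cat(nm, "->", datasets[[nm]], "\n")
}
```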

And in fact, I can build the list so that each element takes the name of the variable I had already defined, list(fall14 = fall14, winter15 = winter15), and obtain the same result. I like the idea of doing this because I would rather define my variables outside of the list, where it’s easier to track the variable names (and it looks cleaner). The funny thing is, I’ve used this exact call in the past when creating a new dataframe for a different project, but at the time I didn’t realize what I was doing. I love learning new things (or relearning old things/realizing how to understand things I thought I knew?)!

list(fall14 = fall14, winter15 = winter15)
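
To make the difference concrete, here is a short sketch with placeholder file names. The variables unnamed and named are introduced only for this comparison.

```r
# Variables defined up front (hypothetical file names)
fall14   <- "fall14.csv"
winter15 <- "winter15.csv"

# Without names, the list only remembers the values
unnamed <- list(fall14, winter15)
names(unnamed)   # NULL

# name = value pairs attach a label to each element
named <- list(fall14 = fall14, winter15 = winter15)
names(named)     # "fall14" "winter15"
```

On the left of each = is the label the element gets inside the list; on the right is the existing variable whose value is stored.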

After that it was easy enough to merge the data. I won’t go over the process in detail here, but I provide my full script below (I’m starting to feel like this could be a good time to start making use of that GitHub thing). Basically, I create one new variable to label which dataset each row came from, and another to distinguish the rows that are actual data from the first row of the data frame, which is a second row of variable labels that Qualtrics includes in its export. I rearrange the order of the data so my new variables are the first columns of the data frame. And then finally, I assign the new data frame a name based on the name in the list (“fall14” and “winter15”), with “data” added after it so that it doesn’t overwrite the original variable.

Full for loop - Creating mergeable datasets
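
The full script is not reproduced here, so the following is only a sketch of the steps described above, under stated assumptions. The toy files, the question columns Q1 and Q2, and the new column names quarter and is_data are all made up for illustration.

```r
# Toy exports: the first data row mimics the extra row of labels from Qualtrics
fall14   <- file.path(tempdir(), "fall14.csv")
winter15 <- file.path(tempdir(), "winter15.csv")
write.csv(data.frame(Q1 = c("How old are you?", "19", "20"),
                     Q2 = c("Describe the situation", "chores", "curfew")),
          fall14, row.names = FALSE)
write.csv(data.frame(Q1 = c("How old are you?", "22"),
                     Q2 = c("Describe the situation", "homework")),
          winter15, row.names = FALSE)

datasets <- list(fall14 = fall14, winter15 = winter15)

for (nm in names(datasets)) {
  d <- read.csv(datasets[[nm]], stringsAsFactors = FALSE)
  d$quarter <- nm                      # label which dataset each row came from
  d$is_data <- seq_len(nrow(d)) > 1    # FALSE for the label row, TRUE for responses
  new_cols  <- c("quarter", "is_data")
  d <- d[, c(new_cols, setdiff(names(d), new_cols))]  # new columns first
  assign(paste0(nm, "data"), d)        # creates fall14data and winter15data
}
```

assign() builds the object name from a string, which is what lets the loop create fall14data and winter15data without overwriting fall14 and winter15. Storing the results in a second named list would be a common alternative.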

I checked to make sure there was no difference in their column names, just in case, then merged them together and checked what the data looked like.

setdiff for discrepancy in variable names and rbind to merge rows
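
A minimal sketch of this check and merge, using two small made-up data frames in place of the real ones (the column names and the object name alldata are assumptions):

```r
# Toy stand-ins for the two prepared data frames
fall14data   <- data.frame(quarter = "fall14",   Q1 = c("19", "20"))
winter15data <- data.frame(quarter = "winter15", Q1 = "22")

# setdiff() only reports what is in its first argument but not its second,
# so check in both directions; character(0) means nothing is missing
setdiff(names(fall14data), names(winter15data))
setdiff(names(winter15data), names(fall14data))

# With matching columns, rbind() stacks the rows of one data frame under the other
alldata <- rbind(fall14data, winter15data)
str(alldata)
```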

Whew! And that’s just the first step! After this I have to clean the data, and then the real fun happens: analyzing the data for trends and testing hypotheses. Who’s ready for a good time?

