Category Archives: Information

Linear Mixed Effects Analyses Tutorial (in R)

One of my datasets requires linear mixed effects regression analyses, so I was reading up on exactly how the analyses are done and what they mean. I found this useful-looking tutorial that walks through several examples of mixed effects models, as well as how to run them in R.

Here’s a graph of individual subjects, grouped by gender, and the distribution of their voice pitch.

Pitch of male and female voices

To take into account the individual variation in each subject’s voice pitch, fit the model pitch ~ politeness + sex + (1 | subject) + error, where (1 | subject) indicates the assumption that the intercept is different for each subject. (In R, the error term is implicit, so it is left out of the formula you actually type.)
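
For anyone following along in R, here is a minimal sketch of fitting that model. It assumes the lme4 package is installed, and the data are simulated purely for illustration: the column names (pitch, politeness, sex, subject) mirror the formula above, but the numbers are made up and are not from the tutorial.

```r
library(lme4)  # assumption: lme4 is installed (install.packages("lme4"))

set.seed(1)

# Simulated, made-up data: 6 subjects, 10 recordings per politeness condition
d <- expand.grid(rep = 1:10,
                 politeness = c("informal", "polite"),
                 subject = paste0("S", 1:6))
d$sex <- ifelse(d$subject %in% c("S1", "S2", "S3"), "F", "M")

# Each subject gets their own baseline shift; this is what (1 | subject) models
subject_shift <- rnorm(6, mean = 0, sd = 20)
d$pitch <- 220 -
  80 * (d$sex == "M") -
  15 * (d$politeness == "polite") +
  subject_shift[as.integer(d$subject)] +
  rnorm(nrow(d), mean = 0, sd = 10)

# Random intercept per subject; the residual error term is implicit
fit <- lmer(pitch ~ politeness + sex + (1 | subject), data = d)

fixef(fit)          # overall effects of politeness and sex
ranef(fit)$subject  # how far each subject's intercept sits from the overall one
```

The fixed effects describe the average shift in pitch for politeness and sex, while the per-subject values show the individual variation the random intercept soaks up.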

PS. I love box plots!

Creating HTML5 Slides in R Markdown

OMG. My mind is literally exploding right now. This is starting to become my default state. (I like it.)

I was searching for how to change the font size in the HTML output of an R Markdown file (knit to HTML) when I came across this page by Yihui (a name that is becoming extremely familiar as I spend more and more time on Stack Overflow).

How to Make HTML5 Slides in R Markdown

He basically walks through how to create HTML5 slides in R Markdown. They are BEAUTIFUL. I am so floored.

Here is a link to the slides. Just use your left and right arrow keys to navigate through them.

HTML5 slides through R Markdown

I would love love love to be able to learn how to do this, but I think for my kind of presentations, it wouldn’t make sense to make them in HTML5. It’s a shame. It would be so mind-blowingly cool!!

Now to return to my current problem… how to change this pesky font size…

Resizing plots in R Markdown

I made a lot of progress on one of my datasets today. It’s a 2 x 2 x 2 study, so it requires a fair amount of thinking about the best way to plot the data.

Lately I have been writing up my code in an R script, then when I’m happy with it, I plug it into R Markdown so I can see all the graphs at once.

When I plotted my 3-way interaction graphs, the group labels on the x-axis got squished together because the default plot size was too small. So I looked up how to change the plot size in R Markdown and found this useful Stack Overflow answer.

Plot size in R Markdown

So I tried it and voila! My plots look beautiful in R Markdown.

Plots in R Markdown
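
For anyone wanting to try the same thing, here is a minimal sketch of one common way to set plot size in R Markdown. It assumes knitr's fig.width and fig.height chunk options (in inches), and the numbers are just example values; the linked answer may well use a different approach.

```r
# Put this in a setup chunk near the top of the .Rmd file.
# Assumption: the knitr package is installed (it is what knits R Markdown).
knitr::opts_chunk$set(
  fig.width  = 10,  # width of every plot, in inches (example value)
  fig.height = 6    # height of every plot, in inches (example value)
)

# To resize only one plot, the same options can go in that chunk's header
# instead, e.g. fig.width = 10, fig.height = 6 after the chunk's "r".
```

Setting the options once in a setup chunk keeps every plot in the document the same size, which is handy when the goal is to see all the graphs at once.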

K-Means Clustering

I want to use k-means clustering for one of my studies, so in this post, I gather useful-looking links to learn how to do it!

EDIT: I made pretty good progress on my k-means clustering! Here’s a little preview to give you an idea of what I found:

kmeans clustering
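
As a rough idea of what the basic call looks like, here is a minimal sketch using the kmeans function from base R. The built-in iris measurements are only a stand-in for real study data, and the choice of 3 clusters is an assumption made for the example.

```r
set.seed(42)  # k-means starts from random centers, so fix the seed

# Stand-in data: the four numeric iris columns, standardized so that
# no single variable dominates the distance calculation
x <- scale(iris[, 1:4])

# 3 clusters (an assumption for this example), best of 25 random starts
km <- kmeans(x, centers = 3, nstart = 25)

km$size     # how many rows landed in each cluster
km$centers  # cluster means, in standardized units

# Compare the clusters with the known species labels
table(cluster = km$cluster, species = iris$Species)
```

Using nstart greater than 1 reruns the algorithm from several random starting points and keeps the best result, which makes the solution less dependent on a lucky or unlucky start.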

Useful Information on K-Means Clustering

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html
R documentation for kmeans

kmeans {stats}

http://www.r-bloggers.com/k-means-clustering-is-not-a-free-lunch/
When k-means may not work, and how to work around it

K-means clustering is not a free lunch

http://www.rdatamining.com/examples/kmeans-clustering
Simple, easy-to-follow example of how to use k-means clustering

k-means Clustering

http://www.r-statistics.com/2013/08/k-means-clustering-from-r-in-action/
How to determine the number of clusters

K-means Clustering

http://www.statmethods.net/advstats/cluster.html
Simple reference for how to k-means cluster

Cluster Analysis

https://rstudio-pubs-static.s3.amazonaws.com/33876_1d7794d9a86647ca90c4f182df93f0e8.html
Walks through several examples

Cluster Analysis in R

http://www.improvedoutcomes.com/docs/WebSiteDocs/Clustering/K-Means_Clustering_Overview.htm
To-the-point overview of clustering: pros and cons

Overview of Clustering

http://www.norusis.com/pdf/SPC_v13.pdf
Chapter on kmeans clustering – Useful discussion on determining variables

Cluster Analysis Chapter

http://stats.stackexchange.com/questions/31083/how-to-produce-a-pretty-plot-of-the-results-of-k-means-cluster-analysis
Plotting pairwise scatterplots of clusters

Pairwise scatter plots of clusters
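
Two of the ideas in the links above, choosing the number of clusters and plotting pairwise scatterplots, can be sketched in base R. This is one common approach (an "elbow" plot of the total within-cluster sum of squares), not necessarily the exact method any of the linked pages uses, and iris is again only stand-in data.

```r
set.seed(42)
x <- scale(iris[, 1:4])  # stand-in data, standardized

# Total within-cluster sum of squares for k = 1 to 8
wss <- sapply(1:8, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

# Look for the "elbow": the k after which adding clusters helps much less
plot(1:8, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Pairwise scatterplots of every variable pair, colored by cluster
# (3 clusters is an assumption for this example)
km <- kmeans(x, centers = 3, nstart = 25)
pairs(x, col = km$cluster, pch = 19)
```

The elbow plot is a judgment call rather than a test, so it is worth checking whether the chosen clusters also make sense in the pairwise plots.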

From Cricket to Down the Rabbit Hole

I saw this blog post in my WordPress Reader and was intrigued. The post itself ended up being about web scraping, which is one of the things I currently want to learn how to do! (Check out a future post for where those adventures took me.) But where did cricket come in?

Cricket in R

I followed the link from this post to the next blog post for details on the R analysis of cricket, which ended in an interesting graph of something to do with cricket…

Cricket Graph

But what was really interesting was the next blog post linked from that one.

50 R functions to clear a basic interview

Bingo! Jackpot. Just what I need as an R user trying to get a job using R. And the applicant in the picture is a woman! How about that.

On another interesting note, the person who wrote this article is my one follower. O.O Maybe I should change my WordPress name back to datasciencefu…

Anyway, you never know what you’ll find when you play the WordPress tag. I learned about a new package in R to scrape web data and a useful resource for what to know about R in an interview! Not a bad start for a Thursday.

PS. I realize now that all the blog posts were written by that person, so I think I was just following his posts. Clearly I’m still learning new things about this newfangled WordPress thing.

GitHub for Beginners: Don’t Get Scared, Get Started

Really useful article on what GitHub is, how it can be used, and why any “knowledge worker” would benefit. Here are some highlights:

GitHub as a Social Network Tool

GitHub allows you to be connected to other people who are working on programming projects. Instead of sharing pictures of your cat, you’re sharing projects and code.

GitHub Claims No Property Rights Over Your Material

Here, you can read it for yourself: “We claim no intellectual property rights over the material you provide to the Service. Your profile and materials uploaded remain yours.” Cool!

Um, and that’s basically it! The rest of the article goes on to explain the details of setting up your GitHub site, so I’ll probably check that out another time. For now, the shift from thinking of GitHub as a site to store code to a social networking site actually helps me picture better how I can use GitHub. What can I start posting already?

Stanford Topic Modeling Tool Box

Last quarter, I started learning how to use topic modeling to analyze open-ended data (or unstructured data, in data science terms). Basically, this is data where participants can say anything they want in response to a prompt I give them.

Another time I’ll write a post about what I did at that time, but first I wanted to note something new I learned related to topic modeling today!

Today I wanted to revisit my data with topic modeling and in the process learn more about topic modeling. So far I’ve used topic modeling in a narrow sense: how to use it in R. Today I wanted to take a step back to understand what topic modeling is doing theoretically.

One of the first results that popped up for “topic modeling” was actually a link by Stanford! So of course I had to check it out.

It’s really useful! They use Scala, so I want to see if I can take their advice and apply it to R (or Python?). But I really liked their explanations and tutorials. They discuss how to test the model and how to modify the parameters to create a better model. It’s neat! I want to try this tomorrow!

I realize now this wasn’t a very useful blog post, so let’s think of it as being a post-it note for useful information that will be revisited in the near future.

Okay fineee, here’s the real reason why I wanted to make this post, to discuss this point:

Model Convergence

At the time I read this point, I thought it was about refining the number of topics, but now I realize it’s not about topics, but iterations on the data. So I was super excited by this piece of advice, but now my excitement level has dropped about two notches. Still useful, but not what I was looking for.
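
To make the iterations-versus-topics distinction concrete, here is a rough sketch in R. It swaps in the topicmodels package (assumed installed) and its bundled AssociatedPress example corpus rather than the Stanford toolbox, so the settings shown (iter, k, held-out perplexity) belong to that package and are not necessarily what the tutorial uses.

```r
library(topicmodels)  # assumption: topicmodels is installed

# Example corpus shipped with the package (a document-term matrix)
data("AssociatedPress", package = "topicmodels")
train    <- AssociatedPress[1:100, ]    # documents used to fit the models
held_out <- AssociatedPress[101:150, ]  # documents kept aside for checking

# Knob 1: iterations. The number of topics stays fixed at 4;
# only how long the Gibbs sampler runs changes.
iters <- c(10, 100, 500)
loglik_by_iter <- sapply(iters, function(n) {
  m <- LDA(train, k = 4, method = "Gibbs",
           control = list(seed = 1, iter = n))
  as.numeric(logLik(m))
})
data.frame(iterations = iters, log_likelihood = loglik_by_iter)

# Knob 2: number of topics. The fitting settings stay the same;
# only k changes, and each model is scored on the held-out documents.
ks <- c(2, 4, 8)
perplexity_by_k <- sapply(ks, function(k) {
  m <- LDA(train, k = k, control = list(seed = 1))
  perplexity(m, newdata = held_out)
})
data.frame(topics = ks, held_out_perplexity = perplexity_by_k)
```

The first table is about whether a single model has been run long enough, while the second compares different models; lower held-out perplexity is generally read as a better fit.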

However, later on, the tutorial does mention refining the number of topics, but their sentence gets cut off:

Number of topics

“…has started to decrease at a” …a what?! a what?! Don’t leave me hanging here…

Sorry folks, you’ll just have to be left hanging! If I find out more info, I will update this post or make a new one. Until next time!