Author Archives: Alyssa Fu Ward

About Alyssa Fu Ward

I am a Social Psychology PhD Candidate who is interested in leveraging my data analysis skills in a data science world. As I have been working to gain more skills, I want to document and share what I have been learning!

Mixed Models and R

Check out this webpage for a thorough overview of running mixed models in R. I wanted to pull out a few pieces of information from this article that I found useful. (If you aren’t familiar with mixed models, the following may not be too meaningful for you.)

Nested vs. Crossed Random Effects

“Before you proceed, you will also want to think about the structure of your random effects. Are your random effects nested or crossed? In the case of my study, the random effects are nested, because each observer recorded a certain number of trials, and no two observers recorded the same trial, so here Test.ID is nested within Observer. But say I had collected wasps that clustered into five different genetic lineages. The ‘genetics’ random effect would have nothing to do with observer or arena; it would be orthogonal to these other two random effects. Therefore this random effect would be crossed to the others.”
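One way to check the nested-versus-crossed question against actual data is to count how many levels of one factor each level of the other appears under. This is only a minimal sketch in Python: the rows, the `Lineage` column, and the helper name `is_nested` are made up for illustration (only `Test.ID` and `Observer` come from the quote above).

```python
from collections import defaultdict

def is_nested(rows, inner, outer):
    """Return True if every level of `inner` occurs under exactly one level of `outer`."""
    outer_levels_seen = defaultdict(set)
    for row in rows:
        outer_levels_seen[row[inner]].add(row[outer])
    return all(len(levels) == 1 for levels in outer_levels_seen.values())

# Hypothetical data: each trial belongs to one observer,
# but each genetic lineage shows up under both observers.
trials = [
    {"Test.ID": 1, "Observer": "A", "Lineage": "L1"},
    {"Test.ID": 2, "Observer": "A", "Lineage": "L2"},
    {"Test.ID": 3, "Observer": "B", "Lineage": "L1"},
    {"Test.ID": 4, "Observer": "B", "Lineage": "L2"},
]

print(is_nested(trials, "Test.ID", "Observer"))  # True: Test.ID is nested within Observer
print(is_nested(trials, "Lineage", "Observer"))  # False: Lineage is not nested within Observer
```

A `False` here only says the factor is not nested; whether it is fully or partially crossed depends on how many combinations actually occur.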

Identifying the Probability Distribution that Fits the Data

The author of the page plotted the data along various types of distributions (e.g., binomial, Poisson, gamma, log-normal).

“The y axis represents the observations and the x axis represents the quantiles modeled by the distribution. The solid red line represents a perfect distribution fit and the dashed red lines are the confidence intervals of the perfect distribution fit. You want to pick the distribution for which the largest number of observations falls between the dashed lines. In this case, that’s the lognormal distribution, in which only one observation falls outside the dashed lines. Now, armed with the knowledge of which probability distribution fits best, I can try fitting a model.”

Failure to Converge

I often encountered the error “failure to converge” when running mixed models. This article describes what now seems like an obvious way to deal with the failure to converge – systematically drop effects from the model and compare the performance. I am appreciative of how much I’ve learned and grown in my statistics knowledge because of my exposure to data science over the last year and a half.

“There is one complication you might face when fitting a linear mixed model. R may throw you a “failure to converge” error, which usually is phrased “iteration limit reached without convergence.” That means your model has too many factors and not a big enough sample size, and cannot be fit. Unfortunately, I don’t have any data that actually fail to converge on a model that I can show you, but let’s pretend that last model didn’t converge. What you should then do is drop fixed effects and random effects from the model and compare to see which fits the best. Drop fixed effects and random effects one at a time. Hold the fixed effects constant and drop random effects one at a time and find what works best. Then hold random effects constant and drop fixed effects one at a time. Here I have only one random effect, but I’ll show you by example with fixed effects.”

———

This article goes through more of the “math” of mixed models. I’m putting it here for now so I can look through it in more detail later.

Iteration vs. recursion

A few days ago, I showed a fellow burgeoning data scientist my code. What he saw made him gasp in horror. “You use for loops?! You shouldn’t use for loops…,” he said. I was a little surprised. Learning for loops was one of those breakthroughs in coding for me. But I could see what he was saying. I’ve been looking at a lot of Python code recently, and very rarely do I see for loops. I see a lot of defined functions, though. It’s overwhelming to see other programmers doing so much of something that I don’t yet do.

So I took to the Googles to investigate what’s the deal with for loops.

My first search yielded results on how to avoid for loops in R. This first result was interesting: it details how to use ifelse() statements that work on vectors instead of nested if-statements. But I already knew about this statement and it doesn’t always work for what I need to do.

But that result linked to another article that described the apply() family of functions and how to use them to apply a function across vectors. I’ve used apply() once or twice (without really understanding what it did). But what shocked me was that I’ve used its sister function, tapply(), as one of my base functions when taking summary statistics, and I had never realized this is what it was doing. Mind blown. So that’s something I can use in the future.

But I still didn’t get a comfortable answer to why I shouldn’t use for loops and when I should use functions (recursive functions, it turns out) instead.

Then I read this response to a question similar to the one I am pursuing, in which they basically say “Recursive functions are perfect for tree structures. Loops are perfect for iterations and sequences.” I had to look up tree structures, which are basically nested structures with different branches of information. I’ve definitely used for loops and if statements to iterate through a tree structure.
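To make the tree idea concrete, here is a small Python sketch that adds up the values in a nested structure two ways. The tree itself and the function names are invented for illustration; the point is only that the recursive version mirrors the shape of the data, while the loop version has to keep its own list of nodes still to visit.

```python
# Hypothetical tree: each node is a dict with a value and a list of child nodes.
tree = {
    "value": 1,
    "children": [
        {"value": 2, "children": [{"value": 4, "children": []}]},
        {"value": 3, "children": []},
    ],
}

def sum_recursive(node):
    """Add this node's value to the totals of its subtrees.
    A node with no children is the base case: nothing more to recurse into."""
    return node["value"] + sum(sum_recursive(child) for child in node["children"])

def sum_iterative(root):
    """Same total, using a loop and an explicit stack of nodes still to visit."""
    total = 0
    stack = [root]
    while stack:  # keep looping until there is nothing left to visit
        node = stack.pop()
        total += node["value"]
        stack.extend(node["children"])
    return total

print(sum_recursive(tree), sum_iterative(tree))  # 10 10
```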

I then started trying to really understand what iterations and sequences are, so I could define use cases for for loops versus recursive functions. I found this useful article, which I liked for how simply it explains the ideas. Recursion terminates when it reaches a base case, while iteration terminates when the loop-continuation condition fails. It’s a little confusing, because this says that recursion is memory-intensive (which I remember now from a brief little experiment I did with recursive functions a few months ago), but something else I read said using functions instead of for loops reduces memory. I suppose it may depend on the type of function??
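The two ways of terminating can be put side by side in a short Python sketch. Factorial is just a stand-in example chosen here, not something taken from the article. Both versions are functions; one calls itself until it hits the base case, and the other wraps an ordinary loop.

```python
def factorial_recursive(n):
    """Recursive version: stops when it reaches the base case."""
    if n <= 1:  # base case
        return 1
    return n * factorial_recursive(n - 1)

def factorial_iterative(n):
    """Iterative version: stops when the loop-continuation condition fails."""
    result = 1
    while n > 1:  # loop-continuation condition
        result *= n
        n -= 1
    return result

print(factorial_recursive(5), factorial_iterative(5))  # 120 120
```

On the memory point: in Python each recursive call adds a frame to the call stack, so the recursive version uses more memory as n grows and will eventually hit the interpreter's recursion limit, while the loop version reuses the same two variables throughout.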

Hm, now I’m starting to wonder if iteration does not necessarily mean for loop, since you must be able to have iterative functions?

I guess I’m still confused about when to write functions versus write for loops and if that’s even a reasonable comparison to make.

Fun with Fourier Transforms

I’ve started reading Hacker News, and this PDF on Fourier Transforms came across the feed. It’s a really interesting introduction to Fourier Transforms with some added information on how to create a signal processing device on the side.

Back in the day, I took a speech perception class where we analyzed speech frequencies. For all I know I learned about Fourier Transforms then, but I don’t remember now. Either way, reading about sine waves has never made me so excited!

Linear Mixed Effects Analyses Tutorial (in R)

One of my datasets requires linear mixed-model regression analyses, so I was reading up on exactly how the analyses are done and what they mean. I found this useful-looking tutorial that walks through several examples of mixed effects, as well as how to run them in R.

Here’s a graph of individual subjects, grouped by gender, and the distribution of their voice pitch.

Pitch of male and female voices

To take into account the individual variation in each subject’s voice pitch, run pitch ~ politeness + sex + (1 | subject) + error, where (1 | subject) tells the model to assume a different intercept for each subject (a random intercept). In R’s formula syntax the error term is implicit, so you don’t actually type it.
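Written out as an equation, the random-intercept model looks roughly like this. The subscripts and symbol names are my own notation, not the tutorial's: i indexes observations and j indexes subjects.

```latex
\text{pitch}_{ij} = \beta_0 + \beta_1\,\text{politeness}_{ij} + \beta_2\,\text{sex}_{j} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here u_j is the subject-specific shift in the intercept that (1 | subject) asks for, and ε_ij is the leftover error.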

PS. I love box plots!

Creating HTML5 Slides in R Markdown

OMG. My mind is literally exploding right now. This is starting to become my default state. (I like it.)

I was searching for how to change the font size in the HTML output of an R Markdown file (knit to HTML) when I came across this page by Yihui (a name that is becoming extremely familiar as I spend more and more time on Stack Overflow).

How to Make HTML5 Slides in R Markdown

He basically walks through how to create HTML5 slides in R Markdown. They are BEAUTIFUL. I am so floored.

Here is a link to the slides. Just use your left and right arrow keys to navigate through them.

HTML5 slides through R Markdown

I would love love love to be able to learn how to do this, but I think for my kind of presentations, it wouldn’t make sense to make them in HTML5. It’s a shame. It would be so mind-blowingly cool!!

Now to return to my current problem… how to change this pesky font size…

Resizing plots in R Markdown

I made a lot of progress on one of my datasets today. It’s a 2 x 2 x 2 study, so it requires a fair amount of thinking about the best way to plot the data.

Lately I have been writing up my code in an R script, then when I’m happy with it, I plug it into R Markdown so I can see all the graphs at once.

When I plotted my 3-way interaction graphs, the group labels on the x-axis were squished together because the default plot size was too small. So I looked up how to change the plot size in R Markdown and found this useful Stack Overflow response.

Plot size in R Markdown

So I tried it and voila! My plots look beautiful in R Markdown.

Plots in R Markdown

K-Means Clustering

I want to use k-means clustering for one of my studies, so in this post, I gather useful-looking links to learn how to do it!

EDIT: I made pretty good progress on my k-means clustering! Here’s a little preview to give you an idea of what I found:

kmeans clustering
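For my own reference, the basic idea of k-means can be sketched in a few lines of plain Python. This is not how R's kmeans() is implemented (its default is the Hartigan-Wong algorithm); it is the simple assign-then-update loop usually called Lloyd's algorithm, and the function name, parameters, and toy data are all made up for illustration.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means sketch: points is a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen data points
    labels = None
    for _ in range(iterations):
        # Assignment step: give each point the index of its nearest center
        # (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))
            for point in points
        ]
        if new_labels == labels:  # nothing changed, so the clusters are stable
            break
        labels = new_labels
        # Update step: move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # leave a center where it is if no points were assigned to it
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers

# Hypothetical toy data: two obvious groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(data, k=2)
print(labels)
print(centers)
```

The result depends on the random starting centers, which is one reason real implementations let you run several random starts and keep the best one.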

Useful Information on K-Means Clustering

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html
R documentation for kmeans

kmeans {stats}

http://www.r-bloggers.com/k-means-clustering-is-not-a-free-lunch/
When k-means may not work but how to work around it

K-means clustering is not a free lunch

http://www.rdatamining.com/examples/kmeans-clustering
Simple, easy example to follow for how to use k-means clustering

k-means Clustering

http://www.r-statistics.com/2013/08/k-means-clustering-from-r-in-action/
How to determine number of clusters

K-means Clustering

http://www.statmethods.net/advstats/cluster.html
Simple reference for how to k-means cluster

Cluster Analysis

https://rstudio-pubs-static.s3.amazonaws.com/33876_1d7794d9a86647ca90c4f182df93f0e8.html
Walks through several examples

Cluster Analysis in R

http://www.improvedoutcomes.com/docs/WebSiteDocs/Clustering/K-Means_Clustering_Overview.htm
To-the-point overview of clustering: pros and cons

Overview of Clustering

http://www.norusis.com/pdf/SPC_v13.pdf
Chapter on kmeans clustering – Useful discussion on determining variables

Cluster Analysis Chapter

http://stats.stackexchange.com/questions/31083/how-to-produce-a-pretty-plot-of-the-results-of-k-means-cluster-analysis
Plotting pairwise scatterplots of clusters

Pairwise scatter plots of clusters
