Last quarter, I started learning how to use topic modeling to analyze open-ended data (or unstructured data, in data science terms): basically, data where participants can say anything they want in response to a prompt I give them.
I’ll write a post about what I did back then another time, but first I wanted to note something new I learned about topic modeling today!
Today I wanted to revisit my data and, in the process, learn more about topic modeling itself. So far I’ve only used it in a narrow sense: how to run it in R. Today I wanted to take a step back and understand what topic modeling is doing theoretically.
One of the first results that popped up for “topic modeling” was actually a link from Stanford! So of course I had to check it out.
It’s really useful! They use Scala, so I want to see if I can take their advice and apply it in R (or Python?). But I really liked their explanations and tutorials. They discuss how to test the model and how to modify the parameters to create a better model. It’s neat! I want to try this tomorrow!
I realize now this wasn’t a very useful blog post, so let’s think of it as being a post-it note for useful information that will be revisited in the near future.
Okay fineee, here’s the real reason why I wanted to make this post, to discuss this point:
At the time I read this point, I thought it was about refining the number of topics, but now I realize it’s not about the number of topics at all. It’s about the number of iterations over the data. So I was super excited by this piece of advice, but now my excitement level has dropped about two notches. Still useful, but not what I was looking for.
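To make the difference between those two knobs concrete, here is a toy sketch in Python. It is not the Stanford tutorial’s code (theirs is in Scala), and the function name `fit_lda`, its parameters, and the tiny example documents are all made up for illustration. It assumes a basic collapsed Gibbs sampler for LDA, where `num_topics` sets how many topics the model has and `num_iterations` sets how many passes the sampler makes over the data.

```python
import random


def fit_lda(docs, num_topics, num_iterations, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative only, not optimized).

    docs is a list of tokenized documents (lists of words).
    Returns (topic_word, vocab): per-topic word counts and the vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables that the sampler keeps up to date.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start by assigning every word to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for w in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    # num_iterations = how many full passes over the data the sampler makes.
    for _ in range(num_iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the word's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Weight each topic by how well it fits this document and word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    return topic_word, vocab


# Hypothetical mini-corpus, just to show the two parameters side by side.
docs = [
    ["cat", "dog", "cat"],
    ["stock", "market", "stock"],
    ["dog", "cat"],
    ["market", "stock"],
]
for k in (2, 3):
    topic_word, vocab = fit_lda(docs, num_topics=k, num_iterations=50)
    print(k, "topics ->", len(topic_word), "rows of word counts")
```

Changing `num_topics` changes the shape of the model (one row of word counts per topic), while changing `num_iterations` only changes how long the same model keeps refining its assignments. That is why advice about one does not carry over to the other.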
However, later on, the tutorial does mention refining the number of topics, but their sentence gets cut off:
Number of topics
“…has started to decrease at a” …a what?! a what?! Don’t leave me hanging here…
Sorry folks, you’ll just have to be left hanging! If I find out more info, I will update this post or make a new one. Until next time!