In this tutorial, we’ll learn about topic modeling and some of its applications, and then we’ll dive deep into a specific technique called Latent Dirichlet Allocation (LDA).
[…]
We can see how the algorithm created an intermediate layer of topics and learned the weights between documents and topics, and between topics and words. Documents are no longer connected directly to words, but to topics.
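As a minimal sketch of this two-layer structure (all weights below are invented for illustration), we can picture a document–topic matrix and a topic–word matrix; multiplying them recovers an approximate document–word association, with topics acting as the intermediate layer:

```python
import numpy as np

# Hypothetical weights: 2 documents, 2 topics, 4 words (values made up)
doc_topic = np.array([
    [0.9, 0.1],   # document 0 is mostly about topic 0
    [0.2, 0.8],   # document 1 is mostly about topic 1
])
topic_word = np.array([
    [0.4, 0.4, 0.1, 0.1],  # topic 0 favors the first two words
    [0.1, 0.1, 0.4, 0.4],  # topic 1 favors the last two words
])

# Documents connect to words only through topics:
doc_word = doc_topic @ topic_word
print(doc_word)
```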
[…]
Under a uniform distribution, documents would be spread evenly across the four topics:
[…]
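Concretely, with four topics a uniform baseline assigns each document a probability of 1/4 per topic. A one-line sketch, where the number of topics is the only input:

```python
n_topics = 4
uniform = [1 / n_topics] * n_topics  # [0.25, 0.25, 0.25, 0.25] for every document
```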
In the example with documents, topics, and words, we’ll have two PMFs:
[…]
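As a rough sketch of the two PMFs (the numbers below are invented), each one is simply a non-negative vector that sums to 1: one vector per document over the topics, and one vector per topic over the words:

```python
import numpy as np

# PMF 1: a document's distribution over topics (illustrative values)
doc_over_topics = np.array([0.7, 0.2, 0.05, 0.05])
# PMF 2: a topic's distribution over words (illustrative values)
topic_over_words = np.array([0.5, 0.3, 0.1, 0.1])

# Both satisfy the defining properties of a PMF
for pmf in (doc_over_topics, topic_over_words):
    assert np.all(pmf >= 0) and np.isclose(pmf.sum(), 1.0)
```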
We start with the distribution D1 of documents over topics:
[…]
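To make D1 concrete, here is a small sketch with invented probabilities; it draws a topic for a document according to its PMF over topics, the way LDA's generative process picks a topic before picking a word:

```python
import numpy as np

rng = np.random.default_rng(0)

# D1 for one document: its (made-up) distribution over four topics
d1 = np.array([0.4, 0.3, 0.2, 0.1])

# Draw a topic index according to D1
topic = rng.choice(len(d1), p=d1)
print(topic)
```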
LDA will produce, for each topic, a distribution over words. By analyzing that distribution, we can extract the most probable words for a topic and get an idea of what it is about.
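As one possible illustration (not necessarily how this article builds its model), scikit-learn's LatentDirichletAllocation exposes the learned topic–word weights in components_; sorting each row gives the top words per topic. The corpus below is a toy placeholder:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy placeholder corpus
docs = [
    "cats and dogs are pets",
    "dogs chase cats",
    "stocks and bonds are investments",
    "investors trade stocks",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Each row of components_ holds one topic's (unnormalized) word weights
words = vectorizer.get_feature_names_out()
for k, row in enumerate(lda.components_):
    top = [words[i] for i in row.argsort()[::-1][:3]]
    print(f"topic {k}: {top}")
```

Inspecting the printed words per topic is usually enough to label the topics by hand, e.g. "pets" versus "finance" in this toy corpus.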
[…]