12/13/2011

LDA Collocation modeling


Most popular topic models (such as Latent Dirichlet Allocation) rest on a bag-of-words assumption. Text, however, is a sequence of discrete word tokens, and ignoring word order (in other words, the nearby context in which a word occurs) means that word co-occurrence counts alone cannot fully capture the meaning of language.
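As a concrete (hypothetical) illustration, two sentences with opposite meanings reduce to exactly the same bag of words:

from collections import Counter

# Word order distinguishes these sentences; their bags of words do not.
s1 = "the dog bit the man".split()
s2 = "the man bit the dog".split()
print(Counter(s1) == Counter(s2))  # True: the bag-of-words view conflates them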

Xuerui Wang and Andrew McCallum, "A Note on Topical N-grams" (UMass tech report): http://www.cs.umass.edu/~mccallum/papers/tng-tr05.pdf
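Below is a minimal sketch of the generative process of the LDA Collocation model discussed in the linked report. The sizes, hyperparameter values, and names (phi, sigma, psi, generate_doc) are illustrative assumptions of mine, not taken from the paper; the point is only the per-token switch x that lets a word continue a collocation instead of being drawn bag-of-words style from a topic.

import numpy as np

rng = np.random.default_rng(0)

# Toy sizes and hyperparameters (assumed, not from the paper).
V, K, DOC_LEN = 1000, 20, 50
alpha, beta, gamma, delta = 0.1, 0.01, 0.5, 0.01

phi   = rng.dirichlet(np.full(V, beta), size=K)   # topic -> word distributions
sigma = rng.dirichlet(np.full(V, delta), size=V)  # prev word -> next word (collocation)
psi   = rng.beta(gamma, gamma, size=V)            # P(continue a collocation | prev word)

def generate_doc():
    """Generative sketch: each token is either a topic draw
    (bag-of-words move) or a collocation continuation of the
    previous word, chosen by the bigram switch x."""
    theta = rng.dirichlet(np.full(K, alpha))      # document's topic mixture
    doc, prev = [], None
    for _ in range(DOC_LEN):
        # x = True: this token forms a collocation with the previous word.
        x = prev is not None and rng.random() < psi[prev]
        if x:
            w = rng.choice(V, p=sigma[prev])      # bigram draw, order-sensitive
        else:
            z = rng.choice(K, p=theta)            # topic draw, order-blind
            w = rng.choice(V, p=phi[z])
        doc.append(w)
        prev = w
    return doc

print(generate_doc()[:10])

The contrast with plain LDA is the switch x: whenever it fires, the word comes from the previous word's collocation distribution rather than from a topic, so word order enters the model. The topical n-grams model in the linked report goes a step further by making that bigram distribution topic-dependent as well.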
