arXiv Open Access 2016

Hierarchical Latent Word Clustering

Halid Ziya Yerebakan, Fitsum Reda, Yiqiang Zhan, Yoshihisa Shinagawa
Abstract

This paper presents a new Bayesian non-parametric model that extends Hierarchical Dirichlet Allocation to extract tree-structured word clusters from text data. The model's inference algorithm groups words into a cluster if they share a similar distribution over documents. In our experiments, we observed meaningful hierarchical structures on the NIPS corpus and on radiology reports collected from public repositories.
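
To make the clustering criterion concrete, the following is a minimal sketch of grouping words by how similarly they are distributed over documents. It is not the paper's Bayesian non-parametric inference: it substitutes a plain greedy agglomerative merge with an L1 distance, and the function names (`word_doc_distributions`, `build_tree`, `leaves`) and the toy corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def word_doc_distributions(docs):
    """Map each word to its normalized distribution over documents."""
    counts = {}
    for d, doc in enumerate(docs):
        for word, n in Counter(doc.split()).items():
            counts.setdefault(word, [0.0] * len(docs))[d] += n
    return {w: [c / sum(v) for c in v] for w, v in counts.items()}


def l1(p, q):
    """L1 distance between two distributions of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def build_tree(dists):
    """Greedy agglomerative merge (a stand-in, NOT the paper's inference).

    Returns a nested tuple whose leaves are words.
    """
    # Each cluster is (subtree, mean distribution, number of words).
    clusters = [(w, p, 1) for w, p in sorted(dists.items())]
    while len(clusters) > 1:
        # Pick the pair of clusters with the closest mean distributions.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: l1(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        (ta, pa, na), (tb, pb, nb) = clusters[i], clusters[j]
        mean = [(a * na + b * nb) / (na + nb) for a, b in zip(pa, pb)]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((ta, tb), mean, na + nb)]
    return clusters[0][0]


def leaves(tree):
    """Flatten a nested tuple tree into its list of words."""
    if isinstance(tree, str):
        return [tree]
    return [w for sub in tree for w in leaves(sub)]


# Toy corpus (hypothetical): two radiology-like and two ML-like documents.
docs = [
    "lung nodule lung scan",
    "nodule scan lung",
    "kernel gradient kernel",
    "gradient kernel",
]
tree = build_tree(word_doc_distributions(docs))
print(tree)
```

On this toy corpus the top-level split of the tree separates the words that occur only in the first two documents from those that occur only in the last two, which is the behavior the abstract describes at a much smaller scale.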

Authors (4)

Halid Ziya Yerebakan
Fitsum Reda
Yiqiang Zhan
Yoshihisa Shinagawa

Citation Format

Yerebakan, H.Z., Reda, F., Zhan, Y., Shinagawa, Y. (2016). Hierarchical Latent Word Clustering. https://arxiv.org/abs/1601.05472

Journal Information
Publication Year: 2016
Language: en
Source Database: arXiv
Access: Open Access