This book investigates the distortion-rate ratio in the context of multi-document summarization, which is treated as a data compression task. The optimal tree pruning algorithm introduced by Breiman et al. and extended by Chou et al. is adapted to multi-document summarization. The main issue in the multi-document task is redundancy: the input documents discuss similar topics and therefore contain repeated information about them. To keep repeated information out of the summary, the redundant information has to be detected and eliminated. A hierarchical agglomerative clustering algorithm is used to detect redundancy in the documents. This algorithm was chosen because it yields a binary tree, which is then used by the optimal tree pruning algorithm.
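The pipeline described above (cluster sentences into a binary tree, then prune the tree under a distortion-rate trade-off) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the book: sentences are represented as bag-of-words vectors, clusters are merged by the cosine similarity of their summed vectors, distortion is the summed cosine distance of a cluster's sentences to one representative sentence, rate is the number of sentences kept, and the pruning is a simple bottom-up Lagrangian rule for a fixed trade-off parameter `lam` rather than the full algorithm of Breiman et al. and Chou et al. All function names are hypothetical.

```python
import math
from collections import Counter

def bow(text):
    # Bag-of-words vector (assumed representation, not the book's).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class Node:
    def __init__(self, members, vec, left=None, right=None):
        self.members = members  # indices of the sentences in this cluster
        self.vec = vec          # summed bag-of-words vector of the cluster
        self.left = left
        self.right = right

def build_tree(vectors):
    # Agglomerative clustering: repeatedly merge the two most similar
    # clusters, which produces a binary tree over the sentences.
    clusters = [Node([i], v) for i, v in enumerate(vectors)]
    while len(clusters) > 1:
        i, j = max(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda p: cosine(clusters[p[0]].vec, clusters[p[1]].vec),
        )
        left, right = clusters[i], clusters[j]
        merged = Node(left.members + right.members, left.vec + right.vec, left, right)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]

def represent(node, vectors):
    # Representative = member closest to the cluster's summed vector;
    # distortion = summed cosine distance of all members to it.
    rep = max(node.members, key=lambda m: cosine(vectors[m], node.vec))
    distortion = sum(1.0 - cosine(vectors[m], vectors[rep]) for m in node.members)
    return rep, distortion

def prune(node, vectors, lam):
    # Bottom-up rule: collapse a subtree into one representative sentence
    # when distortion + lam * rate is no worse than keeping both children.
    rep, distortion = represent(node, vectors)
    leaf_cost = distortion + lam * 1  # rate of a collapsed node: one sentence
    if node.left is None:
        return leaf_cost, [rep]
    left_cost, left_reps = prune(node.left, vectors, lam)
    right_cost, right_reps = prune(node.right, vectors, lam)
    if leaf_cost <= left_cost + right_cost:
        return leaf_cost, [rep]
    return left_cost + right_cost, left_reps + right_reps

def summarize(sentences, lam):
    vectors = [bow(s) for s in sentences]
    root = build_tree(vectors)
    _, reps = prune(root, vectors, lam)
    return [sentences[i] for i in sorted(reps)]

sentences = [
    "the storm hit the coast on monday",
    "the storm hit the coast monday morning",
    "officials opened shelters for residents",
    "shelters were opened for residents by officials",
]
summary = summarize(sentences, lam=0.5)
```

In this sketch, a larger `lam` penalizes summary length more heavily and collapses more of the tree, while `lam=0` keeps every distinct sentence. With the toy input, a moderate value keeps one sentence from each pair of near-duplicates.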
Mr. Ulukbek Attokurov was born in Bishkek, Kyrgyzstan. He graduated from Manas University and earned his Master's degree at Istanbul Technical University. His main research areas are regression analysis, probability theory, and Bayesian inference. He is also interested in topic modeling and text summarization.
LAP LAMBERT Academic Publishing
Distortion, Natural Language Processing, Rate, Text Summarization, Hierarchical Agglomerative Clustering
COMPUTERS / Information Technology