ABSTRACT
Most existing clustering algorithms cluster highly interrelated data objects, such as Web pages and Web users, separately. The interrelations among different types of data objects are either ignored or represented by a static feature space and treated in the same way as the other attributes of the objects. In this paper, we propose ReCoM (Reinforcement Clustering of Multi-type Interrelated data objects), a novel approach for clustering multi-type interrelated data objects. Under this approach, relationships among data objects are used to improve the cluster quality of interrelated data objects through an iterative reinforcement clustering process. At the same time, the link structure derived from the relationships among the interrelated data objects is used to differentiate the importance of objects, and the learned importance is also fed into the clustering process to further improve the results. Experimental results show that the proposed approach not only effectively overcomes the data sparseness caused by the high-dimensional relationship space but also significantly improves clustering accuracy.
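The abstract describes ReCoM only at a high level, so the following is not the paper's algorithm but a minimal sketch of the general idea under stated assumptions: two object types (Web pages and Web users) linked by a binary visit matrix, object importance computed with a HITS-style mutual-reinforcement iteration (an assumption; the abstract does not name the ranking method), and plain k-means as the base clusterer (also an assumption). Each round re-describes users by the importance-weighted page clusters they link to, then re-describes pages by their own content features plus the user clusters linked to them, so the cluster structure learned for one type feeds the other. All function names are hypothetical.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def hits_importance(rel, iters=50):
    """Assumed HITS-style importance on the user-page bipartite graph.

    rel[u][p] = 1 if user u is related to (e.g. visited) page p.
    Returns (user_weights, page_weights), each L2-normalized.
    """
    n_users, n_pages = len(rel), len(rel[0])
    users = [1.0] * n_users
    pages = [1.0] * n_pages
    for _ in range(iters):
        pages = normalize([sum(rel[u][p] * users[u] for u in range(n_users))
                           for p in range(n_pages)])
        users = normalize([sum(rel[u][p] * pages[p] for p in range(n_pages))
                           for u in range(n_users)])
    return users, pages


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means with squared Euclidean distance; returns one label per point."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, centers[c])))
                  for pt in points]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if the cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_profile(rel_rows, labels, k, weights):
    """Describe each object by the clusters of the objects it links to.

    rel_rows[i][j] links object i to object j of the other type; weights[j]
    is object j's importance, so more important neighbours count more.
    """
    profiles = []
    for row in rel_rows:
        prof = [0.0] * k
        for j, r in enumerate(row):
            prof[labels[j]] += r * weights[j]
        profiles.append(normalize(prof))
    return profiles


def reinforcement_cluster(rel, page_features, k_pages, k_users, rounds=5):
    """Alternately cluster users and pages, each using the other's clusters."""
    transpose = [list(col) for col in zip(*rel)]  # pages x users
    user_w, page_w = hits_importance(rel)
    page_labels = kmeans([normalize(f) for f in page_features], k_pages)
    user_labels = [0] * len(rel)
    for _ in range(rounds):
        # Users are described only by the (weighted) page clusters they visit.
        user_labels = kmeans(cluster_profile(rel, page_labels, k_pages, page_w), k_users)
        # Pages combine their own content features with their users' clusters.
        link_part = cluster_profile(transpose, user_labels, k_users, user_w)
        combined = [normalize(f) + lp for f, lp in zip(page_features, link_part)]
        new_page_labels = kmeans(combined, k_pages)
        if new_page_labels == page_labels:
            break  # page clustering is stable
        page_labels = new_page_labels
    return page_labels, user_labels
```

For example, with two pages about one topic, two about another, and two user groups that each visit only one topic's pages, `reinforcement_cluster(rel, feats, 2, 2)` should put pages 0 and 1 together, pages 2 and 3 together, and split the users the same way. Cluster label numbers are arbitrary, so compare partitions rather than raw labels.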