ReCoM: reinforcement clustering of multi-type interrelated data objects
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
SESSION: Clustering
Pages: 274 - 281  
Year of Publication: 2003
ISBN: 1-58113-646-3
Authors
Jidong Wang  Microsoft Research Asia, Beijing, P.R.China
Huajun Zeng  Microsoft Research Asia, Beijing, P.R.China
Zheng Chen  Microsoft Research Asia, Beijing, P.R.China
Hongjun Lu  Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Li Tao  Microsoft Research Asia, Beijing, P.R.China
Wei-Ying Ma  Microsoft Research Asia, Beijing, P.R.China
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM Press   New York, NY, USA
DOI: http://doi.acm.org/10.1145/860435.860486

ABSTRACT

Most existing clustering algorithms cluster highly related data objects, such as Web pages and Web users, separately. The interrelation among different types of data objects is either not considered, or is represented by a static feature space and treated in the same way as other attributes of the objects. In this paper, we propose a novel approach for clustering multi-type interrelated data objects, ReCoM (Reinforcement Clustering of Multi-type Interrelated data objects). Under this approach, relationships among data objects are used to improve the cluster quality of interrelated data objects through an iterative reinforcement clustering process. At the same time, the link structure derived from the relationships among the interrelated data objects is used to differentiate the importance of objects, and the learned importance is in turn used in the clustering process to further improve the clustering results. Experimental results show that the proposed approach not only effectively overcomes the problem of data sparseness caused by the high-dimensional relationship space but also significantly improves the clustering accuracy.
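
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions, and not the authors' implementation. It assumes two object types (say, pages and users) with content features Xa and Xb, a relationship matrix R whose entry (i, j) links object i of the first type to object j of the second, a plain weighted k-means as the base clusterer, and a HITS-style iteration as the importance measure. The function names and the mixing weight alpha are all hypothetical. Each round collapses an object's links onto the current clusters of the other type, which turns a sparse, high-dimensional link vector into a short dense one, and then re-clusters.

```python
import numpy as np

def kmeans(X, k, weights=None, iters=20, seed=0):
    """Plain weighted k-means (an assumed base clusterer); returns one label per row."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    w = w + 1e-12  # keep weights strictly positive so np.average never divides by zero
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    labels = np.zeros(n, dtype=int)
    for _ in range(iters):
        # squared Euclidean distance from every object to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            m = labels == c
            if m.any():
                # more important objects pull the centroid harder
                centers[c] = np.average(X[m], axis=0, weights=w[m])
    return labels

def link_importance(R, iters=50):
    """HITS-style mutual reinforcement of importance scores over the links in R."""
    a = np.ones(R.shape[0])
    b = np.ones(R.shape[1])
    for _ in range(iters):
        a = R @ b
        a /= np.linalg.norm(a) or 1.0
        b = R.T @ a
        b /= np.linalg.norm(b) or 1.0
    return a, b

def cluster_level_links(R, labels_other, k_other):
    """Sum each object's links per cluster of the other type (dense, k_other columns)."""
    onehot = np.eye(k_other)[labels_other]
    return R @ onehot

def reinforcement_clustering(Xa, Xb, R, ka, kb, rounds=5, alpha=0.5):
    """Alternate clustering of the two types, each using the other's current clusters."""
    wa, wb = link_importance(R)
    la = kmeans(Xa, ka, wa)  # initial clustering from content features only
    lb = kmeans(Xb, kb, wb)
    for _ in range(rounds):
        Fa = np.hstack([alpha * Xa, (1 - alpha) * cluster_level_links(R, lb, kb)])
        la = kmeans(Fa, ka, wa)
        Fb = np.hstack([alpha * Xb, (1 - alpha) * cluster_level_links(R.T, la, ka)])
        lb = kmeans(Fb, kb, wb)
    return la, lb

# Toy data: two groups of pages, two groups of users, links only within groups.
Xa = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]])
Xb = np.array([[0., 0.], [1., 0.], [10., 10.], [11., 10.]])
R = np.array([[1., 1., 0., 0.],
              [1., 1., 0., 0.],
              [0., 0., 1., 1.],
              [0., 0., 1., 1.]])
la, lb = reinforcement_clustering(Xa, Xb, R, ka=2, kb=2)
print(la, lb)
```

On this toy input the first two and last two objects of each type should end up in separate clusters; which cluster receives which label index depends on the random initialization. The fixed number of rounds stands in for a convergence test, which a real implementation would presumably use instead.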

