Zhao, H. and Lin, X. (2010). A comparison of mapping algorithms for author co-citation data analysis. In Proceedings of the American Society for Information Science and Technology 2010, pp. 13, October 22–27, 2010, Pittsburgh, PA, USA.
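This paper compares four mapping algorithms for author co-citation analysis: MDS with agglomerative hierarchical clustering, PFNet, SOM, and Blondel community detection. The dataset consists of the co-citation counts of the 100 most highly cited authors in information science during 1999-2008. All four mappings clearly reveal author clusters for three topics: information retrieval, user studies, and bibliometrics. The self-organizing map additionally reveals a group of authors working on foundational theory, and Blondel community detection finds two further clusters, human-computer interaction and social informatics. The authors argue that because all four methods recover the same information retrieval, user studies, and bibliometrics clusters, author co-citation analysis is evidently valid. Of the four methods, MDS with agglomerative hierarchical clustering and Blondel community detection clearly show the overall division of the field, while PFNet and SOM use the relative positions of authors on the map to give a finer-grained description of the field. Cluster membership can also be read directly from the layouts produced by MDS with agglomerative hierarchical clustering and by PFNet; SOM is weaker in this respect, and the Blondel community detection map has many more links and therefore tends to look cluttered.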
In this study, we selected and applied four of the mapping methods to the same dataset, the author co-citation matrix of the top 100 highly cited information scientists.
The dataset used in this paper is a 100 by 100 author co-citation matrix whose rows and columns are the 100 most highly cited authors in Library and Information Science (LIS) during the 1999 to 2008 period.
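A co-citation count for a pair of authors is the number of citing papers whose reference lists cite both of them. The sketch below assumes each citing paper is given as a set of cited author names (a hypothetical input format; the paper does not describe how its counts were extracted), counts each author at most once per citing paper, and leaves the diagonal at zero.

```python
from itertools import combinations


def cocitation_matrix(citing_papers, authors):
    """Count, for each pair of authors, how many citing papers cite both.

    citing_papers: iterable of sets of cited author names (hypothetical format),
    one set per citing paper. Names not in `authors` are ignored.
    """
    index = {name: i for i, name in enumerate(authors)}
    n = len(authors)
    counts = [[0] * n for _ in range(n)]
    for cited in citing_papers:
        # Using a set means an author counts at most once per citing paper.
        present = sorted(index[a] for a in set(cited) if a in index)
        for i, j in combinations(present, 2):
            counts[i][j] += 1
            counts[j][i] += 1  # keep the matrix symmetric
    return counts  # diagonal left at 0 (an assumption)


# Toy input with placeholder author names.
papers = [{"A", "B"}, {"A", "B", "C"}, {"C", "D"}]
m = cocitation_matrix(papers, ["A", "B", "C"])
print(m)  # [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
```

Author "D" is outside the selected author list, so the third paper contributes no pair.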
We applied four algorithms to map our dataset: (1) Multidimensional Scaling with Agglomerative Hierarchical Clustering; (2) Pathfinder Networks; (3) Kohonen Map; and (4) Blondel Community Detection Algorithm.
In this method, multidimensional scaling is used for ordination and agglomerative hierarchical clustering for grouping authors. We use Pearson r as the measure of similarity between authors: the 100 by 100 co-citation matrix is converted to a Pearson r correlation matrix before being submitted to the multidimensional scaling and agglomerative hierarchical clustering procedures.
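The correlation and clustering steps can be sketched in plain Python. This is a minimal sketch under assumptions the paper does not state: each diagonal cell is filled with its row's largest off-diagonal count (one convention in the ACA literature; raw zero diagonals would make two closely co-cited authors look anti-correlated), the distance between authors is taken as 1 - r, and clusters are merged by average linkage. The MDS ordination step is not reproduced, and all function names are hypothetical.

```python
import math


def fill_diagonal(counts):
    """Assumption: replace each diagonal cell with the row's largest off-diagonal count."""
    n = len(counts)
    out = [[float(x) for x in row] for row in counts]
    for i in range(n):
        out[i][i] = max((out[i][j] for j in range(n) if j != i), default=0.0)
    return out


def pearson_matrix(rows):
    """Pearson r between every pair of rows (author co-citation profiles)."""
    centered = []
    for row in rows:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    n = len(rows)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:  # r is undefined for constant rows; left at 0
                dot = sum(a * b for a, b in zip(centered[i], centered[j]))
                r[i][j] = dot / (norms[i] * norms[j])
    return r


def average_linkage(dist, k):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the closest pair of clusters
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Toy matrix: authors 0 and 1 are co-cited together, as are 2 and 3.
counts = [[0, 10, 1, 0], [10, 0, 0, 1], [1, 0, 0, 10], [0, 1, 10, 0]]
r = pearson_matrix(fill_diagonal(counts))
dist = [[1.0 - x for x in row] for row in r]
print(average_linkage(dist, 2))  # [[0, 1], [2, 3]]
```

Stopping at k clusters corresponds to cutting the dendrogram at k groups; the number of groups to report is a judgment call made when reading the dendrogram.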
Figure 1 shows four distinct clusters. We label them Bibliometrics I, represented by ROUSSEAU R, EGGHE L, etc.; Bibliometrics II (citation), by WHITE HD, MCCAIN KW, etc.; Information Retrieval, by SALTON G, JONES KS, etc.; and User Study, by BELKIN NJ, BATES MJ, etc.
The Pathfinder Networks algorithm approaches the ACA mapping problem as a graph pruning problem. With nodes representing authors and weighted links representing their co-citation counts, the goal is to discard insignificant links while preserving the salient semantic connection patterns of the original network (Schvaneveldt, 1990).
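A minimal sketch of Pathfinder pruning, assuming the PFNET(r = ∞, q = n-1) parameter setting commonly used in ACA (the excerpt does not state which parameters were used). Counts are turned into distances as 1/count; under r = ∞ only the rank order of distances matters, so any decreasing transform yields the same network. A direct link survives only if no indirect path has a smaller largest link, which is computed here with a Floyd-Warshall pass using (min, max) in place of (min, +).

```python
import math


def pfnet_inf(counts):
    """PFNET(r=inf, q=n-1) on a symmetric co-citation count matrix.

    Returns the set of surviving links as (i, j) pairs with i < j.
    """
    n = len(counts)
    # Assumption: similarity -> distance via 1/count; a zero count means no link.
    dist = [[0.0 if i == j else (1.0 / counts[i][j] if counts[i][j] > 0 else math.inf)
             for j in range(n)] for i in range(n)]
    # minimax[i][j]: smallest achievable "largest link" over all paths from i to j
    minimax = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    # Keep a direct link only if no indirect path beats it under the r=inf metric.
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] < math.inf and dist[i][j] <= minimax[i][j]}


# The weak 0-2 link (count 5) is pruned: the path 0-1-2 only uses links of count 10.
print(sorted(pfnet_inf([[0, 10, 5], [10, 0, 10], [5, 10, 0]])))  # [(0, 1), (1, 2)]
```

Ties are kept, so with these parameters the result is the union of all minimum spanning trees and can contain more than n-1 links.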
The result (Figure 2) shows three major clusters, centered on GARFIELD E (Bibliometrics), SALTON G (Information Retrieval), and BELKIN NJ (Information Behavior).
The Kohonen Map algorithm is an unsupervised learning algorithm in the family of artificial neural networks (Kohonen, 2000). It learns the underlying structure of the original high-dimensional inputs in an iterative process and presents the results as rectangular regions.
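The iterative training can be sketched as an online Kohonen self-organizing map. This is a generic sketch, not the configuration used in the paper: the grid size, the linearly decaying learning rate, the Gaussian neighborhood with a shrinking radius, and the random initialization are all assumptions. For ACA, each author's input vector would typically be that author's row of the co-citation or correlation matrix, and the author is placed on the grid cell of its best-matching unit.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid cell (row, col) whose weight vector is closest to x (squared Euclidean)."""
    best, best_d = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, iters=1000, lr0=0.5, seed=0):
    """Online Kohonen SOM training on a rows x cols grid (all settings are assumptions)."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    # Random initialization inside the bounding box of the data.
    weights = [[[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1.0 - frac)  # learning rate decays linearly toward 0
        sigma = sigma0 * math.exp(-3.0 * frac)  # neighborhood radius shrinks over time
        x = rng.choice(data)
        br, bc = best_matching_unit(weights, x)
        for r in range(rows):
            for c in range(cols):
                # Gaussian neighborhood on the grid, centered on the winning unit
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                w = weights[r][c]
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


data = [[0.0, 0.0], [0.5, 0.2], [9.5, 10.0], [10.0, 9.8]]
som = train_som(data, rows=2, cols=3, iters=500)
print([best_matching_unit(som, x) for x in data])
```

Because neighboring units are pulled toward the same inputs, similar authors end up in adjacent cells, which is what produces the contiguous regions on the map.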
Figure 3 shows our Kohonen Map for the 100 authors. Several distinct regions are labeled, including User Study, represented by BATES MJ, KUHLTHAU CC, etc.; Information Retrieval, by SALTON G, CROFT WB, etc.; and Bibliometrics, by GARFIELD E, SMALL H, etc. An interesting group that appears explicitly on this map is the Theorist group, including WILSON P, BUCKLAND MK, BUDD JM, etc.
Community detection methods treat the mapping problem as a graph partitioning problem (Newman & Girvan, 2004). We apply the Blondel community detection algorithm, as used in Wallace, Gingras, & Duhon (2009), to the 100 by 100 co-citation matrix. Our implementation of this algorithm is based on the Network Workbench (NWB Team, 2006).
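The Blondel (Louvain) method alternates two phases: a local-moving phase, in which each node moves to the neighboring community that gives the largest modularity gain, and an aggregation phase that collapses each community into a single node before repeating. The sketch below implements only the first phase plus a modularity function, in plain Python on an adjacency dict; it is not the Network Workbench implementation used in the paper. For each candidate move it compares k_i,in - Σ_tot·k_i/(2m), which is the modularity change multiplied by m.

```python
def modularity(adj, comm):
    """Newman-Girvan modularity Q of partition `comm` on a weighted undirected graph."""
    m2 = sum(w for u in adj for w in adj[u].values())  # 2m: each edge counted twice
    k = {u: sum(adj[u].values()) for u in adj}
    q = 0.0
    for u in adj:
        for v in adj:
            if comm[u] == comm[v]:
                q += adj[u].get(v, 0.0) - k[u] * k[v] / m2
    return q / m2


def louvain_local_moving(adj):
    """First (local-moving) phase of the Blondel et al. method; no aggregation phase."""
    m2 = sum(w for u in adj for w in adj[u].values())
    k = {u: sum(adj[u].values()) for u in adj}
    comm = {u: u for u in adj}  # start with every node in its own community
    tot = dict(k)  # total degree of each community
    improved = True
    while improved:
        improved = False
        for u in adj:
            links = {}  # weight from u to each neighboring community
            for v, w in adj[u].items():
                if v != u:
                    links[comm[v]] = links.get(comm[v], 0.0) + w
            old = comm[u]
            tot[old] -= k[u]  # take u out of its community
            best = old
            best_gain = links.get(old, 0.0) - tot[old] * k[u] / m2
            for c, w_in in links.items():
                gain = w_in - tot[c] * k[u] / m2  # modularity gain scaled by m
                if gain > best_gain:
                    best, best_gain = c, gain
            tot[best] += k[u]
            comm[u] = best
            improved = improved or best != old
    return comm


# Two triangles (0-1-2 and 3-4-5) joined by the single link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {u: {} for u in range(6)}
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0
comm = louvain_local_moving(adj)
print(comm, modularity(adj, comm))
```

In the full method, the communities found here would be collapsed into super-nodes and the local-moving phase rerun on the smaller graph until modularity stops improving. Note that nothing is pruned: every co-citation link stays in the graph.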
Five communities of different sizes are identified. Figure 4 shows a visualization using the Circular Hierarchy layout. In addition to the three major clusters identified by the other methods (Information Retrieval, Bibliometrics, and User Study), this method detects two further distinct clusters: Human Computer Interaction and Social Informatics.
... different algorithms reveal the structure of LIS in different ways: MDS with AHC and Blondel Community Detection give a clear global division of the field, while PFNET and the Kohonen Map preserve much finer-grained descriptions of the relative positions of LIS authors.
Among all the mapping layouts, PFNET and MDS are the easiest to comprehend, because grouping and membership information can be read directly from their layouts. Although Kohonen Maps present richer information about local proximity among authors, they fail to show membership information at a larger scale. The community detection algorithm does no edge pruning, so it produces a cluttered map.