Sunday, April 14, 2013

Waltman, L., van Eck, N. J., & Noyons, E. (2010). A unified approach to mapping and clustering of bibliometric networks. Journal of Informetrics, 4(4), 629-635.


information visualization

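This study proposes a technique that integrates the mapping and clustering of bibliometric networks. Mapping and clustering techniques are often used together to analyze the structure of bibliometric networks, in order to understand the important research topics in a scientific field, the relations between these topics, and the development of the field. In earlier studies of bibliometric networks, some researchers constructed a map showing the nodes of the network and displayed a clustering of the nodes on top of the map, for example McCain (1990), White & Griffith (1981), Leydesdorff & Rafols (2009), and Van Eck, Waltman, Dekker, & Van den Berg (in press). Others first clustered the nodes and then constructed a map showing the clusters, for example Small, Sweeney, & Greenlee (1985) and Noyons, Moed, & Van Raan (1999). A third approach first constructs a map of the nodes and then clusters the nodes using their coordinates in the map, for example Boyack, Klavans, & Börner (2005) and Klavans & Boyack (2006).

The combination of techniques most commonly used in bibliometric and scientometric research is multidimensional scaling together with hierarchical clustering; well-known early studies include McCain (1990), Peters & Van Raan (1993), Small et al. (1985), and White & Griffith (1981). Other well-known mapping techniques include the algorithm of Kamada and Kawai (1989), which is often combined with pathfinder network scaling (e.g., Chen, 1999; de Moya-Anegón et al., 2007; White, 2003), VxOrd, proposed by Boyack and colleagues (Boyack et al., 2005; Klavans & Boyack, 2006), and VOS, proposed by Van Eck and colleagues (Van Eck et al., in press). For clustering, besides hierarchical clustering, factor analysis is also widely used, for example by de Moya-Anegón et al. (2007), Leydesdorff & Rafols (2009), and Zhao & Strotmann (2008). In recent years, clustering techniques based on the modularity function of Newman and Girvan (2004) have frequently been applied in bibliometrics and scientometrics, for example by Chen & Redner (2010), Lambiotte & Panzarasa (2009), Schubert & Soós (2010), Takeda & Kajikawa (2009), Wallace, Gingras, & Duhon (2009), and Zhang, Liu, Janssens, Liang, & Glänzel (2010).

As this overview shows, although the mapping and clustering techniques used together to analyze bibliometric networks are closely related, they have mostly been developed independently. To address this, the study derives a common principle underlying the earlier VOS mapping technique and modularity-based clustering, thereby establishing a relation between the two techniques and integrating them. Another study that integrates mapping and clustering, Noack (2009), defined a parameterized objective function describing a class of mapping techniques and showed that modularity-based clustering is also subsumed by this objective function, which establishes a relation between mapping and clustering techniques. The present study differs from Noack (2009) in three ways: it relates the VOS mapping technique directly to modularity-based clustering instead of relating both through a shared objective function; it includes a weighting factor; and it uses a resolution parameter to address the resolution problem of modularity-based clustering. To demonstrate the feasibility of the approach, the study maps and clusters the 1242 most frequently cited information science publications from 1999 to 2008, using the sum of bibliographic coupling and co-citation counts to measure the relatedness of publications. The resulting map shows two large subfields within information science, information seeking and retrieval, and informetrics, a result similar to that of other bibliometric studies of information science.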

In bibliometric and scientometric research, a lot of attention is paid to the analysis of networks of, for example, documents, keywords, authors, or journals. Mapping and clustering techniques are frequently used to study such networks. The aim of these techniques is to provide insight into the structure of a network. The techniques are used to address questions such as:
• What are the main topics or the main research fields within a certain scientific domain?
• How do these topics or these fields relate to each other?
• How has a certain scientific domain developed over time?
To satisfactorily answer such questions, mapping and clustering techniques are often used in a combined fashion.

One approach is to construct a map in which the individual nodes in a network are shown and to display a clustering of the nodes on top of the map, for example by marking off areas in the map that correspond with clusters (e.g., McCain, 1990; White & Griffith, 1981) or by coloring nodes based on the cluster to which they belong (e.g., Leydesdorff & Rafols, 2009; Van Eck, Waltman, Dekker, & Van den Berg, in press).

Another approach is to first cluster the nodes in a network and to then construct a map in which clusters of nodes are shown. This approach is for example taken in the work of Small et al. (e.g., Small, Sweeney, & Greenlee, 1985) and in earlier work of our own institute (e.g., Noyons, Moed, & Van Raan, 1999).

A third approach is to first construct a map in which the individual nodes in a network are shown and to then cluster the nodes based on their coordinates in the map (e.g., Boyack, Klavans, & Börner, 2005; Klavans & Boyack, 2006).
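
The third approach needs some rule for turning map coordinates into clusters. As a generic illustration only (not necessarily the rule used in the studies cited above), the sketch below applies single-linkage clustering at a fixed distance cut-off: points connected by a chain of sufficiently close neighbours share a cluster. The function name and the `threshold` parameter are hypothetical.

```python
import math


def cluster_by_coordinates(coords, threshold):
    """Cluster map points by single linkage at a fixed distance cut-off.

    coords    -- list of (x, y) map coordinates, one per node
    threshold -- hypothetical cut-off chosen by the analyst
    Returns one cluster label per node (0, 1, 2, ... by first appearance).
    """
    n = len(coords)
    parent = list(range(n))  # union-find forest over the nodes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge every pair of points that lie closer together than the cut-off.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                parent[find(i)] = find(j)

    # Relabel cluster roots as consecutive integers.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.0)]
print(cluster_by_coordinates(points, 0.6))  # [0, 0, 0, 1, 1]
```

Single linkage is chosen here only because it is deterministic and easy to follow; in practice the cut-off strongly affects how many clusters appear.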

In the bibliometric and scientometric literature, the most commonly used combination of a mapping and a clustering technique is the combination of multidimensional scaling and hierarchical clustering (for early examples, see McCain, 1990; Peters & Van Raan, 1993; Small et al., 1985; White & Griffith, 1981).

A popular alternative to multidimensional scaling is the mapping technique of Kamada and Kawai (1989; see, e.g., Leydesdorff & Rafols, 2009; Noyons & Calero-Medina, 2009), which is sometimes used together with the pathfinder network technique (Schvaneveldt, Dearholt, & Durso, 1988; see, e.g., Chen, 1999; de Moya-Anegón et al., 2007; White, 2003). Two other alternatives to multidimensional scaling are the VxOrd mapping technique (e.g., Boyack et al., 2005; Klavans & Boyack, 2006) and our own VOS mapping technique (e.g., Van Eck et al., in press).

Factor analysis, which has been used in a large number of studies (e.g., de Moya-Anegón et al., 2007; Leydesdorff & Rafols, 2009; Zhao & Strotmann, 2008), may be seen as a kind of clustering technique and, consequently, as an alternative to hierarchical clustering. Another alternative to hierarchical clustering is clustering based on the modularity function of Newman and Girvan (2004; see, e.g., Wallace, Gingras, & Duhon, 2009; Zhang, Liu, Janssens, Liang, & Glänzel, 2010).
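
For reference, the Newman and Girvan modularity of a partition of an undirected network is Q = (1/2m) Σ_ij [A_ij - k_i k_j / (2m)] δ(c_i, c_j), where A_ij is the link weight between nodes i and j, k_i is the total link weight of node i, 2m is the sum of all k_i, and δ is 1 when both nodes are in the same cluster. The minimal sketch below evaluates Q for a given partition in plain Python; the function name and the toy network (two triangles joined by one link) are our own choices for illustration.

```python
def modularity(adj, labels):
    """Newman-Girvan modularity of a partition of an undirected network.

    adj    -- symmetric matrix (list of lists) of link weights, zero diagonal
    labels -- labels[i] is the cluster of node i
    """
    n = len(adj)
    degree = [sum(row) for row in adj]  # k_i: total link weight of node i
    two_m = sum(degree)                 # 2m: twice the total link weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed link weight minus its expected value under
                # random rewiring that preserves node degrees.
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Toy network: triangles {0,1,2} and {3,4,5} joined by the link 2-3.
adj = [[0] * 6 for _ in range(6)]
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i][j] = adj[j][i] = 1

print(round(modularity(adj, [0, 0, 0, 1, 1, 1]), 4))  # 0.3571 (= 5/14)
print(round(modularity(adj, [0] * 6), 4))             # 0.0
```

Modularity-based clustering searches for the partition with the highest Q; exhaustive search is infeasible for real networks, so practical implementations rely on heuristics.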

In bibliometric and scientometric research, modularity-based clustering has been used in a number of recent studies (Chen & Redner, 2010; Lambiotte & Panzarasa, 2009; Schubert & Soós, 2010; Takeda & Kajikawa, 2009; Wallace et al., 2009; Zhang et al., 2010).

As we have discussed, mapping and clustering techniques have a similar objective, namely to provide insight into the structure of a network, and the two types of techniques are often used together in bibliometric and scientometric analyses. However, despite their close relatedness, mapping and clustering techniques have typically been developed separately from each other.

In our view, when a mapping and a clustering technique are used together in the same analysis, it is generally desirable that the techniques are based on similar principles as much as possible. This enhances the transparency of the analysis and helps to avoid unnecessary technical complexity. Moreover, by using techniques that rely on similar principles, inconsistencies between the results produced by the techniques can be avoided.

In this paper, we propose a unified approach to mapping and clustering of bibliometric networks. We show how a mapping and a clustering technique can both be derived from the same underlying principle. In doing so, we establish a relation between the VOS mapping technique (Van Eck & Waltman, 2007; Van Eck et al., in press) and clustering based on a weighted and parameterized variant of the well-known modularity function of Newman and Girvan (2004).
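
The shared principle can be made concrete with one objective function. The block below is our reconstruction of it, not a copy of the paper's numbered equations. It assumes the association-strength similarity used by VOS, s_ij = 2m c_ij / (c_i c_j), where c_ij is the link weight between nodes i and j, c_i is the total link weight of node i, and 2m is the sum of all c_i; the resolution parameter is written here as γ. Mapping and clustering differ only in how the distance d_ij is defined, and for clustering the objective reduces to a sum over pairs in the same cluster.

```latex
\begin{aligned}
V(x_1,\ldots,x_n) &= \sum_{i<j} s_{ij}\, d_{ij}^2 \;-\; \sum_{i<j} d_{ij},
  \qquad s_{ij} = \frac{2m\, c_{ij}}{c_i\, c_j} \\[4pt]
\text{mapping:}\quad d_{ij} &= \lVert x_i - x_j \rVert \\
\text{clustering:}\quad d_{ij} &=
  \begin{cases} 0 & \text{if } x_i = x_j \\ 1/\gamma & \text{if } x_i \neq x_j \end{cases} \\[4pt]
V &= \sum_{i<j} \bigl(1 - \delta(x_i,x_j)\bigr)
     \Bigl(\frac{s_{ij}}{\gamma^{2}} - \frac{1}{\gamma}\Bigr)
   = \frac{1}{\gamma^{2}} \sum_{i<j} (s_{ij} - \gamma)
     \;-\; \frac{1}{\gamma^{2}} \sum_{i<j} \delta(x_i,x_j)\,(s_{ij} - \gamma)
\end{aligned}
```

The first sum in the last line does not depend on the cluster assignment, so minimizing V over clusterings is equivalent to maximizing the sum of (s_ij - γ) over all pairs of nodes placed in the same cluster.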

It follows from equations (6) and (7) that our proposed clustering technique can be seen as a kind of weighted variant of modularity-based clustering (see Appendix B for a further discussion). However, unlike modularity-based clustering, our clustering technique has a resolution parameter. This parameter helps to deal with the resolution limit problem (Fortunato & Barthélemy, 2007) of modularity-based clustering: because of this problem, modularity-based clustering may fail to identify small clusters. With our clustering technique, small clusters can always be identified by choosing a sufficiently large value for the resolution parameter.
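
The effect of the resolution parameter shows up even on a tiny network. The sketch below is a hypothetical illustration, not the authors' implementation. It assumes (our reading of the paper) that the clustering maximizes the sum of (s_ij - γ) over pairs of nodes in the same cluster, with s_ij the association strength 2m c_ij / (c_i c_j). It finds the optimum by enumerating every partition, which is only feasible for a handful of nodes; real networks need a heuristic optimizer. All function names are our own.

```python
from itertools import combinations


def association_strength(c):
    """s_ij = 2m * c_ij / (c_i * c_j); assumes every node has at least one link."""
    n = len(c)
    total = [sum(row) for row in c]  # c_i
    two_m = sum(total)               # 2m
    return [[two_m * c[i][j] / (total[i] * total[j]) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def partitions(n):
    """Yield every partition of nodes 0..n-1 as a list of cluster labels."""
    def extend(labels, max_label):
        if len(labels) == n:
            yield list(labels)
            return
        for label in range(max_label + 2):  # reuse a label or open a new one
            labels.append(label)
            yield from extend(labels, max(max_label, label))
            labels.pop()
    yield from extend([], -1)


def quality(s, labels, gamma):
    """Sum of (s_ij - gamma) over all pairs i < j placed in the same cluster."""
    return sum(s[i][j] - gamma
               for i, j in combinations(range(len(s)), 2)
               if labels[i] == labels[j])


def best_clustering(c, gamma):
    """Exhaustive search over all partitions (tiny networks only)."""
    s = association_strength(c)
    return max(partitions(len(c)), key=lambda labels: quality(s, labels, gamma))


# Hypothetical toy network: triangles {0,1,2} and {3,4,5} joined by link 2-3.
c = [[0] * 6 for _ in range(6)]
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    c[i][j] = c[j][i] = 1

for gamma in (0.1, 1.0, 3.0):
    labels = best_clustering(c, gamma)
    print(gamma, labels, len(set(labels)))
# 0.1 [0, 0, 0, 0, 0, 0] 1
# 1.0 [0, 0, 0, 1, 1, 1] 2
# 3.0 [0, 0, 1, 2, 3, 3] 4
```

Raising γ makes every same-cluster pair more costly, so only pairs with a high association strength stay together and smaller clusters appear, which is the behaviour described above.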

The above result showing how mapping and clustering can be performed in a unified and consistent way resembles to some extent a result derived by Noack (2009). Noack defined a parameterized objective function for a class of mapping techniques (referred to as force-directed layout techniques by Noack). This class of mapping techniques includes, for example, the well-known technique of Fruchterman and Reingold (1991). Noack showed that his parameterized objective function subsumes the modularity function of Newman and Girvan (2004). In this way, Noack established a relation between a class of mapping techniques and modularity-based clustering. Our result differs from Noack's in three important respects.

First, the result of Noack does not directly relate well-known mapping techniques such as the one of Fruchterman and Reingold to modularity-based clustering. Instead, Noack’s result shows that the objective functions of some well-known mapping techniques and the modularity function of Newman and Girvan are special cases of the same parameterized function. Our result establishes a direct relation between a mapping technique that has been used in various applications, namely the VOS mapping technique, and a clustering technique.

Second, the mapping and clustering techniques considered by Noack differ from the ones that we consider by a weighting factor, namely the weighting factor given by equation (7).

Third, the clustering technique considered by Noack is unparameterized, while our clustering technique has a resolution parameter.
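
The weighting factor mentioned in the second difference can be made explicit by rewriting the pair term s_ij - γ. This is our own worked identity, assuming the association strength s_ij = 2m c_ij / (c_i c_j) with c_ij, c_i, and 2m as in the modularity formula; it is not a copy of the paper's equation (7).

```latex
s_{ij} - \gamma
  = \frac{2m}{c_i\, c_j}\Bigl(c_{ij} - \gamma\,\frac{c_i\, c_j}{2m}\Bigr)
  = w_{ij}\Bigl(c_{ij} - \gamma\,\frac{c_i\, c_j}{2m}\Bigr),
  \qquad w_{ij} = \frac{2m}{c_i\, c_j}
```

With w_ij = 1 and γ = 1, the bracketed term is exactly the pair term of the Newman and Girvan modularity, and maximizing the sum of these terms over same-cluster pairs is equivalent to maximizing modularity (the two differ only by a positive factor and terms that do not depend on the clustering). The factor w_ij gives more weight to pairs of nodes with few links in total.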

In Fig. 1, we show a combined mapping and clustering of the 1242 most frequently cited publications that appeared in the field of information science in the period 1999–2008. The mapping and the clustering were produced using our unified approach.

For these publications, we determined the number of co-citation links and the number of bibliographic coupling links. These two types of links were added together and served as input for both our mapping technique and our clustering technique.
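
The input described here can be sketched as follows. This is a minimal illustration under stated assumptions: the excerpt does not say which citation database was used or how counts were handled beyond adding them, so the sketch simply sums raw counts. The dictionary `refs` is a hypothetical in-memory stand-in for a citation index, and the document ids are invented.

```python
from itertools import combinations


def combined_link_strengths(refs, focal):
    """Co-citation count + bibliographic coupling count for each focal pair.

    refs  -- mapping from every document id (focal or not) to the set of
             ids it cites (hypothetical stand-in for a citation index)
    focal -- the publications to be mapped and clustered
    """
    links = {}
    for a, b in combinations(focal, 2):
        # Bibliographic coupling: number of references a and b share.
        coupling = len(refs.get(a, set()) & refs.get(b, set()))
        # Co-citation: number of documents whose reference list has both a and b.
        cocitation = sum(1 for cited in refs.values() if a in cited and b in cited)
        links[(a, b)] = coupling + cocitation
    return links


refs = {
    "A": {"X", "Y"},
    "B": {"X", "Y", "Z"},
    "C": {"Z"},
    "D": {"A", "B"},
    "E": {"A", "B", "C"},
}
print(combined_link_strengths(refs, ["A", "B", "C"]))
# {('A', 'B'): 4, ('A', 'C'): 1, ('B', 'C'): 2}
```

For example, A and B share two references (X and Y) and are cited together by D and E, giving a combined link strength of 4.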

The combined mapping and clustering shown in Fig. 1 provides an overview of the structure of the field of information science. The left part of the map represents what is sometimes referred to as the information seeking and retrieval (ISR) subfield (Åström, 2007), and the right part of the map represents the informetrics subfield.

The clustering shown in Fig. 1 consists of 25 clusters. The distribution of the number of publications per cluster has a mean of 49.7 and a standard deviation of 31.5.
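
The reported mean follows from the totals, assuming every one of the 1242 publications is assigned to exactly one of the 25 clusters (the figures are consistent with this); the standard deviation cannot be recovered without the individual cluster sizes.

```latex
\bar{n} = \frac{1242}{25} = 49.68 \approx 49.7
```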
