Wednesday, March 27, 2013

Leydesdorff, L. & Rafols, I. (2011). Indicators of the interdisciplinarity of journals: Diversity, centrality, and citations. Journal of Informetrics, 5, 87-100.

scientometrics

Van den Besselaar & Leydesdorff (1996) regard interdisciplinarity as a transient phenomenon: when a new specialty emerges, it may need to draw heavily on its mother disciplines/specialties, but once it matures, its new journals increasingly cite one another and form a closed loop, the typical pattern of a discipline. However, interdisciplinarity can also refer to journals that have to draw on different bodies of knowledge for the sake of application; such journals typically sit at the bottom of the journal hierarchy rather than at the top.

This study examines six ways of measuring the interdisciplinarity of journals.

Among these measures, Shannon entropy and the Gini coefficient (Buchan, 2002) are computed for each journal on the basis of how often it cites, or is cited by, other journals. Shannon's entropy is H = −Σ_i p_i·log₂(p_i), where p_i is the probability of the distribution at the i-th element; entropy measures the uncertainty in a distribution. The Gini coefficient is G = Σ_i (2i − n − 1)·x_i / (n·Σ_i x_i), where x_i is the count of the i-th element in the ranking; it measures the inequality or unevenness of a distribution.
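As a quick illustration, both vector-based measures can be computed from a single journal's citation counts. This is a minimal sketch in Python with made-up numbers, not the authors' code:

import numpy as np

def shannon_entropy(counts):
    """H = -sum(p_i * log2(p_i)) over the non-zero cells of the vector."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]                       # 0 * log(0) is taken as 0
    return -np.sum(p * np.log2(p))

def gini(counts):
    """G = sum((2i - n - 1) * x_i) / (n * sum(x_i)), x ranked ascending."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) * x) / (n * x.sum())

citations = [120, 40, 7, 3, 0, 0]      # hypothetical cited-by counts
print(shannon_entropy(citations))      # higher H = more even spread
print(gini(citations))                 # higher G = more concentrated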

Besides Shannon entropy and the Gini coefficient, a network with journals as nodes can be constructed from co-citation or co-citing counts, and interdisciplinarity can then be estimated with betweenness centrality (Leydesdorff, 2007). Two variants are examined in this study: networks built directly on the co-citation or co-citing counts, and networks built on the cosine-normalized matrices.
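A minimal sketch of the cosine-normalized variant, assuming a small hypothetical journal-journal citation matrix C (rows citing, columns cited); the 0.2 threshold is arbitrary:

import numpy as np
import networkx as nx
from sklearn.metrics.pairwise import cosine_similarity

C = np.array([[50., 10.,  0.,  2.],
              [ 8., 60.,  5.,  0.],
              [ 0.,  4., 40.,  9.],
              [ 3.,  0.,  7., 30.]])

# Cosine similarity between the "cited" patterns (the columns of C).
S = cosine_similarity(C.T)
np.fill_diagonal(S, 0.0)

# Binarize: an edge wherever the cosine exceeds the (arbitrary) threshold.
A = (S > 0.2).astype(int)
G = nx.from_numpy_array(A)

print(nx.betweenness_centrality(G))    # journal index -> betweenness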

Finally, Stirling (2007) suggested measuring diversity by integration: D = Σ_{i≠j} p_i·p_j·d_ij, where d_ij is the distance between two journals, estimated in this study either as the Euclidean distance or as (1 − cosine). Stirling's (2007) measure had been applied by Porter & Rafols (2009) and Rafols & Meyer (2010) to measure interdisciplinarity at the article level; this study applies it at the journal level.
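A sketch of Rao–Stirling diversity under both distance choices; the citation profiles and the focal journal's citation vector p below are hypothetical:

import numpy as np
from scipy.spatial.distance import pdist, squareform

# Citation profiles of the referenced journals (rows) over some categories.
profiles = np.array([[30.,  5.,  0.],
                     [25., 10.,  2.],
                     [ 1.,  3., 40.]])

# p: the focal journal's citations distributed over these three journals.
p = np.array([60., 30., 10.])
p = p / p.sum()

def rao_stirling(p, d):
    return float(p @ d @ p)    # diagonal of d is 0, so i == j adds nothing

d_euclid = squareform(pdist(profiles, metric='euclidean'))
d_cosine = squareform(pdist(profiles, metric='cosine'))    # = 1 - cosine

print(rao_stirling(p, d_euclid))
print(rao_stirling(p, d_cosine))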

The six measures are compared using Spearman's rank-order correlations. The results show that the rankings based on citing and on being cited correlate only weakly. The authors suggest that a journal that draws on a diverse knowledge base (citing) does not necessarily reach a diverse audience (cited); moreover, the citing side represents the changing research front, whereas the cited archive of science is more stable.

A factor analysis of these indicators shows that Shannon entropy, the Gini coefficient, and Rao–Stirling diversity based on (1 − cosine) belong to one factor, which can be designated "interdisciplinarity"; betweenness centrality is related to the size of the network; and Rao–Stirling diversity based on Euclidean distances is related to the citation impact of the journals.

In this study, we investigate network indicators (betweenness centrality), unevenness indicators (Shannon entropy, the Gini coefficient), and more recently proposed Rao–Stirling measures for “interdisciplinarity.”

Among the various journal indicators based on citations, such as impact factors, the immediacy index, cited half-life, etc., a specific indicator of interdisciplinarity has hitherto been lacking (Kajikawa & Mori, 2009; Porter, Roessner, Cohen, & Perreault, 2006; Porter, Cohen, David Roessner, & Perreault, 2007; Wagner et al., in press; Zitt, 2005).

Given the matrix of aggregated journal–journal citations as derived from the Journal Citation Reports (JCR) of the (Social) Science Citation Index, a clustering algorithm usually aims to partition the database in terms of similarities in the distributions.

Some journals reach across boundaries because they relate different subdisciplines into a single (disciplinary) framework. ... Other journals combine intellectual contributions based on methods or instruments used in different disciplines.

Furthermore, interdisciplinarity may be a transient phenomenon. As a new specialty emerges, it may draw heavily on its mother disciplines/specialties, but as it matures a set of potentially new journals can be expected to cite one another increasingly, and thus to develop a type of closure that is typical of “disciplinarity” (Van den Besselaar & Leydesdorff, 1996).

Interdisciplinarity, however, may mean something different at the top of the journal hierarchy (as in the case of Science and Nature) than at the bottom, where one has to draw on different bodies of knowledge for the sake of the application (e.g., in engineering).

Among the network indicators, betweenness centrality seems an obvious candidate for the measurement of interdisciplinarity (Freeman, 1977; Freeman, 1978/1979). One of us experimented with betweenness centrality as an indicator of interdisciplinarity in aggregated journal–journal citation networks (Leydesdorff, 2007). ... Using rotated factor analysis, Bollen et al. (2009b, pp. 4 ff.) found betweenness centrality positioned near the origin of a two-factor solution; this suggests that betweenness centrality might form a separate (third) dimension in their array of 39 possible journal indicators.

The occasion for returning to the research question of a journal indicator for "interdisciplinarity" was provided by the new interest in "interdisciplinarity" in bibliometrics (Laudel & Origgi, 2006; Wagner et al., in press) and the availability of another potential measure: diversity as defined by Stirling (2007; cf. Rao, 1982). Would it perhaps be possible to benchmark the various possible indicators of "interdisciplinarity" against each other? Porter & Rafols (2009) and Rafols & Meyer (2010), for example, suggested that this new measure would be useful to indicate interdisciplinarity at the article level.

Stirling (2007, p. 712) proposed to integrate the (in)equality in a vector with the network structure using the following formula for diversity D:

D = Σ_{i≠j} p_i·p_j·d_ij    (1)
This measure is also known in the literature as “quadratic entropy” (e.g., Izsáki & Papp, 1995) because unlike traditional measures of diversity such as the Shannon entropy and the Gini index, the probability distributions (pi and pj) of the units of analysis (in our case, the citation distributions of the individual journals) are multiplied by the distance in the (citation) network among them (dij).
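As a toy illustration (numbers ours, not from the paper): for p = (0.5, 0.3, 0.2) with d_12 = 0.1, d_13 = 0.9, and d_23 = 0.8, D = 2·(0.5·0.3·0.1 + 0.5·0.2·0.9 + 0.3·0.2·0.8) = 2·0.153 = 0.306. Pairs of elements that are both probable and mutually distant contribute most to D.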

Stirling (2007) proposed his approach as “a general framework for analyzing diversity in science, technology and society” because the two dimensions – (un)evenness in the distributions at the vector level and similarity among the vectors at the matrix level – are combined (Rafols & Meyer, 2010).

Data were harvested from the CD-ROM versions of the JCRs 2008 of the Science Citation Index (6598 journals) and the Social Sciences Citation Index (1980 journals). 371 of these journals are covered by both databases. Our set therefore contains 6598 + 1980 − 371 = 8207 journals.

Let us first turn to the vector-based measures. These are based on the frequency distributions of citations of each of the journals, either in the cited or citing directions.

In the extreme case where a journal only cites or is cited by articles in the journal itself, the inequality in the citation distribution is maximal and the uncertainty minimal. Maximum inequality corresponds to a Gini of unity and minimum uncertainty is equal to a Shannon entropy of zero. The journal is then extremely mono-disciplinary.

The Gini coefficient is a well established measure of inequality or unevenness in a distribution:

G = Σ_{i=1}^{n} (2i − n − 1)·x_i / (n·Σ_{i=1}^{n} x_i)

with n being the number of elements in the population and x_i being the number of citations of element i in the (ascending) ranking. The Gini ranges between zero for a completely even distribution and (n−1)/n for a completely uneven distribution, approaching one for large populations.

For comparisons among smaller populations of varying size, this requires a normalization that brings the Gini coefficients for all populations to the same maximum of one.
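Concretely (our gloss, not a formula quoted from the paper): since the maximum for a population of size n is (n−1)/n, one common normalization multiplies by n/(n−1), i.e. G* = [n/(n−1)]·G, which maps that maximum to exactly one for any n.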


The uncertainty contained in a distribution can be formalized using Shannon's (1948) formula for probabilistic entropy:

H = −Σ_{i=1}^{N} p_i·log₂(p_i)

The maximum information is the same and thus a constant for all vectors, namely log₂(8207) = 13.00 bits.
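As a check on the arithmetic: 2¹³ = 8192, so log₂(8207) = 13 + log₂(8207/8192) ≈ 13.003, which rounds to the 13.00 bits quoted above.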

Betweenness centrality is defined as follows:

C_B(k) = Σ_i Σ_j (g_ikj / g_ij),  i ≠ j ≠ k

Or, in words: the betweenness centrality of a vertex k is equal to the proportion of all geodesics between pairs of vertices (g_ij) that pass through this vertex (g_ikj).


We shall thus compare betweenness centrality as an indicator of interdisciplinarity by using both the asymmetrical citation matrix (in both the cited and citing directions) and the two symmetrical co-citation matrices (that is, using the numerators of Eq. (7) for distinguishing between zeros and ones).
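A minimal sketch (our reading, not the authors' code) of these network variants, using a small random stand-in for the citation matrix:

import numpy as np
import networkx as nx

C = np.random.poisson(2.0, size=(5, 5)).astype(float)  # hypothetical matrix

nets = {
    'citing':    nx.from_numpy_array((C > 0).astype(int), create_using=nx.DiGraph),
    'cited':     nx.from_numpy_array((C.T > 0).astype(int), create_using=nx.DiGraph),
    'co-cited':  nx.from_numpy_array(((C.T @ C) > 0).astype(int)),   # columns cited together
    'co-citing': nx.from_numpy_array(((C @ C.T) > 0).astype(int)),   # rows citing the same sources
}
for name, G in nets.items():
    bc = nx.betweenness_centrality(G)
    print(name, max(bc.values()))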


Euclidean distances seem a most natural candidate for the distance matrix used for measuring Rao–Stirling diversity (Eq. (1)). First, Euclidean distances involve the least restrictive assumptions; second, Euclidean distances can be transformed through simple scaling of dimensions to represent a wide range of possible geometries (Kruskal, 1964); and third, Euclidean distances are more familiar, parsimonious, and intuitively accessible than most other distance measures.


Throughout the paper, we use Spearman’s rank-order correlations because our primary objective is an indication of interdisciplinarity as a variable attribute among journals.
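In Python this comparison is a one-liner with SciPy; the two score vectors below are made up:

from scipy.stats import spearmanr

entropy_scores = [2.1, 3.4, 1.0, 4.2, 2.8]
betweenness_scores = [0.02, 0.15, 0.01, 0.30, 0.05]

rho, pval = spearmanr(entropy_scores, betweenness_scores)
print(rho, pval)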


While the Gini coefficient indicates unevenness, Shannon entropy provides an indicator of evenness. In other words, the Gini coefficient can be considered as an indicator of specificity and therefore disciplinarity, whereas the entropy (H) increases both when more cells of the vector are affected and with greater spread among the different categories.


The negative signs of the rank-order correlations between the two indicators (Table 2) show the opposite directionality.

Not surprisingly, there is no strong correlation between rankings in the cited and citing dimensions: journals that build on diverse knowledge bases (citing patterns) do not necessarily have diverse audiences (cited patterns).


Table 2 shows that correlations between the two indicators in the cited dimension (ρ = –0.803) are higher than in the citing dimension (ρ = –0.658). This is understandable, since the citing side represents the research front and therefore introduces variability, while the archive of science is cited and thus can be expected to be more stable (Leydesdorff, 1993).


As expected, the entropy measure is affected by size. ... The Gini coefficient corrects for this size effect because of a normalization in the denominator.


The results based on using (1−cosine) as a distance measure lend themselves to a straightforward interpretation; an interpretation is more difficult to provide for results based on Euclidean distances.


In summary, these results first suggest that the (1−cosine)-based measure operates on average better as an indicator of interdisciplinarity than the one based on Euclidean distances.


Shannon entropy measures variety at the vector level and can be thus used as an indicator of interdisciplinarity if one is not primarily interested in a correction for size effects. Betweenness centrality in the cosine-normalized matrix provides a measure for interdisciplinarity. Using cosine values as weights for the edges can be expected to improve this measure further. Rao–Stirling diversity measures are sensitive to the distance measure being used.


Factor analysis enables us to study whether the various indicators cover the same ground or should be considered as different.

Leydesdorff (2009) found two main dimensions – namely, size and impact – in the cited direction when using the ISI set of journals and including network indicators.

On the one hand, the impact factor and the immediacy index are highly correlated (Yue, Wilson, & Rousseau, 2004); on the other, total cites and indegree can be considered as indicators of size (Bensman, 2007; Bollen et al., 2009a, 2009b).


Using these four indicators to anchor the two main dimensions in the cited dimension and the six indicators discussed above, Table 9 shows that in a three-factor model – three factors explain 72.4% of the variance in this case – the first factor can indeed be associated with “size” and the third with “impact.”

Entropy, the Gini coefficient, and Rao–Stirling diversity based on (1−cosine) as a distance measure constitute another (second) dimension which one could designate as “interdisciplinarity.”

Betweenness centrality, however, loads highest on the size factor even after normalization for size.

Rao–Stirling diversity based on relative Euclidean distances loads negatively on the third factor (“impact”), and is in this respect different from all the other indicators under study.


The factor structures in Table 10(a and b) – cited and citing, respectively – are considerably different. These results suggest that the underlying structure is more determined by the functionality in the data matrix (cited or citing) than by correlations among the indicators.

In both solutions, however, betweenness before and after normalization load together on a second factor. This is not surprising since the two measures are related (Bollen et al., 2009b).
In both solutions, we also find Rao–Stirling diversity measured on the basis of (1−cosine) as the distance measure and Shannon entropy loading on the same factor.
The Gini coefficient and the Rao–Stirling diversity based on Euclidean distances have a different (i.e., not consistent) position in the cited or the citing directions.


In summary, Shannon entropy qualifies as a vector-based measure of interdisciplinarity.

Our assumption that the Gini coefficient would qualify as an indicator of inequality and therefore (disciplinary) specificity was erroneous: interdisciplinarity is not just the opposite of disciplinarity.
Betweenness centrality and Rao–Stirling diversity (after cosine-normalizations) indicate different aspects of interdisciplinarity. Betweenness centrality, however, remains associated with size more than Rao–Stirling diversity or entropy despite the normalization. Perhaps setting a threshold would change this dependency on size because larger journals can be expected to be cited in or citing from a larger set.


In order to enhance the interpretation by the readership of this journal, we chose the category of Library and Information Science, which contained 61 journals in 2008. In other words, we compare these 61 journals in terms of how they are cited by the 8207 journals in the database. (Note that one can also compute local values for betweenness centrality, etc., using the 61×61 citation matrix among these journals.)
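A sketch of that parenthetical note, with stand-in data: extract the within-category submatrix and compute betweenness locally. The sizes and indices here are hypothetical.

import numpy as np
import networkx as nx

C = np.random.poisson(1.0, size=(100, 100))  # stand-in for the 8207x8207 matrix
idx = np.arange(61)                          # stand-in for the LIS journal indices

local = C[np.ix_(idx, idx)]                  # the 61x61 citation matrix
G = nx.from_numpy_array((local > 0).astype(int), create_using=nx.DiGraph)
print(nx.betweenness_centrality(G))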


The two ways to measure betweenness centrality and Rao–Stirling diversity, respectively, provide the first two factors. Entropy loads primarily on Factor One with Rao–Stirling diversity, and to a lower extent on Factor Two with betweenness centrality.


Table 12, finally, shows the top 20 journals ranked on their betweenness centrality after normalization as one of the possible indicators for interdisciplinarity. Entropy correlates at the level of ρ = 0.830 with this indicator, and ρ = 0.732 with Rao–Stirling diversity based on (1−cosine) as the distance measure. The latter measure has the advantage of correlating less with size (for example, total cites) than the other two: the ρ with total cites (in 2008) was 0.549 for Rao–Stirling diversity, 0.880 for betweenness centrality, and 0.793 for Shannon entropy.


Among the vector-based indicators, the Shannon entropy takes into account both the reach of a journal in terms of its degree – because this number (n ≤ N; N = 8207) limits the maximal uncertainty within the subset – and the spread in the pattern of citations among these n journals. By normalizing the entropy as a percentage of this local maximum (log₂(n)), one can correct for the size effect. But this brings to the top of the ranking specialist journals that are cited equally across a relatively small set.
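A minimal sketch of this size correction (made-up counts):

import numpy as np

counts = np.array([40., 35., 30., 25.])   # citations spread over n = 4 journals
p = counts / counts.sum()
H = -np.sum(p * np.log2(p))
H_rel = H / np.log2(len(counts))          # fraction of the local maximum log2(n)
print(H, H_rel)   # a small but even set scores high after normalization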


Betweenness centrality based on cosine-normalized matrices qualifies as an indicator of interdisciplinarity.


One conceptual advantage of the Rao–Stirling diversity measure over betweenness centrality as used in this study is that the values are not binarized during the computation of diversity. An algorithm that would weigh the cosine values as a basis for the computation of betweenness centrality would perhaps improve our capacity to indicate interdisciplinarity (Brandes, 2001).
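A sketch of that suggestion, assuming a small hypothetical cosine matrix S: keep the cosine values and hand (1 − cosine) to Brandes' algorithm as edge distances (networkx interprets the weight attribute as a distance when computing geodesics).

import numpy as np
import networkx as nx

S = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.4],
              [0.1, 0.4, 0.0]])   # pairwise cosine similarities

G = nx.Graph()
n = S.shape[0]
for i in range(n):
    for j in range(i + 1, n):
        if S[i, j] > 0:
            G.add_edge(i, j, weight=1.0 - S[i, j])  # similarity -> distance

# With weight='weight', shortest paths (and hence betweenness) use the
# (1 - cosine) distances instead of binarized edges.
print(nx.betweenness_centrality(G, weight='weight'))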
