Sunday, April 21, 2013

Leydesdorff, L., & Rafols, I. (2009). A global map of science based on the ISI subject categories. Journal of the American Society for Information Science and Technology, 60(2), 348-362.

information visualization

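This study applies exploratory factor analysis to the inter-citation data of the 175 ISI subject categories and uses the result to test whether the ISI subject categories are suitable for information visualization. Related work includes Boyack, Klavans, and Börner (2005), who used VxOrd, the layout algorithm of VxInsight, to visualize all journals covered by the ISI databases according to their mutual citation relations, producing a journal map that reflects the structure of science, and who also generated a new classification with k-means clustering. Moya-Anegón et al. (2007) instead used co-citation data for the ISI subject categories as input to the Pathfinder network algorithm to produce their maps. The data for this study are the 2006 ISI journal citation data. Of the 175 ISI subject categories in 2006, three ("Psychology, biological," "Psychology, experimental," and "Transportation") have no citing data but do have cited data. The data were first normalized with the cosine, then factor-analyzed in SPSS (v15) and visualized with the Kamada and Kawai (1989) algorithm in Pajek (Batagelj & Mrvar, 2007). For the 172 subject categories that have citing data, the factor analysis yields 14 factors, and these factors correspond to disciplines. Factor analysis of the cited data also shows the disciplinary structure. Comparing the citing and cited results, 154 of the 172 subject categories (89%) fall in the same factor in both.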

The ISI subject categories classify journals included in the Science Citation Index (SCI). The aggregated journal-journal citation matrix contained in the Journal Citation Reports can be aggregated on the basis of these categories. This leads to an asymmetrical matrix (citing versus cited) that is much more densely populated than the underlying matrix at the journal level. Exploratory factor analysis of the matrix of subject categories suggests a 14-factor solution. This solution could be interpreted as the disciplinary structure of science. The nested maps of science (corresponding to 14 factors, 172 categories, and 6,164 journals) are online at http://www.leydesdorff.net/map06.

In contrast, Boyack, Klavans, and Börner (2005) used the VxInsight algorithm (Davidson, Hendrickson, Johnson, Meyers, & Wylie, 1998) in order to map the whole journal structure as a representation of the structure of science.

Moya-Anegón et al. (2004, 2007) used cocitation and PathFinder for mapping the whole of science on the basis of the ISI subject categories.

Klavans and Boyack (2007, p. 438) noted that a journal may occupy a different position in a different context: Many journals report on developments in multiple disciplines; journals can also function as a major source of references in more than one specialty.

Since citation relations among journals are dense in discipline-specific clusters and otherwise virtually nonexistent, the journal-journal citation matrix can be considered nearly decomposable.

The next-order units represented by the square submatrices, which in this case represent disciplines or specialties, are reproduced in relatively stable sets (of journals), which may change over time. The sets of journals are functional subsystems that show a high density in terms of relations within the center (i.e., core journals), but are more open to change in relations at the margins.

The decomposition into nearly decomposable matrices has no analytical solution. However, algorithms can provide heuristic decompositions when there is no single unique correct answer (Newman, 2006a, 2006b).

The number of category attributions in the Science Citation Index is 9,848 for 6,164 journals in 2006 or, in other words, approximately 1.6 categories per journal. The coverage of the 172 categories ranges from 262 journals sorted under “Biochemistry and Molecular Biology” to 5 journals sorted under a single category. The average number of journals per category is 56.3 (see Figure 1).
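The two averages quoted above can be checked with a few lines of arithmetic. This is only a sanity check on the numbers in the text; dividing by both 175 and 172 categories is my own assumption, made to see which count the reported average is based on.

```python
# Figures quoted in the text above.
attributions = 9848   # journal-to-category assignments in 2006
journals = 6164       # journals in the Science Citation Index 2006

# Average number of categories assigned to a journal.
categories_per_journal = attributions / journals

# Assumption: try both category counts mentioned in these notes
# (175 categories in the cited dimension, 172 in the citing dimension).
journals_per_category_175 = attributions / 175
journals_per_category_172 = attributions / 172

print(round(categories_per_journal, 1))      # 1.6
print(round(journals_per_category_175, 1))   # 56.3
print(round(journals_per_category_172, 1))   # 57.3
```

The quoted average of 56.3 journals per category matches division by 175 categories rather than 172.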

In other words, our research question is different from Boyack et al.’s (2005) effort to generate a new classification using a bottom-up strategy and from that of Moya-Anegón et al. (2007), who employed the ISI subject categories as units of measurement (at p. 2169), and used factor analysis of the cocitation matrix for the validation of their so-called “factor scientograms.”

We wish to question the quality and validity of using the ISI subject categories for mapping purposes. Can these subject classifications be used in further research to demarcate the sciences and perhaps as field delineations, and if so, under what conditions?

As noted above, Moya-Anegón et al. (2007, p. 2173) used factor analysis of the cocitation matrix of the 218 categories of the Science Citation Index and Social Science Citation Index (2002) combined for the validation of their visualizations. These authors stated that a scree test had led them to the choice of 16 factors.

We approach the problem first factor-analytically using the asymmetrical matrix of aggregated citations among categories, and will subsequently try to map the sciences hierarchically top-down insofar as our results show that it is legitimate for us to do so.

The data was harvested from the CD-ROM version of the Journal Citation Reports of the Science Citation Index 2006. As indicated above, 175 subject categories are used. Three categories (“Psychology, biological,” “Psychology, experimental,” and “Transportation”) are no longer used as classifiers in the citing dimension, but four journals are still indicated with these three categories in the cited dimension. Thus, we work with 172 citing and 175 cited categories.


The matrix, accordingly, contains two structures: a cited and a citing one. Salton’s cosine was used for normalization in both the cited and citing directions (Ahlgren, Jarneving, & Rousseau, 2003; Salton & McGill, 1983).


Pajek is used for the visualizations (Batagelj & Mrvar, 2007) and SPSS (v15) for the factor analysis. The threshold for the visualizations is pragmatically set at cosine≥0.2. Visualizations are based on the algorithm of Kamada and Kawai (1989).
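To make the thresholding step concrete, here is a minimal sketch that keeps only the links with cosine ≥ 0.2 and writes them as a Pajek `.net` network, which could then be laid out inside Pajek with Kamada-Kawai. The function name `to_pajek`, the category labels, and the similarity values are hypothetical; the sketch assumes a symmetric similarity matrix and does not attempt the layout itself.

```python
def to_pajek(labels, sim, threshold=0.2):
    """Return a Pajek .net string with one undirected edge per pair at or above threshold."""
    lines = [f"*Vertices {len(labels)}"]
    # Pajek numbers vertices from 1.
    for index, label in enumerate(labels, start=1):
        lines.append(f'{index} "{label}"')
    lines.append("*Edges")
    n = len(labels)
    for i in range(n):
        # Upper triangle only: the cosine matrix is symmetric.
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                lines.append(f"{i + 1} {j + 1} {sim[i][j]:.3f}")
    return "\n".join(lines) + "\n"

# Hypothetical labels and cosine values for illustration only.
labels = ["Physics", "Chemistry", "Ecology"]
sim = [
    [1.00, 0.45, 0.05],
    [0.45, 1.00, 0.20],
    [0.05, 0.20, 1.00],
]

net = to_pajek(labels, sim)
print(net)
```

With these toy values the Physics-Ecology link (0.05) is dropped, while the Chemistry-Ecology link sits exactly on the threshold and is kept.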



Let us focus on the structure in the citing dimension because this structure is actively maintained by the indexing service and is therefore current.... The factor loadings for the 172 categories on the 14 factors in the citing dimension are provided in the Appendix. They can be interpreted in terms of disciplines, such as physics, chemistry, clinical medicine, neurosciences, engineering, and ecology.

The factors in the cited dimension can be designated using precisely the same disciplinary classifications, but their rank order (that is, the percentage of variance explained by each factor) is different (Table 2). Out of the 172 categories, 154 (89%) fall in the same factor in both the citing and cited projections. ... The strong overlap between the results of the factor analysis in the cited and the citing dimension (Table 2) suggests that the matrix is nearly decomposable in terms of central tendencies.
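The overlap between the two projections can be sketched as a simple agreement count. The example assumes that each category is assigned to the factor on which it has the highest absolute loading, and that factors in the citing and cited solutions are matched by their disciplinary label. Both are my assumptions about how "fall in the same factor" could be operationalized, and the loadings below are invented.

```python
def dominant_factor(loadings):
    """Factor label with the largest absolute loading for one category."""
    return max(loadings, key=lambda factor: abs(loadings[factor]))

def agreement(citing, cited):
    """Count categories whose dominant factor is the same in both projections."""
    shared = [category for category in citing if category in cited]
    same = sum(
        1
        for category in shared
        if dominant_factor(citing[category]) == dominant_factor(cited[category])
    )
    return same, len(shared)

# Hypothetical loadings: category -> {factor label: loading}.
citing = {
    "Physics, applied": {"physics": 0.81, "chemistry": 0.22},
    "Chemistry, organic": {"physics": 0.10, "chemistry": 0.77},
    "Ecology": {"physics": 0.05, "chemistry": 0.30},
}
cited = {
    "Physics, applied": {"physics": 0.78, "chemistry": 0.25},
    "Chemistry, organic": {"physics": 0.12, "chemistry": 0.80},
    "Ecology": {"physics": 0.35, "chemistry": 0.20},
}

same, total = agreement(citing, cited)
print(f"{same} of {total} categories fall in the same factor")
```

In this toy case two of the three categories agree; the third loads on a different factor in each projection and so counts as a mismatch.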

Our results are consistent with previously reported maps (Boyack et al., 2005; Boyack & Klavans, 2007; Moya-Anegón et al., 2007), but we chose to exclude the social sciences.
