Chen, C.-M. (2008), Classification of scientific networks using aggregated journal-journal citation relations in the Journal Citation Reports. Journal of the American Society for Information Science and Technology, 59(14), 2296–2304. doi: 10.1002/asi.20935

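This study applies the affinity propagation method (Frey & Dueck, 2007) to aggregated journal-journal citation relations in order to classify the scientific networks that journals form through similar citation patterns. Many earlier studies have analyzed journal-to-journal citation data. Pudovkin and Garfield (2002) developed a relatedness factor from citation data to find semantically related journals; Doreian and Fararo (1985) identified structurally equivalent journals in the network; and Leydesdorff and Cozzens (1993) used principal component analysis to obtain the eigenvectors of the scientific network.

The citation data come from the 2001 SCI (1,905 journals, 426,065 articles, and 13,798,138 citations) and the 2005 SSCI (1,578 journals, 66,051 articles, and 2,437,389 citations).

The affinity propagation method uses s(i, j) = −d_ij to measure how well journal j is suited to be the representative journal of the category that contains journal i. Here d_ij is the distance between journals i and j, which is computed from cs_ij, the similarity between their citation patterns.

Affinity propagation repeatedly computes two quantities between journals to estimate how representative each journal is: r(i, j) reflects how well suited journal j is to represent journal i, and a(i, j) reflects how appropriate it is for journal i to choose journal j as its representative. For journal i, the largest value of a(i, j) + r(i, j) indicates which journal j represents it.

From the resulting classification, the specificity of a category can be expressed as the average distance from all member journals to the category's representative journal; a smaller average distance means higher specificity. The relatedness of category members is expressed as the average distance between all pairs of journals in the category; a smaller value means the members lie closer together.

Applied to the SSCI journals, the method yields 23 categories. Each category corresponds roughly to an SSCI subject category, but the average distance among its members is smaller than in the corresponding SSCI category.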

Traditional classification methods (Glänzel & Schubert, 2003) are based on subjective analysis, whose output could vary from one person to another. In other words, these methods are more artistic than scientific.

On the other hand, a quantitative approach to classification is usually constructed based on a set of simple rules, which offers robust classification schemes that do not rely on human interference.

The aggregated journal-journal (J-J) citation data in JCR contain extensive information about interjournal citations, which could provide an understanding of the interaction among various scientific disciplines.

Based on JCR citation data, Pudovkin and Garfield (2002) have used an intuitive criterion (relatedness factor) for finding semantically related journals.

To avoid subjective analysis, various quantitative methods have been proposed to construct a robust classification system of scientific journals using JCR citation information.

A variety of techniques for analyzing J-J citation relationships have been reported in the literature to cluster scientific journals (Doreian & Fararo, 1985; Leydesdorff, 1986; Tijssen, De Leeuw, & Van Raan, 1987).

For example, by applying the notion of structural equivalence to a small set of journals, Doreian and Fararo (1985) delineated a set of blocks of journals. These blocks correspond very closely to a categorization of the journals based on their aims and objectives.

More recently, Leydesdorff and Cozzens (1993) developed an optimization procedure that stabilizes approximated eigenvectors of the scientific network from principal component analysis as representations of clusters. This principal component analysis has been further extended to rotated component analysis (Leydesdorff, 2006; Leydesdorff & Cozzens, 1993), which enables one to focus on specific subsets with internal coherence.

An alternative method of cocitation clustering has been investigated in constructing a World Atlas of Sciences for ISI (Garfield, Malin, & Small, 1975; Leydesdorff, 1987; Small, 1999).

In this article, I propose a quantitative approach to classify the scientific network in terms of aggregated J-J citation relations of JCR using the affinity propagation method (Frey & Dueck, 2007).

The method used by ISI in establishing journal categories for JCR is a heuristic approach, in which the journal categories were initially developed manually. The assignment of journals was based upon a visual examination of all relevant citation data.

As the number of journals in a category grew, subdivisions of the category were then established subjectively.

Although this is a useful approach, a more robust, convenient, and automatic classification scheme is desired.

The citation data analyzed include the SCI of 2001 and the SSCI of 2005, computed directly from data extracted from the CD version of the ISI database.

There are 2,195 journals with an impact factor greater than 1 in the 2001 SCI. After removing 290 journals that did not publish any articles in 2001, 1,905 journals remain in our data set, which contains 426,065 articles and 13,798,138 citations.

For the 2005 SSCI, there are 1,583 journals in the database, of which 1,578 journals have nonzero contents. The SSCI database contains 66,051 articles and 2,437,389 citations.

In principle, the dissimilarity between two journals can be visualized by the differences in their citation patterns. In other words, the citation pattern of each journal is represented by a normalized citation vector, and these vectors form a rescaled citation matrix. The dissimilarity (or similarity) in citation between two journals is related to the scalar product of their citation vectors.
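The idea of normalized citation vectors and their scalar products can be illustrated with a minimal sketch. The passage does not state how the vectors are normalized, so unit-length (L2) scaling is an assumption here, and the function name `citation_similarity` and the count matrix are made up for illustration.

```python
import numpy as np

def citation_similarity(C):
    """Scalar products between unit-length rows of a citation count matrix.

    C[i, k] is a hypothetical count of citations from journal i to journal k.
    L2 normalization is an assumption; the source only says "normalized".
    """
    C = np.asarray(C, dtype=float)
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0      # leave all-zero rows as zero vectors
    V = C / norms                  # rescaled citation matrix
    return V @ V.T                 # scalar product of every pair of citation vectors

# Made-up counts: journals 0 and 1 cite similar sources, journal 2 does not.
counts = np.array([
    [10, 5, 0, 0],
    [ 8, 6, 1, 0],
    [ 0, 0, 7, 9],
])
cs = citation_similarity(counts)
```

With this scaling, two journals that cite the same sources in the same proportions have a similarity of 1, and two journals with no cited sources in common have a similarity of 0.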

For mapping or visualization, coefficients of similarity are converted into distances such that closely related journals are short distances apart and remotely related journals are long distances apart.

The affinity propagation method takes as input a collection of similarities between journals, where the similarity s(i, j) measures how well journal j is suited to be the representative of a journal category for journal i. Since the goal is to minimize squared error, we set s(i, j) = −d_ij.

Two types of messages are exchanged between journals: the responsibility r(i, j), which is sent from journal i to candidate representative journal (RJ) j, and the availability a(i, j), which is sent from candidate representative journal j to journal i. The responsibility reflects the accumulated evidence for how well-suited journal j is to serve as the representative for journal i, and the availability reflects the accumulated evidence for how appropriate it would be for journal i to choose journal j as its representative.

Taking into account other potential representative journals for journal i, the responsibility is computed iteratively as
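In the notation of this passage, the standard responsibility update of Frey and Dueck (2007) reads as follows. That the paper uses exactly this form is an assumption.

```latex
r(i,j) \leftarrow s(i,j) - \max_{j' \neq j} \bigl\{ a(i,j') + s(i,j') \bigr\}
```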

where the initial value of a(i, j) is set to zero in the first iteration. Similarly, taking into account the support from other journals that journal j should be a representative, the availability is updated by gathering evidence from journals as to whether each candidate representative would make a good representative journal:
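In the same notation, the standard availability update of Frey and Dueck (2007) for i ≠ j is the following, again assuming that the paper follows it.

```latex
a(i,j) \leftarrow \min \Bigl\{ 0,\; r(j,j) + \sum_{i' \notin \{i,j\}} \max \bigl\{ 0,\, r(i',j) \bigr\} \Bigr\}
```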

To reflect accumulated evidence that journal j is a representative based on the positive responsibilities sent to candidate representative j from other journals, the self-availability is updated as
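The standard self-availability update of Frey and Dueck (2007), written in the same notation and assumed to be the form meant here, is:

```latex
a(j,j) \leftarrow \sum_{i' \neq j} \max \bigl\{ 0,\, r(i',j) \bigr\}
```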

During the process of affinity propagation, the sum of availability and responsibility can be used to identify the representative journal of emerging journal categories. In other words, for any journal i, the value of j that maximizes a(i, j) + r(i, j) identifies that journal j is its representative.
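The message-passing loop described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the update rules are the standard ones from Frey and Dueck (2007), the damping factor and the diagonal "preference" values come from that general method and are not mentioned in the excerpts, and the six points on a line are made-up data.

```python
import numpy as np

def affinity_propagation(S, damping=0.7, max_iter=300):
    """Return, for each item i, the index j that maximizes a(i, j) + r(i, j).

    S[i, j] is the similarity s(i, j). The diagonal S[j, j] is the preference
    of item j for being a representative (an assumption taken from Frey and
    Dueck, 2007). Damping is likewise an assumed stabilizer.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    R = np.zeros((n, n))           # responsibilities r(i, j)
    A = np.zeros((n, n))           # availabilities a(i, j), zero at the start
    rows = np.arange(n)

    for _ in range(max_iter):
        # r(i, j) = s(i, j) - max over j' != j of [a(i, j') + s(i, j')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1.0 - damping) * R_new

        # a(i, j) = min(0, r(j, j) + sum over i' not in {i, j} of max(0, r(i', j)))
        Rp = np.maximum(R, 0.0)
        Rp[rows, rows] = R[rows, rows]       # r(j, j) enters unclipped
        col = Rp.sum(axis=0)
        A_new = col[None, :] - Rp            # remove the term contributed by row i
        self_avail = A_new[rows, rows].copy()  # a(j, j): sum of positive r(i', j), i' != j
        A_new = np.minimum(A_new, 0.0)
        A_new[rows, rows] = self_avail
        A = damping * A + (1.0 - damping) * A_new

    return (A + R).argmax(axis=1)

# Made-up example: two well-separated groups of points on a line.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
S = -(x[:, None] - x[None, :]) ** 2      # s(i, j) = -d_ij with squared distance
np.fill_diagonal(S, -20.0)               # hypothetical preference value
labels = affinity_propagation(S)
```

Each entry of `labels` is the index of the representative chosen for that item, so items that share a value belong to the same category. The number of categories is not passed in; it emerges from the similarities and the preference values.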

In our classifications, the level of specificity of a category can be found by looking at its value of D_RJ (the average distance of members of a category to its representative journal), and relatedness of category members is implied by the value of D_J-J (the average J-J distance within a category).
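The two category statistics can be computed directly from a distance matrix, as in the sketch below. The function name, the 3 × 3 distance matrix, and the choice to leave the representative journal itself out of the D_RJ average are assumptions made for illustration; the passage does not settle that last point.

```python
import numpy as np

def category_statistics(D, members, representative):
    """Return (D_RJ, D_JJ) for one category.

    D is a symmetric matrix of J-J distances, members lists the journal
    indices in the category, and representative is the index of its RJ.
    """
    D = np.asarray(D, dtype=float)
    members = np.asarray(members)
    # Average distance of the other members to the representative journal
    # (excluding the RJ itself is an assumption).
    others = members[members != representative]
    d_rj = D[others, representative].mean() if others.size else 0.0
    # Average distance over all distinct pairs of members.
    if members.size > 1:
        sub = D[np.ix_(members, members)]
        upper = np.triu_indices(members.size, k=1)
        d_jj = sub[upper].mean()
    else:
        d_jj = 0.0
    return d_rj, d_jj

# Made-up distances between three journals; journal 1 is the representative.
D = np.array([
    [0.0, 2.0, 4.0],
    [2.0, 0.0, 3.0],
    [4.0, 3.0, 0.0],
])
d_rj, d_jj = category_statistics(D, members=[0, 1, 2], representative=1)
```

Here the two non-representative members lie at distances 2 and 3 from the representative, and the three pairwise distances are 2, 4, and 3.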

To demonstrate the applicability of the affinity propagation method in clustering a complete data set of journals, we first apply it to cluster journals in the 2005 SSCI database.

Here the cutoff parameter t is set to 0.0001, implying that the maximal value of D_J-J (D_J-Jmax) is 100. This choice of t is quite reasonable since the probability distribution (PD), or normalized histogram (bin size is 1), of D_J-J in the unclustered SSCI journal database is mostly between 0 and 30, as shown in Figure 1.
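How a cutoff can cap the maximal distance may be easier to see with numbers. The distance formula is not reproduced in these notes, so the function below uses a purely hypothetical form, D = 1 / sqrt(cs + t), chosen only because it yields a maximum of 100 at t = 0.0001 as the text states. It should not be read as the paper's definition.

```python
import math

def capped_distance(cs, t=1e-4):
    """Hypothetical similarity-to-distance map D = 1 / sqrt(cs + t).

    cs is a similarity in [0, 1]; t is the cutoff parameter. The functional
    form is an assumption made for illustration only.
    """
    return 1.0 / math.sqrt(cs + t)

d_max_small_t = capped_distance(0.0, t=1e-4)   # unrelated journals, t = 0.0001
d_max_large_t = capped_distance(0.0, t=1e-3)   # unrelated journals, t = 0.001
d_identical = capped_distance(1.0, t=1e-4)     # identical citation patterns
```

Under this assumed form, unrelated journals (cs = 0) sit at a distance of about 100 when t = 0.0001 and about 31.6 when t = 0.001, so a larger cutoff gives a smaller maximal distance.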

With a choice of D_J-Jmax = 100, the distance between unrelated journals is much larger than that between related journals. In other words, for any journal category, unrelated journals will not be located in the vicinity of its members (each journal is considered as a point in a high-dimensional space). Thus only correlated journals will be grouped together by the affinity propagation method.

However, if D_J-Jmax is too close to 30, the positions of unrelated journals are not well separated and the distortion to the journal positions due to the introduction of the cutoff would affect the clustering of journals.

For the predicted SSCI classification, only those J-J distances within the same category are considered in calculating its PD of D_J-J.

In Figure 1, there are two peaks observed from the statistical curves of PD in D_J-J, where the first peak shows the relatedness between journals within the database (or categories), while the second peak at D_J-J = 100 indicates the irrelevance between journals within the database (or categories).

For the predicted SSCI classification, the first peak in the PD of D_J-J is clearly much more prominent, and much narrower, than that of the unclustered SSCI database.

On the other hand, its second peak of irrelevance is much smaller than that of the unclustered database.

The probability distribution of the first peak is found to decrease exponentially with D_J-J, i.e., P = P₀ exp[−(D_J-J − d₀)/ Δ], where P₀ is the peak value, d₀ is the peak position, and Δ is the decay width. By fitting the statistical data, we find that d₀ = 4 and Δ = 9.08 for the unclustered SSCI curve, while d₀ = 2 and Δ = 1.72 for the clustered SSCI curve.
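The decay width in this expression can be estimated from the slope of log P against D_J-J, since log P = log P₀ − (D_J-J − d₀)/Δ is linear in D_J-J. The sketch below shows the idea on noise-free synthetic values generated from the clustered-curve parameters quoted above. The synthetic data, the placeholder P₀ = 1, and the function names are assumptions, and the paper's own fitting procedure is not described in the excerpt.

```python
import numpy as np

def peak_decay(D, P0, d0, delta):
    """Evaluate P = P0 * exp(-(D - d0) / delta)."""
    return P0 * np.exp(-(D - d0) / delta)

def fit_decay_width(D, P):
    """Estimate delta from a straight-line fit of log P against D (for D >= d0)."""
    slope, _intercept = np.polyfit(D, np.log(P), 1)
    return -1.0 / slope

# Synthetic points to the right of the peak, using d0 = 2 and delta = 1.72.
D = np.arange(2.0, 12.0)
P = peak_decay(D, P0=1.0, d0=2.0, delta=1.72)   # P0 = 1.0 is a placeholder
delta_hat = fit_decay_width(D, P)
```

On real histogram data the fit would be noisy and would need the bins with zero counts removed before taking the logarithm. For scale, the two fitted widths quoted above differ by a factor of about 5.3 (9.08 / 1.72).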

The entire journal set of SSCI is decomposed into 23 journal categories.

The relatedness of journals within a category can be seen as the average value of D_J-J within the category, and the specificity of a category is related to the average distance of category members to its RJ.

For any category, a smaller value of D_RJ implies a higher level of specificity, and a smaller value of D_J-J implies that journals within a category are more closely related to each other.

In general most categories in our classification scheme have a corresponding category in the ISI classification scheme, and their value of D_J-J seems to be smaller than that of their counterpart in the ISI classification scheme.

When a larger value of the cutoff parameter is used, the maximal distance of D_J-J becomes smaller. ... Since the high-dimensional J-J distance space is now approximated by a high-dimensional sphere of smaller radius, the resolution in clustering journals is higher in this case. Thus the SCI database is expected to be decomposed into more clusters for t = 10⁻³, compared to the case of t = 10⁻⁴. ... Therefore, from comparing clustering results with different values of the cutoff parameter, the relationship among various disciplines can be revealed.

Our results demonstrate that the affinity propagation method can provide a reasonable classification scheme for either a complete database or an incomplete database. This method does not need the number of categories or their size as an input.

Distance between journals is calculated from the similarity of their annual citation patterns with a cutoff parameter to restrain the maximal distance.

Different values of the cutoff parameter lead to different levels of resolution in the classification of the journal network. A coarser-grained classification is obtained when a smaller value of the cutoff parameter (or a larger maximal J-J distance) is used.

We note that, unlike the ISI classification scheme, which allows overlap in the content of journal categories by subjective decisions, each journal uniquely belongs to a category in our classification scheme.

看見網絡

Monday, April 6, 2015
