Friday, June 20, 2014

Porter, A. L., & Rafols, I. (2009). Is science becoming more interdisciplinary? Measuring and mapping six research fields over time. Scientometrics, 81(3), 719-745.

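This study operationally defines interdisciplinary research as a mode of research in which teams or individuals integrate perspectives/concepts/theories, tools/techniques, and information/data from two or more bodies of knowledge or research practice; in other words, research whose knowledge sources are diverse. It then analyzes how the degree of interdisciplinarity of six research fields changed between 1975 and 2005. The interdisciplinarity indicator is computed from the Web of Science (WoS) Subject Categories (SCs) of the cited journals, and science maps are used to show how scientific output is dispersed across SCs. The analysis proceeds in five steps: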


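1. Operationalize a measure of interdisciplinarity.
2. Construct a similarity matrix among SCs, used to compute the Integration index.
3. Run a factor analysis on the similarity matrix to group the SCs into macro-disciplines for visualization.
4. Generate the science maps.
5. Select six SCs to serve as benchmarks for the present study and for future explorations.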


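Several points about operationalizing the measure need explanation. First, following Stirling's heuristic, exploring interdisciplinarity requires examining the number of disciplines cited, the distribution of citations among disciplines, and the similarity between categories [RAFOLS & MEYER, FORTHCOMING]. Second, the study treats knowledge integration as an epistemic category, so indicators of interdisciplinarity should be based on the content of research outcomes rather than on team membership, departmental affiliations, or collaborations. Finally, interdisciplinarity is usually measured from the SCs of the journals in which the cited references appear, but the bibliometric community has pointed out problems with the SCs: journal clustering exercises found only about 50% of clusters closely aligned with SCs [BOYACK & AL., 2005; (BOYACK, personal communication, 14 September 2008)], and classifications derived from citation networks also match the SCs poorly [LEYDESDORFF, 2006, P. 611]. These mismatches have only a limited effect on the science maps, however, and the SCs remain the most widely available categorization resource for measuring Integration.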

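The study measures Integration [RAFOLS & MEYER, FORTHCOMING] with the Rao-Stirling diversity formula [STIRLING, 2007]:

$$I = 1 - \sum_{i,j} s_{ij}\, p_i\, p_j$$

Here p_i is the proportion of a given paper's references that fall in SC i, and s_ij is the similarity between SCs i and j, measured with the cosine. Many studies, for example [GRUPP, 1990; HAMILTON & AL., 2005, or ADAMS & AL., 2007], have instead measured interdisciplinarity with Shannon or Herfindahl diversity. Shannon diversity is

$$D_{\text{Shannon}} = -\sum_i p_i \ln p_i$$

and Herfindahl diversity is

$$D_{\text{Herfindahl}} = 1 - \sum_i p_i^2$$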


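However, neither of these two measures accounts for how different the categories are, whereas Rao-Stirling diversity considers three aspects at once: the number of categories, the balance of the distribution across categories, and the similarity between categories. The study therefore compares the Integration index with Shannon and Herfindahl diversity.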

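Using counts of how often each SC is cited, the study computes the cosine similarity for every pair of SCs. Two SCs that are co-cited by most of the papers citing either one have a high similarity; when two SCs are rarely cited together, the cosine approaches zero. Once the similarity matrix is built, a factor analysis is performed with Principal Components Analysis (PCA) and Varimax rotation, yielding 20 factors, and each SC is assigned to the factor on which it has the highest loading. Each factor corresponds to one macro-discipline, and the SCs that could not be assigned are placed in an additional macro-discipline, giving 21 macro-disciplines in all.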

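Next, the loadings of each SC on the 21 factors are used as features, and the cosine similarity among SCs is computed again. The similarities are laid out as a network in Pajek, with links whose similarity is below 0.6 filtered out, to obtain the science map. The map shows each SC, its relative importance, and how closely the SCs are related to one another. Its purpose is to locate particular bodies of research among the macro-disciplines, to reveal how interrelationships change over time and which cross-disciplinary relationships matter most, and, more importantly, to show whether the journals serving as knowledge sources come from closely related disciplines or span entirely different fields.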


The study selects six SCs: Biotechnology & Applied Microbiology; Engineering, Electrical & Electronic; Mathematics; Medicine – Research & Experimental; Neurosciences; and Physics – Atomic, Molecular & Chemical.

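The results show that over the 30 years the mean number of authors per paper, the mean number of references, and the number of disciplines cited all increased substantially, but the interdisciplinarity indicator rose only modestly. Three explanations are offered. First, although the number of cited SCs increased markedly, the mean number of references per paper grew even faster, so the actual change in the proportions of citations to different SCs is less important than might be expected. Second, SCs tend to cite neighboring SCs, and because neighboring SCs have high similarity values, they contribute less to diversity. Finally, for some of the more interdisciplinary fields the measured Integration may already have reached saturation. The science maps likewise indicate that the citations of a paper remain concentrated mainly in neighboring disciplinary areas. In addition, the study finds that Rao-Stirling diversity correlates highly with the Herfindahl and Shannon measures, with mean correlations of 0.91 (standard deviation 0.07) and 0.88 (standard deviation 0.07), respectively.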

Here we investigate how the degree of interdisciplinarity has changed between 1975 and 2005 over six research domains. ... The results attest to notable changes in research practices over this 30 year period, namely major increases in number of cited disciplines and references per article (both show about 50% growth), and co-authors per article (about 75% growth). However, the new index of interdisciplinarity only shows a modest increase (mostly around 5% growth). Science maps hint that this is because the distribution of citations of an article remains mainly within neighboring disciplinary areas.

We measure how integrative particular research articles are based on the association of the journals they cite to corresponding Subject Categories (“SCs”) of the Web of Science (“WoS”).

And, we present a practical way to map scientific outputs, again based on dispersion across SCs.

This report operationally defined interdisciplinary research as: a mode of research by teams or individuals that integrates perspectives/concepts/theories and/or tools/techniques and/or information/data from two or more bodies of knowledge or research practice.

Our approach here is to investigate changes of degree of interdisciplinarity over time using various established indicators (e.g. number of disciplines cited, percentage of citations within-field), together with a new indicator developed by the NAKFI evaluation team [PORTER & AL., 2007]:
Integration – reflecting the diversity of knowledge sources, as shown by the breadth  of references cited by a paper. 

Following Stirling’s heuristic, we have previously argued that in order to explore interdisciplinarity, one needs to investigate multiple aspects, namely: the number of disciplines cited (variety), the distribution of citations among disciplines (balance), and, crucially, how similar or dissimilar these categories are (disparity) [RAFOLS & MEYER, FORTHCOMING]. 

The computation and visualization of the interdisciplinarity measure has taken five steps, presented consecutively in this section:
1. Operationalization of an interdisciplinary measure (the Integration index or disciplinary diversity)
2. Construction of a similarity matrix among Subject Categories that is used to compute the Integration index
3. Grouping via factor analysis of the SCs into macro-disciplines using the similarity matrix as a base to facilitate visualization
4. Generating science maps
5. Selection of a bibliometric sample of 6 SCs, to serve as benchmarks here and in future explorations. 

In other words, since knowledge integration is an epistemic category, indicators of interdisciplinarity should be based on the content of the research outcomes rather than on team membership, departmental affiliations, or collaborations (see illustrations in case studies in RAFOLS & MEYER, 2007). 

The bibliometric community has noted that the SCs have some problems. In journal clustering exercises, only about 50% of clusters were found to be closely aligned with SCs [BOYACK & AL., 2005; (BOYACK, personal communication, 14 September 2008)]. Poor matching between SCs and classifications derived from citation networks has also been reported [LEYDESDORFF, 2006, P. 611], but surprisingly the mismatch only has limited effect on the corresponding science maps [RAFOLS & LEYDESDORFF, UNDER REVIEW].

Nonetheless, the SCs offer the most widely available categorization resource that we could ascertain for the purpose of providing an accessible measure of Integration.

As derived in RAFOLS & MEYER [forthcoming], the formula for the Integration index can be expressed as:

$$I = 1 - \sum_{i,j} s_{ij}\, p_i\, p_j$$

where p_i is the proportion of references citing the SC i in a given paper. The summation is taken over the cells of the SC x SC matrix. s_ij is the cosine measure of similarity between SCs i and j (the cosine measure may be understood as a variation of correlation). Here this matrix s_ij is based on a US national co-citation sample of 30,261 papers from Web of Science as explained below in detail.

This Integration measure (aka, Rao-Stirling’s diversity) can be compared with Shannon diversity:

$$D_{\text{Shannon}} = -\sum_i p_i \ln p_i$$

or with Herfindahl’s diversity (the complement of Herfindahl’s concentration):

$$D_{\text{Herfindahl}} = 1 - \sum_i p_i^2$$

The power of the Integration index is that it characterizes interdisciplinarity in terms of the diversity of knowledge sources of papers, using a general formulation of diversity [STIRLING, 2007] rather than an ad hoc indicator.

A number of researchers have used these traditional measures of diversity, such as Shannon or Herfindahl, to measure interdisciplinarity [E.G. GRUPP, 1990; HAMILTON & AL., 2005, or ADAMS & AL., 2007]. These measures do not take into account how different the categories are, whereas our Integration measure reveals increased diversity only when added categories are significantly different.
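To make the contrast between the three measures concrete, here is a minimal Python sketch under stated assumptions. It assumes the Integration index takes the Rao-Stirling form 1 − Σ s_ij p_i p_j with s_ii = 1. The function names, the toy proportions, and both similarity matrices are invented for illustration and are not taken from the paper's data.

```python
import math

def integration(p, s):
    """Rao-Stirling form (assumed): 1 - sum over all (i, j) of s[i][j] * p[i] * p[j]."""
    n = len(p)
    return 1.0 - sum(s[i][j] * p[i] * p[j] for i in range(n) for j in range(n))

def shannon(p):
    """Shannon diversity: -sum p_i ln p_i (zero proportions contribute nothing)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def herfindahl_diversity(p):
    """Complement of Herfindahl concentration: 1 - sum p_i^2."""
    return 1.0 - sum(x * x for x in p)

# Hypothetical paper whose references are split evenly over three SCs.
p = [1 / 3, 1 / 3, 1 / 3]

# Two made-up similarity matrices: three neighboring SCs vs. three distant SCs.
similar = [[1.0, 0.9, 0.9], [0.9, 1.0, 0.9], [0.9, 0.9, 1.0]]
disparate = [[1.0, 0.1, 0.1], [0.1, 1.0, 0.1], [0.1, 0.1, 1.0]]

i_similar = integration(p, similar)
i_disparate = integration(p, disparate)

print(f"Shannon:                 {shannon(p):.4f}")
print(f"Herfindahl diversity:    {herfindahl_diversity(p):.4f}")
print(f"Integration (similar):   {i_similar:.4f}")
print(f"Integration (disparate): {i_disparate:.4f}")
```

Shannon and Herfindahl depend only on the proportions, so they give the same value for both matrices. Integration is low when the three cited SCs are close neighbors and higher when they are far apart.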

In particular, a broad national sample of articles from WoS is used to create the s_ij matrix that underlies the metrics used for computing Integration. First we describe the sample used as a basis for the similarity matrix; second, the construction of the matrix.

We combine six separate weeks of all papers in WoS, with one or more authors having a USA address, sampled during 2005–2007, to obtain 30261 articles. This provides a broadly based, yet manageable base sample. We processed the “Cited References” of these abstract records to identify the “Cited SCs.”

Our sample of 30261 WoS articles contains 1,020,528 cited references (an average of 33.7 per article). Of those, our thesauri link 768,440 to a particular Subject Category. Another 28,000 have been checked and assigned to “not being in an SC.”

For our purposes in addressing cited SCs, the list includes a few more than the current set, for a total of 244 SCs. The sample contains 1,114,930 instances of cited SCs.

The 30261 articles, by 244 SCs, described allow for construction of a co-citation similarity matrix, s_ij, using Salton cosine [SALTON & MCGILL, 1983; AHLGREN & AL., 2003].

The values of s_ij are high (i.e. closer to one) when SCs i and j are co-cited by a high proportion of articles that cite one or the other. The cosine value approaches zero when two SCs are rarely cited together.
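The construction of such a matrix can be sketched as follows. This is an illustration only: it assumes the input is an article-by-SC table of citation counts and that the Salton cosine is taken between the SC columns. The excerpts do not say whether raw counts or binary indicators were used, and the function name and the four-article table are hypothetical.

```python
import math

def salton_cosine_matrix(counts):
    """counts[a][k] = number of references of article a that fall in SC k.

    Returns the SC-by-SC matrix of cosines between the SC columns.
    """
    n_sc = len(counts[0])
    cols = [[row[k] for row in counts] for k in range(n_sc)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    sim = [[0.0] * n_sc for _ in range(n_sc)]
    for i in range(n_sc):
        for j in range(n_sc):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(cols[i], cols[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim

# Made-up table: 4 articles (rows) by 3 SCs (columns).
# SC 0 and SC 1 are cited together often; SC 2 is mostly cited on its own.
counts = [
    [5, 4, 0],
    [3, 6, 0],
    [0, 0, 7],
    [1, 0, 9],
]

s = salton_cosine_matrix(counts)
for row in s:
    print(" ".join(f"{value:.3f}" for value in row))
```

In this toy table the cosine between SC 0 and SC 1 is high because the same articles cite both, while SC 1 and SC 2 are never cited together and get a cosine of zero.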

For various purposes and in particular for visualization, it helps to consolidate the narrow research areas of the ISI SCs into larger categories, which we call “macro-disciplines.”

We base our grouping of SCs on a type of factor analysis – Principal Components Analysis (PCA) – following a similar methodology to that developed by LEYDESDORFF & RAFOLS [2008] to cluster SCs into macro-disciplines.

Within VantagePoint, we constructed the matrix of cosine similarities for the 244 cited SCs by 244 cited SCs described in the previous section. ... We explored various factor analysis solutions, eventually adopting a 20-factor solution (Varimax rotation). ... The 21 macro-disciplines reflect this factor solution.
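The grouping step can be illustrated with a small NumPy sketch. It is not the VantagePoint procedure: it assumes the principal components are obtained by eigen-decomposition of the similarity matrix itself, applies a standard SVD-based varimax iteration without Kaiser normalization, and assigns each SC to the factor with the largest absolute loading. The function names and the 4-by-4 toy matrix are hypothetical, and the extra macro-discipline for SCs that fit no factor is not modelled.

```python
import numpy as np

def pca_loadings(sim, n_factors):
    """Loadings of the top principal components of a symmetric similarity matrix."""
    eigvals, eigvecs = np.linalg.eigh(sim)
    order = np.argsort(eigvals)[::-1][:n_factors]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation using the standard SVD-based update."""
    n_rows, n_cols = loadings.shape
    rotation = np.eye(n_cols)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        target = rotated**3 - rotated * (np.sum(rotated**2, axis=0) / n_rows)
        u, singular, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = singular.sum()
        if new_criterion - criterion < tol:
            break
        criterion = new_criterion
    return loadings @ rotation

def assign_macro_disciplines(sim, n_factors):
    """Assign each SC to the factor on which it has the highest absolute loading."""
    rotated = varimax(pca_loadings(np.asarray(sim, dtype=float), n_factors))
    return np.argmax(np.abs(rotated), axis=1), rotated

# Made-up similarities: SCs 0 and 1 form one cluster, SCs 2 and 3 another.
toy_sim = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])

groups, rotated = assign_macro_disciplines(toy_sim, n_factors=2)
print(groups)
```

The factor labels themselves are arbitrary, since the sign and order of the components can vary; only the fact that two SCs share a label is meaningful.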

So, to a considerable degree, named sub-disciplines do not fully coalesce within a single macro-discipline. This warns that the evolving research enterprise does not neatly conform to the traditional scholarly disciplines.

These maps present the SCs, their relative importance in size, and how related they are to each other over all science. The main aim of these science maps is to locate particular bodies of research among the macro-disciplines. ... That can help identify changes in degree of interrelationship over time, and key cross-“disciplinary” relationships that might benefit from nurturing. It should also be informative to see whether knowledge sources of a set of publications are coming from research domains that are closely related (little interdisciplinarity) or that span very disparate domains (high interdisciplinarity). 

We then construct a new Salton cosine similarity matrix among SCs using the loadings of each SC on the 21 factors (as discussed in the previous subsection). This matrix is then uploaded into the network analysis software Pajek [BATAGELJ & MRVAR, 2008]. In Pajek, the minimum similarity threshold was arbitrarily set to 0.6 (this choice was found to provide a good readability-to-accuracy trade-off) and the SCs were distributed in a 2-D plane according to their similarities, to obtain a base science map.
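The thresholding step might look like the sketch below, which writes a similarity matrix as a Pajek .net network and keeps only links at or above the 0.6 threshold. The helper name and the similarity values are invented for illustration; the three labels are SC names from the study, but the numbers attached to them are not real. Layout in the 2-D plane is left to Pajek.

```python
def to_pajek_net(labels, sim, threshold=0.6):
    """Return Pajek .net text with one undirected edge per pair whose
    similarity is at least `threshold` (vertices are numbered from 1)."""
    lines = [f"*Vertices {len(labels)}"]
    for idx, label in enumerate(labels, start=1):
        lines.append(f'{idx} "{label}"')
    lines.append("*Edges")
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if sim[i][j] >= threshold:
                lines.append(f"{i + 1} {j + 1} {sim[i][j]:.3f}")
    return "\n".join(lines)

# SC names from the study; the similarity values are made up.
labels = [
    "Neurosciences",
    "Medicine - Research & Experimental",
    "Mathematics",
]
sim = [
    [1.00, 0.75, 0.20],
    [0.75, 1.00, 0.50],
    [0.20, 0.50, 1.00],
]

net = to_pajek_net(labels, sim)
print(net)
```

Only the first pair survives the filter here, so the third SC appears as an isolated node, which is how weakly related SCs would show up on a map built this way.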

Since research collaboration is often (and sometimes mistakenly) associated with interdisciplinarity, we examine measures of co-authorship. ... However, within research domain, the number of authors per paper has escalated remarkably, with about 75% average growth. This increase ranges from 48% in Math and 54% in Physics-AMC to 90% in Neurosciences. 

Before turning to Integration scores, we consider the number of distinct SCs that one article cites. ... Table 2 and Figure 4 show a sturdy increase in the breadth of citing in all six of these research domains (about a 50% growth on average). 

Integration scores are tabulated in Table 2 and shown in Figure 5. We see that over time, there is a modest increase in Integration scores and that math researchers are notably less integrative in their citing patterns. However, math has the highest relative growth (39%) whereas other SCs’ growth ranges from 3% to 14% (5% on average). t-tests between the 1975 and 2005 samples show these differences to be highly significant (<.005 for EE, assuming either equal or unequal variances; all others even more highly significant).

Pearson’s correlation between Integration and Herfindahl takes a mean value of 0.91 (standard deviation = 0.07) and between Integration and Shannon, a mean value of 0.88 (standard deviation = 0.07). These high correlations confirm that Integration is very closely associated with traditional diversity indicators – as could be expected by construction.
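Why a high correlation follows by construction can be checked with a short sketch. If Integration is assumed to have the Rao-Stirling form 1 − Σ s_ij p_i p_j, then with an identity similarity matrix (no SC resembles any other) it reduces exactly to Herfindahl diversity, so the two correlate perfectly. The synthetic papers, the clustered similarity matrix, and all function names below are invented; the sketch does not reproduce the 0.91 and 0.88 reported in the paper.

```python
import math
import random

def integration(p, s):
    """Rao-Stirling form (assumed): 1 - sum over all (i, j) of s[i][j] * p[i] * p[j]."""
    n = len(p)
    return 1.0 - sum(s[i][j] * p[i] * p[j] for i in range(n) for j in range(n))

def herfindahl_diversity(p):
    """Complement of Herfindahl concentration: 1 - sum p_i^2."""
    return 1.0 - sum(x * x for x in p)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Synthetic papers: random citation proportions over 4 hypothetical SCs.
random.seed(0)
n_sc = 4
papers = []
for _ in range(50):
    weights = [random.random() for _ in range(n_sc)]
    total = sum(weights)
    papers.append([w / total for w in weights])

identity = [[1.0 if i == j else 0.0 for j in range(n_sc)] for i in range(n_sc)]
# Made-up structure: SCs 0-1 are neighbors, SCs 2-3 are neighbors.
clustered = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]

herf = [herfindahl_diversity(p) for p in papers]
r_identity = pearson(herf, [integration(p, identity) for p in papers])
r_clustered = pearson(herf, [integration(p, clustered) for p in papers])

print(f"r with identity similarity:  {r_identity:.4f}")
print(f"r with clustered similarity: {r_clustered:.4f}")
```

The gap between the two correlations reflects how much the similarity structure changes the ranking of papers relative to a measure that ignores disparity.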

The main finding is that Integration scores increase over time, but significantly less so than other indicators, such as percentage of single-authored papers, mean authors per paper, and mean number of disciplines per paper.

First, although the number of cited SCs increases significantly, since the average number of references in a paper also shows a quicker increase (see central columns in Table 2), the actual change in the proportions of citation to different SCs is not as important as could be expected.

Second, as we will show in Figures 7 through 10, the citation patterns of a given SC tend to be with SCs in its vicinity. Since these neighboring SCs have high similarity values with the one investigated, their contribution to Integration (to diversity) is smaller than in other indicators. This means that the Integration score “deflates” the diversity recorded by Shannon or Herfindahl because most of the cited SCs are not very different from the SC doing the citing.

This is much easier to convey using science maps that directly show the three aspects of disciplinary diversity, namely:
1. the variety of “disciplines” (i.e., discrete research areas, the SCs, shown by the number of nodes in the map)
2. the balance, or distribution, of disciplines (relative size of nodes)
3. the disparity, or degree of difference, between the disciplines (distance between the nodes)

These maps were created following the techniques developed in LEYDESDORFF & RAFOLS [2008], in the context of the current interest in science mapping [MOYA-ANEGON & AL., 2004; BOYACK & AL., 2005; MOYA-ANEGON & AL., 2007]. ... In the figures presented in this article, we only label groups of SCs on the basis of macro-disciplines found by factor analysis, as explained in the methodology.

However, the perspective provided by the Integration score and the science maps suggests that the practice of interdisciplinarity in citations occurs mainly between neighboring SCs and has undergone a much more modest increase (on average only 5%, excluding math).

This is mainly for two reasons: first, although the number of cited SCs has increased, the growth of citations means that the increase in the proportion of citations to new SCs is small; second, the newly cited SCs tend to be in the vicinity of the previous ones – hence they don’t add as much interdisciplinarity as they would if they were very disparate/distant disciplines. Moreover, for already very interdisciplinary SCs, such as Neuroscience, the indicator may have a certain “saturation” effect. 


