Scientometrics
This study proposes six indicators to measure the interdisciplinarity of research institutions. At the most general level, interdisciplinarity can be seen as some degree of integration of different disciplines (Weingart and Stehr 2000; Porter and Rafols 2009; Marcovich and Shinn 2011; Wagner et al. 2011; Rafols et al. 2012). To turn this idea into quantitative indicators, the study argues that three questions need to be answered:
1. How should a discipline be defined?
2. At what level is the integration achieved?
3. What degree of disciplinary linkage is required?
For the definition of a discipline, the study proposes three approaches: first, since the analysis concerns CNRS laboratories, the CNRS disciplinary organization can naturally be used, comprising 10 institutes further subdivided into 40 sections; second, as in earlier studies, the 224 Journal Subject Categories (JSCs) of the Web of Science (WoS); third, grouping documents bottom-up into cognitive clusters with clustering algorithms based on shared references. As for the level of integration, two levels are examined: the laboratory and the article.
For a laboratory's degree of interdisciplinarity, a simple definition is:
where p_i is the proportion of the laboratory's articles in JSC i.
Beyond this definition, the study also uses Stirling's (2007) approach to capture three facets of diversity: the number of different categories (variety), the evenness of the distribution over these categories (balance), and the difference among the categories (disparity) (Porter and Rafols 2009):
where s_ij is the similarity between JSC i and JSC j, measured as the cosine similarity of citation patterns between subject categories (Porter and Rafols 2009).
To further examine whether a laboratory's interdisciplinary diversity is achieved at the cognitive level of single articles, the interdisciplinary diversity of individual articles is computed in the same spirit and then aggregated as follows:
where #pap is the number of articles published by the laboratory.
In addition, two further indicators are used: the proportion of cited subject categories outside the mainstream list, and the proportion of the lab's articles involving collaborators from different institutions, respectively shown below:
For the last indicator, bibliographic coupling (Kessler 1963) is first used to generate links between articles, computed as follows:
where #common_refs_ij is the number of references shared by articles i and j, and #refs_i and #refs_j are the numbers of references in articles i and j, respectively. A network of articles is then built from these bibliographic-coupling links, with the expectation that articles citing similar literature will cluster together. The algorithm of Blondel et al. (2008) is then used to partition the network into clusters of articles; the whole method is described in Grauwin and Jensen (2011), and the result is a partition into 250 clusters. The laboratory's diversity over these cognitive clusters is then computed as:
where p_i and p_j are the proportions of the laboratory's articles belonging to clusters i and j, respectively.
After computing the six interdisciplinarity indicators for every laboratory, a Principal Component Analysis (PCA) is carried out; the four main components are:
1) the laboratory's combined performance across the various diversity indicators
2) the cognitive distance between the disciplines the laboratory connects
3) whether interdisciplinarity is achieved at the laboratory level or the article level
4) whether articles are published in journals with interdisciplinary subject categories or co-published with laboratories from other institutions.
Interdisciplinarity is as trendy as it is difficult to define. Instead of trying to capture a multidimensional object with a single indicator, we propose six indicators, combining three different operationalizations of a discipline, two levels (article or laboratory) of integration of these disciplines and two measures of interdisciplinary diversity.
Interdisciplinarity means, at the most generic level, some degree of integration of different disciplines (Weingart and Stehr 2000; Porter and Rafols 2009; Marcovich and Shinn 2011; Wagner et al. 2011; Rafols et al. 2012).
To transform this idea into quantitative indicators, we need to answer three questions:
1. How to define a discipline?
2. At what level is the integration achieved?
3. What degree of disciplinary linkage is achieved?
There are several ways to define a discipline from a scientometric point of view. Since we are dealing with CNRS labs, the most natural choice would seem to be the disciplinary organization of CNRS into 10 ‘‘institutes’’ and 40 subdisciplinary ‘‘sections’’. A convenient alternative is to use the 224 Journal Subject Categories (JSCs) used by the Web of Science (WoS). Finally, instead of using institutionally predefined divisions of science, one could use a more bottom-up definition of ‘‘cognitive clusters’’. To obtain these clusters, we take the roughly 300,000 French articles published between 2007 and 2010 and group them into ‘‘cognitive clusters’’ using clustering algorithms based on shared references.
In this paper, we will use three definitions of ‘‘discipline’’ and two integration levels (laboratory and article) to calculate six partial interdisciplinary indicators.
We adopt Stirling's (2007) approach to capture the different facets of diversity: ‘variety’, ‘balance’ and ‘disparity’.
‘Variety’ characterizes the number of different categories, ‘balance’ characterizes the evenness of the distribution over these categories and ‘disparity’ characterizes the difference among the categories, usually based on some distance.
A simple indicator of the spread of the disciplines where a laboratory publishes is given by:
where p_i is the proportion of articles of the laboratory in JSC i.
As we would like to include the idea of ‘‘distance’’ between disciplines, we calculate the diversity indicator (Stirling 2007; Porter and Rafols 2009), which combines both the spread of the disciplines through the p_i and the distance between them:
where s_ij is the cosine measure of similarity between JSCs i and j. In practice, s_ij is measured through the citations from publications in JSC i to publications in JSC j (Porter and Rafols 2009).
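To make the indicator concrete, here is a minimal Python sketch of the Rao-Stirling diversity sum over pairs of distinct categories; the proportions and similarity matrix below are illustrative stand-ins, not real JSC data:

```python
# Rao-Stirling diversity (Stirling 2007; Porter and Rafols 2009):
# sum over pairs of distinct categories of p_i * p_j * (1 - s_ij).

def rao_stirling_diversity(p, s):
    """p: category proportions; s: symmetric similarity matrix, s[i][j] in [0, 1]."""
    n = len(p)
    return sum(p[i] * p[j] * (1 - s[i][j])
               for i in range(n) for j in range(n) if i != j)

# A lab publishing in three hypothetical JSCs: the first two are close
# (similarity 0.8), the third is distant from both.
p = [0.5, 0.3, 0.2]
s = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
print(rao_stirling_diversity(p, s))  # larger when spread over dissimilar JSCs
```

The indicator is 0 for a lab publishing in a single JSC and grows both with the evenness of the spread and with the distance (1 − s_ij) between the categories involved.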
To further characterize a lab’s interdisciplinarity, it is useful to introduce an indicator of the interdisciplinarity of single articles, to test whether interdisciplinarity is achieved at this cognitive level.
Specifically, the interdisciplinary diversity of a single article is calculated as:
where p_i^a is the proportion of the article's references in JSC i.
To quantify the interdisciplinarity of the papers published by a lab, we aggregate the articles’ diversity indicator art_div_corr at the laboratory level by averaging over all the articles published by that laboratory:
where #pap is the number of articles of the lab for which at least one reference was identified.
Then, we choose a threshold to define the most common JSCs for each institute. ... We therefore choose a threshold value of 90 %. ... Then, for each laboratory, we count the percentage of articles outside this 90 % list and normalize by the expected value, i.e. the average value 0.1.
where the summed frequencies are those of the JSCs that do not belong to the Institute's main JSC list.
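As an illustration, this normalized ‘‘outside the main list’’ indicator can be sketched as follows; the JSC counts and the Institute main list are hypothetical, and the paper's exact counting may differ in detail:

```python
# Share of a lab's output falling in JSCs outside its Institute's 90 %
# main list, normalized by the expected value 0.1 (so 1.0 = average lab).
# The JSC counts and the main list below are made up.

def outside_main_list(jsc_counts, main_list, expected=0.1):
    """jsc_counts: dict JSC -> number of the lab's articles in that JSC."""
    total = sum(jsc_counts.values())
    outside = sum(n for jsc, n in jsc_counts.items() if jsc not in main_list)
    return (outside / total) / expected

counts = {"Optics": 60, "Acoustics": 25, "Neurosciences": 15}
main = {"Optics", "Acoustics"}           # hypothetical Institute main list
print(outside_main_list(counts, main))   # 0.15 of the output outside, over 0.1
```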
Interdisciplinary collaborations can also be detected through copublications between scientists belonging to different CNRS Institutes. We compute a fifth indicator by calculating the proportion of a lab's publications that involve authors from other Institutes:
where the sum counts the number of articles of the lab involving at least two Institutes and #articles is the total number of articles published by the laboratory.
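A minimal sketch of this fifth indicator, assuming each article is represented by the set of CNRS Institutes of its authors (the records below are made up):

```python
# Proportion of a lab's articles whose authors span at least two CNRS
# Institutes. Each article is represented by the set of Institutes of
# its authors; the records are hypothetical.

def copublication_share(articles):
    """articles: list of sets of Institute labels, one set per article."""
    multi = sum(1 for institutes in articles if len(institutes) >= 2)
    return multi / len(articles)

articles = [{"Physics"},
            {"Physics", "Chemistry"},
            {"Physics", "Biology"},
            {"Physics"}]
print(copublication_share(articles))  # 2 of the 4 articles involve two Institutes
```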
To build these ‘‘cognitive disciplines’’, we use bibliographic coupling (BC) (Kessler 1963) between the 300,000 papers published by French laboratories in the period 2007–2010 and compiled by the WoS.
where #common_refs_ij is the number of common references of articles i and j, and #refs_i and #refs_j are the numbers of references of articles i and j, respectively.
In comparison to a co-citation link (the usual measure of articles' similarity), BC offers two advantages: it allows mapping recent papers (which have not yet been cited) and it deals with all published papers (whether cited or not).
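Since the paper's equation is not reproduced here, the sketch below uses one common normalization of Kessler coupling, the number of shared references divided by the geometric mean of the two reference counts; treat that normalization as an assumption, and the reference lists as made-up data:

```python
import math

# Bibliographic-coupling weight between two articles: shared references,
# normalized here by the geometric mean of the two reference counts
# (one common choice; the paper's exact normalization is not shown).

def bc_weight(refs_i, refs_j):
    """refs_i, refs_j: sets of cited references of two articles."""
    if not refs_i or not refs_j:
        return 0.0
    common = len(refs_i & refs_j)
    return common / math.sqrt(len(refs_i) * len(refs_j))

a = {"r1", "r2", "r3", "r4"}
b = {"r3", "r4", "r5", "r6", "r7", "r8", "r9"}   # shares r3 and r4 with a
print(bc_weight(a, b))  # 2 / sqrt(4 * 7)
```

With this normalization the weight lies between 0 (no shared references) and 1 (identical reference lists), so it can directly serve as an edge weight in the article network.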
This reinforcement facilitates the partition of the network into meaningful groups of cohesive articles, or clusters. A widely used criterion for measuring the quality of a partition is the modularity function (Fortunato and Barthélemy 2007), which is roughly the number of edges ‘inside clusters’ (as opposed to ‘between clusters’) minus the expected number of such edges if the partition were produced at random. We compute the graph partition using the efficient heuristic algorithm presented by Blondel et al. (2008). The whole method is described in Grauwin and Jensen (2011).
Applying this algorithm yields a partition of the French papers into roughly 250 clusters, each containing more than 100 papers.
where p_i and p_j are the proportions of the lab's papers belonging to clusters i and j, respectively.
On average, articles refer to papers from almost 10 different disciplines (9.8 JSCs). ... However, when considering only those JSCs that account for more than 10 % of the reference list, this average drops to 2.7. This means that, on average, an article spreads its references over 3 main JSCs plus 7 additional ones that each receive roughly a single reference.
An average laboratory publishes in journals belonging to 34 different JSCs ...
PCA1: combined interdisciplinarity The main axis represents a combination of the various interdisciplinarity indicators.
PCA2: short or long cognitive distance This axis distinguishes those labs that connect distant or nearby disciplines.
PCA3: article or laboratory interdisciplinarity This axis distinguishes labs that achieve interdisciplinarity either at the laboratory or article level.
PCA4: diversity of publications’ JSCs or diversity of collaborations This axis distinguishes labs that publish in journals belonging to different JSCs (high lab_jsc_bal) from labs that co-publish with labs from different CNRS Institutes (high lab_inst_cop_bal).
We have computed the six indicators for the 680 laboratories that published more than 50 papers over 2007–2010. Since the absolute values have no intrinsic meaning, we scaled all values to a mean of 0 and a variance of 1 to allow comparisons and statistical analysis. We then carried out a principal component analysis of the (680 × 6) matrix using the free software R (www.r-project.org/). More precisely, we used prcomp from the ‘stats’ package, without any axis rotation.
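The scaling and PCA step can be sketched in Python with numpy (mirroring what R's prcomp does via singular value decomposition, with no rotation); the 680 × 6 matrix here is random stand-in data, not the real indicators:

```python
import numpy as np

# Scale each indicator column to mean 0 and variance 1, then run an
# unrotated PCA via singular value decomposition (as R's prcomp does).
rng = np.random.default_rng(0)
X = rng.normal(size=(680, 6))   # stand-in for the 6 indicators of 680 labs

Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # mean 0, variance 1

U, singvals, Vt = np.linalg.svd(Xs, full_matrices=False)
scores = Xs @ Vt.T                          # lab coordinates on the PCA axes
explained_var = singvals**2 / (len(Xs) - 1) # variance carried by each axis

print(scores.shape)   # one row of component scores per laboratory
```

Each row of `scores` places one laboratory in the component space, so its first four coordinates correspond to the PCA1–PCA4 readings discussed above.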
First, let us note that using the first four PCA axes gives an overall view of the interdisciplinarity practices of each lab. This view has been compared to expert knowledge, namely scientists working in those labs or scientific advisors from CNRS. This comparison, carried out for about 20 different labs from all the disciplines, suggests that these indicators characterize interdisciplinarity in a meaningful way.
A major drawback of our method is that we cannot distinguish real interdisciplinary collaborations, giving rise to new concepts or to a coherent new scientific field, from simple pluridisciplinary practices that merely juxtapose different disciplines, as when historians use characterizing tools from physics. It seems difficult to learn much about the cognitive dimensions of interdisciplinarity from an automatic analysis of metadata of the papers.