Monday, September 15, 2014

Bonnevie-Nebelong, E. (2006). Methods for journal evaluation: journal citation identity, journal citation image and internationalization. Scientometrics, 66(2), 411-424.



This study evaluates the Journal of Documentation (J DOC) through citation analysis and compares it with JASIST and JIS. The citation analyses cover three aspects: the journal citation identity, based on the references the journal cites; the journal citation image, based on how the journal is cited; and internationalisation, based on the publications themselves.

The journal citation identity comprises two indicators. The first is the citations/citee-ratio, calculated by dividing the total number of references in the analyzed set by the number of different journals appearing in those references; a lower value means that many different journals appear, i.e., the journals used are diverse. The other indicator is self-citations, which measures a journal's isolation within its scientific domain: a low rate of self-citations suggests a high level of influence in the field. Self-citations are measured both as the proportion of a journal's own references that cite itself (the self-citing rate) and as the proportion of the citations a journal receives that come from itself (the self-cited rate); the former belongs to the journal citation identity, the latter to the journal citation image.
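As an illustration of these two identity indicators, here is a minimal sketch with invented reference counts (the journal names and numbers are hypothetical, not the paper's data):

```python
# Illustrative sketch: the citations/citee-ratio and the self-citing rate
# computed from one journal's reference list. All counts are invented.
references = {           # cited journal -> number of references to it
    "J DOC": 12,         # references to the journal itself
    "JASIST": 30,
    "JIS": 18,
    "Scientometrics": 25,
    "Inf. Process. Manag.": 15,
}

total_citations = sum(references.values())   # all references in the set
distinct_citees = len(references)            # different journals cited

# Lower ratio -> more different journals among the references -> diversity.
citations_citee_ratio = total_citations / distinct_citees

# Self-citing rate: share of the journal's own references pointing to itself.
self_citing_rate = references["J DOC"] / total_citations

print(round(citations_citee_ratio, 2))  # 20.0
print(round(self_citing_rate, 2))       # 0.12
```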

The journal citation image also comprises two indicators. The first is the new journal diffusion factor, which measures the average number of different journals citing each of the journal's articles within a given time window, reflecting the journal's export of ideas, transdisciplinarity, and degree of specialisation. The other indicator identifies the journals co-cited with the journal in question, described by the co-citation counts and the Journal Impact Factor of the co-cited journals.

Internationalisation measures the geographic regions of both the publishing authors and the authors citing the journal's articles.

The analysis methods and indicators are summarized in Table 1.



First, the citations/citee-ratio results are shown in Figure 1. The average citations/citee-ratio for 1990–2003 was 1.50 for JDOC, 1.88 for JASIST, and 1.44 for JIS. A lower ratio means more different journals among the references, indicating a more diverse scientific base. By this measure, JDOC's scientific base is more diverse than JASIST's but less diverse than JIS's.

A higher proportion of JDOC's articles are book reviews than in JASIST and JIS, which lowers JDOC's reference counts, since a book review carries only 1.6 to 2 references on average.

From 1990 to 2003 the self-citing rates of JDOC, JASIST, and JIS all declined, indicating that the three journals have become less isolated. Measuring how the journals are cited shows that JDOC and JIS have lower self-cited rates, indicating higher visibility in the field. In addition, JIS's higher self-citing rate during its first decade (from 1979) reflects its status at the time as a new journal at the margin of the field.

JDOC's new journal diffusion factor is slightly higher than that of the other two journals, with an upward trend.

The top ten journals most frequently co-cited with JDOC are listed in Table 3. The similarity of the co-cited journals is measured with the Jaccard index.

The geographic distribution of JDOC's authors is shown in Figure 10. Most authors come from Western Europe, and their share is gradually increasing.



The geographic distribution of the authors citing JDOC's articles is shown in Figure 11. North American authors cite it most, but the Western European share is gradually increasing.



The Journal Citation Identity is a reference analysis. It is measured by looking into the referencing style of the publishing authors. What is their combined citations/(journal) citee-ratio? This means that the total number of references in the journal must be calculated, year-by-year or all years taken together. The result of this is divided with the number of different journals present in the set of references. If the set contains many different journals, the ratio will be lower. Consequently a low average signifies a greater diversity in the use of journals among the authors as part of their scientific base, and thus a wider horizon.

Self-citations are part of the Journal Citation Identity as well as the Journal Citation Image, depending on the perspective. ... They are indicators of the style of a journal. Many self-citations among the references may signify isolation of the journal in the scientific domain (high rate). A low rate of self-citations may indicate a high level of influence in the scientific community.

The Journal Citation Image is based on citation analyses of two types: the New Journal Diffusion Factor (N JDF) and journal co-citation analysis.

The New Journal Diffusion Factor was proposed by Frandsen, and is inspired by Rowlands’ diffusion factor. It measures breadth by number of citing journals per published article. N JDF is the average number of different journals that an average article is cited by within a given time window. The result of this tells about the scientific style and about breadth, export of ideas, transdisciplinarity and degree of specialisation of a journal. N JDF is tested for JDOC in a time perspective.
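Under the definition above, the NJDF reduces to the mean number of distinct citing journals per published article. A minimal sketch, with invented articles and citing journals:

```python
# Hedged sketch of the New Journal Diffusion Factor idea: the average number
# of *different* journals citing each article within a time window.
# The article IDs and citing-journal sets below are invented.
citing_journals_per_article = {
    "article-1": {"JASIST", "JIS", "Scientometrics"},
    "article-2": {"JASIST"},
    "article-3": set(),            # uncited within the window
    "article-4": {"JOD", "IPM"},
}

njdf = sum(len(js) for js in citing_journals_per_article.values()) / len(
    citing_journals_per_article
)
print(njdf)  # 1.5
```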

The Journal Citation Image “the White way” means to do a co-citation journal-by-journal analysis and interpret the result in a qualitative manner. It is thus a means to evaluate a journal by the journals co-cited with the journal in question. The co-cited journals are displayed in a list ranked by frequency of co-incidences, the number of citations for each co-cited journal taken into consideration by application of the jaccard calculations. Also the Journal Impact factor (JIF) is used to evaluate the co-cited journals. The co-cited journals then function as image-makers of the journal in question.
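A minimal sketch of the Jaccard calculation mentioned here, assuming it is applied to the citation counts of two co-cited journals (the numbers are invented, not the paper's data):

```python
# Jaccard similarity for a pair of co-cited journals: the number of
# co-citations divided by the union of the two journals' citations.
def jaccard(cocitations: int, cites_a: int, cites_b: int) -> float:
    return cocitations / (cites_a + cites_b - cocitations)

# e.g. 40 documents cite both journals, 120 cite A, 80 cite B:
print(round(jaccard(40, 120, 80), 2))  # 0.25
```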

Internationalisation is measured by looking into the geographic locations of both publishing and cited authors of the JDOC.

A high citation/citee ratio means that the journal has many recited journals among its references. A low ratio signifies less journal re-citations and thus a greater diversity of journals as part of the scientific base and a wider horizon among authors.

Journal self-citations. Journal self-citations can be analysed from two perspectives, by self-citing rate and by self-cited rate. The first mentioned is part of the citation identity, the second one is part of the self-image, but the two types of self-citations are treated together here for practical reason.

The three journals all show decreasing self-citing rates during the years 1980–2003. This may signify a tendency towards less isolation of the field.

Wednesday, September 10, 2014

Waltman, L., Yan, E., & van Eck, N. J. (2011). A recursive field-normalized bibliometric performance indicator: An application to the field of library and information science. Scientometrics, 89(1), 301-314.


Two approaches are common when using citations as the basis of research performance indicators: normalizing citation counts based on a field classification scheme, and recursive citation weighting. The first approach holds that fields differ in citation density (the average number of citations per publication) and therefore require adjustment. Within it, there are two ways to adjust: normalizing by the field to which a publication is assigned, or normalizing by the number of references in citing publications, the latter also known as source normalization. The second, recursive approach holds that citations from influential publications, prestigious journals, and renowned authors are more valuable.

Previous work can be divided into the following four types, according to whether it uses a field classification scheme or source normalization, and whether it applies a recursive weighting mechanism:


No existing method combines classification-scheme normalization with a recursive mechanism, so this study proposes a citation-based evaluation method that does both.

The study uses library and information science (LIS) as its example, analyzing the citation impact of the field's journals and research institutes. Here, the LIS field is delineated by taking the Journal of the American Society for Information Science and Technology (JASIST) as the seed journal and selecting, on the basis of co-citation data, the journals most strongly related to JASIST; these journals also had to fall under the Web of Science subject category Information Science & Library Science. This yielded 47 journals, which together with JASIST (48 journals in all) represent the LIS field. Using bibliographic coupling data among these journals and the VOS clustering algorithm, they were grouped into three clusters: Library Science, Information Science, and Scientometrics. The journal titles and their cluster assignments are shown in the table below:


From the 48 LIS journals, all 12,202 articles of document type Article or Review published from 2000 to 2009 were selected for analysis.


First, LIS is treated as a single integrated field. TABLE 4 lists the top ten LIS journals by the MNCS indicator, with α set to 1 (i.e., no recursion) and to 20 (where the recursion converges). At α = 1, the top ten consists mainly of information science and scientometrics journals, with only three library science journals (ranks 4, 8, and 10). At α = 20, the top ten is almost entirely information science and scientometrics journals, with a single library science journal at rank 9. Evaluating research institutes with the MNCS indicator likewise shows that institutes whose main activity is scientometrics rank better at α = 20.


When LIS is instead divided into three subfields for the calculation, the top ten LIS journals in TABLE 6 show a much more balanced distribution across the three subfields.



Two commonly used ideas in the development of citation-based research performance indicators are the idea of normalizing citation counts based on a field classification scheme and the idea of recursive citation weighing (like in PageRank-inspired indicators).

Our empirical analysis shows that the proposed indicator is highly sensitive to the field classification scheme that is used. The indicator also has a strong tendency to reinforce biases caused by the classification scheme.

One stream of research focuses on the development of indicators that aim to correct for the fact that the density of citations (i.e., the average number of citations per publication) differs among fields.

One approach is to normalize citation counts for field differences based on a classification scheme that assigns publications to fields (e.g., Braun and Glänzel 1990; Moed et al. 1995; Waltman et al. 2011). The other approach is to normalize citation counts based on the number of references in citing publications or citing journals (e.g., Moed 2010; Zitt and Small 2008). The latter approach, which is sometimes referred to as source normalization (Moed 2010), does not need a field classification scheme.

A second stream of research focuses on the development of recursive indicators, typically inspired by the well-known PageRank algorithm (Brin and Page 1998). ...  The underlying idea is that a citation from an influential publication, a prestigious journal, or a renowned author should be regarded as more valuable than a citation from an insignificant publication, an obscure journal, or an unknown author.

It is sometimes argued that non-recursive indicators measure popularity while recursive indicators measure prestige (e.g., Bollen et al. 2006; Yan and Ding 2010).

To test our recursive MNCS indicator, we use the indicator to study the citation impact of journals and research institutes in the field of library and information science (LIS).

We focus on the period from 2000 to 2009. Our analysis is based on data from the Web of Science database.

We first needed to delineate the LIS field. We used the Journal of the American Society for Information Science and Technology (JASIST) as the ‘seed’ journal for our delineation. We decided to select the 47 journals that, based on co-citation data, are most strongly related with JASIST. Only journals in the Web of Science subject category Information Science & Library Science were considered. JASIST together with the 47 selected journals constituted our delineation of the LIS field.

From the journals within our delineation, we selected all 12,202 publications in the period 2000–2009 that are of the document type ‘article’ or ‘review’.

We first collected bibliographic coupling data for the 48 journals in our analysis. Based on the bibliographic coupling data, we created a clustering of the journals. The VOS clustering technique (Waltman et al. 2010), available in the VOSviewer software (Van Eck and Waltman 2010), was used for this purpose. We tried out different numbers of clusters. We found that a solution with three clusters yielded the most satisfactory interpretation in terms of well-known subfields of the LIS field. We therefore decided to use this solution. The three clusters can roughly be interpreted as follows. The largest cluster (27 journals) deals with library science, the smallest cluster (7 journals) deals with scientometrics, and the third cluster (14 journals) deals with general information science topics.

We first consider the case of a single integrated LIS field. The recursive MNCS indicator is said to have converged for a certain α if there is virtually no difference between values of the αth-order MNCS indicator and values of the (α + 1)th-order MNCS indicator. For our data, convergence of the recursive MNCS indicator can be observed for α = 20. In our analysis, our main focus therefore is on comparing the first-order MNCS indicator (i.e., the ordinary non-recursive MNCS indicator) with the 20th-order MNCS indicator.
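The recursive idea can be illustrated with a toy iteration. This is a simplification of the α-th order MNCS, not the paper's exact formulation: the publications, citation links, and single-field normalization below are all invented for the sketch.

```python
# Toy sketch of a recursive citation indicator: at each order, a citation is
# weighted by the citing publication's score from the previous order, and the
# weighted counts are re-normalized so that the overall average equals 1.
pubs = {                  # publication -> list of publications citing it
    "p1": ["p2", "p3"],
    "p2": ["p1"],
    "p3": ["p1", "p2", "p4"],
    "p4": [],
}

scores = {p: 1.0 for p in pubs}   # order 0: every citation weighs 1
for order in range(20):           # iterate toward convergence (cf. alpha = 20)
    weighted = {p: sum(scores[c] for c in citers) for p, citers in pubs.items()}
    mean = sum(weighted.values()) / len(weighted)
    scores = {p: w / mean for p, w in weighted.items()}

# Publications cited by highly scored publications end up with higher scores.
print({p: round(s, 2) for p, s in scores.items()})
```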

In the case of the first-order MNCS indicator, the top 10 consists of journals from all three subfields. However, journals from the information science and scientometrics subfields seem to slightly dominate journals from the library science subfield.

Let’s now turn to the top 10 journals according to the 20th-order MNCS indicator. This top 10 provides a much more extreme picture. The top 10 is now almost completely dominated by information science and scientometrics journals. There is only one library science journal left, at rank 9.

The top 10 institutes according to both the first-order MNCS indicator and the 20th-order MNCS indicator are listed in Table 5. Comparing the results of the two MNCS indicators, it is clear that institutes which are mainly active in the scientometrics subfield benefit a lot from the use of a higher-order MNCS indicator.

In Table 6, the top 10 journals according to both the first-order MNCS indicator and the 20th-order MNCS indicator is shown. ...  Comparing Table 6 with Table 4, it can be seen that library science journals now play a much more prominent role, both in the case of the first-order MNCS indicator and in the case of the 20th-order MNCS indicator. As a consequence, the top 10 journals now looks much more balanced for both MNCS indicators.

Tuesday, September 9, 2014

Åström, F. (2007). Changes in the LIS research front: Time‐sliced cocitation analyses of LIS journal articles, 1990–2004. Journal of the American Society for Information Science and Technology, 58(7), 947-957.



Using co-citation analysis of articles, this study examines changes in the research front of library and information science (LIS) from 1990 to 2004, to understand where the field stands and where it is heading. The data come from 21 LIS journals. Co-cited articles that were influential in the same period are defined as research fronts, and the period is divided into three 5-year slices to analyze changes in the field. The results show that LIS has a stable structure composed of two distinct research fields, informetrics and information seeking and retrieval (ISR); sharing research interests and methods, information retrieval and informetrics show signs of drawing closer together. The major change in the field is that web-based research has become central to both informetrics and ISR.

The journal sources are the 55 journals in the Information Science & Library Science category of JCR (2003). After excluding journals cited mainly by non-LIS journals as well as review-oriented or trade journals, and keeping only journals published throughout 1990–2004, 21 journals remained, as listed in the table below.

The articles total 13,605; the most highly cited documents were selected from them to build a co-citation count matrix, which was processed with a multidimensional scaling (MDS) algorithm.
First, for the research base, from the 221,586 citations (to 150,145 unique referenced documents) in the 13,605 articles, the 66 documents cited 50 times or more were selected for analysis. Their co-citation map is shown in FIG 1:
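The matrix-building step behind such a co-citation map can be sketched as follows (the reference lists are invented; the resulting counts would then feed an MDS projection, e.g. sklearn.manifold.MDS with a precomputed dissimilarity matrix):

```python
from collections import Counter
from itertools import combinations

# Two documents are co-cited each time they appear together in the same
# reference list. The citing papers' reference lists below are invented.
reference_lists = [
    ["Salton 1983", "van Rijsbergen 1979", "Kuhlthau 1991"],
    ["Salton 1983", "van Rijsbergen 1979"],
    ["Kuhlthau 1991", "Ingwersen 1996"],
]

cocitations = Counter()
for refs in reference_lists:
    for a, b in combinations(sorted(set(refs)), 2):
        cocitations[(a, b)] += 1   # symmetric count, stored once per pair

print(cocitations[("Salton 1983", "van Rijsbergen 1979")])  # 2
```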

Consistent with earlier studies, LIS divides into two regions on the map: the upper half contains ISR-related literature and the lower half informetrics literature, matching the findings of Persson (1994) and White & McCain (1998). In addition, to the right of the informetrics literature, a group of documents forms a webometrics cluster. Webometrics studies the nature and properties of the web using informetric methods such as link, citation, and cluster analyses.

ISR has developed from early systems-oriented information retrieval to user-system interaction studies and information behavior.

Apart from webometrics, the informetrics field is centered on bibliometric mapping, surrounded by documents on bibliometric distributions.

To further examine and verify the co-citation results, the co-citation data were fed into a cluster analysis. The eight resulting clusters match the structure of the map, as shown in TABLE 2.
Based on the median publication years of the clusters in TABLE 2, the eight clusters fall into four generations: in the first, LIS research comprised experimental information retrieval, bibliometric mapping, and bibliometric distributions; in the second, interest turned to the user side of information retrieval, adding research on the search process and the cognitive aspects of IR; next came the relevance studies of the early 1990s, together with a trend toward general information behavior; and from the late 1990s, under the influence of web technology, research on the web and webometrics began.

In the next co-citation analysis, the cited references were restricted to articles that themselves belong to the 13,605-article set, in order to identify influential articles as the research context. The 65 articles cited 25 times or more were selected. The resulting map still clearly divides into ISR in the upper half and informetrics in the lower half, but IR research is now dominated by cognitive ISR, relevance, and information behavior, while experimental IR has become peripheral.

Compared with the research base, the informetrics results in the research context are more dispersed, comprising three parts: research collaboration, bibliometric mapping, and webometrics, with webometrics the most prominent.


In the clustering results of TABLE 3, experimental IR and bibliometric distributions have disappeared from the research context, while children's information behavior and informetric analysis of research collaboration have been added. The former topics still exist, but they no longer form clusters of their own; instead they are absorbed into other clusters, such as IR/Search.

A research-front analysis was performed for the three 5-year periods. In the first period, 1990–1994, there were 3,401 articles with 1,581 citations among them, and 39 articles received 5 or more citations. This period is dominated by ISR, especially user-oriented ISR research; informetrics consists of two small clusters, one on research collaboration and one focused on mapping.

The second period, 1995–1999, contains 3,318 articles with 2,117 citations among them; 52 articles received 5 or more citations. The ISR grouping is quite distinct in this period, apart from two clusters focused on information technology and experimental IR. Besides general informetrics, there is also a research performance cluster.

The third period, 2000–2004, has 4,147 articles with 2,926 citations among them, and 62 articles were cited more than 7 times. In this period informetrics is more tightly connected than in the previous two, focused mainly on webometrics, while ISR has become more dispersed, dividing into three clusters: ISR, children's information behavior, and health informatics.

The study finds that LIS has a quite stable structure, composed mainly of ISR and informetrics. The research base consists mostly of theoretical or methodological documents, whereas the research context and front are dominated by practice-oriented articles. Across the three periods, 1990–1994 centered on library and information services; the second period on online databases and information seeking; and the third period, influenced by the WWW, moved from how groups use the web to search for information to developing methods for analyzing web impact factors. Finally, ISR and informetrics are drawing closer together, because both need to measure the strength of relationships between documents (or search queries) and both are interested in information visualization, increasing the opportunities for mutual citation and integration.

Based on articles published in 1990–2004 in 21 library and information science (LIS) journals, a set of cocitation analyses was performed to study changes in research fronts over the last 15 years, where LIS is at now, and to discuss where it is heading.

The results show a stable structure of two distinct research fields: informetrics and information seeking and retrieval (ISR). However, experimental retrieval research and user oriented research have merged into one ISR field; and IR and informetrics also show signs of coming closer together, sharing research interests and methodologies, making informetrics research more visible in mainstream LIS research. Furthermore, the focus on the Internet, both in ISR research and in informetrics—where webometrics quickly has become a dominating research area—is an important change.

The nature and intellectual organization of LIS has been thoroughly investigated in analyses describing the general traits of LIS research, as well as mapping how LIS has been organized in different research themes (Persson, 1994; White & Griffith, 1981; White & McCain, 1998).

My approach centers on the following questions. What research topics have dominated LIS during the period 1990–2004? What changes can be observed in the topics addressed over the last 15 years? Can these changes be used to tell us something about where LIS is heading?

Most definitions of “research fronts” explain them as groups of citing articles being clustered through bibliographic coupling (e.g., Persson, 1994), and their relations to the cited documents clustered by cocitation analysis (Garfield, 1994; Morris et al., 2003; Price, 1965). Although Persson sees the current (citing) articles as the research front and the cited documents as the research base, Garfield, for example, also includes the clusters of cocited core articles into the research front.

In addition, by analyzing the co-occurrence of highly cited documents, we also get an indication on the impact of the articles, thus expanding the definition of research fronts as including influential, as well as current research.

To identify LIS research, and to select journals for the analyses, the Journal Citation Reports: JCR Social Sciences (Thomson ISI, 2003) was used. To define LIS research, JCR’s Information Science & Library Science classification, covering 55 journals, was used.

To limit the definition, all general LIS journals were identified and the specialized ones were excluded. This was done using the “Citing Journal” field in JCR: If the journal primarily was cited by non-LIS publications, it was excluded from the study.

The analyses were done on a document level, as opposed to an analysis on the author level. Although an author analysis provides more of an overview, the document analysis is more detailed, e.g., by not grouping documents on different topics by the same author.

The result reflects contemporary and influential research within a specific field of research, i.e., the research front.

The research base was based on the 13,605 journal articles published from 1990–2004 and their 221,586 references to 150,145 unique documents. The 66 most-cited documents that received 50 citations or more were selected for further analysis (Figure 1).

The map shows two main areas consistent with the structures found in earlier analyses on LIS (e.g., Persson, 1994; White & McCain, 1998). On the top half of the map, a group of information-seeking and retrieval (ISR) related literature is featured and on the bottom half, a group of informetrics literature. However, on the right side of the informetrics field, a group of webometric studies has formed a cluster. Webometrics is the study of the nature and properties of the World Wide Web, using informetric methodologies such as link, citation, and cluster analyses (Björneborn & Ingwersen, 2001).

In the ISR section of the map, there is a thematic shift from right to left. Systems-oriented information retrieval (IR) literature is on the far right, followed towards the left by user-system interaction studies and information behavior. In comparison to Persson (1994), the “soft” part of the IR-field has increased its impact compared to the “hard” systems-oriented IR research.

Apart from the webometric group on the far right, the informetrics field is centered on bibliometric mapping, surrounded by documents concerning bibliometric distributions.

To enhance the results of the cocitation analysis, a cluster analysis (Persson, 1994) was performed, resulting in eight clusters (Table 2). The clusters support the structures identified in the map, and reveal a division of the soft IR-research: from search- and relevance-focused documents, over cognitive IR and information seeking, to information behavior.

The publication years of the clustered documents shows four generations of research orientations, a trait also visible in the IR part of the map. The first generation of LIS research includes experimental IR, bibliometric distributions, and bibliometric mapping. The second generation of research, with references published from the early 1980s marks the increasing interest in the user side of IR, incorporating the search process and the cognitive perspective into IR and LIS research. This is followed by the relevance studies in the early 1990s; and a contemporary trend to focus on general information behavior. The most recent trend in the LIS research base is studies on World Wide Web and webometrics, dating back to the late 1990s.

The results of the second analysis show influential research areas during the period 1990–2004. It is still the same 13,605 articles providing the material, but only the 18,615 citations to articles present in the set of citing documents are analyzed. Here, as well as in the following time-sliced analysis, the self-citations were removed. Out of the 5024 unique-cited documents, the 65 articles being cited 25 times or more were selected and analyzed (Figure 2).

The general structure of the map is the same: with informetrics on the lower half and ISR on the top half. There are some differences, however. In the top half, a center has developed around “Kuhlthau, 1991” and “Ingwersen, 1996,” focusing on cognitive ISR, relevance, and information behavior, while experimental IR research has become peripheral. Different perspectives on the user-oriented research has dominated the information-seeking and retrieval field; and has together with the wider information behavior field formed a strong research area of different variations on information-seeking research.

At the same time, the informetrics field has become more dispersed, with three clearly defined subfields: research collaboration to the left, bibliometric mapping in the middle, and webometrics on the right side. In comparison with the research base, webometrics has become the dominating research area within the informetrics field.

Friday, August 15, 2014

Milojević, S., Sugimoto, C. R., Yan, E., & Ding, Y. (2011). The cognitive structure of library and information science: Analysis of article title words. Journal of the American Society for Information Science and Technology, 62(10), 1933-1953.



Library and information science (LIS) is a research area interested in recorded information and culturally meaningful artifacts and specimens (Bates, 2010), spanning archival science, bibliography, document and genre theory, informatics, information systems, knowledge management, LIS, museum studies, records management, and social studies of information. Many studies have tried to define and describe the LIS field and identify the research topics within it, using a wide array of approaches: content analysis (Järvelin & Vakkari, 1990, 1993); bibliometric analysis of journals and journal articles (Åström, 2007, 2010; Moya-Anegón, Herrero-Solana, & Jiménez-Contreras, 2006; Persson, 1994); bibliometric analysis of authors (Moya-Anegón et al., 2006; White & McCain, 1998); co-word analysis of terms extracted from titles, abstracts, or full text (Åström, 2002; Ding, Chowdhury, & Foo, 2001; Janssens, Leta, Glänzel, & De Moor, 2006); tri-occurrence analysis of index terms (Sugimoto & McCain, 2010); analysis of word-reference combinations (van den Besselaar & Heimeriks, 2006); and topic modeling (Sugimoto, Li, Russell, Finlay, & Ding, 2011; Sugimoto & McCain, 2010).

Many of these approaches depend on the authors' domain knowledge to interpret the field's topics and cognitive structure. For example, White & McCain (1998), based on clusters of the most important authors, observed that information science consists of several specialties around a weak center; Åström (2010) used maps of authors and journals to show the divide between the library science (LS) and information science (IS) parts of the field. Besides being a less direct indicator of cognitive structure, citation analysis has the further problem that subfields differ in their publication and citation practices.

Article titles contain many words that indicate an article's content (Buxton & Meadows, 1977; Meadows, 1998). This study therefore analyzes the important words in journal article titles. The data comprise 10,344 articles published in 16 LIS journals from 1988 to 2007.

The 100 words occurring most frequently in titles were selected.

The analysis techniques include the relative frequency of words, clustering based on word co-occurrence, and finally multidimensional scaling (MDS) of the words together with journals and publication years to produce visualizations.
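The word-counting step can be sketched as below (the titles are invented; a real analysis would also remove stopwords and conflate singular and plural forms):

```python
import re
from collections import Counter

# Count word occurrences across article titles (invented examples).
titles = [
    "Information retrieval on the web",
    "Web search behavior of academic library users",
    "Citation analysis of information science journals",
]

words = Counter(
    w for t in titles for w in re.findall(r"[a-z]+", t.lower())
)

# Frequencies for two recurring title words:
print(words["web"], words["information"])  # 2 2
```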

The word co-occurrence analysis and hierarchical clustering reveal three major categories, LS (library science), IS (information science), and SCI-BIB (scientometrics-bibliometrics), plus two smaller ones, information-seeking behavior and bibliographic instruction. LS subdivides into topics such as academic librarianship, public librarianship (including collection development), information literacy and school librarianship, technology, policy, the web, knowledge management, digital libraries, e-commerce, law, and scholarly publishing. IS includes topics such as information retrieval, web search, catalogs, and databases. SCI-BIB includes bibliometric indicators, author productivity, and citation studies. The overall structure is shown in the figure below.

Word usage shows that LIS has a set of persistently occurring core words, but some words changed markedly over the 20 years, and these are all technologically related, consistent with Saracevic's (1999) claim that LIS is a technology-driven field. Broadly, change within LIS can be seen in the shift in word usage from database, to digital libraries, to the World Wide Web.

Besides its technology-driven character, LIS also devotes a large share of attention to information-seeking behavior, a concern shared by LS and IS.

A number of empirical studies of LIS have been conducted with the aim of describing and defining the field and identifying research areas within it. These studies applied a wide array of approaches: content analysis (Järvelin & Vakkari, 1990, 1993); bibliometric analysis of journals and journal articles (Åström, 2007, 2010; Moya-Anegón, Herrero-Solana, & Jiménez-Contreras, 2006; Persson, 1994); bibliometric analysis of authors (Moya-Anegón et al., 2006,White & McCain, 1998); co-word analysis of both index terms and words extracted from titles, abstracts, and full text (Åström, 2002; Ding, Chowdhury, & Foo, 2001; Janssens, Leta, Glänzel, & De Moor, 2006); tri-occurrence analysis of index terms (Sugimoto & McCain, 2010); analysis of word-reference combinations (van den Besselaar & Heimeriks, 2006); and topic analysis (Sugimoto, Li, Russell, Finlay, & Ding, 2011; Sugimoto & McCain, 2010).

Some notable studies of cognitive structure of LIS have interpreted topics post hoc, by assigning topicality based on knowledge of the author’s domain (e.g., White & McCain, 1998). In White and McCain’s influential visualization of LIS, they concluded that “information science lacks a strong central author, or group of authors, whose work orients the work of others across the board. The field consists of several specialties around a weak center” (p. 343). However, this analysis was based foremost on the clustering of authors, rather than topics. Similarly, Åström (2010) examined the divide between LS and IS components of the field by a bibliometric mapping of authors and journals. Topicality was assigned through expert knowledge of the domains in which these authors wrote and journals published.

Of the various components of textual documents, the titles, and the choice of words in them, are of particular importance. Title words function as “attention triggers” (Bazerman, 1985, 1988). They are devices for capturing interest in the world where information overload is a norm. Title words have been called “signal-words” (Rip & Courtial, 1984) and “macro-actors” or “macro-terms” (Callon et al., 1983). Titles of journal articles themselves have undergone a change during the 20th century, becoming more informative, more specific, and containing a larger number of words that indicate article content (Buxton & Meadows, 1977; Meadows, 1998). Leydesdorff (1989) claims that “title words seem to offer a means of making visible the internal cognitive structure” (p. 217) of a discipline. He also claims that “word structure reflects internal intellectual organization in terms of the codification of word usage in the relevant disciplines” (Leydesdorff, 1989, p. 221).

Co-word analysis is based on co-occurrence of words (all words, or selected keywords) extracted from titles, abstracts, or text in general, or the index terms assigned by authors or indexers. Co-word analysis is a method that derives “higher level structures from word-occurrence patterns in text” (Chen, 2003, p. 139). Of particular importance in the context of this study is that co-word analysis is “a means to the elucidation of structures of ideas, problems, and so on, represented in appropriate sets of documents” (Whittaker, Courtial, & Law, 1989, p. 473). 

Although co-word analysis has its limitations (e.g., Leydesdorff, 1997), primarily because of the change of usage and meaning of words and the lack of context, such analysis has been considered particularly useful in tracking the development of scientific fields over time (Callon et al., 1991; Noyons & van Raan; Rip & Courtial, 1984), which represents another goal of this study.

Although citation analysis is not subject to the same limitation, it is a less direct indicator of cognitive structure. As already mentioned, studies using citations require post hoc assignment of topics. In addition, citation analysis of LIS is less effective in analyzing the cognitive structure of entire fields due to the different publication and citation practices of subfields, thus leaving even large subfields such as LS often invisible.

Selection of journals and articles. Articles from 16 LIS journals were chosen for inclusion in this study. The journals were selected from a ranked list of the most important journals in the field, according to deans and directors of American Library Association (ALA)-accredited, MLS programs in North America (Nisonger & Davis, 2005).

From this journal set, all research and review articles (10,344) published between 1988 and 2007 were included in the analysis.

Identification of the most frequently occurring LIS words and phrases. Word frequency is an important measure in content analysis. This measure is used to identify the most important research topics or concepts in a field by focusing on the most frequently occurring words.

In this study, we base all analyses on the 100 most frequently occurring LIS words or phrases. 

Thursday, August 14, 2014

Huang, M. H., & Chang, Y. W. (2012). A comparative study of interdisciplinary changes between information science and library science. Scientometrics, 91(3), 789-803.



Using the references cited by articles in five library science journals and five information science journals from 1978 to 2007, this study compares the interdisciplinary character of the two fields. Interdisciplinarity is defined as the use of knowledge, methods, techniques, and devices from other disciplines as a result of scientific activities (Tijssen 1992), and the distribution of references across disciplines is a commonly used analysis technique. The results show that the two fields draw on very different source disciplines: library science research tends to cite library and information science (LIS), education, business/management, sociology, and psychology, whereas information science research cites mostly LIS, general science, computer science, technology, and medicine. Apart from LIS itself, the disciplines cited by library science are mainly social sciences, while those cited by information science are mainly natural sciences.

In terms of trends in citation proportions, library science shows a declining share of citations to LIS and a rising share of citations to education, while information science shows a rising share of citations to computer science.

The study measures the interdisciplinarity of the two fields with Brillouin's Index, calculated as (base-2 logarithm assumed; some implementations use the natural log):

H = (1/N) * log2( N! / (n1! * n2! * ... * nk!) )

N is the number of observations, i.e., the total number of references, and ni is the number of observations in the i-th category, i.e., the number of references in the i-th discipline. The results show that the interdisciplinarity of both fields has risen year by year, and that information science is more interdisciplinary than library science.
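Under these definitions, the index can be computed directly (the reference counts below are invented; log base 2 is assumed):

```python
from math import factorial, log2

def brillouin(counts):
    """Brillouin's Index: (1/N) * log2(N! / (n1! * n2! * ... * nk!)),
    where N is the total number of references and counts[i] is the number
    of references falling in discipline i."""
    n = sum(counts)
    log_numerator = log2(factorial(n))
    log_denominator = sum(log2(factorial(c)) for c in counts)
    return (log_numerator - log_denominator) / n

# e.g. 60 references spread over four disciplines; a more even spread over
# more disciplines yields a higher value.
print(round(brillouin([30, 15, 10, 5]), 3))
```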


Based on the research generated by five library science journals and five information science journals, library science researchers tend to cite publications from library and information science (LIS), education, business/management, sociology, and psychology, while researchers of information science tend to cite more publications from LIS, general science, computer science, technology, and medicine. This means that the disciplines with larger contributions to library science are almost entirely different from those contributing to information science.

However, a decreasing trend in the percentage of LIS in library science indicates that library science researchers tend to cite more publications from non-LIS disciplines. A rising trend in the proportion of references to education sources is reported for library science articles, while a rising trend in the proportion of references to computer science sources has been found for information science articles.

In addition, this study applies an interdisciplinary indicator, Brillouin’s Index, to measurement of the degree of interdisciplinarity. The results confirm that the trend toward interdisciplinarity in both information science and library science has risen over the years, although the degree of interdisciplinarity in information science is higher than that in library science.

The concept of interdisciplinarity has been discussed by many researchers (Huutoniemi et al. 2010; Leydesdorff and Probst 2009; Rosenfield 1992; Tijssen 1992), and can be defined as the use of knowledge, methods, techniques, and devices from other fields as an outcome of scientific activities (Tijssen 1992).

Monday, August 11, 2014

Tsay, M. Y. (2011). A bibliometric analysis and comparison on three information science journals: JASIST, IPM, JOD, 1998–2008. Scientometrics, 89(2), 591-606.


Scientometrics

Borko (1968) defined information science as "a discipline that investigates the properties and behavior of information, the forces governing the flow of information, and the means of processing information for optimum accessibility and usability". This study explores and compares the bibliometric characteristics of the references of papers published between 1998 and 2008 in three information science journals, JASIST (Journal of the American Society for Information Science and Technology), IPM (Information Processing and Management), and JOD (Journal of Documentation), as well as their subject relationships with other disciplines.

The results show that all three journals are information science oriented, but JOD leans more toward library science, while JASIST and IPM have more in common with each other and diffuse more deeply into other disciplines than JOD does. Selected findings:
1. JASIST published twice as many articles as IPM and JOD, which published roughly equal numbers; however, book reviews make up the majority (54%) of JOD's publications.
2. Papers in JASIST and JOD cite on average 38 and 40 references respectively, clearly more than IPM's 32. On average, 9.3, 7.8, and 4.1 of the references per paper are books for JOD, JASIST, and IPM respectively.
3. JASIST has the highest journal self-citation rate at 17.46%; IPM and JOD follow at 14.11% and 10.19%.
4. Four of the top five most-cited journals are shared by all three journals: JASIST, IPM, Scientometrics, and JOD. The books most cited by JOD differ markedly from those of the other two, but JASIST and IPM share the same top three: Salton and McGill's Introduction to Modern Information Retrieval, Van Rijsbergen's Information Retrieval, and Salton's The SMART Retrieval System: Experiments in Automatic Document Processing.
5. Of the top ten journals cited by each of the three journals, 40-50% are information science journals, indicating that researchers in this field cite mostly results from their own field.
6. The top three classes of cited journals are "Bibliography. Library Science. Information Resources (General)", "Science", and "Social Sciences (General)". For cited books, the largest class for JASIST and IPM is science, while for JOD it is "Bibliography. Library Science. Information Resources (General)". By subject, the top three are the same for all three journals: "searching", "online information retrieval", and "information work".


Employing a citation analysis, this study explored and compared the bibliometric characteristics of, and the subject relationships with other disciplines among, three leading information science journals: Journal of the American Society for Information Science and Technology (JASIST), Information Processing and Management and Journal of Documentation. The citation data were drawn from the references of each article in the three journals between 1998 and 2008.

Comparison of the characteristics of cited journals and books confirmed that all three journals under study are information science oriented, although JOD is more library science oriented. JASIST and IPM have much in common and diffuse into other disciplines more deeply than JOD.

Borko (1968) defined information science as ‘‘a discipline that investigates the properties and behavior of information, the forces governing the flow of information, and the means of processing information for optimum accessibility and usability’’.

JASIST published more than twice as many articles as IPM and JOD, which published approximately the same number of articles. Interestingly, JOD published more book reviews (54%) than journal articles.

The average number of references cited per paper is 38 for JASIST and 40 for JOD, significantly higher than IPM's 32. There is no significant difference between JASIST and JOD in the average number of references cited.

On average, 9.3, 7.8, and 4.1 books were cited per paper by JOD, JASIST and IPM, respectively; JOD cites the most books per paper, while IPM cites the fewest.

JASIST has the highest self-citation rate at 17.46%, followed by IPM at 14.11%, with JOD lowest at 10.19%.

Four of the top five highly cited journals are in common, i.e., Journal of the American Society for Information Science and Technology, Information Processing and Management, Scientometrics, and Journal of Documentation.

On the other hand, the most cited three books in common for JASIST and IPM are Salton and McGill’s Introduction to Modern Information Retrieval, Van Rijsbergen’s Information Retrieval and Salton’s The SMART Retrieval System: Experiments in Automatic Document Processing.

For the three journals under study, most of the top ten highly cited journals, contributing about 40–50% of cited journals, are information science journals, indicating that researchers in the information science field cite more research results from their own field.

The top three main classes of cited journals in papers of the three journals under study are in common and in the same order, i.e., ‘‘Bibliography. Library Science. Information Resources (General)’’, ‘‘Science’’ and ‘‘Social Sciences (General)’’.

As for the books cited, the most cited main class in JASIST and IPM papers is science, while the most cited main class for JOD is ‘‘Bibliography. Library Science. Information Resources (General)’’.

The top three highly cited subjects of library and information science journals are in common and encompass ‘‘searching’’, ‘‘online information retrieval’’, and ‘‘information work’’.

Papers in JOD are less computer-related than those in JASIST and IPM, and JOD is more oriented toward traditional library science than JASIST and IPM are. On the other hand, ‘‘Information Storage and Retrieval Systems’’ and ‘‘Information Retrieval’’ are two of the three most cited subjects of books cited by the three journals under study.

Sunday, August 10, 2014

Ni, C., Sugimoto, C. R., & Cronin, B. (2013). Visualizing and comparing four facets of scholarly communication: producers, artifacts, concepts, and gatekeepers. Scientometrics, 94(3), 1161-1173.


network analysis

This study analyses the journal network of information science and library science along four facets: Venue-Author-Coupling (VAC), journal co-citation analysis, topic analysis, and interlocking editorial board membership. The scope of analysis is the 58 journals in the Information Science and Library Science category of the 2008 JCR (Journal Citation Report), using their publication data from 2005 to 2009. Summary statistics of the data are given in Table 1.


VAC is used to represent the similarity of journals' producers, measuring the proximity of each journal pair by the number of authors they share; the underlying assumption is that authors choose thematically or socially similar journals to publish in. Journal co-citation analysis (McCain, 1991) counts the number of times each journal pair is cited together, and is used here to measure the similarity of artifacts. For concepts, the study uses the ACT (Author-Conference-Topic) model (Tang et al., 2008), a modification of the LDA model (Blei et al., 2003), which estimates the distribution of keywords over topics and of topics over authors and publication venues (journals); journal similarity is then assessed with the cosine measure. Interlocking editorial board membership measures journal similarity by the number of editorial board members two journals share: the more members in common, the more cognitively or socially similar the two journals are.

The four journal-similarity measures above were used both for hierarchical cluster analysis and to build networks, displayed with the Kamada-Kawai layout. The four resulting networks were then compared for possible correlations with the Quadratic Assignment Procedure (QAP) (Lawler 1963).

The journal network obtained with the VAC approach shows four clusters: MIS (yellow), IS (blue), LS (green), and specialised journals (red). The MIS cluster is quite separate from the others, while IS and LS are close together. Compared with the other three clusters, the links among the specialised journals are weak.
The network obtained from the journal co-citation analysis is shown below.

The five topics produced by the topic model are listed in Table 2, and their distribution over the network is shown in the figure below. MIS (yellow) remains separate from the other clusters but lies closer to the specialised health and communication journals. The positions of IS (blue) and LS (pink) differ from those in the VAC and co-citation networks. Library services and practice (green) is close to the specialised journals and to LS.

In the network based on interlocking editorial board membership, 10 journals share no editorial board members with any other journal. The remaining journals fall into four clusters, with communication research journals forming a new cluster (green).

The QAP results for the four networks are shown in Table 3.

In summary, the journals in the JCR Information Science and Library Science category can be roughly divided into four clusters: MIS, IS, LS, and communication-related journals, with MIS comparatively independent. The QAP results also show a high correlation between the editorial board membership network and journal co-citation, possibly because researchers who serve on editorial boards tend to have stronger academic records and are therefore cited more often. Editorial board membership also correlates highly with VAC, possibly because editorial board members are more productive. Analysing multiple facets thus gives a more complete picture of the scholarly communication network.

Fifty-eight journals from the Information Science and Library Science category in the 2008 Journal Citation Report were studied and the network proximity of these journals based on Venue-Author-Coupling (producer), journal co-citation analysis (artifact), topic analysis (concept) and interlocking editorial board membership (gatekeeper) was measured. The resulting networks were examined for potential correlation using the Quadratic Assignment Procedure.

The VAC approach is used to represent the producers in this dataset. It measures journal proximity by the number of authors shared by each journal pair, based on the idea that an author's choice of publication venue reflects similarity judgments: authors are likely to choose venues that are thematically or socially similar.

Artifacts are measured by means of journal co-citation. This measure, introduced by McCain (1991), refers to the appearance of two journals in the same reference list of an article. The more frequently two journals appear in the same reference lists, the greater the similarity between the two journals. The journal co-citation approach measures journal proximity by the frequency with which each journal pair is co-cited by the same articles.

Topic modeling is used to capture concepts. ... The technique adopted here, the author-conference-topic (ACT) model (Tang et al., 2008), extends the LDA model by considering the author and publishing venue of the articles. LDA was developed originally as a topic modeling technique concerning the probability distribution of keywords for topics, and is particularly helpful with the ‘‘classification, novelty detection, summarization, and similarity and relevance judgment’’ of large-scale data (Blei et al. 2003, p. 993). ... This model extends the idea of LDA by taking into account the authors and publishing venues, and estimates not only the distribution of words on topics, but also the distribution of authors and venues on the topics modeled. ... Here, the outcome of the ACT model is the probability distribution of each author and each journal over topics, and the journal proximity is calculated using the cosine similarity of the journals.
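The final similarity step can be sketched as follows; the five-topic distributions below are invented for illustration, and only the cosine computation reflects the method described above:

```python
import math

def cosine_similarity(p, q):
    """Cosine similarity between two journals' topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

# Hypothetical per-journal probability distributions over five topics
journal_a = [0.40, 0.30, 0.10, 0.10, 0.10]
journal_b = [0.35, 0.25, 0.20, 0.10, 0.10]
sim = cosine_similarity(journal_a, journal_b)
```

Identical distributions give a similarity of 1, and distributions concentrated on disjoint topics give 0.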

The interlocking editorship approach, employed by Ni and Ding (2010), measures journal proximity based on common editorial board membership. The number of editorial board members that two journals share can be viewed as an indicator of journal similarity. ... Thus, it can be expected that if two journals have scholars in common on their editorial boards, these two journals have some degree of similarity, either cognitively or socially.

The journals were clustered using a hierarchical clustering technique with squared Euclidean distance and Ward's method. Each journal clustering was displayed as a network (Kamada-Kawai layout); each node (journal) was colored according to the hierarchical clustering result, with the size of a node proportional to its centrality (either degree or closeness).

Additionally, a comparison of journal proximity results was conducted using the Quadratic Assignment Procedure (QAP). QAP is commonly used in social network analysis as a means of investigating correlations between two networks. ... (Lawler 1963).
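A minimal sketch of the QAP idea in plain Python: correlate the off-diagonal entries of two proximity matrices, then re-estimate the correlation under random node relabellings of one matrix to obtain a permutation p-value. The two 3-node matrices are toy data, not the paper's networks:

```python
import random
import statistics

def qap_correlation(A, B, n_perm=1000, seed=42):
    """QAP sketch: Pearson correlation between the off-diagonal entries of
    two n x n proximity matrices, with a permutation p-value obtained by
    randomly relabelling the nodes of B."""
    n = len(A)

    def offdiag(M, perm=None):
        perm = perm or list(range(n))
        return [M[perm[i]][perm[j]] for i in range(n) for j in range(n) if i != j]

    def pearson(x, y):
        mx, my = statistics.fmean(x), statistics.fmean(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
        return num / den

    obs = pearson(offdiag(A), offdiag(B))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        if abs(pearson(offdiag(A), offdiag(B, perm))) >= abs(obs):
            extreme += 1
    return obs, extreme / n_perm

# Toy symmetric proximity matrices for a 3-node network
A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
B = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
r, p = qap_correlation(A, B, n_perm=200)
```

On networks this small the permutation distribution is degenerate; the point is only to show the relabel-and-recorrelate structure of QAP.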

Wednesday, June 25, 2014

Liu, Y., Rafols, I., & Rousseau, R. (2012). A framework for knowledge integration and diffusion. Journal of Documentation, 68(1), 31-44.


Scientometrics

Although many researchers distinguish between "interdisciplinary", "multidisciplinary", "transdisciplinary", and "crossdisciplinary" research (Organisation for Economic Co-operation and Development, 2005), in practice these modes form a continuum that is difficult to separate (Barry et al., 2008, pp. 27-8). This study therefore uses "interdisciplinary research" as a general term for all of them. The National Academy of Sciences of the USA (2004, p. 2) defines interdisciplinary research as a mode of research by teams or individuals that integrates information, data, techniques, tools, perspectives, concepts, and theories from two or more disciplines or bodies of specialised knowledge, in order to advance understanding or to solve problems whose solutions lie beyond the scope of a single discipline or area of research practice. The key concept in this definition is knowledge integration. A discipline, in turn, is a result of the organisation of science. The disciplinary structure can be produced with statistical tools such as clustering algorithms, as in Rosvall and Bergstrom (2008) and Leydesdorff and Rafols (2009); it can be a philosophical construction, such as the Universal Decimal Classification (UDC); or, as in much empirical work on interdisciplinarity (e.g. Moya-Anegón et al., 2004; Leydesdorff and Rafols, 2009), it can be the Subject Categories of the Journal Citation Reports (JCR).

Rafols and Meyer (2010) proposed an analytic framework for studying interdisciplinarity, stressing that its key process is knowledge integration and that the two main aspects to observe are diversity and coherence. Diversity refers to the breadth of the categories used, i.e. how different the integrated knowledge is; Rafols and Meyer (2010) propose measuring it with the JCR categories of the references-of-references. Stirling (2007) describes diversity as the property of how the elements of a system are apportioned into categories, at three levels: (1) variety, the number of categories involved; (2) evenness, measured with classical diversity measures such as the Simpson index, Shannon entropy, the Gini index, or the coefficient of variation; and (3) the best option, the Rao-Stirling measure, which covers all three aspects of diversity (variety, balance, and disparity) and takes the distances and differences between categories into account:

Delta = sum over i != j of (p_i * p_j)^beta * d_ij^alpha

Here d_ij is the dissimilarity between categories i and j, and p_i and p_j are the proportions of all items in categories i and j, respectively. alpha and beta are parameters adjusting the relative importance, usually set to 1.

Coherence is the extent to which the elements of the research are interrelated through topics or categories; it emphasises how different bodies of research are consistently articulated into a meaningful constellation, and is a property of the network formed by the elements. Rafols and Meyer (2010) measure it through the strength of bibliographic coupling in the network of references: the references are the nodes, the magnitude of the bibliographic coupling between them determines whether a link exists, and coherence is measured with the mean path length.




The authors propose that in order to characterise knowledge integration and diffusion of a given issue (the source, for example articles on a topic or by an organisation, etc.), one has to choose a set of elements from the source (the intermediary set, for example references, keywords, etc.). This set can then be classified into categories (cats), thus making it possible to investigate its diversity. The set can also be characterised according to the coherence of a network associated to it.

In a recent article, Rafols and Meyer (2010) presented an analytic framework for the study of interdisciplinarity. The two main factors of this framework are diversity and coherence. These authors stress that the key process characterizing interdisciplinarity is knowledge integration (National Academy of Sciences, 2004; Porter et al. 2006).

According to the National Academy of Sciences (2004, p. 2) of the USA, interdisciplinary research is:
[...] a mode of research by teams or individuals that integrates information, data, techniques, tools, perspectives, concepts and/or theories from two or more disciplines or bodies of specialised knowledge to advance fundamental understanding or to solve problems whose solutions are beyond the scope of a single discipline or area of research practice.

In this definition the key concept is knowledge integration.

Although some researchers make a distinction between the terms “interdisciplinary”, “multidisciplinary”, “transdisciplinary“ and “crossdisciplinary“ research (Organisation for Economic Co-operation and Development, 2005) in empirical studies one finds a continuum that makes it difficult to distinguish among these modes (Barry et al., 2008, pp. 27-8). Hence, we just use the term “interdisciplinary” as a general term, comprising all the latter, as was done in Rafols and Meyer (2010).

When studying interdisciplinarity, the notion of “a discipline” comes logically first. A discipline is the result of the organisation of science (Turner, 2000). The disciplinary structure can be captured using statistical tools, for example by applying a cluster algorithm, as in Rosvall and Bergstrom (2008) and Leydesdorff and Rafols (2009), a philosophical result (as the categories used in the Universal Decimal Classification (UDC)) or a practice-based categorisation, supported by statistics, such as the Subject Categories of the Journal Citation Reports (JCR) (Moya-Anegón et al., 2004; Leydesdorff and Rafols, 2009).

Diversity refers to the breadth in categories used (Stirling, 2007); coherence to the extent that different elements in the research (categories or topics) are interrelated. The notion of diversity puts the emphasis on how different the incorporated knowledge is, while the notion of coherence emphasizes how different bodies of research are consistently articulated and form a meaningful constellation (Rafols and Meyer, 2010).

In this sense, an increase in diversity reflects the divergence of knowledge integration and diffusion, whereas an increase in coherence reflects their convergence.

In Rafols and Meyer (2010), diversity is measured using the JCR categories of the references-of-references and coherence using the strength of bibliographic coupling in the network of references.

In order to capture diversity and coherence we will consider a framework that consists of three entities:
(1) the source or object of enquiry (often an article or set of articles) used as a representation of an author or group of authors;
(2) an intermediary set (IM) derived from the source; and
(3) a target set, defining the notion we want to study.
These three sets are connected by two mappings: one from the source set to the intermediary set, and one from the intermediary set to the target set.


[...] knowledge integration can be captured as a property of an article (Porter et al., 2007). This leads to the question: how does one describe a relation between an article and the set of all cats?
(1) words used in the article;
(2) words used in the articles in the reference list;
(3) the byline;
(4) the byline of the articles in the reference list;
(5) the reference list;
(6) the reference lists of the articles in the reference list; and
(7) the union of items 5 and 6.

If the cats are disciplines delineated by a set of keywords and IM consists of words then each word is either mapped to itself (if it happens to be a keyword) or to that keyword (or keywords) that are closest to it in meaning (to be determined by a specific algorithm).

If the cats are disciplines and IM consists of journals, then each journal is mapped to the discipline associated with this journal (the case of journals covered by the WoS and the corresponding JCR subject categories is an obvious example).

Contrary to the case of knowledge integration, knowledge diffusion with respect to one article is largely determined by outsiders[1]. ... Instead, we have to determine an intermediary set taking into account the properties of the “audience” or users of the article(s).
This may be (and again no exhaustiveness is claimed):
(1) the set of articles citing the article under consideration, denoted CIT;
(2) the union of CIT and all articles citing an article in CIT (hence including several citation generations, as studied, for example, in Hu et al., 2011);
(3) the set of journals citing the article;
(4) all books citing the article;
(5) the set of persons downloading this article;
(6) the departments where downloading has taken place; and
(7) all web pages linked to the article (if it exists in electronic form).

If the cats are countries or regions and IM consists of citing articles (the case of knowledge diffusion), then each citing article is mapped to those countries appearing in the byline.
For the same set of cats and an IM consisting of books that appear in the reference list, each book is mapped to the country of the publisher.
If IM consists of web pages, each web page is mapped to its country domain name (either removing other domain names, or considering .com, .org, etc. as “regions”).

If the cats are journals and IM consists of citing articles, then each citing article is mapped to the journal in which the citing article has been published.

Diversity is the property of how the elements of a system are apportioned into categories (Stirling, 2007). As one aspect of knowledge integration, diversity is now determined on the image of the cats-mapping. There are three levels on which one may work.
(1) Variety: the number of cats involved, or (maybe better) the relative number of cats (with respect to the total number of cats) involved.
(2) Classical diversity (as the opposite of evenness). This quantity can be measured using a classical evenness measure such as the Simpson index, the Shannon entropy measure, the Gini index or the coefficient of variation (Nijssen et al., 1998).
(3) As explained by Rafols and Meyer (2010) the best approach is to take the three aspects of diversity – i.e. variety, balance and disparity – into account. If a distance or dissimilarity measure exists in cats (and this is assumed in our framework) this suggests using the Rao-Stirling measure, or one of the generalisations that can be derived from Stirling’s (2007) framework [2].

Recall that the Rao-Stirling measure is defined as:

Delta = sum over i != j of (p_i * p_j)^beta * d_ij^alpha

Here, d_ij denotes the dissimilarity between cat i and cat j, and p_i and p_j denote the proportions of the total number of items under study in cat i and cat j, respectively. Finally, alpha and beta are parameters that adjust the importance given to small distances (alpha) and weights (beta).
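A direct transcription of this measure, with alpha = beta = 1 and a made-up three-category example in which the third category is distant from the first two:

```python
def rao_stirling(p, d, alpha=1.0, beta=1.0):
    """Rao-Stirling diversity: sum over pairs i != j of
    d_ij**alpha * (p_i * p_j)**beta, where p holds the category
    proportions and d is a symmetric dissimilarity matrix."""
    n = len(p)
    return sum(
        (d[i][j] ** alpha) * ((p[i] * p[j]) ** beta)
        for i in range(n) for j in range(n) if i != j
    )

# Example: three categories, the third distant from the first two
p = [0.5, 0.3, 0.2]
d = [[0.0, 0.1, 0.9],
     [0.1, 0.0, 0.8],
     [0.9, 0.8, 0.0]]
delta = rao_stirling(p, d)
```

Because distant pairs (large d_ij) contribute more, shifting citation mass toward a remote category raises the measure even when variety and balance stay fixed.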

Coherence is the property describing how the elements of a system are related to each other. Hence, coherence is a property of networks.

It is independent from the concept of diversity: diversity reflects the distribution of elements in the IM set into categories; coherence reflects how these elements are related to each other (as measured through cats).

Different network measures may be used to capture coherence, such as the mean path length or the mean dissimilarity between elements (or linkage strength).

In Rafols and Meyer (2010), the network nodes are the references of the original article, and the relation studied, determining the existence of links, is bibliographic coupling.
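One way to compute the mean path length over such a network, sketched in plain Python with breadth-first search; the four-reference toy network is invented:

```python
from collections import deque

def mean_path_length(adj):
    """Mean shortest-path length over all connected node pairs of an
    undirected graph given as {node: set(neighbours)}.  Lower values
    indicate a more tightly knit (more coherent) reference network."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:                      # BFS from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for node, d in dist.items():
            if node != src:
                total += d
                pairs += 1
    return total / pairs if pairs else float("inf")

# Toy coupling network: references r1-r2-r3-r4 form a chain
net = {"r1": {"r2"}, "r2": {"r1", "r3"}, "r3": {"r2", "r4"}, "r4": {"r3"}}
mpl = mean_path_length(net)
```

In practice the links would be thresholded bibliographic-coupling strengths rather than the unweighted toy edges used here.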

In this way each source can be represented on a two-dimensional graph (coherence versus diversity) as in the Rafols and Meyer (2010) study, where they represented mean-linkage strength versus Stirling measure. In the case where the study of coherence reveals a fragmented structure, the diversity of each of the clusters can then be analysed for each of them.

However, for a group of articles, such as those published by one author, one may calculate an interdisciplinarity value per article and study how this measure changes over time (when new articles are added to the group of articles). Hence the time aspect of the group of articles under study is carried over to a time aspect of the corresponding interdisciplinarity, or more general, knowledge integration measure.


Saturday, June 21, 2014

Jensen, P., & Lutkouskaya, K. (2014). The many dimensions of laboratories’ interdisciplinarity. Scientometrics, 98(1), 619-631.


Scientometrics

This study proposes six indicators to measure the interdisciplinarity of research institutions. At the most generic level, interdisciplinarity can be seen as some degree of integration of different disciplines (Weingart and Stehr 2000; Porter and Rafols 2009; Marcovich and Shinn 2011; Wagner et al. 2011; Rafols et al. 2012). To turn this idea into quantitative indicators, three questions must be answered:
1. How is a discipline defined?
2. At what level is the integration achieved?
3. What degree of disciplinary linkage is achieved?

Three definitions of a discipline are proposed. First, since the laboratories analysed belong to CNRS, the natural choice is the CNRS disciplinary organization of 10 institutes, further subdivided into 40 sections. Second, as in earlier studies, the 224 Journal Subject Categories (JSCs) of the Web of Science (WoS) can be used. Third, documents can be grouped bottom-up into cognitive clusters with clustering algorithms based on shared references. For the level of integration, both the laboratory and the article level are examined.

Turning to the interdisciplinarity of a laboratory, a simple indicator can be defined from the spread of its publications, where p_i is the proportion of the laboratory's papers in Journal Subject Category JSC i.

In addition to the definition above, the study uses Stirling's (2007) approach to capture three facets of diversity: the number of different categories (variety), the evenness of the distribution over the categories (balance), and the difference among the categories (disparity) (Porter and Rafols 2009). Here s_ij is the similarity between JSC i and JSC j, obtained as the cosine measure of the citation patterns between the categories (Porter and Rafols 2009).

To examine whether a laboratory's interdisciplinary diversity is achieved at the cognitive level of individual articles, the interdisciplinary diversity of a single article is computed in the same way, where p_ai is the proportion of the article's references in JSC i. The laboratory-level version of this indicator is then obtained by averaging the diversity over all articles published by the laboratory, where #pap is the number of articles the laboratory published.

Two further indicators are the proportion of citations to subject categories outside the discipline's mainstream list, and the proportion of the laboratory's papers co-authored with researchers from other institutes, as shown below.


The last indicator first links articles through bibliographic coupling (Kessler 1963), computed as

BC_ij = #common_refs_ij / sqrt( #refs_i * #refs_j )

where #common_refs_ij is the number of references cited by both articles i and j, and #refs_i and #refs_j are the numbers of references in articles i and j, respectively. A network of articles is then built from the coupling links, with the expectation that articles citing similar references will cluster together. The algorithm of Blondel et al. (2008) is used to partition the network into clusters of articles; the whole method is described in Grauwin and Jensen (2011), and yields about 250 clusters. The laboratory's diversity over these cognitive clusters is then computed, where p_i and p_j are the proportions of the laboratory's papers belonging to clusters i and j, respectively.

After computing the six interdisciplinarity indicators for each laboratory, a Principal Component Analysis (PCA) was carried out. The four main components are:
1) the laboratory's combined performance on the various diversity indicators;
2) the cognitive distance of the disciplines the laboratory connects;
3) whether interdisciplinarity is achieved at the laboratory or the article level;
4) whether the laboratory publishes in journals with interdisciplinary subject categories or collaborates with laboratories from other institutes.

Interdisciplinarity is as trendy as it is difficult to define. Instead of trying to capture a multidimensional object with a single indicator, we propose six indicators, combining three different operationalizations of a discipline, two levels (article or laboratory) of integration of these disciplines and two measures of interdisciplinary diversity.

Interdisciplinarity means, at the most generic level, some degree of integration of different disciplines (Weingart and Stehr 2000; Porter and Rafols 2009; Marcovich and Shinn 2011; Wagner et al. 2011; Rafols et al. 2012).

To transform this idea into quantitative indicators, we need to answer three questions:
1. How to define a discipline?
2. At what level the integration is achieved?
3. What is the degree of disciplinary linkage achieved?

There are several ways to define a discipline from a scientometrics’ point of view. Since we are dealing with CNRS labs, the most natural would seem to use the disciplinary organization of CNRS in 10 ‘‘institutes’’ and 40 subdisciplinary ‘‘sections’’. A convenient alternative is to use the 224 Journal Subject Categories (JSCs) used by Web of Science (WoS). Finally, instead of using institutionally predefined divisions of science, one could use a more bottom-up definition of ‘‘cognitive clusters’’. To obtain these clusters, we use the roughly 300,000 French articles published between 2007 and 2010 and group them into ‘‘cognitive clusters’’ using clustering algorithms based on shared references.

In this paper, we will use three definitions of ‘‘discipline’’ and two integration levels (laboratory and article) to calculate six partial interdisciplinary indicators.

We adopt Stirling’s (2007) approach to capture the different facets of diversity: ‘variety’, ‘balance’ and ‘disparity’.

‘Variety’ characterizes the number of different categories, ‘balance’ characterizes the evenness of the distribution over these categories and ‘disparity’ characterizes the difference among the categories, usually based on some distance.

A simple indicator of the spread of the disciplines where a laboratory publishes is given by:
where pi is the proportion of articles of the laboratory in JSCi.

As we would like to include the idea of ‘‘distance’’ between disciplines, we calculate the diversity indicator (Stirling 2007; Porter and Rafols 2009) which combines both the spread of the disciplines through the pi and the distance between them.
where sij is the cosine measure of similarity between JSCs i and j. In practice, sij is measured through the citations from publications in JSC i to publications in JSC j (Porter and Rafols 2009).

To further characterize a lab’s interdisciplinarity, it is useful to introduce an indicator of the interdisciplinarity of single articles, to test whether interdisciplinarity is achieved at this cognitive level.

Specifically, the interdisciplinary diversity of a single article is calculated as:
where pai is the proportion of the article’s references in JSCi.

To quantify the interdisciplinarity of the papers published by a lab, we aggregate the articles’ diversity indicator art_div_corr at the laboratory level by averaging over all the articles published by that laboratory:

where #pap is the number of articles of the lab for which at least one reference was identified.

Then, we choose a threshold to define the most common JSCs for each institute. ... We therefore choose a threshold value of 90%. ... Then, for each laboratory, we count the percentage of articles outside this 90% list and normalize by the expected value, i.e. the average value 0.1.


where the summed terms are the frequencies of the JSCs that do not belong to the Institute's main JSC list.

Interdisciplinary collaborations can also be detected by copublications between scientists belonging to different CNRS Institutes. We compute a fifth indicator by calculating the proportion of a lab’s publications that involve authors from other Institutes

where the sum counts the number of articles of the lab involving at least two institutes and
#articles is the total number of articles published by the laboratory.

To build these ‘‘cognitive disciplines’’, we use bibliographic coupling (BC) (Kessler 1963) between the 300,000 papers published by French laboratories in the period 2007–2010 and compiled by the WoS:

BC_ij = #common_refs_ij / sqrt( #refs_i * #refs_j )

where #common_refs_ij is the number of common references for articles i and j, and #refs_i and #refs_j are the numbers of references of articles i and j, respectively.
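The coupling weight, under the square-root normalisation assumed above, can be sketched as follows (the reference lists are made up):

```python
def bibliographic_coupling(refs_i, refs_j):
    """Kessler-style bibliographic coupling, normalised as
    |common refs| / sqrt(|refs_i| * |refs_j|); the square-root
    normalisation of Grauwin and Jensen (2011) is assumed here."""
    si, sj = set(refs_i), set(refs_j)
    if not si or not sj:
        return 0.0
    return len(si & sj) / (len(si) * len(sj)) ** 0.5

# Hypothetical reference lists of two articles
a = ["r1", "r2", "r3", "r4"]
b = ["r2", "r3", "r5", "r6", "r7", "r8", "r9", "r10", "r11"]
w = bibliographic_coupling(a, b)
```

The normalisation keeps the weight in [0, 1], so an article is maximally coupled (weight 1) only with an article citing exactly the same references.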

In comparison to a co-citation link (which is the usual measure of articles’ similarity), BC offers two advantages: it allows to map recent papers (which have not yet been cited) and it deals with all published papers (whether cited or not).

This reinforcement facilitates the partition of the network into meaningful groups of cohesive articles, or clusters. A widely used criterion to measure the quality of a partition is the modularity function (Fortunato and Barthélemy 2007), which is roughly the number of edges ‘inside clusters’ (as opposed to ‘between clusters’), minus the expected number of such edges if the partition were randomly produced. We compute the graph partition using the efficient heuristic algorithm presented in Blondel et al. (2008). The whole method is described in Grauwin and Jensen (2011).

Applying this algorithm yields a partition of French papers into roughly 250 clusters containing more than 100 papers each. The labs' diversity over these cognitive clusters is then computed in the same way, where p_i and p_j are the proportions of the labs' papers belonging to clusters i and j, respectively.

On average, articles refer to papers from almost 10 different disciplines (9.8 JSCs). ... However, when considering only those JSCs that account for more than 10% of the reference list, this average drops to 2.7. This means that, on average, an article spreads its references over 3 main JSCs plus 7 others that receive roughly a single reference each.

An average laboratory publishes in journals belonging to 34 different JSCs ...

PCA1: combined interdisciplinarity The main axis represents a combination of the various interdisciplinarity indicators.

PCA2: short or long cognitive distance This axis distinguishes those labs that connect distant or nearby disciplines.

PCA3: article or laboratory interdisciplinarity This axis distinguishes labs that achieve interdisciplinarity either at the laboratory or article level.

PCA4: diversity of publications’ JSCs or diversity of collaborations This axis distinguishes labs that publish in journals belonging to different JSCs (high lab_jsc_bal) from labs that co-publish with labs from different CNRS Institutes (high lab_inst_cop_bal).

We have computed the six indicators for the 680 laboratories which published more than 50 papers over 2007–2010. To allow comparisons and statistical analysis, since the absolute values have no intrinsic meaning, we scaled all the values to an average of 0 and a variance of 1. We then carried out a principal component analysis of the (680 × 6) matrix using the free software R (www.r-project.org/). More precisely, we used prcomp from the ‘stats’ package, without any axes rotation.

First, let us note that using the first four PCA axes gives an overall view of the interdisciplinarity practices of each lab. This view has been compared to expert knowledge, namely scientists working in those labs or scientific advisors from CNRS. This comparison, carried out for about 20 different labs from all the disciplines, suggests that these indicators characterize interdisciplinarity in a meaningful way.

A major drawback of our method is that we cannot distinguish real interdisciplinary collaborations, giving rise to new concepts or to a coherent new scientific field, from simple pluridisciplinary practices that merely juxtapose different disciplines, as when historians use characterizing tools from physics. It seems difficult to learn much about the cognitive dimensions of interdisciplinarity from an automatic analysis of metadata of the papers.

Friday, June 20, 2014

Porter, A. L., & Rafols, I. (2009). Is science becoming more interdisciplinary? Measuring and mapping six research fields over time. Scientometrics, 81(3), 719-745.


Scientometrics

本研究將跨學科研究(interdisciplinary  research)操作化的定義為:由團隊或個人從兩個或以上的知識體系(bodies of knowledge)或研究實務整合它們的觀點/概念/理論、工具/技術以及資訊/資料的一種研究模式,也就是這類研究其知識來源具有多樣性,然後分析六個研究領域在1975年和2005年的跨學科程度變化。跨學科指標的計算以引用期刊在WoS (Web of Science)上的主題分類(Subject Categories, SCs)為基礎,並且配合科學映射圖(science maps)表現科學產出在主題分類上的分散情形(dispersion)。整個分析的流程包含五個步驟:


1. Operationalize the measurement of interdisciplinarity.
2. Construct a similarity matrix among Subject Categories, used to compute the Integration index.
3. Group the SCs into macro-disciplines via factor analysis of the similarity matrix, to facilitate visualization.
4. Generate science maps.
5. Select six SCs to serve as benchmarks now and in future explorations.


Several points about operationalizing the measurement of interdisciplinarity deserve mention. First, following Stirling, exploring interdisciplinarity requires investigating the number of disciplines cited, the distribution of citations among disciplines, and the similarity between categories [RAFOLS & MEYER, FORTHCOMING]. Second, the study regards knowledge integration as an epistemic category, so indicators of interdisciplinarity should be based on the content of research outcomes rather than on team membership, departmental affiliation, or collaboration. Finally, interdisciplinarity is usually measured from the SCs of the journals in a paper's cited references, but the bibliometric community has noted problems with the SCs: journal clustering studies found that only about 50% of clusters align closely with the SCs [BOYACK & AL., 2005; (BOYACK, personal communication, 14 September 2008)], and classifications derived from citation networks also match the SCs poorly [LEYDESDORFF, 2006, P. 611]. These mismatches, however, have only a limited effect on the resulting science maps, and for measuring Integration the SCs remain the most widely available categorization resource.

The formula used to measure the Integration index [RAFOLS & MEYER, FORTHCOMING], based on the Rao-Stirling diversity measure [STIRLING, 2007], is:

Integration = 1 − \sum_{ij} s_{ij} p_i p_j

where p_i is the proportion of the references in a given paper that cite SC i, and s_ij is the similarity between SCs i and j, measured by the cosine. Many studies [e.g. GRUPP, 1990; HAMILTON & AL., 2005; ADAMS & AL., 2007] have instead measured integration with the diversity measures of Shannon or Herfindahl. Shannon diversity is

H = −\sum_i p_i \ln p_i

and Herfindahl diversity is

1 − \sum_i p_i^2


Neither of these two measures, however, takes the differences between categories into account; Rao-Stirling diversity, by contrast, simultaneously considers the number of categories (variety), the balance of the distribution across categories, and the similarity between categories (disparity). The study therefore compares the Integration index with Shannon and Herfindahl diversity.
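The contrast between the three diversity measures can be made concrete in Python. This is a hypothetical toy example: the citation proportions and similarity values are invented, and serve only to show that Shannon and Herfindahl are blind to disparity while Rao-Stirling is not:

```python
import numpy as np

def shannon(p):
    """Shannon diversity: H = -sum_i p_i ln p_i."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def herfindahl_diversity(p):
    """Herfindahl diversity: 1 - sum_i p_i^2."""
    return float(1.0 - np.sum(p**2))

def rao_stirling(p, s):
    """Rao-Stirling Integration: 1 - sum_ij s_ij p_i p_j."""
    return float(1.0 - p @ s @ p)

# Two hypothetical papers, each citing three SCs in equal proportions
p = np.array([1/3, 1/3, 1/3])
s_close = np.array([[1.0, 0.9, 0.9],   # neighboring, highly similar SCs
                    [0.9, 1.0, 0.9],
                    [0.9, 0.9, 1.0]])
s_far = np.array([[1.0, 0.1, 0.1],     # disparate SCs
                  [0.1, 1.0, 0.1],
                  [0.1, 0.1, 1.0]])

# Shannon and Herfindahl see both papers as equally diverse;
# Rao-Stirling scores the paper citing disparate SCs higher.
```

This is exactly the "deflation" effect discussed in the results: citations concentrated in neighboring SCs add little to Integration even when they add to Shannon or Herfindahl diversity.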

Using counts of citations to the SCs, the study computes the cosine similarity between every pair of SCs. Two SCs have a high similarity when a large proportion of papers cite both of them; conversely, when two SCs are rarely co-cited, their cosine approaches zero. With the similarity matrix in place, a factor analysis is carried out via Principal Components Analysis (PCA) with Varimax rotation, producing 20 factors; each SC is assigned to the factor on which it has its highest loading. Each factor corresponds to one macro-discipline, and the SCs that cannot be assigned are grouped into an additional macro-discipline, giving 21 macro-disciplines in total.

The loadings of each SC on the 21 factors are then used as features to compute a new cosine similarity between SCs. Pajek maps these similarities as a network, filtering out links with similarity below 0.6, to produce the science map. The map presents every SC, its relative importance, and how strongly the SCs relate to one another. Its aims are to locate particular bodies of research among the macro-disciplines, to detect changes in interrelationships over time and the key cross-disciplinary connections, and above all to reveal whether the journals serving as knowledge sources come from closely related disciplines or span entirely different domains.


The study selects six SCs: Biotechnology & Applied Microbiology; Engineering, Electrical & Electronic; Mathematics; Medicine, Research & Experimental; Neurosciences; and Physics, Atomic, Molecular & Chemical.

The results show that over the 30-year period the average numbers of authors per paper, references per paper, and cited disciplines all increased substantially, yet the interdisciplinarity index increased only modestly. Several factors may explain this: although the number of cited SCs grew markedly, the average number of references per paper grew even faster, so the actual change in the proportions of citations to different SCs is less important than expected; most SCs tend to cite neighboring SCs, and because these neighbors have high similarity values they contribute little to diversity; finally, in some already highly interdisciplinary domains the Integration measure may have reached saturation. The science maps likewise indicate that the distribution of a paper's citations remains concentrated in neighboring disciplinary areas. The study also finds that Rao-Stirling diversity correlates highly with both the Herfindahl and Shannon measures, at 0.91 (standard deviation 0.07) and 0.88 (standard deviation 0.07) respectively.

Here we investigate how the degree of interdisciplinarity has changed between 1975 and 2005 over six research domains. ... The results attest to notable changes in research practices over this 30 year period, namely major increases in number of cited disciplines and references per article (both show about 50% growth), and co-authors per article (about 75% growth). However, the new index of interdisciplinarity only shows a modest increase (mostly around 5% growth). Science maps hint that this is because the distribution of citations of an article remains mainly within neighboring disciplinary areas.

We measure how integrative particular research articles are based on the association of the journals they cite to corresponding Subject Categories (“SCs”) of the Web of Science (“WoS”).

And, we present a practical way to map scientific outputs, again based on dispersion across SCs.

This report operationally defined interdisciplinary research as:
- a mode of research by teams or individuals that integrates
- perspectives/concepts/theories and/or
- tools/techniques and/or
- information/data
- from two or more bodies of knowledge or research practice.

Our approach here is to investigate changes of degree of interdisciplinarity over time using various established indicators (e.g. number of disciplines cited, percentage of citations within-field), together with a new indicator developed by the NAKFI evaluation team [PORTER & AL., 2007]: 
Integration – reflecting the diversity of knowledge sources, as shown by the breadth  of references cited by a paper. 

Following Stirling’s heuristic, we have previously argued that in order to explore interdisciplinarity, one needs to investigate multiple aspects, namely: the number of disciplines cited (variety), the distribution of citations among disciplines (balance), and, crucially, how similar or dissimilar these categories are (disparity) [RAFOLS & MEYER, FORTHCOMING]. 

The computation and visualization of the interdisciplinarity measure has taken five  steps, presented consecutively in this section: 
1. Operationalization of an interdisciplinary measure (the Integration index or disciplinary diversity)
2. Construction of a similarity matrix among Subject Categories that is used to compute the Integration index
3. Grouping via factor analysis of the SCs into macro-disciplines using the similarity matrix as a base to facilitate visualization
4. Generating science maps
5. Selection of a bibliometric sample of 6 SCs, to serve as benchmarks here and in future explorations. 

In other words, since knowledge integration is an epistemic category, indicators of interdisciplinarity should be based on the content of the research outcomes rather than on team membership, departmental affiliations, or collaborations (see illustrations in case studies in RAFOLS & MEYER, 2007). 

The bibliometric community has noted that the SCs have some problems. In journal clustering exercises, only about 50% of clusters were found to be closely aligned with SCs [BOYACK & AL., 2005; (BOYACK, personal communication, 14 September 2008)]. Poor matching between SCs and classifications derived from citation networks has also been reported [LEYDESDORFF, 2006, P. 611], but surprisingly the mismatch only has limited effect on the corresponding science maps [RAFOLS & LEYDESDORFF, UNDER REVIEW].

Nonetheless, the SCs offer the most widely available categorization resource that we could ascertain for the purpose of providing an accessible measure of Integration.

As derived in RAFOLS & MEYER [forthcoming], the formula for the Integration index can be expressed as:

Integration = 1 − \sum_{ij} s_{ij} p_i p_j

where p_i is the proportion of references citing the SC i in a given paper. The summation is taken over the cells of the SC x SC matrix. s_ij is the cosine measure of similarity between SCs i and j (the cosine measure may be understood as a variation of correlation). Here this matrix s_ij is based on a US national co-citation sample of 30,261 papers from Web of Science as explained below in detail. 

This Integration measure (aka, Rao-Stirling’s diversity) can be compared with Shannon diversity: 

H = −\sum_i p_i \ln p_i

or with Herfindahl’s diversity (the complement of Herfindahl’s concentration):

1 − \sum_i p_i^2

The power of the Integration index is that it characterizes interdisciplinarity in terms of the diversity of knowledge sources of papers, using a general formulation of diversity [STIRLING, 2007] rather than an ad hoc indicator.

A number of researchers have used these traditional measures of diversity, such as Shannon or Herfindhal, to measure interdisciplinarity [E.G. GRUPP, 1990; HAMILTON & AL., 2005, or ADAMS & AL., 2007]. These measures do not take into account how different the categories are, whereas our Integration measure reveals increased diversity only when added categories are significantly different.

In particular, a broad national sample of articles from WoS is used to create the sij matrix that underlies the metrics used for computing Integration. First we describe the sample used as a basis for the similarity matrix; second, the construction of the matrix.

We combine six separate weeks of all papers in WoS, with one or more authors having a USA address, sampled during 2005–2007, to obtain 30261 articles. This provides a broadly based, yet manageable base sample. We processed the “Cited References” of these abstract records to identify the “Cited SCs.”

Our sample of 30261 WoS articles contains 1,020,528 cited references (an average of 33.7 per article). Of those, our thesauri link 768,440 to a particular Subject Category. Another 28,000 have been checked and assigned to “not being in an SC.”

For our purposes in addressing cited SCs, the list includes a few more than the current set, for a total of 244 SCs. The sample contains 1,114,930 instances of cited SCs.

The 30,261 articles by 244 cited-SCs data described above allow for construction of a co-citation similarity matrix, s_ij, using the Salton cosine [SALTON & MCGILL, 1983; AHLGREN & AL., 2003].

The values of sij are high (i.e. closer to one) when SCs i and j are co-cited by a high proportion of articles that cite one or the other. The cosine value approaches zero when two SCs are rarely cited together.
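The construction of s_ij can be sketched from an article × SC incidence matrix. This is a toy illustration with invented data, not the paper's 30,261-article sample:

```python
import numpy as np

# Toy incidence matrix: rows = articles, columns = SCs;
# entry 1 if the article cites at least one journal in that SC.
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)

# Co-citation counts: C[i, j] = number of articles citing both SC i and SC j
C = A.T @ A

# Salton cosine: s_ij = C_ij / sqrt(C_ii * C_jj)
norms = np.sqrt(np.diag(C))
S = C / np.outer(norms, norms)
```

As in the text, S[i, j] is close to one for SC pairs co-cited by a high proportion of the articles citing either one, and zero for pairs never cited together.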

For various purposes and in particular for visualization, it helps to consolidate the narrow research areas of the ISI SCs into larger categories, which we call “macro-disciplines.”

We base our grouping of SCs on a type of factor analysis – Principal Components Analysis (PCA) – following a similar methodology to that developed by LEYDESDORFF & RAFOLS [2008] to cluster SCs into macro-disciplines.

Within VantagePoint, we constructed the matrix of cosine similarities for the 244 cited SCs by 244 cited SCs described in the previous section. ... We explored various factor analysis solutions, eventually adopting a 20-factor solution (Varimax rotation). ... The 21 macro-disciplines reflect this factor solution.

So, to a considerable degree, named sub-disciplines do not fully coalesce within a single macro-discipline. This warns that the evolving research enterprise does not neatly conform to the traditional scholarly disciplines.

These maps present the SCs, their relative importance in size, and how related they are to each other over all science. The main aim of these science maps is to locate particular bodies of research among the macro-disciplines. ... That can help identify changes in degree of interrelationship over time, and key cross-“disciplinary” relationships that might benefit from nurturing. It should also be informative to see whether knowledge sources of a set of publications are coming from research domains that are closely related (little interdisciplinarity) or that span very disparate domains (high interdisciplinarity). 

We then construct a new Salton cosine similarity matrix among SCs using the loadings of each SC on the 21 factors (as discussed in the previous subsection). This matrix is then uploaded into the network analysis software Pajek [BATAGELJ & MRVAR, 2008]. In Pajek, the minimum similarity threshold was arbitrarily set to 0.6 (this choice was found to provide a good readability-to-accuracy trade-off) and the SCs were distributed in a 2-D plane according to their similarities, to obtain a base science map.
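The edge-filtering step done in Pajek before layout can be sketched without Pajek itself; the matrix and labels below are hypothetical placeholders:

```python
import numpy as np

def edges_above_threshold(S, labels, threshold=0.6):
    """Keep only SC pairs whose similarity reaches the threshold,
    mirroring the minimum-similarity filter applied in Pajek."""
    edges = []
    n = S.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if S[i, j] >= threshold:
                edges.append((labels[i], labels[j], float(S[i, j])))
    return edges

# Toy factor-loading cosine similarities among four SCs
S = np.array([[1.0, 0.8, 0.3, 0.1],
              [0.8, 1.0, 0.65, 0.2],
              [0.3, 0.65, 1.0, 0.5],
              [0.1, 0.2, 0.5, 1.0]])
labels = ["Physics", "Chemistry", "Biology", "Medicine"]

kept = edges_above_threshold(S, labels, threshold=0.6)
# only the links at or above 0.6 survive the filter
```

The surviving edges are what the 2-D layout algorithm then positions to produce the base science map.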

Since research collaboration is often (and sometimes mistakenly) associated with interdisciplinarity, we examine measures of co-authorship. ... However, within research domain, the number of authors per paper has escalated remarkably, with about 75% average growth. This increase ranges from 48% in Math and 54% in Physics-AMC to 90% in Neurosciences. 

Before turning to Integration scores, we consider the number of distinct SCs that one article cites. ... Table 2 and Figure 4 show a sturdy increase in the breadth of citing in all six of these research domains (about a 50% growth on average). 

Integration scores are tabulated in Table 2 and shown in Figure 5. We see that over time, there is a modest increase in Integration scores and that math researchers are notably less integrative in their citing patterns. However, math has the highest relative growth (39%) whereas other SCs’ growth ranges from 3% to 14% (5% on average). t-tests between the 1975 and 2005 samples show these differences to be highly significant (<.005 for EE, assuming either equal or unequal variances; all others even more highly significant).

Pearson’s correlation between Integration and Herfindhal takes a mean value of 0.91 (standard deviation = 0.07) and between Integration and Shannon, a mean value of 0.88 (standard deviation = 0.07). These high correlations confirm that Integration is very closely associated with traditional diversity indicators – as could be expected by construction.

The main finding is that Integration scores increase over time, but significantly less so than other indicators, such as percentage of single-authored papers, mean authors per paper, and mean number of disciplines per paper.

First, although the number of cited SCs increases significantly, since the average number of references in a paper also shows a quicker increase (see central columns in Table 2), the actual change in the proportions of citation to different SCs is not as important as could be expected.

Second, as we will show in Figures 7 through 10, the citation patterns of a given SC tend to be with SCs in its vicinity. Since these neighboring SCs have high similarity values with the one investigated, their contribution to Integration (to diversity) is smaller than in other indicators. This means that the Integration score “deflates” the diversity recorded by Shannon or Herfindahl because most of the cited SCs are not very different from the SC doing the citing.

This is much easier to convey using science maps that directly show the three aspects of disciplinary diversity, namely:
1. the variety of “disciplines” (i.e., discrete research areas, the SCs, shown by the number of nodes in the map)
2. the balance, or distribution, of disciplines (relative size of nodes)
3. the disparity, or degree of difference, between the disciplines (distance between the nodes)

These maps were created following the techniques developed in LEYDESDORFF & RAFOLS [2008], in the context of the current interest in science mapping [MOYA-ANEGON & AL., 2004; BOYACK & AL., 2005; MOYA-ANEGON & AL., 2007]. ... In the figures presented in this article, we only label groups of SCs on the basis of macro-disciplines found by factor analysis, as explained in the methodology. 

However, the perspective provided by the Integration score and the science maps suggests that the practice of interdisciplinarity in citations occurs mainly between neighboring SCs and has undergone a much more modest increase (on average only 5%, excluding math).

This is mainly for two reasons: first, although the number of cited SCs has increased, the growth of citations means that the increase in the proportion of citations to new SCs is small; second, the newly cited SCs tend to be in the vicinity of the previous ones – hence they don’t add as much interdisciplinarity as they would if they were very disparate/distant disciplines. Moreover, for already very interdisciplinary SCs, such as Neuroscience, the indicator may have a certain “saturation” effect.