Using citation data, this study compares the accuracy of the journal classification systems provided by the Web of Science (WoS) and Scopus databases. A classification system can be applied to a variety of problems; for instance, it can be used to demarcate research areas (Glänzel & Schubert, 2003; Waltman & Van Eck, 2012), to evaluate and compare the impact of research across fields (Leydesdorff & Bornmann, 2015; Van Eck, Waltman, Van Raan, Klautz, & Peul, 2013), and to study the interdisciplinarity of research (Porter & Rafols, 2009; Porter, Roessner, & Heberger, 2008). Besides WoS and Scopus, other journal classification systems include those of Science-Metrix, the NSF (National Science Foundation), UCSD (University of California, San Diego), and the ANZSRC (Australian and New Zealand Standard Research Classification). In addition, Glänzel and Schubert (2003) proposed a hierarchical classification system covering both journals and publications, algorithmic construction of journal classification systems has been studied by Bassecoulard and Zitt (1999), Chen (2008), and Rafols and Leydesdorff (2009), and the algorithm of Waltman and Van Eck (2012) instead classifies the individual publications that appear in journals.
According to the literature review by Waltman (2015, Section 3), previous comparisons of WoS and Scopus have mainly addressed the coverage of the databases (e.g., López-Illescas, De Moya-Anegón, & Moed, 2008; Meho & Rogers, 2008; Mongeon & Paul-Hus, 2016; Norris & Oppenheim, 2007) or the accuracy of the databases when they are used to assess research output and impact (e.g., Archambault, Campbell, Gingras, & Larivière, 2009; Bar-Ilan, Levene, & Lin, 2007; Meho & Rogers, 2008; Meho & Sugimoto, 2009); no study has compared and analyzed the accuracy of their classification systems.
Pudovkin and Garfield (2002) explained that WoS initially assigned journals to categories by manual heuristics and later used the citation-based Hayne-Coulson algorithm to classify newly added journals. In addition, Katz and Hicks (1995), Leydesdorff (2007), and Leydesdorff and Rafols (2009) have noted that the WoS classification system combines citation patterns, journal titles, and expert opinion. For Scopus, by contrast, no literature describes how its classification system was constructed. WoS actually offers two classification systems: a system of categories with about 250 categories and a system of research areas comprising about 150 research areas. A further classification system, covering only the sciences and social sciences, is used for the ESI (Essential Science Indicators). This study analyzes the WoS system of categories.
The Scopus journal classification system is named the ASJC (All Science Journal Classification). It has two levels: the bottom level consists of 304 categories and the top level of 27 categories.
Because journal classification systems are so useful, many studies have proposed ways to improve the WoS and Scopus systems. For example, Glänzel and colleagues have studied several approaches to validate and improve the WoS classification system (Janssens, Zhang, De Moor, & Glänzel, 2009; Thijs, Zhang, & Glänzel, 2015; Zhang, Janssens, Liang, & Glänzel, 2010), and López-Illescas, Noyons, Visser, De Moya-Anegón, & Moed (2009) proposed an approach to improve the field delineation provided by the WoS classification system. On the Scopus side, improvements have been pursued by the SCImago research group (Gómez-Núñez, Vargas-Quesada, De Moya-Anegón, & Glänzel, 2011; Gómez-Núñez, Batagelj, Vargas-Quesada, De Moya-Anegón, & Chinchilla-Rodríguez, 2014; Gómez-Núñez, Vargas-Quesada, & De Moya-Anegón, 2016).
Approaches to assessing the accuracy of journal classification systems can be divided into expert-based approaches and the bibliometric approach. The expert-based approach is very difficult to apply at a large scale: no single expert has enough knowledge to assess the classification of journals across all scientific disciplines, so a large number of experts would have to be involved. The bibliometric approach can be further divided into text-based and citation-based approaches, which judge whether journals belong in the same category by the textual similarity or the citation-pattern similarity, respectively, of the publications in the journals of that category. This study uses direct citation relations. Klavans and Boyack (2015) previously used direct citation relations in algorithms for constructing publication-level classification systems and concluded that direct citation is more accurate than indirect citation relations such as bibliographic coupling or co-citation.
In summary, the rationale of the proposed approach is that a journal should cite, or be cited by, the journals in its own categories with a higher frequency than the journals in other categories. Based on this principle, this study defines two criteria for checking whether a journal has been assigned to appropriate categories (a rough formalization is sketched after the two criteria):
Criterion I: If a journal has only very few citation relations with the other journals in a category to which it is assigned, the classification of the journal may be questionable.
Criterion II: If a journal has many citation relations with the journals in a category to which it is not assigned, its category assignments may be incorrect, since it probably should also belong to that category.
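Stated slightly more formally, the sketch below assumes that relatedness is measured as the share of a journal's citation relations involving a category; the exact measure used in the study may differ.

    r_{i,k} = \frac{c_{i \to k} + c_{k \to i}}{t_i}

Here c_{i \to k} is the number of citations from journal i to the journals in category k, c_{k \to i} is the number of citations journal i receives from them, and t_i is the total number of citation relations of journal i. Under this assumption, Criterion I flags an assigned category k with r_{i,k} < \alpha, and Criterion II flags an unassigned category k with r_{i,k} \ge \beta, where \alpha and \beta are the thresholds discussed later.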
This study analyzes all journals in the WoS and Scopus databases with publications from 2010 to 2014; the relevant statistics are shown in Table 1. Comparing the two databases, Scopus not only covers more journals and more categories than WoS, but also tends to assign each journal to more categories: journals are assigned to about 1.6 categories on average in WoS, compared with 2.1 in Scopus.
According to Criterion I, both WoS and Scopus assign many journals to unsuitable categories, and Scopus does so much more often, as shown in Table 3. Among the categories containing at least 10 journals, those in which at least half of the journals satisfy Criterion I were selected: WoS has 17 such categories, whereas Scopus has as many as 76. The categories identified in both databases include ARCHITECTURE, BIOPHYSICS, and MEDICAL LABORATORY TECHNOLOGY.
On Criterion II, both databases show reasonably good results (Table 6).
Journals that satisfy both Criterion I and Criterion II have only weak connections with the categories to which they are assigned while having strong connections with categories to which they are not assigned. Analyzing these journals suggests two possibilities: either what the journals actually publish has drifted away from their titles and scope statements, or the journals were classified on the basis of their titles alone.
Based on the experimental results, the following conclusions can be drawn:
1. On Criterion I, WoS performs better than Scopus; in other words, journals in Scopus are more often only weakly connected with the categories to which they are assigned.
2. On Criterion II, both databases perform quite well; that is, if a journal is strongly connected to a category, WoS and Scopus usually assign the journal to that category.
3. Combining the two criteria, WoS generally performs much better than Scopus.
Beyond these conclusions, the study also points out that some Scopus categories have confusingly similar labels, for example the two categories named LINGUISTICS & LANGUAGE and LANGUAGE & LINGUISTICS, and the two named INFORMATION SYSTEMS & MANAGEMENT and MANAGEMENT INFORMATION SYSTEMS.
Moreover, both classification systems lack transparency: the authors did not find proper documentation of how the classification systems are constructed and updated.
To examine and compare the accuracy of journal classification systems, we define two criteria on the basis of direct citation relations between journals and categories. We use Criterion I to select journals that have weak connections with their assigned categories, and we use Criterion II to identify journals that are not assigned to categories with which they have strong connections. If a journal satisfies either of the two criteria, we conclude that its assignment to categories may be questionable.
Accordingly, we identify all journals with questionable classifications in Web of Science and Scopus. Furthermore, we perform a more in-depth analysis for the field of Library and Information Science to assess whether our proposed criteria are appropriate and whether they yield meaningful results.
It turns out that according to our citation-based criteria Web of Science performs significantly better than Scopus in terms of the accuracy of its journal classification system.
Classifying journals into research areas is an essential subject for bibliometric studies.
A classification system can assist with various problems; for instance, it can be used to demarcate research areas (e.g., Glänzel & Schubert, 2003; Waltman & Van Eck, 2012), to evaluate and compare the impact of research across scientific fields (e.g., Leydesdorff and Bornmann, 2015; Van Eck, Waltman, Van Raan, Klautz, & Peul, 2013), and to study the interdisciplinarity of research (e.g., Porter & Rafols, 2009; Porter, Roessner, & Heberger, 2008).
Besides the WoS and Scopus classification systems, there are various other multidisciplinary classification systems, for instance the system of Science-Metrix, the system of the National Science Foundation (NSF) in the US, the UCSD classification system, and the system of the Australian and New Zealand Standard Research Classification (ANZSRC).
Science-Metrix assigns “individual journals to single, mutually exclusive categories via a hybrid approach combining algorithmic methods and expert judgment” (Archambault, Beauchesne, & Caruso, 2011, p. 66). The Science-Metrix system includes 176 categories.
The NSF system also offers a mutually exclusive classification of journals, but it is more aggregated, consisting of only 125 categories (Boyack & Klavans, 2014). The system is used in the Science & Engineering Indicators of the NSF.
A more detailed classification system is the so-called University of California, San Diego (UCSD) classification system. This system, which includes more than 500 categories, has been constructed in a largely algorithmic way. The construction of the UCSD classification system is discussed by Börner et al. (2012).
The ANZSRC’s Field of Research (FoR) classification system has a three-level hierarchical structure. Journals are classified at the top level and at the intermediate level. Journals can have multiple classifications.
Furthermore, Glänzel and Schubert (2003) designed a two-level hierarchical classification system, which can be applied at the levels of both journals and publications. They adopted a top-down strategy; specifically, they first defined categories on the basis of the experience of bibliometric studies and external experts. They then assigned journals and individual publications to the categories. This classification system has for instance been used for measuring interdisciplinarity. In their analysis of interdisciplinarity, Wang, Thijs, & Glänzel (2015) explain that instead of the WoS subject categories they use the more aggregated classification system developed by Glänzel and Schubert (2003).
Algorithmic approaches to construct classification systems at the level of journals have been studied by for instance Bassecoulard and Zitt (1999), Chen (2008), and Rafols and Leydesdorff (2009).
A more recent development is the algorithmic construction of classification systems at the level of individual publications rather than journals. Waltman and Van Eck (2012) developed a methodology for algorithmically constructing classification systems at the level of individual publications on the basis of citation relations between publications. Their approach has for instance been used in the calculation of field-normalized citation impact indicators (Ruiz-Castillo & Waltman, 2015).
According to a recent literature review (Waltman, 2015, Section 3), previous studies comparing WoS and Scopus are mainly focused on two aspects. One is the coverage of the databases (e.g., López-Illescas, De Moya-Anegón, & Moed, 2008; Meho & Rogers, 2008; Mongeon & Paul-Hus, 2016; Norris & Oppenheim, 2007) and the other is the accuracy of the databases when used to assess research output and impact at different levels, ranging from individual researchers to departments, institutes, and countries (e.g., Archambault, Campbell, Gingras, & Larivière, 2009; Bar-Ilan, Levene, & Lin, 2007; Meho & Rogers, 2008; Meho & Sugimoto, 2009). However, no study has systematically compared WoS and Scopus in terms of the accuracy of their journal classification systems.
In the case of WoS, Pudovkin and Garfield (2002) have offered a brief description of the way in which categories are constructed. According to Pudovkin and Garfield, when WoS was established, a heuristic and manual method was adopted to assign journals to categories, and after this, the so-called Hayne-Coulson algorithm was used to assign new journals. This algorithm is based on a combination of cited and citing data, but it has never been published.
Besides this, Katz and Hicks (1995), Leydesdorff (2007), and Leydesdorff and Rafols (2009) have indicated that the WoS classification system is based on a comprehensive consideration of citation patterns, titles of journals, and expert opinion.
In the case of Scopus, there seems to be no information at all on the construction of its classification system.
It should be mentioned that in the most recent versions of WoS two classification systems are available, namely a system of categories and a system of research areas.
The system of categories is more detailed. This system, which is the traditional classification system of WoS and the system on which we focus our attention in this paper, consists of around 250 categories and covers the sciences, social sciences, and arts and humanities.
The system of research areas, which has become available in WoS more recently, is less detailed and comprises around 150 areas.
Besides these two systems, Thomson Reuters also has a classification system for its Essential Science Indicators. This system consists of 22 subject areas in the sciences and social sciences. It does not cover the arts and humanities.
The Scopus journal classification system is called the All Science Journal Classification (ASJC). It consists of two levels. The bottom level has 304 categories, which is somewhat more than the about 250 categories in the WoS classification system. The top level includes 27 categories.
The accuracy of a classification system can seriously influence bibliometric studies. For instance, Leydesdorff and Bornmann (2015) investigated the use of the WoS categories for calculating field-normalized citation impact indicators. They focused specifically on two research areas, namely Library and Information Science and Science and Technology Studies. Their conclusion is that “normalizations using (the WoS) categories might seriously harm the quality of the evaluation”.
A similar conclusion was reached by Van Eck et al. (2013) in a study of the use of the WoS categories for calculating field-normalized citation impact indicators in medical research areas.
Glänzel and colleagues have studied several approaches to validate and improve WoS-based classification systems (Janssens, Zhang, De Moor, & Glänzel, 2009; Thijs, Zhang, & Glänzel, 2015; Zhang, Janssens, Liang, & Glänzel, 2010). They have also proposed an improved way of handling publications in multidisciplinary journals (Glänzel, Schubert, & Czerwon, 1999; Glänzel, Schubert, Schoepflin, & Czerwon, 1999).
Related to this, López-Illescas, Noyons, Visser, De Moya-Anegón, & Moed (2009) have studied an approach to improve the field delineation provided by categories in the WoS classification system.
The SCImago research group has made a number of attempts to improve the Scopus classification system (Gómez-Núñez, Vargas-Quesada, De Moya-Anegón, & Glänzel, 2011; Gómez-Núñez, Batagelj, Vargas-Quesada, De Moya-Anegón, & Chinchilla-Rodríguez, 2014; Gómez-Núñez, Vargas-Quesada, & De Moya-Anegón, 2016).
Two types of approaches can be distinguished for assessing the accuracy of journal classification systems. One is the expert-based approach and the other is the bibliometric approach.
Applying the expert-based approach at a large scale is challenging. No expert has sufficient knowledge to assess the classification of journals in all scientific disciplines, so a large number of experts would need to be involved.
In the case of the bibliometric approach, a further distinction can be made between text-based and citation-based approaches.
Text-based approaches could for instance assess whether the textual similarity of publications in journals assigned to the same category is higher than the textual similarity of publications in journals assigned to different categories.
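The text-based idea mentioned above could be illustrated, for instance, with a TF-IDF cosine similarity comparison. This is only a sketch of the general idea, not an approach used in this study; the function and the data it expects are hypothetical.

    # Sketch: compare the average pairwise textual similarity of publications
    # within one category to that of publications drawn from different
    # categories (hypothetical input: lists of abstracts as plain strings).
    from itertools import combinations
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def mean_pairwise_similarity(texts):
        # Average TF-IDF cosine similarity over all pairs of texts
        # (expects at least two texts).
        tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
        sims = cosine_similarity(tfidf)
        pairs = list(combinations(range(len(texts)), 2))
        return sum(sims[i, j] for i, j in pairs) / len(pairs)

    # If the within-category average is clearly higher than the cross-category
    # average, the category assignment looks textually coherent.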
Instead, we take a citation-based approach to assess the accuracy of journal classification systems.
In this paper, we use direct citation relations. This is because “a co-citation or bibliographic coupling relation requires two direct citation relations” (Waltman & Van Eck, 2012, p. 2380), which means that bibliographic coupling and co-citation relations are more indirect signals of the relatedness of journals than direct citation relations.
The use of direct citation relations is also supported by Klavans and Boyack (2015), who study the algorithmic construction of classification systems at the level of individual publications. They conclude that the use of direct citation relations yields more accurate results than the use of bibliographic coupling or co-citation relations.
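A toy example (hypothetical data) may help to see why bibliographic coupling and co-citation are more indirect than direct citation: each indirect relation is composed of two direct citation links.

    # Toy citation data: paper -> set of papers it cites (hypothetical).
    citations = {
        "A": {"C", "D"},
        "B": {"C", "E"},
        "F": {"A", "B"},
    }

    def directly_related(x, y):
        # x cites y or y cites x.
        return y in citations.get(x, set()) or x in citations.get(y, set())

    def bibliographically_coupled(x, y):
        # x and y cite at least one common paper.
        return bool(citations.get(x, set()) & citations.get(y, set()))

    def co_cited(x, y):
        # Some paper cites both x and y.
        return any(x in refs and y in refs for refs in citations.values())

    # A and B are coupled (both cite C) and co-cited (both cited by F),
    # yet they have no direct citation relation; each of these indirect
    # relations rests on two direct citation links.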
Thus, the rationale of our approach can be summarized as follows: A journal should cite or be cited by journals within its own category with a high frequency in comparison with journals outside its category.
Based on this basic principle, we define two criteria to identify journals with questionable classifications. One criterion is that if a journal has only a very small number of citation relations with other journals within its own category, then we believe the classification of the journal to be questionable. The other criterion is that if a journal has many citation relations with journals in a category to which the journal itself does not belong, then it seems likely that the journal incorrectly has not been assigned to this category.
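The following sketch shows one way the two criteria could be operationalized. The relatedness measure, the data structures, and the default thresholds are assumptions for illustration; the study's exact definitions may differ.

    # relations[j][k]: number of citation relations (citing plus cited)
    # between journals j and k; categories[c]: set of journals assigned to c.
    def relatedness(journal, category, relations, categories):
        # Share of the journal's citation relations involving the category.
        total = sum(relations[journal].values())
        if total == 0:
            return 0.0
        within = sum(n for other, n in relations[journal].items()
                     if other in categories[category])
        return within / total

    def satisfies_criterion_1(journal, category, relations, categories, alpha=0.1):
        # Weak connection with a category the journal is assigned to.
        return relatedness(journal, category, relations, categories) < alpha

    def satisfies_criterion_2(journal, category, relations, categories, beta=0.25):
        # Strong connection with a category the journal is not assigned to.
        return relatedness(journal, category, relations, categories) >= beta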
We retrieved from the WoS and Scopus databases all journals that have publications between 2010 and 2014. ... The choice of a five-year time window is a trade-off between on the one hand the stability of journal classification systems and on the other hand the accuracy of our approach based on direct citation relations.
As can be seen in Table 1, the number of Scopus journals included in the analysis is almost twice as large as the number of WoS journals, and Scopus also includes 80 more categories than WoS. Furthermore, although both databases often assign journals to multiple categories, we found that Scopus tends to assign journals to more categories than WoS. WoS assigns journals to at most six categories, whereas in Scopus there turns out to be a journal that is assigned to 27 categories. Additionally, we found that the average number of categories to which journals belong equals 1.6 in WoS and 2.1 in Scopus. This shows that on average journals have significantly more category assignments in Scopus than in WoS.
As can be seen, almost 60% of all journals in WoS belong to only one category, whereas in Scopus more than 60% of all journals are assigned to two or more categories.
WoS has 1390 journals with ti < 100, accounting for 11% of the total number of WoS journals, whereas Scopus has 5808 journals with ti < 100, which is 24% of the total. Hence, Scopus has more journals with ti < 100 than WoS not only in an absolute sense but also from a relative point of view.
Taking a further look at Scopus journals with ti < 100, it turns out that they can be roughly divided into three groups. One group consists of arts and humanities journals, another group consists of newly included journals, and a third group consists of non-English language journals.
Table 2 provides some basic statistics on the assignment of journals to categories in WoS and Scopus when journals with ti < 100 and assignments of journals to multidisciplinary categories are excluded. The table shows the number of journals that belong to at least one non-multidisciplinary category and the number of assignments of journals to non-multidisciplinary categories. As can be seen in the table, in the case of Scopus the constraints that we have introduced cause a much larger decrease in the number of journals and the number of journal-category assignments than in the case of WoS.
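A sketch of the preprocessing described above, excluding journals with ti < 100 and ignoring assignments to multidisciplinary categories; the data structures are hypothetical.

    def preprocess(total_relations, assignments, multidisciplinary):
        # total_relations[j]: t_i for journal j; assignments[j]: set of
        # categories of journal j; multidisciplinary: category labels to drop.
        kept = {}
        for journal, cats in assignments.items():
            if total_relations.get(journal, 0) < 100:
                continue
            cats = cats - multidisciplinary
            if cats:
                kept[journal] = cats
        return kept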
Table 3 reports for both WoS and Scopus and for three values of the threshold α the number of journals and the number of journal-category assignments that satisfy Criterion I.
As can be seen, both databases have assigned a significant number of journals to categories that according to Criterion I seem to be inappropriate.
Moreover, no matter which threshold is considered, Scopus performs substantially worse than WoS, not only in the absolute number of journals and journal-category assignments satisfying Criterion I but, more importantly, also in the percentage of journals and journal-category assignments satisfying the criterion.
Next, we identify WoS and Scopus categories with a high percentage of journals satisfying Criterion I. The identified categories may be seen as the most problematic categories in the two databases, because many of the journals belonging to these categories are only weakly connected to each other in terms of citations.
We select categories that include at least 10 journals with ti ≥ 100 and that, for α = 0.1, have at least 50% of their journals satisfying Criterion I. The results for WoS and Scopus are reported in Tables 4 and 5, respectively. In the case of WoS 17 categories have been identified, whereas in the case of Scopus 76 categories have been identified, so more than four times as many as in the case of WoS.
There are three categories that have been identified in the case of both databases: ARCHITECTURE, BIOPHYSICS, and MEDICAL LABORATORY TECHNOLOGY.
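The selection of categories reported in Tables 4 and 5 could be sketched as follows, reusing the illustrative satisfies_criterion_1 helper defined earlier; it assumes the category membership sets already contain only journals with ti ≥ 100.

    def problematic_categories(categories, relations, alpha=0.1):
        # Categories with at least 10 journals of which at least half
        # satisfy Criterion I.
        flagged = []
        for category, journals in categories.items():
            if len(journals) < 10:
                continue
            weak = [j for j in journals
                    if satisfies_criterion_1(j, category, relations,
                                             categories, alpha)]
            if len(weak) >= 0.5 * len(journals):
                flagged.append(category)
        return flagged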
Table 6 presents for both WoS and Scopus and for five values of the threshold β the number of journals that satisfy Criterion II.
A journal satisfies both Criterion I and Criterion II if on the one hand it has weak connections, in terms of citations, with its assigned categories while on the other hand it has a strong connection with a category to which it is not assigned. More precisely, our focus is on journals for which the current category assignments all satisfy Criterion I, while there is an alternative category assignment that satisfies Criterion II.
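In terms of the earlier illustrative helpers, the combined check can be sketched as below; assignments[j] is assumed to hold the set of categories to which journal j is assigned.

    def satisfies_combined_criteria(journal, relations, categories, assignments,
                                    alpha=0.1, beta=0.25):
        # All current category assignments are weak (Criterion I) and at least
        # one non-assigned category is strongly connected (Criterion II).
        assigned = assignments[journal]
        all_weak = all(
            satisfies_criterion_1(journal, c, relations, categories, alpha)
            for c in assigned)
        strong_elsewhere = any(
            satisfies_criterion_2(journal, c, relations, categories, beta)
            for c in categories if c not in assigned)
        return all_weak and strong_elsewhere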
Based on the three journals discussed above, we conclude that journals satisfying the combined Criteria I and II can be classified into at least two types. One type refers to journals for which there is a discrepancy between on the one hand their title and their scope statement and on the other hand what they have actually published. ... The second type refers to journals that seem to have been assigned to a category based only on their title.
First, WoS performs much better than Scopus according to Criterion I. Using the parameter values α = 0.05 and α = 0.1, the percentage of journals and journal-category assignments satisfying Criterion I is more than two times higher for Scopus than for WoS. Hence, in Scopus journals are assigned to categories with which they are only weakly connected much more frequently than in WoS.
Second, based on Criterion II, WoS and Scopus both perform reasonably well, with WoS having a somewhat better performance than Scopus. For all parameter values that were considered, less than 5% of all journals in WoS and Scopus satisfy Criterion II. In other words, if a journal is strongly connected to a category, WoS and Scopus typically assign the journal to that category.
Third, WoS also presents a significantly better result than Scopus based on the combined Criteria I and II. In WoS there is only one journal satisfying the combined criteria, whereas in Scopus there are 32.
First, Scopus sometimes has confusing category labels. In particular, Scopus sometimes has two categories with very similar labels. Examples are the categories LINGUISTICS & LANGUAGE and LANGUAGE & LINGUISTICS and the categories INFORMATION SYSTEMS & MANAGEMENT and MANAGEMENT INFORMATION SYSTEMS.
Second, lack of transparency is a weakness of both the WoS and the Scopus classification system. We did not find proper documentation of the methods used to construct and update the WoS and Scopus classification systems.
For instance, in the case of a small category, it may be hardly possible for a journal to have a reasonably high relatedness with the category. Therefore it can be expected that many journals belonging to the category will satisfy Criterion I. This may be caused not so much by the misclassification of these journals but more by the small size of the category. On the other hand, in the case of a large category, there may be other problems. A large category may for instance be of a heterogeneous nature and may cover multiple fields that are hardly connected to each other.