Wolfram, D., & Zhao, Y. (2014). A comparison of journal similarity across six disciplines using citing discipline analysis. Journal of Informetrics, 8(4), 840-853.
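Scholarly communication can be studied at different levels of granularity to reveal and better understand the relationships among researchers, research teams, institutions, regions/nations, disciplines, and publications. The connections between the units analyzed may take the form of direct citations, co-citations, co-authorship, co-occurrence of words or subjects, or latent topics. Journal similarity has frequently been studied using co-citations; fields in which journal co-citation studies have been carried out include economics (McCain, 1991), information retrieval (Ding, Chowdhury & Foo, 2000), information systems (Marion, Wilson & Davis, 2005), medical informatics (Morris & McCain, 1998), neural networks (McCain, 1998), and semiconductor research (Tsay, Xu & Wu, 2003). Co-citation studies nevertheless face several difficulties. First, when multiple disciplines are involved, the journal-journal co-citation matrix can be quite sparse (Boyack, Klavans, & Börner, 2005). Second, co-citation data are labor-intensive to extract and are not readily available from database sources such as Web of Science without downloading all references from a corpus of articles. Finally, most co-citation analyses use only co-citation counts and do not consider any characteristics of the citing source.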
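One important line of informetrics research uses citation data to identify the disciplinary or specialization affiliations of journals; misclassification can affect a journal's ranking within its field. Glänzel and Schubert (2003) developed a three-step journal classification process in which articles in journals can be assigned subjects based on their references. Rafols and Leydesdorff (2009) compared the outcomes of two algorithms for decomposing large matrices against the Web of Science Subject Categories and the categorization of Glänzel and Schubert (2003). Leydesdorff and Rafols (2009) also investigated the relationships among more than 170 Web of Science Subject Categories using a citation matrix of subject category citation frequencies. Leydesdorff and Schank (2008) used visualization and animation to show the relationships among journals and their interdisciplinarity.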
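Earlier work has used the journal citation image, that is, the list of all journals that cite a target journal, as the feature for assessing similarity between journals. Computationally, the journal citation image can be combined with the frequency distribution of citations to form a signature for the target journal. However, influential and reputable journals may have a very large number of citing journals, which makes journal citation images expensive to compute. Wang and Wolfram (2014) proposed citing discipline analysis, which reduces the computation by using the disciplines of the citing journals, and the citation frequencies of those disciplines, as the signature for assessing the similarity of cited journals. Citing discipline analysis relies on the frequency distribution of a target journal's citing journals over Web of Science Research Areas rather than on the citing journals directly. Wang and Wolfram (2014) applied citing discipline analysis to 40 journals in the JCR category Information Science & Library Science. In the multidimensional scaling and cluster analysis results, they found that a number of journals were not close to the others: some journals also classified in other disciplines were not close to the Information Science & Library Science journals, and when such journals also belong to larger related fields, their higher impact factors lower the ranking of the other journals. Because Wang and Wolfram (2014) examined journals within a single discipline only, it is unclear whether the same holds for journals from multiple disciplines.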
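This study likewise uses citing discipline analysis to estimate similarity between journals. The data comprise 120 journals from six disciplines. Five of the disciplines are relatively close to one another: Communication, Computer Science-Information Systems, Education & Educational Research, Information Science & Library Science, and Management. The sixth, Geology, is more distant. The selected journals cover the publication period 1987 to 2012, divided into three periods: 1987–1995, 1996–2004, and 2005–2012. Similarity between journals is computed with the cosine measure, and the results are analyzed with multidimensional scaling, hierarchical cluster analysis, and Principal Component Analysis.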
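In the first period, the journals of the six disciplines fall into five clusters. The Computer Science-Information Systems journals, being few in this period, form a single cluster with Information Science & Library Science, and the Geology cluster lies far from the other clusters. The Journal of Advertising Research (JAR), whose subject classification is Communication, is closer to the Management journals in this period. The MIS journals and Telecommunications Policy (TP) lie between Information Science & Library Science and Management, and Social Science Computer Review (SSCR) lies among Information Science & Library Science, Communication, and Management. Science Communication (SCOMM), although classified in Communication, falls into the Information Science & Library Science cluster in this study.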
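In the second period, Computer Science-Information Systems separates from Information Science & Library Science, and the 93 journals included in this period form six clusters. Some journals classified in Information Science & Library Science are grouped with other disciplines: the Journal of the American Medical Informatics Association (JAMIA) and the International Journal of Geographical Information Science (IJGIS) are grouped with Computer Science-Information Systems, and the latter is also quite close to Geology. Decision Support Systems (DSS), conversely, is classified in Computer Science-Information Systems but falls into the Information Science & Library Science cluster. The Journal of Health Communication (JHC) is classified in both Communication and Information Science & Library Science, but in this period it lies closer to the Communication journals and clusters with them. In addition, the Academy of Management Learning & Education (AMLE) and JAR, classified in Education & Educational Research and in Communication respectively, both fall into the Management cluster.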
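In the third period, the six clusters correspond to the six disciplines, but some journals classified in Communication lie closer to the Management journals, and the International Journal of Computer-supported Collaborative Learning (IJCSCL), which is classified in both Education & Educational Research and Information Science & Library Science, clearly belongs to the Education & Educational Research cluster.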
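Some journals assigned to a single discipline turn out to be closer to other disciplines, while some journals assigned to multiple disciplines are close to only one of them. The proposed method can therefore complement traditional methods such as journal co-citation analysis when measuring the proximity of journals. More and more journals cross disciplinary boundaries, possibly for the following reasons: 1) journals are becoming more interdisciplinary in their coverage and so attract more citations from publications in other disciplines; 2) federated search tools are increasingly common, making journals easier for authors in other disciplines to discover; 3) as more journals are included in the analysis, the citing discipline signatures become less distinctive and journals appear more similar to one another.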
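To better understand the interdisciplinary journals, the study applies Principal Component Analysis to the citing discipline signatures. In the first period, five journals in the Information Science & Library Science component belong to two components: three Management journals (ISJ, ISR, MISQ), plus SCOMM and DSS, which are classified in Communication and Computer Science-Information Systems respectively. By contrast, ARIST, EJIS, IM, JASIST, JIS, JIT, JSIS, and MISQ, which are classified in Information Science & Library Science and also in Computer Science-Information Systems, do not appear in the Computer Science-Information Systems component. JAR and AMLE are classified in Communication and in Education & Educational Research respectively, but they load only on the Management component in the periods in which they appear.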
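In the second period, many MIS journals (IM, ISJ, ISR, JIT, JMIS, JSIS, MISQ) appear in both the Management and the Information Science & Library Science components. SSCR is classified only in Information Science & Library Science but appears in both the Information Science & Library Science and the Communication components. IJGIS is classified in Information Science & Library Science but does not appear in any component, which suggests it is peripheral to the six disciplines.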
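In the third period, still more MIS journals appear in both the Management and the Information Science & Library Science components. Besides IJGIS, the Journal of Chemical Information and Modeling (JCIM) and the Journal of Cheminformatics (JCHEM) also appear in no component.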
A similarity comparison is made between 120 journals from five allied Web of Science
disciplines (Communication, Computer Science-Information Systems, Education & Educational
Research, Information Science & Library Science, Management) and a more distant
discipline (Geology) across three time periods using a novel method called citing discipline
analysis that relies on the frequency distribution of Web of Science Research Areas
for citing articles.
Similarities among journals are evaluated using multidimensional scaling
with hierarchical cluster analysis and Principal Component Analysis.
The resulting visualizations
and groupings reveal clusters that align with the discipline assignments for the
journals for four of the six disciplines, but also greater overlaps among some journals for
two of the disciplines or categorizations that do not necessarily align with their assigned
disciplines.
Some journals categorized into a single given discipline were found to be more
closely aligned with other disciplines and some journals assigned to multiple disciplines
more closely aligned with only one of the assigned disciplines.
The proposed method offers
a complementary way to more traditional methods such as journal co-citation analysis to
compare journal similarity using data that are readily available through Web of Science.
Aspects of scholarly communication may be investigated from different levels of granularity to reveal and better
understand relationships between researchers, research groups, institutions, regions/nations, specializations/disciplines,
publications or publication outlets.
Connections that exist between sources of interest may take the form of direct citations,
co-citations, co-authorship, co-occurrence of words or subjects, or more recently, latent topics.
Journal similarity comparison has been frequently studied using co-citations. Journal
co-citation studies have been carried out on a number of fields including economics (McCain, 1991), information retrieval (Ding, Chowdhury & Foo, 2000), information systems (Marion, Wilson & Davis, 2005), medical informatics (Morris & McCain,
1998), neural networks (McCain, 1998), and semiconductor research (Tsay, Xu & Wu, 2003).
One reason co-citation studies tend to focus on individual fields is that
the journal–journal co-citation matrix that emerges when multiple disciplines are employed can be quite sparse (Boyack,
Klavans, & Börner, 2005).
Co-citation data can also be labor-intensive to extract and are not easily available through citation
database sources such as Thomson Reuters Web of Science (WoS) without downloading all references from a corpus of articles.
Citation-based data may also be used to identify disciplinary or specialization affiliations for journals. This is particularly
important for informetrics studies, where the misclassification of journals may affect the ranking of journals within a given
field.
Pudovkin and Garfield (2002) developed
a journal relatedness factor based on citing and cited journals. The goal of their proposed method was to help identify
thematically related journals.
Similarly, Glänzel and Schubert (2003) developed a three-step process for the categorization
of journals that involved pre-defined categories, journal classification and article classification for articles in journals with
ambiguous subject assignments based on references.
More recently, Rafols and Leydesdorff (2009) compared the outcomes of
two algorithms for the decomposition of large matrices against Web of Science Subject Categories and Glänzel and Schubert’s
categorization. The four methods they used resulted in similar map outcomes on a large scale. Leydesdorff and Rafols (2009)
also investigated the relationships among 170+ Web of Science Subject Categories using a citation matrix consisting of
the subject category citation frequencies. They concluded that a classification scheme could be developed using analytical
arguments.
Similarly, Leydesdorff and Schank (2008) visualized and animated the disciplinary ties of three seed journals over
time to demonstrate relationships among journals and their interdisciplinarity.
Co-citation analysis relies on citing articles to identify the strength of relationships between the units of interest, whether
authors, papers or journals; however, it does not consider any attributes of the source of the citations – only that the citations
or co-citations exist. Authors such as White (2001) and Ajiferuke, Lu, and Wolfram (2010) have called for a shift in the focus
of citation-based research away from citation counts received by an author of interest to the origin of the citation and its
characteristics to assess author impact from a different perspective.
This research investigates the use of data derived from citing journals to assess the similarity of cited journals.
The journal
citation image of a target journal, which is determined by the list of journals that cite the target journal, provides an indicator
of the reach of a journal. When combined with the frequencies of citation by the citing journals, the frequency distribution
of citations provided by the citing journals creates a “signature” for each cited journal. These signatures may be compared
using various analytical methods.
One possible challenge associated with using the citing journals themselves to create a
signature for a cited journal is the potentially high number of citing journals that an influential and prolific journal might
attract.
Wang and Wolfram (forthcoming) proposed a method to reduce the computational overhead associated
with the citing journal data. Their method of citing discipline analysis uses the subjects/disciplines assigned to the citing
journal and the resulting citation frequencies of the citing disciplines to constitute the cited journal’s signature.
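The signature idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the input layout (one `(cited_journal, citing_discipline)` pair per citing record), the function name `build_signatures`, and the toy records are hypothetical and do not come from the study's data.

```python
from collections import Counter


def build_signatures(citing_records):
    """Map each cited journal to a frequency vector over citing disciplines.

    citing_records: iterable of (cited_journal, citing_discipline) pairs
    (hypothetical input layout, one pair per citing record).
    """
    counts = {}
    for journal, discipline in citing_records:
        counts.setdefault(journal, Counter())[discipline] += 1
    # Use one shared, sorted discipline axis so all vectors are comparable.
    disciplines = sorted({d for c in counts.values() for d in c})
    # A Counter returns 0 for disciplines that never cited the journal.
    signatures = {j: [c[d] for d in disciplines] for j, c in counts.items()}
    return disciplines, signatures


# Toy records, invented for illustration only.
records = [
    ("JASIST", "Information Science & Library Science"),
    ("JASIST", "Computer Science"),
    ("JASIST", "Information Science & Library Science"),
    ("MISQ", "Business & Economics"),
    ("MISQ", "Computer Science"),
]
disciplines, signatures = build_signatures(records)
print(disciplines)
print(signatures)
```

Because every journal is described over the same short list of disciplines instead of a potentially long list of citing journals, the vectors stay small no matter how many journals cite the target.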
Wang and Wolfram (forthcoming) employed citing discipline analysis to explore journal similarity among 40 high impact journals in Information Science and Library Science (ISLS) as classified in Journal Citation Reports (JCR). They found that some of the journals classified into the ISLS category did not map in close proximity to one another based on multidimensional scaling and cluster analysis. A number of the journals included were also classified into allied fields, but did not cluster or appear in close proximity to a number of journals only classified in ISLS.
The authors noted that how journals are classified can impact journal rankings within a given field, where journals from related, but larger, fields may have higher journal impact factors (IF), which can reduce the rank of journals that are directly in the field. They observed that many of the high impact journals were from allied areas to ISLS.
One limitation of their exploratory study was the focus on a single discipline. Could similar affinities or differences in journal similarity be revealed using citing discipline analysis with journals from multiple fields? Also, by looking at multiple fields, does this pull journals also classified into other disciplines further out of an assigned category when included?
The present research is guided by the following questions:
1. To what extent are high impact journals from allied disciplines similar to one another based on the discipline of the articles that cite a given journal?
2. Do journals classified into multiple disciplines more closely align with one discipline than another or serve as bridges between the disciplines to which they are mapped based on the citing discipline distribution?
The field of Information Science and Library Science was selected as the seed discipline based on its interdisciplinary nature and familiarity to the authors. The top 20 journals based on 2012 JCR impact factors were selected. Four additional allied WoS disciplines were also selected based on the co-classification of journals appearing in the top 20 ISLS list with other disciplines, and the affiliation of information science and library science academic units with other disciplinary units, which demonstrates another type of alliance.
The four JCR disciplines selected comprised:
◦ Communication (COMM) – based on the existence of schools of communication & information.
◦ Computer Science, Information Systems (CSIS) – based on a number of iSchools and journal overlap in JCR.
◦ Education & Educational Research (EDER) – based on a number of ISLS units affiliated with colleges/schools of education.
◦ Management (MGMT) – based on the overlap of journals, particularly in Management Information Systems (MIS).
A sixth, more intellectually distant, discipline, namely Geology (GEOL), was also included. Geology was selected based on the outcomes of the UCSD Map of Science (Börner et al., 2012), where Earth Sciences were mapped as distant from the Social Sciences. By including journals from a more distant discipline, the ability for the citing discipline method to distinguish between more closely aligned and distant disciplines could be tested, where the distinctiveness of allied disciplines may be less defined by including a more distant discipline in the analysis.
A total of 120 journals were studied over the time period 1987–2012. To allow for a comparison over time, the journals were subdivided into three time periods: 1987–1995, 1996–2004, and 2005–2012.
The data collection method for determining the frequency distribution of citing disciplines used in Wang and Wolfram (forthcoming) was adopted for the present study.
The “Create Citation Report” option in WoS was selected to identify all citing articles. The number of Citing Articles was then selected to retrieve the list of citing articles. The WoS “Analyze Results” feature was next selected for the list of citing articles. On the Results Analysis page, “Research Areas” were selected as the ranking field to provide the tabulated list of citing disciplines.
Salton’s Cosine measure was used to determine the similarity between pairs of journals, resulting in a symmetric similarity matrix (Ahlgren, Jarneving, & Rousseau, 2003; Egghe & Leydesdorff, 2009; Leydesdorff, 2006).
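As a minimal sketch of this step, the cosine of two citing-discipline frequency vectors is their dot product divided by the product of their lengths. The function names and the three toy vectors below are assumptions made for illustration; the study itself reports only that Salton's cosine was used.

```python
import math


def cosine(u, v):
    """Salton's cosine between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a journal with no citations has no direction to compare
    return dot / (norm_u * norm_v)


def similarity_matrix(signatures):
    """Return journal names and the symmetric cosine similarity matrix."""
    names = sorted(signatures)
    matrix = [[cosine(signatures[a], signatures[b]) for b in names] for a in names]
    return names, matrix


# Toy signatures, invented for illustration only.
signatures = {"A": [3, 4, 0], "B": [6, 8, 0], "C": [0, 0, 5]}
names, sim = similarity_matrix(signatures)
for name, row in zip(names, sim):
    print(name, [round(x, 3) for x in row])
```

Journals A and B have proportional signatures and so are maximally similar even though B is cited twice as often, which is why the cosine suits comparisons between journals of very different citation volume.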
Multidimensional scaling (MDS) analysis and hierarchical cluster analysis using SPSS v.20 were employed to visualize and categorize the relationships among the journals for each time period.
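The clustering step can be approximated outside SPSS with a small agglomerative routine. The sketch below assumes average linkage on the distance 1 - cosine similarity; the linkage method and distance used in the SPSS runs are not stated in these notes, so both choices, along with the function name and toy matrix, are assumptions.

```python
def average_linkage(names, sim, n_clusters):
    """Merge journals until n_clusters remain, using 1 - similarity as distance."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    index = {name: i for i, name in enumerate(names)}
    clusters = [[name] for name in names]

    def distance(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(1.0 - sim[index[a]][index[b]] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy similarity matrix, invented for illustration only.
names = ["J1", "J2", "J3", "J4"]
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
clusters = average_linkage(names, sim, n_clusters=2)
print(clusters)
```

Cutting the hierarchy at a chosen number of clusters mirrors the five-cluster and six-cluster levels discussed for the three time periods.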
To provide a complementary analysis of the hidden groups that may be present in the data, SPSS’s Factor Analysis using Principal Component extraction with varimax rotation was also applied to the data using routines.
Fig. 1 shows the MDS locus of 70 selected journals in the first time period (1987–1995). The raw stress value was 0.01294, and the stress-I was 0.11376.
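For readers unfamiliar with stress values, the sketch below computes Kruskal's stress-1 for a given configuration: the square root of the squared differences between target dissimilarities and map distances, divided by the sum of squared map distances. It treats the raw dissimilarities as the targets, which is a simplifying assumption; SPSS applies its own transformation and normalization, so this will not necessarily reproduce the reported figures.

```python
import math


def kruskal_stress1(dissimilarities, coords):
    """Kruskal's stress-1 of a point configuration against target dissimilarities."""
    n = len(coords)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])  # distance on the map
            numerator += (dissimilarities[i][j] - d) ** 2
            denominator += d ** 2
    return math.sqrt(numerator / denominator) if denominator > 0 else 0.0


# A map that reproduces the dissimilarities exactly has zero stress.
perfect = kruskal_stress1(
    [[0, 1, 2], [1, 0, 1], [2, 1, 0]],
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
)
# Two points drawn 5 apart when the target dissimilarity is 4.
imperfect = kruskal_stress1([[0, 4], [4, 0]], [(0.0, 0.0), (3.0, 4.0)])
print(perfect, imperfect)
```

Lower values mean the two-dimensional map distorts the original dissimilarities less.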
Only five clusters are shown because a distinctive sixth cluster did not emerge for this time period.
At the five-cluster level of assignment, journals from COMM, EDER, and MGMT form coherent clusters, although based on the MDS map some journals in each field are more closely located to journals in an allied discipline.
The fourth cluster combines journals from ISLS and CSIS. It is possible that the relatively small number of purely CSIS journals for this time period did not provide enough data for these journals to cluster into separate groups.
The MIS journals are situated in the ISLS cluster, but are located between the Library and Information Science (LIS) journals and MGMT journals.
The fifth cluster on the right side of the map consists of GEOL journals and, as would be expected, is quite distinctive from the other five disciplines.
The location of some journals on the map suggests that they are more similar to journals in one of the other given categories in JCR. The Journal of Advertising Research (JAR), for example, is situated with management journals but is only classified with COMM (and with Business, but this discipline is not included in this study).
Some journals classified in two disciplines served as a bridge between the two disciplines in the map. For instance, Science Communication (SCOMM), although classified with COMM journals, is situated in the ISLS cluster, but is in closer proximity to the COMM journals. In the remaining two time periods, SCOMM clusters with the COMM journals. Social Science Computer Review (SSCR), which is classified with ISLS and clusters with the discipline, is situated between ISLS, COMM and/or MGMT journals in each time period. The same is observed with Telecommunications Policy (TP), which bridges ISLS and MGMT for each of the periods of study.
The outcome for the second time period (1996–2004) appears in Fig. 2. The raw stress and stress-I values are 0.01648 and 0.12798, respectively.
In this map, 93 journals were categorized into six clusters, with CSIS separating from the ISLS cluster during this time period.
Of note is the greater number of journals assigned to one or more disciplines but aligning more closely to another discipline or only one of the assigned disciplines.
As an example, the Academy of Management Learning & Education (AMLE) and JAR are situated in the management cluster, and are located relatively far from their assigned disciplines, EDER and COMM, respectively.
Decision Support Systems (DSS) clusters with the ISLS journals but is classified in CSIS only. The same classification is observed for this journal in the third time period.
The Journal of the American Medical Informatics Association (JAMIA) is classified in ISLS and CSIS, but clusters with the CSIS journals for the remaining time periods.
Similarly, the Journal of Health Communication (JHC), which is classified in COMM and ISLS, does not appear to be similar to other ISLS journals and is situated more closely to COMM journals and clusters with them.
The International Journal of Geographical Information Science (IJGIS), which is classified with ISLS journals, clusters with CSIS journals for this time period and the third period. It sits at the periphery of the cluster, however, which perhaps indicates that CSIS is the best match among the disciplines studied without being a very close match. Proximally, it is situated between CSIS and GEOL journals, which may indicate at least a peripheral similarity to some GEOL journals.
Again, the GEOL journals all cluster together farther from the other disciplinary groups.
Results for the third time period (2005–2012) appear in Fig. 3. The six clusters roughly correspond to the six disciplines. The results of raw stress calculation and the stress-I calculation are still relatively low, at 0.02224 and 0.14914, respectively.
Additional journals classified in COMM map closely to and cluster more closely with MGMT journals.
The International Journal of Computer-supported Collaborative Learning (IJCSCL) is classified with both EDER and ISLS but is clearly situated and clusters with the EDER journals.
Business Strategy and the Environment (BSE), although clustered with MGMT journals, appears to be pulled toward the GEOL journals, indicating a possible relationship with some of these journals.
Once again, the GEOL journals are distinctly clustered away from the remaining journals.
With each time period, more journals cross disciplinary boundaries by clustering with journals from allied disciplines or by mapping more closely to journals in allied disciplines. There may be several influencing factors to account for this observation.
First, the journals indeed may be becoming more interdisciplinary in their publication coverage, thereby attracting more citations from publications in other disciplines.
Second, the journals themselves may not be more interdisciplinary in their coverage, but are now more easily discovered by authors in other disciplines given the wider availability of federated search tools.
Third, with a greater number of journals included in the analysis for each time period, the distinctiveness of the citing discipline signatures may be decreasing, so some journals classified in allied disciplines may appear more similar to one another.
To determine common dimensions from the dataset, a Principal Component Analysis was conducted in SPSS for each time period. Outcomes for the Kaiser–Meyer–Olkin measure of sample adequacy (above 0.7) and Bartlett’s Test of Sphericity (p < .05) indicate the data were appropriate for PCA for all three time periods.
In the first time period (1987–1995), six components that correspond to the six disciplines explain 88.1% of the total variance.
There are five journals underlined in Table 2 that belong to two components, introducing inter-factorial complexity (Van den Besselaar & Heimeriks, 2001; Leydesdorff, 2007), including three MGMT journals (ISJ, ISR, MISQ).
SCOMM and DSS are assigned only to COMM and CSIS by WoS, respectively, but they also appear in the ISLS component, which supports the MDS and clustering outcomes. DSS continues to also load with ISLS for the remaining time periods.
Similarly, JAR and AMLE, journals classified by WoS only in COMM and EDER, respectively, load with the MGMT component for all the time periods in which they appear, but not their classified discipline, lending support for the re-classification of these journals.
The ISLS journals ARIST, EJIS, IM, JASIST, JIS, JIT, JSIS, and MISQ are also classified in CSIS, but do not load with the CSIS component.
Outcomes for 1996–2004 appear in Table 3.
As with the first time period, several MIS journals (IM, ISJ, ISR, JIT, JMIS, JSIS, MISQ) load to both MGMT and ISLS.
SSCR, which is classified with ISLS only, loads into the ISLS component, but also loads with a higher value into the COMM component, perhaps indicating the need for an additional classification assignment.
IJGIS does not load into any of the six components for the second or third time period, lending evidence to the peripheral nature of the journal to the six fields studied and indicating it might be misclassified in ISLS.
Component outcomes for 2005–2012 appear in Table 4.
Similar to the previous time period, a growing number of MIS journals load to the ISLS and MGMT components.
As with IJGIS, two other journals, Journal of Chemical Information and Modeling (JCIM) and Journal of Cheminformatics (JCHEM), do not load into any of the six components, indicating a poor association with the six disciplines.
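The three loading patterns described across the periods (one component, two components, or none) can be read off a rotated loading table mechanically. The sketch below is hypothetical throughout: the cut-off of 0.4, the placeholder journal names J1 to J3, and the loading values are invented for illustration and are not taken from Tables 2 to 4.

```python
def components_loaded(loadings, threshold=0.4):
    """List, per journal, the components whose absolute loading meets the threshold.

    loadings: journal -> {component: loading}. The 0.4 threshold is an
    assumed cut-off, not one reported in the study.
    """
    return {
        journal: sorted(c for c, value in by_component.items() if abs(value) >= threshold)
        for journal, by_component in loadings.items()
    }


# Invented loadings for placeholder journals.
loadings = {
    "J1": {"ISLS": 0.62, "MGMT": 0.55, "CSIS": 0.10},  # loads on two components
    "J2": {"COMM": 0.12, "MGMT": 0.81},                # loads on one component
    "J3": {"ISLS": 0.20, "CSIS": 0.31, "GEOL": 0.25},  # loads on none
}
result = components_loaded(loadings)
for journal, comps in result.items():
    label = "none" if not comps else ", ".join(comps)
    print(journal, "->", label)
```

A journal with two entries corresponds to the inter-factorial complexity noted for the MIS journals, and an empty list corresponds to peripheral journals such as those that load on no component.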
The boundaries between ISLS and CSIS are not as clear in the MDS and cluster analysis outcomes, where combinations of computer science, library and information science and management information systems journals may cluster together depending on the time period. These results may be influenced by the fact that a number of journals in the ISLS area are also categorized in the CSIS or MGMT category, thereby strengthening their relationships.
Despite the influence of the assigned discipline(s) – which then strengthens the disciplinary relationship(s) through journal self-citations, whether or not it is the best fit – some journals appear to be misclassified based on the disciplinary designations of the citations they attract.
As a prime example, JAR, which is only assigned to the COMM field, is situated in the MGMT category for each of the time periods studied for the MDS and clustering analysis as well as with the Principal Component Analysis.
A similar outcome is observed for AMLE for the two time periods in which it is included. It is classified as an EDER journal, but is situated with the MGMT journals based on the analyses conducted.
Several other journals are classified in more than one discipline, but clearly associate only with journals in one of the disciplines. JHC is classified in COMM and ISLS but clusters only with COMM journals. The same is observed for IJCSCL, which is classified in ISLS and EDER but groups only with EDER journals for each of the grouping methods used.
Other journals appear to move between disciplines over time. SCOMM is classified in COMM, but the MDS and clustering outcome places it initially with the ISLS journals for the first time period, and then in the COMM cluster and further away from ISLS in the last two time periods.
Some journals, such as JCMC and SSCR, appear to be situated near borders between disciplines, which points to their interdisciplinary appeal or may indicate they serve as bridges between the disciplines.
A number of the ISLS journals that are considered to be library and information science journals (Nisonger & Davis, 2005) appear between CSIS journals and those in the MGMT cluster. In particular, MIS journals appear between MGMT and the ISLS/CSIS cluster for the first time period.
One application of citing discipline analysis that emerges from this analysis is that of decision support for the additional assignment or reassignment of journals to one or more disciplines.
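One way such decision support might look in practice is to compare a journal's mean similarity to the journals of each discipline and flag cases where the best-matching discipline differs from the assigned one. This is an illustrative heuristic, not the procedure used in the study; the function name, the placeholder journals J1 to J4, and the similarity values are all assumptions.

```python
def suggest_discipline(journal, assigned, sim):
    """Return the discipline with the highest mean similarity to `journal`.

    assigned: journal -> assigned discipline label.
    sim: journal -> {other journal: cosine similarity}.
    """
    by_discipline = {}
    for other, discipline in assigned.items():
        if other == journal:
            continue  # do not compare the journal with itself
        by_discipline.setdefault(discipline, []).append(sim[journal][other])
    means = {d: sum(values) / len(values) for d, values in by_discipline.items()}
    best = max(means, key=means.get)
    return best, means


# Invented assignments and similarities for placeholder journals.
assigned = {"J1": "COMM", "J2": "COMM", "J3": "MGMT", "J4": "MGMT"}
sim = {"J1": {"J2": 0.30, "J3": 0.90, "J4": 0.80}}  # only the row needed here
best, means = suggest_discipline("J1", assigned, sim)
if best != assigned["J1"]:
    print("J1 is assigned to", assigned["J1"], "but is closest to", best)
```

A flag of this kind is only a prompt for human review, since a journal's citing-discipline signature can shift between time periods.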
Journal disciplinary classifications should be revisited over time to accommodate shifts in how journals are being cited by other disciplines. ... However, the shifts are at least an indication that the subject affiliations of the citing journals are changing.
Citing discipline analysis provides, with some modest programming, a relatively easily implemented method for assessing the similarity of journals within disciplines or across allied disciplines that is computationally less expensive than using citing journal-based data.
The analyses reveal distinct groupings of journals based on their disciplinary assignments. As observed earlier in Wang and Wolfram (forthcoming), who only examined journals in a single field, the current research has demonstrated that citing discipline analysis can provide coherent and meaningful disciplinary groupings for journals in allied fields, even when journals from a more intellectually distant field are included.
The clustering and proximity of some journals classified in allied fields has changed over time, perhaps indicating a changing citing relationship between these fields.