Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57(3), 359-377.
information visualization
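This study proposes a visualization interface that integrates the research front of a research specialty with the intellectual base that it cites. The paper defines a research front as a group of concepts and research issues that emerge abruptly within a specialty; the intellectual base of a research front consists of the papers cited or co-cited by the papers that contain these concepts and research issues. To visualize the research front and intellectual base of a specialty, papers related to the specialty are first collected, terms representing the research front are extracted from them, and the papers they cite or co-cite are taken as the specialty's intellectual base. Bipartite networks are then built from the terms representing the research front and the papers representing the intellectual base, so that the concepts and research issues of the research front are shown together with the papers of the intellectual base. In the resulting network, clusters formed by terms and papers reveal important research fronts and intellectual bases. Using terms to present the concepts and research issues of a cluster conveys the meaning of a research front more effectively, and, when publication dates are added to the analysis, developing research fronts can be identified from related terms that abruptly appear in a larger number of papers. In addition, betweenness centrality analysis of the network can reveal papers that occupy pivotal positions between research fronts, and the Pathfinder algorithm can reveal the main connections among papers.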
A specialty is conceptualized and visualized as a time-variant duality between two fundamental concepts in information science: research fronts and intellectual bases.
A research front is defined as an emergent and transient grouping of concepts and underlying research issues.
The intellectual base of a research front is its citation and co-citation footprint in scientific literature: an evolving network of scientific publications cited by research-front concepts.
The concept of a research front was originally introduced by Price (1965) to characterize the transient nature of a research field. Price observed what he called the immediacy factor: There seems to be a tendency for scientists to cite the most recently published articles. In a given field, a research front refers to the body of articles that scientists actively cite.
A specialty can be conceptualized as a time-variant mapping from its research front to its intellectual base.
Typical questions regarding a research front may include:
How did it get started? What is the state of the art? What are the critical paths in its evolution?
To address such questions, we need to detect and analyze emerging trends and abrupt changes associated with a research front over time. We also need to identify the focus of a research front at a particular time in the context of its intellectual base, to reveal significant intellectual turning points as a research front evolves, and to discover the interconnections between different research fronts.
Braam, Moed, and Raan (1991) defined a specialty as “focused attention by a number of scientific researchers to a set of related research problems and concepts” (p. 252). They studied the continuity and stability of a specialty in terms of the similarity between co-citation clusters across consecutive years. The similarity between two co-citation clusters is determined by comparing aggregated word profiles of the clusters.
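The comparison of aggregated word profiles can be illustrated with a small sketch. The excerpt above does not say which similarity measure Braam et al. used, so cosine similarity over raw word counts is an assumption here, and the function names are hypothetical.

```python
import math
from collections import Counter


def word_profile(titles):
    """Aggregate lowercase word counts over the titles of articles citing one cluster."""
    profile = Counter()
    for title in titles:
        profile.update(title.lower().split())
    return profile


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse word-count vectors (assumed measure)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_p * norm_q)
```

Under this assumption, two clusters from consecutive years would be treated as the same continuing specialty when the similarity of their profiles exceeds some chosen threshold.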
In part, this is because we define a research front differently to emphasize emerging trends and abrupt changes as the defining features of a research front. A research front is the domain of a time-variant mapping, and its intellectual base is the co-domain of the mapping.
Griffith et al. (1974) found that between-cluster co-citation links tend to be weaker than within-cluster co-citation links. ... To understand how specialties and different thematic trends interact with each other, it is essential to study the nature of long-range, between-cluster links and understand why articles in different specialties were connected.
Labeling clusters is concerned with the clarity and interpretability of co-citation clusters. The standard approach relies on word profiles derived from articles citing a cluster of co-cited articles. ... Word-profile approaches have drawbacks. First, word profiles may not converge to a focused message, so analysts and users must invest substantial sense-making effort to synthesize a diverse range of word profiles. Second, cluster labels based on aggregating word profiles tend to be too broad to be useful. In practice, many users are interested not only in the most commonly used terms but also in terms that can lead to profound changes. Terms associated with an emerging trend could be overshadowed by a broader and more persistent theme.
In CiteSpace II, a current research front is identified based on such burst terms extracted from titles, abstracts, descriptors, and identifiers of bibliographic records. These terms are subsequently used as labels of clusters in heterogeneous networks of terms and articles.
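A heterogeneous network of terms and articles can be sketched as a set of weighted term-to-article links. This is only an illustration under stated assumptions: the record layout (a "text" field and a "cited" list), the function name, and the crude substring matching are all hypothetical and are not taken from CiteSpace II.

```python
from collections import defaultdict


def term_article_links(records, burst_terms):
    """Link each burst term to the references cited by records that contain it.

    records: list of dicts with "text" (title/abstract text) and "cited"
    (list of cited reference ids). Both field names are assumptions.
    Returns {(term, cited_id): number of citing records supporting the link}.
    """
    links = defaultdict(int)
    for rec in records:
        text = rec["text"].lower()
        for term in burst_terms:
            if term in text:  # crude substring match, for illustration only
                for ref in set(rec["cited"]):
                    links[(term, ref)] += 1
    return dict(links)
```

Each key pairs a research-front term with an intellectual-base article, so the result is the edge list of a two-mode network that a layout step could draw alongside the co-citation links.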
CiteSpace II makes it easier for users to identify pivotal points. In addition to inspecting salient visual attributes, the user can easily see nodes with high betweenness centrality (Freeman, 1979).
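To make betweenness centrality concrete, here is a minimal sketch that computes it for a small unweighted, undirected graph. It uses Brandes' algorithm, which is a choice made for this example; the notes above do not say how CiteSpace II computes the metric, and the adjacency-dictionary input is an assumption.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalised betweenness for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours.
    Each unordered pair of endpoints is counted once.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was visited from both endpoints.
    return {v: c / 2 for v, c in score.items()}
```

In a graph made of two triangles joined through a single bridging node, the bridging node receives the highest score, which matches the intuition of a pivotal point connecting two clusters.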
The procedure for using CiteSpace II consists of the following steps:
(1) Identify a knowledge domain using the broadest possible term.
(2) Data collection
(3) Extract research front terms: CiteSpace II first collects n-grams, or terms, from titles, abstracts, descriptors, and identifiers of citing articles in a dataset. The present study used single words or phrases of up to four words. ... Research-front terms are determined by the sharp growth rate of their frequencies.
(4) Time slicing
(5) Threshold selection
(6) Pruning and merging: Pathfinder network scaling is the default option in CiteSpace II for network pruning (Chen, 2004; Schvaneveldt, 1990).
(7) Layout
(8) Visual inspection
(9) Verify pivotal points
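Step (3) can be illustrated with a rough sketch: collect phrases of up to four words from each record, count how many records contain each phrase per year, and flag phrases whose counts grow sharply. The growth-ratio rule, its thresholds, and all function names below are assumptions made for illustration; they stand in for, and are not, the detection method CiteSpace II actually uses.

```python
from collections import Counter, defaultdict


def ngrams(text, max_n=4):
    """Yield every phrase of 1..max_n consecutive words, lowercased."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def term_counts_by_year(records, max_n=4):
    """records: iterable of (year, text). Returns {term: Counter({year: n_records})}."""
    counts = defaultdict(Counter)
    for year, text in records:
        for term in set(ngrams(text, max_n)):  # count each record once per term
            counts[term][year] += 1
    return counts


def surging_terms(counts, year, min_count=2, min_ratio=2.0):
    """Terms whose record count in `year` grew sharply over the previous year.

    Uses add-one smoothing so that terms absent in the previous year
    do not cause a division by zero. Thresholds are illustrative.
    """
    result = []
    for term, by_year in counts.items():
        now, before = by_year[year], by_year[year - 1]
        if now >= min_count and (now + 1) / (before + 1) >= min_ratio:
            result.append(term)
    return sorted(result)
```

A steadily used term such as a field's name would not pass the ratio test, whereas a phrase that appears in several records for the first time would, which is the behaviour the notes describe as a sharp growth rate.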
We demonstrate the new features of CiteSpace with case studies of two research fields: mass-extinction research (1981–2003) and terrorism research (1990–2003).
Mass-extinction research (1981–2003).
The input data for CiteSpace II were retrieved from citation index databases via the Web of Science based on a topic search for articles published between 1981 and 2003 on mass extinction. The scope of the search included four topic fields in each bibliographic record: title, abstract, descriptors, and identifiers. The search was limited to articles in English only.
The resultant dataset contains a total of 771 records.
A total of 333 research-front terms were detected from the four topic fields of these records.
Terrorism research (1990–2003).
The terrorism research (1990–2003) dataset consists of 1,776 records resulting from a topic search on terrorism in the Web of Science.
A total of 1,108 research-front terms were found.
The fully integrated representation of research fronts and intellectual bases in the same network visualization has three practical advantages.
First, using surged topical terms rather than the most frequently occurring title words is particularly suitable for detecting emerging trends and abrupt changes. In visualized networks, research-front terms are explicitly linked to intellectual-base articles. This design presents a compact representation of the duality between a research front and its intellectual base.
Second, research-front terms naturally lend themselves to be used as labels of specialties.
Third, it overcomes a common drawback of word-profile-based labeling approaches. Aggregated word profiles may not converge to an intrinsic focus. Terms selected based on sudden increased popularity measures are particularly suitable to characterize a current research front.
The Pathfinder algorithm extracts the most salient patterns from a network, but it does not scale well. CiteSpace II implements a concurrent version of the algorithm. The concurrent Pathfinder algorithm has substantially optimized the network scaling module, although it still took 6,000 seconds to process 14 networks and merge them into a 1,704-node network.
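The pruning idea behind Pathfinder network scaling can be sketched for one common parameter setting, in which a path is as long as its largest link and paths of any number of links are allowed. Under that setting, a link survives only if no alternative path has a smaller maximum link. The sketch below is a plain sequential version with cubic running time, not the concurrent implementation in CiteSpace II. It also assumes that link weights are distances (smaller means more strongly related), so co-citation strengths would need to be converted first.

```python
import math


def pathfinder_network(nodes, weights):
    """Prune an undirected weighted graph, keeping only its most salient links.

    nodes: list of node ids.
    weights: {(u, v): distance}, one entry per undirected link.
    Returns the subset of `weights` that survives pruning.
    """
    # d[(i, j)] will hold the smallest achievable "largest link" on any i-j path.
    d = {(u, v): math.inf for u in nodes for v in nodes}
    for v in nodes:
        d[(v, v)] = 0.0
    for (u, v), w in weights.items():
        d[(u, v)] = d[(v, u)] = min(d[(u, v)], w)
    # Floyd-Warshall variant: the length of a path is its maximum link.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                via_k = max(d[(i, k)], d[(k, j)])
                if via_k < d[(i, j)]:
                    d[(i, j)] = via_k
    # A link is kept only if no indirect path beats it.
    return {(u, v): w for (u, v), w in weights.items() if w <= d[(u, v)]}
```

The triple loop over all nodes shows why the method is expensive on networks with more than a thousand nodes, and why each time slice is pruned separately before merging.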
In conclusion, the new features introduced in CiteSpace II for detecting and visualizing emerging trends and abrupt changes in a field of research have produced promising and encouraging results. The major findings are that
• the surge of interest is an informative indicator for a new research front;
• using heterogeneous networks of terms and articles provides a comprehensive representation of the dynamics of a specialty;
• research-front terms are informative cluster labels;
• citation tree-ring visualizations are visually appealing and semantically interpretable;
• betweenness centrality metrics identify semantically valid pivotal points.