Takeda, Y., & Kajikawa, Y. (2009). Optics: A bibliometric approach to detect emerging research domains and intellectual bases. Scientometrics, 78(3), 543-558.
information visualization
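This study clusters papers by network topology, using the network formed by citations among journal articles to examine the structure of research and to detect emerging research domains. The data analyzed were retrieved from the ISI database: papers published in journals whose subject category is optics, papers whose content is related to optics, and the citation relationships among them, 281,404 papers in total. The largest connected component was then used as the input to the clustering algorithm, which left 203,203 papers; judging by publication year and citation counts, the discarded papers tend to be older and less cited. The clustering algorithm used [21, 22] is the one proposed by Newman and Girvan (2004), based on optimizing modularity. Clustering divided the papers into 825 clusters, of which the five largest account for almost 90% of the input papers. The topics of these five clusters are optical communication, quantum optics, optical data processing, optical analysis, and lasers. Each cluster produced by the clustering was then fed back into the clustering algorithm, repeatedly, to produce subclusters at the next level down. For the larger subclusters at each level, the topic was analyzed and the mean publication year of the member papers was calculated; this mean was used to judge whether the subcluster is an emerging research domain. The study also found that papers in the first cluster, optical communication, appeared relatively late and grew very rapidly only after 1990, but soon overtook the other topics. Comparing the countries the papers come from, the USA, Japan, Germany, France, and China all produced large numbers of papers, but papers from the USA are cited more, whereas papers from China are rarely cited. The study also visualized the clusters [24] (Adai, Date, Wieland, and Marcotte, 2004) so that nodes with citation relationships are positioned close together and form groups, and used the resulting figures to judge whether a cluster's topic belongs to the research fronts or the intellectual bases of optics as a discipline. The research fronts have more compact figures and are mostly applied or basic research, for example optical communication, quantum optics, and optical data processing. The topics that belong to the intellectual bases, for example optical analysis and lasers, are mostly related to the instruments used in research, and their figures look more stretched.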
In this paper, we constructed a citation network of papers and applied a topological clustering method to investigate the structure of research and to detect emerging research domains in optics.
There are various motivations for conducting bibliometric studies: to evaluate research output [2–4], to grasp the overall structure of research [5–8], and to detect emerging research domains [7–9].
We collected citation data of optics-related publications from the Science Citation Index (SCI) compiled by the Institute for Scientific Information (ISI). We used the Web of Science, which is a Web-based user interface of ISI’s citation databases.
We collected citation data in two ways: a journal-based approach and a topic-based approach.
Therefore, we focused on the maximum connected component. The retrieved data were converted into a non-weighted, non-directed network. The obtained network currently has 203,203 papers (72.21% of the retrieved data).
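The step described above, reducing the citation data to a non-weighted, non-directed network and keeping only its maximum connected component, can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `largest_component` and the `(citing, cited)` pair format are assumptions made for the example.

```python
from collections import deque


def largest_component(citations):
    """Return the node set of the largest connected component.

    `citations` is an iterable of (citing, cited) pairs (hypothetical format).
    Direction and multiplicity are discarded, so the graph is treated as
    non-directed and non-weighted.
    """
    adjacency = {}
    for citing, cited in citations:
        if citing == cited:
            continue  # ignore self-citations
        adjacency.setdefault(citing, set()).add(cited)
        adjacency.setdefault(cited, set()).add(citing)

    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Toy data: A, B, C cite each other; D and E form a separate small component.
toy = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E")]
print(sorted(largest_component(toy)))
```

Papers that cite nothing and are cited by nothing in the data set never enter the network at all, which is consistent with the note that the discarded papers tend to be older and less cited.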
Subsequently, the network was divided into clusters using the topological clustering method [21, 22]. Traditionally, co-citation has been used to analyze a citation network. However, because a co-citation link takes time to form, and because intercitation reflects the similarity of a pair of documents better than co-citation does [23], we used intercitation as a link.
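The difference between the two kinds of link can be made concrete with a small sketch. The function names and the `(citing, cited)` pair format are assumptions for illustration only. An intercitation link exists as soon as one paper cites another, while a co-citation link between two papers only appears once some third paper cites both, which is the time lag mentioned above.

```python
from itertools import combinations


def intercitation_links(citations):
    """Link two papers when one of them cites the other (direct citation)."""
    return {frozenset(pair) for pair in citations if pair[0] != pair[1]}


def cocitation_links(citations):
    """Link two papers when some third paper cites both of them."""
    references = {}
    for citing, cited in citations:
        references.setdefault(citing, set()).add(cited)
    links = set()
    for cited_set in references.values():
        for a, b in combinations(sorted(cited_set), 2):
            links.add(frozenset((a, b)))
    return links


# P2 cites P1: the intercitation link exists immediately, the co-citation link does not.
early = [("P2", "P1")]
print(intercitation_links(early), cocitation_links(early))

# Later P3 cites both P1 and P2: only now are P1 and P2 co-cited.
later = early + [("P3", "P1"), ("P3", "P2")]
print(cocitation_links(later))
```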
The clustering algorithm is based on modularity Q, which is defined as follows [21, 22]:
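The equation itself is not reproduced in these notes. For reference, the modularity of Newman and Girvan (2004) is usually written as below; the symbols are the ones commonly used for that definition and are an assumption here, since the article's own notation may differ.

```latex
Q = \sum_{s} \left( e_{ss} - a_{s}^{2} \right)
```

Here e_ss is the fraction of all links that have both ends in cluster s, and a_s is the fraction of all link ends that are attached to nodes in cluster s.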
In other words, Q is the fraction of links that fall within clusters, minus the expected value of the same quantity if the links fall at random without regard for the clustered structure. Since a high value of Q represents a good division, we stopped joining when ΔQ became negative. A good partition of a network into clusters means there are many within-cluster links and minimal between-cluster links.
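A small sketch may help show how Q and the stopping rule behave. It is a naive illustration under stated assumptions, not the algorithm of [21, 22]: `modularity` and `greedy_join` are hypothetical names, every node starts in its own cluster, and Q is recomputed from scratch for each candidate join, which would be far too slow for a network of 203,203 papers.

```python
def modularity(edges, cluster_of):
    """Q = sum over clusters of (within-link fraction - (link-end fraction)^2)."""
    m = len(edges)
    within = {}  # number of links with both ends in the cluster
    ends = {}    # number of link ends attached to the cluster
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        ends[cu] = ends.get(cu, 0) + 1
        ends[cv] = ends.get(cv, 0) + 1
        if cu == cv:
            within[cu] = within.get(cu, 0) + 1
    return sum(within.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in ends.items())


def greedy_join(edges):
    """Repeatedly join the pair of linked clusters with the largest gain in Q.

    Stops as soon as no join has a positive gain.
    Node labels must be mutually comparable (all ints or all strings).
    """
    cluster_of = {n: n for edge in edges for n in edge}
    q = modularity(edges, cluster_of)
    while True:
        # Only clusters connected by at least one link are candidates.
        pairs = {tuple(sorted((cluster_of[u], cluster_of[v])))
                 for u, v in edges if cluster_of[u] != cluster_of[v]}
        best_gain, best_pair = 0.0, None
        for a, b in sorted(pairs):
            trial = {n: (a if c == b else c) for n, c in cluster_of.items()}
            gain = modularity(edges, trial) - q
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return cluster_of, q
        a, b = best_pair
        cluster_of = {n: (a if c == b else c) for n, c in cluster_of.items()}
        q += best_gain


# Two triangles joined by a single bridging link (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
clusters, q = greedy_join(edges)
print(clusters, round(q, 4))
```

On this toy network the joining stops with the two triangles as separate clusters, because merging them across the single bridge would lower Q.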
After clustering the network, we heuristically characterized each cluster by the titles and abstracts of papers that are frequently cited by the other papers in the same cluster.
The clustered network is visualized by using a large graph layout (LGL) [24], which is based on a spring layout algorithm where links play the role of springs connecting nodes. With such a layout, papers that cite each other and form a group are placed in closer proximity.
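The idea that links act as springs can be illustrated with a deliberately simplified sketch. This is not LGL and not its API: `spring_layout` and its parameters (`rest_length`, `step`, `iterations`) are hypothetical, and the repulsion between all nodes that practical force-directed layouts also apply is left out to keep the example short.

```python
import math


def spring_layout(positions, edges, rest_length=1.0, step=0.1, iterations=100):
    """Move linked nodes until each link approaches `rest_length`.

    `positions` maps node -> (x, y) starting coordinates.
    Nodes without links are never moved in this simplified model.
    """
    pos = {n: [float(p[0]), float(p[1])] for n, p in positions.items()}
    for _ in range(iterations):
        move = {n: [0.0, 0.0] for n in pos}
        for u, v in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0:
                continue  # direction undefined for coincident nodes
            # Hooke's law: a stretched spring pulls its ends together,
            # a compressed one pushes them apart.
            stretch = dist - rest_length
            fx = step * stretch * dx / dist
            fy = step * stretch * dy / dist
            move[u][0] += fx
            move[u][1] += fy
            move[v][0] -= fx
            move[v][1] -= fy
        for n in pos:
            pos[n][0] += move[n][0]
            pos[n][1] += move[n][1]
    return {n: (p[0], p[1]) for n, p in pos.items()}


# "a" and "b" are linked and start far apart; "c" has no links.
start = {"a": (0, 0), "b": (4, 0), "c": (10, 0)}
layout = spring_layout(start, [("a", "b")])
print(layout)
```

In the toy run the linked pair is drawn together to the rest length while the unlinked node stays where it was, which is the property the excerpt relies on when it reads groups off the figure.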
The citation network of optics can be divided into 825 clusters by the topological clustering method, where the number of nodes in each cluster varies from 2 (the smallest clusters) to 50,725 (the biggest cluster, #1). ... Cluster size, i.e., the number of nodes in each cluster, decreases steeply until the 4th cluster, and after the 5th cluster it becomes negligible. Therefore, in the following, we focus on the top 5 clusters. They cover almost 90% of papers in the network.
Therefore, to detect research fronts (in our words, emerging research domains), the average publication year of the papers in a cluster or time slices of the network [9, 26] are effective. In our analysis, we can detect emerging subdomains of optics research by using average publication year.
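The average-publication-year indicator is simple to compute once each paper has a cluster label. The sketch below assumes two hypothetical dictionaries, `cluster_of` (paper to cluster) and `year_of` (paper to publication year); the toy values are made up for illustration and are not data from the article.

```python
def average_publication_year(cluster_of, year_of):
    """Return the mean publication year of the papers in each cluster."""
    totals = {}
    for paper, cluster in cluster_of.items():
        total, count = totals.get(cluster, (0, 0))
        totals[cluster] = (total + year_of[paper], count + 1)
    return {c: total / count for c, (total, count) in totals.items()}


def rank_by_recency(cluster_of, year_of):
    """List clusters from the youngest average year to the oldest."""
    averages = average_publication_year(cluster_of, year_of)
    return sorted(averages, key=averages.get, reverse=True)


# Made-up toy data: cluster "y" is younger on average than cluster "x".
cluster_of = {"p1": "x", "p2": "x", "p3": "y", "p4": "y"}
year_of = {"p1": 1985, "p2": 1995, "p3": 2003, "p4": 2005}
print(average_publication_year(cluster_of, year_of))
print(rank_by_recency(cluster_of, year_of))
```

Clusters near the top of the ranking are the candidates for emerging domains; where to draw the line is a judgment the sketch does not make.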
We found that optics consists of five main subclusters: optical communication, quantum optics, optical data processing, optical analysis, and lasers. In all of these clusters, the USA has the largest number of publications and citations, which indicates its leadership and presence. China has a large number of publications but few citations. The average publication year of papers in each cluster indicated that the largest cluster, optical communication, had the fewest publications before 1990 but has increased steeply in recent years.
The visualization of the network for the top five clusters showed that three large clusters are compact but two relatively small clusters are stretched. This implies that the latter work as an intellectual base for the former.