Cobo, M. J., López‐Herrera, A. G., Herrera‐Viedma, E., & Herrera, F. (2011). Science mapping software tools: Review, analysis, and cooperative study among tools. Journal of the American Society for Information Science and Technology, 62(7), 1382-1402.
information visualization
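The goal of science mapping is to build, for a given discipline or research field, a map that describes its conceptual, intellectual, or social structure. The workflow of a science mapping analysis generally consists of the following steps: data retrieval, preprocessing, network extraction, normalization, mapping, analysis, and visualization. Data retrieval obtains the data to be analyzed from bibliographic databases such as WoS and Scopus, or from other sources such as patents and funding grants. Because the quality of the results depends on the quality of the data, preprocessing such as detecting duplicates and misspellings comes first. Once preprocessing is complete, the information needed to build networks can be extracted from the data, and the relatedness between units of analysis can be estimated with similarity measures such as Salton's Cosine (Salton & McGill, 1983), the Jaccard index (Peters & van Raan, 1993), the equivalence index (Callon, Courtial, & Laville, 1991), and association strength (Coulter, Monarch, & Konda, 1998; van Eck & Waltman, 2007).
Several network analysis methods have been developed for the different units of analysis (authors, documents, journals, and terms): co-word analysis (Callon, Courtial, Turner, & Bauin, 1983), which addresses the conceptual structure of a research field; co-author analysis (Glänzel, 2001; Peters & van Raan, 1991), which explores the social structure of a field through researchers' collaboration networks; and bibliographic coupling (Kessler, 1963) and co-citation analysis (Small, 1973), which analyze the intellectual structure of a field through citation patterns. Documents can also be aggregated by author or by publishing journal to perform author bibliographic coupling (Zhao & Strotmann, 2008), author co-citation analysis (White & Griffith, 1981), journal bibliographic coupling (Gao & Guan, 2009; Small & Koenig, 1977), and journal co-citation analysis (McCain, 1991), which give a more macro-level view. In addition, visualization techniques are used to present the science map and the results of the various analyses.
Following these steps, the study compares software tools for science mapping in terms of (a) data sources, (b) units of analysis, (c) data preprocessing, (d) similarity measures, (e) mapping steps, (f) analysis methods, (g) visualization techniques, and (h) interpretation of results. The tools are Bibexcel (Persson et al., 2009), CiteSpace II (Chen, 2004, 2006), CoPalRed (Bailón-Moreno et al., 2005, 2006), IN-SPIRE (Wise, 1999), Leydesdorff’s Software, Network Workbench Tool (Börner et al., 2010; Herr et al., 2007), Sci2 Tool (Sci2 Team, 2009), VantagePoint (Porter & Cunningham, 2004), and VOSViewer (van Eck & Waltman, 2010). The comparison is summarized in Table 7 of the paper. According to the study, CiteSpace II offers several network layouts, supports very good interactive exploration, and can name the detected clusters with keywords extracted in several ways; CoPalRed groups terms into themes, measures the density and centrality of each theme to build a strategic diagram, and shows each theme's network of keywords; IN-SPIRE offers two views of the science map, a Theme view that reveals the important regions of the map and a Galaxy view that shows the relations between documents; NWB and Sci2 both let analysts customize views through plug-ins and scripting, and Sci2 can also display information on a world map, which helps in analyzing geographic distribution; VantagePoint produces three kinds of maps (cross-correlation map, auto-correlation map, and factor map); and VOSViewer has a powerful graphical user interface.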
Science mapping aims to build bibliometric maps that describe how specific disciplines, scientific domains, or research fields are conceptually, intellectually, and socially structured.
The general workflow in a science mapping analysis has different steps: data retrieval, preprocessing, network extraction, normalization, mapping, analysis and visualization. At the end of this process, the analyst has to interpret and obtain some conclusions from the results.
Different approaches have been developed to extract networks using the selected units of analysis (authors, documents, journals, and terms). Co-word analysis (Callon, Courtial, Turner, & Bauin, 1983) uses the most important words or keywords of the documents to study the conceptual structure of a research field. Co-author analysis examines the authors and their affiliations to study the social structure and collaboration networks (Glänzel, 2001; Peters & van Raan, 1991). Finally, the cited references are used to analyze the intellectual base used by the research field or to analyze the documents that cite the same references. In this sense, bibliographic coupling (Kessler, 1963) analyzes the citing documents, whereas co-citation analysis (Small, 1973) studies the cited documents. Other approaches such as author bibliographic coupling (Zhao & Strotmann, 2008), author co-citation (White & Griffith, 1981), journal bibliographic coupling (Gao & Guan, 2009; Small & Koenig, 1977), and journal co-citation (McCain, 1991) are examples of macro analysis using aggregated data.
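The contrast between bibliographic coupling and co-citation can be made concrete with a minimal Python sketch. The toy `citations` mapping and both function names are hypothetical illustrations, not part of any tool reviewed in the paper: coupling counts the references two citing documents share, while co-citation counts the documents that cite two references together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing document maps to the set of references it cites.
citations = {
    "docA": {"r1", "r2", "r3"},
    "docB": {"r2", "r3"},
    "docC": {"r3", "r4"},
}

def bibliographic_coupling(citations):
    """For each pair of citing documents, count the references they share."""
    strength = Counter()
    for d1, d2 in combinations(sorted(citations), 2):
        shared = len(citations[d1] & citations[d2])
        if shared:
            strength[(d1, d2)] = shared
    return strength

def co_citation(citations):
    """For each pair of cited references, count the documents citing both."""
    strength = Counter()
    for refs in citations.values():
        for r1, r2 in combinations(sorted(refs), 2):
            strength[(r1, r2)] += 1
    return strength

coupling = bibliographic_coupling(citations)
cocited = co_citation(citations)
```

In this sketch the coupling strength of a pair depends only on the two reference lists, whereas adding a new citing document to `citations` can raise co-citation counts.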
Additionally, visualization techniques are used to represent a science map and the results of the different analyses. For example, the networks can be represented using heliocentric maps (Moya-Anegón et al., 2005), geometrical models (Skupin, 2009), thematic networks (Bailón-Moreno, Jurado Alameda, & Ruíz-Baños, 2006; Cobo, López-Herrera, Herrera-Viedma, & Herrera, 2011), or maps where the proximity between items represents their similarity (Davidson, Wylie, & Boyack, 1998; Polanco, François, & Lamirel, 2001; van Eck & Waltman, 2010). To show the evolution in different time periods, Cluster string (Small, 2006; Small & Upham, 2009; Upham & Small, 2010) and thematic areas (Cobo et al., 2011) can be used.
Science mapping or bibliometric mapping is a spatial representation of how disciplines, fields, specialities, and individual documents or authors are related to one another (Small, 1999).
In this section, different important aspects of a science mapping analysis are described, such as: (a) the data sources, (b) the units of analysis, (c) the data preprocessing, (d) the similarity measures that can be used to normalize the relations between the units of analysis, (e) the mapping steps, (f) the types of methods of analysis that can be employed, (g) some visualization techniques, and finally, (h) interpretation of results.
The most common units of analysis in science mapping are journals, documents, cited references, authors (the author’s affiliation can also be used), and descriptive terms or words (Börner et al., 2003). The words can be selected from the title, abstract, body of the document, or some combinations of them. Furthermore, we can select the original keywords of the documents (author’s keywords) or the indexing ones provided by the bibliographic data sources (e.g., ISI Keywords Plus) as words to analyze.
The difference between bibliographic coupling and co-citation is that bibliographic coupling is a fixed and permanent relationship because it depends on the references contained in coupled documents, whereas co-citation will vary over time (Jarneving, 2005).
Different similarity measures have been used in the literature, the most popular being Salton’s Cosine (Salton & McGill, 1983), Jaccard’s Index (Peters & van Raan, 1993), Equivalence Index (Callon, Courtial, & Laville, 1991), and Association Strength (Coulter, Monarch, & Konda, 1998; van Eck & Waltman, 2007), which is also known as Proximity Index (Peters & van Raan, 1993; Rip & Courtial, 1984) or Probabilistic Affinity Index (Zitt, Bassecoulard, & Okubo, 2000).
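The excerpt names these measures without giving formulas. The sketch below uses the definitions commonly given in the bibliometrics literature, which is an assumption here: `c_ij` is the co-occurrence count of items i and j, and `c_i` and `c_j` are their individual occurrence counts. Some formulations of association strength include an extra constant factor, which this sketch omits.

```python
import math

def cosine(c_ij, c_i, c_j):
    """Salton's Cosine: co-occurrences over the geometric mean of occurrences."""
    return c_ij / math.sqrt(c_i * c_j)

def jaccard(c_ij, c_i, c_j):
    """Jaccard's Index: co-occurrences over the items' combined occurrences."""
    return c_ij / (c_i + c_j - c_ij)

def equivalence_index(c_ij, c_i, c_j):
    """Equivalence Index: squared co-occurrences over the product of occurrences."""
    return c_ij ** 2 / (c_i * c_j)

def association_strength(c_ij, c_i, c_j):
    """Association Strength: co-occurrences over the product of occurrences."""
    return c_ij / (c_i * c_j)

# Hypothetical counts: two keywords appearing 4 and 9 times, together 3 times.
example = {
    name: fn(3, 4, 9)
    for name, fn in [
        ("cosine", cosine),
        ("jaccard", jaccard),
        ("equivalence", equivalence_index),
        ("association", association_strength),
    ]
}
```

Under these definitions the equivalence index is the square of the cosine, so both rank pairs identically.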
Clustering algorithms are used to perform community detection, splitting the global network into different subnetworks. Recently, some authors have proposed new clustering algorithms to carry out this task: Streemer (Kandylas, Upham, & Ungar, 2010), spectral clustering (Chen et al., 2010), modularity maximization (Chen & Redner, 2010), and bootstrap resampling with significance clustering (Rosvall & Bergstrom, 2010), among others.
To show the evolution of detected clusters in successive time periods (temporal analysis), different techniques have been used: Cluster string (Small, 2006; Small & Upham, 2009; Upham & Small, 2010), rolling clustering (Kandylas et al., 2010), alluvial diagrams (Rosvall & Bergstrom, 2010), ThemeRiver visualization (Havre, Hetzler, Whitney, & Nowell, 2002), and thematic areas (Cobo et al., 2011).
In this section, we present nine representative software tools specifically developed to analyze scientific domains by means of science mapping. These software tools are as follows:
• Bibexcel (Persson et al., 2009)
• CiteSpace II (Chen, 2004, 2006)
• CoPalRed (Bailón-Moreno et al., 2005, 2006)
• IN-SPIRE (Wise, 1999)
• Leydesdorff’s Software
• Network Workbench Tool (Börner et al., 2010; Herr et al., 2007)
• Sci2 Tool (Sci2 Team, 2009)
• VantagePoint (Porter & Cunningham, 2004)
• VOSViewer (van Eck & Waltman, 2010)
To make a better comparison between software, a common science mapping analysis over a specific unit of analysis has to be performed. ... For this reason, we select the words (or keywords) as the unit of analysis to perform the science mapping analysis.
As an example, we study the conceptual structure (Cobo et al., 2011) of the research field of fuzzy set theory (FST; Zadeh, 1965, 2008) by using the publications that have appeared in the most important and prestigious journals during 2005 to 2009, according to their impact factor, on the topic: Fuzzy Sets and Systems and IEEE Transactions on Fuzzy Systems.
A total of 1,576 documents were analyzed, all downloaded from the WoS. Specifically, 1,086 documents were published by the journal Fuzzy Sets and Systems, and 490 by IEEE Transactions on Fuzzy Systems.
The author’s keywords and Keywords Plus of each document were used in the analysis. After a de-duplicating step (CoPalRed was used to carry out this task), there were 5,034 keywords. ... The whole network built from the co-occurrence of these keywords contains 25,705 links.
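A keyword co-occurrence network of this kind can be sketched in a few lines of Python. The toy `docs` list and the function name `cooccurrence_network` are hypothetical, and the sketch assumes each document has already been reduced to a de-duplicated keyword set; it does not reproduce how any of the reviewed tools build their networks.

```python
from collections import Counter
from itertools import combinations

# Hypothetical documents, each reduced to its de-duplicated keyword set.
docs = [
    {"fuzzy control", "stability", "lmi"},
    {"fuzzy control", "stability"},
    {"fuzzy logic", "fuzzy control"},
]

def cooccurrence_network(docs):
    """Return keyword frequencies and link weights (co-occurrence counts)."""
    freq = Counter()
    links = Counter()
    for keywords in docs:
        # Each keyword counts once per document it appears in.
        freq.update(keywords)
        # Each unordered keyword pair in a document adds one co-occurrence.
        links.update(combinations(sorted(keywords), 2))
    return freq, links

freq, links = cooccurrence_network(docs)
```

Sorting the keywords before pairing gives every link a single canonical key, so `("fuzzy control", "stability")` and its reverse are never counted separately.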
First, a co-word analysis was performed using CiteSpace. Given that it does not allow us to load the data in csv format, the dataset had to be loaded without any preprocessing from an ISI WoS format file. In Figure 1, the map generated by CiteSpace is shown. The map was made using the top 200 keywords. The lines between nodes represent the cosine similarity measure. The shadowed nodes represent clusters, and the clusters’ names were chosen by selecting the most important keywords from each cluster according to the tf·idf measure. Inside each cluster there is a sphere that represents its centroid, and its volume is proportional to the size of the cluster.
Second, in Figure 2 the result obtained by CoPalRed is shown. In Figure 2a the generated strategic diagram is shown, and in Figure 2b the thematic network of a specific theme (FUZZY-CONTROL) is drawn. CoPalRed generated the maps using those keywords with a frequency equal to or higher than five and a co-occurrence value equal to or higher than three. The whole network contains 229 nodes and 432 links between them after this pruning. With this pruning, we maintain the most frequent and important keywords.
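The pruning rule described here (keyword frequency of at least five, co-occurrence of at least three) can be sketched as a simple filter. The function name `prune` and the toy counts are hypothetical. The excerpt does not say whether keywords left without any link are also dropped, so this sketch keeps them, which is an assumption.

```python
def prune(freq, links, min_freq=5, min_cooc=3):
    """Keep frequent keywords and the strong links between them."""
    kept_nodes = {kw for kw, f in freq.items() if f >= min_freq}
    kept_links = {
        (a, b): w
        for (a, b), w in links.items()
        # A link survives only if it is strong and both endpoints survive.
        if w >= min_cooc and a in kept_nodes and b in kept_nodes
    }
    return kept_nodes, kept_links

# Hypothetical counts for four keywords and the links between them.
toy_freq = {"a": 6, "b": 5, "c": 4, "d": 7}
toy_links = {("a", "b"): 3, ("a", "c"): 5, ("a", "d"): 2, ("b", "d"): 4}
nodes, strong_links = prune(toy_freq, toy_links)
```

Here keyword `c` is dropped for low frequency, which also removes its otherwise strong link to `a`, and the link between `a` and `d` is dropped for low co-occurrence.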
The strategic diagram shows the main detected themes studied by the FST field in the studied period, categorizing them in four classes according to their Callon’s density and Callon’s centrality measures. Each theme in the strategic diagram is associated with a sphere and a label. Labels were chosen selecting the most central node of its associated thematic networks, where each node corresponds with a keyword. The volume of spheres represents the number of documents associated with each theme (or keyword in thematic networks). This information is also associated with the labels.
Finally, the size of the lines in thematic networks represents the degree of association (equivalence index) between two nodes.
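The excerpt does not give formulas for Callon's centrality and density. The sketch below follows one common formulation, stated here as an assumption: centrality is 10 times the sum of the link weights between a theme's keywords and keywords of other themes, and density is 100 times the sum of the theme's internal link weights divided by its number of keywords. The function name and toy data are hypothetical.

```python
def callon_measures(theme_of, weights):
    """Compute centrality and density per theme.

    theme_of: keyword -> theme label
    weights:  (keyword1, keyword2) -> link weight (e.g., equivalence index)
    """
    size, internal, external = {}, {}, {}
    for theme in theme_of.values():
        size[theme] = size.get(theme, 0) + 1
        internal.setdefault(theme, 0.0)
        external.setdefault(theme, 0.0)
    for (a, b), e in weights.items():
        ta, tb = theme_of[a], theme_of[b]
        if ta == tb:
            internal[ta] += e      # link inside one theme
        else:
            external[ta] += e      # link tying two themes together
            external[tb] += e
    return {
        t: {
            "centrality": 10 * external[t],            # assumed scaling factor
            "density": 100 * internal[t] / size[t],    # assumed scaling factor
        }
        for t in size
    }

# Hypothetical themes and equivalence-index weights.
theme_of = {"a": "T1", "b": "T1", "c": "T2", "d": "T2"}
weights = {("a", "b"): 0.5, ("c", "d"): 0.2, ("b", "c"): 0.1}
measures = callon_measures(theme_of, weights)
```

A strategic diagram then plots each theme by these two values; themes that score high on both are the motor themes of the field.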
Third, the csv file exported by CoPalRed was loaded in IN-SPIRE. After defining the dataset and selecting the terms, IN-SPIRE generated two maps: the Galaxy view (Figure 3) and Theme view (Figure 4).
In the galaxy view, the shadows represent groups of documents that are considered to be similar. The names of these themes are generated using the most important keywords according to their tf·idf measure.
In the Theme view, the height of each peak corresponds to topic strength at that location, and the extent of each peak corresponds to the area
Unlike the other software tools analyzed, IN-SPIRE uses the vector space model to represent the documents, so it needs a large amount of terms to correctly detect the themes. In our dataset, the documents do not contain the necessary keywords, so IN-SPIRE could not determine correctly the similarity among documents.
Fourth, the csv file was loaded into Sci2 Tool, and a co-occurrence network using the keywords (author’s keywords and Keywords Plus) was created. We applied a weak component clustering to the whole network obtained after dropping the keywords with a frequency below five and the links with a co-occurrence value below three (the whole network is the same as the one generated by CoPalRed). The biggest weak component is shown in Figure 5. The size of the nodes is proportional to the respective keyword’s frequency, and the size of the lines represents the co-occurrence (without normalization) of the linked nodes. Only the names of the top 50 keywords are shown.
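For an undirected co-occurrence network, weak component clustering amounts to finding connected components. The breadth-first sketch below illustrates the idea with a hypothetical function name and toy data; it is not the Sci2 Tool implementation.

```python
from collections import defaultdict, deque

def weak_components(nodes, links):
    """Return the connected components of an undirected network, largest first."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Hypothetical pruned network: five keywords, three links.
components = weak_components(
    ["a", "b", "c", "d", "e"],
    [("a", "b"), ("b", "c"), ("d", "e")],
)
biggest = components[0]
```

The first element of the result plays the role of the "biggest weak component" that the figure displays.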
Fifth, a Factor Map was built by VantagePoint (Figure 6) using those keywords with a frequency equal to or higher than five (after this pruning the dataset contains 392 keywords). Each node represents a cluster of terms. The label of each theme was chosen selecting its most important term. The size of nodes is proportional to the number of documents, and the line between nodes represents the similarity (Pearson’s r) between factors.
Finally, the co-occurrence matrix generated by CoPalRed was transferred to the VOSViewer format to visualize the results of a co-word analysis. In Figure 7, the cluster view is shown. We can observe how the different keywords are laid out over a horizontal line. This means that the keywords placed on the left are very dissimilar to those placed on the right side of the map. The size of the keywords’ labels is proportional to their frequency, and VOSViewer visualizes only the labels of the most important (most frequent) ones in the higher zoomed view. VOSViewer assigns a different random color to each cluster. Inside each cluster, the strength of a color at one point represents the density of this point. The density is measured using a Gaussian kernel function (van Eck & Waltman, 2010).
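The idea of a density computed with a Gaussian kernel can be illustrated with a generic sketch. The excerpt does not state the bandwidth or normalization that VOSViewer uses, so the `bandwidth` parameter, the unnormalized kernel, and the function name are all assumptions made for illustration.

```python
import math

def kernel_density(point, items, bandwidth=1.0):
    """Sum Gaussian-weighted contributions of items at a map location.

    point: (x, y) location on the map
    items: iterable of (x, y, weight), e.g., weight = keyword frequency
    """
    px, py = point
    total = 0.0
    for x, y, weight in items:
        dist_sq = (px - x) ** 2 + (py - y) ** 2
        # Contribution decays smoothly with squared distance from the item.
        total += weight * math.exp(-dist_sq / bandwidth ** 2)
    return total

# Hypothetical layout: two keywords placed on a 2D map with their frequencies.
layout = [(0.0, 0.0, 2.0), (3.0, 0.0, 1.0)]
at_first_item = kernel_density((0.0, 0.0), layout)
midway = kernel_density((1.5, 0.0), layout)
```

Points near many frequent keywords receive a high density, which a density view would render as a stronger color.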
The preprocessing capabilities of VantagePoint are one of its main strengths. It incorporates a large number of import filters that allow us to load data from almost all the bibliographical sources. Moreover, the clean-up list method and the possibility of applying a thesaurus help the preprocessing task, especially the de-duplicating process. VantagePoint allows us to export the results into a csv file, so other software tools can read this data to perform their own science mapping analysis over the preprocessed data.
CoPalRed has a good de-duplicating process too, but it is focused only on one kind of unit, the keywords.
NWB Tool and Sci2 Tool have a de-duplicating module, but this needs an external process to be performed using external software. However, both NWB Tool and Sci2 Tool have a good network reduction process.
• CiteSpace is able to visualize the networks using different layouts. The name of the detected clusters can be assigned using different metrics. Finally, the user graphic-interface allows us to interact with the network to carry out a good exploration of it.
• CoPalRed groups the items (keywords) under themes, and they are categorized in a strategic diagram according to their centrality and density. This categorization allows us to detect the motor themes of the field. For each theme, CoPalRed generates a thematic network where the relation between its keywords is shown.
• IN-SPIRE allows us to visualize two kinds of map, if sufficient data are provided. In the Theme view, the analyst can detect the most important zones of the map (where more documents are localized). The Galaxy view allows us to easily detect similar documents based on their content.
• NWB Tool and Sci2 Tool generate similar visualizations. They allow us to visualize the networks using different plugins and applying different layouts and scripts to customize the view. Sci2 Tool incorporates thematic maps where the information is shown over a world map.
• VantagePoint has three kinds of map that allow several views to be created. In the map view, VantagePoint shows a legend that explains the size of the lines, being the only software that produces this legend. Maybe one strength of this software tool is the user graphic-interface that allows the user to select a set of items from the map, whereupon it shows the documents associated with these items and other information in the detail window.
• VOSViewer has a powerful user graphic-interface that allows us to examine the generated maps easily. Detecting (in a visual way) the most important themes is not always easy, and in the cluster view it is difficult to say to which cluster the keywords that are between two clusters (borderline keywords) belong.
So, for example, a co-word analysis performed by CoPalRed could be complemented by IN-SPIRE using the terms extracted from abstracts and titles. Moreover, IN-SPIRE could show the conceptual changes over time using its Time tools. In addition, CiteSpace and Sci2 could perform an intellectual and social analysis. CiteSpace could be used for a document co-citation analysis and Sci2 for a co-author analysis. The resulting network of authors could be displayed over a world map using the geolocation capabilities of Sci2. Finally, VantagePoint could be used to build a factor map on keywords, and show the institutional affiliation related to the most interesting detected factors.