Wednesday, March 27, 2013

Cobo, M. J., López‐Herrera, A. G., Herrera‐Viedma, E., & Herrera, F. (2012). SciMAT: A new science mapping analysis software tool. Journal of the American Society for Information Science and Technology.

information visualization


This paper introduces the features and applications of SciMAT, a science mapping analysis tool. Following Börner et al. (2003) and Cobo et al. (2011b), the science mapping workflow can be divided into the following steps: 1) data retrieval, 2) data preprocessing, 3) network extraction, 4) network normalization, 5) mapping, 6) analysis, and 7) visualization. It is worth emphasizing that data preprocessing, which handles duplicates and errors in the raw data, time slicing, and data and network reduction, is one of the key steps in obtaining good results from a science mapping analysis. Network extraction builds relations among the units of analysis from the bibliographic data of the documents, including co-occurrence, coupling, and direct linkage relations. A co-occurrence relation between two units depends on whether, and how often, they appear together within the set of documents; a coupling relation between two documents is based on whether they share units and how many they share, while coupling between authors or journals is obtained by aggregating the units shared by their documents; a direct linkage is the citation relation between documents and their references.

Using different units of analysis and different relations, various aspects of a research field can be analyzed. For example, a coauthorship network extracted from authors who appear together in documents can be used to analyze the social structure of a scientific field; analyzing a co-word network built from the co-occurrence of terms within documents reveals the conceptual structure of the field and the main concepts it deals with; and the cocitation and bibliographic coupling relations derived from citations can be used to analyze the intellectual structure of the field.

From this view of the science mapping workflow, a science mapping software tool should ideally have the following characteristics: a) modules covering each step of the workflow; b) a powerful de-duplication module; c) the ability to build a large variety of bibliometric networks; d) good visualization techniques; and e) output enriched with bibliometric measures and indicators. The SciMAT tool proposed in this study satisfies all of these requirements. SciMAT comprises three important modules: the knowledge base, the configuration of the science mapping workflow, and the visualization of the resulting measures and maps.

The knowledge base module lets the analyst import search results from various bibliographic sources and stores each document's authors, keywords, journal, references, and other data in the knowledge base. With the functions it provides, the analyst can edit and preprocess the data to improve its quality and thus obtain better analysis results. The workflow configuration module guides the analyst step by step through setting the time periods of the analysis, the unit of analysis and the kind of relation, the frequency threshold for filtering the data, the similarity measure used for normalization, the clustering method, and the parameters of the network, performance, temporal, and longitudinal analyses. The visualization module provides, for each period, detailed network diagrams, strategic diagrams, and the associated bibliometric measures, and it can also produce longitudinal charts showing how the clusters representing research themes evolve across periods.

The general workflow in a science mapping analysis has different steps (Börner et al., 2003; Cobo et al., 2011b) (see Figure 1): data retrieval, data preprocessing, network extraction, network normalization, mapping, analysis, and visualization. At the end of this process, the analyst has to interpret and obtain conclusions from the results.

Usually, the data retrieved from the bibliographic sources contain errors, so a preprocessing process must be applied first. In fact, the preprocessing step is one of the most important to obtain good results in science mapping analysis. Different preprocessing processes can be applied to the raw data, such as detecting duplicate and misspelled items, time slicing, data reduction, and network reduction (for more information, see Cobo et al., 2011b).
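
As a toy illustration of the kind of de-duplication the preprocessing step performs, the sketch below groups raw keyword variants under a crude normalized form. The normalization rules and example keywords are assumptions made for illustration, not SciMAT's actual procedure.

```python
import re
from collections import defaultdict

def normalize(term: str) -> str:
    """Crude normalization: lowercase, replace punctuation with spaces,
    collapse whitespace, and strip a trailing plural 's'."""
    t = re.sub(r"[^a-z0-9]+", " ", term.lower()).strip()
    words = [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in t.split()]
    return " ".join(words)

def group_variants(terms):
    """Group raw keyword strings that normalize to the same form, so they
    can be merged into a single item before network extraction."""
    groups = defaultdict(set)
    for term in terms:
        groups[normalize(term)].add(term)
    return {k: v for k, v in groups.items() if len(v) > 1}

raw_keywords = ["Fuzzy Sets", "fuzzy set", "FUZZY-SETS", "Type-2 fuzzy set", "Linguistic modelling"]
print(group_variants(raw_keywords))
# {'fuzzy set': {'Fuzzy Sets', 'fuzzy set', 'FUZZY-SETS'}}
```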

A co-occurrence relation is established between two units (authors, terms, or references) when they appear together in a set of documents; that is, when they co-occur throughout the corpus.
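
A minimal sketch of how such co-occurrence counts can be computed from per-document keyword sets; the documents and keywords below are made up for illustration.

```python
from itertools import combinations
from collections import Counter

# Each document is represented by the set of units (here, keywords) attached to it.
docs = [
    {"fuzzy set", "linguistic modelling", "decision making"},
    {"fuzzy set", "decision making"},
    {"fuzzy control", "fuzzy set"},
]

cooccurrence = Counter()
for units in docs:
    for a, b in combinations(sorted(units), 2):  # each unordered pair once per document
        cooccurrence[(a, b)] += 1

print(cooccurrence[("decision making", "fuzzy set")])  # 2: they co-occur in two documents
```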

A coupling relation is established between two documents when they have a set of units (authors, terms, or references) in common. Furthermore, the coupling can be established using a higher level unit of aggregation, such as authors or journals. That is, a coupling between two authors or journals can be established by counting the units shared by their documents (using the author’s or journal’s oeuvres).
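
The following sketch illustrates document-level bibliographic coupling and its aggregation to authors via their oeuvres. The document identifiers, references, and author names are illustrative only.

```python
# References cited by each document, and the authors of each document (made-up data).
doc_refs = {
    "doc1": {"refA", "refB", "refC"},
    "doc2": {"refB", "refC", "refD"},
    "doc3": {"refD"},
}
doc_authors = {"doc1": {"Smith"}, "doc2": {"Jones"}, "doc3": {"Smith"}}

def coupling(u, v):
    """Document bibliographic coupling: number of references shared by two documents."""
    return len(doc_refs[u] & doc_refs[v])

def author_coupling(a1, a2):
    """Author bibliographic coupling: references shared between the two authors' oeuvres."""
    def oeuvre(a):
        return set().union(*(doc_refs[d] for d, authors in doc_authors.items() if a in authors))
    return len(oeuvre(a1) & oeuvre(a2))

print(coupling("doc1", "doc2"))           # 2 shared references (refB, refC)
print(author_coupling("Smith", "Jones"))  # Smith's oeuvre shares refB, refC, refD with Jones -> 3
```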

Finally, a direct linkage establishes a relation between documents and references, particularly a citation relation.

In addition, different aspects of a research field can be analyzed depending on the units of analysis used and the kind of relation selected (Cobo et al., 2011b).

For example, using the authors, a coauthor or coauthorship analysis can be performed to study the social structure of a scientific field (Glänzel, 2001; Peters & van Raan, 1991).

Using terms or words, a co-word (Callon, Courtial, Turner, & Bauin, 1983) analysis can be performed to show the conceptual structure and the main concepts dealt with by a field.

Cocitation (Small, 1973) and bibliographic coupling (Kessler, 1963) are used to analyze the intellectual structure of a scientific research field.

We therefore think it would be desirable to develop a science mapping software tool that satisfies the following requirements: (a) it should incorporate modules to carry out all the steps of the science mapping workflow, (b) it should present a powerful de-duplicating module, (c) it should be able to build a large variety of bibliometric networks, (d) it should be designed with good visualization techniques, and (e) it should enrich the output with bibliometric measures.

SciMAT generates a knowledge base from a set of scientific documents where the relations of the different entities related to each document (authors, keywords, journal, references, etc.) are stored. This structure helps the analyst to edit and preprocess the knowledge base to improve the quality of the data and, consequently, obtain better results in the science mapping analysis.
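
A minimal sketch of the kind of per-document record such a knowledge base needs to hold; the field names below are illustrative assumptions, not SciMAT's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DocumentRecord:
    """A minimal document record holding the entities later used to
    extract bibliometric networks and compute performance measures."""
    title: str
    year: int
    journal: str
    authors: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    citations: int = 0  # times cited, used later for performance analysis

doc = DocumentRecord(
    title="An illustrative fuzzy-control paper",
    year=2005,
    journal="Some Journal",
    authors=["Smith, J."],
    keywords=["fuzzy control", "t-norm"],
    references=["refA", "refB"],
    citations=12,
)
```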

Taking into account the GUI, there are three important modules: (a) a module dedicated to the management of the knowledge base and its entities, (b) a module (wizard) responsible for configuring the science mapping analysis, and (c) a module to visualize the generated results and maps. These modules allow the analyst to carry out the different steps of the science mapping workflow.

Regarding its functionalities, the module to manage the knowledge base is responsible for building the knowledge base, importing the raw data from different bibliographical sources, and cleaning and fixing the possible errors in the entities. It can be considered as a first stage in the preprocessing step.

As shown, the workflow is divided into four main stages: (a) to build the data set, (b) to create and normalize the network, (c) to apply a cluster algorithm to get the map, and (d) to perform a set of analyses. These stages and their respective steps are described below:
1. Build the data set: At this stage, the user can configure the periods of time used in the analysis (select the periods), the aspects he or she wants to analyze (select the unit of analysis: conceptual, using terms or words; social, using authors; or intellectual, using references), and the portion of the data to be used (filter the data using a minimum frequency as a threshold).
2. Create and normalize the network: At this stage, the network is built using co-occurrence, coupling, or aggregated coupling relations. Then, the network is filtered to keep only the most representative items. Finally, a normalization process is performed using a similarity measure: association strength (Coulter et al., 1998; van Eck & Waltman, 2007), Equivalence Index (Callon et al., 1991), Inclusion Index, Jaccard Index (Peters & van Raan, 1993), or Salton's cosine (Salton & McGill, 1983). A sketch of these measures is given after this list.
3. Apply a clustering algorithm to get the map and its associated clusters or subnetworks: At this stage, the clustering algorithm used to build the map has to be selected. Different clustering methods are available in SciMAT, such as the Simple Centers Algorithm (Cobo et al., 2011a; Coulter et al., 1998), Single-linkage (Small & Sweeney, 1985), and variants such as Complete-linkage, Average-linkage, and Sum-linkage; a generic single-linkage sketch appears after this list.
4. Apply a set of analyses: The final step of the wizard consists of selecting the analyses to be performed on the generated map.
(a) Network analysis: By default, SciMAT adds Callon’s density and centrality (Callon et al., 1991; Cobo et al., 2011a) as network measures to each detected cluster in each selected period. Callon’s centrality measures the degree of interaction of a network with other networks, and it can be understood as the external cohesion of the network. ... Callon’s density measures the internal strength of the network, and it can be understood as the internal cohesion of the network. ... These measures are useful to categorize the detected clusters of a given period in a strategic diagram (Cobo et al., 2011a). A sketch of both measures appears after this list.
(b) Performance analysis: SciMAT is able to assess the output according to several performance and quality measures. To do that, it incorporates into each cluster a set of documents using a document mapper function and then calculates the performance based on quantitative and qualitative measures (using citation-based measures, number of documents, etc.).
(c) Temporal analysis or longitudinal analysis: This allows the user to discover the conceptual, social, or intellectual evolution of the field. SciMAT is able to build an evolution map to detect the evolution areas (Cobo et al., 2011a) and an overlapping items graph (Price & Gürsey, 1975; Small, 1977) across the periods analyzed. Furthermore, SciMAT allows the user to choose different measures to calculate the weight of the “evolution nexus” (Cobo et al., 2011a) between the items of two consecutive periods, such as association strength (Coulter et al., 1998; van Eck & Waltman, 2007), Equivalence Index (Callon et al., 1991), Inclusion Index, Jaccard’s Index (Peters & van Raan, 1993), and Salton’s cosine (Salton & McGill, 1983).
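
The normalization measures named in step 2 (and reused as weights for the evolution nexus in the longitudinal analysis) are all simple functions of the co-occurrence count c_ij of two items and their individual frequencies c_i and c_j. A minimal sketch under that notation:

```python
from math import sqrt

def association_strength(cij, ci, cj):
    return cij / (ci * cj)

def equivalence_index(cij, ci, cj):
    return cij ** 2 / (ci * cj)

def inclusion_index(cij, ci, cj):
    return cij / min(ci, cj)

def jaccard_index(cij, ci, cj):
    return cij / (ci + cj - cij)

def salton_cosine(cij, ci, cj):
    return cij / sqrt(ci * cj)

# cij = co-occurrences of items i and j; ci, cj = their individual frequencies.
print(equivalence_index(4, 8, 10))  # 0.2
```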
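
Step 3's Single-linkage option can be illustrated with a generic agglomerative sketch over a pairwise similarity table; the merge threshold and stopping rule below are assumptions for illustration, not SciMAT's exact implementation.

```python
def single_linkage(items, sim, threshold):
    """Agglomerative single-linkage clustering over a pairwise similarity dict:
    repeatedly merge the two clusters whose closest pair of items is most
    similar, stopping once no inter-cluster similarity reaches `threshold`."""
    clusters = [{i} for i in items]

    def cluster_sim(a, b):
        # single linkage = maximum similarity over all cross-cluster pairs
        return max(sim.get(frozenset((x, y)), 0.0) for x in a for y in b)

    while len(clusters) > 1:
        i, j = max(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pair: cluster_sim(clusters[pair[0]], clusters[pair[1]]),
        )
        if cluster_sim(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] |= clusters.pop(j)
    return clusters

sim = {frozenset(("a", "b")): 0.9, frozenset(("b", "c")): 0.4, frozenset(("c", "d")): 0.8}
print(single_linkage(["a", "b", "c", "d"], sim, threshold=0.5))
# [{'a', 'b'}, {'c', 'd'}]  (set ordering in the printout may vary)
```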
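
Step 4(a)'s centrality and density can be sketched as sums over a theme's external and internal link weights, following the usual formulation attributed to Callon et al. (1991); the 10x and 100/w scaling constants are the commonly quoted ones and should be treated as an assumption here rather than as SciMAT's exact code.

```python
def callon_measures(theme, links):
    """Callon's centrality and density for one cluster (theme).

    theme : set of items belonging to the cluster.
    links : dict mapping frozenset({item_i, item_j}) -> link weight
            (e.g., the equivalence index) over the whole network.
    Centrality sums the weights of links between the theme and external
    items (external cohesion); density averages the weights of the links
    internal to the theme (internal cohesion).
    """
    external = sum(w for pair, w in links.items() if len(pair & theme) == 1)
    internal = sum(w for pair, w in links.items() if pair <= theme)
    centrality = 10 * external
    density = 100 * internal / len(theme)
    return centrality, density

links = {
    frozenset(("fuzzy control", "t-norm")): 0.6,            # internal link
    frozenset(("fuzzy control", "decision making")): 0.2,   # external link
}
print(callon_measures({"fuzzy control", "t-norm"}, links))  # (2.0, 30.0)
```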

At the end of all the steps in the wizard, the map is built using the selected configuration. The results are then saved to a file, and the visualization module is loaded. The visualization module has two views: Longitudinal and Period.

The Period view (see Figure 12) shows detailed information for each period, its strategic diagram, and for each cluster, the bibliometric measures, the network, and their associated nodes.

Finally, in the Longitudinal view the overlapping map and evolution map are shown. This view helps us to detect the evolution of the clusters throughout the different periods, and study the transient and new items of each period and the items shared by two consecutive periods.
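
A small sketch of the period-to-period bookkeeping behind such an overlapping-items view: the items shared by two consecutive periods, the items that disappear, and the items that are new. The item sets are made up, and the Jaccard-style stability ratio is an illustrative addition rather than SciMAT's exact measure.

```python
def overlap(period_prev, period_next):
    """Shared, disappearing, and new items between two consecutive periods."""
    shared = period_prev & period_next
    disappearing = period_prev - period_next
    new = period_next - period_prev
    stability = len(shared) / len(period_prev | period_next)  # Jaccard-style overlap
    return shared, disappearing, new, stability

p1 = {"fuzzy set", "fuzzy control", "t-norm"}
p2 = {"fuzzy control", "t-norm", "type-2 fuzzy set"}
print(overlap(p1, p2))
# ({'fuzzy control', 't-norm'}, {'fuzzy set'}, {'type-2 fuzzy set'}, 0.5)
```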

Taking into account quantitative measures such as the number of documents associated with each theme (cluster), we can discover where the fuzzy set theory community has been devoting the greatest effort (e.g., H-INFINITY-CONTROL, FUZZY-CONTROL, T-NORM, etc.). Similarly, taking into account the qualitative measures, we can identify the themes with the greatest impact; that is, the themes that have been most highly cited.
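
A sketch of such per-theme performance measures, computed from the citation counts of the documents mapped to a theme; the citation counts below are invented.

```python
def h_index(citations):
    """Largest h such that at least h documents have at least h citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def theme_performance(doc_citations):
    """Quantitative and qualitative measures for the documents of one theme."""
    return {
        "documents": len(doc_citations),
        "citations": sum(doc_citations),
        "h_index": h_index(doc_citations),
    }

# Citation counts of the documents mapped to a hypothetical FUZZY-CONTROL theme.
print(theme_performance([25, 12, 7, 3, 0]))
# {'documents': 5, 'citations': 47, 'h_index': 3}
```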

Combining the units of analysis and the bibliographic relations among them, SciMAT can extract 20 kinds of bibliographic networks, including the common bibliographic networks used in the literature, such as coauthor (Glänzel, 2001; Peters & van Raan, 1991), bibliographic coupling (Kessler, 1963), journal bibliographic coupling (Small & Koenig, 1977), author bibliographic coupling (Zhao & Strotmann, 2008), cocitation (Small, 1973), journal cocitation (McCain, 1991), author cocitation (White & Griffith, 1981), and co-word (Callon et al., 1983).
