Wednesday, April 15, 2015

Moya-Anegón, F. de, Vargas-Quesada, B., Chinchilla-Rodríguez, Z., Corera-Álvarez, E., Munoz-Fernández, F.J., & Herrero-Solana, V. (2007). Visualizing the marrow of science. Journal of the American Society for Information Science and Technology, 58(14), 2167–2179.


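It is generally believed that representing the relationships between fields graphically, and being able to contemplate those relationships, provides a great deal of information and aids comprehension and analysis for newcomers and experts alike. As a result, demand for methods and tools of this kind has been growing. Most earlier studies produced maps of all of science using journals as the unit of analysis. For example, Leydesdorff (2004a, 2004b) classified the science covered by JCR 2001 using the graph-analytical algorithm of biconnected components. Boyack, Klavans, and Börner (2005) applied eight different journal similarity measures to 7,121 SCI and SSCI journals and generated a map of science with VxOrd. Samoylenko, Chao, Liu, and Chen (2006) built minimum spanning trees of scientific journals from SCI data for 1994 to 2001.

This study proposes a method for drawing ISI (Institute for Scientific Information) categories as a scientogram. Links between categories are built from their cocitation, less important links are pruned with PathfinderNetwork (PFNET), the layout of the nodes is determined with the Kamada–Kawai method, and the structure is finally corroborated with factor analysis. Like the authors' earlier studies, it builds scientograms from the cocitation of categories. Categories are sufficiently explicit as units of representation and, compared with smaller units, are more informative and user friendly for nonexpert users. Moya-Anegón et al. (2004) visualized the scientific domain of Spain, and Moya-Anegón et al. (2005) went on to use scientograms to compare the scientific domains of England, France, and Spain.

The study follows the knowledge-domain mapping process proposed by Börner, Chen, and Boyack (2003). The data cover 7,585 ISI journals. ISI has 219 categories; after excluding Multidisciplinary Sciences, 218 were used. The cocitation similarity between categories is computed as

$$\mathrm{CM}(ij) = \mathrm{Cc}(ij) + \frac{\mathrm{Cc}(ij)}{\sqrt{c(i)\cdot c(j)}}$$

where Cc(ij) is the cocitation frequency of categories i and j, and c(i) and c(j) are the numbers of citations received by categories i and j. The network is then drawn with PFNET and the Kamada–Kawai method; after PFNET pruning, nodes with more links hold more important positions. PFNET is topology oriented, whereas factor analysis is clustering oriented, so the two complement each other: factor analysis identifies, delimits, and names the thematic areas shown on the scientogram, while PFNET makes those areas more visible, groups the categories into bunches, and shows the paths that connect prominent categories as well as the overall topology. In total, factor analysis identified 35 factors, of which 16 were retained by the scree test. The categories on the scientogram fall into three zones: Medical and Earth Sciences, basic and experimental sciences, and social sciences.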

This study proposes a new methodology that allows for the generation of scientograms of major scientific domains, constructed on the basis of cocitation of Institute for Scientific Information categories, and pruned using PathfinderNetwork, with a layout determined by algorithms of the spring-embedder type (Kamada–Kawai), then corroborated structurally by factor analysis.

We present the complete scientogram of the world for the Year 2002.

The need for such representations arises from the general conviction that an image or graphic representation of a domain favors and facilitates its comprehension and analysis, regardless of whether the person on the receiving end of the depiction is a newcomer or an expert.

Science maps can be very useful for navigating the scientific literature and for representing its spatial relations (Garfield, 1986). They are an optimal means of representing the spatial distribution of research areas, while also offering additional information through the possibility of contemplating these relationships (Small & Garfield, 1985).

From a general viewpoint, science maps reflect the relationships between and among disciplines, but the positioning of their labels also hints at semantic connections and serves as an index for understanding why certain nodes or fields are connected with others.

Moreover, these large-scale maps of science show which specialties are most productively involved in research (providing a glimpse of changes in the panorama) and which individuals, publications, institutions, regions, or countries are the most prominent (Garfield, 1994).

Scientography is a tool in that it allows maps to be generated, and a method in that it facilitates the analysis of domains by showing the structure and relations of the elements represented. In a nutshell, it is a holistic tool for expressing the discourse of the scientific community it aspires to represent, reflecting the intellectual consensus of researchers on the basis of their own citations of the scientific literature.

In Moya-Anegón et al. (2004), we traced the historical evolution of scientific maps from their origins to the present and proposed ISI-JCR category cocitation for representing major scientific domains. Its utility was demonstrated by a visualization of the scientific domain of geographical Spain for the Year 2000.

Since then, other works on the visualization of major scientific domains have appeared; however, all of them use journals as the unit of analysis, with the exception of one study based on the cocitation of categories (Moya-Anegón et al., 2005) that compares three geographic domains (England, France, and Spain).

In contrast, Leydesdorff (2004a, 2004b) classified world science using the graph-analytical algorithm of biconnected components in combination with JCR 2001.

Boyack, Klavans, and Börner (2005) applied eight alternative measures of journal similarity to a dataset of 7,121 journals covering over 1 million documents in the combined Science Citation and Social Science Citation Indexes, to show the first global map of science using the force-directed graph layout tool VxOrd.

Samoylenko, Chao, Liu, and Chen (2006) proposed an approach based on constructing minimum spanning trees of scientific journals, using the Science Citation Index from 1994 to 2001.

In processing and depicting the scientific structure of major domains, we further developed a methodology that follows the process for mapping knowledge domains proposed by Börner, Chen, and Boyack (2003).

Because ISI assigns each journal to one or more subject categories, we also downloaded the Journal Citation Reports (JCR; Thomson Corporation, 2005a) for 2002, in both its Science and Social Sciences editions, so that a subject matter (i.e., an ISI category) could be assigned to each document.

The downloaded records were exported to a relational database that reflects the structured information of the documents. This new repository contained nearly 1 million (N = 901,493) source documents: articles, biographical items, book reviews, corrections, editorial materials, letters, meeting abstracts, news items, and reviews that had been published in 7,585 ISI journals (N = 5,876 + 1,709). These were classified in a total of 219 categories, altogether citing 25,682,754 published documents.

As informational units, ISI categories are in themselves sufficiently explicit to represent all the disciplines that make up science in general. Combined with suitable techniques for reducing the space and representing the information, they yield scientograms of science or of major scientific domains that are much more informative and user friendly, for quick comprehension and handling by nonexpert users, than those obtained from the cocitation of smaller units.

For these reasons, we used the 219 categories of the JCR 2002 as units of measure, with the exception of “Multidisciplinary Sciences.” ... The maximum number of categories with which we worked, then, was 218.

In light of our previous experience (Moya-Anegón et al., 2004, 2005), we use cocitation as the similarity measure to quantify the relationship between each pair of JCR categories.

After a number of trials, we concluded that, with Network Analysis tools, the best visualizations are obtained by using raw cocitation counts as the unit of measure. It was also necessary, however, to reduce the number of tied cocitation values to improve the performance of the pruning algorithm. We therefore added the standardized cocitation value to the raw values. In this way, we could work with raw cocitation counts while still differentiating the similarity of category pairs that have equal cocitation frequencies. The key was a simple modification of the equation proposed by Salton and Bergmark for standardizing the degree of citation:

$$\mathrm{CM}(ij) = \mathrm{Cc}(ij) + \frac{\mathrm{Cc}(ij)}{\sqrt{c(i)\cdot c(j)}}$$

where CM is the cocitation measure, Cc is the cocitation frequency, c is the citation count, and i and j are categories.
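A minimal Python sketch of the cocitation measure, for illustration only; the function name `cocitation_measure` and the sample counts are hypothetical. It assumes Cc(ij) never exceeds c(i) or c(j), which holds if a cocitation is counted once per citing document. Under that assumption the standardized term lies in (0, 1], so it separates pairs with equal raw counts without reordering pairs whose integer counts differ.

```python
import math


def cocitation_measure(cc_ij: int, c_i: int, c_j: int) -> float:
    """CM(ij) = Cc(ij) + Cc(ij) / sqrt(c(i) * c(j)).

    cc_ij: cocitation frequency of categories i and j
    c_i, c_j: citation counts of categories i and j
    """
    if cc_ij == 0:
        return 0.0  # never cocited: no link between the two categories
    # Raw count plus its cosine-style (Salton) standardization
    return cc_ij + cc_ij / math.sqrt(c_i * c_j)


# Two hypothetical category pairs with the same raw cocitation count (10)
# but different citation counts: the standardized term breaks the tie.
pair_a = cocitation_measure(10, 100, 400)  # 10 + 10/200
pair_b = cocitation_measure(10, 100, 100)  # 10 + 10/100
print(pair_a, pair_b)
```

The first pair scores about 10.05 and the second about 10.1, so both stay close to their raw count of 10 while the pruning step no longer sees them as tied.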

Over the history of the visualization of scientific information, very different techniques have been used to reduce n-dimensional space. The most common, used alone or in combination, are multidimensional scaling, clustering, factor analysis, self-organizing maps, and PathfinderNetworks (PFNET).

In our opinion, PFNET with pruning parameters r = ∞ and q = n − 1 is the prime option for eliminating less significant relationships while preserving and highlighting the most essential ones, capturing the underlying intellectual structure in an economical way.

Although PFNET has been used in the fields of Bibliometrics, Informetrics, and Scientometrics since 1990 (Fowler & Dearholt, 1990), its introduction into citation analysis is due to Chen (1998, 1999), who introduced a new way of organizing, visualizing, and accessing information. The end effect is the pruning of all paths except those with the single highest (or tied highest) cocitation counts between categories (White, 2001).
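A minimal sketch of PFNET pruning for the parameters used here (r = ∞, q = n − 1), written for similarity weights such as CM. It is an illustration under stated assumptions, not the authors' implementation. With r = ∞ the strength of a path is taken to be its weakest link, and with q = n − 1 paths of any length are considered. A link therefore survives only if no alternative path has a strictly stronger weakest link, and tied links are kept, matching White's description. The maximin matrix is computed with a Floyd–Warshall-style triple loop, and the function name `pfnet_r_inf` is hypothetical.

```python
def pfnet_r_inf(sim):
    """Links kept by PFNET(r = inf, q = n - 1) on a symmetric similarity matrix.

    sim[i][j] > 0 is a link strength (e.g. a cocitation measure); 0 means no link.
    A path's strength is its weakest link (r = inf), and paths of any length
    are allowed (q = n - 1). Diagonal values do not affect the result.
    """
    n = len(sim)
    # best[i][j]: strongest weakest-link value over all paths from i to j
    best = [row[:] for row in sim]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = min(best[i][k], best[k][j])
                if via_k > best[i][j]:
                    best[i][j] = via_k
    # Keep a direct link only if no indirect path beats it (ties are kept)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if sim[i][j] > 0 and sim[i][j] >= best[i][j]
    }


# Hypothetical 3-category similarity matrix
sim = [[0, 5, 3],
       [5, 0, 4],
       [3, 4, 0]]
print(sorted(pfnet_r_inf(sim)))  # [(0, 1), (1, 2)]
```

The link 0–2 (strength 3) is removed because the path 0–1–2 has a weakest link of 4. Because only the ordering of strengths matters when r = ∞, the surviving links are exactly those that belong to at least one maximum spanning tree of the similarity graph.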

Spring-embedder algorithms are the most widely used in the area of documentation, and specifically in domain visualization. Spring embedders assign coordinates to the nodes in such a way that the final graph is pleasing to the eye (Eades, 1984). Two major extensions of the algorithm proposed by Eades (1984) were developed by Kamada and Kawai (1989) and by Fruchterman and Reingold (1991).

Although Brandenburg, Himsolt, and Rohrer (1995) did not find any single algorithm to predominate, most of the scientific community favors the Kamada–Kawai algorithm. The reasons given are its behavior with respect to local minima, its ability to minimize differences from the theoretical distances across the entire graph, good computation times, and the fact that it subsumes multidimensional scaling when the technique of Kruskal and Wish (1978) is applied.
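To make the Kamada–Kawai idea concrete, here is a minimal Python sketch. Each pair of nodes is joined by a spring whose ideal length equals their graph-theoretic (shortest-path) distance d(i, j) and whose stiffness is 1/d(i, j)², and the layout minimizes the total spring energy. Two simplifications are assumptions of this sketch: the length and stiffness constants are fixed at 1, and the energy is minimized with plain gradient descent rather than the node-by-node Newton–Raphson procedure of Kamada and Kawai (1989). The graph is assumed connected, and all function names are hypothetical.

```python
import math
import random
from collections import deque


def hop_distances(n, edges):
    """All-pairs shortest-path lengths (in links) via BFS from every node."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] == math.inf:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def kk_energy(pos, d):
    """Spring energy: sum over pairs of (1/d^2)/2 * (|p_i - p_j| - d)^2."""
    n = len(pos)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = math.hypot(pos[i][0] - pos[j][0], pos[i][1] - pos[j][1])
            energy += 0.5 / d[i][j] ** 2 * (r - d[i][j]) ** 2
    return energy


def random_positions(n, seed=0):
    """Reproducible random starting coordinates in [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]


def kk_layout(pos, d, steps=2000, lr=0.01):
    """Gradient descent on kk_energy (a simplification of Kamada-Kawai's solver)."""
    pos = [p[:] for p in pos]  # do not modify the caller's positions
    n = len(pos)
    for _ in range(steps):
        grad = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                r = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                # Derivative of the (i, j) spring term with respect to p_i
                coef = (r - d[i][j]) / (d[i][j] ** 2 * r)
                grad[i][0] += coef * dx
                grad[i][1] += coef * dy
        for i in range(n):
            pos[i][0] -= lr * grad[i][0]
            pos[i][1] -= lr * grad[i][1]
    return pos


# Hypothetical 4-node path graph 0-1-2-3
d = hop_distances(4, [(0, 1), (1, 2), (2, 3)])
start = random_positions(4, seed=1)
layout = kk_layout(start, d)
print(kk_energy(start, d), kk_energy(layout, d))
```

The energy after the descent should be well below the starting energy, with the four nodes stretched out so that their Euclidean distances approach 1, 2, and 3 links.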

We can easily see which nodes are the most important in terms of their number of connections and which act as intermediaries between other lines of the network, serving as hubs or forking points.
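Both readings of the network can also be computed rather than only read off the picture. The following is a minimal sketch, assuming the pruned network is given as a list of links between category labels. Degree counts each node's connections, which corresponds to the "most important nodes" reading. Betweenness centrality, computed here with Brandes's algorithm for unweighted graphs, is one common way to quantify how strongly a node acts as an intermediary. The paper does not say it used betweenness; that choice, the function names, and the toy labels are assumptions.

```python
from collections import Counter, deque


def degree_ranking(edges):
    """Nodes sorted by number of links, highest first."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree.most_common()


def betweenness(edges):
    """Unweighted betweenness centrality (Brandes's algorithm), undirected graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical pruned network: A links to B, C, and D; C and D are also linked
links = [("A", "B"), ("A", "C"), ("A", "D"), ("C", "D")]
print(degree_ranking(links))
print(betweenness(links))
```

Here A is both the best-connected node and the only intermediary: every shortest path from B to the rest of the network passes through it, while C and D are linked directly and need no broker.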

Whereas factor analysis is a clustering-oriented procedure, PFNET is topology oriented. Yet the two are extremely valuable as complements in detecting the structure of a scientific domain.

Thus, factor analysis is responsible for identifying, delimiting, and naming the major thematic areas reflected in the scientogram.

PFNET, meanwhile, makes the subject areas more visible, groups their categories into bunches, and shows both the paths that connect the different prominent categories and the overall topology of the domain.

Factor analysis identifies 35 factors in the 218 × 218 category cocitation matrix of world science for 2002. Through the scree test we extracted 16, which we tagged using the previously explained method; these accumulate 70.2% of the variance (Table 1).
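The "accumulated variance" figure is a ratio of eigenvalues. This minimal sketch assumes the factors are extracted from a correlation matrix, so the retained share is the sum of the retained eigenvalues divided by the sum of all eigenvalues, which equals the number of variables (218 here). The paper does not state which rule produced the initial 35 factors. The Kaiser criterion (eigenvalue > 1) is shown only as one common rule, and the eigenvalues below are made up.

```python
def variance_share(eigenvalues, k):
    """Fraction of total variance accumulated by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def kaiser_count(eigenvalues):
    """Number of factors with eigenvalue > 1 (Kaiser criterion)."""
    return sum(1 for value in eigenvalues if value > 1)


# Made-up eigenvalues of a 5 x 5 correlation matrix (they sum to 5)
eig = [2.5, 1.2, 0.7, 0.4, 0.2]
print(kaiser_count(eig), round(variance_share(eig, 2), 3))  # 2 0.74
```

Under the same correlation-matrix assumption, the reported 70.2% for 16 factors would correspond to retained eigenvalues summing to roughly 0.702 × 218 ≈ 153.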

The number of categories included in at least one factor is 195. Twenty-three were not included in any factor (Table 2), and 25 belonged to two factors simultaneously (Table 5).

That is, a category or thematic area occupying a central position in the scientogram will have a more general or universal nature in the domain as a consequence of the number of sources it shares with the rest, contributing more to scientific development than those with a less central position.

The more peripheral the position of a category or subject area, the more exclusive its nature and the fewer sources it will appear to share with other categories; accordingly, the smaller its contribution to the development of knowledge through scientific publications.

An intermediary position favors the interconnection of other categories or thematic areas. 

This broad interpretation of our scientograms not only explains the cocitation patterns that characterize a domain but also gives specialists and nonexperts an intuitive way to arrive at a practical understanding of how PFNET works (Chen & Carr, 1999).

From a macrostructural point of view, we can distinguish three major zones.

In the center is what we could call Medical and Earth Sciences, consisting of Biomedicine, Psychology, Etiology, Animal Biology & Ecology, Health Care & Service, Orthopedics, Earth & Space Science, and Agriculture & Soil Sciences.

To the right, we can see some other basic and experimental sciences: Materials Sciences & Physics, Applied; Engineering; Computer Science & Telecommunications; Nuclear Physics & Particles & Fields; and Chemistry.

To the left is the neighborhood of the social sciences, with Applied Mathematics, Business, Law, and Economy, and Humanities.

On the one hand, the methodology allows domain analysts to see the most essential connections between the categories of a given domain.

On the other hand, it allows us to see how these categories are grouped into major thematic areas and how those areas are interrelated in a logical order of explicit sequences.
