Monday, April 8, 2013

Moya-Anegón, F., Vargas-Quesada, B., Herrero-Solana, V., Chinchilla-Rodríguez, Z., Corera-Álvarez, E., and Munoz-Fernández, F. J. (2004). A new technique for building maps of large scientific domains based on the cocitation of classes and categories. Scientometrics, 61, 129–145.

information visualization
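This study takes Spain's scientific output for 2000 as the domain for cocitation analysis. Records from the Web of Science with Spain in the address field were collected as the cocitation data, and the resulting cocitation relationships were visualized. Documents were assigned to the ISI-JCR subject categories of the journals in which they were published, 222 categories in all, and these 222 categories were then grouped into 25 classes following the ANEP classification. Cocitation was computed with categories and classes as units, dividing the cocitation count of two units by the square root of the product of their citation counts, CC_ij / sqrt(c_i × c_j). From these relationships three kinds of network maps were produced: 1) a map of the relationships among all classes; 2) for each of the 25 classes, a map of the relationships among its categories; and 3) a map centered on each of the 222 categories. To make the maps easier to read, the class and category maps were pruned with a minimum spanning tree (MST) to reduce the number of links, while the category-centered maps omit every link and category whose cocitation falls below the mean cocitation plus one standard deviation.

The map of the 25 classes splits them into three groups that follow ISI's three databases: the SCI-E classes on the left, the SSCI classes in the upper right, and the A&HCI classes slightly below the SSCI ones. The SCI-E classes can be further divided into Life Sciences classes at the top, Physics, Chemistry and Earth and Space Sciences classes in the middle, and Engineering and Computer Sciences classes at the bottom. The classes linking the left and right sides are Psychology and Educational Sciences, Mathematics, and Social Sciences. Both the category map of the Social Sciences class and the map centered on Library & Information Sciences show the categories related to Library & Information Sciences, including Computer Sciences & Information Systems, Communication, History & Philosophy of Science, Management, Computer Sciences & Interdisciplinary Applications, Planning & Development, Business, and Social Sciences-Interdisciplinary.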
To make visible to the mind that which is not visible to the eye, or to create a mental image of something that is not obvious (e.g. an abstraction), are two definitions of the word “visualization” that point to the intrinsic need to represent information in a non-traditional manner.
The present study proposes a new technique for schematic visualization applied to the analysis of large scientific domains. The scientific domain is understood in the terms put forth by Hjørland and Albrechtsen, as the reflection of interactions between authors, and of their role in science, through citation.
This general overview shows that, so far, domain maps or visualizations have mainly been used to reveal relationships among documents, to identify the most important authors within a given discipline, or to analyze the structure and evolution of an area of knowledge. The methods involved include clustering, multidimensional scaling (MDS), factor analysis, and social networks based on graph models, or some combination of these.
For strictly academic purposes, we downloaded from the Web of Science, specifically from the Science Citation Index-Expanded (SCI-E), the Social Science Citation Index (SSCI) and the Arts and Humanities Citation Index (A&HCI), all records from the year 2000 with at least one Spanish address in the "address" field, and loaded them into an ad hoc database for consultation. The database contained 172,562 author names, responsible for a total of 26,062 documents (articles, biographical items, book reviews, corrections, editorial materials, letters, meeting abstracts, news items and reviews) published in 3,838 different ISI journals. When these were broken down into the 243 categories established by the ISI-JCR for the year 2000, 222 categories were covered. ... These 222 categories were grouped into 25 classes based on the ANEP classification, again taking into account that a single category may belong to more than one class.
Because we aim to show the relationships among diverse disciplines in the natural sciences, the social sciences, and the arts and humanities, we must first solve the problem of uneven citation levels across fields, as suggested by Small and Garfield. For this reason, when running the cocitation queries for the classes or categories, we normalize this measure of association by dividing the cocitation count by the square root of the product of the citation frequencies of the cocited documents, that is, CC_ij / sqrt(c_i × c_j).
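The normalization above is Salton's cosine measure: CC_ij is the number of times units i and j (classes or categories) are cited together, and c_i, c_j are their citation counts. A minimal Python sketch, assuming the raw counts have already been extracted into two dictionaries; the function name, the dictionary layout and the sample numbers are hypothetical, not data from the paper:

```python
import math


def normalized_cocitation(cocitation, citations):
    """Return {(i, j): CC_ij / sqrt(c_i * c_j)} for every cocited pair.

    cocitation: dict mapping a pair (i, j) to its raw cocitation count.
    citations:  dict mapping each unit (class or category) to its citation count.
    """
    result = {}
    for (i, j), cc in cocitation.items():
        denom = math.sqrt(citations[i] * citations[j])
        # A unit with no citations gets similarity 0 instead of a division by zero.
        result[(i, j)] = cc / denom if denom > 0 else 0.0
    return result


# Hypothetical counts for illustration only.
citations = {"A": 400, "B": 100, "C": 900}
cocitation = {("A", "B"): 50, ("A", "C"): 30}
print(normalized_cocitation(cocitation, citations))
# {('A', 'B'): 0.25, ('A', 'C'): 0.05}
```

The pair A-C has fewer cocitations than A-B and also involves a heavily cited unit, so its normalized value drops much further; this is exactly the correction for uneven citation levels the paragraph describes.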
Kamada & Kawai's algorithm was used to automatically produce representations on a plane, starting from a circular arrangement of the nodes. It generates social networks according to aesthetic criteria such as maximum use of the available space, a minimum number of crossing links, forced separation of nodes, and balanced maps.
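The layout step can be illustrated with a compact version of the original Kamada & Kawai energy-minimization procedure: every pair of nodes is joined by a spring whose ideal length is proportional to their graph-theoretic distance, the nodes start on a circle, and they are moved one at a time by Newton-Raphson steps. This is a sketch, not the software the authors used; the function name, parameters and defaults are hypothetical, and it uses unweighted shortest-path distances rather than cocitation values.

```python
import math


def kamada_kawai(n, edges, side=1.0, k=1.0, eps=1e-4, max_outer=500, max_inner=50):
    """Minimal Kamada-Kawai spring layout for a connected, unweighted graph.

    n: number of nodes, labelled 0..n-1; edges: list of (i, j) pairs.
    side: target drawing size (L0); k: spring constant (K). Hypothetical defaults.
    Returns a list of (x, y) positions.
    """
    if n < 2:
        return [(0.0, 0.0)] * n
    inf = float("inf")
    # Graph-theoretic distances d_ij via Floyd-Warshall.
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for i, j in edges:
        d[i][j] = d[j][i] = 1
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    if any(math.isinf(v) for row in d for v in row):
        raise ValueError("graph must be connected")
    unit = side / max(max(row) for row in d)  # L = L0 / max d_ij
    ideal = [[unit * d[i][j] for j in range(n)] for i in range(n)]  # l_ij = L * d_ij
    spring = [[k / d[i][j] ** 2 if i != j else 0.0 for j in range(n)] for i in range(n)]

    # Initial positions: nodes evenly spaced on a circle.
    pos = [[side / 2 * math.cos(2 * math.pi * i / n),
            side / 2 * math.sin(2 * math.pi * i / n)] for i in range(n)]

    def derivatives(m):
        """First and second partial derivatives of the energy for node m."""
        gx = gy = hxx = hxy = hyy = 0.0
        xm, ym = pos[m]
        for i in range(n):
            if i == m:
                continue
            dx, dy = xm - pos[i][0], ym - pos[i][1]
            dist = max(math.hypot(dx, dy), 1e-9)  # guard against coincident nodes
            kmi, lmi, dist3 = spring[m][i], ideal[m][i], dist ** 3
            gx += kmi * (dx - lmi * dx / dist)
            gy += kmi * (dy - lmi * dy / dist)
            hxx += kmi * (1 - lmi * dy * dy / dist3)
            hxy += kmi * lmi * dx * dy / dist3
            hyy += kmi * (1 - lmi * dx * dx / dist3)
        return gx, gy, hxx, hxy, hyy

    for _ in range(max_outer):
        # Move the node with the largest energy gradient.
        grads = [math.hypot(*derivatives(m)[:2]) for m in range(n)]
        m = max(range(n), key=grads.__getitem__)
        if grads[m] < eps:
            break
        for _ in range(max_inner):
            gx, gy, hxx, hxy, hyy = derivatives(m)
            det = hxx * hyy - hxy * hxy
            if math.hypot(gx, gy) < eps or abs(det) < 1e-12:
                break
            # Newton-Raphson step: solve the 2x2 system H * delta = -gradient.
            pos[m][0] += (-gx * hyy + gy * hxy) / det
            pos[m][1] += (gx * hxy - gy * hxx) / det
    return [tuple(p) for p in pos]


layout = kamada_kawai(3, [(0, 1), (1, 2)])
```

For the three-node path in the usage line, the lowest-energy drawing is a straight line with node 1 in the middle, so the two end nodes end up about twice as far apart as adjacent ones.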
The result is a tree structure with three levels: a map representing the Spanish scientific structure as a whole, divided into 25 large classes; 25 maps, one for each ANEP class, each containing the ISI-JCR categories that ANEP evaluation experts considered appropriate; and finally, 222 maps, one for each ISI-JCR category, showing that category with its nearest neighbours.
Class cocitation map (first level)
The result is a symmetrical 25 × 25 class cocitation matrix. Naturally, the intellectual connections that the matrix shows among certain classes are so dense that in some zones the structure of the domain is difficult to see clearly. Following Small's advice, we consider it better to eliminate some connections, since “the loss of information of the structure implies a gain in simplicity, justifying the sacrifice in some cases.” We therefore prune the relations between classes using the Minimum Spanning Tree (MST): links whose values fall below a threshold are successively deleted, raising the threshold until one class is left totally disconnected from the rest. The threshold is then set back to its previous value, so that no class remains disconnected.
This information, together with the cocitation matrix, is processed by the Kamada & Kawai algorithm to produce a social network in which each class is a node connected to other nodes by undirected links. The relationships among the classes, and their intensity, are shown by the thickness of the links.
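The MST pruning step above can be illustrated with Kruskal's algorithm applied to the normalized cocitation values in decreasing order, which keeps the strongest links that do not close a cycle until every class is connected. The paper words its pruning as a threshold procedure, so this is a standard substitute rather than the authors' exact implementation; the function name and sample values are hypothetical.

```python
def maximum_spanning_tree(nodes, similarity):
    """Kruskal's algorithm keeping the strongest cocitation links.

    nodes: iterable of node labels.
    similarity: dict mapping (i, j) pairs to normalized cocitation values.
    Returns the retained links as (i, j, value) tuples.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    # Visit links from strongest to weakest.
    for (i, j), s in sorted(similarity.items(), key=lambda kv: kv[1], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:  # the link joins two separate components, so no cycle forms
            parent[ri] = rj
            tree.append((i, j, s))
    return tree


# Hypothetical similarities for illustration only.
sims = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.7, ("C", "D"): 0.2, ("B", "D"): 0.1}
print(maximum_spanning_tree(["A", "B", "C", "D"], sims))
# [('A', 'B', 0.9), ('A', 'C', 0.8), ('C', 'D', 0.2)]
```

Kruskal's algorithm depends only on the order of the links, so keeping the strongest similarities yields the same tree as a minimum spanning tree on any decreasing distance such as 1 − similarity. A tree over 25 classes always keeps exactly 24 links.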
Category cocitation map (secondary level)
For each ANEP class, we query the cocitation of its ISI categories, normalized as explained earlier, to obtain a symmetrical n × n cocitation matrix, where n is the number of categories in that class. After pruning by MST, we give each category the color of the class it belongs to. We also adjust node sizes at each level: the category with the greatest scientific output is drawn largest, and the rest are scaled in proportion to it, reflecting their relative magnitude in the context of total publication.
The Kamada & Kawai algorithm is supplied with the names of the categories that make up each class, their sizes and colors, and the corresponding cocitation matrix, which establishes the relationships among the categories.
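The size adjustment described above amounts to a linear rescaling in which the category with the greatest output gets the maximum size. A minimal sketch, assuming output is measured as a document count and that "size" means the drawn node size; the function name, the `max_size` of 100 and the sample counts are hypothetical:

```python
def node_sizes(output, max_size=100.0):
    """Scale node sizes linearly so the largest category gets `max_size`.

    output: dict mapping category -> scientific output (e.g. document count).
    """
    largest = max(output.values())
    if largest <= 0:
        raise ValueError("at least one category must have positive output")
    # Every other node is drawn in proportion to the largest one.
    return {cat: max_size * n / largest for cat, n in output.items()}


# Hypothetical document counts for three categories in one class.
print(node_sizes({"Category A": 2000, "Category B": 1500, "Category C": 500}))
# {'Category A': 100.0, 'Category B': 75.0, 'Category C': 25.0}
```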
Map of neighbors (third level)
We start from a 222 × 222 category cocitation matrix. From it, we build a list of neighbors for the specific subject area under consideration. ... We then eliminate the links and vertices connected to the central node (category) whose values fall below a threshold equal to the mean plus one standard deviation. To represent the result as a network, we again use the Kamada & Kawai algorithm, but now specifying that the distance between vertices is a function of cocitation similarity. This makes it evident which categories are closest to the central one and share the greatest topical affinity.
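The third-level filtering can be sketched as follows: collect the normalized cocitation values linking the central category to every other category, compute their mean plus one standard deviation, and keep only the neighbors at or above that threshold. The paper does not say whether it uses the population or the sample standard deviation, so `statistics.pstdev` here is an assumption, as is the hypothetical distance function 1 − similarity attached to each kept neighbor for the layout step. The function name and sample values are also hypothetical.

```python
import statistics


def neighbor_map(center, similarity):
    """Filter the ego network of `center` with the mean + 1 SD rule.

    similarity: dict mapping (i, j) pairs to normalized cocitation values.
    Returns {neighbor: (similarity, target_distance)} for the kept neighbors.
    """
    # Collect the links that touch the central category.
    links = {}
    for (i, j), s in similarity.items():
        if center in (i, j):
            links[j if i == center else i] = s
    if not links:
        return {}
    values = list(links.values())
    # Assumption: population standard deviation of the central node's links.
    threshold = statistics.mean(values) + statistics.pstdev(values)
    # Hypothetical distance function: stronger cocitation gives a shorter edge.
    return {nb: (s, 1.0 - s) for nb, s in links.items() if s >= threshold}


# Hypothetical values for illustration only.
sims = {("center", "A"): 0.40, ("center", "B"): 0.35, ("C", "center"): 0.05,
        ("center", "D"): 0.05, ("center", "E"): 0.05, ("center", "F"): 0.05,
        ("A", "B"): 0.90}
print(neighbor_map("center", sims))
```

With these values the threshold is roughly 0.31, so only A and B survive; the A-B link is ignored for the threshold because it does not touch the central category.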

Results
Map of the first level (class cocitation map)
[Figure: class cocitation map]
From a general standpoint, we can clearly distinguish the three ISI databases. In the left-hand area SCI-E is roughly outlined, whereas the upper right reflects the SSCI database and, hanging from it, the content that corresponds to the A&HCI.
Within the SCI-E zone there are three prominent blocks. The first, which we could call Life Sciences, includes Livestock and Fishing; Food Sciences and Technology; Medicine; Physiology and Pharmacology; Psychology and Educational Sciences; Molecular & Cellular Biology & Genetics; Plant & Animal Biology & Ecology; and Agriculture. The second, Physics, Chemistry and Earth and Space Sciences, contains Chemistry; Geosciences; Chemical Technology; Physics & Space Sciences; and Materials Science & Technology. Finally, the Engineering and Computer Sciences group contains Civil Engineering & Architecture; Computer Sciences & Technology; Electric, Electronic & Automated Engineering; Mechanical, Naval and Aeronautical Engineering; Mathematics; and Electronic & Telecommunications Technology.
In the upper-right zone, covering the Social Sciences and the Arts and Humanities, we can readily identify the SSCI database, represented by Social Sciences, Economy and Law, as well as the A&HCI database, represented by History & Arts and by Philology & Philosophy.
It is noteworthy that Psychology and Educational Sciences, Mathematics and Social Sciences act as a bridge for the network as a whole, connecting the three major component groups, which coincide with the three ISI databases.
Nodes with more links occupy roughly central positions, from which they relate easily to the rest (for example, Multidisciplinary Sciences, Physics & Space Sciences or Chemistry), whereas those with fewer links lie on the periphery (among others, Philology & Philosophy, Economy, Law and even Medicine).
The characteristic feature of the neighbor map is that it depicts an ego-centered network: the node under study is always placed in the center, and the rest orbit around it. Although the representation is balanced and tends to fill the available space, the intensity of the relationships is reflected here by the distance between nodes. Thus, in the map centered on Library & Information Sciences, the most closely related categories are Computer Sciences & Information Systems, Communication, History & Philosophy of Science, Management, Computer Sciences & Interdisciplinary Applications, Planning & Development, Business, and Social Sciences-Interdisciplinary.
