van Eck, N. J., Waltman, L., Dekker, R. & van den Berg, J. (2010). A comparison of two techniques for bibliometric mapping: Multidimensional scaling and VOS. Journal of the American Society for Information Science and Technology, 61(12), 2045-2061.
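This paper compares two dimension-reduction methods for information visualization, MDS (multidimensional scaling) and VOS, both theoretically and experimentally. On the theoretical side, the authors argue that VOS can be viewed as a special kind of MDS in which, when the stress function is computed, the proximity between the points representing two items on the map is weighted by the similarity between those items: the more similar two items are, the larger the weight given to the closeness of their mapped points. In practical visualization applications most pairs of items are unrelated, so their similarity is usually taken to be 0. Conventional MDS still uses these zero similarities, which pushes the mapped points into a nearly circular layout in which frequently observed items are more likely to be placed near the center of the circle. The authors regard this result as distorted. In experiments on three bibliometric data sets, including author co-citation relations in information science, the MDS maps for both similarity measures come out close to circular, whereas the VOS results are more satisfactory.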
MDS has been widely used for constructing maps of authors (e.g., McCain, 1990; White & Griffith, 1981; White & McCain, 1998), documents (e.g., Griffith, Small, Stonehill, & Dey, 1974; Small & Garfield, 1985; Small, Sweeney, & Greenlee, 1985), journals (e.g., McCain, 1991), and keywords (e.g., Peters & Van Raan, 1993a, 1993b; Tijssen & Van Raan, 1989).
To determine similarities between items, co-occurrence frequencies are usually transformed using a similarity measure. Two types of similarity measures can be distinguished.
Direct similarity measures (Van Eck & Waltman, 2009; also known as local similarity measures, see Ahlgren, Jarneving, & Rousseau, 2003) determine the similarity between two items by applying a normalization to the co-occurrence frequency of the items. ... Various direct similarity measures are being used in the literature. Especially the cosine and the Jaccard index are very popular. ... We argued that the most appropriate measure for normalizing co-occurrence frequencies is the so-called association strength (e.g., Van Eck & Waltman, 2007b; Van Eck et al., 2006). This measure is also known as the proximity index (e.g., Peters & Van Raan, 1993a; Rip & Courtial, 1984) or as the probabilistic affinity index (e.g., Zitt, Bassecoulard, & Okubo, 2000).
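The three direct similarity measures named above can be sketched as follows. This is a minimal sketch using the commonly cited definitions, where c_ij is the co-occurrence frequency of items i and j and s_i, s_j are the total numbers of occurrences of each item. The function names and the sample counts are hypothetical, chosen only for illustration.

```python
import math

def association_strength(c_ij, s_i, s_j):
    """Association strength: co-occurrences divided by the product of the totals."""
    return c_ij / (s_i * s_j)

def cosine_direct(c_ij, s_i, s_j):
    """Direct cosine: co-occurrences divided by the geometric mean of the totals."""
    return c_ij / math.sqrt(s_i * s_j)

def jaccard(c_ij, s_i, s_j):
    """Jaccard index: co-occurrences divided by the size of the union."""
    return c_ij / (s_i + s_j - c_ij)

# Hypothetical counts: two items occur 10 and 20 times and co-occur 6 times.
for measure in (association_strength, cosine_direct, jaccard):
    print(measure.__name__, measure(6, 10, 20))
```

All three normalize the same raw count; they differ only in the denominator, which is why they can rank the same pairs differently.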
Indirect similarity measures (also known as global similarity measures), on the other hand, determine the similarity between two items by comparing two vectors of co-occurrence frequencies. ... For a long time, the Pearson correlation has been the most popular indirect similarity measure in the literature (e.g., McCain, 1990, 1991; White & Griffith, 1981; White & McCain, 1998). Nowadays, however, it is well known that the Pearson correlation has some undesirable properties (Ahlgren et al., 2003; Van Eck & Waltman, 2008). A well-known indirect similarity measure that does not have these undesirable properties is the cosine.
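To illustrate the indirect case, the sketch below computes the cosine between two co-occurrence profiles (rows of a co-occurrence matrix). The function name and the toy rows are assumptions for illustration, and the treatment of diagonal entries, which varies between studies, is deliberately left out.

```python
import math

def cosine_indirect(row_i, row_j):
    """Cosine between two vectors of co-occurrence frequencies."""
    dot = sum(a * b for a, b in zip(row_i, row_j))
    norm_i = math.sqrt(sum(a * a for a in row_i))
    norm_j = math.sqrt(sum(b * b for b in row_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # an item with no co-occurrences has no defined direction
    return dot / (norm_i * norm_j)

# Hypothetical profiles: the two items co-occur with the same third parties
# in the same proportions, so their indirect similarity is (close to) 1.
print(cosine_indirect([3, 0, 1], [6, 0, 2]))
```

Unlike a direct measure, this value can be high for two items that never co-occur with each other, as long as their profiles over the other items are similar.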
The aim of MDS is to locate items in a low-dimensional space in such a way that the distance between any two items reflects the similarity or relatedness of the items as accurately as possible. The stronger the relation between two items, the smaller the distance between the items.
To determine the locations of items in a map, MDS minimizes a so-called stress function. ... MDS determines the locations of items in a map by minimizing the (weighted) sum of the squared differences between on the one hand the transformed proximities of items and on the other hand the distances between items in the map.
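The weighted sum described above can be written out as a small function. This is a sketch of a raw (unnormalized) stress, assuming the proximities are already expressed as dissimilarities. The paper's own stress equation is not reproduced in these notes and may include a normalization term, so the function name, its arguments, and the absence of normalization are assumptions.

```python
import math

def stress(coords, prox, weights, f=lambda p: p):
    """Weighted sum over pairs of (f(p_ij) - distance_ij)^2."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = math.dist(coords[i], coords[j])  # Euclidean distance in the map
            total += weights[i][j] * (f(prox[i][j]) - d_ij) ** 2
    return total

# Hypothetical two-item map: the points are 5 apart and the target is also 5,
# so the layout reproduces the proximity exactly.
coords = [(0.0, 0.0), (3.0, 4.0)]
prox = [[0, 5], [5, 0]]
weights = [[0, 1], [1, 0]]
print(stress(coords, prox, weights))
```

Passing a different `f` changes how the proximities are transformed before they are compared with the map distances.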
Depending on the transformation function f, different types of MDS can be distinguished. The three most important types of MDS are ratio MDS, interval MDS, and ordinal MDS. Ratio and interval MDS are also referred to as metric MDS, while ordinal MDS is also referred to as non-metric MDS. Ratio MDS treats the proximities pij as measurements on a ratio scale. Likewise, interval and ordinal MDS treat the proximities pij as measurements on, respectively, an interval and an ordinal scale. In ratio MDS, f is a linear function without an intercept. In interval MDS, f can be any linear function, and in ordinal MDS, f can be any monotone function.
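The difference between the ratio and interval transformations can be made concrete with a least-squares fit of f to a fixed set of map distances. This is only a sketch under stated assumptions: unit weights, distances held fixed, and hypothetical function names. The ordinal case would need a monotone (isotonic) regression and is omitted.

```python
def fit_ratio(prox, dist):
    """Ratio MDS: f(p) = b * p, with b chosen by least squares."""
    b = sum(p * d for p, d in zip(prox, dist)) / sum(p * p for p in prox)
    return lambda p: b * p

def fit_interval(prox, dist):
    """Interval MDS: f(p) = a + b * p, with a and b chosen by least squares."""
    n = len(prox)
    mean_p = sum(prox) / n
    mean_d = sum(dist) / n
    cov = sum((p - mean_p) * (d - mean_d) for p, d in zip(prox, dist))
    var = sum((p - mean_p) ** 2 for p in prox)
    b = cov / var
    a = mean_d - b * mean_p
    return lambda p: a + b * p

# Hypothetical data: the distances are an exact linear function of the
# proximities with a nonzero intercept, which only the interval form can match.
prox = [1.0, 2.0, 3.0]
dist = [3.0, 5.0, 7.0]
f_ratio = fit_ratio(prox, dist)
f_interval = fit_interval(prox, dist)
print([f_ratio(p) for p in prox])
print([f_interval(p) for p in prox])
```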
The stress function in Equation 3 can be minimized using an iterative algorithm. Various different algorithms are available. A popular algorithm is the SMACOF algorithm (e.g., Borg & Groenen, 2005). This algorithm relies on a technique known as iterative majorization.
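The core of iterative majorization can be sketched with the Guttman transform, the update step used by SMACOF. The sketch below assumes the simplest setting, which is not necessarily the one used in the paper: unit weights, an identity transformation, and raw stress. The function names and the toy dissimilarities are hypothetical.

```python
import math

def raw_stress(coords, delta):
    """Sum over pairs of (delta_ij - distance_ij)^2 with unit weights."""
    n = len(coords)
    return sum((delta[i][j] - math.dist(coords[i], coords[j])) ** 2
               for i in range(n) for j in range(i + 1, n))

def guttman_step(coords, delta):
    """One majorization update for unit weights: X_new = (1/n) * B(X) * X."""
    n, dim = len(coords), len(coords[0])
    new = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d_ij = math.dist(coords[i], coords[j])
            if d_ij > 0:  # coincident points contribute nothing to B(X)
                ratio = delta[i][j] / d_ij
                for k in range(dim):
                    new[i][k] += ratio * (coords[i][k] - coords[j][k])
    return [[v / n for v in row] for row in new]

def smacof(delta, coords, n_iter=100):
    """Repeat the Guttman transform; stress never increases between steps."""
    for _ in range(n_iter):
        coords = guttman_step(coords, delta)
    return coords

# Hypothetical target dissimilarities (a 3-4-5 triangle) and a poor start.
delta = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
start = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(raw_stress(start, delta), raw_stress(smacof(delta, start), delta))
```

A fixed iteration count keeps the sketch short; a practical implementation would stop once the decrease in stress falls below a tolerance.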
The idea of VOS is to minimize a weighted sum of the squared distances between all pairs of items. The squared distance between a pair of items is weighed by the similarity between the items. To avoid trivial solutions in which all items have the same location, the constraint is imposed that the average distance between two items must be equal to one.
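The objective and the constraint described above can be written down directly. The sketch below only evaluates them for a given layout; it does not perform the minimization, and the function names and toy data are assumptions for illustration.

```python
import math
from itertools import combinations

def vos_objective(coords, sim):
    """Sum over pairs of similarity times squared distance."""
    return sum(sim[i][j] * math.dist(coords[i], coords[j]) ** 2
               for i, j in combinations(range(len(coords)), 2))

def average_distance(coords):
    """Mean Euclidean distance over all pairs of items."""
    pairs = list(combinations(range(len(coords)), 2))
    return sum(math.dist(coords[i], coords[j]) for i, j in pairs) / len(pairs)

def rescale_to_unit_average(coords):
    """Scale a layout so that the average pairwise distance equals one."""
    scale = average_distance(coords)
    return [[v / scale for v in point] for point in coords]

# Hypothetical layout and similarities: only items 0 and 1 are related.
coords = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
sim = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
layout = rescale_to_unit_average(coords)
print(average_distance(layout), vos_objective(layout, sim))
```

Without the constraint, shrinking every distance to zero would drive the objective to zero, which is the trivial solution the passage refers to.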
Under certain conditions, MDS and VOS are closely related. More specifically, the proposition indicates that VOS can be regarded as a kind of weighted MDS with proximities and weights chosen in a special way.
For each of the three data sets that we consider, three maps were constructed, one using the MDS-AS (direct similarity measure) approach, one using the MDS-COS (indirect similarity measure) approach, and one using the VOS (direct similarity measure) approach.
Experiment I: 405 authors publishing papers in 36 journals closely related to the Journal of the American Society for Information Science and Technology between 1999 and 2008.
Experiment II: 2079 journals that belong to at least one social science subject category.
Experiment III: 831 keywords that were automatically identified in the abstracts (and titles) of 7492 articles published in 15 operations research journals between 2001 and 2006.
A notable property of the maps produced by the two MDS approaches is that important items (i.e., items with a large number of co-occurrences) tend to be located toward the center of a map. This is especially clear in the case of the authors and keywords data sets. Many relatively unimportant items are scattered throughout the periphery of a map.
The VOS approach seems to produce maps in which important and less important items are distributed fairly evenly over the central and peripheral areas.
In various studies of the field of information science (e.g., Åström, 2007; White & McCain, 1998; Zhao & Strotmann, 2008a,b,c), it has been found that the field consists of two quite independent subfields. We adopt the terminology of Åström (2007) and refer to the subfields as information seeking and retrieval (ISR) and informetrics.
A distinction is sometimes made between “hard” and “soft” ISR research (e.g., Åström, 2007; Persson, 1994; White & McCain, 1998). Hard ISR research is system-oriented and is for example concerned with the development and the experimental evaluation of information retrieval algorithms. Soft ISR research, on the other hand, is user-oriented and studies for example users’ information needs and information behavior. The distinction between hard and soft ISR research is visible in all three maps.
Both (MDS) approaches have a tendency to locate the most prominent authors in the center of a map and less prominent authors in the periphery. Due to this tendency, the separation of subfields becomes more difficult to see.
In these maps (MDS-AS and MDS-COS), a number of prominent ISR authors (e.g., Spink, Wang, and Wilson) are located equally close or even closer to various informetrics authors than to some of their less prominent ISR colleagues. However, contrary to what the maps seem to suggest, there is in fact very little interaction between the prominent ISR authors and the informetrics authors. The relatively small distance between these two groups of authors therefore does not properly reflect the structure of the field of information science. The small distance is merely a technical artifact, caused by the tendency of the MDS-AS and MDS-COS approaches to locate important items in the center of a map. It follows from this observation that distances in maps constructed using the MDS approaches may not always give an accurate representation of the relatedness of items. Hence, in the case of the MDS approaches, the validity of the interpretation of a distance as an (inverse) measure of relatedness seems questionable.
The VOS map in ... does properly reflect the large separation between the prominent ISR authors and the informetrics authors. In this map, the interpretation of a distance as a measure of relatedness therefore seems valid.
This means that in the MDS-AS approach MDS is typically applied to similarity data that consists largely of zeros. MDS attempts to determine the locations of items in a map in such a way that for each pair of items with a similarity of zero the distance between the items is the same. In the case of similarity data that consists largely of zeros, it is not possible to construct a low-dimensional map with exactly the same distance between each pair of items with a similarity of zero. MDS can only try to approximate such a map as closely as possible. Our experiments indicate that the best possible approximation is a map with an almost perfectly circular structure.
From this point of view, one can say that the VOS approach distinguishes itself from the MDS-AS approach in that it does not give equal weight to all pairs of items. The VOS approach gives more weight to more similar pairs of items. It gives little weight to pairs of items with a low similarity. As mentioned above, similarity data is typically dominated by low values, in particular by zeros. ... In the case of the VOS approach, however, pairs of items with a low similarity receive little weight and therefore have little effect on a map. Because of this, the VOS approach does not produce circular maps.
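The weighting argument can be illustrated with a toy calculation of how much of the total pair weight falls on low-similarity pairs under each scheme. This is a sketch under assumptions: MDS is taken to use unit weights for every pair, VOS to weight each pair by its similarity, and the function name, the threshold parameter, and the sample matrix are hypothetical.

```python
from itertools import combinations

def low_similarity_weight_shares(sim, threshold=0.0):
    """Share of total pair weight held by pairs with similarity <= threshold.

    Returns (share under unit weights, share under similarity weights).
    """
    pairs = list(combinations(range(len(sim)), 2))
    low = [(i, j) for i, j in pairs if sim[i][j] <= threshold]
    unit_share = len(low) / len(pairs)
    total_sim = sum(sim[i][j] for i, j in pairs)
    if total_sim == 0:
        return unit_share, 0.0  # no similarity at all, so no weight to share
    sim_share = sum(sim[i][j] for i, j in low) / total_sim
    return unit_share, sim_share

# Hypothetical sparse similarities: two strong pairs, one weak pair, three zeros.
sim = [
    [0.0, 0.9, 0.05, 0.0],
    [0.9, 0.0, 0.0, 0.0],
    [0.05, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.8, 0.0],
]
print(low_similarity_weight_shares(sim, threshold=0.05))
```

In this toy matrix the low-similarity pairs make up most of the pairs but only a small fraction of the similarity-based weight, which is the contrast the passage describes.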