Chen, C. and Morris, S. (2003). Visualizing evolving networks: minimum spanning trees versus pathfinder networks. In IEEE Symposium on Information Visualization 2003, Oct. 19-21, 2003, Seattle, Washington, USA, 67-74.
information visualization
本研究從網路的型態(topological)與動態(dynamical)兩方面比較Minimal Spanning Tree(MST)和Pathfinder Network(PFNet)兩種連結縮減的方法在共被引網路(co-citation network)的應用。連結縮減處理的目的在於使共被引網路能夠更清楚地呈現出重要的連結(論文或作者的共被引關係)與節點(論文或作者),因此一方面需要保留原本網路的型態,但另一方面當以論文引用的時間順序呈現時,能夠從網路的變化了解學科專業的演進。本研究發現經過MST處理的共被引網路會保留連結程度(degree)較高節點周圍的連結,所以可以維持原本網路的結構,但因為某些路徑上的重要連結被移除,因此不足以表達學科專業的演進;然而從PFNet所產生的網路不僅可以對應到學科專業的主題,同時在以共被引的時間呈現時也能夠發現到主題內以及主題之間的發展情形。
We compare the visualizations of co-citation networks of scientific publications derived by two widely known link reduction algorithms, namely minimum spanning trees (MSTs) and Pathfinder networks (PFNETs).
Two criteria are derived for assessing visualizations of evolving networks in terms of topological properties and dynamical properties.
The results suggest that although high-degree nodes dominate the structure of MST models, such structures can be inadequate in depicting the essence of how the network evolves because MST removes potentially significant links from high-order shortest paths. In contrast, PFNET models clearly demonstrate their superiority in maintaining the cohesiveness of some of the most pivotal paths, which in turn make the growth animation more predictable and interpretable.
The shortage of comprehensive examinations of the evolution of citation networks is due to various reasons, including the lack of an overarching framework that accommodates underlying theories and system functionalities across relevant disciplines, the lack of integrated network analysis and visualization tools, the lack of widely accessible longitudinal citation network data, and the lack of tools that specifically facilitate the analysis of network evolution.
A common problem with visualizing a complex network is that a large number of links may prevent users from recognizing salient structural patterns.
In fact, an MST is a special case of a Pathfinder network because a Pathfinder network is the set union of all the possible MSTs derived from a network [Schvaneveldt 1990].
In order to achieve a network of high clarity and legibility, it is necessary to impose the so-called triangular inequality throughout the network. While this requirement leads to the simplest representation of the essence of an underlying proximity network, this is at a considerable computational cost. Additionally, as the size of the original network increases, the algorithm requires a considerable amount of memory to run.
The most widely known graph drawing techniques include force-directed graph drawing algorithms and spring-embedder algorithms [Eades 1984]. ... These algorithms, however, face some challenges in terms of efficiency, especially in terms of scalability, which is closely related to the clarity of a visualized network.
A commonly used strategy to reduce clutter is to reduce the number of links. There are several ways to achieve this goal. Three popular ones are analyzed below.
The first option is imposing a link weight threshold and only include links with weights above the threshold [Zizi and Beaudouin-Lafon 1994]. ... However, it does not take the intrinsic structure of the underlying network into account, so the transformed network may not preserve the essence of the original network.
The second option is extracting a minimum spanning tree (MST) from a network of N vertices and reducing the number of links to N – 1.
The third option is imposing constraints on paths and excluding links that do not satisfy the constraints, for instance, as in Pathfinder network scaling [Schvaneveldt 1990]. ... The topology of a PFNET is determined by two parameters q and r and the corresponding network is denoted as PFNET(r, q). The q-parameter specifies the maximum length of a path subject to the triangular inequality test. The r-parameter is the Minkowski metric used to compute the distance of a path. The most concise PFNET for visualization is PFNET (q = N–1, r = inf) [Chen 2002; Chen and Paul 2001; Schvaneveldt 1990].
Most network growth models draw upon the rich-get-richer notion and cumulative advantage. As a result, if the degree of a node indicates its “richness,” a node with a higher degree will have a better chance to receive the next new link than a node with lower degree. In a citation network, this means that a highly cited article is more likely to be cited again than a less frequently cited article. This type of growing mechanism is known as preferential attachment.
It appears to be particularly problematic to identify significant topological and dynamical patterns in such visualization models because of the high density of the underlying network.
An et al. [2001] suggested that the evolution of citation networks could be useful in predicting research trends and in studying a scientific community’s life span.
Two criteria are derived based on the above analysis for qualitatively evaluating network visualization.
The first criterion for selecting a preferable topological structure of a visualized network is the presence of hubs, or stars, in derived networks. ... A star pattern indicates the star node carries the most information, processes the highest cue validity and the most differentiated from one another. ... Existing studies appear to suggest that co-citation counts are likely to form such star patterns in both MST and PFNET.
Criterion II requires that the changes of topological properties over time must preserve the integrity of emergent trends or patterns. Visualizing network evolution should not merely inform users of changes of individual nodes and links; rather, it is essential to inform users how an intrinsically cohesive structure changes locally and globally in organically.
In this study, research fronts were identified by agglomerative clustering using only papers that had at least five bibliographic coupling counts with some other paper in the dataset. Similarity calculation was based on Salton's cosine coefficient [Salton 1989] applied to bibliographic coupling counts. The titles for each research front were derived manually by exploring titles of papers within each research front for common themes.
Base reference clusters were formed by agglomerative clustering using only references that had been cited 10 or more times. Similarity calculation was based on Salton's cosine coefficient applied to co-citation counts. For each base reference cluster, labels were found by using the label of the research front that contained the most citations to references in the cluster.
A map of the references in the pathfinder network was produced identifying each reference by its base reference cluster membership, which allowed labeling of sections of the pathfinder network based on base cluster labels.
In general, due to the arbitrary choice inherited from the MST algorithms, one cannot guarantee the uniqueness of an MST. As a result, an MST may not preserve all the necessary links for representing the growth of a co-citation network. If this is the case, then important diffusion patterns may be distorted or inadequately represented by the extracted MST model.
The 516-node PFNET (q = N – 1, r = inf) is shown in Figure 3. The two parameters q and r were chosen to ensure that the extracted PFNET has the least number of links.
The animated PFNET visualization model demonstrated that nodes with similar colors often emerged simultaneously and formed local structures. And these local structures were reinforced by the timely emergence of salient co-citation links. The growth process can be represented by the dynamics shown in such local structures. Features such as continuity, predictability, and local cohesiveness in the PFNET indicated that the second criterion was met.
沒有留言:
張貼留言