Chen, C. (1997). Structuring and visualizing the WWW with Generalized Similarity Analysis. Proceedings of the 8th ACM Conference on Hypertext (Hypertext '97), 177-186. Retrieved August 27, 2012, from http://delivery.acm.org/10.1145/270000/267456/p177-chen.pdf?ip=211.76.242.1&acc=ACTIVE%20SERVICE&CFID=108526370&CFTOKEN=44188441&__acm__=1346053732_db4b154c3eaabf4b1429f52a8e30ba0c
vis_paper
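This paper uses the Pathfinder method to visualize web pages (or websites). It measures their proximity according to the hypertext linkage, content similarity and browsing patterns among them. Specifically, the study estimates proximity from three sources: the ratio of the number of links between pages, the vector space model, and the state transition probabilities between pages. Web pages serve as the vertices of a graph, and the estimated proximity between pages serves as the strength of the links between vertices. Pathfinder then removes unnecessary links while preserving the main structure of the network. The author argues that Pathfinder represents local relationships in the graph more accurately than multidimensional scaling (MDS).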
This paper describes a generic approach to structuring and visualizing a hypertext-based information space on the WWW. This approach, called Generalised Similarity Analysis (GSA), provides a unifying framework for extracting structural patterns from a range of proximity data concerning three fundamental relationships in hypertext, namely, hypertext linkage, content similarity and browsing patterns.
Pathfinder networks are used as a natural vehicle for structuring and visualizing the rich structure of an information space by highlighting salient relationships in proximity data.
Georgia Institute of Technology’s WWW User Surveys [17] show that 69.1% of users regarded the delay in downloading Web pages as a major problem and 34.5% identified the difficulty of finding an existing page. In particular, 14.3% of users reported difficulty visualizing where they have been and where they can go, and 6.5% identified the classic hypertext problem of being lost in hyperspace. Memory overload remains a problem when navigating the WWW.
Ideally, spatial relationships in visualization should be determined by some psychological judgments of proximity, such as similarity, dissimilarity and relatedness.
Pirolli, Pitkow and Rao’s study [18] and HyPursuit [20] are two notable examples that take into account hypertext linkage, content similarity and usage information on the WWW.
In HyPursuit, document similarity by linkage is defined as a linear combination of three components: direct linkage, ancestor and descendant inheritance.
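The following is a minimal Python sketch of a linkage similarity built as a linear combination of the three components named above. The component definitions are assumptions and are not HyPursuit's published formulas. Direct linkage is 1 when either page links to the other. The ancestor and descendant terms are approximated by the Jaccard overlap of immediate parents and immediate children. The function names and the weights `w_direct`, `w_anc` and `w_desc` are hypothetical.

```python
# Hypothetical sketch: linkage similarity as a weighted sum of a direct-link
# term, a shared-ancestor term and a shared-descendant term. Ancestors and
# descendants are approximated by immediate parents and children (assumption).

def parents(links, doc):
    """Documents that link to `doc` (its in-neighbours)."""
    return {src for src, dst in links if dst == doc}

def children(links, doc):
    """Documents that `doc` links to (its out-neighbours)."""
    return {dst for src, dst in links if src == doc}

def jaccard(a, b):
    """Overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def link_similarity(links, i, j, w_direct=0.5, w_anc=0.25, w_desc=0.25):
    """Linear combination of direct linkage, ancestor and descendant terms."""
    direct = 1.0 if (i, j) in links or (j, i) in links else 0.0
    ancestor = jaccard(parents(links, i), parents(links, j))
    descendant = jaccard(children(links, i), children(links, j))
    return w_direct * direct + w_anc * ancestor + w_desc * descendant

links = {("home", "a"), ("home", "b"), ("a", "c"), ("b", "c")}
print(link_similarity(links, "a", "b"))  # 0.5
```

Pages `a` and `b` are not linked to each other, yet they score 0.5 because they share a parent and a child. Raising `w_anc` or `w_desc` strengthens this indirect effect.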
Pirolli, Pitkow and Rao [18] developed a model which characterises documents on the WWW by various attributes associated with these documents, such as the number of incoming and outgoing hyperlinks of a document, how frequently the document was downloaded from the hosting WWW server and content similarities between the document and its children.
Sequential patterns of browsing indicate, to some extent, document relatedness as perceived by users. For example, the number of users who followed a hyperlink connecting two documents in the past was used in [18] to indicate the degree of relatedness between the two documents.
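The link-traversal count described above can be sketched as follows. The sketch assumes that each logged session corresponds to one user and is recorded as an ordered list of page identifiers. `link_traversal_counts` and the sample sessions are hypothetical.

```python
from collections import Counter

def link_traversal_counts(sessions):
    """Count how many sessions followed each hyperlink (src -> dst)."""
    counts = Counter()
    for pages in sessions:
        # Count each consecutive page pair at most once per session, so the
        # result approximates "number of users who followed the link".
        counts.update(set(zip(pages, pages[1:])))
    return counts

sessions = [
    ["home", "a", "c"],
    ["home", "a", "b"],
    ["home", "b", "c"],
]
counts = link_traversal_counts(sessions)
print(counts[("home", "a")])  # 2
```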
Furnas’ fisheye views model is based on a “degree of interest” (DOI) function which assigns a value to each node in accordance with the degree to which a user would be interested in seeing that node [14, 12]. ... A fisheye view can be generated with a threshold so that only nodes with sufficient DOI are displayed in the view. ... By choosing a different API function, one can produce a fisheye view which emphasizes a particular type of structural pattern [12]. For example, the number of times that a node has been visited can be used to define a user-centred fisheye view, in which popular nodes will be highlighted for easy access.
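Furnas' model is commonly written as DOI(x | focus y) = API(x) − D(x, y). Here API is a node's a priori importance and D is its distance from the current focus. The sketch below uses Furnas' tree example, with API(x) = −depth(x) and D measured as the number of tree edges, and displays only nodes whose DOI reaches a threshold. The helper names and the sample tree are hypothetical.

```python
# Sketch of a Furnas-style fisheye view on a tree:
# DOI(x | focus) = API(x) - D(x, focus), with API(x) = -depth(x).

def ancestors(parent, node):
    """Path from node up to the root, including node itself."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path

def tree_distance(parent, x, y):
    """Number of edges between x and y in the tree."""
    up_x = ancestors(parent, x)
    up_y = ancestors(parent, y)
    common = set(up_x) & set(up_y)
    # lowest common ancestor: first node on x's upward path that is also above y
    lca = next(n for n in up_x if n in common)
    return up_x.index(lca) + up_y.index(lca)

def fisheye(parent, focus, threshold):
    """Return nodes whose DOI relative to `focus` is at least `threshold`."""
    shown = []
    for node in parent:
        api = -(len(ancestors(parent, node)) - 1)  # a priori importance = -depth
        doi = api - tree_distance(parent, node, focus)
        if doi >= threshold:
            shown.append(node)
    return shown

parent = {"root": None, "a": "root", "b": "root", "a1": "a", "a2": "a", "b1": "b"}
print(fisheye(parent, focus="a1", threshold=-3))  # ['root', 'a', 'a1']
```

With a threshold of −3, only the focus and its ancestors remain. Lowering the threshold to −4 also reveals the siblings `b` and `a2`. Replacing the depth-based API with visit counts would give the user-centred variant described above.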
In this paper, we focus on extracting underlying relationships in a hypertext information space and representing resultant patterns for structuring and visualizing the information space. Existing techniques such as fisheye views can be subsequently incorporated into such systems with improved spatial configuration mechanisms.
This definition also takes into account the overall connectivity of the document Di, which can be related to the ROC metric defined in [2].
In this study, we use the well-known tf × idf model, term frequency times inverse document frequency, to build term vectors. ... The document similarity is computed as follows based on corresponding vectors.
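The excerpt elides the paper's similarity formula. The sketch below assumes the common choice of cosine similarity between tf × idf vectors, with raw term counts as tf and log(N/df) as idf. It is illustrative and not necessarily the paper's exact weighting. `tfidf_vectors` and `cosine` are hypothetical helpers that operate on pre-tokenised documents.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build tf x idf term vectors (dicts) for a list of tokenised documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = [
    "hypertext link structure".split(),
    "hypertext browsing patterns".split(),
    "pathfinder network scaling".split(),
]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3))  # 0.064
print(cosine(vecs[0], vecs[2]))            # 0.0, no shared terms
```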
We have applied a state transition approach to extracting behavioral patterns of users with a hypertext system [6]. The dynamics of a browsing process can be captured by state transition probabilities. Transition probabilities can be used to indicate document similarity in the nature of browsing.
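The following is a minimal sketch of estimating first-order state transition probabilities P(next | current) from logged browsing sessions. Pathfinder expects symmetric proximity data, so the example also averages P(j | i) and P(i | j). That symmetrisation is an assumption and not necessarily how [6] or this paper derives document similarity. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """Estimate first-order transition probabilities P(next | current)."""
    pair_counts = Counter()
    out_counts = Counter()
    for pages in sessions:
        for cur, nxt in zip(pages, pages[1:]):
            pair_counts[(cur, nxt)] += 1
            out_counts[cur] += 1
    probs = defaultdict(dict)
    for (cur, nxt), c in pair_counts.items():
        probs[cur][nxt] = c / out_counts[cur]
    return probs

def symmetric_proximity(probs, i, j):
    """Assumed symmetrisation: mean of P(j|i) and P(i|j)."""
    return (probs.get(i, {}).get(j, 0.0) + probs.get(j, {}).get(i, 0.0)) / 2

sessions = [["home", "a", "c"], ["home", "a", "b"], ["home", "b", "c"]]
P = transition_probabilities(sessions)
print(symmetric_proximity(P, "a", "b"))  # 0.25
```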
Pathfinder provides a more accurate representation of local relationships than techniques such as multidimensional scaling (MDS) [10]. Pathfinder has been applied to a number of human-computer interaction problems [10].
The topology of a PFNET is determined by two parameters q and r and the corresponding network is denoted as PFNET(r,q). The q-parameter constrains the scope of minimum-cost paths to be considered. The r-parameter defines the Minkowski metric used for computing the distance of a path.
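Under the r-parameter, a path with link weights w1, ..., wk has length (w1^r + ... + wk^r)^(1/r). With r = 1 the weights are added, r = 2 gives the Euclidean combination, and r = ∞ takes the largest weight. The following is a small Python sketch; `path_length` is a hypothetical helper.

```python
import math

def path_length(weights, r):
    """Minkowski r-metric length of a path with the given link weights."""
    if math.isinf(r):
        return max(weights)  # r = infinity: a path is as long as its longest link
    return sum(w ** r for w in weights) ** (1.0 / r)

weights = [3.0, 4.0]
print(path_length(weights, 1))         # 7.0
print(path_length(weights, 2))         # 5.0
print(path_length(weights, math.inf))  # 4.0
```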
When a PFNET satisfies the following 3 conditions, the distance of a path is the same as the weight of the path:
1. The distance from a document to itself is zero.
2. The proximity matrix for the documents is symmetric; thus the distance is independent of direction.
3. The triangle inequality is satisfied for all paths with up to q links. If q is set to the total number of nodes less one, then the triangle inequality is universally satisfied over the entire network.
The number of links in a network can be reduced by increasing the value of parameter r or q. The distance between nodes in a network is the length of the minimum-length path connecting the nodes; such a path is known as the geodesic connecting the nodes. A minimum-cost network (MCN), PFNET(r = ∞, q = n - 1), has the least number of links.
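For the case q = n - 1, geodesic distances can be computed with a Floyd-Warshall-style pass in which path lengths are combined under the Minkowski r-metric. A direct link is kept only if no other path is shorter. The sketch assumes proximities have already been converted to dissimilarities, for example d = 1 - s for similarities in [0, 1]. `pfnet` and `combine` are hypothetical names, and the sketch does not handle q < n - 1.

```python
import math

EPS = 1e-9  # tolerance for floating-point rounding in the final comparison

def combine(a, b, r):
    """Minkowski r-combination of two path lengths (r = inf gives max)."""
    if math.isinf(r):
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)

def pfnet(dist, r=math.inf):
    """Undirected links (i, j), i < j, of PFNET(r, q = n - 1).

    dist is a symmetric dissimilarity matrix with zero diagonal;
    math.inf marks a missing link.
    """
    n = len(dist)
    geo = [list(row) for row in dist]  # geodesic (minimum path) lengths
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if k == i or k == j:
                    continue
                via_k = combine(geo[i][k], geo[k][j], r)
                if via_k < geo[i][j]:
                    geo[i][j] = via_k
    # keep a direct link only if no multi-link path is shorter than it
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] < math.inf and dist[i][j] <= geo[i][j] + EPS}

dist = [
    [0, 1, 2.5, 4],
    [1, 0, 2, 3],
    [2.5, 2, 0, 1],
    [4, 3, 1, 0],
]
print(sorted(pfnet(dist, r=math.inf)))  # [(0, 1), (1, 2), (2, 3)]
print(sorted(pfnet(dist, r=1)))         # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```

On this four-node example, r = ∞ leaves a spanning tree of three links while r = 1 keeps five, which illustrates that increasing r prunes more links. The tolerance guards the final comparison against rounding when r is finite and not 1.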
The major advantage of Pathfinder networks is that salient relationships among documents are extracted by patterns associated with minimum-cost paths. This type of information filtering improves the clarity and quality of the information produced by information visualization systems based on spring models. Users are able to see how documents are related to each other.
GSA has some distinct features.
(1) GSA emphasizes that users can substantially benefit from explicit, graphical representations of salient relationships in hypertext systems, and these graphical representations should be incorporated into user interfaces so as to reduce cognitive burdens on users in browsing.
(2) Each component model in GSA can be used independently for extracting structures of a particular type so that users may contrast patterns in distinct characteristics. In contrast, related work such as [18] combines various features into a monolithic feature vector. Consequently, the resulting inter-document relationship is a combined effect of a range of factors. Users may not be able to assess how documents are related along a specific dimension.
(3) GSA focuses on relationships that are particularly essential for hypertext systems and these relationships are preserved in resulting network representations. Many existing information visualization techniques are based on storage information such as file-size and last modification time, and often use hierarchical structures as the basis of visualization. Differences between the two approaches should be evaluated by further empirical studies.
For example, a Pathfinder network becomes increasingly cluttered as the number of documents in the underlying information space increases. There are several possible ways to deal with this issue. One is to use existing display techniques such as fisheye views, which provide adequate access to specific local information as well as contextual structure.
Similar documents are naturally placed near each other in the space. Users can gain a bird's-eye view of the global structure by moving up to a higher viewpoint in the sky and have a close look by moving down to a viewpoint closer to the target document.