Thursday, December 19, 2013

Huang, Z., Chen, H., Guo, F., Xu, J. J., Wu, S., and Chen, W-H. (2004). Visualizing the expertise space. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS'04), IEEE Computer Society.

information visualization/self-organizing map

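This paper uses information retrieval and document processing techniques such as SOM and MDS to present experts and their expertise visually on two-dimensional maps. The data cover 597 scholars in business and management in Taiwan. Based on the research fields each scholar reported (classified under the National Science Council scheme, 127 research fields in total), the study builds feature vectors for both scholars and research fields, and uses them to generate an expert map and an expertise field map. Each binary component of a scholar's feature vector indicates whether that scholar has expertise in a given research field; each binary component of a research field's feature vector indicates whether that field is among a given scholar's areas of expertise. These data are then fed into SOM and MDS for visualization; for MDS, the similarity between two feature vectors is estimated with the Jaccard measure. On the resulting expert map, scholars with the same or similar expertise are mapped to the same or neighboring nodes. On the expertise field map, research fields that share theoretical or analytical foundations, or share application domains, are mapped to the same or neighboring nodes and form clusters.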
We focus on a basic form of expertise representation, in which experts are represented by a set of expertise fields. Due to the potential high dimensionality of such expertise data, we chose to examine two dimensionality reduction visualization techniques that have been widely applied in data and document visualization: the Self-organizing Map (SOM) and Multidimensional Scaling (MDS). We present two types of visualization results: the expert map and expertise field map, and provide initial analysis on the effectiveness of these visualizations to support expertise searching and browsing.
One type of set-level document visualization uses interactive scatter plots in different forms, which is also referred to as “dimensions and reference point systems” (Morse, Lewis and Olsen, 2000). Visualization techniques of this type attempt to display additional information about the retrieved documents and to group documents that share similar characteristics. These characteristics may include the relationship between the documents and the query terms (Ahlberg and Shneiderman, 1994), predefined document attributes such as size, date, source and popularity (Hearst and Karadi, 1997; Nowell, France, Hix, Heath and Fox, 1996), and user-specified attributes such as predefined topics (Olsen, Korfhage, Sochats, Spring and Williams, 1993).
A second category of techniques attempts to visualize inter-document similarities. This form of visualization is also referred to as “map systems” (Morse, Lewis and Olsen, 2000). There are four major techniques for inter-document similarity visualization: document networks (Thompson and Croft, 1989), physically based modeling techniques (Chalmers and Chitson, 1992), document clustering (Allen, Obry and Littman, 1993; Hearst and Pedersen, 1996) and geographic map metaphors (Chen, Schuffels and Orwig, 1996; Lin, Soergel and Marchionini, 1991).
Mockus and Herbsleb (2002) presented a system named “Expertise Browser” in the context of collaborative software engineering for change management systems. They embedded simple visualization elements, such as a tree structure, to present expert attributes.
Our research explores this idea by focusing on a simple form of expertise database, where each expert is represented by a list of predefined expertise fields. Each expert can be represented as a binary vector, the elements of which correspond to the fields of expertise and the dimensionality is the number of predefined expertise fields. Each expertise field can also be represented as a binary vector, the elements of which correspond to the experts and the dimensionality is the number of experts in the data set. These representations adopt the vector space model of document representation and share the same high dimensionality characteristic. We chose two commonly used dimensionality reduction techniques for visualizing document space in the literature, the self-organizing map and multidimensional scaling, to generate map metaphors to visualize inter-expert and inter-expertise-field similarities.
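To make the SOM idea concrete, here is a minimal sketch in plain Python of how binary expertise vectors could be mapped onto a small grid. It is not the authors' implementation: the excerpts do not give the grid size, learning rate, or neighborhood function, so the linear decay schedule, the Gaussian neighborhood, the function names, and the toy vectors are all assumptions made for illustration.

```python
import math
import random

def best_matching_unit(weights, vector):
    """Return the grid node whose weight vector is closest to `vector`."""
    return min(weights, key=lambda node: math.dist(weights[node], vector))

def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM (assumed schedule, not from the paper)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One random weight vector per grid node.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # linearly decaying learning rate
        radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighborhood radius
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * radius * radius))  # Gaussian neighborhood
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])
    return weights

# Toy binary expertise vectors, invented for illustration only.
toy_vectors = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
som_weights = train_som(toy_vectors, rows=3, cols=3)
node_of = [best_matching_unit(som_weights, v) for v in toy_vectors]
```

In this sketch two experts with identical vectors always land on the same node, which matches the behavior the paper reports for experts with the same expertise.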
The data set we used came from an Internet survey of researchers in business and management fields in Taiwan. The survey was conducted by the National Science Council in Taiwan, and covered almost all the researchers in the business and management field in Taiwan.
The data set contained 597 researchers, who had selected their research interests or expertise from a two-level hierarchy of research fields. ... There were 127 second-level research fields and 2865 researcher-field combinations in the data set. ...  Each of the 597 researchers was represented by a binary vector with 127 elements, which corresponded to the research fields. The expertise similarity between two researchers was derived using vector similarity functions. We also had a dual representation for research fields, similar to the researcher representation. Each of the 127 research fields was represented by a binary vector with 597 elements, which corresponded to the researchers. In this case, similarities among research fields depended on the number of overlapping researchers. Such similarities may reflect the common theoretical/analytical foundations or closely related application domains of the research fields, based on the assumption that researchers typically work on closely related research fields.
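The dual representation described above can be sketched as follows. The researcher names, field names, and the helper `build_binary_vectors` are hypothetical and the data are toy values, not the survey data; the real data set would give 597 vectors of length 127 and 127 vectors of length 597.

```python
from typing import Dict, List, Set, Tuple

def build_binary_vectors(
    selections: Dict[str, Set[str]], fields: List[str]
) -> Tuple[Dict[str, List[int]], Dict[str, List[int]]]:
    """Build researcher vectors (one slot per field) and field vectors (one slot per researcher)."""
    researchers = sorted(selections)
    # Researcher vector: 1 if the researcher selected the field, else 0.
    researcher_vectors = {
        r: [1 if f in selections[r] else 0 for f in fields] for r in researchers
    }
    # Field vector: the matching column of the researcher-by-field matrix.
    field_vectors = {
        f: [researcher_vectors[r][j] for r in researchers]
        for j, f in enumerate(fields)
    }
    return researcher_vectors, field_vectors

# Toy survey answers, invented for illustration only.
toy_selections = {
    "r1": {"finance", "accounting"},
    "r2": {"finance"},
    "r3": {"marketing", "MIS"},
}
toy_fields = ["MIS", "accounting", "finance", "marketing"]
researcher_vectors, field_vectors = build_binary_vectors(toy_selections, toy_fields)
```

The total number of ones in either representation equals the number of researcher-field combinations (2865 in the paper's data set, 5 in this toy example).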
The input to MDS is a square, symmetric matrix indicating relationships among a set of objects. Such matrices are usually either similarity or dissimilarity matrices. In the context of our research, a similarity matrix is formed from the similarity scores of expert/expertise field pairs, computed with Jaccard’s similarity function (Jaccard, 1912).
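For binary vectors, Jaccard similarity is the number of positions where both vectors are 1 divided by the number of positions where at least one is 1. A minimal sketch of building the square, symmetric matrix follows. The function names and toy vectors are illustrative, and returning 0.0 for two all-zero vectors is an assumed convention that the excerpts do not address.

```python
from typing import List, Sequence

def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard similarity of two binary vectors: |A and B| / |A or B|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumed convention: two all-zero vectors get similarity 0.0.
    return both / either if either else 0.0

def similarity_matrix(vectors: List[Sequence[int]]) -> List[List[float]]:
    """Square, symmetric matrix of pairwise Jaccard similarities."""
    n = len(vectors)
    return [[jaccard(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

# Toy binary vectors, invented for illustration only.
toy_vectors = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
]
S = similarity_matrix(toy_vectors)
```

An MDS routine that expects dissimilarities could be given 1 - S instead; whether the authors did this conversion is not stated in the excerpts.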
We conducted a regression analysis to evaluate the general relationship between the researcher similarities and map distances. Researcher similarities were calculated using Jaccard’s similarity function. A Euclidean distance function was used to calculate the map distances of researcher pairs. ... These statistics showed that SOM and MDS both preserved a large portion of the similarity information, although with certain degrees of distortion.
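The evaluation can be sketched as follows: compute a similarity and a map distance for every pair, then fit a line through the pairs. The excerpt does not give the form of the regression, so ordinary least squares of distance on similarity is an assumption here, as are the function names, the toy vectors, and the toy map coordinates.

```python
from itertools import combinations
from math import dist

def jaccard(a, b):
    """Jaccard similarity of two binary vectors (0.0 if both are all zeros)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def pairwise_similarity_and_distance(vectors, positions):
    """For every pair, collect (Jaccard similarity, Euclidean map distance)."""
    sims, dists = [], []
    for i, j in combinations(range(len(vectors)), 2):
        sims.append(jaccard(vectors[i], vectors[j]))
        dists.append(dist(positions[i], positions[j]))
    return sims, dists

def simple_linear_regression(x, y):
    """Least-squares fit y = intercept + slope * x, plus R squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Toy vectors and toy map coordinates, invented for illustration only.
toy_vectors = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
toy_positions = [(0.0, 0.0), (1.0, 0.0), (4.0, 3.0)]
sims, dists = pairwise_similarity_and_distance(toy_vectors, toy_positions)
slope, intercept, r_squared = simple_linear_regression(sims, dists)
```

If a map preserves similarity well, the slope should be negative (more similar pairs sit closer together), and R squared indicates how much of the variation in distance the similarity accounts for.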
We observe from Figure 4 (expertise field map) that research fields having underlying similarities based on common theoretical/analytical foundations and application domains were grouped together. ... The expertise field map generated by our visualization techniques revealed meaningful grouping of research fields based on experts’ co-occurrence patterns in multiple research fields.
