Monday, April 8, 2013

Keim, D. A. (2002). Information visualization and visual data mining. IEEE Transactions on Visualization and Computer Graphics, 8(1), 1-8.

information visualization

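Visual data exploration applies information visualization techniques to data mining. Its goal is to present large amounts of data in some visual form, so that people can interact directly with the data and use their perceptual abilities to gain insight into it. In general, visual data exploration consists of three steps: overview, zoom and filter, and details-on-demand. An overview graphic first shows the whole data set. From this overview the user selects one or more patterns of interest and can then focus on the details of those patterns. The paper classifies the published techniques by data type, by visualization technique, and by interaction and distortion technique. On the data side, visual data exploration must handle highly heterogeneous and noisy data of many kinds:
- one-dimensional data (e.g., temporal data)
- two-dimensional data (e.g., geographical data)
- multidimensional data (e.g., relational tables)
- text and hypertext (e.g., news articles and Web documents)
- hierarchies and graphs (e.g., telephone calls and Web documents)
- algorithms and software (e.g., debugging operations)

Visualization techniques include standard 2D/3D displays, geometrically transformed displays, icon-based displays, dense pixel displays, and stacked displays. Interaction techniques generate visualizations dynamically and let analysts interact directly with them according to their exploration goals. Distortion techniques show part of the data at a high level of detail, so that local detail and a global overview are visible at the same time. Interaction and distortion techniques include dynamic projection, interactive filtering, interactive zooming, interactive distortion, and interactive linking and brushing.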

There is a large number of information visualization techniques which have been developed over the last decade to support the exploration of large data sets. In this paper, we propose a classification of information visualization and visual data mining techniques which is based on the data type to be visualized, the visualization technique and the interaction and distortion technique.
Visual data exploration aims at integrating the human in the data exploration process, applying human perceptual abilities to the large data sets available in today’s computer systems.
The basic idea of visual data exploration is to present the data in some visual form, allowing the human to get insight into the data, draw conclusions, and directly interact with the data.
Visual data exploration is especially useful when little is known about the data and the exploration goals are vague. Since the user is directly involved in the exploration process, shifting and adjusting the exploration goals is automatically done if necessary.
In addition to the direct involvement of the user, the main advantages of visual data exploration over automatic data mining techniques from statistics or machine learning are: 
• visual data exploration can easily deal with highly inhomogeneous and noisy data
• visual data exploration is intuitive and requires no understanding of complex mathematical or statistical algorithms or parameters.
Visual Data Exploration usually follows a three step process: Overview first, zoom and filter, and then details-on-demand (which has been called the Information Seeking Mantra [1]). First, the user needs to get an overview of the data. In the overview, the user identifies interesting patterns and focuses on one or more of them. For analyzing the patterns, the user needs to drill-down and access details of the data.
In the last decade, a large number of novel information visualization techniques have been developed, allowing visualizations of multidimensional data sets without inherent two- or three-dimensional semantics.
Nice overviews of the approaches can be found in a number of recent books [2] [3] [4] [5].
The techniques can be classified based on three criteria (see figure 1) [6]: The data to be visualized, the visualization technique, and the interaction and distortion technique used.
The data type to be visualized may be
• One-dimensional data, such as temporal data as used in ThemeRiver (see figure 2 in [7]). Note that with each point of time, one or multiple data values may be associated.
• Two-dimensional data, such as geographical maps as used in Polaris (see figure 3(c) in [8]) and MGV (see figure 9 in [9]). X-Y-plots are a typical method for showing two-dimensional data and maps are a special type of x-y-plots for showing two-dimensional geographical data.
• Multidimensional data, such as relational tables as used in Polaris (see figure 6 in [8]) and the Scalable Framework (see figure 1 in [10]). An example of a technique which allows the visualization of multidimensional data is the Parallel Coordinate Technique [16]. Parallel Coordinates display each multidimensional data item as a polygonal line which intersects the horizontal dimension axes at the position corresponding to the data value for the corresponding dimension.
• Text and hypertext, such as news articles and Web documents as used in ThemeRiver (see figure 2 in [7]). In most cases, the data must first be transformed into description vectors before visualization techniques can be used. An example of a simple transformation is word counting, which is often combined with a principal component analysis or multidimensional scaling (for example, see [17]).
• Hierarchies and graphs, such as telephone calls and Web documents as used in MGV (see figure 13 in [9]) and the Scalable Framework (see figure 7 in [10]). Data records often have some relationship to other pieces of information. Graphs are widely used to represent such interdependencies. A graph consists of a set of objects, called nodes, and connections between these objects, called edges.
• Algorithms and software, such as debugging operations as used in Polaris (see figure 7 in [8]). The goal of visualization here is threefold: to support software development by helping to understand algorithms, e.g. by showing the flow of information in a program; to enhance the understanding of written code, e.g. by representing the structure of thousands of source code lines as graphs; and to support the programmer in debugging the code, e.g. by visualizing errors.
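The text-and-hypertext item above mentions turning documents into description vectors by word counting, followed by a principal component analysis. A plain-Python sketch of that pipeline might look like the one below. It is only an illustration and not the method of any cited system. The tokenizer (lowercase alphabetic words), the power-iteration eigen-solver, and the names `word_count_vectors` and `pca_2d` are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def word_count_vectors(docs):
    """Describe each document by its word counts over a shared, sorted vocabulary."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors


def _dominant_eigenvector(m, iters=500):
    """Power iteration: unit vector along the largest-eigenvalue direction of symmetric m."""
    n = len(m)
    v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-uniform start vector
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # (numerically) zero matrix: keep the current direction
            break
        v = [x / norm for x in w]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def pca_2d(vectors):
    """Project vectors onto their first two principal components."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    axes = []
    for _ in range(2):
        e = _dominant_eigenvector(cov)
        lam = sum(e[i] * cov[i][j] * e[j] for i in range(d) for j in range(d))
        # Deflation: subtract the found component so the next pass finds the second one
        cov = [[cov[i][j] - lam * e[i] * e[j] for j in range(d)] for i in range(d)]
        axes.append(e)
    return [tuple(sum(c[j] * a[j] for j in range(d)) for a in axes) for c in centered]
```

Each returned pair is a 2D position that a scatter plot could display, so documents that share many words land close together. A production version would more likely use a numerical library for the eigen-decomposition and might weight the raw counts before projecting.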
The visualization technique used may be classified into
• Standard 2D/3D displays, such as bar charts and x-y plots as used in Polaris (see figure 1 in [8]). In addition to standard 2D/3D-techniques such as x-y (x-y-z) plots, bar charts, line graphs, etc., there are a number of more sophisticated visualization techniques.
• Geometrically transformed displays, such as landscapes and parallel coordinates as used in the Scalable Framework (see figures 2 and 12 in [10]). Geometrically transformed display techniques aim at finding “interesting” transformations of multidimensional data sets. ... The parallel coordinate technique maps the k-dimensional space onto the two display dimensions by using k equidistant axes which are parallel to one of the display axes. The axes correspond to the dimensions and are linearly scaled from the minimum to the maximum value of the corresponding dimension. Each data item is presented as a polygonal line, intersecting each of the axes at the point which corresponds to the value of the considered dimension (see figure 2).
• Icon-based displays, such as needle icons and star icons as used in MGV (see figures 5 and 6 in [9]). The idea is to map the attribute values of a multidimensional data item to the features of an icon. ... If the data items are relatively dense with respect to the two display dimensions, the resulting visualization presents texture patterns that vary according to the characteristics of the data and are therefore detectable by preattentive perception.
• Dense pixel displays, such as the recursive pattern and circle segments techniques (see figures 3 and 4) [11] and the graph sketches as used in MGV (see figure 4 in [9]). The basic idea of dense pixel techniques is to map each dimension value to a colored pixel and group the pixels belonging to each dimension into adjacent areas [11]. Since dense pixel displays generally use one pixel per data value, the techniques allow the visualization of the largest amount of data possible on current displays (up to about 1,000,000 data values). If each data value is represented by one pixel, the main question is how to arrange the pixels on the screen. ... By arranging the pixels in an appropriate way, the resulting visualization provides detailed information on local correlations, dependencies, and hot spots.
• Stacked displays, such as treemaps [12] [13] or dimensional stacking [14]. Stacked display techniques are tailored to present data partitioned in a hierarchical fashion. In the case of multidimensional data, the data dimensions to be used for partitioning the data and building the hierarchy have to be selected appropriately. An example of a stacked display technique is Dimensional Stacking [35]. The basic idea is to embed one coordinate system inside another coordinate system, i.e. two attributes form the outer coordinate system, two other attributes are embedded into the outer coordinate system, and so on. The display is generated by dividing the outermost coordinate system into rectangular cells; within each cell, the next two attributes span the second-level coordinate system. This process may be repeated one more time. The usefulness of the resulting visualization largely depends on the data distribution of the outer coordinates, and therefore the dimensions used to define the outer coordinate system have to be selected carefully. A rule of thumb is to choose the most important dimensions first.
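The parallel coordinate mapping described in the geometrically-transformed-displays item can be written down directly. This sketch makes a few assumptions that are not taken from the paper:
- the k axes are vertical and spread evenly over a display of hypothetical size `width` × `height`;
- a dimension whose values are all equal is placed at mid-height;
- the function name is illustrative.

```python
def parallel_coordinates(items, width=100.0, height=100.0):
    """Map each k-dimensional item to a polyline over k equidistant vertical axes.

    Axis j sits at x = j * width / (k - 1) and is scaled linearly from the minimum
    (y = 0) to the maximum (y = height) of dimension j.
    """
    k = len(items[0])
    lows = [min(it[j] for it in items) for j in range(k)]
    highs = [max(it[j] for it in items) for j in range(k)]
    xs = [j * width / (k - 1) if k > 1 else 0.0 for j in range(k)]
    lines = []
    for it in items:
        points = []
        for j in range(k):
            span = highs[j] - lows[j]
            t = (it[j] - lows[j]) / span if span else 0.5  # constant dimension: mid-height
            points.append((xs[j], t * height))
        lines.append(points)  # one polygonal line per data item
    return lines
```

Drawing each returned point list as a connected line yields the parallel coordinate plot. Items with similar values produce similar, bundled lines.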
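For icon-based displays, a star icon is one concrete way to map attribute values to icon features. Each dimension becomes a ray from a common center, and the length of that ray encodes the value. The sketch below assumes that per-dimension minimum and maximum values are known and that the rays are spaced at equal angles. `star_icon` and `radius` are hypothetical names.

```python
import math


def star_icon(values, lows, highs, radius=10.0):
    """Return the ray tips of a star glyph for one multidimensional item.

    Dimension j becomes a ray at angle 2*pi*j/k whose length is the value scaled
    linearly into [0, radius]; connecting the tips gives the star outline.
    """
    k = len(values)
    tips = []
    for j, (v, lo, hi) in enumerate(zip(values, lows, highs)):
        length = radius * (v - lo) / (hi - lo) if hi > lo else 0.0
        angle = 2 * math.pi * j / k
        tips.append((length * math.cos(angle), length * math.sin(angle)))
    return tips
```

Placing one such glyph per data item at a position given by two further attributes produces the texture-like patterns the text describes.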
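A dense pixel display needs two decisions: how to color one pixel per value, and how to arrange the pixels within each dimension's area. The sketch below gives every dimension its own window and fills it in a back-and-forth (serpentine) order. This is only a simplified, single-level stand-in for the recursive pattern technique [11], which applies such arrangements recursively on several levels. The grey-level color scale, the window width, and the function names are assumptions.

```python
def snake_position(i, width):
    """Back-and-forth arrangement: fill row by row, reversing direction on odd rows."""
    row, col = divmod(i, width)
    if row % 2 == 1:
        col = width - 1 - col
    return row, col


def dense_pixel_layout(columns, width):
    """Give each dimension its own window with one pixel per value.

    columns: dict dimension_name -> list of values in reading order (e.g. by time).
    Returns, per dimension, a grid of grey levels 0..255 (None where no data falls).
    """
    windows = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        height = -(-len(values) // width)  # ceiling division: rows needed
        grid = [[None] * width for _ in range(height)]
        for i, v in enumerate(values):
            r, c = snake_position(i, width)
            grid[r][c] = round(255 * (v - lo) / (hi - lo)) if hi > lo else 0
        windows[name] = grid
    return windows
```

The serpentine order keeps consecutive values in adjacent pixels, which is what makes local trends visible as smooth color regions.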
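Once each attribute is discretized, the two-level embedding described for Dimensional Stacking reduces to simple index arithmetic. This sketch assumes equal-width bins and four attributes, two outer and two inner. `bin_index`, `stacked_cell`, and the bin counts are hypothetical.

```python
def bin_index(value, lo, hi, bins):
    """Discretize value into one of `bins` equal-width intervals over [lo, hi]."""
    if hi == lo:
        return 0
    i = int((value - lo) / (hi - lo) * bins)
    return max(0, min(i, bins - 1))  # the maximum value falls into the last bin


def stacked_cell(item, dims, ranges, bins):
    """Return the (column, row) cell of an item under two-level dimensional stacking.

    dims = (outer_x, outer_y, inner_x, inner_y): the outer pair spans the outer
    coordinate system, the inner pair spans the grid embedded in each outer cell.
    """
    ox, oy, ix, iy = (bin_index(item[d], *ranges[d], bins[d]) for d in dims)
    inner_cols, inner_rows = bins[dims[2]], bins[dims[3]]
    return ox * inner_cols + ix, oy * inner_rows + iy
```

Swapping which attributes are outer and which are inner changes the picture considerably. This is why the rule of thumb above puts the most important dimensions on the outer level.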
Interaction techniques allow the data analyst to directly interact with the visualizations and dynamically change the visualizations according to the exploration objectives, and they also make it possible to relate and combine multiple independent visualizations.
Distortion techniques help in the data exploration process by providing means for focusing on details while preserving an overview of the data. The basic idea of distortion techniques is to show portions of the data with a high level of detail while others are shown with a lower level of detail.
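The focus-plus-context idea can be illustrated with a well-known one-dimensional fisheye function, the graphical fisheye transformation of Sarkar and Brown: G(t) = (d + 1)t / (dt + 1). Here t is the normalized distance from the focus and d is the distortion factor. This function is not one of the hyperbolic or spherical distortions named under interactive distortion; it is used here only because it is short. The parameter names are assumptions.

```python
def fisheye(x, focus, distortion=3.0, lo=0.0, hi=1.0):
    """Map coordinate x in [lo, hi] so that the region near `focus` is magnified.

    Points near the focus are spread apart, points far from it are compressed,
    and the boundaries lo and hi stay fixed.
    """
    if x == focus:
        return x
    bound = hi if x > focus else lo  # boundary on x's side of the focus
    dmax = bound - focus
    t = (x - focus) / dmax  # normalized distance from the focus, in [0, 1]
    g = (distortion + 1) * t / (distortion * t + 1)
    return focus + g * dmax
```

Applying the function separately to the x and y coordinates of every point gives a simple rectangular fisheye view. A distortion of 0 leaves the display unchanged.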
Interaction and distortion techniques allow users to directly interact with the visualizations. They may be classified into
• Interactive Projection as used in the GrandTour system [15]. The basic idea of dynamic projections is to dynamically change the projections in order to explore a multidimensional data set.
• Interactive Filtering as used in Polaris (see figure 6 in [8]). In exploring large data sets, it is important to interactively partition the data set into segments and focus on interesting subsets. This can be done by a direct selection of the desired subset (browsing) or by a specification of properties of the desired subset (querying). Browsing is very difficult for very large data sets, and querying often does not produce the desired results. Therefore a number of interaction techniques have been developed to improve interactive filtering in data exploration.
• Interactive Zooming as used in MGV and the Scalable Framework (see figure 8 in [10]). In dealing with large amounts of data, it is important to present the data in a highly compressed form to provide an overview, while at the same time allowing the data to be displayed at different resolutions. Zooming not only means displaying the data objects larger; it also means that the data representation automatically changes to present more details at higher zoom levels.
• Interactive Distortion as used in the Scalable Framework (see figure 7 in [10]). Popular distortion techniques are hyperbolic and spherical distortions, which are often used on hierarchies or graphs but may also be applied to any other visualization technique.
• Interactive Linking and Brushing as used in Polaris (see figure 7 in [8]) and the Scalable Framework (see figures 12 and 14 in [10]). There are many possibilities to visualize multidimensional data, but all of them have some strengths and some weaknesses. The idea of linking and brushing is to combine different visualization methods to overcome the shortcomings of single techniques. ... As a result, the brushed points are highlighted in all visualizations, making it possible to detect dependencies and correlations. Interactive changes made in one visualization are automatically reflected in the other visualizations. Note that connecting multiple visualizations through interactive linking and brushing provides more information than considering the component visualizations independently.
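Dynamic projection can be approximated by moving smoothly from one 2D projection of the data to another. The GrandTour system [15] chooses its sequence of projections more carefully, so that over time it explores many different 2D views. This sketch is deliberately simpler: it interpolates linearly between two hand-picked projection frames and re-orthonormalizes each step with Gram-Schmidt. All names are hypothetical. The interpolation would break down if an intermediate frame became degenerate (a zero vector, or two parallel vectors).

```python
import math


def _normalize(v):
    """Scale a vector to unit length (assumes v is not the zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _orthonormal_frame(u, v):
    """Gram-Schmidt: turn two d-vectors into an orthonormal 2-frame."""
    e1 = _normalize(u)
    dot = sum(a * b for a, b in zip(v, e1))
    e2 = _normalize([b - dot * a for a, b in zip(e1, v)])
    return e1, e2


def projection_sequence(start, end, steps):
    """Frames moving linearly from `start` to `end`, each re-orthonormalized."""
    frames = []
    for s in range(steps + 1):
        t = s / steps
        u = [(1 - t) * a + t * b for a, b in zip(start[0], end[0])]
        v = [(1 - t) * a + t * b for a, b in zip(start[1], end[1])]
        frames.append(_orthonormal_frame(u, v))
    return frames


def project(item, frame):
    """2D screen position of a d-dimensional item under a projection frame."""
    e1, e2 = frame
    return (sum(a * b for a, b in zip(item, e1)), sum(a * b for a, b in zip(item, e2)))
```

Redrawing the scatter plot `[project(p, f) for p in data]` for each frame `f` in turn animates the data. Structure that is hidden in the first view may become visible in an intermediate one.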
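The two filtering styles named in the interactive filtering item, browsing (direct selection) and querying (specifying properties), can be contrasted in a few lines. The range-per-attribute query mimics what a set of range sliders would produce. The record layout and function names are assumptions made for illustration.

```python
def browse(rows, selected_ids):
    """Browsing: the user picks the desired subset directly, item by item."""
    return [r for r in rows if r["id"] in selected_ids]


def query(rows, ranges):
    """Querying: the subset is specified by properties.

    ranges maps an attribute to a (lo, hi) interval, as a range slider would;
    in an interactive tool this is re-evaluated on every slider movement.
    """
    return [r for r in rows
            if all(lo <= r[attr] <= hi for attr, (lo, hi) in ranges.items())]
```

Browsing scales poorly because the user must touch every item. Querying is cheap to re-run, but a wrong interval silently hides data. That trade-off is what the interactive filtering techniques try to improve.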
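The interactive zooming item says zooming should change the representation, not just enlarge it. This behavior is often called semantic zooming, and it can be sketched as a function that chooses a different level of detail for each zoom level. The zoom thresholds (1 and 3) and the three representations are invented for this example.

```python
from statistics import mean


def semantic_zoom(groups, zoom):
    """Return what to draw at a given zoom level (hypothetical thresholds 1 and 3).

    groups: dict group_name -> list of (label, value) records.
    """
    if zoom < 1:
        # Highly compressed overview: one summary (count, mean) per group
        return [(g, len(recs), mean(v for _, v in recs)) for g, recs in groups.items()]
    if zoom < 3:
        # Intermediate level: every record as an unlabeled mark
        return [v for recs in groups.values() for _, v in recs]
    # Close up: full records including their labels
    return [(g, label, v) for g, recs in groups.items() for label, v in recs]
```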
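At its core, linking and brushing means several views sharing one selection. In this minimal model, each view is just a function from a record to screen coordinates. Brushing a region in one view updates the shared selection, and every view then renders the same items as highlighted. The class and method names are hypothetical.

```python
class LinkedViews:
    """Minimal linking-and-brushing model: all views share one set of brushed ids."""

    def __init__(self, items):
        self.items = items   # id -> record
        self.views = {}      # view name -> function(record) -> screen position
        self.brushed = set()

    def add_view(self, name, mapping):
        self.views[name] = mapping

    def brush(self, view_name, predicate):
        """Brush in one view: select items whose position there satisfies predicate."""
        mapping = self.views[view_name]
        self.brushed = {i for i, rec in self.items.items() if predicate(mapping(rec))}

    def render(self, view_name):
        """Any view: id -> (position, highlighted?), using the shared selection."""
        mapping = self.views[view_name]
        return {i: (mapping(rec), i in self.brushed) for i, rec in self.items.items()}
```

Brushing in an x-y view and then rendering a y-z view shows where the selected items fall in the second view. A correlation that neither view reveals on its own can become visible this way.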
