Thursday, April 17, 2014

Keim, D. A. (2001). Visual exploration of large data sets. Communications of the ACM, 44(8), 38-44.

information visualization

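Visual data exploration applies human perceptual abilities to the exploration of large data sets, reducing the cognitive work the process requires. In practice, this means presenting the data in some visual form so that data analysts can gain insight into it, draw conclusions, and interact with it. Visual data exploration is especially useful when little is known about the data and the goals of the exploration are vague. In addition to letting the user work with the data directly, it has several advantages over automatic data mining techniques: it deals more easily with highly inhomogeneous and noisy data, it is intuitive, and it requires no understanding of complex mathematical or statistical algorithms or parameters.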

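The process of visual data exploration generally follows the three steps of the so-called information seeking mantra [11]: overview, zoom and filter, and details-on-demand. The related techniques can be classified using three criteria.
1. The data type to be visualized: one-dimensional (such as temporal data), two-dimensional (such as maps), multidimensional (such as relational tables), text and hypertext, hierarchies and graphs, and algorithms and software.
2. The visualization technique, that is, how the data is arranged on the screen and how multiple dimensions are handled.
3. The interaction techniques, which change a visualization dynamically and relate and combine multiple independent visualizations, and the distortion techniques, which preserve an overview of the data while drilling down.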

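Visual data exploration techniques can be evaluated and compared with respect to their suitability for certain data characteristics. Task characteristics include clustering, classification, associations, and multivariate hot spots; visualization characteristics include visual overlap and learning curve. Desirable techniques offer limited visual overlap, fast learning, and good recall.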



Visual data exploration seeks to integrate humans in the data exploration process, applying their perceptual abilities to the large data sets now available. The basic idea is to present the data in some visual form, allowing data analysts to gain insight into it and draw conclusions, as well as interact with it.

The visual representation of the data reduces the cognitive work needed to perform certain tasks.

Visual data exploration is especially useful when little is known about the data and the exploration goals are vague.

In addition to granting the user direct involvement, visual data exploration offers several main advantages over the automatic data mining techniques in statistics and machine learning:
• Deals more easily with highly inhomogeneous and noisy data;
• Is intuitive; and
• Requires no understanding of complex mathematical or statistical algorithms or parameters.

A visual representation provides a much higher degree of confidence in the findings of the exploration than a numerical or textual representation of the findings.

Visual data exploration usually follows a three-step process known as the “information seeking mantra” [11]: overview, zoom and filter, and details-on-demand.
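The three steps can be illustrated on a small in-memory data set. The following is only a minimal sketch: the records, the function names, and the choice of count and mean as the "overview" are assumptions made for illustration and do not come from the paper.

```python
from statistics import mean

# Hypothetical records: (item id, category, value). Illustrative data only.
RECORDS = [
    ("a", "north", 12.0),
    ("b", "north", 18.0),
    ("c", "south", 7.0),
    ("d", "south", 31.0),
    ("e", "east", 22.0),
]


def overview(records):
    """Step 1: summarize the whole data set as (count, mean) per category."""
    groups = {}
    for _, category, value in records:
        groups.setdefault(category, []).append(value)
    return {cat: (len(vals), mean(vals)) for cat, vals in groups.items()}


def zoom_and_filter(records, category, min_value):
    """Step 2: keep one category and drop values below a threshold."""
    return [r for r in records if r[1] == category and r[2] >= min_value]


def details_on_demand(records, item_id):
    """Step 3: return the full record for one selected item, or None."""
    for record in records:
        if record[0] == item_id:
            return record
    return None


summary = overview(RECORDS)
subset = zoom_and_filter(RECORDS, "south", 10.0)
detail = details_on_demand(subset, "d")
```

In an interactive system each step would be triggered by the user (for example, by a mouse selection) rather than called in sequence as here.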

Visual data exploration techniques are classified using three criteria: the data to be visualized, the technique itself, and the interaction and distortion method (see Figure 1).

The classification begins with the data type to be visualized [11], including whether it is:
• One-dimensional (such as temporal data, as in Figure 2);
• Two-dimensional (such as geographical maps, as in Figure 3);
• Multidimensional (such as relational tables, as in Figure 4);
• Text and hypertext (such as news articles and Web documents);
• Hierarchies and graphs (such as telephone calls and Web sites, as in Figure 5); and
• Algorithms and software (such as debugging operations).

The visualization technique fits into one or more of the following categories, as identified in Figure 1:
• Standard 2D/3D displays using standard 2D or 3D visualization techniques (such as x-y plots and landscapes) for visualizing the data.
• Geometrically transformed displays using geometric transformations and projections to produce useful visualizations.
• Icon-based displays that visualize each data item as an icon (such as stick figures) and the dimension values as features of the icons.
• Dense pixel displays that visualize each dimension value as a color pixel and group the pixels belonging to each dimension into an adjacent area [6].
• Stacked displays that visualize the data partitioned hierarchically.
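As a rough illustration of the dense pixel idea, the sketch below maps each dimension value to one pixel and keeps the pixels of each dimension together in their own block. It makes several assumptions not taken from the paper: gray levels (0 to 255) stand in for colors, pixels are arranged row by row in item order, and the table and function names are hypothetical.

```python
def normalize(values):
    """Scale a list of numbers to integer gray levels 0..255."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


def dense_pixel_layout(table, width):
    """Build one pixel block per dimension.

    `table` maps a dimension name to its list of values (one per data item).
    Each block is a list of rows, `width` pixels wide, filled left to right
    in item order; the last row is padded with None.
    """
    blocks = {}
    for name, values in table.items():
        pixels = normalize(values)
        rows = [pixels[i:i + width] for i in range(0, len(pixels), width)]
        if rows and len(rows[-1]) < width:
            rows[-1] = rows[-1] + [None] * (width - len(rows[-1]))
        blocks[name] = rows
    return blocks


# Hypothetical table with two dimensions and five data items.
TABLE = {
    "price": [10, 20, 30, 40, 50],
    "volume": [5, 5, 1, 3, 5],
}
blocks = dense_pixel_layout(TABLE, width=3)
```

Because every block uses the same item order, the pixel at a given position in each block belongs to the same data item, which is what lets a viewer compare dimensions by position.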

The techniques associated with each of these categories differ in how they arrange the data on the screen (such as 2D display or semantic arrangement) and how they deal with multiple dimensions in case of multidimensional data (such as multiple windows, icon features, and hierarchy).

Interaction techniques, which allow users to interact directly with a visualization, include filtering, zooming, and linking, thus allowing the data analyst to make dynamic changes of a visualization according to the exploration objectives; they also make it possible to relate and combine multiple independent visualizations.
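Filtering and linking can be sketched without any graphics at all. In the example below, a shared selection object propagates the result of a filter to every registered view; the class names, the `on_selection` callback, and the data are all hypothetical and only meant to show the mechanism.

```python
class LinkedSelection:
    """Shared selection that notifies every registered view when it changes."""

    def __init__(self):
        self._views = []
        self.selected = set()

    def register(self, view):
        self._views.append(view)

    def select(self, item_ids):
        self.selected = set(item_ids)
        for view in self._views:
            view.on_selection(self.selected)


class View:
    """Hypothetical view that records which of its own items are highlighted."""

    def __init__(self, name, item_ids):
        self.name = name
        self.item_ids = set(item_ids)
        self.highlighted = set()

    def on_selection(self, selected):
        # Highlight only the selected items this view actually displays.
        self.highlighted = self.item_ids & selected


def filter_items(data, predicate):
    """Filtering: return the ids of items whose value satisfies the predicate."""
    return [item_id for item_id, value in data.items() if predicate(value)]


DATA = {"a": 3, "b": 9, "c": 12, "d": 1}
link = LinkedSelection()
scatter = View("scatter", ["a", "b"])
table = View("table", ["b", "c", "d"])
link.register(scatter)
link.register(table)
link.select(filter_items(DATA, lambda v: v > 5))
```

Keeping the selection in one shared object, rather than in each view, is one simple way to let independent visualizations stay in step.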

Interactive distortion techniques support the data exploration process by preserving an overview of the data during drill-down operations. Basically, they show portions of the data with a high level of detail and other portions with a lower level of detail.
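One well-known distortion of this kind is the graphical fisheye view. The paper excerpt above does not name a specific formula, so the sketch below swaps in the commonly used fisheye function g(t) = (d + 1)t / (dt + 1) in one dimension; the function name, parameter names, and sample positions are assumptions for illustration.

```python
def fisheye_1d(x, focus, d, lo=0.0, hi=1.0):
    """Map position x in [lo, hi] to its distorted position.

    Uses g(t) = (d + 1) * t / (d * t + 1), where t is the normalized distance
    from the focus to the boundary on x's side. d = 0 leaves x unchanged;
    larger d magnifies the region around the focus more strongly.
    """
    if x == focus:
        return x
    boundary = hi if x > focus else lo
    span = boundary - focus          # signed distance from focus to boundary
    t = (x - focus) / span           # normalized distance in [0, 1]
    g = (d + 1) * t / (d * t + 1)
    return focus + g * span


# Eleven evenly spaced positions, distorted around a focus at 0.5.
positions = [i / 10 for i in range(11)]
distorted = [fisheye_1d(p, focus=0.5, d=3.0) for p in positions]
```

Positions near the focus are spread apart (high detail) while positions near the edges are compressed (low detail), and the end points stay fixed, so the overview is preserved.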

Visualization techniques and visual data exploration systems can be evaluated and compared with respect to their suitability for certain data characteristics (such as data types, number of dimensions, number of data items, and category). Task characteristics include clustering, classification, associations, and multivariate hot spots; visualization characteristics include visual overlap and learning curve.

Desirable visualization characteristics for any technique include limited visual overlap, fast learning, and good recall.

Undesirable visualization characteristics include occlusions and line crossings that might appear to the user/viewer as an artifact limiting the usefulness of the visualization technique.
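Visual overlap can be made concrete with a simple count. The measure below is a hypothetical one, not defined in the paper: it maps each point of a scatterplot to a single pixel and reports the fraction of points that land on a pixel already used by an earlier point.

```python
def overlap_ratio(points, width, height):
    """Fraction of points drawn on a pixel that an earlier point already uses.

    `points` holds (x, y) pairs with both coordinates in [0, 1]. Each point
    is mapped to one pixel of a width x height grid.
    """
    occupied = set()
    hidden = 0
    for x, y in points:
        px = min(int(x * width), width - 1)
        py = min(int(y * height), height - 1)
        if (px, py) in occupied:
            hidden += 1
        else:
            occupied.add((px, py))
    return hidden / len(points) if points else 0.0


# Hypothetical points; the first two are close together.
POINTS = [(0.15, 0.15), (0.17, 0.16), (0.95, 0.95), (0.55, 0.55)]
coarse = overlap_ratio(POINTS, width=10, height=10)
fine = overlap_ratio(POINTS, width=100, height=100)
```

On the coarse grid the two nearby points share a pixel, while on the finer grid every point gets its own pixel, which shows how the same data can overlap more or less depending on the available display resolution.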
