2013年12月19日 星期四

Chalmers, M. (1992). BEAD: Explorations in information visualisation. Proceedings of the 1992 ACM SIGIR Annual International Conference on Research and Development in Information Retrieval, 330-337.

Chalmers, M. (1992). BEAD: Explorations in information visualisation. Proceedings of the 1992 ACM SIGIR Annual International Conference on Research and Development in Information Retrieval, 330-337.
vis_paper
本論文說明Bead的運作原理。Bead是以圖形探索文件資訊的雛型系統,在這個系統中將文件視為是三維空間中的粒子(particles),類似像粒子間的作用力,使得相似的文件在空間中相對應的粒子移動到愈接近的距離,不相似的文件移動到較遠的位置。也就是以粒子在三維空間中的幾何距離(geometric distance)表示文件距離(document distance),以輸出結果的三維空間表達原先相當高維度的文件空間。在這個雛型系統中粒子的位置是由多維尺度法(MDS)來決定,以文件間的距離做為輸入產生每個相對應的粒子在三維空間中的位置。並且利用模擬退火(simulated annealing)演算法來避免產生區域最佳化,而有較佳的輸出結果。
Bead is a prototype system for the graphically-based exploration of information. In this system, articles in a bibliography are represented by particles in 3-space. By using physically-based modelling techniques to take advantage of fast methods for the approximation of potential fields, we represent the relationships between articles by their relative spatial positions. Inter-particle forces tend to make similar articles move closer to one another and dissimilar ones move apart. The result is a 3D scene which can be used to visualize patterns in the high-D information space.
As described in [Chatfield 80], MDS is “a term used to describe any procedure which starts with the ‘distances’ between a set of points (or individuals or objects) and finds a configuration of the points, preferably in a smaller number of dimensions, usually 2 or 3.” Such a configuration may then afford the graphically-based, exploratory styles of manipulation usually reserved for lower dimensional data.
Standard techniques for MDS usually involve an eigenvector analysis of an nxn matrix, where n is the number of points to be configured. This O(n3) analytic procedure generates the required configuration of points in a single step, but changes or additions to the set of points require re-execution of the entire analysis.
As mentioned by Chatfield & Collins with regard to a special case of MDS, ordinal scaling, one can treat the configuration task as an optimization problem. This has allowed the introduction of iterative numerical techniques such as the steepest descent method [Press 88]. ... One problem of steepest descent is that it may be unable to climb out of a local minimum. The points become stuck in a configuration which disallows progress towards a ‘better’ configuration.
Applying these principles to numerical optimization led to the Metropolis algorithm for simulated annealing [Metropolis 53]. This requires a way of describing possible configurations (e.g. the spatial position of points), a method of generating random changes in the configuration (e.g. perturbation of position), an objective function, analogous to energy, whose minimization is the goal of the procedure (e.g. the error between the actual and desired inter-point distances), and finally a control parameter analogous to temperature, in tandem with a schedule for its reduction. ... As the schedule progresses, random changes in configuration are successively generated. At each step the change in system energy is calculated. If the step lowers the energy then the change is accepted. If the step increases the energy then the change is accepted with a probability which depends on both the size of the energy step and the current temperature of the system. ‘Uphill’ steps are taken less often as temperature decreases.
The behaviour of a body of matter can be considered as the aggregate of the behaviours of its constituent particles. ... Potentially, all N particles can lead to N(N -1) pairwise interactions. This N-body problem is particularly well-known in computational physics where interactions due to gravitational or Coulombic fields are often of exactly this O(N2) complexity. ... Approximate solutions of lower orders of complexity have therefore been subjects of research activity in recent years. By using hierarchical structures for spatial subdivision such as k-d trees [Appel 85] and octrees [Barnes 86] relatively straightforward algorithms of O(N logN) complexity have been developed.
We associate documents with particles in space, and use two concepts to underpin this work: a cluster of such hybrid objects can be represented by a metaparticle, and when comparing particles, the difference between the actual ‘geometric’ distance and the desired ‘document’ distance can be used as the basis for a potential field. ... We use a model of a damped spring in order to generate forces of attraction and repulsion between particles. When particles are too close the spring pushes them apart, and when they are too far apart they are drawn towards each other. ... We use hierarchical spatial subdivision in order to speed up the calculation of the interactions between the full set of particles. We also use the hierarchy to form document clusters, using addition and normailization of term vectors.
We use a 3-D tree to hierarchically subdivide the space into which particles are placed. We maintain a metaparticle for each node in the tree, consisting of a particle position, a mass, and a term vector. We use document distances to metaparticles to recursively descend the tree in an effort to insert a new document near to similar documents. We maintain a maximum number of particles inside each leaf node — usually 10 — and split a leaf when that limit is reached. In splitting voxels we try to find an axis and coordinate with which to split the number of particles as evenly as possible.
Finally, we iteratively use fourth-order adaptive Runge-Kutta in applying either a simulated annealing step or a steepest descent step to the particle system. In this way we attempt to minimize the unbalanced force on each particle.In this way we try to find positions where the often-conflicting demands for proximity reach the best available compromise.
While looking at the maximum unbalanced force on any particle in the system is one useful measure of progress, we have also found it informative to look at the residual sum of squares. Analogous to the mechanical stress of the spring system and denoted here as S, this metric is built up by examining each pair of particles a and b. ... We assume that an ‘acceptable’ configuration has been reached when continued iteration produces roughly constant levels of stress and maximum force.

沒有留言:

張貼留言