Thursday, December 19, 2013

Guerrero Bote, V. P., de Moya Anegón, F. and Herrero Solana, V. (2002). Document organization using Kohonen's algorithm. Information Processing and Management, 38, 79-89.

information visualization/self-organizing map

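This study used the Self-Organizing Map method to organize 202 abstracts from the LISA database indexed under eight descriptors, including Acquisitions, Artificial Intelligence, Business Management, Computerized Information Storage and Retrieval, Conferences, Periodicals, and WWW. In the results, neighboring nodes mostly held abstracts with the same descriptor. Relationships could also be read from adjacent regions: for example, on both maps produced, the region occupied by Computerized Information Storage and Retrieval borders both the Artificial Intelligence region and the WWW region.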
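The observation that neighboring nodes mostly hold abstracts with the same descriptor can be put into a number. The sketch below is a minimal one and rests on three assumptions of mine:

- Each node is labeled with the majority descriptor of the abstracts mapped to it.
- "Neighboring" means horizontal or vertical adjacency on a rectangular grid.
- The toy assignments are invented for illustration.

The function names `majority_labels` and `neighbor_agreement` are hypothetical. This is not the measure used in the paper.

```python
from collections import Counter


def majority_labels(assignments):
    """Map each node (row, col) to the most common descriptor among its abstracts.

    `assignments` is a dict: node -> list of descriptor labels; empty nodes are skipped."""
    return {node: Counter(labels).most_common(1)[0][0]
            for node, labels in assignments.items() if labels}


def neighbor_agreement(node_labels):
    """Fraction of horizontally or vertically adjacent labeled node pairs
    that carry the same majority descriptor (assumed 4-neighborhood)."""
    same = total = 0
    for (r, c), label in node_labels.items():
        for nb in ((r + 1, c), (r, c + 1)):  # look down and right so each pair counts once
            if nb in node_labels:
                total += 1
                same += int(node_labels[nb] == label)
    return same / total if total else 0.0


# Hypothetical toy map: descriptors of the abstracts landing on each node.
assignments = {
    (0, 0): ["Artificial Intelligence", "Artificial Intelligence", "WWW"],
    (0, 1): ["Artificial Intelligence"],
    (1, 0): ["WWW"],
    (1, 1): ["WWW", "Periodicals", "WWW"],
}
print(neighbor_agreement(majority_labels(assignments)))  # 0.5
```

A value near 1 would mean descriptor regions form contiguous areas on the map. Mixed borders, such as the one between Artificial Intelligence and WWW above, pull the value down.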

Kohonen's model is capable of performing a topological organization of the inputs presented to it.
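To make "topological organization" concrete, here is a minimal sketch of the standard online Kohonen training loop. At each step the best-matching node and its grid neighbors move toward the input, so nodes that are close on the grid end up learning similar inputs.

The following choices are all my own assumptions, not settings from the paper:

- a Gaussian neighborhood,
- linear decay of the learning rate and radius,
- a rectangular grid,
- random initial weights,
- the parameter defaults (`epochs`, `lr0`, `sigma0`).

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is nearest to x (Euclidean)."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(x, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Online Kohonen training on a rows x cols rectangular grid.

    Returns weights[r][c], a list of floats per node."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0  # start with a wide neighborhood
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # present inputs in a random order each epoch
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood over *grid* distance (not input-space
                    # distance): this is what makes nearby nodes learn similar inputs.
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Hypothetical term vectors for four "documents" forming two topics.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]]
som = train_som(docs, rows=3, cols=3)
print([best_matching_unit(som, d) for d in docs])
```

After training, the two documents of each topic should land on the same or adjacent nodes, and the two topics on different nodes. That is the clustering-plus-topology behavior the paper relies on.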
This type of network has recently been used in documentation for the analysis of domains (White, Lin, & McCain, 1998), for textual data mining (Lagus, Honkela, Kaski, & Kohonen, 1999), to extract semantic relationships between words from their contexts (Honkela, Pulkki, & Kohonen, 1995; Ritter & Kohonen, 1989), and in particular to generate topological maps of sets of documents, even labeling the zones of influence of each word or term (Kohonen et al., 1999a; Kaski, 1999; Lagus & Kaski, 1999; Moya, Herrero, & Guerrero, 1998; Moya Anegón et al., 1999; Chen, Houston, Sewell, & Schatz, 1998; Lin, 1997; Guerrero Bote, 1997; Orwig, Chen, & Nunamaker, 1997; Lin, Soergel, & Marchionini, 1991).
In the learning process, besides clustering the inputs, the Kohonen network generates a topological organization of those clusters. Applied to documentation, this creates and organizes clusters in such a way that clusters that are topically close are also close in the network. This can be used to expand the query, or rather the results: once the cluster that best fits the query has been found, the activation can be extended to the clusters that are topologically close to it.
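Expanding the results to topologically close clusters could look like the sketch below. It finds the query's best-matching node, then returns the documents on that node followed by those on nodes within a given grid radius.

Several details are my own assumptions, not the paper's:

- measuring the radius as Chebyshev distance, so all 8 surrounding nodes count as neighbors,
- ranking results by grid distance,
- the hand-set weights and document ids.

The function names are hypothetical.

```python
def best_matching_unit(weights, x):
    """Return the (row, col) node whose weight vector is nearest to x (Euclidean)."""
    nodes = [(r, c) for r in range(len(weights)) for c in range(len(weights[0]))]
    return min(nodes, key=lambda rc: sum((a - b) ** 2
                                         for a, b in zip(x, weights[rc[0]][rc[1]])))


def expanded_results(weights, doc_nodes, query, radius=1):
    """Documents on the query's best-matching node first, then those on nodes
    within `radius` grid steps of it, ordered by grid distance.

    `doc_nodes` maps a document id to the (row, col) node it was assigned to.
    Grid distance here is Chebyshev (the 8 surrounding nodes are at distance 1)."""
    qr, qc = best_matching_unit(weights, query)
    hits = []
    for doc, (r, c) in doc_nodes.items():
        dist = max(abs(r - qr), abs(c - qc))
        if dist <= radius:
            hits.append((dist, doc))
    return [doc for _, doc in sorted(hits)]


# Hypothetical 2 x 3 map over a two-term vocabulary, with hand-set weights.
weights = [
    [[1.0, 0.0], [0.7, 0.3], [0.3, 0.7]],
    [[0.9, 0.1], [0.5, 0.5], [0.0, 1.0]],
]
doc_nodes = {"d1": (0, 0), "d2": (0, 1), "d3": (1, 1), "d4": (0, 2), "d5": (1, 2)}
print(expanded_results(weights, doc_nodes, [1.0, 0.0], radius=0))  # ['d1']
print(expanded_results(weights, doc_nodes, [1.0, 0.0], radius=1))  # ['d1', 'd2', 'd3']
```

With `radius=0` this is plain cluster retrieval. Raising the radius trades precision for recall by pulling in the neighboring, topically related clusters.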
In some of these cases, in addition to classifying the documents, one determines for each node which unit term vector produces the greatest activation. This yields each term's zone of influence, providing a graphical view of the database on which one could even select the zone one wants to visit.
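The excerpt does not specify the activation function, so the sketch below assumes activation is the dot product between a node's weight vector `w` and the unit vector `e_t` of term `t`, which reduces to `w[t]`. If activation were instead measured as Euclidean closeness, the winning term would not change. For a fixed node, ||e_t - w||² = 1 - 2·w[t] + ||w||², so the closest unit vector is the one with the largest `w[t]`. The function name, terms and weights are hypothetical.

```python
def term_zones(weights, terms):
    """Label every node with the term whose unit vector activates it most.

    Activation is assumed to be the dot product w . e_t, which is just w[t].
    The winner is the same under Euclidean distance, because for a fixed node
    ||e_t - w||^2 = 1 - 2*w[t] + ||w||^2, minimized by the largest w[t]."""
    return [[terms[max(range(len(terms)), key=lambda t: w[t])] for w in row]
            for row in weights]


# Hypothetical 2 x 2 map over a three-term vocabulary.
terms = ["retrieval", "intelligence", "web"]
weights = [
    [[0.8, 0.1, 0.1], [0.4, 0.5, 0.1]],
    [[0.3, 0.2, 0.5], [0.1, 0.1, 0.8]],
]
print(term_zones(weights, terms))  # [['retrieval', 'intelligence'], ['web', 'web']]
```

Contiguous runs of the same label form each term's zone of influence. Here "web" covers the bottom row of the map.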
