information visualization/self-organizing map
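This study uses the self-organizing map (SOM) technique to cluster 235 web pages from the Monash University, School of Business Systems. The log files stored on the web server serve as the reference data for training and mapping the SOM. The log is first divided into 8054 transactions according to the user and time information in each record. The K-means clustering method then groups these transactions into 9 classes according to the web pages each transaction contains. Counting how often each of the 235 pages occurs in each of the 9 transaction classes gives the feature vector that represents that page. On the resulting map, pages from the same directory are mapped to the same or neighboring nodes, which suggests that the SOM technique, combined with data on how users access pages, can be used to group and visualize related web pages.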
For the system to accurately reflect the needs of users, the organization of the web documents should also take into account the feedback from users. While it is useful to have a system to organize the web pages in a content-driven manner, it may be more advantageous to organize the web pages in a web-user oriented manner. After all, the web documents are organized so that humans can search in a more effective and efficient manner.
The authors have developed the prototype of the LOGSOM system based on the access logs for September of 1999 from the Monash University, School of Business Systems web server. There are 170,515 entries in the web log indicating the date, time, and address of the requested web pages, as well as the IP address of the user’s machine. ... The original server logs are formatted, cleansed, and then grouped into meaningful transactions before being mapped onto the self-organizing map.
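The formatting and cleansing step is not detailed in the excerpt, so the following is only a rough sketch of what it might look like. It assumes the server wrote entries in Common Log Format and that cleansing means dropping failed requests and embedded resources such as images; `LOG_PATTERN`, `SKIP_SUFFIXES`, and `parse_entry` are hypothetical names, and the sample lines are invented for illustration.

```python
import re
from datetime import datetime

# Assumption: Common Log Format. The source does not state the actual log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3})'
)
# Hypothetical cleansing rule: ignore embedded resources, keep page requests.
SKIP_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")

def parse_entry(line):
    """Return (ip, timestamp, url) for a page request, or None if the line is dropped."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed line
    if match["method"] != "GET" or not match["status"].startswith("2"):
        return None  # keep only successful GET requests
    url = match["url"].split("?", 1)[0]  # strip any query string
    if url.lower().endswith(SKIP_SUFFIXES):
        return None
    timestamp = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return match["ip"], timestamp, url

# Invented sample lines, for illustration only.
sample = [
    '10.0.0.1 - - [01/Sep/1999:10:00:00 +1000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [01/Sep/1999:10:00:01 +1000] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.2 - - [01/Sep/1999:10:05:00 +1000] "GET /missing.html HTTP/1.0" 404 0',
]
entries = [e for e in map(parse_entry, sample) if e is not None]
```

Only the first sample line survives: the second is an image request and the third is a failed request.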
Following Cooley et al. (1999), the authors group the data into meaningful transactions. The authors define a transaction as a set of web pages requested by a user in a particular session. ... For the examined web log, the number of transactions m = 8054 and the number of URLs n = 235.
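A transaction as defined above can be sketched as follows. The excerpt does not say how sessions are delimited, so this sketch assumes that a user is identified by IP address and that a session ends after 30 minutes of inactivity; `SESSION_GAP`, `build_transactions`, and `to_binary_matrix` are hypothetical names. Encoding each transaction as a binary row over the n URLs is likewise an assumption about the m × n representation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # hypothetical timeout, not from the source

def build_transactions(entries, gap=SESSION_GAP):
    """Group (ip, timestamp, url) entries into sessions, each a set of URLs."""
    by_user = defaultdict(list)
    for ip, timestamp, url in entries:
        by_user[ip].append((timestamp, url))

    transactions = []
    for requests in by_user.values():
        requests.sort()  # chronological order per user
        current, last_seen = set(), None
        for timestamp, url in requests:
            # A long pause closes the current session and starts a new one.
            if last_seen is not None and timestamp - last_seen > gap:
                transactions.append(current)
                current = set()
            current.add(url)
            last_seen = timestamp
        transactions.append(current)
    return transactions

def to_binary_matrix(transactions):
    """Return (urls, matrix) where matrix[i][j] is 1 if transaction i contains urls[j]."""
    urls = sorted(set().union(*transactions))
    column = {url: j for j, url in enumerate(urls)}
    matrix = [[0] * len(urls) for _ in transactions]
    for i, pages in enumerate(transactions):
        for url in pages:
            matrix[i][column[url]] = 1
    return urls, matrix

# Invented entries, for illustration only.
t0 = datetime(1999, 9, 1, 10, 0)
entries = [
    ("10.0.0.1", t0, "/a.html"),
    ("10.0.0.2", t0 + timedelta(minutes=5), "/c.html"),
    ("10.0.0.1", t0 + timedelta(minutes=10), "/b.html"),
    ("10.0.0.1", t0 + timedelta(minutes=60), "/a.html"),
]
transactions = build_transactions(entries)
urls, matrix = to_binary_matrix(transactions)
```

In this toy input the first user produces two transactions, because the third request arrives 50 minutes after the second, and the second user produces one.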
The number of inputs of the SOM will need to be equivalent to the number of transactions, and because this number is so large, it will not be feasible with this data. ... By using the K-means clustering algorithm, we cluster the transactions into nine groups. The number K=9 is chosen arbitrarily. ... Thus, after the dimension reduction, it consists of 235 URLs × 9 transaction-groups.
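The dimension reduction described here can be illustrated with a small sketch: cluster the transaction rows with K-means, then count for each URL how many transactions in each cluster contain it, which gives one K-dimensional feature vector per URL. This is a plain Lloyd's-algorithm implementation written for illustration, not the authors' code; the function names, the seeding strategy, and the toy data (5 transactions, 4 URLs, K=2 instead of 8054, 235, and 9) are all assumptions.

```python
import random

def kmeans(rows, k, iterations=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per row."""
    rng = random.Random(seed)
    # Seed centroids from distinct rows (requires k <= number of distinct rows).
    distinct = [list(r) for r in dict.fromkeys(tuple(r) for r in rows)]
    centroids = rng.sample(distinct, k)
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centroids[c])))
            for row in rows
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def url_features(rows, labels, k):
    """features[j][c] = number of transactions in cluster c that contain URL j."""
    features = [[0] * k for _ in range(len(rows[0]))]
    for row, label in zip(rows, labels):
        for j, present in enumerate(row):
            if present:
                features[j][label] += 1
    return features

# Toy binary transaction-by-URL matrix, invented for illustration.
rows = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]
labels = kmeans(rows, k=2)
features = url_features(rows, labels, k=2)
```

Each row of `features` plays the role of one line in the 235 × 9 matrix: URLs that tend to be requested in the same kinds of transactions end up with similar count vectors.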
The distance between nodes on the resulting map indicates the similarity of the web pages, measured according to the user navigation patterns. LOGSOM provides a visual tool to enable users to see the relationship between web pages based on the usage patterns of web users similar to themselves. LOGSOM also provides an analysis tool for web masters and web authors to better understand the interests of visitors to their pages, and identify potential referring pages.
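To make the mapping step concrete, here is a minimal self-organizing map sketch that trains a small grid on URL feature vectors and then looks up the node each URL lands on. The excerpt gives no grid size, learning rate, or neighborhood function, so the 3 × 3 grid, the Gaussian neighborhood, the linear decay schedules, and the names `train_som` and `best_matching_unit` are all assumptions. The input vectors are invented and assumed to be scaled to the range [0, 1].

```python
import math
import random

def best_matching_unit(weights, vector):
    """Return the grid coordinate whose weight vector is closest to `vector`."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], vector)),
    )

def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows) for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2
    for epoch in range(epochs):
        progress = epoch / epochs
        lr = lr0 * (1 - progress)                   # learning rate decays linearly
        sigma = max(sigma0 * (1 - progress), 0.5)   # neighborhood radius shrinks
        for vector in rng.sample(vectors, len(vectors)):  # shuffled pass
            bmu = best_matching_unit(weights, vector)
            for node, w in weights.items():
                grid_dist2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = lr * math.exp(-grid_dist2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += influence * (vector[i] - w[i])
    return weights

# Invented, pre-scaled feature vectors for four hypothetical URLs.
url_vectors = {
    "/courses/a.html": [1.0, 0.0, 0.1],
    "/courses/b.html": [1.0, 0.0, 0.1],
    "/staff/x.html": [0.0, 1.0, 0.2],
    "/staff/y.html": [0.1, 0.9, 0.2],
}
som = train_som(list(url_vectors.values()), rows=3, cols=3)
positions = {url: best_matching_unit(som, v) for url, v in url_vectors.items()}
```

URLs with identical feature vectors always share a node, and the grid distance between two entries in `positions` is the quantity the map uses to express similarity in usage.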