network analysis
This paper evaluates the feasibility of applying the Blondel community detection algorithm (Blondel, Guillaume, Lambiotte, & Lefebvre, 2008) to author cocitation analysis, using the topology of the network to find likely clusters of nodes and thereby identify the specialties within a scholarly field. The approach addresses four current research problems in author cocitation analysis: 1) it can exploit the varying weights on the links between authors; 2) it can determine the number of communities automatically with a single optimization function; 3) it uses modularity to assess the community detection result, without being restricted by the size or topology of the network; and 4) it partitions the network into communities using only the network's own inherent structure, without requiring any prior alteration of the network. Because nodes in the same cluster of a cocitation network are co-cited with one another more frequently, the study regards modularity as a suitable measure of the community structure of cocitation networks. The Blondel algorithm is adopted as the community detection algorithm for two reasons: its short running time and its sensitivity to local structure. Two author cocitation data sets are fed into the Blondel algorithm to verify the feasibility of the approach: the cocitation data of 12 highly cited authors each from information retrieval and bibliometrics, and the cocitation data of the top 100 highly cited authors across all of science.
There is an increasingly large amount of literature devoted to the treatment of cocitation data, either of papers, authors, or journals. Most of these studies use this readily available information to map the structure of science or identify different clusters of scientific research. The idea behind this type of work, initially developed and used by H. Small and others (Bayer, Smart, & McLaughlin, 1990; Marshakova, 1973; Small, 1973; Small & Griffith, 1974; Small & Sweeney, 1985; White, 1981; White & McCain, 1981), is to use cocitations as the foundation of a conceptual network that evolves in time based on the choices (i.e., citation practices) of scientists themselves (Small, 1978).
A recent debate on the appropriate similarity measures to evaluate the “proximity” of agents (Ahlgren, Jarneving, & Rousseau, 2003, 2004; Bensman, 2004; Leydesdorff, 2008; Leydesdorff & Vaughan, 2006; White, 2003) highlights the need for an alternative methodology for detecting local research communities corresponding to scientific specialties, preferably without having to map the data onto another vector space.
In this article, we evaluate a new community detection method (Blondel, Guillaume, Lambiotte, & Lefebvre, 2008) used to identify the scientific specialties of any given cocitation network without any free parameters or pre- or postprocessing of the data.
Our new approach is motivated by four requirements with respect to the clustering of cocitation networks.
First, the weight of the links between authors (the number of cocitations) is crucial; this is where most of the information is contained. Therefore, any network-based approach must be able to take into account not only the existence of links between authors but also how strong these links are (a minimal construction of such a weighted network is sketched after these four requirements).
Second, there should be no “choice” made by the user regarding which clusters to identify, nor should there be any a priori limitations as to the number of communities or to their population; a single optimization function or algorithm should provide an independent division of the network.
Third, aside from the case of extremely large networks, there should be no restrictions on the size or topology of the network used. Naturally, some networks have a more clear-cut community structure than do others, and this should be apparent or quantifiable. In our case, the modularity of the decomposed networks provides a good indicator of this.
Finally, there should not be any a priori assumptions on the networks themselves. In other words, they need not be altered in any way before applying the algorithm and only their inherent structure should be used to determine how they should be partitioned.
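To make the first requirement concrete, here is a minimal sketch (assuming Python with the networkx library; this is not the article's code, and the paper identifiers, author names, and the `citing_paper_refs` structure are purely illustrative) of how a weighted author cocitation network can be assembled: two authors are linked whenever some citing paper references both of them, and the weight of the link is the number of such papers.

```python
from itertools import combinations
import networkx as nx

# Hypothetical input: each citing paper mapped to the set of cited authors.
citing_paper_refs = {
    "paper1": {"Salton", "Robertson", "van Rijsbergen"},
    "paper2": {"Salton", "van Rijsbergen"},
    "paper3": {"Garfield", "Small", "White"},
}

# Build the weighted cocitation network: edge weight = number of cociting papers.
G = nx.Graph()
for cited_authors in citing_paper_refs.values():
    for a, b in combinations(sorted(cited_authors), 2):
        weight = G[a][b]["weight"] + 1 if G.has_edge(a, b) else 1
        G.add_edge(a, b, weight=weight)
```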
The Girvan–Newman (GN) algorithm (Girvan & Newman, 2002; Newman & Girvan, 2004) is well-known as the canonical method for community detection in complex networks. This method has recently been successfully applied to identify research themes within a citation network (McCain, 2008). Essentially, the algorithm consists in cutting links with high values of betweenness (in terms of geodesics passing through a link) and monitoring the graph’s modularity Q, loosely defined as a measure of how meaningful a given division of the network into subgroups is, while taking into account the number of random links that would be expected within a subgroup.
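For reference, the (weighted) modularity monitored here can be written in the standard form (our notation, not the article's):

```latex
Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j),
\qquad k_i = \sum_j A_{ij}, \qquad m = \tfrac{1}{2}\sum_{i,j} A_{ij},
```

where A_ij is the weight of the link (here, the cocitation count) between nodes i and j, k_i is the total weight attached to node i, and δ(c_i, c_j) equals 1 when i and j are assigned to the same community and 0 otherwise. Q is therefore large when communities contain many more (and heavier) internal links than would be expected by chance.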
Modularity is not an appropriate measure for community structures in all networks. ... Cocitation networks, though, are well-suited to the measure. Even a negative citation alongside a positive citation implies some topical similarity between the two articles.
However, a standard implementation of the GN algorithm for weighted cocitation networks is not straightforward and is extremely expensive in computational time.
The GN algorithm successfully identifies communities on the periphery of the network, but almost never cuts “heavy” links (i.e., high numbers of cocitations), even though this is occasionally necessary to bring out the communities inherent in the network. In real, relatively compact cocitation networks with “strong” (but divisible) cores where practically everyone is co-cited with everyone at some point in time, an algorithm organized in this way can be of some heuristic value, but will have great difficulty uncovering the optimal community structure.
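As an illustration of the difficulty, the sketch below shows one possible way (not the authors' implementation) to run a weighted GN pass with networkx: cocitation weights are converted into distances (reciprocals, so that heavy links are “short”), the edge with the highest weighted betweenness is removed at each step, and modularity is tracked to pick the best division. The graph `G` and the cap `max_levels` are assumptions made for the example.

```python
import itertools
import networkx as nx

def heaviest_betweenness_edge(G):
    """Return the edge with the highest weighted betweenness, treating the
    reciprocal of the cocitation count as a distance (assumes positive weights)."""
    distances = {(u, v): 1.0 / d["weight"] for u, v, d in G.edges(data=True)}
    nx.set_edge_attributes(G, distances, "distance")
    centrality = nx.edge_betweenness_centrality(G, weight="distance")
    return max(centrality, key=centrality.get)

def best_gn_partition(G, max_levels=10):
    """Run Girvan-Newman, keeping the division with the highest modularity Q."""
    best_q, best_partition = float("-inf"), None
    splits = nx.community.girvan_newman(G, most_valuable_edge=heaviest_betweenness_edge)
    for partition in itertools.islice(splits, max_levels):
        q = nx.community.modularity(G, partition, weight="weight")
        if q > best_q:
            best_q, best_partition = q, partition
    return best_q, best_partition
```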
The algorithm of Blondel et al. (2008) balances optimization of modularity with running time and sensitivity to local structure.
Each node is first placed in its own community. Iterating over all nodes, one checks whether moving a node from its current community to any community to which one of its neighbors belongs would yield an increase in modularity. If so, one moves the node to the neighboring community that gives the highest increase in modularity and continues the process until equilibrium is reached.
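A deliberately naive sketch of this first, local-moving phase is given below (assuming networkx; it recomputes modularity globally for every candidate move, whereas Blondel et al. evaluate the gain locally in constant time, so it is only meant to spell out the logic):

```python
import networkx as nx

def partition_modularity(G, node2comm):
    """Modularity of the partition encoded as a node -> community-id mapping."""
    communities = {}
    for node, comm in node2comm.items():
        communities.setdefault(comm, set()).add(node)
    return nx.community.modularity(G, communities.values(), weight="weight")

def local_moving_phase(G):
    """Phase 1: start with one community per node and greedily move nodes to the
    neighboring community that yields the largest modularity increase."""
    node2comm = {node: i for i, node in enumerate(G)}
    improved = True
    while improved:
        improved = False
        for node in G:
            current = node2comm[node]
            best_comm, best_q = current, partition_modularity(G, node2comm)
            for neighbor in G.neighbors(node):
                candidate = node2comm[neighbor]
                if candidate == current:
                    continue
                node2comm[node] = candidate            # tentative move
                q = partition_modularity(G, node2comm)
                node2comm[node] = current              # undo the move
                if q > best_q:
                    best_comm, best_q = candidate, q
            if best_comm != current:
                node2comm[node] = best_comm
                improved = True
    return node2comm
```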
Then, one projects each community as a single node in a new network, with edges between community-nodes where there were edges between nodes in the communities in the original network. The weights of the new edges are obtained by summing over all previous weights (including self-loops).
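The aggregation step can be sketched as follows (again a simplified illustration rather than the original implementation): every community becomes a node of a new graph, and edge weights, including the self-loops created by intra-community links, are obtained by summing the original weights.

```python
import networkx as nx

def aggregate(G, node2comm):
    """Phase 2: collapse each community into a single node; links inside a
    community become a self-loop, and all weights are summed."""
    H = nx.Graph()
    H.add_nodes_from(set(node2comm.values()))
    for u, v, data in G.edges(data=True):
        w = data.get("weight", 1.0)
        cu, cv = node2comm[u], node2comm[v]
        if H.has_edge(cu, cv):
            H[cu][cv]["weight"] += w
        else:
            H.add_edge(cu, cv, weight=w)
    return H
```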
Finally, the entire process is repeated until there is no change in the community structure. ... The result is a hierarchy of communities for the network.
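In practice, a readily available implementation can be used instead of the sketches above. The snippet below (assuming networkx version 2.8 or later; the author names and cocitation counts are invented for illustration) runs the Blondel et al. method on a small weighted cocitation network and prints the partition and its modularity at each level of the hierarchy.

```python
import networkx as nx

# Toy weighted cocitation network; weights are hypothetical cocitation counts.
G = nx.Graph()
G.add_weighted_edges_from([
    ("Salton", "van Rijsbergen", 42),
    ("Salton", "Robertson", 35),
    ("Robertson", "van Rijsbergen", 40),
    ("Garfield", "Small", 51),
    ("Small", "White", 28),
    ("Garfield", "White", 19),
    ("Salton", "Small", 3),   # weak link bridging the two groups
])

# Each yielded partition corresponds to one level of the hierarchy described above.
for level, partition in enumerate(nx.community.louvain_partitions(G, weight="weight", seed=1)):
    q = nx.community.modularity(G, partition, weight="weight")
    print(f"level {level}: Q = {q:.3f}, communities = {[sorted(c) for c in partition]}")
```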
While many of the other methods discussed can be successfully applied to networks that are small, or where the communities are fairly clear-cut, we believe that a rigorous use of cocitation data generally yields much denser and more convoluted networks, and thus requires a more robust approach.
Furthermore, we believe that it is imperative that subjective treatment of the data be avoided as much as possible in cocitation analysis.
These techniques could be of great use to historians or sociologists of science, by tracking the emergence, demise, proximity, or fusion of specializations as well as the evolution of scientific paradigms (Chubin, 1976; Mullins, 1972; Mullins et al., 1977; Small, 2006). Given a specific community, we can identify—using keywords for instance—its ideas, methods, and membership.