Thursday, December 19, 2013

Schildt, H. A. and Mattsson, J. T. (2006). A dense network sub-grouping algorithm for co-citation analysis and its implementation in the software tool Sitkis. Scientometrics, 67, 143-163.

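This study proposes a dense network sub-grouping algorithm for discovering the mainstream research topics of an academic field from its co-citation network. Each work is characterized by the set of papers that cite it; the Jaccard index measures how strongly two works are related through co-citation, and these values define a network graph. The dense network sub-grouping algorithm then partitions the graph into several groups of nodes plus the nodes that cannot be assigned to any group. The study also compares the proposed method with cluster analysis and multidimensional scaling (MDS), the two techniques commonly used in co-citation analysis: the dense network sub-grouping algorithm does not require the number of resulting groups to be fixed in advance, and it handles works that could belong to several groups (usually those with broad co-citation relationships) better. The study applies the algorithm to family business research and also develops the Sitkis software, which converts ISI search results into a file format that can be read by UCINET, a widely used network analysis package.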

information visualization

We propose an alternative algorithm, dense network sub-grouping, which identifies dense groups of co-cited references. We demonstrate the algorithm using a data set from the field of family business research and compare it to two alternative methods, multidimensional scaling and clustering.
The software identifies journal-, country- and university-specific citation patterns and co-citation groups, enabling the identification of “invisible colleges.”
Gmür’s recent article (2003) provides a review of such methodologies and suggests that clustering algorithms may be useful in identifying research streams or “invisible colleges” among scientists within a field.
Because pre-existing clustering algorithms were not designed for bibliometric analysis, several factors make them suboptimal for the task. First, optimizing clustering algorithms requires the number of clusters to be defined ex ante, whereas hierarchical clustering algorithms lack clear boundaries between clusters.
More important, clustering algorithms always assign each article into a cluster, with no residual category for items that resemble many disparate clusters. This may potentially cause broadly cited articles to reside in one cluster over another based on very slight differences in citation patterns. The relative prominence of clusters, measured in terms of citation counts, tends to depend heavily on these “citation classics.” Coincidentally, many highly cited works are also cited relatively broadly. As a result, the apparent popularity of different approaches within a field may not be robust to small variations in citation patterns. To avoid this, very broadly cited articles should be excluded altogether from the clusters.
The dense network sub-grouping algorithm is based on iterative identification of the tightly coupled areas of co-citation networks.
The algorithm starts formation of a group at the most strongly connected dyad, and then iteratively adds dyads one at a time, ordered by highest average tie strength to existing group members. When average tie strength from pre-existing group members to all other nodes is below the given cut-off value, the algorithm terminates. The newly formed group is removed from the network, and the algorithm begins to search for the next group. Each group represents a cohesively cited body of literature.
The empirical part of this paper suggests that these groups tend to correspond either to a specific empirical research question or a theoretical approach.
A widely accepted tool for meta-analysis, bibliometrics has been applied in social science disciplines including: economics (CAHLIK, 2000; PIETERS & BAUMGARTNER, 2002), finance (BOROKHOVICH et al., 2000; CHUNG & COX, 1990; HOLLMAN et al., 1991; SCHWERT, 1993), strategic management (MARTINSONS et al., 2001; RAMOS-RODRIGUEZ & RUIZ-NAVARRO, 2004), entrepreneurship (BUSENITZ et al. 2003; DERY & TOULOUSE, 1996; RATNATUNGA & ROMANO, 1997), inter-organizational relationships (OLIVER & EBERS, 1998; SOBRERO & SCHRADER, 1998), organization studies (ÜSDIKEN & PASADEOS, 1995), marketing (PASADEOS et al., 1998), and research and development studies (TIJSSEN & VAN RAAN, 1994).
The algorithm groups together the works that are commonly cited together in scientific articles. All members of such a group would have a similar subject and readership. Since the most-cited prior works in a scientific field arguably represent its key intellectual roots, combining the works into groups based on co-citation coupling will provide a “map” of the field’s intellectual structure (GMÜR, 2003).
A new algorithm, dense sub-network grouping, was developed for the purposes of co-citation analysis. ... It begins by forming a group at the dyad that has the highest co-citation value and then iteratively adds nodes ordered by the highest average co-citation link to the existing members of the group, until the average link value is lower than a predetermined cut-off value chosen by the researcher. The resulting group is then removed from the network, and the algorithm proceeds from the beginning.
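The procedure quoted above can be sketched in a few lines of Python. This is only an illustration written from the description, not the Sitkis implementation. The function name dense_subgroups, the dictionary layout of the tie strengths, and the toy values are all made up for this note. Stopping the search when the strongest remaining dyad falls below the cut-off is also an assumption, since the excerpt does not say how the outer loop ends.

```python
def dense_subgroups(weights, cutoff):
    """Greedy grouping sketch.

    weights: dict mapping an undirected pair (a, b) to its tie strength.
    cutoff:  minimum average tie strength a node needs to join a group.
    Returns (groups, residual), where residual holds the ungrouped nodes.
    """
    def tie(a, b):
        # Ties are undirected, so look the pair up in either order.
        return weights.get((a, b), weights.get((b, a), 0.0))

    remaining = {node for pair in weights for node in pair}
    groups = []
    while True:
        # Seed a new group at the strongest dyad still in the network.
        live = [(w, a, b) for (a, b), w in weights.items()
                if a in remaining and b in remaining]
        if not live:
            break
        w, a, b = max(live)
        if w < cutoff:  # assumed stopping rule for the outer loop
            break
        group = [a, b]
        while True:
            candidates = remaining - set(group)
            if not candidates:
                break
            # Pick the node with the highest average tie to current members.
            best = max(candidates,
                       key=lambda n: (sum(tie(n, m) for m in group) / len(group), n))
            average = sum(tie(best, m) for m in group) / len(group)
            if average < cutoff:
                break
            group.append(best)
        groups.append(sorted(group))
        remaining -= set(group)  # remove the finished group from the network
    return groups, sorted(remaining)


# Toy network: A, B, C are tightly co-cited, D and E form a pair, F is loose.
toy = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.7,
       ("C", "D"): 0.1, ("D", "E"): 0.6, ("E", "F"): 0.05}
groups, residual = dense_subgroups(toy, cutoff=0.5)
print(groups, residual)
```

Nodes that never reach the cut-off stay in the residual list, which is how the sketch mirrors the paper's point that some works should belong to no group at all.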
We advocate that co-citation networks be constructed using a normalized co-citation strength measure, the Jaccard index (SMALL & GREENLEE, 1980). The normalization is used in order to emphasize proximity between similar references that are cited less often than the most common references.
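As a reminder of what the normalization does, here is a minimal sketch that assumes the usual set-based form of the Jaccard index: the number of articles citing both references divided by the number citing at least one of them. The excerpt does not spell out the formula, so this form is an assumption. The function name jaccard_cocitation and the toy data are invented for illustration.

```python
from itertools import combinations


def jaccard_cocitation(citing):
    """citing: dict mapping each citing article to the set of references it cites.

    Returns a dict {(ref_a, ref_b): jaccard} for every pair co-cited at least once.
    """
    # Invert the data: for each reference, which articles cite it?
    cited_by = {}
    for article, refs in citing.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(article)

    weights = {}
    for a, b in combinations(sorted(cited_by), 2):
        both = len(cited_by[a] & cited_by[b])    # articles citing both
        if both:
            either = len(cited_by[a] | cited_by[b])  # articles citing at least one
            weights[(a, b)] = both / either
    return weights


# Toy data: four citing articles and four cited references.
citing = {
    "P1": {"A", "B"},
    "P2": {"A", "B", "C"},
    "P3": {"A", "C"},
    "P4": {"D"},
}
weights = jaccard_cocitation(citing)
print(weights)
```

Because the denominator grows with the total citations of both references, a pair of rarely cited works that always appear together scores higher than a pair of citation classics that only occasionally overlap.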
The distinct advantage of our algorithm is that it omits widely cited books that do not belong to any coherent stream of literature. Also, because we focus on coherent groups of references, our analysis allows us to include works that are less cited, but clearly part of a coherent stream.
Based on this sample of outlets for publication, we used the Institute of Scientific Information Social Sciences Citation Index (ISI SSCI) to systematically select all family business related articles published during the period 1986 – 2003. Acknowledging the definitional diversity of “family business research” (SHARMA, 2004), we searched for variations on the terms “family firm,” “family business,” “family ownership,” and “family control.” An initial set of 341 articles was obtained, which was then systematically reviewed to select out any articles not related to family business. Altogether 108 articles were included in the final sample. Subsequently, we selected all references that had been cited by at least four (3.7%) of these 108 articles.
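The last selection step, keeping only references cited by at least four of the sampled articles, amounts to a simple count. The sketch below assumes the citation data is already available as a mapping from article to cited references. The function name frequently_cited and the toy data are hypothetical.

```python
from collections import Counter


def frequently_cited(citing, min_citations=4):
    """Return the references cited by at least min_citations distinct articles.

    citing: dict mapping each citing article to an iterable of cited references.
    """
    # set(refs) ensures a reference repeated within one article counts once.
    counts = Counter(ref for refs in citing.values() for ref in set(refs))
    return {ref for ref, n in counts.items() if n >= min_citations}


# Toy example with a lower threshold so the effect is visible.
sample = {"P1": ["A", "B"], "P2": ["A", "C"], "P3": ["A", "B", "B"]}
kept = frequently_cited(sample, min_citations=2)
print(sorted(kept))
```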
A major shortcoming of the traditional clustering method was evident from the beginning of the analysis: because the resulting clusters must be evaluated manually, the number of works included in the analysis has to be kept manageably low. This reduces the precision of the analysis and may leave the cluster structure imperfect or skewed. ... One can manually construct about 8 to 12 relevant clusters, depending on the threshold at which different clusters are allowed to join together. However, the manual cluster formation is not a straightforward task. In addition, because of the relatively small number of works included, the clusters themselves are rather small, making the results less comprehensive.
Generally, interpreting the MDS results requires much speculation on the part of the researcher, and thus may not yield replicable results.
There are four common reference data incoherencies: (1) authors’ middle initials are used inconsistently; (2) journal and book names are spelled inconsistently in reference information (e.g. ADMIN SCI Q and ADM SCI Q represent the same publication); (3) different publication years (editions) of the same book appear as independent entries; and (4) multiple articles by the same author in the same journal in the same year appear as a single entry.
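Incoherencies (1) and (2) can be reduced with a simple normalization key. The sketch below is not how Sitkis handles them. It assumes each cited reference arrives as a comma-separated string of the form "AUTHOR, YEAR, SOURCE, ...", that the surname is a single token, and that journal spelling variants are listed in a hand-maintained alias table (JOURNAL_ALIASES here, a hypothetical name). The author names in the example are placeholders, not real references. Items (3) and (4) need manual checking and are not covered.

```python
# Hypothetical alias table mapping variant spellings to one canonical form.
JOURNAL_ALIASES = {"ADM SCI Q": "ADMIN SCI Q"}


def normalize_reference(raw):
    """Build a comparison key (author, year, source) from a raw reference string."""
    parts = [p.strip().upper() for p in raw.split(",")]
    if len(parts) < 3:
        raise ValueError(f"expected 'AUTHOR, YEAR, SOURCE, ...', got: {raw!r}")
    author, year, source = parts[0], parts[1], parts[2]

    # Keep the surname and only the first initial, so that inconsistent
    # middle initials do not split one work into several entries.
    tokens = author.split()
    surname = tokens[0]
    initials = tokens[1] if len(tokens) > 1 else ""
    author_key = f"{surname} {initials[:1]}".strip()

    # Map known spelling variants of the source to a single form.
    source = JOURNAL_ALIASES.get(source, source)
    return (author_key, year, source)


# Two spellings of the same (made-up) reference collapse to one key.
key_a = normalize_reference("DOE JA, 1990, ADM SCI Q, V35, P1")
key_b = normalize_reference("Doe J, 1990, ADMIN SCI Q, V35, P1")
print(key_a == key_b)
```

Dropping the middle initial can merge two genuinely different authors who share a surname and first initial, so the resulting matches are best reviewed by hand.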
