Tuesday, May 14, 2013

White, H. D. (2003). Pathfinder networks and author cocitation analysis: A remapping of paradigmatic information scientists. Journal of the American Society for Information Science and Technology, 54, 423-434.


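This study holds that when researchers cite the literature, the author names they mention carry two meanings. One is the themes addressed by the author's works. The other is the scholarly specialty, school of thought, or community of the mind to which the author is taken to belong, and sometimes even the author's network of acquaintances. To the extent that ties of acquaintanceship and intercommunication exist among cocited authors, large-scale author cocitation analysis (ACA) maps approximate invisible colleges.
Traditional ACA uses statistical techniques such as Pearson's correlation coefficients and multidimensional scaling (MDS) to map authors onto a two-dimensional plot according to the similarity of their cocitation patterns. Each author becomes a point on the plot; authors with more similar cocitation patterns lie closer together, and authors with less similar patterns lie farther apart. The resulting plot contains only scattered points. It suits insiders who know the field and want to examine or revisit it, but outsiders cannot easily see what it means. Because the plot is so simplified, it cannot express the complexity of the paradigms formed by the relations among authors in a field, and even insiders may not agree with what it shows.
This study builds a network from raw cocitation counts, prunes the less important links with the Pathfinder network (PFNET) technique, and draws the result with the Kamada-Kawai algorithm in order to reveal a field's research topics and the key researchers associated with them. In the pruned network, authors with high degree centrality, that is, authors whose nodes have many links, are treated as dominant authors. The dense patterns formed by dominant authors and the authors linked to them define specialties, and the links among dominant authors connect the topics of those specialties into the field as a whole. Using the same author cocitation data as White & McCain (1998), the study analyzes the specialties and dominant authors of information science. Salton, Garfield, Lancaster, and Price rank highest on every centrality measure in the network and are the field's dominant authors, and the patterns in the network present the paradigm of information science in 1972-1995.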
information visualization
In PFNETs, nodes represent authors, and explicit links represent weighted paths between nodes, the weights in this case being cocitation counts. The links can be drawn to exclude all but the single highest counts for author pairs, which reduces a network of authors to only the most salient relationships. When these are mapped, dominant authors can be defined as those with relatively many links to other authors (i.e., high degree centrality).
Links between authors and dominant authors define specialties, and links between dominant authors connect specialties into a discipline.
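The pruning rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it assumes the common PFNET setting r = infinity, q = n - 1 applied directly to similarities, where a direct link is kept only if no indirect path has a larger "bottleneck" (the smallest count along the path). The function name `pfnet_links`, the author labels, and all counts are hypothetical toy values, not data from White and McCain (1998).

```python
def pfnet_links(counts):
    """Keep the links of a PFNET(r=inf, q=n-1) computed on similarities.

    counts maps an (author, author) tuple to a raw cocitation count.
    A direct link survives only if no indirect path offers a larger
    bottleneck, i.e. a larger smallest count along the path.
    """
    authors = sorted({name for pair in counts for name in pair})
    # best[i][j]: largest bottleneck found so far between i and j
    best = {i: {j: 0 for j in authors} for i in authors}
    for (a, b), w in counts.items():
        best[a][b] = best[b][a] = w
    # Floyd-Warshall style update with (max, min) in place of (min, +)
    for k in authors:
        for i in authors:
            for j in authors:
                via_k = min(best[i][k], best[k][j])
                if via_k > best[i][j]:
                    best[i][j] = via_k
    return {(a, b) for (a, b), w in counts.items() if w >= best[a][b]}


# Hypothetical toy counts: H1 and H2 play the role of dominant authors.
toy_counts = {
    ("H1", "a"): 30, ("H1", "b"): 25, ("H1", "H2"): 20,
    ("H2", "c"): 15, ("H2", "d"): 12, ("H2", "b"): 10,
    ("a", "b"): 5, ("c", "d"): 4, ("a", "c"): 2,
}

kept = pfnet_links(toy_counts)

# Degree centrality: number of surviving links per author.
degree = {}
for a, b in kept:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1

for author in sorted(degree, key=degree.get, reverse=True):
    print(author, degree[author])
```

In this toy network the weak links among the peripheral authors are pruned, H1 and H2 each keep three links, and the H1-H2 link is what joins the two groups.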
White and McCain’s raw data from 1998 are remapped as a PFNET. It is shown that the specialty groupings correspond closely to those seen in the factor analysis of the 1998 article.
During the past 20 years, several map making techniques have been tried in ACA. Raw cocitation counts and Pearson r correlations of author pairs have both been used as input; output displays have included multidimensional scaling, complete-linkage clustering, factor-loading plots, Kohonen self-organizing maps, geographic-style maps, and Pathfinder networks (PFNETs).
Afterward, the interest of the name combinations is dual.
As designators of oeuvres, names jointly connote intertextual themes—lines of exposition and perhaps controversy.
As designators of people, names jointly connote scientific or scholarly specialties, schools of thought, communities of the mind, and, sometimes, networks of acquaintances. To the extent that ties of acquaintanceship and intercommunication are found among cocitees (which is often), large-scale ACA maps approximate invisible colleges.
Even so, ACA maps have obvious limitations. To be useful, they must depict a domain the viewer already knows, or at least is curious about; names that fascinate insiders will bore outsiders. Furthermore, the maps will not, and cannot, capture all of the relations among authors that give a paradigm its complexity. The whole point of ACA mapping is to simplify. But in simplifying relationships to those most salient in the database, ACA may contradict how a field is viewed in individual heads.
Pearson r detects the similarity of count profiles across all authors, and it was chosen over single-highest counts because the latter can vary across three orders of magnitude, which results in high-end pairs overwhelming low-end pairs in multidimensional scaling (cf. McCain, 1990). However, the computation of Pearson rs adds another layer of complexity to getting the maps out. PFNETs at r=INF remove this layer because they are not affected by absolute magnitudes of the counts, only by whether the counts are higher or lower when algorithmically compared.
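To make the contrast concrete, here is a small sketch of Pearson r computed over cocitation profiles. The profiles are invented toy values, and the sketch assumes each profile lists an author's counts with the same set of other authors, with the author's own cell left out; how the diagonal is handled in practice is a separate choice this example does not address.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length count profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical cocitation counts of three authors with the same four others.
heavily_cited = [300, 600, 150, 30]   # large absolute counts
lightly_cited = [10, 20, 5, 1]        # same shape, thirty times smaller
different = [1, 5, 20, 10]            # a different profile shape

r_same_shape = pearson_r(heavily_cited, lightly_cited)
r_other_shape = pearson_r(lightly_cited, different)
print(round(r_same_shape, 3), round(r_other_shape, 3))
```

The first pair correlates perfectly even though the raw counts differ by more than an order of magnitude, which is the property that made Pearson r attractive for MDS input. The price is the extra computation step that the raw-count PFNET avoids.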
Raw-count PFNETs are actually more informative than those made with Pearson rs, because when many authors share their highest counts with a single dominant author, specialty or subspecialty structure emerges automatically, and there is no need for a separate clustering routine.
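One way to see why this grouping falls out of raw counts is to look only at each author's single highest count. The sketch below is an illustration rather than the paper's procedure: it assumes that each author's strongest link survives pruning (which holds for the r = infinity, q = n - 1 setting when there are no ties) and groups authors by the partner they share that strongest link with. The helper names and counts are hypothetical.

```python
from collections import defaultdict


def strongest_partner(counts):
    """Map each author to the partner with whom the raw count is highest."""
    best = {}
    for (a, b), w in counts.items():
        for author, other in ((a, b), (b, a)):
            if author not in best or w > best[author][1]:
                best[author] = (other, w)
    return {author: partner for author, (partner, _) in best.items()}


def group_by_partner(partners):
    """Collect the authors that point at the same strongest partner."""
    groups = defaultdict(set)
    for author, partner in partners.items():
        groups[partner].add(author)
    return dict(groups)


# Hypothetical toy counts; H1 and H2 stand in for dominant authors.
toy_counts = {
    ("H1", "a"): 30, ("H1", "b"): 25, ("H1", "H2"): 20,
    ("H2", "c"): 15, ("H2", "d"): 12,
    ("a", "b"): 5, ("c", "d"): 4,
}

groups = group_by_partner(strongest_partner(toy_counts))
for hub in sorted(groups, key=lambda h: len(groups[h]), reverse=True):
    print(hub, sorted(groups[hub]))
```

Authors a and b gather around H1, c and d around H2, and H2 itself points to H1, so the two groups are chained together without a separate clustering step.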
The citation counts of several hundred contributors to information science during the 24-year period 1972–1995 were obtained in early 1996 (White & McCain, 1998). The top 120 names from this list were systematically paired and their raw cocitation counts taken from ISI’s Social Scisearch (the on-line Social Sciences Citation Index) on Dialog. Those counts are reused here.
The resizing shows four authors dominating information science in the 1972–1995 period—Gerard Salton, Eugene Garfield, F. W. Lancaster, and, to a lesser extent, Derek Price. The same four authors are also highest on two other actor centrality measures available in Pajek and UCINet, closeness and betweenness. Closeness is the inverse of “farness,” which is the sum of all shortest paths (geodesics) from any author to any other author in the network. Betweenness counts the number of geodesics on which any node lies.
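The two measures can be sketched in plain Python for a small unweighted, connected network. This is a minimal illustration under stated assumptions: links are treated as unweighted, closeness is computed as 1 / farness without normalization, and betweenness is computed with Brandes' accumulation scheme; Pajek and UCINet may normalize or weight these values differently. The adjacency list is a hypothetical toy network.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic (shortest path) lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness(adj):
    """Inverse of farness, the summed geodesic lengths to all other nodes."""
    return {v: 1 / sum(bfs_distances(adj, v).values()) for v in adj}


def betweenness(adj):
    """Number of geodesics through each node (fractional when paths tie)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        paths = {v: 0 for v in adj}   # number of geodesics from s
        paths[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):     # farthest nodes first
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != s:
                score[w] += dependency[w]
    # Undirected network: every pair was counted from both endpoints.
    return {v: total / 2 for v, total in score.items()}


# Hypothetical pruned network: two hubs, each with two peripheral authors.
adj = {
    "H1": ["a", "b", "H2"], "H2": ["H1", "c", "d"],
    "a": ["H1"], "b": ["H1"], "c": ["H2"], "d": ["H2"],
}

close = closeness(adj)
between = betweenness(adj)
for v in adj:
    print(v, round(close[v], 3), between[v])
```

In this toy network the two hubs lie on seven geodesics each and have the smallest farness, while the peripheral authors lie on none.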
One attractive feature of raw-count PFNETs is that they not only form specialties around dominant authors but also chain the specialties in explicit sequences. The ordering of nodes in these sequences is non-arbitrary, and reveals how major topical areas in a field are connected.
If one traverses the most highly connected nodes from right to left in Figure 1, this sequence suggests itself: Markey -> Bates -> Belkin -> Saracevic -> Salton -> Lancaster -> Garfield -> Price -> Brookes. These and their associated authors represent paradigmatic information science of the 1972–1995 period (cf. Ding et al., 1999; Persson, 1994; Urs, 1995).
The authors from Markey to Saracevic share a focus on non-experimental document retrieval systems (e.g., on-line bibliographic databases, on-line library catalogs) and their users.
Salton and Garfield dominate two large central groups that I have elsewhere called, respectively, the retrievalists and the citationists. The retrievalists generally bring high formal and computational skills to problems of designing and evaluating experimental systems for document retrieval. The citationists analyze properties of the scientific and scholarly literatures from which documents are retrieved, especially the citation linkages that became amenable to study after Garfield founded the Institute for Scientific Information and its databases.
According to the closeness and betweenness measures in Table 1, he (Lancaster) is the most central figure in the map. He is also at the center of a group of generalists, many of them active for decades in the movement to automate various information services. The generalists are more oriented toward existing library and bibliographic institutions than the retrievalists, and are perhaps inclined to a more encyclopedic range of interests, including information policy issues.
The group around Garfield has further links leftward to authors who also analyze literatures, often in the context of scientific communication studies in general—various citationists, bibliometricians, and scientometricians centered on Price and B.C. Brookes. Brookes and his group represent mathematical bibliometrics.
Simon and his neighbors are used in conceptualizations of the nature of information studies; he and Zipf are also cited in bibliometrics. The Price -> Merton -> Crane line that ends in Rice is the intersection of science and technology studies with information science (both fields, for example, have used the idea of “invisible colleges”); Steven and Jonathan Cole, Harriet Zuckerman, and Thomas S. Kuhn symbolize this area as well. Most of these social scientists have contributed to literature-based domain analysis (e.g., Kuhn stated that the history of paradigms might be tracked through citations) and fit comfortably on Garfield’s side of the map.
One way in which the PFNET in Figure 1 does differ from the multidimensional scaling (MDS) maps in White and McCain (1998) is in its rendering of “disciplinary centrality.”
PCAs in both the 1998 article and White and Griffith (1982) demonstrated that paradigms comprise “crystallized” authors, who load on a single factor, and “diffuse” or “pervasive” authors, who load less strongly on two or more factors.
The PFNET in Figure 3 (with Pearson r correlations) simply chains together the highest rs for particular author pairs. This fails to render specialties in such a way that even a complete outsider can see them, as did Figure 1 (with raw cocitation counts). ...  The Figure 3 PFNET also removes all sense of the field’s most important authors.
Pearson r correlations, and the tables and displays based on them, will retain their usefulness in certain kinds of ACA, but they do not make for the best PFNETs.
Pathfinder network analysis comes out of semantic association studies in cognitive psychology (Schvaneveldt, 1990), but it shares a foundation in mathematical graph theory with social network analysis, an active subfield of sociology and anthropology (Wasserman & Faust, 1994). A collateral benefit of working with PFNETs is that it bonds ACA to cognitive semantics and to social network analysis.
More important, the move to PFNETs makes explicit what has been true all along—that ACA is a kind of network analysis: authors as people form social networks; authors as oeuvres are formed by citers into semantically rich citation networks.
