White, H. D. and McCain, K. W. (1998). Visualizing a discipline: An author co-citation analysis of Information Science, 1972–1995. Journal of the American Society for Information Science, 49, 327–355.
vis_paper
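This paper examines the author co-citation method and applies it to information science. The study analyzes author co-citation data from 12 information science journals for 1972–1995, divided into three 8-year periods. The 120 authors most cited over the whole 24 years were analyzed. The 100 most cited in each period were mapped, and 75 authors rank in the top 100 in all three periods. The methods and results are as follows: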
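1) The co-citation counts between each of the 120 authors and every other author were converted to a matrix of Pearson correlation coefficients. Factor analysis, using principal components analysis with varimax rotation, was then applied to reveal the specialty structure of information science. The number of factors extracted was set by the criterion of eigenvalues greater than 1, and each factor represents a specialty. An author with a loading of 0.3 or more on a given factor is taken to be generally regarded by citers as working in that specialty. Because an author can load above 0.3 on several factors, an author may belong to more than one specialty. Twelve factors were extracted, explaining 84% of the variance. The first eight factors have the largest eigenvalues, and the specialties identified from their authors are:
a) experimental retrieval, the design and evaluation of document retrieval systems;
b) citation analysis, the study of interconnections among scientific literatures;
c) practical retrieval, retrieval as applied to real-world databases;
d) bibliometrics, mathematical modeling of regularities in textual and bibliographic distributions;
e) general library systems theory, covering library automation, library operations, and related topics;
f) user theory, the study of information needs and uses;
g) scientific communication, the study of the social system of science;
h) OPACs.
The remaining factors consist of scholars from other fields whose work has been imported into information science. Judging from the authors’ cross-loadings across specialties and from the maps described below, information science divides into two subdisciplines: the analytical study of learned literatures and their social contexts, and the study of the human–computer–literature interface.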
2) The 120 authors’ mean co-citation counts in each of the three periods were used to assess their standing and influence in each period.
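3) The correlation coefficients derived from the authors’ co-citation matrix were used as measures of relatedness. These coefficients represent the authors’ similarity as commonly perceived by citers. The multidimensional scaling program ALSCAL mapped the top 100 authors of each period, so that authors with similar co-citation profiles appear close together. Complete linkage clustering with the CLUSTER routine then divided the authors into subdisciplines according to their relatedness. Authors in the same specialty were found to lie close together on the maps. As earlier, similar studies had indicated, information science clearly divides into two subdisciplines: information retrieval and domain analysis. Comparing the maps across periods, a few authors’ positions moved noticeably, but most remained quite stable.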
4) From the changes in author positions across the three period maps, a further map was produced showing changes in authors’ citation images.
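5) The co-citation correlations of the canonical authors in the three periods served as input to INDSCAL. INDSCAL was used to assess the importance of each dimension in each period, testing from the citation side whether the discipline underwent a paradigm shift. The second dimension represents the human–computer–literature interface. The first dimension represents the range of subject specialties in information science. The second dimension varied much more in importance across the three periods than the first. It was of little importance in the first period (1972–1979), rose sharply in the second (1980–1987), and declined slightly in the third (1988–1995). Many researchers hold that information science underwent a paradigm shift in the 1980s, and White and McCain’s results corroborate this view.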
We defined the authors of information science as all those cited in 12 journals, as listed below. Authors were ranked in order of citedness for the entire period covered by Social Scisearch, 1972–1995. Co-citation data were retrieved for all pairs in the top-ranked 120, from which we produced:
1) A factor analysis of the 120 authors for the entire 24-year span, 1972–1995, which reveals the specialty structure of the discipline. Factor analysis, unlike multi-dimensional scaling and clustering, can show an author’s contribution to more than one specialty.
2) Analyses of the 120 authors’ mean co-citation counts, which indicate their standing and influence in the discipline as of 1972–1979, 1980–1987, 1988–1995, and at the end of the three periods combined.
3) Two-dimensional maps of the top 100 authors in each of the 8-year periods (made with ALSCAL, the SPSS multidimensional scaling program).
4) A map of authors whose “citation images” changed markedly over the years of our study.
5) A two-dimensional composite map of the authors who are in the top 100 in all three periods—some 75 in all. Their most cited works arguably make up the canonical literature of information science. Certain statistics generated by the mapping routine (INDSCAL, a part of ALSCAL) may bear on paradigm shift in the discipline.
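Selecting the canonical authors in item 5 amounts to a set intersection over the per-period top-100 lists. The following is a minimal Python sketch under two assumptions. First, per-period citation counts are available as dictionaries mapping author names to counts; this input format is hypothetical, since the study took its counts from Social Scisearch. Second, ties at the cutoff are broken alphabetically. The function names `top_n` and `canonical_authors` are illustrative.

```python
def top_n(citation_counts: dict[str, int], n: int) -> set[str]:
    """Return the n most-cited authors; ties at the cutoff are broken alphabetically (an assumption)."""
    ranked = sorted(citation_counts, key=lambda author: (-citation_counts[author], author))
    return set(ranked[:n])


def canonical_authors(periods: list[dict[str, int]], n: int = 100) -> set[str]:
    """Authors who rank in the top n of every period (hypothetical helper)."""
    return set.intersection(*(top_n(counts, n) for counts in periods))
```

With three periods and n=100, the intersection corresponds to what the paper calls "the canonical 75".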
In any field of scholarship, writers make judgments as to who has written on what, using what methods, and they reflect the judgments in their citing practices. Aggregated over time, these practices assume definite structure: Writers show commonalities in how they judge the subject matter, methodology, and intellectual style of other writers; for example, they often attach the same meanings and significance to precedent works (Cozzens, 1985; Small, 1978).
It [the ACA map] suggests how authors are commonly viewed on two dimensions, often interpretable as subject matter and style of work. ... Author clusters placed on these two dimensions can be interpreted as specialties within a discipline (White, 1990a, 1990b).
What is actually mapped is an author’s citation image. Everyone ever cited has one, but only those who have been cited in many writings are likely to figure in ACA. In the latter case, the image has a constant part, the author’s identity as it is rendered in successive reference lists. The image also has a variable part, the gradually increasing set of other author-names that co-occur with a given author in those lists. At the end of a time period, ACA sums up the record by mapping the author as a single point among other selected author-points on the basis of the repeated co-occurrences. Authors with similar profiles of co-occurrences are displayed close together.
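The record of repeated co-occurrences that ACA sums up can be illustrated with a toy counter. This minimal sketch assumes each citing paper is represented as a list of cited author names, which is a hypothetical format. It also assumes that an author cited several times in one paper counts once for that paper. The study itself retrieved pair counts from the Social Scisearch database rather than computing them this way. The name `cocitation_counts` is illustrative.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for each unordered author pair, how many citing papers cite both.

    Pairs are stored as alphabetically ordered tuples; an author cited several
    times in one reference list is counted once for that paper (an assumption).
    """
    counts = Counter()
    for refs in reference_lists:
        authors = sorted(set(refs))  # deduplicate within one citing paper
        for a, b in combinations(authors, 2):
            counts[(a, b)] += 1
    return counts
```

Each author's row of such counts against all other selected authors is the co-occurrence profile that the mapping step compares.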
The decisive argument for ACA is that it enables one to see a literature-based counterpart of one’s own overview of a discipline.
As is well known, the closeness of author points on such maps is algorithmically related to their similarity as perceived by citers. We use Pearson r as a measure of similarity between author pairs, because it registers the likeness in shape of their co-citation count profiles over all other authors in the set.
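A small example shows why Pearson r captures "likeness in shape". Two authors whose counts with every other author differ only by a constant factor correlate perfectly, however different their overall citedness. This minimal sketch assumes a symmetric co-citation matrix stored as a list of lists. It computes r over the authors other than the two being compared, which is one reading of "over all other authors in the set". How the diagonal cells were handled in the study is not described here. The function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def profile_similarity(matrix, i, j):
    """r between authors i and j over their co-citation counts with every other author."""
    others = [k for k in range(len(matrix)) if k not in (i, j)]
    return pearson_r([matrix[i][k] for k in others], [matrix[j][k] for k in others])
```

For example, suppose author 1's counts with authors 2, 3, and 4 are exactly twice author 0's. Then `profile_similarity` returns 1.0 for the pair, even though author 1 is cited twice as often.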
The raw co-citation counts were converted to Pearson r correlation matrices by the FACTOR routine in SPSS, and factors were extracted by principal components analysis with varimax rotation. The default criterion of “eigenvalues greater than one” determined the number of factors extracted.
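The extraction and rotation steps can be sketched with NumPy. This approximates what SPSS FACTOR does but is not its code. The sketch covers four steps:

1. Take the principal components of the correlation matrix.
2. Keep factors under the eigenvalue > 1 rule.
3. Apply varimax with Kaiser row normalization, which is assumed here to match the SPSS default.
4. Assign authors to specialties using the 0.3 loading threshold from the notes.

The function names are hypothetical. The final sign flip is an added convention, because factor signs are arbitrary.

```python
import numpy as np


def extract_factors(corr):
    """Principal-component loadings of a correlation matrix, keeping eigenvalues > 1."""
    eigvals, eigvecs = np.linalg.eigh(corr)            # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                  # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    n_factors = int(np.sum(eigvals > 1.0))             # "eigenvalues greater than one"
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    explained = eigvals[:n_factors].sum() / corr.shape[0]  # trace of a correlation matrix is p
    return loadings, explained


def varimax(loadings, max_iter=100, tol=1e-6):
    """Varimax rotation with Kaiser row normalization; returns rotated loadings."""
    h = np.sqrt(np.sum(loadings ** 2, axis=1, keepdims=True))  # assumes no all-zero rows
    phi = loadings / h
    p, k = phi.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        lam = phi @ rotation
        target = lam ** 3 - lam @ np.diag(np.sum(lam ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(phi.T @ target)
        rotation = u @ vt
        new_objective = s.sum()
        if objective > 0 and new_objective < objective * (1 + tol):
            break                                      # criterion stopped improving
        objective = new_objective
    rotated = (phi @ rotation) * h                     # undo the row normalization
    # Factor signs are arbitrary; flip each factor so its loadings sum to a positive value.
    signs = np.where(rotated.sum(axis=0) < 0, -1.0, 1.0)
    return rotated * signs


def specialties(rotated, threshold=0.3):
    """Indices of the factors on which each author loads at or above the threshold."""
    return [set(np.flatnonzero(row >= threshold).tolist()) for row in rotated]
```

For a correlation matrix, the returned `explained` is the sum of the retained eigenvalues divided by the number of authors. Rotation leaves that total unchanged, so it is the kind of quantity behind a figure like "84% of the variance".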
The Pearson r correlation matrices for ALSCAL and CLUSTER in SPSS were generated with another SPSS routine, CORRELATIONS (cf. McCain, 1990). They were treated as nonmetric (ordinal) similarity data in ALSCAL and grouped by the complete linkage method in CLUSTER. Subdisciplinary groupings of the author points on the maps are based on the dendrograms from CLUSTER.
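ALSCAL fits a nonmetric (ordinal) scaling, which is more involved than can be shown briefly. As a simpler stand-in, classical (Torgerson) metric MDS illustrates the same idea of turning a similarity matrix into two-dimensional coordinates. This sketch converts r to a dissimilarity as 1 − r, which is an assumption. The function name `classical_mds` is illustrative, and its output will not match ALSCAL's.

```python
import numpy as np


def classical_mds(similarity, dims=2):
    """Classical (Torgerson) MDS of a similarity matrix, using 1 - r as dissimilarity."""
    d = 1.0 - np.asarray(similarity, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering        # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:dims]           # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))
```

Authors with highly correlated profiles come out close together, which is the property the maps rely on. The axes themselves are arbitrary up to rotation and reflection.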
Authors in the top 100 in all three periods—“the canonical 75”—were separately mapped with INDSCAL, a routine in the ALSCAL bundle that does a specialized kind of multidimensional scaling. The input data to INDSCAL are judgments on the similarity of a set of stimuli by a set of judges. INDSCAL reveals not only the judges’ composite view of the stimuli in multidimensional space, but the weight each individual judge gives each dimension; INDSCAL is short for “individual differences scaling.” We used the individual weights in a new way to explore the notion of “paradigm shift” as it affects the canonical 75.
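INDSCAL's weighted Euclidean distance model expresses the individual-differences idea directly. All judges share one group space, but judge k stretches or shrinks each dimension a by a weight w_ka. The distance is d_ij(k) = sqrt(Σ_a w_ka (x_ia − x_ja)²). The sketch below shows only this distance model; fitting the group space and weights is a separate iterative procedure. The coordinates and weights are hypothetical stand-ins for two periods.

```python
import math


def weighted_distance(x_i, x_j, weights):
    """INDSCAL distance between stimuli i and j for one judge with the given dimension weights."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_i, x_j)))


# Hypothetical group-space coordinates for three authors and weights for two "judges" (periods).
group_space = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (1.0, 0.0)}
period_weights = {"early": (1.0, 0.25), "later": (0.25, 1.0)}

for period, w in period_weights.items():
    d_ab = weighted_distance(group_space["a"], group_space["b"], w)
    d_ac = weighted_distance(group_space["a"], group_space["c"], w)
    print(f"{period}: d(a,b)={d_ab:.2f}, d(a,c)={d_ac:.2f}")
```

Authors a and b differ only on the second dimension, so they appear twice as far apart to the judge that weights that dimension heavily. The script prints 0.50 versus 1.00 for the pair across the two periods. A shift in weights of this kind is what the paper reads as a change in the salience of a dimension.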
The two-dimensional space in which the authors appear is relative, not absolute, and it fails to capture certain relationships among oeuvres that appear in higher dimensionality.
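Because the space is relative, two period maps can only be compared after removing differences in rotation, reflection, scale, and position. The source does not say how White and McCain handled this when tracking changes in citation images. One common technique is orthogonal Procrustes alignment, sketched here under that assumption with NumPy. The function name `procrustes_align` is illustrative.

```python
import numpy as np


def procrustes_align(reference, moving):
    """Least-squares rotation/reflection, uniform scaling, and translation of `moving` onto `reference`.

    Rows are the same authors in the same order in both configurations.
    """
    ref_mean = reference.mean(axis=0)
    ref_c = reference - ref_mean
    mov_c = moving - moving.mean(axis=0)
    u, s, vt = np.linalg.svd(mov_c.T @ ref_c)
    rotation = u @ vt                                  # optimal orthogonal transform
    scale = s.sum() / np.sum(mov_c ** 2)               # optimal uniform scale
    return scale * mov_c @ rotation + ref_mean
```

After alignment, `np.linalg.norm(aligned - reference, axis=1)` gives each author's displacement between two maps. This is one possible way to flag authors whose positions changed markedly.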
Specialties
The results of the factor analysis, incorporating 24 years’ worth of data for the 120 authors, are presented in Table 3. ... Twelve factors were extracted; jointly (R²), they explain 84% of the variance. ... The first eight factors alone explain 78% of the variance. All have seven or more authors with loadings greater than 0.60 and may be interpreted as specialties within the discipline.
The two biggest specialties, obviously, are experimental retrieval, which focuses on the design and evaluation of document retrieval systems, and citation analysis, which focuses on the interconnectedness of scientific and scholarly literatures, usually with data from ISI.
The third biggest specialty we have labeled practical retrieval. Unlike the experimental retrievalists, the authors in this group, rather than working with content-neutral indexing theory, thought experiments, or document testbeds, have tended to discuss retrieval in terms of “real world” databases; terms such as “INSPEC” or “DIALOG” occasionally profane their pens.
The next specialty we call bibliometrics—a word often used to subsume the specialty we labeled citation analysis. However, unlike the citationists, the authors who load primarily here, including the pioneers Lotka, Bradford, and Zipf, are most interested in mathematically modeling certain regularities in textual or bibliographic statistical distributions, irrespective of the literatures from which they come.
General library systems theory is a not altogether satisfactory name for a body of writings on library automation, library operations research, library and information service policy, retrieval system evaluation, and many other interconnected topics.
The specialty we call user theory is appropriately headed by Dervin, author of a highly cited chapter on “information needs and uses” in the 1986 ARIST. ... It will be seen that authors who write about literatures—the citationists, bibliometricians, and scientific communication people—never load above 0.30 on this factor, apparently because citers do not perceive their work as having the right psychological content. On the other hand, quite a few retrievalists load above 0.30, and this suggests the nature of the cognition involved. It has to do with problem-solving at the interface where literatures are winnowed down for users: question formulation, search strategies, information-seeking styles, relevance judgments, and the like.
Authors loading mainly on scientific communication all have strong disciplinary identities outside L&IS—for example, in sociology. They may be thought of as explicating the social systems of science, including those in which formal publication of results is an important (but not the only important) part. The sociologists among them all have loadings, some quite high, in citation analysis, confirming their relevance to the study of scientific literatures.
The design of computerized library catalogs, especially for subject searching, is the province of authors who load on OPACs (online public access catalogs). It makes sense that leading authors here, such as Matthews, Hildreth, Cochrane, and Drabenstott, load secondarily in practical retrieval, just as several of the primary authors there, such as Borgman and Fidel, also turn up here.
As was said, the chief remaining factor seems a collection of authors in other disciplines from whom information science has imported ideas—e.g., cognitive science (Winograd), information theory (Shannon), computer science (Knuth)—that are all variously relevant to the central concern of information science, the human–computer–literature interface.
In fact, as both author cross-loadings and the maps below suggest, almost all of the factors or specialties in Table 3 can be aggregated upward into two larger subdisciplines: (1) The analytical study of learned literatures and their social contexts, comprising citation analysis and citation theory, bibliometrics, and communication in science and R&D; and (2) the study of the human–computer–literature interface, comprising experimental and practical retrieval, general library systems theory, user theory, OPACs, and indexing theory.
The Maps
Figures 2 through 4 are our 8-year period maps. We shall use them to explore the idea, introduced earlier, of two subdisciplines in information science. We operationalize this idea as the last two clusters joined in a complete-linkage clustering of 100 authors. These final clusters, which are brought together only after all closer ties have been exhausted, are separated by an angled line superimposed on each map.
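Stopping a complete-linkage agglomeration when two clusters remain yields exactly the "last two clusters joined". The following is a minimal pure-Python sketch. It assumes distances of 1 − r; the study clustered r matrices in SPSS CLUSTER, whose exact settings are not given here. The search is naive, which is adequate for a hundred authors, and the name `complete_linkage` is illustrative.

```python
def complete_linkage(dist, n_clusters=2):
    """Agglomerate items with complete linkage until n_clusters remain.

    dist is a symmetric list-of-lists distance matrix; returns a list of index sets.
    """
    clusters = [{i} for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: the distance between clusters is their farthest pair.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] |= clusters[y]  # merge the closest pair of clusters
        del clusters[y]             # y > x, so index x is unaffected
    return clusters
```

Because complete linkage merges on the farthest pair, the two surviving clusters are joined only after all closer ties have been used, as the passage describes.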
We have not, as in the past, drawn lines around smaller clusters of authors corresponding to their specialties. The crowding of many names on the maps makes this difficult, and, besides, the specialties are better conveyed by the factor analysis of the earlier section. To a great extent, however, the authors forming specialties in the factor analysis will be found to have been placed near each other in the maps.
The first finding to note is the overall stability of information science, as here defined. Some author-points undergo remarkable changes of position from map to map, but many more authors stay put in discernible specialties. Fully 75, moreover, persist through all three maps.
We conclude that author co-citation analysis is useful for rendering the inertia of fields. In other words, it objectively captures the slow-changing divisions on which one’s subjective sense of “semi-permanent” disciplinary structure rests.
Co-citation analysis of papers, as opposed to authors, captures disciplinary history at a different, faster rate, which may better suit fields with livelier research fronts than information science.
However, “domain analysis,” as put forward by Hjørland and Albrechtsen (1995), seems a more appropriate choice. It incorporates citation analysis and bibliometrics, but also a range of topics broader than what “bibliometrics” usually implies—for example, scholarly and professional communication, parts of sociology of science and sociology of knowledge, interdisciplinary linkages, discourse communities, and disciplinary vocabularies (cf. Beghtol, 1995).
ACA’s confirmation of expert judgments by Hjørland and Albrechtsen, Persson, and the Vickerys is consistent with the claim that citation databases can be exploited for non-experts in a form of AI.
The axes in INDSCAL maps are not subject to rotation and are supposed to be maximally interpretable. Thus prompted, we think the horizontal axis conveys, as in past studies, the range of subject specialties within the subdisciplines of domain analysis and information retrieval. ... Coherent groups from left include the citationists, the arc of bibliometricians across the top and the philosophically orienting figures across the bottom, “generalist” writers such as Smith, Wilson, Saracevic, and Swanson, and the hard and soft retrievalists. The plot generally makes good sense. For example, it is easy to accept Bookstein, Tague-Sutcliffe, Kantor, Buckland, Vickery, and Shaw as transitional figures between the retrievalists and the bibliometricians.
The more interesting vertical axis reflects another subject-related continuum. Information science deals, we said earlier, with “the human–computer–literature interface.” If so, then the top pole represents a relative emphasis on literatures as objects of study, and the bottom, a relative emphasis on people or users. The same polarity can be inferred in earlier maps. Figure 4 showed that when a literature theoretician like Egghe enters, it is automatically at the top, whereas a user theoretician like Dervin is automatically placed at the bottom.
However, INDSCAL is expressly designed to reveal differences in the importance of each dimension to whoever is judging the similarity of stimuli. In our use of INDSCAL, the stimuli are the 75 authors, and the three periods are regarded as three separate “judges.”
Usually, of course, persons are the judges in INDSCAL studies, and the “derived subject weights,” which are standard INDSCAL output, are taken to show the salience of each dimension to each person. In replacing individuals as judges with large numbers of citers, we are acting as if the citers collectively embodied the paradigm of information science in each 8-year period.
Accordingly, we interpret the derived subject weights for each period as indicating the relative importance of the dimensions within the paradigm. Thus, we can probe a hidden aspect of disciplinary history—whether key dimensions of the field were given about the same weight in all periods. If not, that would be consistent with a perception of paradigm shift.
Substantively, it is as if during 1972–1979 citers had regarded the range of specialties as by far the most important part of the information science paradigm, but then during 1980–1987 had taken much more cognizance of the differences in authors’ orientation toward literatures or users.
Perhaps the main weakness of this INDSCAL measure is that it is so indirect—that is, not clearly connected to specific papers with specific claims about the world. One expects evidence of paradigm shifts to leap from main texts, not references; from writers, not citers.
Though it might be used to discover paradigm shift, we think it has more promise as a means of confirming one. ... A shift detectable there implies not only that authors are promoting new lines of inquiry, but that citers are responding in such a way that the overall map of the discipline is changed.
Toward that account, ACA simultaneously provides both breadth and focus. It provides breadth by forcing contemplation of multiple specialties... It provides focus by forcing contemplation of particular authors, which is to say particular oeuvres and works. It also provides crude but unmistakable evidence of intellectual change.
“The role of information science is to explicate the conceptual and methodological foundations on which existing systems are based” (Borko, 1968, p. 67). Or “Information science is the study of the means by which organised structures (which we call ‘information systems’) process recorded symbols to meet their defined objectives” (Hayes, 1985, p. 174).
What they do study empirically, and uniquely, are problems associated with the human–literature barrier—the special difficulties of obtaining answers to questions from publications, in any medium, rather than persons. In other words, while many scholars seek to understand communication between persons, information scientists seek to understand communication between persons and certain valued surrogates for persons that literatures comprise (White, 1992).
This study requires a conceptual scheme that encompasses properties not only of literatures (e.g., size, growth rate, age, dispersion, authority levels, degree of summarization, quality of indexing) but also of people (e.g., interests and concerns, vocabularies, social ties, knowledge of existing systems, search styles, editorial strategies, resource environments).
The bond between domain analysts and retrievalists is their common interest in the literature barrier and related phenomena on both sides. The barrier in action is exemplified by information overload and underload—recurring topics for authors in both subdisciplines because they require both literatures and users to be discussed in a single framework, as implied by the second dimension of our maps.