Monday, January 27, 2014

van den Besselaar, P. (2001). The cognitive and the social structure of STS. Scientometrics, 51(2), 441-460.


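This study uses author co-citation analysis to examine the social structure of the STS field and to identify the authors and research institutions that link its sub-fields. It divides STS into the quantitative STS sub-field, the qualitative STS sub-field, and the policy oriented sub-field. The journal Scientometrics represents quantitative STS, Social Studies of Science and Science, Technology and Human Values represent qualitative STS, and Research Policy represents STS policy research. For the 229 authors cited more than 25 times in these journals between 1986 and 1997, a co-citation matrix is built and factor-analyzed to see how the authors group together, and the resulting groups are compared with the sub-fields above. The study also examines citations that cross sub-field borders, collaboration between authors from different sub-fields, and how many authors publish in more than one sub-field.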

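Table 1 shows that 762, 305, 304, and 569 authors published in Scientometrics, Social Studies of Science, Science, Technology and Human Values, and Research Policy, respectively. The average number of authors per article is 1.5 in Scientometrics and 1.7 in Research Policy, higher than the 1.1 in Social Studies of Science and Science, Technology and Human Values. In total, 1756 authors published in the four journals, and the overall average is 1.4 authors per article.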


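A total of 759 institutions published in the four journals, but only a few were highly productive: for example, only 41 institutions published more than 11 papers. Table 2 also shows that some highly productive institutions published in journals from different sub-fields.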

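Authors from 65 countries published in the four journals. Most of these countries (57) published in Scientometrics, whereas only about half have a publication record in each of the other three journals.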

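Factor analysis of the author co-citation matrix yields seven major factors, each named after the content of its authors' publications. Factors 1 and 6 are closely related: about half of the authors in Factor 1 have their second-highest loading on Factor 6, and all authors in Factor 6 have a high second loading on Factor 1. Both factors cover science and technology policy related STS research. Factor 2 covers qualitative STS. Factors 3, 4, 5, and 7 share authors with one another and can together be regarded as quantitative STS; more specifically, Factors 4, 5, and 7 cover the sociology of science, co-word analysis, and informetrics, respectively.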

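Next, Table 4 analyzes the specialists of each sub-field and the generalists who are active in two or more sub-fields. The study defines a specialist as an author whose publications in a sub-field account for at least about 0.7% of that sub-field's papers (thresholds of 0.77%, 0.69%, and 0.73% for the quantitative, qualitative, and policy oriented sub-fields). The quantitative, qualitative, and policy oriented sub-fields have 31, 23, and 41 specialists, respectively. Six quantitative specialists also published qualitative papers, whereas only two qualitative specialists also published quantitative papers. Fourteen quantitative specialists also published policy oriented papers, and eleven policy oriented specialists also published quantitative papers. These figures show that the links between quantitative STS and the other two sub-fields are maintained mainly by quantitative researchers; in other words, they are the main boundary spanners.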


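Table 4 also reveals some researchers who cross from the qualitative sub-field into policy oriented research, a finding that modifies the earlier view that the qualitative sub-field is relatively isolated.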

The differentiation of scientific fields into sub-fields can be studied on the level of the ‘scientific content’ of the sub-field, that is on the level of the products, as well as on the level of the ‘social structures’ of the sub-field, that is on the level of the producers of the content.

By comparing the behavior of the constructs with the behavior of the constructors, we are able to demonstrate the analytical distinction between a cognitive and a social approach in an empirical way.

Although we are able to distinguish analytically between the cognitive and social dimension of the development of the research field, we find similar patterns of differentiation on the social level too. At the same time, this differentiation differs in some respects from the cognitive differentiation pattern.

Consequently, the social and the cognitive dimensions of the STS field are not independent – as no serious STS scholar would argue – but also not identical, as radical constructivists claim, but are strongly interacting.

It was claimed that scientometrics has to focus more on the role it can play for qualitative STS, and that scientometric researchers should refrain from sterile data and mathematics. It was felt that scientometric results have to be carefully interpreted from a substantial perspective, to be meaningful for S&T policy.

There, we showed that the journals Social Studies of Science (SSS) and Science, Technology and Human Values (STHV) form a reasonable operationalization of the qualitative STS sub-field. Research Policy represents the policy oriented sub-field, and Scientometrics can be used as a representation of the quantitative STS sub-field. These journals are central in STS as they have the highest impact factors in their respective sub-fields.

In this paper we will use the same boundary of STS to analyze the social structure of the field: who are the authors and what are the research groups in the field as defined by the mentioned journals? Do they function as the ties between the various sub-fields?

Data about authors and institutional affiliation can be found on the CD-ROM version of the Social Science Citation Index (SSCI). We downloaded the full records for all publications in the four journals for the period 1986-1997. This resulted in a database with 3579 records. ... Finally, as is usual in scientometric studies, for further analysis we restricted the database to Articles, Reviews, Notes, and Letters, and excluded other document types. This resulted in a final set of 1787 documents.

Referring to a text may indicate the use of a knowledge claim to support one’s own position, or to oppose it. Referring to persons, on the other hand, may indicate the existence of a social relationship. Therefore we will use author co-citation analysis as a first methodology to analyze the social structure of the STS field. In this way, we will describe the STS field in terms of clusters of authors that are placed near each other by the scholars active in the field.

Using the prepared database and bibexcel, an author co-citation matrix has been produced of all cited 229 authors with more than 25 citations over the 1986-1997 period. Factor-analyzing (principal component analysis, varimax rotation with Kaiser normalization) this matrix results in clusters of authors, and the question is whether these clusters differ from the three sub-fields of qualitative, quantitative, and policy oriented STS.
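As a rough illustration of how such an author co-citation matrix can be built (not the bibexcel workflow used in the paper), the sketch below counts, for every pair of cited authors, how many citing papers cite both. The function names, the `citation_threshold` parameter, and the toy data are assumptions made for this example; counting each citing paper once per author is also an assumption, since the excerpt does not say how the 25 citations were counted.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for each pair of cited authors, the citing papers that cite both."""
    pair_counts = Counter()
    for cited_authors in reference_lists:
        # Each citing paper contributes at most once per author pair.
        for a, b in combinations(sorted(set(cited_authors)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def cocitation_matrix(reference_lists, citation_threshold=25):
    """Return (authors, matrix) for authors cited more than citation_threshold times."""
    # Assumption: an author is counted once per citing paper.
    citation_counts = Counter(a for refs in reference_lists for a in set(refs))
    authors = sorted(a for a, n in citation_counts.items() if n > citation_threshold)
    index = {a: i for i, a in enumerate(authors)}
    matrix = [[0] * len(authors) for _ in authors]
    for (a, b), n in cocitation_counts(reference_lists).items():
        if a in index and b in index:
            matrix[index[a]][index[b]] = n
            matrix[index[b]][index[a]] = n  # the matrix is symmetric
    return authors, matrix


# Toy data (hypothetical): each inner list holds the authors cited by one paper.
papers = [
    ["Merton", "Price", "Garfield"],
    ["Merton", "Price"],
    ["Latour", "Callon"],
    ["Latour", "Callon", "Merton"],
]
authors, matrix = cocitation_matrix(papers, citation_threshold=1)
for author, row in zip(authors, matrix):
    print(f"{author:8s}", row)
```

The resulting symmetric matrix is the kind of input that a factor analysis would then take; that step is left out here because it needs a numerical library.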

If a communication system shows considerable segregation, individual researchers (or institutes) could play the role as weak ties [3] between the sub-fields.

(i) Authors can refer to materials from other sub-fields. We classify these authors as being active on the borders of the sub-fields. The border between sub-fields A and B is then defined as the authors of papers in sub-field A referring to papers in sub-field B, and the other way around. How densely populated are the borders between the sub-fields?

(ii) Authors can cooperate with colleagues active in the other sub-fields. Even if authors specialize, research groups and institutions may cover more sub-fields, and this could indicate social integration of the field on a more informal level of communication.

(iii) Generalist authors work in various sub-fields. Do many authors publish in more than one sub-field, or do we see a specialization and differentiation on the level of individual scholars? How many generalists can be found among researchers and institutions? The larger numbers we find, the stronger is the degree of communication between the sub-fields.



The average number of authors per article is 1.4, and this figure is higher in Scientometrics (1.5) and in Research Policy (1.7) but considerably lower (1.1) in the two qualitative STS journals.



As expected, the number of frequently publishing institutes is rather small, compared to the grand total.

If we aggregate one more step, to the level of countries, we find 65 countries active in the STS field, of which some 57 are active within scientometrics. However, only half of the countries are publishing in the qualitative journals SSS and STHV. The same is true for Research Policy.

Factor analyzing the author co-citation matrix resulted in a solution of 22 factors with an eigenvalue larger than 1. Inspecting the scree plot shows that seven factors dominate the structure, and these factors explain more than 70% of the total variance. More than 90% of the 220 cited authors have their highest factor score on one of these seven factors.



The authors in Factor 1 are within science & technology policy studies and in research & innovation management studies, or in related fields in management and economics. The same holds for the small Factor 6. Half of the authors in Factor 1 have a relatively high second factor loading in Factor 6, and all authors with their highest loading on factor 6 do have a high second loading on Factor 1.

Authors with their highest factor loading on Factor 2 all belong to qualitative STS, and they generally do not load on other factors.

Factor 3 represents quantitative STS. Most authors with the highest loading on Factor 4 can be characterized as traditional sociology of science (e.g., Merton). Factor 5 represents coword analysis, and Factor 7 represents informetrics and scientometric distributions (e.g., Bradford and Lotka). Between the Factors 3, 4, 5, and 7 we find a considerable ‘interfactorial complexity’: the authors loading highest on Factor 3 often have a substantial second loading on one of the Factors 4, 5, or 7. The same is true the other way around.

Therefore I also created the author co-citation matrix of all authors with more than 25 citations over the whole period with the highest loading on the Factors 3, 4, 5, or 7. Authors that have a second loading on these factors of more than 0.2 are also included. This set of authors represents the sub-field scientometrics.

Factor-analyzing this matrix in a similar way results in seven substantial factors. Inspection of the factors shows that they represent the following research foci: Policy oriented scientometrics (Factor 1); Empirical science & technology studies (Factor 2); Coword analysis (Factor 3); Scientometric distributions (Factor 4); Critique of scientometrics (Factor 5); Patent studies (Factor 6); Economics of technical change (Factor 7). This result corroborates that the method is suited for analyzing the fine structure of research fields.

If we now summarize these findings, the factor-structure of the co-citation matrix of STS reproduces the clear split between policy oriented STS (Factor 1 plus 6), qualitative STS (Factor 2), and quantitative STS (Factors 3, 4, 5, 7), while at the same time showing some internal differentiation in the sub-field of scientometrics. In other words, the author co-citation analysis reveals a similar structure as the journal-journal citation analysis did [8].

Firstly, we distinguish the groups of specialists, which consist of the authors with relatively high numbers of publications in one of the various sub-fields of STS. We consider a scholar as specialist in one of the sub-fields, if he or she is (co-) author of at least 6, 4, or 3 publications respectively in quantitative, qualitative, or policy oriented STS. In this way, the threshold is about the same in the three sub-fields: 0.77%, 0.69%, and 0.73%.

Secondly, we have the semi-generalists, the groups of authors active in two of the three sub-fields each. A semi-generalist is defined as an author who has published at least two publications in two of the three sub-fields.

Finally we have the group of generalists, publishing in all the three sub-fields, again based on at least two publications per sub-field.
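The three definitions above can be sketched as a small classification function. The thresholds (6, 4, and 3 publications for specialists, 2 publications per sub-field for breadth) come from the quoted text; the function name, the dictionary keys, and the reading of "at least two publications in two of the three sub-fields" as two publications in each of two sub-fields are assumptions of this example.

```python
# Specialist thresholds quoted above: at least 6, 4, or 3 publications.
SPECIALIST_THRESHOLDS = {"quantitative": 6, "qualitative": 4, "policy": 3}
# Assumed reading: at least 2 publications in each sub-field counted for breadth.
BREADTH_THRESHOLD = 2


def classify_author(publication_counts):
    """Return (specialisms, breadth) for one author's per-sub-field counts."""
    specialisms = sorted(
        field
        for field, minimum in SPECIALIST_THRESHOLDS.items()
        if publication_counts.get(field, 0) >= minimum
    )
    active = [
        field
        for field in SPECIALIST_THRESHOLDS
        if publication_counts.get(field, 0) >= BREADTH_THRESHOLD
    ]
    if len(active) == 3:
        breadth = "generalist"
    elif len(active) == 2:
        breadth = "semi-generalist"
    else:
        breadth = "neither"
    return specialisms, breadth


# Hypothetical author: 7 papers in Scientometrics, 2 in Research Policy.
print(classify_author({"quantitative": 7, "policy": 2}))
```

Under these assumptions an author can be a specialist in one sub-field and a semi-generalist at the same time, which is how the boundary spanners discussed below would show up.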




The number of specialists in Scientometrics is 31, and only six of them have published in SSS or STHV. The other way around we identified only 2 authors. This implies that the more quantitative researchers maintain the relations between these two sub-fields.

The number of Scientometrics authors also publishing in Research Policy is much higher, and some 45% of the scientometrics specialists also work – at least incidentally – on S&T policy topics. Researchers frequently publishing in Research Policy publish a little less (27%) in Scientometrics, but this is still a substantial number.

This underlines our earlier conclusion that research policy and management is related to scientometrics for the part of using scientometrics in research evaluation, but not much wider [8].

Between Scientometrics and Research Policy, as well as between Scientometrics and SSS/STHV, most of the authors who maintain the relation have most publications in Scientometrics, and generally only a single publication in one of the other journals. This implies that the relations between the sub-fields (also the very weak ones) are maintained to a large extent by scientometricians.

Between Research Policy and SSS/STHV the picture is more balanced, with a weak emphasis on the SSS/STHV authors. The number of authors publishing both in qualitative STS and S&T policy studies is very low, as is the number of authors publishing both in quantitative STS and in qualitative STS. Only the number of authors publishing in both quantitative STS and S&T policy studies is substantial.

Lowering the threshold increases the number of (semi-)generalists, but of course most of them have a very low number of publications, and the scientometricians are the boundary spanners, much more than the others.

However, a larger number of qualitative authors than expected is also active in the S&T policy studies. Only this latter finding modifies slightly our earlier conclusion that qualitative STS is an isolated sub-field.

We use a 3% threshold, and various organizations that exceed this threshold in one of the sub-fields are in Table 6. Three of the eight organizations are specialized in only one sub-field. Four others are specialized in two sub-fields, and only one organization is a generalist one, and active in three sub-fields.

In other words, there is a relatively low level of specialization here, as most of the institutions seem to be rather active in more sub-fields.

If we decrease the threshold to 2%, another 15 institutions count as specialists. However, of these 15 institutions only a few are active in more sub-fields. This implies that the most productive institutions within STS are also the broadest in their covering of the field.

Where the cognitive analysis showed that the relationship between scientometrics and S&T policy studies is stronger than the relations between qualitative and quantitative STS [8], on the level of the conferences (and as shown before, on the level of research institutes) it is the other way around. In other words, the institutional structures and the cognitive structures are not identical.

If we summarize the findings, we see that the cognitive patterns of integration and (mainly) differentiation to a large extent are visible within the social structure of the field.

The social relations between quantitative STS and policy oriented STS are similar to the cognitive relations between the two sub-fields. The links, however, between the two sub-fields are only between a substantial part of scientometrics and a small part of S&T policy studies, namely the part focusing on evaluation and performance studies.

The larger part of S&T policy studies is on technological innovation and on evolutionary approaches to technical change, and these research topics are not related to the research front in scientometrics, as the author co-citation analysis underlines.

Most importantly, we found that the interaction between qualitative and policy oriented STS is much stronger on the social level of authors and institutions than on the cognitive level of documents.

This may explain why the discussants in the panel session quoted earlier in this paper saw different divides than the ones I revealed in Ref. 8: the social structure of the STS field is not identical to its cognitive structure.

Within the mainstream of STS it is generally accepted that the production of knowledge and the grounding of knowledge claims consists of a ‘seamless web’ of cognitive and social elements.

Sunday, January 26, 2014

Glenisson, P., Glänzel, W., Janssens, F., & De Moor, B. (2005). Combining full text and bibliometric information in mapping scientific disciplines. Information Processing & Management, 41(6), 1548-1572.


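This study uses co-word analysis to group the papers published in the journal Scientometrics in 2003 into six clusters. To assess the validity of the clustering, the result is compared with an expert classification, and the clusters are also analyzed with bibliometric indicators.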

Braam, Moed, and Van Raan (1991) suggested using word analysis to evaluate the results of co-citation cluster analysis; the words were based on the indexing terms and classification codes in bibliographic records.

For the expert classification, the study draws on the six categories of Schoepflin and Glänzel (2001): Mathematical models/informetric laws, Case studies, Advances in Scientometrics, Indicator engineering, Sociological approaches, and Policy relevant issues. With the recently emerged Webometrics added, the categories used for the expert classification are: Advances in Scientometrics (A), Empirical papers/case studies (E), Mathematical models (M), Political issues (P), Sociological approaches (S), and Informetrics/Webometrics (I). The table below compares the co-word clusters with the expert classification:

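Apart from the larger categories A and E, which are spread over several clusters, the smaller categories are mostly concentrated in one or two clusters.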

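Based on the expert categories of the papers in each cluster and on their term networks, the six clusters can be characterized as follows. Cluster 1 is methodological indicator research on the bibliometric indicators that measure publication activity and citation impact. Cluster 2 consists mostly of case studies and empirical papers on national and institutional aspects or on science fields. Cluster 3, like Cluster 1, contains papers on theoretical and methodological questions, but focuses on more advanced techniques such as informetric laws, frequency distributions, and multivariate statistics. Cluster 4 covers webometrics and other network-related issues. Cluster 5 is the smallest, with only 3 papers, which deal with co-citation analysis and the analysis of other citation statistics. Cluster 6 is the largest and covers a wide range of topics, from sociology and policy to technology. The analysis shows that methodological and empirical research in scientometrics each have two main focuses: one based on standard scientometric techniques, and one that broadens the scope of traditional bibliometrics.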

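Next, the clustering results are analyzed with bibliometric characteristics such as mean reference age and share of serials, as shown in the figure below. Webometrics is characterized by a low reference age and a medium to high share of serials. Most policy-related papers have a relatively low share of serials, but another group has a clearly higher share, so these papers form two sub-clusters in the plot. For Advances in Scientometrics, with a few exceptions, most papers have a mean reference age between 5 and 15 years and a share of serials between 50% and 90%. Empirical papers split into two groups by share of serials, one low (<=55%) and one high (>=67%); the low group has characteristics similar to those of the policy-related papers.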
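The two indicators can be computed per paper as in the minimal sketch below. It assumes the usual definitions, which the notes do not spell out: mean reference age is the average of publication year minus cited year over a paper's references, and share of serials is the fraction of references that point to serials. The function names and the toy reference list are hypothetical; the 55% and 67% cut-offs are the ones reported for the empirical papers.

```python
def mean_reference_age(publication_year, reference_years):
    """Average age of the cited references (assumed definition)."""
    if not reference_years:
        return None
    ages = [publication_year - year for year in reference_years]
    return sum(ages) / len(ages)


def share_of_serials(serial_flags):
    """Fraction of references that point to serials (assumed definition)."""
    if not serial_flags:
        return None
    return sum(1 for flag in serial_flags if flag) / len(serial_flags)


def serial_share_group(share):
    """Label a share with the cut-offs reported for the empirical papers."""
    if share <= 0.55:
        return "low"
    if share >= 0.67:
        return "high"
    return "intermediate"


# Hypothetical 2003 paper: (cited year, reference is a serial) per reference.
references = [(2001, True), (1999, True), (1993, False), (2002, True)]
age = mean_reference_age(2003, [year for year, _ in references])
share = share_of_serials([is_serial for _, is_serial in references])
print(age, share, serial_share_group(share))
```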

The question how bibliometric measures can, in turn, be assumed to reflect formal characteristics of documented scientific communication that might supplement results obtained from content-based analyses could also be answered in a positive way. Reference-based citation measures can help to fine-structure clusters determined on basis of co-word analysis.

Braam, Moed, and Van Raan (1991) suggested combining co-citation with word analysis in the context of evaluative bibliometrics to improve efficiency of co-citation clustering. The word analysis by Braam et al. used publication ‘‘word-profiles’’ that were based on indexing terms and classification codes.

Not much later, Noyons and Van Raan (1994) and Zitt and Bassecoulard (1994) demonstrated the appeal of plunging into contents by using keywords from both patent and scientific literature to characterise the science-technology linkage.

The study by Schoepflin and Glänzel aimed at monitoring and characterising structural changes in the research profile in bibliometrics in the period 1980–1997. The authors created six categories: Mathematical models/informetric laws, Case studies, Advances in Scientometrics, Indicator engineering, Sociological approaches and Policy relevant issues. The term Webometrics did not yet appear in this scheme since at that time it was not yet established as a sub-discipline of scientometrics/informetrics.





We see classes S, M, I and P, admittedly all of smaller size, moderately to well conserved in the text-based cluster structure. Conversely, papers assigned to the larger classes A and E are heavily shifted around the text clusters.

The map in Fig. 6 represents the content structure of cluster 1 with altogether 9 papers. This cluster represents publications that are concerned with methodological questions related to bibliometric indicators. Indicator-related terms such as indicator names and terms relevant in the context of measuring publication activity and citation impact are close to the centre, and strongly interlinked. ... One could consider this cluster representing methodological indicator research.



Cluster 2 is dominated by empirical papers and case studies (cf. Table 3). ... The terms in this map are presented in Fig. 7 and relate above all to national and institutional aspects as well as to science fields. This is the cluster of case studies and traditional bibliometric applications.



Cluster 3 is a second theoretical/methodological cluster. Unlike the first one, this cluster relates to more advanced methodological techniques, such as informetric laws, frequency distributions and multivariate statistics. This cluster could be characterised as theoretical and mathematical issues in bibliometrics. The term structure is presented in Fig. 8.




Cluster 4 presented in Fig. 9 clearly represents webometrics and network-related issues. All terms are strongly interlinked. This cluster corresponds by and large to the category of Webometrics/Informetrics.



Cluster 5 with 3 papers is the smallest one. Co-citation analysis and the analysis of other citation statistics are the topic of these papers. The term structure (cf. Fig. 10) reflects the statistical vocabulary used in these studies. This cluster covers specific applications of statistical methods.



The last cluster with 30 papers (see Fig. 11) is by far the largest one. It comprises technology and innovation related studies, the science-technology interface and almost the complete Triple Helix issue can be found here (cf. Table 3). Also the sociological approaches are covered by this cluster. This cluster can be considered a borderland of classical scientometrics, namely the interdisciplinary approaches such as sociological, policy relevant and technology related issues.



The two large categories A and E covering 65% of all papers proved heterogeneous. Category A has (jointly with category M) three sub-clusters, namely, Cluster 1, 3 and 6, whereas Category E falls apart into three other sub-clusters: Cluster 2, 5 and 6. Policy relevant issues are also covered by clusters 2 and 6. Only Category I is represented by a corresponding co-word cluster, namely cluster 4.

The full text analysis substantiates that both methodological and empirical research have nowadays at least two different main focuses each, one is based on scientometric standard techniques such as classical indicators, the other ones are clearly broadening the scope of traditional bibliometrics.



As already seen in the pilot study, Webometrics is characterised by low reference age and medium–high share of serials (cf. Glenisson et al., 2005).

Most of the policy related issues are characterised by relatively low share of serials. Nevertheless, there is a group of papers with clearly higher share, too. This confirms the results of the full text analysis, namely that this category practically forms two sub-clusters.

The category Advances in Scientometrics proves strikingly homogeneous with several outliers only. Most of the A-class papers have, however, a mean reference age ranging between 5 and 15 years, with medium–high share of serials ranging between 50% and 90%.

The empirical groups proved heterogeneous, indeed. Regarding the share of serials this class forms two distinct sub-classes, particularly, one with low share (<=55%) and one with relatively high share (>=67%). The class with lower share has similar characteristics as the policy relevant class.


Saturday, January 25, 2014

Janssens, F., Leta, J., Glänzel, W., & De Moor, B. (2006). Towards mapping library and information science. Information Processing & Management, 42(6), 1614-1642.


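This study uses co-word analysis to distinguish six research topics in library and information science (LIS): two bibliometrics topics, one information retrieval topic, one covering general issues, one webometrics topic, and one patent studies topic.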

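Co-word analysis describes the content of documents through the co-occurrence of words in them, and uses the relative intensity of these co-occurrences to represent a field's concept networks. The technique has been used to study concept networks in several fields, including plant biology (de Looze and Lemarie, 1997), condensed matter physics (Bhattacharya and Basu, 1998), chemical engineering (Peters and van Raan, 1993), information retrieval (Ding, Chowdhury, and Foo, 2001), and medicine (Onyancha and Ocholla, 2005). Van Raan and Tijssen (1993) discussed the epistemological potentials of bibliometric mapping based on co-word analysis. Compared with co-citation analysis, co-word analysis can be applied to data without citation indexes, and co-citation analysis complicates the combined analysis of field dynamics and trends in the actors' activity (Noyons & van Raan, 1998). Leydesdorff (1997) argued that the meaning of words changes with the frequency of their relations to other words and with their position in a text; Courtial (1998), however, replied that words in co-word analysis are not linguistic units that carry meaning but merely indicators of links between texts.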

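The study lists several works that apply text-based bibliometric methods to the research topics of LIS. Courtial (1994) studied the dynamics of scientometrics with a co-word analysis of titles and abstracts; the broader LIS field comprises traditional library science, information retrieval, scientometrics, informetrics, patent analysis, and the recently emerging webometrics. Glänzel and colleagues combined full-text based structural analysis with traditional bibliometric methods to study bibliometrics and its subdisciplines (Glenisson, Glänzel, and Persson, 2005; Glenisson, Glänzel, Janssens, and De Moor, 2005; Janssens, Glenisson, Glänzel, and De Moor, 2005).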

The analysis techniques used in this study include text extraction, preprocessing, multidimensional scaling, and Ward's hierarchical clustering; document similarity is estimated with the vector space model (Salton & McGill, 1986) and latent semantic analysis (Deerwester et al., 1990). The papers are mapped onto a two-dimensional plot according to their pairwise similarity, with each paper's journal marked, as shown below:

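Papers from Scientometrics are located mainly around the two ellipses labeled 1 and 2; ellipse 1 covers bibliometrics and ellipse 2 patent analysis. The papers in ellipse 5 come mainly from Information Processing and Management and Journal of the American Society for Information Science and Technology, and their topic is information retrieval. The papers in ellipse 12 lean toward social topics and come from Journal of Information Science and Journal of Documentation as well as Journal of the American Society for Information Science and Technology. Ellipse 14, at the center, covers web-related topics, and every journal has papers on them.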

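Ward's clustering is used to group all papers, and the optimum solution has six clusters. Each cluster is named after the important terms of its papers and its central paper. The two-dimensional plot labeled with the clusters is shown below: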

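A diagonal line in the plot splits the six clusters into two groups: Bibliometrics1, Bibliometrics2, and Patent Analysis lie below it, and Webometrics, Information Retrieval, and Social Aspects lie above it. Of the six clusters, Patent Analysis is the most separated from the others. Bibliometric papers fall into two clusters: Bibliometrics1 relates to topics such as collaboration in science, citation analyses, and national research performance, whereas Bibliometrics2 mainly contains papers on methodology and bibliometric theory.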

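To find the topics each journal emphasizes, the study compares the two plots above and also maps the relations between clusters and journals. Information Processing and Management almost coincides with the Information Retrieval cluster, which indicates that its papers are closely related to information retrieval. Social Aspects and Webometrics lie close to three journals: Journal of the American Society for Information Science and Technology, Journal of Information Science, and Journal of Documentation. In fact, Social Aspects is roughly equidistant from all journals except Scientometrics. Finally, Scientometrics lies at the center of the triangle formed by Bibliometrics1, Bibliometrics2, and Patent Analysis.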

The optimum solution for clustering LIS is found for six clusters. The combination of different mapping techniques, applied to the full text of scientific publications, results in a characteristic tripod pattern. Besides two clusters in bibliometrics, one cluster in information retrieval and one containing general issues, webometrics and patent studies are identified as small but emerging clusters within LIS.

The method was developed by Callon, Courtial, Turner, and Bauin (1983), more than two decades ago, for purposes of evaluating research. The methodological foundation of co-word analysis is the idea that the co-occurrence of words describes the contents of documents. By measuring the relative intensity of these co-occurrences, simplified representations of a field’s concept networks can be illustrated (Callon, Courtial, & Laville, 1991).

Van Raan and Tijssen (1993) have discussed the ‘‘epistemological’’ potentials of bibliometric mapping based on co-word analysis.

Leydesdorff (1997) analysed 18 full-text articles and sectional differences therein, and considered that the subsumption of similar words under keywords assumes stability in the meanings, but that words can change both in terms of frequencies of relations with other words, and in terms of positional meaning from one text to another. This fluidity was expected to destabilize representations of developments of the sciences on the basis of co-occurrences and co-absences of words.

However, Courtial (1998) replied that words, in co-word analysis, are not used as linguistic items to mean something, but as indicators of links between texts.

Many researchers have used this methodology to investigate concept networks in different fields, among others, de Looze and Lemarie (1997) in plant biology, Bhattacharya and Basu (1998) in condensed matter physics, Peters and van Raan (1993) in chemical engineering, Ding, Chowdhury, and Foo (2001) in information retrieval (IR) and Onyancha and Ocholla (2005) in medicine.

The reason why the emphasis has shifted from co-citation analysis to co-word techniques is twofold. The first reason is a practical one; co-word analysis allows application to non-citation indexes as well. The second relates to methodology; co-citation analysis complicates the combined analysis of field dynamics and trends in the actors’ activity (Noyons & van Raan, 1998).

Bonnevie (2003) has used primary bibliometric indicators to analyse the Journal of Information Science, while He and Spink (2002) compared the distribution of foreign authors in Journal of Documentation and Journal of the American Society for Information Science and Technology.

Bibliometric trends of the journal Scientometrics, another important journal of the field, have been examined by Schubert and Maczelka (1993), Wouters and Leydesdorff (1994), Schoepflin and Glänzel (2001), Schubert (2002), Dutt, Garg, and Bali (2003).

The main journals of the field were also analysed in terms of journal co-citation and keyword analyses (Marshakova, 2003; Marshakova-Shaikevich, 2005).

The co-citation network of highly cited authors active in the field of IR was studied by Ding, Chowdhury, and Foo (1999).

Finally, Persson (2000, 2001) analysed author co-citation networks on basis of documents published in the journal Scientometrics.

Courtial (1994) has studied the dynamics of the field by analysing the co-occurrence of words in titles and abstracts. Courtial described scientometrics as a hybrid field consisting of invisible colleges, conditioned by demands on the part of scientific research and end-users. Although this situation might have somewhat changed during the last decade, this conclusion illustrates how heterogeneous the much broader field of LIS – comprising subdisciplines such as traditional library science, IR, scientometrics, informetrics, patent analyses and most recently the emerging specialty of webometrics – nowadays is.

In recent papers, Glenisson, Glänzel, and Persson (2005), Glenisson, Glänzel, Janssens, and De Moor (2005), Janssens, Glenisson, Glänzel, and De Moor (2005) have applied full-text based structural analysis in combination with ‘‘traditional’’ bibliometric methods to bibliometrics and its subdisciplines.

The full-text analysis consisted of text extraction, preprocessing, multidimensional scaling, and Ward’s hierarchical clustering (Jain & Dubes, 1988).

In short, the textual information is encoded in the vector space model using the TF-IDF weighting scheme, and similarities are calculated as the cosine of the angle between the vector representations of two items (see Salton & McGill, 1986; Baeza-Yates & Ribeiro-Neto, 1999).
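The encoding described above can be sketched in a few lines. The example assumes one common TF-IDF variant (raw term frequency times log(N / document frequency)); the paper's exact weighting and preprocessing are not given in these notes, and the function names and toy documents are hypothetical. The stemmed terms are borrowed from the term lists quoted later in this post.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Map each tokenized document to a sparse {term: TF-IDF weight} vector."""
    n_docs = len(documents)
    document_frequency = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        term_frequency = Counter(doc)
        vectors.append({
            # Assumed variant: raw term frequency times log(N / df).
            term: tf * math.log(n_docs / document_frequency[term])
            for term, tf in term_frequency.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy documents (hypothetical) built from stemmed terms.
docs = [
    ["citat", "impact factor", "bibliometr"],
    ["citat", "patent", "inventor"],
    ["queri", "search engin", "web"],
]
vectors = tfidf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

The first two documents share the term `citat` and get a small positive similarity, while the third shares no terms with the first and gets a similarity of zero.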

The term-by-document matrix A is again transformed into a latent semantic index Ak (LSI), an approximation of A, but with rank k much lower than the term or document dimension of A. A latent semantic analysis is advisable, especially when dealing with full-text documents in which a lot of noise is observed.

One advantage of LSI is the fact that synonyms or different term combinations describing the same concept are mapped on the same factor, based on the common context in which they generally appear (Berry et al., 1995; Deerwester et al., 1990).

A lot of time was devoted to the detection of phrases. Since the best phrase candidates can be found in noun phrases, the programs LT POS and LT CHUNK have first been applied to detect all noun phrases in the complete document collection.

MDS represents all high-dimensional points (documents) in a two- or three-dimensional space in a way that the pairwise distances between points approximate the original high-dimensional distances as precisely as possible (see Mardia, Kent, & Bibby, 1979).

The agglomerative hierarchical cluster algorithm using Ward’s method (see Jain & Dubes, 1988) was chosen to subdivide the documents into clusters. ... One of the disadvantages of agglomerative hierarchical clustering is that wrong choices (merges) that are made by the algorithm in an early stage can never be repaired (Kaufman & Rousseeuw, 1990). What we sometimes observe when using hierarchical clustering is the forming of one very big cluster and a few small very specific clusters.
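A naive pure-Python sketch of agglomerative clustering with Ward's criterion is given below. It is meant only to show the merge rule, not the authors' implementation: at each step it merges the two clusters whose union increases the within-cluster sum of squares the least, which for clusters A and B equals |A||B| / (|A| + |B|) times the squared distance between their centroids. The function names and the toy points are assumptions, and the approach is far too slow for real document collections.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def ward_clustering(points, n_clusters):
    """Greedily merge clusters by Ward's criterion; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > max(n_clusters, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                ca = centroid([points[k] for k in a])
                cb = centroid([points[k] for k in b])
                # Increase in within-cluster sum of squares if a and b merge.
                cost = len(a) * len(b) / (len(a) + len(b)) * squared_distance(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        # A merge is never undone, which is the drawback noted above.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy 2-D points (hypothetical), e.g. document coordinates after MDS.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(ward_clustering(points, 3))
print(ward_clustering(points, 2))
```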

The journal Scientometrics can be largely separated from the other journals (which is also confirmed by the different term profile in the table of Appendix 1), and exhibits two different foci (best visible in Fig. 4).



The first “leg”, indicated by the ellipse with number 1 and by and large containing the first focus of the journal Scientometrics, clearly contains papers in bibliometrics. The 10 best TF-IDF terms for “leg” #1 are: citat, cite, impact factor, self citat, co citat, scienc citat index, citat rate, isi, countri and bibliometr.

The second “leg of Scientometrics”, indicated by number 2, is characterised by the best terms patent, industri, biotechnolog, inventor, invent, compani, firm, thin film, brazilian and citat. The JIS paper (#3) embedded in this patent “leg” might be considered an outlier for that journal, but it was put in the right place since it is concerned with “The many applications of patent analysis” (Appendix 2: Breitzman & Mogee, 2002).

An important focus of LIS is indicated by ellipse #5 and can be profiled as “Information Retrieval” (IR) when looking at the highest scoring terms: queri, search engin, web, node, music, imag, xml, vector and weight.

The fourth distinguishable subpart of LIS (#12) is about digit, internet, servic, seek, behaviour, health, knowledg manag, organiz, social and respond; so encompassing the more social aspects.

The remaining large subpart is somewhat the central part (#14). It consists of papers leading to a mean profile containing the terms web, web site, classif, domain, web page, languag, scientist, region, catalog, and web impact factor.

The term network of Cluster 1 allowed the conclusion that the papers belonging to this cluster are concerned with domain studies, studies of collaboration in science, citation analyses, national research performance and similar issues.



The medoid is a paper by Persson et al. on “Inflationary bibliometric values: The role of scientific collaboration and the need for relative indicators in evaluative studies” (Appendix 2: Persson et al., 2004). This is a methodological paper with strong implications for research evaluation, combining research collaboration with citation analysis and construction of national science indicators.

The smaller bibliometrics cluster (Cluster 3: manually labelled as “Bibliometrics2”) is of more methodological/theoretical nature.




The medoid is the state-of-the-art report “Journal impact measures in bibliometric research” (Appendix 2: Glänzel & Moed, 2002).

The term networks for the two bibliometrics clusters just described contain a few overlapping terms (bibliometr, chemistri, citat, citat rate, cite, cluster, countri, impact factor, isi, physic, rank and scienc citat index). The MDS plot of Fig. 15 confirms that there is no clear border between Bibliometrics1 and Bibliometrics2, but that there is a gradual transition.

The almost tiny Cluster 2 (19 papers, Fig. 10) represents patent analysis.


A paper on “Methods for using patents in cross-country comparisons” forms the medoid of this cluster (Appendix 2: Archambault, 2002).

Cluster 4, with 282 papers, is the largest one. We have labelled it “Information Retrieval”.


The medoid paper is entitled “Querying and ranking XML documents” (Appendix 2: Schlieder & Meuss, 2002).

Cluster 5, with 62 papers, belongs to the small clusters. Both terms and papers close to the medoid characterise this cluster as “Webometrics”.


The medoid paper is entitled “Motivations for academic web site interlinking: evidence for the Web as a novel source of information on informal scholarly communication” (Appendix 2: Wilkinson et al., 2003).

Cluster 6 (213 papers) proved to be the most heterogeneous cluster. We have labelled it “Social”; however, we could also have called it “General & miscellaneous issues”.



“Approaches to user-based studies in information seeking and retrieval: a Sheffield perspective” is the title of the medoid paper (Appendix 2: Beaulieu, 2003).


The Patent cluster can be clearly separated from the rest of LIS. The subspace under the line is almost completely occupied by Bibliometrics1, Bibliometrics2 and Patent.




IR and IPM almost collide in this 2D projection (Fig. 20). This means that Cluster 4 (“IR”) is very close to the scope of this journal.

The “Social” cluster with general and miscellaneous topics as well as “Webometrics” are close to JIS, JDoc and JASIST, too. Moreover, the “Social” cluster is almost equidistant to all traditional journals in Information Science.

The remaining three clusters, namely Bibliometrics1, Bibliometrics2 and Patent, form a triangle in the centre of which the journal Scientometrics is located. The relatively large distances among these clusters and between each cluster and the journal, strongly indicate that a quite large spectrum of bibliometric, technometric and informetric research using different vocabularies is covered by the journal Scientometrics. This observation is in line with the findings by Schoepflin and Glänzel (2001) that scientometrics consists of several subdisciplines such as informetric theory, empirical studies, indicator engineering, methodological studies, sociological approach and science policy; and that case studies and methodology became dominant by the late 1990s. At the end of the 1990s, also technology related studies based on patent statistics became an emerging subdiscipline of the field.

We have found two clusters in bibliometrics, of which a big one in applied bibliometrics/research evaluation and a smaller one in methodological/theoretical issues; also we have found two large clusters in information retrieval and general and miscellaneous issues and, finally, two small emerging clusters in webometrics and patent and technology studies. Within the IR cluster, we have found a small subcluster on music retrieval, which might be a temporary phenomenon since the journal JASIST has published a special issue on this topic.

According to the expectation, IR, General issues and Webometrics were represented by four of the five journals, namely JIS, IPM, JASIST and JDoc, while the two bibliometrics and the patent clusters were the domain of the journal Scientometrics.

Friday, January 24, 2014

Lu, K., & Wolfram, D. (2010). Geographic characteristics of the growth of informetrics literature 1987–2008. Journal of Informetrics, 4(4), 591-601.

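This study examines whether geographic shifts in productivity have occurred in the metrics fields of bibliometrics, informetrics, and scientometrics; specifically, whether European contributions have grown markedly while North American contributions have declined in relative terms.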

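Among studies of metrics research, Hood and Wilson (2001) and Stock and Weber (2006) both analyzed the growth of the field's literature. Hood and Wilson (2001) reviewed the development of the metrics field and compared the literature associated with bibliometrics, scientometrics, and informetrics, finding that bibliometrics was still the most widely used term in these areas. Stock and Weber (2006) observed that the field had grown continuously since 1980. Wolfram (2008) found a notable relative decline in North American contributions to the metrics literature and a sharp increase in European contributions.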

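This study searched the Web of Science database using bibliometrics, scientometrics, informetrics, cybermetrics, webometrics, citation analysis, link analysis, and citation indexes as query terms, and combined the results with the papers published in two journals, Scientometrics and the Journal of Informetrics. The search retrieved 4404 publication records.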

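Seventy-five countries are represented in these records. By region, Europe made the largest contribution throughout the study period, in both absolute and relative terms. Asia's relative share grew substantially over the 22 years, while North America's output grew in absolute numbers but slowly declined in relative share. Authors in each region tend to publish in journals from their own region: for example, four of the top five journals for European authors are European, and South America shows a similar pattern. Asia is the exception: four of its top five journals are European and the other is North American.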

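International collaboration has increased greatly since the mid-1990s: before then, internationally collaborative papers numbered 1 to 19 per year, and by 2008 the figure had risen to 96. The United States has the highest share of international collaborations, but by region the average number of collaborations with European countries (5.78 publications per country) exceeds that for other parts of the world (4.47 publications per country).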

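Europe also has many institutions with international collaboration experience: of the institutions listed, 16 are European, 8 are North American, and 1 is Asian. Inter-institutional collaboration grew as well, from a mean of only 1.1 institutions per publication in 1987 to 1.96 in 2007.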

The study also uses MDS, VOSviewer, and Pajek to visualize the collaboration relationships among the countries and institutions in these papers.


This investigation was prompted by interest in whether shifts in productivity based on geography are observed in the bibliometrics, informetrics and scientometrics areas.

One of the authors conducted a pilot study to determine whether there have been clear declines in North American contributions to the metrics literature base (Wolfram, 2008). The author found that there was indeed a notable relative decline in North American contributions and a sharp increase in European contributions.

Hood and Wilson (2001) examined the growth of literature of the metrics area. They provided an historical treatment of the development of these areas that included earlier studies of the field. In their research, literature associated with bibliometrics, informetrics and scientometrics was compared for the period 1968–2000. The authors noted that bibliometrics was still the most widely used term for metrics research.

More recently, Stock and Weber (2006) conducted a Web of Science search for records specifically including metrics terms and allied areas. They observed contributions had grown substantially since 1980.

Search parameters included the Boolean ORed result of bibliometrics, scientometrics, informetrics, cybermetrics and webometrics, in truncated form (e.g., webometri*), along with the phrases “citation analysis”, “link analysis” and “citation indexes”. ... These search results were ORed with the two primary journals that publish metrics research that are indexed by WoS, namely Scientometrics and the Journal of Informetrics.

A pair-wise comparison of all collaborations at the national and institutional levels was then conducted from which a cooccurrence matrix could be compiled.
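
A minimal sketch of how such a co-occurrence matrix could be compiled from per-paper lists of countries (or institutions). The function names and the country codes are hypothetical; the excerpt does not describe the authors' actual tooling for this step.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(papers):
    """Count, for every unordered pair of units, the papers they share."""
    counts = Counter()
    for units in papers:
        # Deduplicate so a country listed twice on one paper counts once.
        for a, b in combinations(sorted(set(units)), 2):
            counts[(a, b)] += 1
    return counts

def to_matrix(counts, labels):
    """Expand pair counts into a symmetric matrix ordered by labels."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for (a, b), n in counts.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return matrix

# Hypothetical papers, each listing the countries of its authors.
papers = [["US", "BE"], ["BE", "HU", "US"], ["HU"], ["BE", "HU"]]
counts = cooccurrence_counts(papers)
labels = sorted({u for p in papers for u in p})
matrix = to_matrix(counts, labels)
```

The result is symmetric with a zero diagonal, which is the kind of similarity matrix the mapping tools take as input.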

Multidimensional scaling (MDS) analysis was used to visualize the relationships among countries. Because the data represent a type of similarity measure represented as a symmetric matrix, SPSS PROXSCAL was used to construct the map, as recommended by Leydesdorff and Vaughan (2006).

The recently developed visualization tool VOSviewer (van Eck & Waltman, 2010) was also used to provide an alternate visualization of the relationship outcomes. Like MDS, VOSviewer (http://www.vosviewer.com/) relies on a distance-based approach to mapping informetric relationships. Instead of using more traditional similarity measures to produce a normalized outcome for co-occurrences as used in MDS, relationships are based on association strengths, so the algorithm is somewhat different than PROXSCAL and, therefore, can produce different outcomes. Details of the comparison of different measures can be found in van Eck and Waltman (2009).
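
To illustrate the idea of association strength, the sketch below divides each co-occurrence count by the product of the two items' row totals. This is one common form of the measure and is an assumption here: the excerpt does not give the formula, and VOSviewer's own implementation may scale the values differently. The input matrix is made up.

```python
def association_strength(matrix):
    """Normalize a symmetric co-occurrence matrix by the product of row totals."""
    totals = [sum(row) for row in matrix]
    n = len(matrix)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Skip the diagonal and items with no co-occurrences at all.
            if i != j and totals[i] and totals[j]:
                result[i][j] = matrix[i][j] / (totals[i] * totals[j])
    return result

# Hypothetical symmetric co-occurrence counts for three countries.
cooc = [[0, 2, 2], [2, 0, 1], [2, 1, 0]]
strength = association_strength(cooc)
```

Because the raw count is divided by both totals, a pair of small contributors with a few joint papers can score higher than a pair of large contributors with the same count.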

The network visualization software Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/) was used as well. Unlike the distance-based mapping of PROXSCAL and VOSviewer, Pajek produces directed or undirected network maps, with the strength of the relationships represented by the thickness of connecting lines between vertices on the map. Distances are used more for clarification, but proximities do not necessarily indicate a stronger relationship.

The search parameters retrieved 4404 publications.

Europe shows the highest levels of contribution, both in absolute and relative terms over the time period of the study. Growth patterns in absolute terms are nonlinear based on trend line analysis in MS Excel; however, the R-squared goodness-of-fit values for even the best fitting models (higher order polynomials) were never more than 0.95, indicating a less than desirable fit.

Relative contributions based on geographic divisions have been largely stable. An exception is Asia, which had an increasing relative contribution over the 22-year time frame of the study. Although North American contributions have continued to increase in absolute numbers, the relative contribution shows a slow average decline over time.

The top five journals listed for each continent demonstrated a regional preference for publication outlets from that region. So, for example, four of the top five journals for European publications were published in Europe, and four of the top five journal outlets for South America were South American. The exception to this was Asia. Four of the top five journals for Asian publications were European and one was North American. This outcome may be a reflection of the data extraction method, the indexing practices of WoS, or a preference during the study time frame for Asian scholars to publish in Western journals.

Seventy-five countries were represented in the record set.

The number of metrics papers published annually that represent collaborations between two or more countries has increased greatly since the mid-1990s. Prior to this time, the number of internationally collaborative papers ranged from 1 to 19 papers annually. Over the last decade this number has increased to a high of 96 papers in 2008.

In metrics research, the United States also has the highest share of international collaborations, but the average number of collaborations with European countries was higher (5.78 publications per country) than for other parts of the world (4.47 publications per country).

Sixteen of the institutions on the list are European, eight are North American, and one is Asian. The United States has the largest number of institutions represented (five), followed by Belgium (four – note: one institution merged with another institution to form a new entity).

There has been steady growth in inter-institutional collaboration over the 22 years. The mean number of collaborative institutional partners within the dataset has steadily increased from a low mean of 1.1 institutions per publication in 1987 to a high of 1.96 institutions per publication in 2007.

Europe, and in particular Western Europe, clearly dominates in the production of metrics literature. The United States continues to be the largest singular contributor, but this appears to be changing. North American contributions as a whole continue to increase, but represent a smaller percentage of worldwide production. European contributions have grown tremendously, especially during the last 5 years of the study period. This same period is marked by impressive growth from Asia.

It should be noted that WoS increased its coverage in 2008 by including more regional journals. These inclusions possibly could contribute to the increase in Asian contributions, but the observed growth for Asia was already evident prior to any such additions.

International and inter-institutional collaborations do not necessarily reveal strong geographic affinities, although the multiple institutional affiliations by a number of scholars associated with Flemish institutions do contribute to the strengthening of regional ties. Undoubtedly, the growth of the Internet and increasing availability of other telecommunication technologies have made these collaborations less distance dependent.

Thursday, January 23, 2014

Egghe, L. (2012). Five years “Journal of Informetrics”. Journal of Informetrics, 6(3), 422-426.

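The Journal of Informetrics (JOI) publishes refereed articles that contain good models and/or data sets and address fundamental quantitative aspects of information science. Its scope covers informetrics in the broad sense, including bibliometrics, scientometrics, and webometrics/cybermetrics. The first five volumes contain 239 articles, counting letters to the editor, by 544 authors, for an average of 2.276 authors per article; Table 3 of the paper gives their distribution across topics.
The two most common topics, citation analysis and h-type indices, together account for more than half of the articles.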

The journal closest to JOI is Scientometrics; both publish many papers on citation analysis. The difference is that JOI publishes more model-theoretic papers and papers on networking issues, whereas Scientometrics publishes many case studies.

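For a fairly young journal, JOI is highly influential. Judged by the impact factors reported in the Journal Citation Reports (JCR), JOI had very high impact factors in both 2009 and 2010, ranking fourth (2009) and third (2010) among all journals in the Information and Library Science subject category.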

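The citation relationships among the 30 journals that JOI cites most often and the 30 journals that most often cite JOI were drawn as a network map using the ForceAtlas 2 layout algorithm in the Gephi network analysis software, so that journals citing one another form clusters on the map. The cluster above JOI consists of the multidisciplinary journals it frequently cites; the cluster to the right contains physics journals; at the bottom are mathematics and information science journals; and on the left are mainly Research Policy and Research Evaluation.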


JOI publishes refereed articles on fundamental quantitative aspects of information science. Accepted articles should contain good models and/or fundamental data sets.

The Journal covers the broad field of informetrics, including the field bibliometrics, scientometrics, webometrics and cybermetrics.

Specific topics can be described (non-exhaustively) as follows: informetric laws, modelling generalised bibliographies, aspects of inequality or concentration and diffusion, citation theory, linking theory (in general: social networks, including the Internet, citation and collaboration networks), downloads, indicators, evaluation techniques for scientific output (literature, scientists), evaluation techniques for documentary systems (information retrieval) including ranking theory, digital and classical library management, visualisation and mapping of science (individuals, fields, institutes, topics).

Over the five volumes there are 239 published articles with an average number of authors per article equal to 2.276. Here all articles and “letters to the editor” are taken into account. In total there are 544 (co-)authors.

The topics of these papers are described in Table 3.


We linked only one (main) topic to each paper. So a paper does not fall in two or more categories. Of course, sometimes, a paper could be linked to more than one category (e.g. papers on h-type indices can also deal with citation analysis) but we feel that Table 3 gives a rather accurate topical view.

It is clear that a bit more than 50% of the papers deal with citation analysis and/or h-type indices.

Finally the journal Scientometrics is closest to JOI in that it also publishes mainly on citation analysis. A difference between Scientometrics and JOI is that JOI publishes more model-theoretic papers and papers on networking issues while Scientometrics publishes more case studies.

These are very high numbers, certainly for a young journal. They are the highest for any “metrics” journal in the Journal Citation Reports (JCR) Subject Category Listing “Information and Library Science” (LIS).

The slight decrease of the IF from 2009 to 2010 is also noted by other LIS metrics journals, probably due to the fact that papers on the h-index (and related indices) have reached their maximum in terms of citations.

In fact JOI increased its relative impact from 2009 to 2010 since the value IF = 3.379 ranked JOI fourth out of 66 journals in the LIS Subject Category Listing while the value IF = 3.119 ranked JOI third out of 76 journals in the LIS Subject Category Listing.




The map is produced in Gephi, using the ForceAtlas 2 layout algorithm (http://webatlas.fr/tempshare/ForceAtlas2 Paper.pdf). It positions all journals with respect to the journals that they cite. Journals are mapped out using all of the citation links between them: related journals cluster together as they cite one another more frequently. Journals are selected for the map by virtue of being among the top 30 journals that JOI cites or the top 30 journals citing JOI in the mentioned period.

In this map, JOI sits in the centre with a core of informetrics journals, with branches leading to different clusters of research.

At the top are the large multidisciplinary journals (mainly cited by JOI rather than citing it); at the right, a group of physics journals with “Physics World” acting as a bridge from the informetrics journals. At the bottom of the map are mathematical and information science journals and at the left are “Research Policy” and “Research Evaluation”.

Saturday, January 18, 2014

Hou, H., Kretschmer, H., & Liu, Z. (2008). The structure of scientific collaboration networks in Scientometrics. Scientometrics, 75(2), 189-202.

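This study applies several techniques, including social network analysis, co-occurrence analysis, cluster analysis, and word frequency analysis, to 1927 records of papers published in Scientometrics from 1978 to 2004. It examines the structural characteristics of the scientists' collaboration network, the collaborative fields of the whole network and of the individual sub-networks, and the collaborative center of the network.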

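Among earlier studies, Schubert (2002) and Dutt, Garg, & Bali (2003) both worked at the macro level of collaboration between countries. Kretschmer (2004) argued that analyses at the macro and meso levels do not adequately reflect trends in collaboration between individuals, and therefore called for more effort at the micro level.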

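Of the 1927 records, 1052 are single-authored, so single-authored papers are still a slight majority. Papers with more than 3 authors account for only 13.71% (120/875) of the co-authored papers, which shows that research teams in scientometrics are small. There are 234 prolific authors with 3 or more papers, of whom 69.66% (163 authors) have published co-authored papers. The collaborations among these authors were represented as a network, and a cluster analysis of its nodes in Bibexcel identified 22 clusters. The two largest clusters contain 15 and 14 scientists respectively. The largest connected component of the network comprises 15 clusters and 96 of the prolific authors with collaboration experience, or 58.90%. The collaboration network has 401 links and a density of 0.03, which shows that collaboration in scientometrics is very loose.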

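Three centrality measures were computed for every node. Centrality shows a significant positive correlation with the corresponding author's productivity, which means that prolific authors are also active in the collaboration network of scientometrics. Glänzel has the highest degree centrality, having collaborated with 18 other authors.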

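Word frequency analysis was used to identify the topic of each cluster. The two largest clusters share a similar topic but differ slightly in research method. In addition, the four clusters that study scientific collaboration have almost no links to one another, and the same holds for the four clusters that study the relationship between science and technology.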

The structure of scientific collaboration networks in scientometrics is investigated at the level of individuals by using bibliographic data of all papers published in the international journal Scientometrics retrieved from the Science Citation Index (SCI) of the years 1978–2004.

Combined analysis of social network analysis (SNA), co-occurrence analysis, cluster analysis and frequency analysis of words is explored to reveal: (1) The microstructure of the collaboration network on scientists’ aspects of scientometrics; (2) The major collaborative fields of the whole network and of different collaborative sub-networks; (3) The collaborative center of the collaboration network in scientometrics.

Schubert [8] and Dutt et al. [9] presented international collaboration characteristics in the scientometrics community itself, focusing on country aspects at macro level.

Kretschmer [6] appealed to devote more efforts to investigations at micro level in the future because the knowledge at meso and macro level does not yet adequately reflect the trends in cooperation between individuals.

The study is based on bibliographic data retrieved from the Web of Science. The data contains all types of documents published in Scientometrics during 1978 to 2004.

In this study we have adapted an integrated procedure of social network analysis (SNA), co-occurrence analysis, cluster analysis and frequency analysis of title words.

Bibexcel is designed as a tool for manipulating bibliographic data, which is a free online-software published by Persson. In the present study, Bibexcel is used to do cooccurrence analysis and cluster analysis.

Following the methods of Otte & Rousseau [11], White [13] and Kretschmer & Aguillo [12], SNA was applied to display the microstructure of collaboration networks in scientometrics with Pajek.

Moreover, we used frequency analysis of title words to display the main collaborative field of different sub-networks. The software for frequency analysis is demo version of Wordsmith Tools published by Oxford University Press and available online.

There were 1927 documents published in Scientometrics during 1978 to 2004 (see Table 1).



From Table 1, we found that the pattern of co-authorship was still dominated by single-authored papers, in line with the conclusion drawn by Dutt et al. [9].

Multi-authored papers (those with more than 3 co-authors) account for only 13.71%, which indicates that team size in scientometrics is not large.

In order to show the main structure of the network, each author must have published 3 papers or more to be included in this integrated analysis. This threshold resulted in a total of 234 prolific authors publishing 3 or more papers during 1978 to 2004; among them, 163 authors published co-authored papers, accounting for 69.66% of the prolific authors.



Based on cluster analysis embedded in Bibexcel, we gained 22 clusters circled by solid lines (see Figure 1). We identified these clusters as sub-networks in the field of scientometrics.

The largest subnetwork is number 1 that has 15 collaborators, and the second largest one is number 2, which has 14 collaborators, and so on.

We noticed that a total of 15 sub-networks were connected with each other, composing the largest central component, which had 96 members, accounting for 58.90% of the prolific authors who published co-authored papers.

Density is an indicator for the general level of connectedness of the graph. ... In the present study, there are totally 401 links in the network, so the density of the network is 0.03, which indicates that the collaborative network in the field of scientometrics is very loose.
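
As a worked check of the reported density, the sketch below uses the standard formula for an undirected simple graph, links divided by n(n-1)/2. Taking n = 163 (the prolific authors with co-authored papers) is an assumption, since the excerpt does not state which node count was used; with that choice the reported value of 0.03 is reproduced.

```python
def network_density(n_nodes, n_links):
    """Density of an undirected simple graph: actual links / possible links."""
    possible = n_nodes * (n_nodes - 1) / 2
    return n_links / possible

# 401 links are reported in the text; 163 nodes is an assumed node count.
density = network_density(163, 401)
```

With 163 nodes there are 13203 possible links, so 401 links give a density of about 0.030.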

So an author who has high degree centrality must have collaborated with many other authors, which means the author is a central collaborator of the whole network. In the present study, Glänzel, who has 18 co-workers, is the central author of the whole network.
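
A minimal sketch of degree centrality as described here, counted as the number of distinct co-authors per author. The edge list and author labels are hypothetical, and the paper itself used Pajek for this step rather than custom code.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Map each author to the number of distinct co-authors."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:
            # Sets ignore repeated collaborations between the same pair.
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {author: len(others) for author, others in neighbours.items()}

# Hypothetical co-authorship pairs; ("A", "B") appears twice on purpose.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("A", "B")]
degrees = degree_centrality(edges)
central = max(degrees, key=degrees.get)
```

The author with the largest value plays the role that Glänzel plays in the network described above.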

We found a positive and significant correlation between output of authors and the centrality measures (r=0.648, 0.437, 0.338 respectively at the 0.01 level, see Table 4) after investigating the correlations between output and the three centralities of the 125 authors in the 22 sub-networks, which indicated that most of the prolific authors are also active in collaboration network in the field of scientometrics.

We have also presented the main collaborative field of different sub-networks in scientometrics and found that the two biggest sub-networks have the similar collaborative topic with slightly methodological difference. In addition, we found an interesting phenomenon that four sub-networks dealing with scientific collaboration didn't collaborate with each other except sub-network 3 and 12. Moreover, four subnetworks studying technology and science never collaborated with each other at all.

Friday, January 17, 2014

Chen, Y. W., Fang, S., & Börner, K. (2011). Mapping the development of scientometrics: 2002–2008. Journal of Library Science in China, 3, 131-146.

This study uses social network analysis and science mapping to analyze 816 papers published in Scientometrics from 2002 to 2008.

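Earlier bibliometric studies of the journal Scientometrics include the following. Schoepflin and Glanzel (2001) classified the papers published in Scientometrics in 1980, 1989, and 1997 and found that the shares of science policy and the sociology of science were falling. Peritz and Bar-Ilan (2002) found that Research Policy and Social Studies of Science were the third and fourth most frequently cited journals in Scientometrics papers from 1990 and 2000. Chen, McCain, White, and Lin (2002) analyzed citation and co-citation patterns in Scientometrics between 1981 and 2001. Hou, Kretschmer, and Liu (2008) analyzed the structural characteristics of the author collaboration network in Scientometrics for 2002 to 2004. Dutt, Garg, and Bali (2003) analyzed the distribution of countries, institutions, and themes in Scientometrics papers from 1978 to 2001.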

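Fifty-seven countries appear in the 816 records, and the most productive countries are mainly European. Among the ten most productive countries, the United States, Belgium, Spain, China, and Germany show fairly rapid annual growth, whereas India's annual growth rate is negative. The more productive countries also receive more citations.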

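To measure research collaboration between countries, the study proposes Relative Collaboration Intensity (RCI), a measure that combines two indicators: the number of collaborating countries and the number of collaborations. Its definition is given as formula (3) in the paper.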

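Here (RCI)_i denotes the relative collaboration intensity of the i-th country, and CC_i and CT_i are, respectively, the number of countries it collaborates with and the number of its collaborations with other countries. In this study Belgium has the highest relative collaboration intensity, followed by England, the Netherlands, and the United States in second to fourth place.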

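The collaborations between countries are then drawn as a network, whose largest connected component contains 37 countries. Within this component, Belgium, England, and Hungary are strongly connected to one another.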

At the institutional level, Katholieke Univ Leuven (Belgium), Hungarian Academy Science (Hungary), and Leiden Univ (Netherlands) have the most papers and the most citations.

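A further analysis of the most-cited papers of the ten leading institutions found that the papers citing them come mainly from information science & library science, computer science, information systems, and interdisciplinary applications. One paper from Taipei Med Univ, however, is cited by many papers in biomedical fields.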

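In terms of the number of co-authors, the study found 271 single-authored papers and 545 multi-authored papers, with an average of 2.29 authors per paper. Dutt, Garg, and Bali (2003) studied papers from 1978 to 2001, of which more than half were single-authored and the average number of authors was 1.73. The comparison shows that the number of multi-authored papers and the average number of authors have both increased, which indicates growing collaboration in Scientometrics.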

Judged from the cited references, the topics of Scientometrics include the relationship between science and technology, indexes to quantify an individual's scientific research output, author collaborations, co-citation networks, and the scientific impact and wealth of nations.

The purpose of this article is to use the methods of Social Network Analysis and Science Mapping to make an analysis on the 816 papers published in the international journal Scientometrics from 2002 to 2008.

The major tools used in this paper were TDA, NWB and Excel.

Börner (2006) discussed the mapping research on structure and evolution of science.

Börner, Penumarthy, Meiss, and Ke (2006) mapped the diffusion of information among 500 major U.S. research institutions based on the 20-year publication data set published in the Proceedings of the National Academy of Sciences (PNAS) in the years 1982-2001.

Boyack, Börner, and Klavans (2009) mapped the structure and evolution of chemistry research over a 30 year time frame based on Science (SCIE) and Social Science (SSCI).

Leydesdorff and Rafols (2009) made a global map of science based on the ISI subject categories.

For instance, Schoepflin and Glanzel (2001) found a decrease in the percentages of both the articles related to science policy and to the sociology of science by classifying the articles published in Scientometrics in the years 1980, 1989 and 1997.

Peritz and Bar-Ilan (2002) analyzed the papers published in Scientometrics in 1990 and 2000 and found that Research Policy and Social Studies of Science are the third and fourth most frequently referenced journals in articles published in Scientometrics.

Chen, McCain, White, and Lin (2002) drew upon citation and co-citation patterns derived from articles published in the journal Scientometrics (1981-2001).

Hou, Kretschmer, and Liu (2008) analyzed the structure of scientific collaboration networks in scientometrics at micro level (individuals) by using bibliographic data of all papers published in Scientometrics of the years 2002-2004.

Dutt, Garg, and Bali (2003) made an analysis of papers published by Scientometrics during 1978 to 2001 by scientometrics assessment on countries and themes distribution, comparison of institutions and co-authors.

The analysis of 816 papers published in Scientometrics during 2002-2008 showed that they were contributed by 57 countries (or regions). ... Most of the 57 countries were from Europe. Other major countries (or regions) with a larger number of papers were the USA and Canada in North America; China, India, Taiwan, South Korea and Japan in Asia; Brazil in Latin America; and Australia.

Fig. 2 had clearly illustrated the average annual growth rates of TOP10 countries, from which we can conclude that USA, Belgium, Spain, China and Germany had higher growth rates and India had a negative growth rate.

From Fig. 3 we could see that all the TOP 10 countries had a higher number of times cited. It indicated that the papers contributed by those countries were of higher quality and had more impact.



In order to visualize the relative intensity of collaboration, this article introduced the concept of Relative Collaboration Intensity (RCI) indicator. The average number of collaboration countries (CC), average collaboration times (CT) and Relative Collaboration Intensity (RCI) of the 10 countries were given in formula (1), (2) and (3):



We found that Belgium had the highest relative collaboration intensity, and England, Netherlands and USA ranked 2, 3 and 4.

In order to make a clear vision about the collaborations among all the countries/regions (57), the country collaboration network had been made with the method of SNA by NWB. ... The largest connected component in the network had 37 nodes, and there is another small component with 2 nodes.
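
The notion of a largest connected component can be sketched with a breadth-first search over the collaboration links. The paper used NWB for this step; the code below is only an illustration, and the node labels and links are hypothetical.

```python
from collections import defaultdict, deque

def connected_components(nodes, edges):
    """Return components of an undirected graph, largest first."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)

# Hypothetical countries and collaboration links.
nodes = ["BE", "HU", "UK", "US", "CA", "X", "Y", "Z"]
edges = [("BE", "HU"), ("BE", "UK"), ("US", "CA"), ("US", "UK"), ("X", "Y")]
components = connected_components(nodes, edges)
```

Countries without any collaboration link end up as components of size one, which is why a network of 57 countries can have a largest component of only 37 nodes.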

Fig. 4 showed the largest component with 37 countries, which depicted that Belgium, England and Hungary had formed a strong connection. The strongest connection lay between Belgium and Hungary, with 27 collaborations. Fig. 4 also showed that although the USA had the largest number of papers, its collaboration activity was weaker than that of Belgium, Hungary, England and Finland. The USA had paid much more attention to collaborating with Canada, England and Australia. The Netherlands had collaborated with many countries; however, its collaborations were fewer than those of Belgium, England and Finland.

The data showed that Katholieke Univ Leuven (Belgium), Hungarian Academy Science (Hungary) and Leiden Univ (Netherlands) ranked first to third in both number of papers and times cited, and all of them had a considerable advantage over the others.

We select the Most-Cited paper (that had the highest value of times cited) of each TOP 10 TC/P institutions and get 10 Most-Cited papers finally. By analyzing their citing papers, we found that the citing papers which had cited the Most-Cited paper of each institution distributed mainly in the fields of information science & library science, computer science, information systems and interdisciplinary applications ....

We could therefore conclude that an institution without many papers or a leading position in research activity could still produce one or more significant works with a great impact on the future development of information science & library science. Some research in scientometrics had also affected the development of other scientific fields, such as the work of Taipei Med Univ.

Another study of scientometrics, carried out by Dutt et al. (2003), showed that the average number of authors per paper was 1.73 during the period 1978-2001. We studied the average number of authors per paper published in Scientometrics in 2002-2008 and found a value of 2.29, which indicated that collaboration in scientometrics had been growing since 2001.

To analyze the co-authorship pattern, the whole data set (816 papers) was divided into two groups: single-authored (271) and multi-authored (545). Whereas Dutt et al. (2003) found that more than half of the papers were single-authored, we found that the share of papers written by two or more authors had increased rapidly in 2002-2008.
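
The shares implied by these counts can be checked with a short calculation. The three counts are the ones quoted above; the variable names are only illustrative.

```python
# Counts quoted in the text for Scientometrics papers, 2002-2008.
total_papers = 816
single_authored = 271
multi_authored = 545

# The two groups should partition the whole data set.
assert single_authored + multi_authored == total_papers

single_share = single_authored / total_papers
multi_share = multi_authored / total_papers

print(f"single-authored: {single_share:.1%}")  # 33.2%
print(f"multi-authored:  {multi_share:.1%}")   # 66.8%
```

About two thirds of the 2002-2008 papers therefore had two or more authors, which is the reverse of the single-author majority reported by Dutt et al. (2003) for the earlier period.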

Table 6 listed the TOP 10 authors by number of papers. Comparing it with Fig. 6, we could see that all of the TOP 10 authors appeared in the biggest collaboration cluster. It was interesting to note that the TOP 10 authors collaborated with each other either directly or indirectly.

Most of the TOP 20 cited references fell within the big co-citation clusters shown in Fig. 7. ... All four of these highly cited papers in the biggest cluster focused on the relationship between science and technology, especially the effect of science on technology. ... The second largest cluster contained 19 nodes, two of which ranked in the TOP 20. Their topic was indexes to quantify an individual's scientific research output (Hirsch, 2005). The third largest cluster included three nodes listed in the TOP 20, whose topic was author collaboration (Glanzel, 2001; Katz & Martin, 1997; Narin, Stevens, & Whitlow, 1991). Two further clusters each contained two TOP 20 nodes: one had 10 nodes, on the topic of co-citation networks (De Solla Price, 1965; Small, 1973); the other had only two nodes, published in Nature and Science respectively, on the topic of the scientific impact and wealth of nations (King, 2004; May, 1997).
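
Co-citation clusters of this kind rest on a simple count: two references are co-cited each time they appear together in the reference list of the same citing paper. The sketch below shows only that counting step, on hypothetical reference lists with placeholder labels. It does not reproduce the clustering or layout behind Fig. 7, and `cocitation_counts` is a name made up for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (NOT from the paper): reference lists of four citing papers.
citing_papers = {
    "paper_1": ["ref_A", "ref_B", "ref_C"],
    "paper_2": ["ref_A", "ref_B"],
    "paper_3": ["ref_B", "ref_C"],
    "paper_4": ["ref_D"],  # cites a single reference: no co-citation pair
}

def cocitation_counts(citing_papers):
    """Count, for each unordered pair of references, how many papers cite both."""
    counts = Counter()
    for refs in citing_papers.values():
        # set() guards against a reference listed twice in one paper
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts

counts = cocitation_counts(citing_papers)
for (ref_1, ref_2), n in sorted(counts.items()):
    print(ref_1, ref_2, n)
```

Pairs with high counts become strong links in a co-citation network, and groups of references joined by such links form clusters like those described above.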

The major topics were social network analysis (Wasserman & Faust, 1994), the Matthew effect in science (Merton, 1968), author self-citation (Glanzel, Thijs, & Schlemmer, 2004), country research performance (Moed, 2002), evaluation indicators of publication and citation (Schubert & Braun, 1986) and the calculation of web impact factors (Ingwersen, 1998).