Sunday, January 26, 2014

Glenisson, P., Glänzel, W., Janssens, F., & De Moor, B. (2005). Combining full text and bibliometric information in mapping scientific disciplines. Information Processing & Management, 41(6), 1548-1572.

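This study uses co-word analysis to group the papers published in the journal Scientometrics in 2003 into six clusters. To assess the validity of the clustering, the result is compared with a classification made by experts, and each cluster is also analyzed with bibliometric indicators.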
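As a rough illustration of what co-word analysis measures, the sketch below counts how often two terms occur in the same paper and scores paper-to-paper overlap with Jaccard similarity. The toy corpus, the function names `coword_counts` and `jaccard`, and the choice of Jaccard similarity are all assumptions made for this example; they are not the pipeline used in the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each paper is reduced to a set of terms.
papers = {
    "p1": {"citation", "impact", "indicator"},
    "p2": {"citation", "indicator", "journal"},
    "p3": {"web", "link", "network"},
    "p4": {"web", "network", "citation"},
}

def coword_counts(docs):
    """Count, for each unordered pair of terms, the papers containing both."""
    counts = Counter()
    for terms in docs.values():
        # Sorting makes every pair appear in one canonical (a, b) order.
        for a, b in combinations(sorted(terms), 2):
            counts[(a, b)] += 1
    return counts

def jaccard(a, b):
    """Jaccard similarity of two term sets (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b)

counts = coword_counts(papers)
print(counts[("citation", "indicator")])       # 2
print(jaccard(papers["p1"], papers["p2"]))     # 0.5
print(jaccard(papers["p1"], papers["p3"]))     # 0.0
```

Papers with high pairwise similarity would end up in the same cluster under most clustering methods; which method and term weighting to use is a separate design choice not shown here.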

Braam, Moed, and Van Raan (1991) suggested using word analysis to evaluate the results of co-citation cluster analysis. The words were drawn from the indexing terms and classification codes in the bibliographic records.

For the expert classification, this study draws on the six categories used by Schoepflin and Glänzel (2001): Mathematical models/informetric laws, Case studies, Advances in Scientometrics, Indicator engineering, Sociological approaches, and Policy relevant issues. After adding Webometrics, which has emerged in recent years, the categories used for the expert classification in this study are: Advances in Scientometrics, Empirical papers/case studies, Mathematical models, Political issues, Sociological approaches, and Informetrics/Webometrics. The table below compares the results of the co-word analysis with the expert classification:

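Apart from the larger categories A (Advances in Scientometrics) and E (Empirical papers/case studies), which are spread over several clusters, the smaller categories are mostly concentrated in one or two clusters.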
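A comparison of this kind can be sketched as a cross-tabulation of expert category against cluster. The assignments below are invented purely for illustration and are not the paper's data; `crosstab` and `concentration` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical (expert_category, cluster) pairs; NOT the paper's data.
assignments = [
    ("A", 1), ("A", 3), ("A", 6),
    ("E", 2), ("E", 2), ("E", 6),
    ("I", 4), ("I", 4),
]

def crosstab(pairs):
    """Build {category: Counter({cluster: number_of_papers})}."""
    table = {}
    for category, cluster in pairs:
        table.setdefault(category, Counter())[cluster] += 1
    return table

def concentration(row):
    """Share of a category's papers that fall in its largest cluster."""
    return max(row.values()) / sum(row.values())

table = crosstab(assignments)
for category in sorted(table):
    row = table[category]
    print(category, dict(row), round(concentration(row), 2))
```

A concentration near 1 means a category is well conserved in a single cluster, while a low value means its papers are scattered across clusters.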

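Based on the expert categories of the papers in each of the six clusters and on their term networks, the clusters can be characterized as follows. Cluster 1 is methodological indicator research on bibliometric indicators, specifically those used to measure publication activity and citation impact. Cluster 2 consists mostly of case studies and empirical papers on national and institutional aspects or on science fields. Cluster 3, like cluster 1, contains papers on theoretical and methodological questions, but with more emphasis on advanced methodological techniques such as informetric laws, frequency distributions, and multivariate statistics. Cluster 4 covers webometrics and other network-related issues. Cluster 5 is the smallest cluster, with only 3 papers, which deal with co-citation analysis and the analysis of other citation statistics. Cluster 6 is the largest cluster and covers a wide range of topics, from sociology and policy to technology. This analysis suggests that both methodological and empirical research in scientometrics currently have two main focuses: one based on standard scientometric techniques, the other broadening the scope of traditional bibliometrics.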

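Next, the clustering results are analyzed using bibliometric characteristics such as mean reference age and share of serials, as shown in the figure below.
Across the clusters, webometrics is characterized by a low reference age and a medium to high share of serials. Most policy-related papers have a relatively low share of serials, but another group of them has a clearly higher share, so these papers split into two sub-clusters in the plot. As for Advances in Scientometrics, with a few exceptions most papers have a mean reference age between 5 and 15 years and a share of serials between 50% and 90%. The empirical papers fall into two groups by share of serials, one with a lower share (<=55%) and one with a higher share (>=67%); the lower group has characteristics similar to those of the policy-related research.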
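The two indicators can be sketched as follows, assuming that mean reference age is the average of (publication year minus cited year) over a paper's references and that share of serials is the fraction of references that point to serials such as journals. These definitions, the `Reference` type, and the sample values are assumptions for illustration; the post does not state the exact formulas used in the paper.

```python
from dataclasses import dataclass

@dataclass
class Reference:
    year: int        # publication year of the cited item
    is_serial: bool  # True if the cited item appeared in a serial (e.g. a journal)

def mean_reference_age(pub_year, refs):
    """Average age of the cited items relative to the citing paper's year."""
    return sum(pub_year - r.year for r in refs) / len(refs)

def share_of_serials(refs):
    """Fraction of references that point to serials."""
    return sum(r.is_serial for r in refs) / len(refs)

# Hypothetical reference list of a paper published in 2003.
refs = [
    Reference(2001, True),
    Reference(1995, True),
    Reference(1999, False),
    Reference(1993, True),
]

print(mean_reference_age(2003, refs))  # ages 2, 8, 4, 10 -> 6.0
print(share_of_serials(refs))          # 3 of 4 -> 0.75
```

Both functions assume a non-empty reference list; a paper without references would need separate handling.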

The question how bibliometric measures can, in turn, be assumed to reflect formal characteristics of documented scientific communication that might supplement results obtained from content-based analyses could also be answered in a positive way. Reference-based citation measures can help to fine-structure clusters determined on basis of co-word analysis.

Braam, Moed, and Van Raan (1991) suggested combining co-citation with word analysis in the context of evaluative bibliometrics to improve efficiency of co-citation clustering. The word analysis by Braam et al. used publication "word-profiles" that were based on indexing terms and classification codes.

Not much later, Noyons and Van Raan (1994) and Zitt and Bassecoulard (1994) demonstrated the appeal of plunging into contents by using keywords from both patent and scientific literature to characterise the science-technology linkage.

The study by Schoepflin and Glänzel aimed at monitoring and characterising structural changes in the research profile in bibliometrics in the period 1980–1997. The authors created six categories: Mathematical models/informetric laws, Case studies, Advances in Scientometrics, Indicator engineering, Sociological approaches and Policy relevant issues. The term Webometrics did not yet appear in this scheme since at that time it was not yet established as a sub-discipline of scientometrics/informetrics.





We see classes S, M, I and P, admittedly all of smaller size, moderately to well conserved in the text-based cluster structure. Conversely, papers assigned to the larger classes A and E are heavily shifted around the text clusters.

The map in Fig. 6 represents the content structure of cluster 1 with altogether 9 papers. This cluster represents publications that are concerned with methodological questions related to bibliometric indicators. Indicator-related terms such as indicator names and terms relevant in the context of measuring publication activity and citation impact are close to the centre, and strongly interlinked. ... One could consider this cluster representing methodological indicator research.



Cluster 2 is dominated by empirical papers and case studies (cf. Table 3). ... The terms in this map are presented in Fig. 7 and relate above all to national and institutional aspects as well as to science fields. This is the cluster of case studies and traditional bibliometric applications.



Cluster 3 is a second theoretical/methodological cluster. Unlike the first one, this cluster relates to more advanced methodological techniques, such as informetric laws, frequency distributions and multivariate statistics. This cluster could be characterised as theoretical and mathematical issues in bibliometrics. The term structure is presented in Fig. 8.




Cluster 4 presented in Fig. 9 clearly represents webometrics and network-related issues. All terms are strongly interlinked. This cluster corresponds by and large to the category of Webometrics/Informetrics.



Cluster 5 with 3 papers is the smallest one. Co-citation analysis and the analysis of other citation statistics are the topic of these papers. The term structure (cf. Fig. 10) reflects the statistical vocabulary used in these studies. This cluster covers specific applications of statistical methods.



The last cluster with 30 papers (see Fig. 11) is by far the largest one. It comprises technology and innovation related studies and the science-technology interface, and almost the complete Triple Helix issue can be found here (cf. Table 3). The sociological approaches are also covered by this cluster. This cluster can be considered a borderland of classical scientometrics, namely the interdisciplinary approaches such as sociological, policy relevant and technology related issues.



The two large categories A and E, covering 65% of all papers, proved heterogeneous. Category A has (jointly with category M) three sub-clusters, namely Clusters 1, 3 and 6, whereas Category E falls apart into three other sub-clusters: Clusters 2, 5 and 6. Policy relevant issues are also covered by clusters 2 and 6. Only Category I is represented by a corresponding co-word cluster, namely cluster 4.

The full text analysis substantiates that both methodological and empirical research nowadays have at least two different main focuses each: one is based on scientometric standard techniques such as classical indicators, while the other clearly broadens the scope of traditional bibliometrics.



As already seen in the pilot study, Webometrics is characterised by low reference age and medium–high share of serials (cf. Glenisson et al., 2005).

Most of the policy related issues are characterised by relatively low share of serials. Nevertheless, there is a group of papers with clearly higher share, too. This confirms the results of the full text analysis, namely that this category practically forms two sub-clusters.

The category Advances in Scientometrics proves strikingly homogeneous with several outliers only. Most of the A-class papers have, however, a mean reference age ranging between 5 and 15 years, with medium–high share of serials ranging between 50% and 90%.

The empirical groups proved heterogeneous, indeed. Regarding the share of serials this class forms two distinct sub-classes, particularly, one with low share (<=55%) and one with relatively high share (>=67%). The class with lower share has similar characteristics as the policy relevant class.
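The split described above can be expressed as a simple threshold rule on the share of serials. The cut-offs 55% and 67% come from the text; the label names, the "between" bucket for values that fall in the gap, and the sample shares are assumptions for illustration only.

```python
def serial_share_group(share):
    """Label a paper by its share of serials, given as a fraction in [0, 1]."""
    if share <= 0.55:
        return "low"
    if share >= 0.67:
        return "high"
    # Values strictly between the two cut-offs belong to neither sub-class.
    return "between"

# Hypothetical shares of serials for four empirical papers.
shares = {"paper_a": 0.40, "paper_b": 0.55, "paper_c": 0.60, "paper_d": 0.82}
groups = {paper: serial_share_group(s) for paper, s in shares.items()}
print(groups)
```

Under this rule the "low" group is the one the text describes as resembling the policy relevant class.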
