Friday, August 15, 2014

Milojević, S., Sugimoto, C. R., Yan, E., & Ding, Y. (2011). The cognitive structure of library and information science: Analysis of article title words. Journal of the American Society for Information Science and Technology, 62(10), 1933-1953.

Scientometrics

Library and information science (LIS) is the field of study concerned with recorded information and culturally meaningful artifacts and specimens (Bates, 2010). It spans archival science, bibliography, document and genre theory, informatics, information systems, knowledge management, library and information science, museum studies, records management, and social studies of information. Many earlier studies have tried to define and describe LIS and to identify the research topics it contains, using a wide range of methods: content analysis (Järvelin & Vakkari, 1990, 1993); bibliometric analysis of journals or journal articles (Åström, 2007, 2010; Moya-Anegón, Herrero-Solana, & Jiménez-Contreras, 2006; Persson, 1994); bibliometric analysis of authors (Moya-Anegón et al., 2006; White & McCain, 1998); co-word analysis of words extracted from titles, abstracts, or full text (Åström, 2002; Ding, Chowdhury, & Foo, 2001; Janssens, Leta, Glänzel, & De Moor, 2006); tri-occurrence analysis of index terms (Sugimoto & McCain, 2010); analysis of combinations of words and references (van den Besselaar & Heimeriks, 2006); and topic modeling (Sugimoto, Li, Russell, Finlay, & Ding, 2011; Sugimoto & McCain, 2010).

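Many of these methods rely on the analyst's knowledge of the domain in order to interpret the topics and cognitive structure of the field. For example, White & McCain (1998), working from clusters of the most important authors, observed that information science consists of several specialties around a weak center; Åström (2010) used maps of authors and journals to show a divide between the library science (LS) and information science (IS) parts of the field. Besides being a less direct indicator of cognitive structure, citation analysis has another problem: different subfields have different publication and citation practices.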

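Article titles contain many words that indicate the article's content (Buxton & Meadows, 1977; Meadows, 1998). This study therefore analyzes the important words in journal article titles. The data are 10,344 articles published between 1988 and 2007 in 16 LIS journals.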

The 100 words that occur most often in titles were selected.

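The analysis techniques include the relative frequency of words and clustering based on word co-occurrence; finally, the words, together with journals and publication years, are submitted to multidimensional scaling (MDS) to produce visualizations.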
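The first two steps (picking the most frequent title words, then counting how often they co-occur) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the toy titles, the stopword list, the whitespace tokenizer, and the choice to define relative frequency as the share of titles containing a word are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy titles; the study itself used 10,344 article titles.
TITLES = [
    "information retrieval on the web",
    "web search behavior of library users",
    "citation analysis of library journals",
    "information retrieval evaluation",
]
# Assumed stopword list, for illustration only.
STOPWORDS = {"on", "the", "of", "a", "an", "and", "in", "for"}


def tokenize(title):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in title.lower().split() if w not in STOPWORDS]


def top_terms(titles, k):
    """Return the k words that appear in the most titles (one count per title)."""
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(set(tokenize(title)))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


def relative_frequency(titles, k):
    """Share of titles containing each of the top-k words (assumed definition)."""
    return {word: count / len(titles) for word, count in top_terms(titles, k)}


def cooccurrence(titles, vocab):
    """Count how many titles contain each unordered pair of vocabulary words."""
    pairs = Counter()
    for title in titles:
        present = sorted(set(tokenize(title)) & set(vocab))
        pairs.update(combinations(present, 2))
    return pairs
```

The pair counts returned by `cooccurrence` form the kind of matrix that clustering and MDS would then take as input.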

Co-word analysis and hierarchical clustering reveal three main categories, LS (library science), IS (information science), and SCI-BIB (scientometrics-bibliometrics), plus two smaller ones, information-seeking behavior and bibliographic instruction. LS can be subdivided into topics such as academic librarianship, public librarianship (including collection development), information literacy and school librarianship, technology, policy, the web, knowledge management, digital libraries, e-commerce, law, and scholarly publishing. IS covers information retrieval, web search, catalogs, and databases. SCI-BIB includes bibliometric indicators, author productivity, and citation studies. The overall structure is shown in the figure below.

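Word usage shows that LIS has a core of persistently used words, but the use of some words changed markedly over the 20 years. All of these are technologically related words, which is consistent with the claim of Saracevic (1999) that LIS is a technology-driven field. Broadly, the change within LIS can be traced in the shift of word usage from databases, to digital libraries, to the World Wide Web.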

Besides being technology driven, LIS also devotes a large share of its attention to information-seeking behavior, a concern shared by LS and IS.

A number of empirical studies of LIS have been conducted with the aim of describing and defining the field and identifying research areas within it. These studies applied a wide array of approaches: content analysis (Järvelin & Vakkari, 1990, 1993); bibliometric analysis of journals and journal articles (Åström, 2007, 2010; Moya-Anegón, Herrero-Solana, & Jiménez-Contreras, 2006; Persson, 1994); bibliometric analysis of authors (Moya-Anegón et al., 2006; White & McCain, 1998); co-word analysis of both index terms and words extracted from titles, abstracts, and full text (Åström, 2002; Ding, Chowdhury, & Foo, 2001; Janssens, Leta, Glänzel, & De Moor, 2006); tri-occurrence analysis of index terms (Sugimoto & McCain, 2010); analysis of word-reference combinations (van den Besselaar & Heimeriks, 2006); and topic analysis (Sugimoto, Li, Russell, Finlay, & Ding, 2011; Sugimoto & McCain, 2010).

Some notable studies of cognitive structure of LIS have interpreted topics post hoc, by assigning topicality based on knowledge of the author’s domain (e.g., White & McCain, 1998). In White and McCain’s influential visualization of LIS, they concluded that “information science lacks a strong central author, or group of authors, whose work orients the work of others across the board. The field consists of several specialties around a weak center” (p. 343). However, this analysis was based foremost on the clustering of authors, rather than topics. Similarly, Åström (2010) examined the divide between LS and IS components of the field by a bibliometric mapping of authors and journals. Topicality was assigned through expert knowledge of the domains in which these authors wrote and journals published.

Of the various components of textual documents, the titles, and the choice of words in them, are of particular importance. Title words function as “attention triggers” (Bazerman, 1985, 1988). They are devices for capturing interest in a world where information overload is the norm. Title words have been called “signal-words” (Rip & Courtial, 1984) and “macro-actors” or “macro-terms” (Callon et al., 1983). Titles of journal articles themselves have undergone a change during the 20th century, becoming more informative, more specific, and containing a larger number of words that indicate article content (Buxton & Meadows, 1977; Meadows, 1998). Leydesdorff (1989) claims that “title words seem to offer a means of making visible the internal cognitive structure” (p. 217) of a discipline. He also claims that “word structure reflects internal intellectual organization in terms of the codification of word usage in the relevant disciplines” (Leydesdorff, 1989, p. 221).

Co-word analysis is based on co-occurrence of words (all words, or selected keywords) extracted from titles, abstracts, or text in general, or the index terms assigned by authors or indexers. Co-word analysis is a method that derives “higher level structures from word-occurrence patterns in text” (Chen, 2003, p. 139). Of particular importance in the context of this study is that co-word analysis is “a means to the elucidation of structures of ideas, problems, and so on, represented in appropriate sets of documents” (Whittaker, Courtial, & Law, 1989, p. 473). 

Although co-word analysis has its limitations (e.g., Leydesdorff, 1997), primarily because of the change of usage and meaning of words and the lack of context, such analysis has been considered particularly useful in tracking the development of scientific fields over time (Callon et al., 1991; Noyons & van Raan; Rip & Courtial, 1984), which represents another goal of this study.

Although citation analysis is not subject to the same limitation, it is a less direct indicator of cognitive structure. As already mentioned, studies using citations require post hoc assignment of topics. In addition, citation analysis of LIS is less effective in analyzing the cognitive structure of entire fields due to the different publication and citation practices of subfields, thus leaving even large subfields such as LS often invisible.

Selection of journals and articles. Articles from 16 LIS journals were chosen for inclusion in this study. The journals were selected from a ranked list of the most important journals in the field, according to deans and directors of American Library Association (ALA)-accredited, MLS programs in North America (Nisonger & Davis, 2005).

From this journal set, all research and review articles (10,344) published between 1988 and 2007 were included in the analysis.

Identification of the most frequently occurring LIS words and phrases. Word frequency is an important measure in content analysis. This measure is used to identify the most important research topics or concepts in a field by focusing on the most frequently occurring words.

In this study, we base all analyses on the 100 most frequently occurring LIS words or phrases. 

Thursday, August 14, 2014

Huang, M. H., & Chang, Y. W. (2012). A comparative study of interdisciplinary changes between information science and library science. Scientometrics, 91(3), 789-803.

scientometrics

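This study uses the references cited in articles published from 1978 to 2007 in five library science journals and five information science journals to compare the interdisciplinary character of the two fields. Interdisciplinarity is defined as the use of knowledge, methods, techniques, and devices from other fields as a result of scientific activities (Tijssen 1992); the distribution of cited references across disciplines is a commonly used way to analyze it. The results show that the source disciplines differ greatly: library science research tends to cite library and information science (LIS), education, business/management, sociology, and psychology, whereas information science research cites mostly LIS, general science, computer science, technology, and medicine. Apart from LIS itself, library science draws mainly on the social sciences, and information science mainly on the natural sciences.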

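In terms of changes in citation share, library science cites LIS less over time and education more, while the share of information science citations going to computer science also rises.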

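This study measures the interdisciplinarity of the two fields with Brillouin's Index, calculated as:

H = (log N! − Σ_i log n_i!) / N

where N is the number of observations, that is, the total number of references, and n_i is the number of observations in the i-th category, that is, the number of references in the i-th discipline. The results show that the interdisciplinarity of both fields rises year by year and that information science is more interdisciplinary than library science.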
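As a rough illustration of how the index behaves, the sketch below computes Brillouin's Index from a list of per-discipline reference counts. The function name and the use of natural logarithms are assumptions for this example; the base of the logarithm only rescales the values. `math.lgamma(n + 1)` is used for log n! so that large reference counts do not overflow.

```python
import math


def brillouin_index(counts):
    """Brillouin's Index H = (ln N! - sum(ln n_i!)) / N.

    counts: number of references falling in each discipline (n_i).
    Natural logarithms are an assumption made for this sketch.
    """
    counts = [c for c in counts if c > 0]
    n_total = sum(counts)
    if n_total == 0:
        return 0.0
    # lgamma(n + 1) equals ln(n!) and stays finite for large n.
    log_total = math.lgamma(n_total + 1)
    log_parts = sum(math.lgamma(c + 1) for c in counts)
    return (log_total - log_parts) / n_total
```

With all references in a single discipline the index is 0, and for a fixed total it grows as references spread more evenly over more disciplines, which is why a rising value is read as rising interdisciplinarity.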


Based on the research generated by five library science journals and five information science journals, library science researchers tend to cite publications from library and information science (LIS), education, business/management, sociology, and psychology, while researchers of information science tend to cite more publications from LIS, general science, computer science, technology, and medicine. This means that the disciplines with larger contributions to library science are almost entirely different from those contributing to information science.

However, a decreasing trend in the percentage of LIS in library science indicates that library science researchers tend to cite more publications from non-LIS disciplines. A rising trend in the proportion of references to education sources is reported for library science articles, while a rising trend in the proportion of references to computer science sources has been found for information science articles.

In addition, this study applies an interdisciplinary indicator, Brillouin’s Index, to measurement of the degree of interdisciplinarity. The results confirm that the trend toward interdisciplinarity in both information science and library science has risen over the years, although the degree of interdisciplinarity in information science is higher than that in library science.

The concept of interdisciplinarity has been discussed by many researchers (Huutoniemi et al. 2010; Leydesdorff and Probst 2009; Rosenfield 1992; Tijssen 1992), and can be defined as the use of knowledge, methods, techniques, and devices as a result of scientific activities from other fields (Tijssen 1992).

Monday, August 11, 2014

Tsay, M. Y. (2011). A bibliometric analysis and comparison on three information science journals: JASIST, IPM, JOD, 1998–2008. Scientometrics, 89(2), 591-606.

Scientometrics

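Borko (1968) defined information science as ‘‘a discipline that investigates the properties and behavior of information, the forces governing the flow of information, and the means of processing information for optimum accessibility and usability.’’ This study examines and compares the bibliometric characteristics of the references cited in articles published from 1998 to 2008 in three information science journals, JASIST (Journal of the American Society for Information Science and Technology), IPM (Information Processing and Management), and JOD (Journal of Documentation), as well as their subject relationships with other disciplines.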

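The results show that all three journals are information science oriented, but JOD leans more toward library science, while JASIST and IPM have more in common and diffuse more deeply into other disciplines than JOD. Selected findings:
1. JASIST published more than twice as many articles as IPM or JOD, which published roughly the same number. JOD, however, consists mainly of book reviews (54%).
2. Papers in JASIST and JOD cite 38 and 40 references on average, significantly more than the 32 in IPM. Papers in JOD, JASIST, and IPM cite on average 9.3, 7.8, and 4.1 books, respectively.
3. Journal self-citation is highest in JASIST at 17.46%, followed by IPM at 14.11% and JOD at 10.19%.
4. Four of the five most-cited journals are shared by all three journals: JASIST, IPM, Scientometrics, and JOD. The most-cited books in JOD differ markedly from those in the other two, whereas JASIST and IPM share the same top three: Salton and McGill's Introduction to Modern Information Retrieval, Van Rijsbergen's Information Retrieval, and Salton's The SMART Retrieval System: Experiments in Automatic Document Processing.
5. For each of the three journals, most of the ten most-cited journals, which together account for about 40 to 50% of cited journals, are information science journals, indicating that researchers in this field mostly cite work from their own field.
6. The top three classes of cited journals are ‘‘Bibliography. Library Science. Information Resources (General)’’, ‘‘Science’’, and ‘‘Social Sciences (General)’’. For cited books, the largest class in both JASIST and IPM is science, whereas in JOD it is ‘‘Bibliography. Library Science. Information Resources (General)’’. By subject, the top three are the same for all three journals: ‘‘searching’’, ‘‘online information retrieval’’, and ‘‘information work’’.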
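The per-journal figures in items 2 and 3 are simple ratios over reference lists. The sketch below shows one plausible way to compute them; the function names and the exact definitions (self-citation rate as the share of cited journal titles that match the citing journal, books per paper as a plain mean) are assumptions for illustration, since the notes do not spell out the paper's formulas.

```python
def self_citation_rate(journal, cited_journals):
    """Share of cited journal references that point back to the citing journal.

    journal: name of the citing journal.
    cited_journals: one entry per journal reference made by its articles.
    """
    if not cited_journals:
        return 0.0
    own = sum(1 for cited in cited_journals if cited == journal)
    return own / len(cited_journals)


def mean_per_paper(counts_per_paper):
    """Average of a per-paper count, e.g. references or books cited."""
    if not counts_per_paper:
        return 0.0
    return sum(counts_per_paper) / len(counts_per_paper)
```

For instance, a journal whose articles cite four journal references, two of them to itself, would have a self-citation rate of 0.5 under this definition.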


Employing a citation analysis, this study explored and compared the bibliometric characteristics and the subject relationship with other disciplines of and among the three leading information science journals, Journal of the American Society for Information Science and Technology (JASIST), Information Processing and Management and Journal of Documentation. The citation data were drawn from the references of each article published in the three journals between 1998 and 2008.

Comparison of the characteristics of cited journals and books confirmed that all three journals under study are information science oriented, although JOD leans toward library science. JASIST and IPM have much in common and diffuse into other disciplines more deeply than JOD.

Borko (1968) defined information science as ‘‘a discipline that investigates the properties and behavior of information, the forces governing the flow of information, and the means of processing information for optimum accessibility and usability.’’

JASIST published more than twice as many articles as IPM or JOD, which published approximately the same number of articles. Interestingly, JOD published more book reviews (54%) than journal articles.

The average number of references cited per paper for JASIST and JOD is 38 and 40, respectively, significantly higher than the 32 for IPM. There is no significant difference between JASIST and JOD in terms of average number of references cited.

On average, 9.3, 7.8, and 4.1 books were cited per paper by JOD, JASIST, and IPM, respectively. JOD cites the most books per paper, while IPM cites the fewest.

JASIST has the highest self-citation rate (17.46%), followed by IPM (14.11%); JOD has the lowest (10.19%).

Four of the top five highly cited journals are in common, i.e., Journal of the American Society for Information Science and Technology, Information Processing and Management, Scientometrics, and Journal of Documentation.

On the other hand, the most cited three books in common for JASIST and IPM are Salton and McGill’s Introduction to Modern Information Retrieval, Van Rijsbergen’s Information Retrieval and Salton’s The SMART Retrieval System: Experiments in Automatic Document Processing.

For the three journals under study, most of the top ten highly cited journals, contributing about 40–50% of cited journals, are information science journals indicating that the researchers in the information science field cite more research results in their own field.

The top three main classes of cited journals in papers of the three journals under study are in common and in the same order, i.e., ‘‘Bibliography. Library Science. Information Resources (General)’’, ‘‘Science’’ and ‘‘Social Sciences (General)’’.

As for the books cited, the most cited main class in JASIST and IPM papers is science, while the most cited main class for JOD is ‘‘Bibliography. Library Science. Information Resources (General)’’.

The top three highly cited subjects of library and information science journals are in common and encompass ‘‘searching’’, ‘‘online information retrieval’’, and ‘‘information work’’.

Papers in JOD are less computer-related than JASIST and IPM and JOD is more traditional library science oriented than JASIST and IPM are. On the other hand, ‘‘Information Storage and Retrieval Systems’’ and ‘‘Information Retrieval’’ are two of the three most cited subjects of books cited by the three journals under study.

Sunday, August 10, 2014

Ni, C., Sugimoto, C. R., & Cronin, B. (2013). Visualizing and comparing four facets of scholarly communication: producers, artifacts, concepts, and gatekeepers. Scientometrics, 94(3), 1161-1173.

network analysis

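This study analyzes the journal network of information science and library science from four facets: Venue-Author-Coupling (VAC), journal co-citation analysis, topic analysis, and interlocking editorial board membership. The journals analyzed are the 58 titles in the Information Science and Library Science category of the 2008 JCR (Journal Citation Report), using publication data from 2005 to 2009. Figures describing the data are given in Table 1: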


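VAC represents the similarity of the journals' producers: the proximity of each journal pair is measured by the number of authors they share, on the premise that authors choose thematically or socially similar journals to publish in. Journal co-citation analysis (McCain, 1991) counts how often each pair of journals is cited together and is used here to measure similarity between artifacts. For concepts, the study uses the ACT (Author-Conference-Topic) model (Tang et al., 2008), an extension of the LDA model (Blei et al. 2003), which estimates the distribution of keywords over topics and of topics over authors and publication venues (journals); similarity between journals is then measured with the cosine. Interlocking editorial board membership measures similarity by the number of editorial board members two journals share: the more members in common, the more similar the two journals are cognitively or socially.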
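Two of these proximity measures are easy to illustrate: counting shared authors per journal pair (the VAC idea) and taking the cosine between two journals' topic distributions. The sketch below is only a toy version under stated assumptions: the journal names, author sets, and topic vectors are invented, and the ACT model that would produce real topic distributions is not implemented here.

```python
import math
from itertools import combinations

# Hypothetical data: journal -> set of author identifiers.
JOURNAL_AUTHORS = {
    "J1": {"a", "b", "c"},
    "J2": {"b", "c", "d"},
    "J3": {"e"},
}


def shared_author_counts(journal_authors):
    """Number of authors shared by each unordered journal pair (VAC-style)."""
    return {
        (j1, j2): len(journal_authors[j1] & journal_authors[j2])
        for j1, j2 in combinations(sorted(journal_authors), 2)
    }


def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```

The same pair-counting pattern would apply to editorial boards by replacing author sets with sets of board members.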

The four sets of journal similarities were used for hierarchical cluster analysis and also to build networks, laid out with the Kamada-Kawai method. The four resulting networks were then compared for possible correlation using the Quadratic Assignment Procedure (QAP) (Lawler 1963).
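The idea behind a QAP correlation test can be sketched without any network library: correlate the off-diagonal entries of two journal-by-journal matrices, then repeatedly relabel the rows and columns of one matrix with the same random permutation to see how often a correlation at least that large arises by chance. The version below is a simplified illustration, not the study's procedure: it assumes symmetric matrices, uses Pearson correlation on the upper triangle, and reports a one-sided permutation p-value; all function names are hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def qap_correlation(a, b, n_perm=999, seed=0):
    """Observed correlation between two symmetric matrices and a one-sided
    permutation p-value (assumed test design for this sketch)."""
    n = len(a)
    flat_b = upper_triangle(b)
    observed = pearson(upper_triangle(a), flat_b)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        # Apply the same permutation to rows and columns of a.
        permuted = [[a[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), flat_b) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Permuting rows and columns together keeps each matrix's internal structure intact while breaking the correspondence between journals, which is what makes the test suitable for dependent dyadic data.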

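The journal network obtained with the VAC approach is shown below.
The four clusters are MIS (yellow), IS (blue), LS (green), and specialized journals (red). The MIS cluster is clearly separated from the others, while IS and LS lie close together. Compared with the other three clusters, the specialized journals are more weakly connected to one another.
The network obtained from journal co-citation analysis is shown below: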

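The five topics produced by the topic model are listed in Table 2.
The distribution of the five topics over the network is shown in the figure below.
MIS (yellow) is still relatively separate from the other clusters but lies closer to specialized journals in health and communication. The positions of IS (blue) and LS (pink) differ from those in the VAC and journal co-citation networks. Library services and practice (green) is close to the specialized journals and LS.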

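In the journal network based on interlocking editorial board membership, 10 journals share no editorial board members with any other journal. The rest fall into four clusters, and journals related to communication research form a new cluster (green).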

The QAP results for the four networks are given in Table 3.

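In summary, journals in the JCR Information Science and Library Science category fall into roughly four clusters: MIS, IS, LS, and communication-related journals. MIS is comparatively independent. The QAP results also show that the editorial board network is highly correlated with the journal co-citation network, possibly because researchers who serve on editorial boards tend to have stronger scholarly records and are more likely to be cited. The editorial board network is also relatively highly correlated with VAC, possibly because editorial board members are more productive. Analysis along multiple facets gives a more complete picture of the scholarly communication network.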

Fifty-eight journals from the Information Science and Library Science category in the 2008 Journal Citation Report were studied and the network proximity of these journals based on Venue-Author-Coupling (producer), journal co-citation analysis (artifact), topic analysis (concept) and interlocking editorial board membership (gatekeeper) was measured. The resulting networks were examined for potential correlation using the Quadratic Assignment Procedure.

The VAC approach is used to represent the producers in this dataset. This approach measures journal proximity based on the number of authors shared by each journal pair. The VAC approach is based on the idea that an author’s choice of publication venue reflects similarity judgments: authors are likely to choose venues that are thematically or socially similar.

Artifacts are measured by means of journal co-citation. This measure, introduced by McCain (1991), refers to the appearance of two journals in the same reference list of an article. The more frequently two journals appear in the same reference lists, the greater the similarity between the two journals. The journal co-citation approach measures journal proximity by the frequency with which each journal pair is co-cited by the same articles.
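Counting journal co-citations as described amounts to tallying, for every citing article, each pair of distinct journals that appears in its reference list. A minimal sketch follows; the function name and the input shape (one list of cited journal names per citing article) are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def journal_cocitation(reference_lists):
    """Count, for each unordered journal pair, the number of citing articles
    whose reference list contains both journals.

    reference_lists: one list of cited journal names per citing article.
    """
    counts = Counter()
    for refs in reference_lists:
        # Deduplicate so a pair is counted once per citing article.
        journals = sorted(set(refs))
        counts.update(combinations(journals, 2))
    return counts
```

Pairs are stored in alphabetical order, so `("IPM", "JASIST")` and `("JASIST", "IPM")` are the same key.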

Topic modeling is used to capture concepts. ... The technique adopted here, the author-conference-topic (ACT) model (Tang et al., 2008), extends the LDA model by considering the author and publishing venue of the articles. LDA was developed originally as a topic modeling technique concerning the probability distribution of keywords for topics, and is particularly helpful with the ‘‘classification, novelty detection, summarization, and similarity and relevance judgment’’ of large-scale data (Blei et al. 2003, p. 993). ... This model extends the idea of LDA by taking into account the authors and publishing venues, and estimates not only the distribution of words on topics, but also the distribution of authors and venues on the topics modeled. ... Here, the outcome of the ACT model is the probability distribution of each author and each journal over topics, and the journal proximity is calculated using the cosine similarity of the journals.

The interlocking editorship approach, employed by Ni and Ding (2010), measures journal proximity based on common editorial board membership. The number of editorial board members that two journals share can be viewed as an indicator of journal similarity. ... Thus, it can be expected that if two journals have scholars in common on their editorial boards, these two journals have some degree of similarity, either cognitively or socially.

The journals were clustered using a hierarchical clustering technique with squared Euclidean distance and Ward’s method. Each journal clustering was displayed as a network (Kamada-Kawai layout); each node (journal) was colored according to the hierarchical clustering result with the size of a node proportional to its centrality (either degree or closeness).

Additionally, a comparison of journal proximity results was conducted using the Quadratic Assignment Procedure (QAP). QAP is commonly used in social network analysis as a means of investigating correlations between two networks. ... (Lawler 1963).