Tuesday, April 14, 2015

Pudovkin, A.I., & Garfield, E. (2002). Algorithmic procedure for finding semantically related journals. Journal of the American Society for Information Science and Technology, 53(13), 1113–1119.

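This study uses citation data to compute a relatedness factor between journals, and uses that factor to find the journals semantically most similar to a target journal. Traditional classification relies on subjective analysis. The journal categories in the ISI Journal Citation Reports (JCR), for example, were produced by subjective, heuristic methods. Once the JCR categories were established, new journals were assigned one at a time, each after a visual examination of its relevant citation data, and categories were subdivided as they grew. Individual journal assignments also draw on an unpublished algorithm, the Hayne-Coulson algorithm, which treats any designated group of journals as one macro-journal and produces its combined cited and citing journal data. In most cases this subjective analysis is sufficient, but in some research areas it is considered too crude, it is subject to the vagaries of time, and it does not let users quickly learn which journals are most closely related. Quantitative methods, made possible by the introduction of citation indexes, were proposed to address these problems. For each journal, JCR provides a set of its most closely related journals based on citation relationships: the journals it cites most and the journals that cite it most. Pudovkin & Garfield (2002) consider these lists extremely useful as a crude classification, but because journals differ in size, they give only a superficial perception of the relatedness between journals. The authors therefore propose a measure of relatedness between journals. Let $R_{i>j}$ denote the relatedness of journal i to journal j, defined as $R_{i>j} = H_{i>j} \times 10^6 / (\mathrm{Pap}_j \times \mathrm{Ref}_i)$, where $H_{i>j}$ is the number of citations from journal i to journal j in the current year, $\mathrm{Pap}_j$ is the number of papers journal j published in the current year, and $\mathrm{Ref}_i$ is the total number of references in journal i's papers in the current year. Note that under this definition a journal's relatedness to itself may be smaller than its relatedness to some other journals. To make the measure symmetric for two journals A and B, the authors take the larger of $R_{A>B}$ and $R_{B>A}$, that is, $R_{A\&B\max} = \max(R_{A>B}, R_{B>A})$. Using Genetics, a core journal in genetics and heredity, as the example, the results show that weighting citation data by journal size finds related journals better than unweighted data. The method identifies journals that are clearly genetics-related but were not placed in the JCR "Genetics & Heredity" category, as well as journals that are in the category but are less related in content.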

Using citations, papers and references as parameters a relatedness factor (RF) is computed for a series of journals. Sorting these journals by the RF produces a list of journals most closely related to a specified starting journal.

The method appears to select a set of journals that are semantically most similar to the target journal.

Traditional classification relies on subjective analysis which for one reason or another proves inadequate and is subject to the vagaries of time.

Quantitative methods have been proposed for overcoming these problems. This was greatly facilitated with the introduction of citation indexes in the 1960's and the later introduction of the ISI Journal Citation Reports.

JCR reports inter-journal citation frequencies for thousands of journals. ... Journals are assigned to categories by subjective, heuristic methods.

One of the referees asked for a description of the procedures used by ISI in establishing journal categories for JCR. ... This method is “heuristic” in that the categories have been developed by manual methods started over 40 years ago. Once the categories were established, new journals were assigned one at a time. Each decision was based upon a visual examination of all relevant citation data. As categories grew, subdivisions were established. Among other tools used to make individual journal assignments, the Hayne-Coulson algorithm is used. The algorithm has never been published. It treats any designated group of journals as one macrojournal and produces a combined printout of cited and citing journal data.

In many fields these categories are sufficient but in many areas of research these “classifications” are crude and do not permit the user to quickly learn which journals are most closely related.

JCR provides, for each journal, a set of its most closely related journals based on citation relationships. These are the journals it cites most heavily (cited journals) and also the journals which cite it most often (citing journals). These are extremely useful and provide a crude classification, but unfortunately due to the variations in the sizes of journals one only obtains a superficial perception of the relatedness between two or more specific journals.

We have illustrated the procedure using one core journal in the field of genetics and heredity, the well-known Genetics, published by the Genetics Society of America.

Let journal relatedness of two journals, “i” and “j” be symbolized by $R_{i>j} = H_{i>j} \times 10^6 / (\mathrm{Pap}_j \times \mathrm{Ref}_i)$, where $H_{i>j}$ is the number of citations in the current year from journal “i” to journal “j” (to papers published in “j” in all years of “j”), $\mathrm{Pap}_j$ and $\mathrm{Ref}_i$ are the number of papers published and references cited in the j-th and i-th journals in the current year.
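As a minimal sketch of the definition above, the directed relatedness can be computed from three counts. The function name, argument names, and the numbers in the example are all hypothetical; they are not taken from the paper or from JCR.

```python
def relatedness(h_ij: int, pap_j: int, ref_i: int) -> float:
    """Directed relatedness of citing journal i to cited journal j.

    h_ij  : citations from journal i to journal j in the current year
    pap_j : papers published by journal j in the current year
    ref_i : references cited by journal i in the current year
    """
    if pap_j <= 0 or ref_i <= 0:
        raise ValueError("pap_j and ref_i must be positive")
    # The 10^6 factor only rescales the ratio to a convenient magnitude.
    return h_ij * 1_000_000 / (pap_j * ref_i)


# Made-up counts, for illustration only.
print(relatedness(h_ij=200, pap_j=400, ref_i=20_000))  # 25.0
```

Dividing by both the cited journal's paper count and the citing journal's reference count is what makes the value comparable across journals of different sizes.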

If we consider a pair of journals, A and B, there may be two indexes: $R_{A>B}$ and $R_{B>A}$. These can be very different.

It is noteworthy that the citation relatedness of a journal to itself (that is “self-relatedness”) may be lower than its relatedness to some other journals.

Now it is suggested we use the larger of them, $R_{A\&B\max} = \max(R_{A>B}, R_{B>A})$, which we shall call the relatedness factor (RF).
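The symmetric relatedness factor and the resulting ranking can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the dictionary layout, the key names (`pap`, `ref`, `h_to`, `h_from`), the journal names, and all counts are invented for the example.

```python
from typing import Dict, List, Tuple

SCALE = 1_000_000  # the 10^6 factor from the definition


def directed(h: int, pap_cited: int, ref_citing: int) -> float:
    """Directed relatedness: citations scaled by cited papers and citing references."""
    return h * SCALE / (pap_cited * ref_citing)


def relatedness_factor(target: Dict[str, int], other: Dict[str, int]) -> float:
    """RF = the larger of the two directed values between target and other."""
    r_to = directed(other["h_to"], other["pap"], target["ref"])      # target cites other
    r_from = directed(other["h_from"], target["pap"], other["ref"])  # other cites target
    return max(r_to, r_from)


def rank_by_rf(target: Dict[str, int],
               candidates: Dict[str, Dict[str, int]]) -> List[Tuple[str, float]]:
    """Sort candidate journals by RF against the target, highest first."""
    scored = [(name, relatedness_factor(target, c)) for name, c in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: "pap" = papers published, "ref" = references cited,
# "h_to" = citations from the target to the candidate,
# "h_from" = citations from the candidate to the target.
target = {"pap": 400, "ref": 20_000}
candidates = {
    "Journal X": {"pap": 100, "ref": 4_000, "h_to": 50, "h_from": 80},
    "Journal Y": {"pap": 2_000, "ref": 80_000, "h_to": 400, "h_from": 600},
}

for name, rf in rank_by_rf(target, candidates):
    print(name, rf)
# Journal X 50.0
# Journal Y 18.75
```

In this made-up data the small Journal X ranks above the large Journal Y even though its raw citation counts are lower, which is the effect of normalizing by journal size.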

An important feature of the suggested approach is the calculation of SPECIFIC citation relatedness, that is, the new indexes take into consideration the sizes of citing (through the number of references) and cited (through the number of published papers) journals.

The new algorithmic approach enables one to find thematically related journals out of a multitude of journals. ... Weighting citation data by journal size allows identifying journals that are similar in content better than unweighted raw citation data.

In the case of the starting journal Genetics the method identified those journals which are significantly genetic in content, but were not included in the “Genetics & Heredity” category of the JCR. ... Journals included in the “G & H” category are rather heterogeneous in content. Some are highly related to Genetics, while others, as for example journals on medical genetics are poorly related to its content.

JCR has become an established world wide resource but after two or more decades it needs to reexamine its methodology for categorizing journals so as to better serve the needs of the research and library community.
