Groh, G. and Fuchs, C. (2011). Multi-modal social networks for modeling scientific fields. Scientometrics, 89, 569-590.
network analysis
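This study applies the concepts and methods of social network analysis to domain analysis, using paper data from a scientific field to build several network graphs: the researchers' co-authorship network, the person-organization network, the co-citation network, the journal-person network and the conference-person network. Taking mobile social networking as the field under analysis, the study collected 933 related papers, built the networks listed above, and ran analyses such as centrality and node clustering. The co-authorship network has 1687 nodes and 2926 edges and splits into 538 components. The largest component has 200 nodes, about 11.86% of all nodes; its diameter is 13, its average shortest path length is 5.21, its density is 0.0394 and its global clustering coefficient is 0.63. Using tf-idf to label the topics addressed by the authors of each component, the topics of the largest component are application, mobility, social relations and usefulness. The author co-citation network has 1687 authors and 52928 co-citation pairs. Its largest component has 1490 authors (~88.3%) and 51171 edges; the diameter of this component is 7, the average shortest path length is 2.8672, the density is 0.0475, and the global clustering coefficient and average local clustering coefficient are 0.7 and 0.86 respectively. To understand the main research topics of mobile social networking, the largest component was clustered with the modularity-based algorithm proposed by Clauset, Newman and Moore (2004), giving a modularity of 0.616. The network was then visualized with the Pathfinder network algorithm (Schvaneveldt et al. 1989), and the seven subgroups found by the clustering were marked on the graph. The study suggests that these analyses can support three kinds of services: 1) giving a complete overview of the whole field through network graphs, 2) tracking the changes and movements of a node or a group of nodes over a period of time, and 3) observing the evolution of the network graphs over time.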
When two authors publish their work together, both authors are treated as nodes and are connected by an edge (their publications) in the co-authorship graph. The edge can be weighted e.g. to reflect the number of papers published jointly or to illustrate temporal aspects. Co-authorship networks are—as typical representatives for social networks—scale-free and conform to the small world phenomenon (Barabasi et al. 2002; Porter et al. 2009).
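As a rough illustration of the construction described above, the following sketch builds weighted co-authorship edges from a list of papers. The function name `coauthorship_edges` and the toy author lists are hypothetical, and the weight used here is simply the number of joint papers, one of the weighting options the text mentions.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(papers):
    """Return a Counter mapping unordered author pairs to joint-paper counts."""
    weights = Counter()
    for authors in papers:
        # set() drops repeated names; sorted() gives each pair one canonical order
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical toy data: each inner list is the author list of one paper.
papers = [["Ada", "Bob"], ["Ada", "Bob", "Cy"], ["Dee"]]
edges = coauthorship_edges(papers)
```

A single-author paper such as the third one adds a node but no edge, which is one reason such a graph can break into many small components.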
The assumption for co-citation analysis is that two documents, authors, journals or other objects which get cited jointly by a (later) third document have—at least from the perspective of the citing author—some coincidence in terms of content. The more frequently two objects get cited together the more this similarity is stressed. This technique was presented for documents (document co-citation) by Small (1973) and for authors (author co-citation) by White and Griffith (1981) and is often used to create a semi-automatic overview of the literature of a scientific field.
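The counting step behind this assumption can be sketched as follows: every citing document contributes one co-citation to each pair of objects in its reference list. The function name `cocitation_counts` and the toy reference lists (with placeholder labels A to D for cited authors) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each unordered pair is cited together by one document."""
    counts = Counter()
    for refs in reference_lists:
        # A pair counts once per citing document, however often it is mentioned.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy data: cited authors per citing document.
reference_lists = [
    ["A", "B", "C"],
    ["A", "B"],
    ["C", "D"],
]
counts = cocitation_counts(reference_lists)
```

Here A and B are cited together by two documents, so their co-citation count is 2, while every other pair that occurs has a count of 1.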
By now, many author co-citation studies have been performed for many different fields (e.g. Chen and Carr 1999; McCain et al. 1990; Tsay et al. 2003; White and McCain 1998; Zhao and Strotmann 2008). Current approaches use Pathfinder Networks (Buzydlowski 2003; Chen and Morris 2003; Chen and Hsieh 2007; Lin et al. 2003; McCain et al. 1990; White 2003b) or self-organized maps (Buzydlowski 2003; Lin et al. 2003).
In order to more precisely define ‘to cover’ one basically has to answer three questions/define three sub-concepts of ‘to cover’:
– Define criteria whether an article belongs to the domain in question
– Define criteria when the data-set is sufficiently large to map the relevant structures in the corresponding networks.
– Decide upon the set of meta-data recorded for the construction of the networks
The resulting data set, which was collected in July and August 2009, consists of 933 articles and their associated items/objects from the scientific domain ‘Mobile Social Networking’, forming a multi-modal network.
The data model of the items considered encompasses persons (authors and researchers), documents (articles), journals (and journal issues), conferences (and conference instances) and projects (an abstraction of scientific projects, working groups and other target-oriented organizations of persons), plus free optional tags for every item.
The co-authorship network derived from the collected data consists of 1687 nodes and 2926 edges. The graph resolves into 538 components with the biggest component containing 200 nodes (approx. 11.86%).
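A minimal sketch of how components and the share of the largest one can be obtained, assuming the graph is available as a node list and an edge list. The function name `components` and the toy graph are hypothetical; the last line only re-derives the reported percentage from the reported counts.

```python
from collections import deque


def components(nodes, edges):
    """Return the connected components as node lists, largest first."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = set()
    result = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        comp = []
        queue = deque([start])
        while queue:  # breadth-first search from the unvisited start node
            v = queue.popleft()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        result.append(comp)
    return sorted(result, key=len, reverse=True)


# Hypothetical toy graph with three components of sizes 3, 2 and 1.
nodes = ["a", "b", "c", "d", "e", "f"]
comps = components(nodes, [("a", "b"), ("b", "c"), ("d", "e")])
largest_share = len(comps[0]) / len(nodes)

# Share of the largest component reported above: 200 of 1687 nodes.
reported_share = round(200 / 1687 * 100, 2)
```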
The Person-Organization Network is the graph which results from looking at the relations between authors and their affiliations (companies, universities, research centers, etc.) retrieved from the articles in the database. The bipartite graph consists of 393 components (1194 author nodes and 544 nodes standing for organizations). The biggest component contains 330 nodes (~15%) and is composed of 277 authors and 53 organizations.
The average degree of an organization node is 6.62 (median 4.0) and the standard deviation is 7.76. For the nodes representing a person the average degree is 1.27 (median 1.0) with a standard deviation of 0.56. This implies that a large share of the authors in the data set maintain relations to only one organization and that many authors concentrate on a few organizations (17 organizations have only one assigned author each, whereas just two organizations, Carnegie Mellon University and MIT Media Lab, are connected to more than 30 authors).
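The per-side degree statistics of a bipartite graph like this one can be computed as in the sketch below. The affiliation pairs are hypothetical toy data, and the use of the population standard deviation (`pstdev`) is an assumption, since the excerpt does not say which variant was used.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical (person, organization) affiliation links, one entry per distinct link.
affiliations = {
    ("p1", "org1"), ("p2", "org1"), ("p3", "org1"),
    ("p3", "org2"), ("p4", "org2"),
}

# In a bipartite graph the degree of a node is the number of links it appears in.
person_degree = Counter(person for person, _ in affiliations)
org_degree = Counter(org for _, org in affiliations)


def summarize(degrees):
    """Return (mean, median, population standard deviation) of the degrees."""
    values = list(degrees.values())
    return mean(values), median(values), pstdev(values)


person_stats = summarize(person_degree)
org_stats = summarize(org_degree)
```

In the toy data most persons have degree 1, mirroring the pattern described above where most authors are linked to a single organization.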
An author co-citation analysis can be done at least in two ways:
The traditional way, following McCain’s paper (McCain 1990), uses a vector model to compare the co-citation profiles of authors. For each author, a vector is calculated which contains the author co-citation count with every other author of the data set. Afterwards, the analysis compares the author vectors using a measure like cosine (van Eck and Waltman 2008; Egghe and Leydesdorff 2009) or Pearson correlation (McCain 1990). A link between two authors does not necessarily mean that both got co-cited; it only says something about the similarity of their co-citation vectors (i.e. how they get co-cited with all authors).
The other approach used in studies like White (2003b) works on the raw data: a link between two authors expresses that these two authors got co-cited (and does not reveal something about their relationship to any other author).
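The difference between the two approaches can be made concrete with a small sketch. The co-citation counts below are hypothetical, and setting an author's own entry in the profile vector to zero is an assumption (the handling of that diagonal entry varies between studies).

```python
from math import sqrt

authors = ["A", "B", "C", "D"]
# Hypothetical raw co-citation counts; A and B are never co-cited with each other.
raw = {("A", "C"): 3, ("A", "D"): 1, ("B", "C"): 3, ("B", "D"): 1}


def cocited(a, b):
    """Raw approach: the link weight is the co-citation count itself."""
    return raw.get((a, b), raw.get((b, a), 0))


def profile(a):
    """Vector approach: co-citation counts of `a` with every author (self = 0)."""
    return [0 if other == a else cocited(a, other) for other in authors]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


raw_link = cocited("A", "B")
profile_similarity = cosine(profile("A"), profile("B"))
```

In this toy case the raw approach gives A and B no link at all, while the vector approach rates them as maximally similar because they are co-cited with C and D in exactly the same proportions.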
The data set contains 1687 authors who make up 52928 co-citation pairs. 1490 authors (~88.3%) form the biggest component with 51171 edges. The remaining 197 authors are distributed among 148 additional components which are noticeably smaller. The diameter of the biggest component is 7, the density is 0.0475, the average path length is 2.8672, the global clustering coefficient is 0.7 and the average local clustering coefficient is 0.86.
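The two clustering coefficients reported here measure different things, which the following sketch tries to show under common textbook definitions: the global coefficient is the fraction of connected triples that are closed, and the average local coefficient is the mean over nodes of the fraction of linked neighbor pairs. Counting nodes with fewer than two neighbors as 0 is an assumption; the excerpt does not state the convention used. The function name and toy graph are hypothetical.

```python
from itertools import combinations


def clustering(adj):
    """Return (global clustering coefficient, average local clustering coefficient)."""
    closed = 0   # connected triples whose two outer nodes are also linked
    triples = 0  # all connected triples, counted at their center node
    local = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        possible = k * (k - 1) // 2
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        triples += possible
        closed += links
        local.append(links / possible if possible else 0.0)
    global_cc = closed / triples if triples else 0.0
    return global_cc, sum(local) / len(local)


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
global_cc, avg_local_cc = clustering(adj)
```

Because the average local coefficient weights every node equally while the global one weights every triple equally, the two values can differ noticeably on the same graph.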
The next question is whether it is possible to split the big component into several clusters with different research topics within the mobile social networking community. The big component was therefore clustered using a modularity-based clustering method by Clauset, Newman and Moore (Clauset et al. 2004). The modularity reached with the clustering mechanism is 0.616, which is relatively high.
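To make the modularity score more tangible, the sketch below evaluates the standard modularity formula Q = Σ_c [e_c/m − (d_c/2m)²] for a given partition of an unweighted graph, where e_c is the number of edges inside cluster c, d_c the summed degree of its nodes and m the total number of edges. This only scores a partition; it is not the Clauset-Newman-Moore algorithm itself, which greedily merges clusters to increase Q. The function name and toy graph are hypothetical.

```python
def modularity(edges, partition):
    """Modularity Q of a partition (list of node sets) of an unweighted graph."""
    m = len(edges)
    cluster_of = {node: i for i, cluster in enumerate(partition) for node in cluster}
    inside = [0] * len(partition)   # e_c: edges with both ends in cluster c
    degree = [0] * len(partition)   # d_c: summed degree of the nodes in cluster c
    for a, b in edges:
        degree[cluster_of[a]] += 1
        degree[cluster_of[b]] += 1
        if cluster_of[a] == cluster_of[b]:
            inside[cluster_of[a]] += 1
    return sum(e / m - (d / (2 * m)) ** 2 for e, d in zip(inside, degree))


# Hypothetical toy graph: two triangles joined by the single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
q = modularity(edges, [{1, 2, 3}, {4, 5, 6}])
```

For the toy graph the natural two-triangle split gives Q = 5/14 ≈ 0.357; putting all nodes into one cluster would give Q = 0.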
A Pathfinder Network (Schvaneveldt et al. 1989) of the largest component can be seen in Fig. 2.
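The excerpt does not state which Pathfinder parameters were used. As a sketch only, the code below assumes the commonly used PFNET(r = ∞, q = n − 1) variant, in which an edge survives only if no alternative path has a smaller maximum edge distance. It also assumes the input is already a distance (dissimilarity) per edge; turning co-citation counts into distances is a separate step not shown here. The function name and toy distances are hypothetical.

```python
def pathfinder(nodes, dist):
    """Edges kept by PFNET(r=inf, q=n-1); dist maps frozenset({a, b}) -> distance."""
    inf = float("inf")
    # d[i, j] starts as the direct distance and becomes the minimax path distance.
    d = {
        (a, b): 0 if a == b else dist.get(frozenset((a, b)), inf)
        for a in nodes
        for b in nodes
    }
    for k in nodes:  # Floyd-Warshall with max() in place of addition
        for i in nodes:
            for j in nodes:
                via = max(d[i, k], d[k, j])
                if via < d[i, j]:
                    d[i, j] = via
    # Keep an edge only if no indirect path beats its direct distance.
    return {pair for pair, w in dist.items() if w <= d[tuple(pair)]}


# Hypothetical toy distances: A-C is long, but A-B-C uses only short edges.
nodes = ["A", "B", "C"]
dist = {
    frozenset(("A", "B")): 1,
    frozenset(("B", "C")): 1,
    frozenset(("A", "C")): 3,
}
kept = pathfinder(nodes, dist)
```

The A-C edge is pruned because the path through B never uses an edge longer than 1, which is how Pathfinder reduces a dense network to its most salient links.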
Cluster #1 contains documents which deal with different use cases for context sensitive applications. The example applications cover e.g. tourist information systems and the fast development of prototypes for mobile, context sensitive applications.
In Cluster #2 the theory of scale-free networks is the main topic; important authors are Réka Albert, Albert-László Barabási, Duncan J. Watts and Mark Granovetter.
Cluster #3 highlights security and data privacy; the main authors are Marco Gruteser, Jason I. Hong, Paul Dourish and Anind K. Dey.
The authors in cluster #4 write about ubiquitous computing, mobile applications with social, local and contextual reference.
In cluster #5 the influence of new forms of communication and the general technological development for our society are examined.
The topics in cluster #6 are sensor networks based on mobile phones.
Compared with the other clusters, cluster #7 is rather hardware-centric and deals with delay tolerant networks.
This part of the article discusses the derived journal-person network. The bipartite graph contains 151 components with more than one node and consists of 861 authors and 242 journals. In this part, the largest component with 443 authors and 69 journals is discussed.
The nodes represent journals and two journals are connected by an edge when at least one person exists who published in both journals. The weight of an edge correlates to the number of authors who published in both of the connected journals. Thus, the decision on how similar two journals are is left to the authors, the users of the journals, themselves. ... The density of the network is 0.084, it consists of 69 nodes (journals) and 196 edges.
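The projection described here can be sketched as follows: the person-journal links are folded into a journal-journal graph whose edge weight is the number of shared authors. The function names and the toy publication pairs are hypothetical. The density formula 2E / (N(N − 1)) for an undirected graph reproduces the reported value of 0.084 from the reported 69 nodes and 196 edges.

```python
from collections import Counter, defaultdict
from itertools import combinations


def project_journals(pairs):
    """Fold (person, journal) links into journal pairs weighted by shared authors."""
    journals_of = defaultdict(set)
    for person, journal in pairs:
        journals_of[person].add(journal)
    weights = Counter()
    for journals in journals_of.values():
        # Each author adds 1 to every pair of journals they published in.
        for a, b in combinations(sorted(journals), 2):
            weights[(a, b)] += 1
    return weights


def density(n_nodes, n_edges):
    """Density of an undirected graph without self-loops."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Hypothetical toy data: p1 and p2 both published in J1 and J2, p3 in J2 and J3.
publications = [
    ("p1", "J1"), ("p1", "J2"),
    ("p2", "J1"), ("p2", "J2"),
    ("p3", "J2"), ("p3", "J3"),
]
weights = project_journals(publications)
reported_density = round(density(69, 196), 3)
```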
The Pathfinder Network shown in Fig. 3 reaches from the area of human computer interaction (upper left) to the social sciences (upper right):
The network consisting of conferences and persons contains 121 components with more than one node. A component has 7.66 nodes on average (with a standard deviation of 17.29). The largest component includes 185 nodes (see Fig. 4).
– overview services: the primary goal is to get a comprehensive overview of the modeled scientific area
Newcomers to a scientific field in particular can have problems acquiring a broad overview of the area. The approach presented here can help to analyze the scientific area in more detail.
The author co-citation network can be used to locate authors by their research interests. Rendering it as a Pathfinder Network, in addition to applying a clustering mechanism, helps to produce easily interpretable results. Author co-citation is suitable because the decision about the similarity of authors is made by the authors themselves. A disadvantage is that the picture drawn with author co-citation analysis shows only the past and not the present, since a newly published paper takes a while to get cited.
The collaboration of authors can be visualized using the co-authorship network. This graph can be used to identify definable schools within the scientific field (to get results which can be interpreted easily a clustering mechanism and a Pathfinder Network rendering can help).
A concrete service could allow the user to identify different sub-areas within the scientific field. ... The user has the option to focus his research on interesting sections only (ignoring the other subareas) or to do a more macro-orientated analysis and work on the whole graph.
The analysis of graphs with journals or conferences offers an insight into the communication patterns within the scene: from the perspective of the newcomer, this view on the graph can reveal interesting literature, while the more experienced researcher can use it to identify the best fitting journals for his/her articles.
The network of organizations can help active researchers to identify interesting organizations as career options: if an author specialized in a specific topic it would be favorable to work at an organization which already has influence in the specific area.
– tracking services: one or more nodes of the graph are separately tracked together with a history of their dynamics over a specific time period. This type of service reaches its full potential only once the model can be updated automatically, because frequent manual updates are cumbersome and resource-consuming for the user.
If the underlying network can be updated automatically, tracking services can be implemented effectively. These services observe a set of nodes and document their development over time.
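One very simple reading of such a tracking service is sketched below: given a series of dated snapshots of the network, it records the degree of each watched node in every snapshot. The function name, the snapshot labels and the choice of degree as the tracked quantity are all assumptions made for illustration; the excerpt does not specify what is recorded.

```python
def degree_history(snapshots, watched):
    """Map each watched node to its list of (snapshot label, degree) entries."""
    history = {node: [] for node in watched}
    for label, edges in snapshots:
        for node in watched:
            degree = sum(1 for a, b in edges if node in (a, b))
            history[node].append((label, degree))
    return history


# Hypothetical snapshots: (label, edge list) pairs in chronological order.
snapshots = [
    ("2008", [("a", "b")]),
    ("2009", [("a", "b"), ("a", "c"), ("b", "c")]),
]
history = degree_history(snapshots, ["a"])
```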
– evolution services: this kind of service tries to explain the development in the model starting from a given point in time up to the current situation.
... for this type of service the network needs to be updated on a regular basis (manually or, preferably, automatically). Such a service can be useful for people who have been working in the observed field for a certain time but had to interrupt their work for a longer period (e.g. for other projects, parental leave, sabbatical, etc.). ... A representation with a graph visualization tool like SoNIA (McFarland and Bender-deMoll 2009) might also be helpful in order to get a high-level overview of the changes in the area. SoNIA displays changes in a graph as an animation which documents the dynamics of the network step by step.