Mapping the history and current situation of research on John Cunningham virus – a bibliometric analysis

Background John Cunningham virus (JCV) belongs to the polyomavirus family and plays important roles in progressive multifocal leukoencephalopathy (PML) and tumorigenesis. However, no bibliometric investigation has been reported to guide researchers and potential readers. Methods Papers were collected from the Sci-expanded and PubMed databases up to May 22, 2008. The most productive authors, institutes and countries, and the most highly-cited authors and journals, were ranked. The highly-cited articles were subjected to co-citation and chronological analysis, and the highly-frequent MeSH words to co-occurrence analysis. Results As of the search date, 1785 articles about JCV were indexed in Sci-expanded and 1506 in PubMed. The main document type was the original article. USA, Japan and Italy were the three largest producers of JCV papers. Temple University published 128 papers and ranked first, followed by the University of Tokyo. Khalili K and Yogo Y were core authors, each having produced more than 20 documents. Journal of Neurovirology published more than 15 papers and ranked first. Padgett BL and Berger JR were the two most highly-cited authors. Journal of Virology and Journal of Neurovirology ranked first and second, respectively, among the highly-cited journals. The top highly-cited articles were divided into 5 aspects: (1) the correlation between JC virus and tumors; (2) the causal correlation of JCV with PML; (3) polyoma virus infection and related diseases in renal-allograft recipients; (4) detection of JCV antibody, oncogene and its encoded protein; (5) genetics and molecular biology of JCV. The MeSH/subheadings were classified into five groups: (1) JCV and viral infectious diseases; (2) JCV pathogenicity and the pathological appearance of PML; (3) JCV isolation and detection; (4) immunology of JCV and PML; (5) JCV genetics and tumors. Conclusion JCV investigation has mainly focused on its isolation and detection, as well as its correlation with PML and tumors. 
Establishing a transgenic animal model using JCV T antigen would be a promising and useful project for further study.


Background
John Cunningham virus (JCV) is a member of the polyomavirus family, whose members contain small, circular, double-stranded DNA genomes. The early region is alternatively spliced to produce large T antigen and small t antigen [1]. T antigen, a large nuclear phosphoprotein required for viral DNA replication, binds to the viral origin of replication, where it promotes unwinding of the double helix and recruits the cellular proteins required for DNA synthesis. The late region encodes the capsid structural proteins VP1, VP2 and VP3, produced by alternative splicing, and a small regulatory protein known as agnoprotein [1,2]. The VP proteins assemble with viral DNA to form virions. Serological studies indicate asymptomatic JCV infection in about 90% of the adult population; the virus may be reactivated under immunosuppressive conditions, leading to the lethal demyelinating disease progressive multifocal leukoencephalopathy (PML) [1][2][3][4][5]. Evidence from transgenic and infectious animal models indicates that JCV can transform cells and cause various malignancies [6][7][8][9]. In recent years, links have been suggested between JCV and various human cancers, including colorectal, prostate and esophageal cancers, brain tumors, bronchopulmonary carcinoma and B cell lymphoma [1][2][3][4][5][6][7][8][9], pointing to a role as an oncovirus. However, no bibliometric investigation has been reported to guide researchers and potential readers.
In many fields, investigators expect decisions about subsequent experiments, clinical practice and manuscript submission to be based on the findings of scientific studies published in journals. Although individual papers provide useful information to readers, it can be difficult to grasp the history, current status and future trends of a research field from them. The bibliometric method uses empirical data and quantitative analysis to trace core production and citation, the content and quality of publications, and the motivations of researchers as reflected in the published literature, and it has proved a valid and reliable way to map the external and internal features of a scientific field [10]. A key assumption underpinning its use to gain insight into the flow of knowledge is that research papers represent the knowledge produced by scientific research. Generally, the academic productivity of individuals or groups is measured by counting their publications, and the number of times a work is cited is taken as a measure of its research impact: the more frequently a paper is cited, the higher its impact or quality [10,11]. Examining bibliometric information reveals the communication patterns within a field and the patterns of influence among different works. Authors who publish early and are cited frequently tend to accumulate further citations over time, a phenomenon known as the Matthew effect. Co-citation analysis, in which two papers count as co-cited when a third paper cites both, can indicate a strong conceptual relationship between the co-cited studies.
PubMed, in turn, indexes journal articles using Medical Subject Headings (MeSH), a thesaurus that embodies all the concepts appearing in the medical literature, arranged in a hierarchical, tree-like structure by subject category. Associated with MeSH is a list of subheadings that sharpen the focus of MeSH searches.
Combining MeSH terms with subheadings not only improves the sensitivity and specificity of searches but also indicates the research content of papers and the relationships between them [12][13][14]. When co-occurrence cluster analysis of MeSH terms is applied to a field, the close links between its subtopics can be clearly established.
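As a minimal sketch of the counting behind co-citation analysis, the Python function below tallies, from a set of citing papers, how often each work is cited and how often each pair of works is cited together in the same paper. The function name `co_citation_counts` and the toy labels A, B and C are hypothetical illustrations, not the tools used in this study; each cited work is assumed to be counted at most once per citing paper.

```python
from collections import Counter
from itertools import combinations


def co_citation_counts(reference_lists):
    """Count citations and pairwise co-citations.

    reference_lists: one list of cited works per citing paper.
    Returns (citations, co_citations) as Counters; co-citation keys are
    alphabetically ordered pairs (a, b) with a < b.
    """
    citations = Counter()
    co_citations = Counter()
    for refs in reference_lists:
        unique_refs = sorted(set(refs))  # count each cited work once per citing paper
        citations.update(unique_refs)
        # every pair cited together in this paper adds one co-citation
        co_citations.update(combinations(unique_refs, 2))
    return citations, co_citations


# Hypothetical citing papers and the works (A, B, C) they cite.
papers = [["A", "B", "C"], ["A", "B"], ["A", "C"]]
cites, co = co_citation_counts(papers)
print(cites["A"])        # 3
print(co[("A", "B")])    # 2
print(co[("B", "C")])    # 1
```

The citation counts measure impact, while the pair counts are the raw input to the co-citation matrices described under Cluster analysis.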
In the present study, we analyzed the production and citation of JCV research using chronological, co-citation and co-occurrence analysis to explore the history, current status and frontiers of JCV study.

Data collection
The bibliographic data were collected up to May 22, 2008 from the Institute for Scientific Information database on the web (http://www.isiknowledge.com, Sci-expanded) and the National Library of Medicine database on the web (http://www.ncbi.nlm.nih.gov/sites/entrez, PubMed). The title, author, address, source, and references (Sci-expanded) or MeSH list (PubMed) of the papers were downloaded using the retrieval strategy "JC virus OR John Cunningham virus OR JCV OR JC polyomavirus OR JC polyoma virus" in both Sci-expanded and PubMed.
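For readers who wish to rerun a comparable PubMed search, the sketch below builds an NCBI E-utilities `esearch` request for the retrieval strategy above, restricted to publication dates up to May 22, 2008. This is a reconstruction under stated assumptions, not the authors' procedure: the original search used the 2008 Entrez web interface, the lower date bound `1900/01/01` and the `retmax` value are arbitrary choices, and because PubMed indexing and query translation have changed since 2008, the hit count should not be expected to equal 1506. The code only constructs the URL; it does not send the request.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# NCBI E-utilities search endpoint (the 2008 study used the Entrez web interface instead).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Retrieval strategy quoted from the text above.
QUERY = ("JC virus OR John Cunningham virus OR JCV "
         "OR JC polyomavirus OR JC polyoma virus")


def build_esearch_url(term, max_date="2008/05/22", retmax=2000):
    """Return an esearch URL limited to publication dates up to max_date."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",        # filter on publication date
        "mindate": "1900/01/01",   # assumed lower bound; mindate and maxdate are supplied together
        "maxdate": max_date,
        "retmax": retmax,          # assumed number of PMIDs to return
    }
    return ESEARCH + "?" + urlencode(params)


url = build_esearch_url(QUERY)
print(url)
# Decoding the query string recovers the original search term.
print(parse_qs(urlparse(url).query)["term"][0])
```

Sending the request (for example with `urllib.request.urlopen(url)`) would return XML containing the hit count and PMIDs; that network step is deliberately omitted here.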

Highly-produced and -cited analysis
Using FoxPro 5.0, Microsoft Excel, the Bibliographic Item Cooccurrence Mining System (BIOCOMS) provided by Cui Lei and the Sci-expanded statistical system, we analyzed the Sci-expanded data to determine the document types, core authors, and highly productive institutes and countries. The references were analyzed to clarify the distribution of highly-cited papers, authors and journals. The top MeSH/subheading words were collected from PubMed and statistically analyzed to identify the most frequent ones.
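Ranking highly productive authors, institutes and countries amounts to counting papers per entity and keeping those above a threshold (the results below use more than 20 papers for core authors and more than 15 for core journals). A minimal Python sketch, assuming each paper is a record with list-valued fields and that each entity is counted once per paper ("whole counting"); the function `rank_producers` and the sample records are hypothetical.

```python
from collections import Counter


def rank_producers(records, field, threshold):
    """Rank entities by paper count and keep those above a threshold.

    records:   one dict per paper; record[field] is a list of entities
               (authors, institutes or countries).
    threshold: keep entities with strictly more papers than this.
    Each entity is counted once per paper (whole counting, an assumption).
    """
    counts = Counter()
    for rec in records:
        counts.update(set(rec.get(field, [])))
    return [(name, n) for name, n in counts.most_common() if n > threshold]


# Hypothetical records; the study would apply threshold=20 to authors.
records = [
    {"authors": ["Author A", "Author B"], "countries": ["USA"]},
    {"authors": ["Author A"], "countries": ["USA", "Japan"]},
    {"authors": ["Author A", "Author C"], "countries": ["Italy"]},
    {"authors": ["Author B"], "countries": ["USA"]},
]
print(rank_producers(records, "authors", threshold=1))
# [('Author A', 3), ('Author B', 2)]
```

Fractional counting (splitting each paper among its co-authors) is a common alternative; the text does not say which convention was used.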

Cluster analysis
After identification, the 34 most-cited articles were subjected to co-citation cluster analysis based on the number of times each pair was cited together in the same paper. The 48 highly-frequent MeSH/subheadings of all PubMed articles were studied by co-occurrence cluster analysis based on the number of times each pair appeared together in the same paper. In each cluster analysis, matrices were built from the co-citation or co-occurrence counts between the selected articles or words. These were then converted into similarity (correlation) matrices using the Ochiai index, as previously described [15][16][17][18]. Finally, we used SPSS 10.0 to perform cluster analysis of these similarity matrices.
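The Ochiai index is commonly written as O(i, j) = c(i, j) / √(f(i) · f(j)), where c(i, j) is the number of papers that co-cite (or co-contain) items i and j, and f(i), f(j) are their individual citation (or occurrence) counts. A minimal Python sketch of building the Ochiai similarity matrix from such counts follows; that the authors' software used exactly this form, and the convention of setting the diagonal to 1, are assumptions, and `ochiai_matrix` and the toy counts are hypothetical.

```python
import math


def ochiai_matrix(items, freq, co):
    """Build a symmetric Ochiai similarity matrix.

    items: ordered list of item ids (cited articles or MeSH terms)
    freq:  dict item -> number of papers citing or containing the item
    co:    dict frozenset({a, b}) -> number of papers citing or containing both
    """
    n = len(items)
    sim = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(items):
        sim[i][i] = 1.0  # assumed convention: an item is fully similar to itself
        for j in range(i + 1, n):
            b = items[j]
            c = co.get(frozenset((a, b)), 0)
            denom = math.sqrt(freq[a] * freq[b])
            value = c / denom if denom > 0 else 0.0
            sim[i][j] = sim[j][i] = value
    return sim


# Hypothetical counts for three cited articles A, B and C.
items = ["A", "B", "C"]
freq = {"A": 3, "B": 2, "C": 2}
co = {frozenset(("A", "B")): 2, frozenset(("A", "C")): 2, frozenset(("B", "C")): 1}

for row in ochiai_matrix(items, freq, co):
    print([round(v, 3) for v in row])
# [1.0, 0.816, 0.816]
# [0.816, 1.0, 0.5]
# [0.816, 0.5, 1.0]
```

Normalizing by the geometric mean of the two counts keeps heavily cited items from dominating the matrix purely through their citation volume.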

Core countries, institutes, authors and journals
As of May 22, 2008, 1785 articles about JCV were indexed in Sci-expanded, with 62508 references, and 1506 (including 225 reviews) in PubMed, with 6435 major MeSH/subheading words. The number of JCV articles rose gradually from 3 in 1976 to 179 in 2006 (Figure 1), an average annual increase of 5.7 articles over the period. By document type, there were 1307 original articles (73.2%), 123 reviews (6.9%) and 209 meeting abstracts (11.7%) (Table 1). Of the 62 contributing countries, 21 are listed in Table 2; USA, Japan, Italy and Germany were, in that order, the four largest producers of JCV papers. Altogether, 1245 institutes published on JCV; Temple University (USA) ranked first with 128 papers, followed by the University of Tokyo and the National Institute of Neurological Disorders and Stroke. Fourteen of the 21 core institutes (66.7%) are in the USA and three are in Italy (Table 3). Of the 4856 authors involved, 33, including Khalili K and Yogo Y, produced more than 20 documents each. Of these highly productive scientists, 9 are from Temple University, 6 from the University of Tokyo, Japan, and 4 from the National Institute of Neurological Disorders and Stroke, USA (Table 4). As shown in Table 5, Journal of Neurovirology, Journal of Virology, Virology, Journal of Medical Virology, Journal of General Virology and others each published more than 15 papers and were considered core journals, although JCV papers appeared in 395 journals in total. These source journals mainly cover Virology, Neurosciences, Clinical Neurology, Immunology, Pathology, Oncology and related fields (Table 6).

Highly-cited authors, journals and papers
Out of 1577 authors in total, the papers of 10 highly-cited authors, such as Padgett BL and Berger JR, were cited more than 400 times each; 8 of these 10 are from the USA (Table 7). The 10 highly-cited journals (out of 3584 journals in total) each received more than 1179 citations; 3 are virology journals and 4 are comprehensive journals (Table 8). Journal of Virology and Journal of Neurovirology ranked first and second, respectively, among the 404 cited journals (Table 9). The highly-cited papers were analyzed chronologically and grouped into two stages: (1) 1971-1984, discovery and isolation of JCV in PML; and (2) 1985-present, clarification of the JCV genomic DNA sequence and its correlation with diseases (Table 9).

Co-citation analysis of highly-cited articles
In the overall references about JCV, most highly-cited articles were published before 1999, were cited more than 90 times and came from major journals such as Journal of Infectious Diseases, Journal of Virology, Science, New England Journal of Medicine.
Figure 1 Temporal distribution of production about JCV investigation.

Co-occurrence analysis of highly-frequent MeSH/subheading words
The 48 highly-frequent MeSH/subheading words generally appeared more than 25 times in the papers about JCV (Table 10).
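Selecting the highly-frequent MeSH/subheading words and counting how often each pair appears in the same paper can be sketched in Python as follows. The function `mesh_cooccurrence`, the threshold argument (25 in the text, 1 in the toy run) and the sample indexing are hypothetical; each term is assumed to be counted once per paper, and only pairs of selected terms are counted.

```python
from collections import Counter
from itertools import combinations


def mesh_cooccurrence(papers_terms, min_count):
    """Keep frequent MeSH/subheading terms and count their pairwise co-occurrence.

    papers_terms: one list of MeSH/subheading terms per paper.
    min_count:    keep terms occurring in more than this many papers.
    Returns (term_freq, pair_counts); pair keys are alphabetically ordered.
    """
    freq = Counter()
    for terms in papers_terms:
        freq.update(set(terms))  # count each term once per paper
    selected = {t for t, n in freq.items() if n > min_count}
    pairs = Counter()
    for terms in papers_terms:
        kept = sorted(set(terms) & selected)
        pairs.update(combinations(kept, 2))
    return {t: freq[t] for t in selected}, pairs


# Hypothetical indexing of four papers.
JCV = "JC Virus/isolation & purification"
PML = "Leukoencephalopathy, Progressive Multifocal/virology"
PCR = "Polymerase Chain Reaction"
papers = [[JCV, PML], [JCV, PCR], [JCV, PML, PCR], ["Kidney Transplantation"]]

term_freq, pair_counts = mesh_cooccurrence(papers, min_count=1)
print(sorted(term_freq.items()))
print(pair_counts[(JCV, PML)])  # 2
```

The resulting term frequencies and pair counts can be passed directly to an Ochiai-style normalization before clustering.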

Discussion
A systematic view of JCV papers that identifies the core researchers, their institutional affiliations and their countries helps us gain a deeper understanding of approaches to JCV. In our bibliometric analysis, most JCV documents were original articles (1307/1785), and a substantial share (209/1785) were meeting abstracts; reviews accounted for 6.9% (123/1785). These results indicate that JCV research is very active and attracts many investigators, and that some scientists have begun to summarize its achievements. Of the 33 core authors, 19 come from Temple University, the University of Tokyo and the National Institute of Neurological Disorders and Stroke, which ranked at the top of the highly productive institutes. In addition, 14 (66.7%) of the core institutes are in the USA, which has been the largest producer of JCV papers to date. JCV was discovered in 1971 by Padgett and colleagues in the USA and was named after the initials of a patient with progressive multifocal leukoencephalopathy (PML). This suggests that JCV research originated in the USA, which consequently became the top source of information on JCV. Tracking the core authors and institutes is a rational and helpful way for scientists to grasp the frontier of this field, open new projects and submit their distinguished work.
The list of top-cited articles about JCV identifies the authors, articles and topics that reflect the history and development of this specialty. Among the highly-cited authors, Padgett discovered JCV in PML and published the first article on it in the Lancet. That paper has been cited 366 times and ranks first among the highly-cited papers. His prominence is also due to another article, in the Journal of Infectious Diseases, which described the detection of antibodies against JCV in PML. This explains why Padgett BL is the most highly cited author. These top-cited articles not only provide valuable information for readers but also record the historical achievements of the field. On the basis of these highly-cited papers, JCV research can be divided chronologically into a beginning stage, covering the discovery and isolation of JCV in PML, and a developing stage, covering the clarification of the JCV genomic DNA sequence and, by polymerase chain reaction (PCR), its relationship with diseases.
Most of the highly-cited journals are virology, neurology and comprehensive journals, indicating that JCV papers mainly draw on frontier knowledge from these fields. Oncogene, Journal of Biological Chemistry and International Journal of Cancer are also among the highly-cited journals (data not shown), reflecting attempts to combine JCV study with molecular biology and oncology. These data also demonstrate the close link between JCV and these specialties. In the overall references of JCV papers, most highly-cited articles were published in Proceedings of the National Academy of Sciences, USA and the New England Journal of Medicine, indicating that these prestigious journals highlight JCV investigation and its scientific achievements. Therefore, JCV investigators not only read virology journals but also follow novel findings on JCV published in other journals with high impact factors.
Methodologically, clustering-based mapping of the literature involves text segmentation, summary extraction, feature selection, term association, cluster generation, topic identification and information mapping [19]. Clustering algorithms, used prominently in co-citation analysis, have proved very useful in revealing research streams within a discipline [20][21][22][23].
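The text does not state which agglomeration method was chosen in SPSS 10.0. As a minimal sketch assuming average (between-groups) linkage, the Python function below merges items bottom-up on the distance 1 − similarity until a requested number of clusters remains. The function `average_linkage` and the toy similarity matrix are hypothetical; a real analysis would more likely use an established library implementation.

```python
def average_linkage(sim, k):
    """Agglomerative clustering with average linkage on a similarity matrix.

    sim: symmetric n x n similarity matrix (e.g. Ochiai values in [0, 1]).
    k:   number of clusters to stop at.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    n = len(sim)
    dist = [[1.0 - sim[i][j] for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # mean pairwise distance between members of clusters a and b
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = sorted(clusters[x] + clusters[y])
        del clusters[y]          # y > x, so index x is still valid afterwards
        clusters[x] = merged
    return sorted(clusters)


# Hypothetical similarities: items 0-1 and items 2-3 form two tight groups.
sim = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]
print(average_linkage(sim, 2))  # [[0, 1], [2, 3]]
```

Recording each merge distance instead of stopping at k clusters would yield the full dendrogram of the kind shown for the co-citation and co-occurrence analyses.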
Here, we carried out an empirical co-citation analysis to map the network of highly-cited papers about JCV. The top highly-cited articles fell into 5 aspects: the correlation between JC virus and tumors; the causal correlation of JCV with PML; polyoma virus infection and related diseases in renal-allograft recipients; detection of JCV antibody, oncogene and its encoded protein; and genetics and molecular biology of JCV. These findings may not only enrich students' and specialists' knowledge of the history of JCV research but also stimulate new bursts of scientific investigation.
Co-occurring words have been considered carriers of meaning across different domains in studies of science. On this principle, we performed co-occurrence cluster analysis of PubMed MeSH/subheading words, linking two words according to how often they appear together [24]. Most of the top highly-frequent MeSH/subheading words fall into the C02 (Virus Diseases) and B04 (Viruses) subcategories of MeSH. The analysis showed that published papers about JCV covered JCV isolation and detection, as well as its relation to viral infectious diseases such as PML and to tumors. This suggests that JCV investigation centered on its isolation, its pathogenicity of … [6][7][8][9]. This focus is consistent with the following findings: (1) the core and highly-cited journals mainly covered virology, neurology and oncology; (2) the highly-cited articles and highly-frequent MeSH/subheading words also addressed JCV, PML and tumors.
Recently, further clarification of JCV genetics has prompted scientists to detect the JCV genome in tumors and to generate transgenic mice to study the oncogenic role of JCV. Our group has examined JCV, targeting the T antigen, using nested PCR, real-time PCR, in situ PCR, in situ hybridization and immunohistochemistry [6][7][8][9]. The positive rate and copy number of JCV were higher in gastric, lung and tongue carcinomas than in the corresponding normal tissues, suggesting an oncogenic role in epithelial carcinogenesis. Furthermore, JCV T antigen can serve as a helicase, recruit the cellular DNA polymerase, orchestrate the assembly and function of cellular proteins, and disrupt the p53, Rb and Wnt signaling pathways, and should therefore be considered a viral oncogene [2][3][4]. Accordingly, we are establishing a transgenic model of gastric neoplasia induced by JCV T antigen, which will help verify the oncogenic role of JCV in gastric carcinoma and provide a novel tool for investigating gastric carcinomas. We anticipate that the use of JCV T antigen in transgenic tumor models will become a novel and active line of future research.
Figure 2 Co-citation cluster analysis of highly-cited references.
Figure 3 Co-occurrence cluster analysis of the highly-frequent MeSH/subheading words.

Conclusion
In this study, we performed a scientometric analysis of the JCV literature. Our data indicate that JCV research has mainly centered on PML and tumors. This bibliometric study can help researchers understand the history and frontier of JCV investigation and guide them in opening new projects and submitting distinguished work. The cluster methods employed here can clarify the history, status and development of the JCV field.