Web usage data outline map of knowledge
Analysis offers fresh perspective on role of humanities and social sciences.
When users click from one page to another while looking through online scientific journals, they generate a chain of connections between things they think belong together. Now a billion such 'clickstream events' have been analysed by researchers to map these connections on a grand scale.
The work provides a fascinating snapshot of the web of interconnections between disciplines, which some data-mining experts believe reveals the degree to which work that is not often cited — including work in the social sciences and humanities — is widely consulted and can form bridges between scientific disciplines.
The creators of the maps argue that web-usage metrics give an alternative and more up-to-date view of science than existing maps and indicators, which are largely based on out-of-date citation data. Other researchers agree that the new maps, published this week (J. Bollen et al. PLoS ONE 4, e4803; 2009), are impressive in approach, but they disagree on their significance.
For the study, Johan Bollen and his colleagues at the Los Alamos National Laboratory in New Mexico negotiated access to anonymized server log data covering 35,000 journals from 2006 to 2007. The data came from the University of Texas, the California State University system, and major science journal gateways including Thomson Reuters' Web of Science and Elsevier's Scopus database.
Although data on usage rather than citations have been used in some past studies, the sheer scale of the new study makes it stand out, says Henk Moed, a bibliometrics expert at the Centre for Science and Technology Studies at the University of Leiden in the Netherlands. "The paper represents an important step forward."
The data reveal how often users looking at an article in journal A moved on to an article in journal B, and on to one in journal C, and so on, during a browser session. By aggregating hundreds of millions of such relationships, the researchers could use network-visualization algorithms to create maps based on the 'distances' computed between journals and disciplines.
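As a rough illustration of that aggregation step, the short Python sketch below counts journal-to-journal moves within browsing sessions and turns the counts into a toy 'distance'. It is not the method used in the paper: the session format, the function names `count_transitions` and `transition_distance`, and the inverse-traffic distance formula are all assumptions made for this example.

```python
from collections import Counter


def count_transitions(sessions):
    """Count how often a reader moved from one journal to the next in a session.

    `sessions` is assumed to be a list of lists, each holding the journals
    visited in order during one browser session (a hypothetical format).
    """
    counts = Counter()
    for session in sessions:
        # Pair each journal with the one visited immediately after it.
        for src, dst in zip(session, session[1:]):
            if src != dst:  # ignore moves between articles in the same journal
                counts[(src, dst)] += 1
    return counts


def transition_distance(counts, a, b):
    """Toy distance (an assumption): more traffic between two journals means closer."""
    traffic = counts[(a, b)] + counts[(b, a)]
    return 1.0 / (1 + traffic)


# Three made-up sessions over journals labelled A, B and C.
sessions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "A", "A", "C"],
]
counts = count_transitions(sessions)
print(counts[("A", "B")])                   # moves from A to B
print(transition_distance(counts, "A", "B"))
```

A network-visualization tool would then lay out the journals so that pairs with small distances sit close together; at the scale of the study, the same counting would run over hundreds of millions of transitions rather than three sessions.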
The broad structure of the maps is similar to those created using citation data: a network of clusters in different fields, within which journals have strong connections with one another but fewer links to other clusters. A striking difference in the usage maps is that journals in the humanities and social sciences figure much more prominently than in citation-based maps. Along with some journals in other fields, such as psychology and the environment, they also emerge as gateways between clusters that are otherwise poorly connected, and so act as key bridges between disciplines. The difference partly arises because Bollen's study covers a wider literature than the citation databases, which are biased towards natural sciences journals.
The journal ranking generated from the usage maps includes not just the usual suspects such as _Nature_, _Science_ and _Physical Review B_, but also the _Journal of Advanced Nursing_ and _Environmental Health Perspectives_. That reflects a key difference between citation- and usage-based maps and metrics. The former reflect citations by researchers who publish, but ignore the impact of papers on large swathes of the scientific and medical community who read and apply the literature in medical, commercial or policy practice but who rarely or never publish.
"Citation data may undervalue papers written in practitioner-based fields, such as reviews or syntheses in clinical medical journals that are widely read by practising physicians but not cited proportionally," says Carl Bergstrom of the University of Washington in Seattle. "By including practitioners we capture a much wider sample of the scholarly community," adds Bollen.
Usage maps are also more up to date than citation ones because the inherent delay in publication means it takes at least two years before a paper will start to gather citations in sufficient numbers to be meaningful. "The most exciting aspect is that they give us a different time-slice of the process of scientific discovery," says Bergstrom.
Others are less impressed. Anthony van Raan, director of the Leiden Centre for Science and Technology Studies, argues that this more current view may in fact represent today's "fashions", rather than trends that will endure. Faster online publishing means that papers are being cited faster than before, he argues. He also questions the central position of the social sciences in the maps, and various aspects of the data-analysis techniques used. Other experts say they have similar concerns, but are holding off from passing judgement until they can discuss the methodology with the paper's authors.
But Bergstrom argues that usage and citation data each provide different but useful information on the impact of papers and journals. "Usage data tell us where the net was cast; citation data tell us where the fish were caught," he says. "If you want to understand the human enterprise of fishing, you had better know about both."