A Reboot of the Legendary Physics Site ArXiv Could Shape Open Science

The open-access physics site arXiv is turning 25, and it's going to get a makeover. But what does that mean for its principles of data transparency?

In the early days of the Internet, scientists erected their own online network, a digital utopia that still stands today. Here, astronomers, physicists, mathematicians, computational biologists, and computer scientists come together to discuss heady, cosmic topics. They exchange knowledge—without exchanging money. It’s called arXiv, and it’s where researchers go to post their ideas for discussion, sharing PDFs of their scientific articles before they’re locked behind a journal’s paywall.

ArXiv is about to celebrate its 25th birthday. It can now officially rent a car without paying extra, and that means it has to grow up and start thinking about its future. The repository still excels at its primary goal---to quickly and freely disseminate papers about black holes, baryons, and Bayesian statistics---but it runs on legacy code. “Under the hood, the service is facing significant pressures,” says Oya Rieger, arXiv’s program director.

And so last month the organizers put out a survey, asking users what the site should look like in adulthood. Perhaps most controversially, they asked whether arXiv should change its quality control and allow readers to comment on and annotate papers, adopting some of social media's flashy features. The redesign of one of the earliest open-access repositories could have significant impacts—positive and negative—on the way scientists think about transparency and the future of scientific publishing in the digital age. In other words: prime nerd fight fodder.

To get a sense of just how important arXiv is, consider these stats. In 2014, the site passed its million-paper mark. It received 105,000 submissions in 2015 alone, and last year boasted over 139 million downloads. It has become the go-to place to find out what’s going on right now---in the fields it covers, and in the workscapes of individual scientists. “When I give seminars, I give the arXiv numbers for my papers,” says David Hogg, an astronomer at New York University. “Why? Because I know that my arXiv papers are available to any audience member, no matter what their institutional affiliation or library support.”

That open access shapes the way scientific conversations happen, and speeds them up. A theorist can fire off a rebuttal (or “response,” if we’re being polite) to their rival’s poorly defended string-theory idea without having to slog through the traditional publication process. “In a lot of areas of physics, arXiv has become the main publication venue,” says Daniel Gottesman of the Perimeter Institute for Theoretical Physics in Waterloo, Canada, who is a moderator for the site and chair of its physics advisory committee. “People still will submit to journals, but increasingly that is only to get a stamp of approval from peer review.” Posting to the open-access arXiv is rapidly becoming the rule, not the exception.

Growing Pains

ArXiv knows how important it’s become to scientists. So its overseers want it to evolve to match their needs. The service began bare-bones: In the early ’90s, astrophysicist Joanne Cohn of the University of California, Berkeley, started sending around electronic copies of physics papers---“preprints,” or versions that had not yet appeared in official publications. But scientists’ inboxes soon became clogged. To reduce the burden of personal storage, physicist Paul Ginsparg, then of Los Alamos National Laboratory in New Mexico, created an electronic cubbyhole where all the papers could live together. Anyone could reach into it remotely.

That’s still how the site functions today, for the most part. So to figure out how to give a lift to the site’s back-end, administrators at the Cornell University Library held a technical infrastructure workshop and solicited responses for three weeks, with a banner at the top of the site not unlike Wikipedia’s “PLEASE DONATE” notice. The campaign worked: The team received 35,000 survey responses from all around the world.

What those users say about topics like search capabilities, the submission process, and moderation will guide arXiv into the future---along with other sites that strive to make science free, fast, and available to the people who pay for it (you). Because arXiv is one of the first open-access science sites, its redesign may influence the paths of its younger siblings, like biologists’ bioRxiv---whether they follow the same route or rebel against it.

The fight over the future of scientific publishing is largely about balancing two things: speed and accuracy. The traditional model of peer-reviewed science is (theoretically) more likely to catch a cockamamie idea before it makes it into press, but it keeps science unavailable to the public during the long review, slowing the overall pace of discovery. Preprint sites like arXiv bypass most of that review process in favor of speed and transparency.

Right now, arXiv moderates its entries, but it’s a pretty hands-off process. If a user submits a paper about loop quantum gravity, experts in the field get an email with a summary of the idea, author information, and other metadata. “If everything is OK, the moderator need do nothing more and the paper will appear on schedule,” says Gottesman. If the moderator sees something obviously incorrect, the paper is held up, and someone will actually read it and suggest edits if necessary. More than 150 experts are on call as moderators.

Because the site is relatively unmoderated, work that is more speculative and earlier-stage appears in the feed alongside meat-and-potatoes papers cataloging new exoplanet discoveries. That’s by design. The come-as-you-will policy makes it easier for scientists to publish left-field work and helps them get feedback and exposure before their studies are ready for traditional publication. And many users think it should stay that way, with moderators only throwing out submissions that make zero sense or are plagiarized.

Leave editorial judgments to the journals, Hogg suggests. “[ArXiv] should focus on what it does well, which is to archive, in an editorially neutral way, science that is pre-publication or in press,” he says. That neutrality is essential to the site’s integrity: One complaint about the traditional publication system is that reviewers and journals may preferentially pick papers that adhere to a standard story, or come from majority groups or fancy institutions (although many journals have a double-blind review policy, so no one knows who’s who).

Gottesman---a moderator, recall---is also against additional regulation, and not just because it would mean more work for him. “I think raising the moderation bar much more would risk cutting out too many submissions that are potentially of interest to someone,” he says. “A more interesting question is whether we should reduce the amount of moderation.”

Metrics of Success

The survey also asked whether users would like arXiv to develop rating or commenting systems. “Currently, we have no plans to go to this direction,” says Rieger. “So the goal was to seek input to understand our users' opinions and needs.”

Still, even the suggestion of incorporating hot-or-not, social-media-style features upset some users. David Abergel of the Nordic Institute for Theoretical Physics notes that other websites like ResearchGate already allow comments---and maybe they should stay there. “I suspect that any good things that come out of it would be swamped by negative unintended consequences,” he says, especially if anyone could jump into the fray. “If the arXiv comments were open to the general public, then it's easy to think that they could become a place for every crackpot with an agenda to post endlessly and drown out more constructive comments,” he continues. “Can you imagine if arXiv had a climate science section?”

Izabella Laba, a mathematician at the University of British Columbia, worries about the same thing, but from the perspective of women and minorities. If you’ve been on the Internet for more than five minutes and can read words, you know what she’s talking about. “Even if the obvious trolls are discounted, the responses to a blog post, for example, can still be very different depending on whether the author is male or female,” she says. “There is overwhelming evidence of that. Introducing Internet comments into professional evaluations would compound every known bias against underrepresented groups.”

She does not want those potentially biased (or aggressive or whatever) comments connected to her professional work. Having them on arXiv---often the first place colleagues and employers go when they Google a name---would be almost like having them stapled to her resume. “If someone says something obnoxious about me on Reddit, then, well, it's Reddit,” she says. “But having the same thing posted on the arXiv would be very different.”

Laba says arXiv should focus on its core mission. It’s the job of the journals to moderate, she says, and better for third-party sites to handle the discussion.

Abergel agrees. “I think arXiv should stick to what it's good at, which is being an open repository for almost-publication-ready scientific work,” he says. “With their limited resources, I think it's much more important that they widen the areas of science that they serve rather than adding more ‘features.’”

The survey results are not yet available, so it’s unclear what the other 34,998 think. But rest assured that if they don’t feel their opinions were heard, they will let us know, in a fast, free, and open way.