24 August 2010 by Jason Hoyt

Recently I was sitting at café Tryst in Washington D.C. along with Mendeley’s co-founders and a coffee house full of hipsters, Georgetown students, tourists, and a few politicos. In retrospect, perhaps this was the only setting possible to be discussing the future of research and our small part in it. We were surrounded by the common citizens who depend on the outputs of science, but had little to no power in changing its course for their benefit. More pointedly, they had no clue that science is being held back by the very people who are supposed to be advancing it.

We came to the conclusion that technology is finally at a point that if we don’t use it now, then we are holding back the progress of science. And what exactly are we to use technology on? Open science/data/access.

By our own hands

To understand how we (“we” meaning the research community) got here, we first have to remember briefly how the dissemination of science came to be the way it is. The dominant mode of communicating research results is the peer-reviewed literature. This dates back more than 300 years, to when scholarly societies formed and needed a way to present their findings. It has evolved into the well-known journal system that we have today. This model for communicating results has served science quite well, and I have no doubt that it will continue to do so, but given the technical abilities we have today, is it still the best way forward 100% of the time?

Let’s start with an ideal situation for the progress of science. For advancement not to be held in check there are arguably two requirements:

1) Research results (i.e. raw data) need to be made available the moment they are created.

2) The write-up of those results, as long as it is scientifically sound, is published immediately and made accessible to all. (For more insight, the Panton Principles go even further.)

For the past three centuries those two requirements could not be satisfied due to a lack of technology, but why are we not fulfilling them now? We have the tech (i.e. Internet), so something else is going on. It is easy to place the blame on publishers trying to protect business models, but that would be misplaced judgment. The business models for Open Access and Open Data are there, trust me (or go ask PLoS). Publishers are already experimenting with the models, but they are waiting for something before going full force. They are waiting for us, the researchers.

We could choose to publish in only Open Access. We could choose to reward tenure for Open Data. We could choose to only reward publications or data that are proven to be reused and make either a marked economic or research impact. Instead, we choose to follow a model that promotes prestige as the primary objective as outlined by Cameron Neylon in “Practical steps toward open science.”

History, I suspect, will look upon our society and practice with regard to scientific knowledge-sharing much as we now look upon the Dark Ages. Each time we hold back data or publish research that isn’t immediately open to all, we have chosen to be on the wrong side of history.

Changing behavior

One shining example of what Open Data can accomplish in Alzheimer’s research was recently profiled in the NYTimes. Sadly, though, this is more the exception than the rule.

Not wanting to criticize without offering a plan: what can be done, then? We could wait for policy changes from the top, but that is neither a timely nor a guaranteed solution. A growing feeling among some in the community is that the rise of impact factors and author metrics, such as the h-index, has left most younger researchers without a choice in the matter. Either you publish in a high-impact journal, which is often closed access, or you don’t get tenure. That, in turn, results in an increasing time lag in getting research published. Researchers resubmit to “lower-tiered” journals only after being rejected by the top, a process that takes months at best and years at worst. It is not uncommon to see research that is already two years old before it sees the light of day. This cannot be good for the progress of science.

“Article-level metrics” (ALM) are one step toward weaning ourselves off our addiction to journal impact factors. Here, we disassociate the significance of the article from the prestige of the journal it is packaged in. PLoS has done an excellent job in advancing this new trend. In theory, ALM could reduce competition for top-tiered journals and hence promote faster communication of primary research. ALM alone does not guarantee increased Open Access or Open Data, though. For that, we need more.

One way to promote the sharing of knowledge, and thus be on the right side of history, is through reputation metrics. Unlike previous measurements of impact, though, these would be designed to reward researchers who contribute to Open Data and science online. Without that principle, impact factors and author indices are blocking, not helping, research. Some would go as far as to say they should be abandoned. To be successful, then, certain conditions would have to be satisfied:

1) Avoid obscurity. Reputation metrics cannot be hidden in the closet with last season’s wardrobe. The obvious way to promote them is through search and recommendation engines.

And that leads to the second and perhaps most controversial condition…

2) Design the system to reward participation and penalize omission in Open Science. It must leave researchers who are hesitant, ignorant, or opposed to participating online with no other choice – the antithesis of current impact measures.

Platforms such as Mendeley can have a hand in meeting both conditions. Mendeley is more than just a reference manager; it is also a system that aggregates the metadata of millions of documents and provides authors the opportunity to promote their works. We are now taking this one step further, having created the beginnings of an author analytics platform.

Those who promote their works will be rewarded by way of discovery, either via our own search engine or through their own researcher profiles. Starting this Wednesday, researcher profiles will show statistics based upon the self-authored works placed into the “My Publications” folder. A “Publication Statistics” snapshot appears on the right side of the profile and shows readership, publication page views, and downloads.
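The idea behind such a snapshot can be sketched in a few lines of code. This is a toy illustration only – the field names and the plain summation are my own assumptions, not Mendeley’s actual implementation:

```python
# Toy aggregation of per-publication usage signals into a profile snapshot.
# Field names and the summation are illustrative assumptions, not
# Mendeley's actual statistics pipeline.

def publication_snapshot(papers):
    """Sum readership, page views, and downloads across a researcher's papers."""
    totals = {"readers": 0, "page_views": 0, "downloads": 0}
    for paper in papers:
        for key in totals:
            totals[key] += paper.get(key, 0)
    return totals

my_publications = [
    {"title": "Paper A", "readers": 120, "page_views": 450, "downloads": 80},
    {"title": "Paper B", "readers": 35, "page_views": 90, "downloads": 12},
]

snapshot = publication_snapshot(my_publications)
# snapshot -> {"readers": 155, "page_views": 540, "downloads": 92}
```

The point is simply that once these signals are aggregated per author rather than per journal, they can feed search ranking and recommendations directly.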


We are not naïve

A publication snapshot is nothing earth-shattering, and it has a long way to grow. There are valid concerns, such as gaming, that must be accounted for. Fundamentally, though, this is not just another metric on which to base the next grant or tenure selection, although eventually it could (and should) be used that way. To the point: this is about being on the right side of history in promoting Open Science.

Those researchers who openly and quickly publish research or data for download will be rewarded.* Those who do not will adapt or risk falling into obscurity. As we wait for policy changes to be enacted by the top, we must act at the bottom to encourage a behavioral change in how we share our knowledge. I think we owe that to the students, hipsters, and citizens in coffee shops everywhere.

Which side of history will you be on?

*Exactly how that is rewarded through our search and recommendation engines has yet to be implemented, as we need to balance relevance against reputation.

Jason Hoyt, PhD, is Chief Scientist and VP of R&D at Mendeley
Follow him on Twitter


18 Responses to “Dear researcher, which side of history will you be on?”

  1. Sue Says:

    Peer review?? That aspect is missing from your post. Peer review is one thing that improves the scientific process BUT it also slows it down. This is time well spent. If you have had the opportunity to review peers’ manuscripts, you know that papers are incomplete without that step.

  2. Jenny Reiswig Says:

    The thing that concerns me as a cranky librarian is that we may not KNOW which side of history you were on three hundred years from now. One of the benefits of old-school publishing is that we do know, now in 2010, what people were up to before. In our library, you can go back and read the Philosophical Transactions – online – right back to 1665. But you can’t read the data attached to computer programming journals from the ’80s because they were supplied on 5-1/4″ floppy disks and we no longer have the equipment to read them, much less know whether they are still sound. (Should libraries have been thinking about this in the early ’80s and making sure we negotiated the rights to forward-migrate that content? Yeah, probably, but like the rest of the world we didn’t see the web coming!) I love the principles of open access, rapid dissemination of real data and all that goes with it for people working today. But is any of that going to be findable and usable 30 years from now, never mind 300?

  3. Mr. Gunn Says:

    Sue – This focus on more rapid, open publishing isn’t to suggest we leave peer review out of the assessment of literature. Repositories such as PLoS ONE and Arxiv.org have shown that it’s more important to have a paper available to as many eyes as possible as soon as possible, vetted by peers for technical integrity, than it is to have a paper assessed for subjective impact by the editors of a journal.

  4. Steve Says:

    Jenny: If the information stored on the internet is inaccessible in 30 years, we’re going to have *much* bigger problems as a society than being able to look up the information in a library :)

    Your concern is definitely a valid one, but I would be less concerned with hardware barriers (as noted with the floppy disc issue from the 80’s) and more concerned about software/document formats such as PDF, which pose a much greater risk of becoming outdated.

  5. William Says:

    Jenny – I think your concern about long-term preservation is a valid one and something that all research repositories really need to think long and hard about. I think it’s also important to talk to experts, such as librarians, about these challenges. What’s the state of the art in long-term digital preservation these days?

    It seems to me that there are two issues: preserving the content, and making sure it can still be found. Your example above has to do with content preservation, aka bit rot (something that projects like LOCKSS are working on), but there’s also the issue of link rot, something that’s being addressed by WebCite and others (though I still have concerns about WebCite being centralized, vulnerable to link redirection, etc).

    What would the ideal system look like?

  6. Dario Says:

    Own publication stats – that’s what I was looking forward to. I was planning to add a similar feature in my demo app; good to hear you were faster ;)

  7. BWG Says:

    I would say that a more important question is what side of history will Mendeley be on. Calling for open access is certainly noble, and for anyone outside the publishing mainstream, a no-lose proposition. But it seems to my skeptical librarian and researcher brain that championing open access serves mostly to deflect attention from Mendeley’s own commercial aspirations.

    In other words, I’m worried that Mendeley is already on the wrong side of history when it comes to its core software market by moving backward to the world of proprietary, commercial systems. Nonetheless, I’m really impressed by the rapidly improving quality of Mendeley’s software and web design, and the open API is a great step in the right direction. But I’m still deeply troubled by the proprietary nature of the software and the opaque commercial aspirations.

    I read that Mendeley received funding of 2M USD two years ago, but that money must be long gone given the quantity and quality of your employees. And selling a few gigabytes of storage can’t possibly even keep the lights on. So it’s only natural even for users (or maybe especially for users) to remain very worried about Mendeley’s future. Where’s the money coming from?

  8. Jason Hoyt Says:

    Hi “BWG” –

    I get asked a lot about open-sourcing Mendeley when I go to speaking events. I always state that we are open to the possibility, but then ask how many people know how to type a URL versus how many know how to program in C++? That’s why we went with the Open API first instead of open-sourcing the desktop software. If you can type a URL, which is what the API is based upon, then you can build on top of Mendeley. You don’t need to know how to program.
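    To make the “if you can type a URL” point concrete: a URL-based API returns structured text (typically JSON) that any language can parse. The URL and field names below are hypothetical, for illustration only – they are not Mendeley’s actual endpoints:

```python
import json

# A URL-based API call is just a fetch plus a parse. In a live script the
# body would come from something like:
#   urllib.request.urlopen("https://api.example.com/documents/12345").read()
# Here we use a hypothetical sample body so the example is self-contained.
sample_body = '{"title": "An Open Science Paper", "year": 2010, "readers": 155}'

doc = json.loads(sample_body)
print(doc["title"])    # prints: An Open Science Paper
print(doc["readers"])  # prints: 155
```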

    As for Mendeley’s future, that’s a valid concern. From day one, users have had the ability to export their libraries into many standard formats that other reference management programs accept. Getting data out is just as important as getting data in. With the Open API, users and developers have another method of getting access to their data as well. Mendeley also works completely offline, and its functionality as a reference manager remains intact regardless of Mendeley’s future. That’s a pretty good deal for something that is free.

    I would also encourage everyone to take a look at a new JISC-funded collaboration between ourselves, Symplectic, and the University of Cambridge. In this grant, we are connecting Mendeley to institutional repositories. Together, we are trying to improve the flow and communication of research such that it is more accessible to all. The technical outputs of this research grant will be open sourced. http://jisc-dura.blogspot.com/

    It also isn’t lost on us that your point is to lead by example. I think we are doing that in a very intelligent way, but understand everyone has their own vision of how to best improve research. There obviously is no one right way to accomplish the shared goal we all have for science and humanities research.

    Hope that helps.

    Jason

  9. saroele Says:

    I want to react to two different things:
    1. About the reputation metrics: I think open science should not be a goal in itself. The goal should be to do relevant and useful science (and ideally to make the results open and accessible to everyone, asap). Therefore, I’m afraid of metrics that reward publication frequency or speed.

    Every metric system will influence the way people publish. If the speed and openness of results are rewarded more than quality, this will lead to more publications of lower quality. I prefer fewer publications of higher quality, even if I have to wait a few months for a good peer-reviewed article. It’s worth the time.

    2. Jason’s answer to BWG’s concern covers only part of the question. I think the answer is satisfying, but the question that remains unanswered is how Mendeley will keep paying its employees and how it will pay back its investors. The fact that the ‘business model’ remains hidden from the users will keep creating scepticism among them. So I join those users who ask for openness from Mendeley, not only on scientific publications, but on its own business model in the first place.

  10. Victor Says:

    Hi Saroele,

    thanks for your comments.

    Regarding 1: I agree that whatever metrics are used should reward quality over anything else – but I don’t think that quality and speed are mutually exclusive. I also don’t think that the current peer review model is the only way to achieve that quality. Other people have written about this more extensively and more eloquently than I ever could, so I’ll just point to this blog post from Michael Nielsen: http://michaelnielsen.org/blog/the-future-of-science-2/

    Regarding 2: We’ve never tried to keep the business model “hidden” – we’ve answered the question whenever somebody asked. See, for example, this interview from February 2009 (http://my.biotechlife.net/2009/02/24/interview-with-victor-henning-from-mendeley/), this TechCrunch article from June 2010 (http://eu.techcrunch.com/2010/06/16/mendeley-the-last-fm-of-research-rolls-out-premium-packages-to-steady-customer-nerves/), our Premium Package page here (http://www.mendeley.com/upgrade/), or this talk from April 2009 where I speak about institutional/enterprise/content distribution business models (http://en.sevenload.com/shows/The-Next-Web-Conference/episodes/uxjTNyC-Mendeley-TheNextWeb-Conference).

    I hope this answers your questions.

    Best,
    Victor

  11. saroele Says:

    Dear Victor,
    thanks for your answer and the links. It indeed clarifies most of my concerns.

    I’m especially happy to hear and read that everything that’s free today will always stay free. This gives me the confidence that I will always be able to use Mendeley the way I do today, whatever plans you have to generate sufficient income to cover your costs.

    You will understand that I was worried: I used last.fm and loved it. Then I got very disappointed when I had to subscribe in order to continue listening to the radio stations… I’m glad to hear that Mendeley will never do this…

    Keep up the good work,
    Roel

  12. Lambert Heller Says:

    Dear Victor, dear other people at Mendeley,
    as I already mentioned in our discussion at BibCamp³ in Hannover: I’m afraid of what I called the “facebookization” of user data inside a Mendeley silo. Keeping the user data, at least in its entirety, under control seems to be part of Mendeley’s business model. This is somewhat out of sync with what most people understand by the term “open data”, isn’t it? To me, “open data” is about the general availability of user-generated data, or at least the part of the data that was meant to be public. For sure, one can argue that this business model of keeping user data under control enables you to bring up innovative new uses of the data. Then again, this business model constrains anybody else from doing that innovation based on the data. Which leads us to the roots of the whole idea of open data.
    Best regards,
    Lambert

  13. Mr. Gunn Says:

    Lambert – I’m a little unclear on exactly what you feel is missing from the data you can freely and openly get via Mendeley’s API. Would you elaborate on what else Mendeley should be exposing?

  14. Roman Shapovalov Says:

    The current publishing system serves more purposes than dissemination alone, and it is not trivial to cover them all at once with an “open science” system. Please look at this post:
    http://scholarlykitchen.sspnet.org/2010/01/04/why-hasnt-scientific-publishing-been-disrupted-already/

    Probably, validation is the hardest part, as was mentioned above.

    Another thing: you should provide for a gradual transition, because publishing in journals is still required to get tenure. If someone joins the open model, she can lose her reputation, so both ways should remain available.

  15. Lambert Heller Says:

    Where can I download Mendeley’s user data as a complete data set? As we all know, that’s a necessarry presondition to do several interesting things with the data. Just think SNA. As Peter Murray Rust points out, “If however the linked Open data are all going to be through paywalls, portals, query engines then we regress into the feudal information possession of the past.” (http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2576)

  16. Lambert Heller Says:

    I wanted to say “necessary precondition”

  17. Peter Murray-Rust Says:

    >> what you feel is missing from the data you can freely and openly get via Mendeley’s API. Would you elaborate on what else Mendeley should be exposing?

    I have commented on this in

    http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2576

    http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2577

    Where I make the point that being freely available is not the same as Open. Open requires that the community has control of the future of the system – the means of production, the definition of the information, and the formal contract that it will be copied and re-used without permission, and so potentially available forever.

    If you can satisfy us:
    * that the data is copiable and reusable without permission
    * that the whole data is accessible and can be iterated over
    then the data is Open
    – and if the software and means of information generation are additionally Open, then you have an Open Service

    From what I have read so far – and I hope I am wrong – there is no element of Openness about your data or API, simply a free release of part of the system for a restricted period of time.

  18. Jan Edelmann Says:

    The answer is P2P, new business models for publishers, and shooting down researchers’ self-protectionism.

    You didn’t mention that there is a huge language barrier between English and countries where research has been done mostly in the local language. However, that is not the real problem. Google helps with translation, but does not grant database access to all those poor researchers.

    The reason for the existing system might be that it offers rich research institutes/universities a competitive advantage. Why should those researchers who have access to all possible databases (and who might be brighter and faster) make it any easier for others?

    I suggest that peer-reviewed articles be completely leaked out – P2P is the first solution that comes to my mind.