One million articles uploaded to Mendeley!

We passed a landmark today: As of 16.50h GMT, our users have uploaded one million articles to their Mendeley accounts! Including the cited references which Mendeley also extracts from research papers, we now have over 14 million metadata sets in our database. Even we were surprised by the speed at which this has happened!

For the record, the millionth article added to our database was “The somatic marker hypothesis: A critical evaluation” by Dunn et al. (2006) in Neuroscience and Biobehavioral Reviews – as luck would have it, that’s a topic related to my personal research on the role of emotions in decision making! The closest publication authored by one of our own users, Joaquin Rivera, was added at 17.01h – a maths paper titled “On the exact multiplicity of solutions for boundary-value problems via computing the direction of bifurcations”, available for download on Joaquin’s Mendeley profile.

90% of these one million articles have been uploaded since January 2009, and our database is currently doubling in size every 6 weeks. For comparison, venerable PubMed – the largest database of biomedical literature – contains 18,813,527 records as of today. Assuming we manage to keep up our growth, we could surpass the size of the PubMed database within the next 6 months!
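
For the curious, here is a back-of-the-envelope check of that six-month estimate. It is only a sketch: it assumes the doubling applies to the one million uploaded articles (not the 14 million metadata sets) and that growth stays purely exponential, neither of which is guaranteed. The variable names are made up for illustration.

```python
import math

# Figures quoted in this post.
articles_now = 1_000_000      # articles uploaded so far
pubmed_records = 18_813_527   # PubMed size cited above
doubling_weeks = 6            # stated doubling period

# Solve articles_now * 2**n = pubmed_records for n (number of doublings).
doublings = math.log2(pubmed_records / articles_now)

# Assumption: every doubling keeps taking the same six weeks.
weeks = doublings * doubling_weeks

print(f"{doublings:.2f} doublings, about {weeks:.1f} weeks")
```

Under those assumptions the answer is a little over four doublings, or roughly 25 weeks, which is just shy of six months.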

Roughly 43% of the papers in our database are in the biological and medical sciences (even though only about 27% of users are working in these academic disciplines). Computer and information science comes in second with roughly 11% of all papers, followed by engineering with 7%, and chemistry, physics, psychology and other social sciences with 4-5% each.

As we’ve said before on this blog and elsewhere (e.g. see my talk at the Plugg Conference), we’re not hoarding all that data just because we can, no Sir! Our vision is to create the largest open, interdisciplinary and ontological database of research – as crazy as that sounds, remember that Last.fm (whose former chairman and COO are our co-founders and investors) pulled it off in the space of music within just three years, using the same user data-aggregation model that Mendeley is built on.

We’ve already begun to report real-time “usage-based” research trends – a nice discussion of Mendeley statistics showing the most-read journal in the biological sciences can be found here (we’ll be writing more about this soon!). Analogous to Last.fm, we will provide APIs to let others mash up the research statistics we’re generating. Moreover, our database will be the basis for our upcoming collaborative filtering recommendation engine: Based on the articles in your Mendeley library, we will be able to tell you about articles you don’t know yet, but which have been read and recommended by researchers with similar interests. You can read more about these plans in our recent IEEE e-Science paper.
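
To make the idea of collaborative filtering concrete, here is a toy sketch of the general technique. It is not our actual engine: the Jaccard similarity measure, the function names and the sample libraries are all assumptions chosen for illustration. Each article you don't have yet is scored by how similar its owners' libraries are to yours.

```python
from collections import Counter

def jaccard(a, b):
    """Overlap between two sets of article IDs, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, libraries, top_n=3):
    """Rank articles the user lacks by the similarity of the users who hold them."""
    mine = libraries[user]
    scores = Counter()
    for other, theirs in libraries.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0.0:
            continue  # no shared articles, so no evidence of similar interests
        for article in theirs - mine:
            scores[article] += sim
    return [article for article, _ in scores.most_common(top_n)]

# Hypothetical libraries: user name -> set of article IDs.
libraries = {
    "alice": {"a1", "a2", "a3"},
    "bob": {"a2", "a3", "a4"},
    "carol": {"a3", "a5"},
    "dave": {"a9"},
}

print(recommend("alice", libraries))
```

In this made-up example, bob shares two articles with alice and carol shares one, so bob's unread article ranks ahead of carol's, while dave (no overlap) contributes nothing.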

A big thank you to our wonderful users who have been helping us improve Mendeley with their constant feedback. After celebrating the millionth article upload tonight, we’ll get back to work on our next two releases, packed full of exciting new features!

6 thoughts on “One million articles uploaded to Mendeley!”

  1. Way to go Victor for the millionth uploaded article. I am pretty sure that Mendeley is the future of academic research – this will fill the enormous void we researchers have thought about for years with no solution in sight. Now we have no reason to stick with EndNote and PDFs cluttering our hard disks when web 2.0 for academia is here.
    I can hopefully predict that in 2 years’ time Mendeley will be indispensable for anyone doing academic work/research. Now waiting for universities and their libraries to start taking notice.

  2. Congrats, boys,
    that’s a pretty good achievement! keep going!

    Harry
    (father of one of these brilliant founders 🙂 )

  3. This is good progress. However, to put a damper on this, most of the 14m references are probably junk. I know that about 13,000 come from scanning the pdfs in my collection, and virtually all of them have significant errors in the fields. Sorting those 14m out and improving the quality is quite a challenge.

    Assuming you can make a unique hash for each pdf, then the best way would be to add a mechanism for people to indicate (vote) when they think the core metadata fields (author, title etc) are correctly completed (and perhaps also to indicate when they know it is wrong). Assuming a number of people have the same pdf and a subset of them mark ‘their’ version as correct, then if the metadata is identical (and independently marked correct) there is a good possibility that they really are correct. Then you add a mechanism for other people to update theirs to the correct ones (eg when the metadata for a reference is different from the community-voted ‘correct version’, the reference is coloured differently or otherwise stands out and they get a prompt like ‘do you want to update this reference’).

    Shouldn’t be too hard to do, especially using hashes, but probably good to get working before that 14m doubles too many more times or there will be a lot of dross.

    You can use my idea if you GPL your code 😉

    • Hi Malcolm,

      indeed, the quality of the 14m references tends to be lower than that of the 1m articles. We’re in the process of matching references to each other and to articles in the database, so as to clean up the database.

      Regarding your idea, it’s a very good one, thank you! But I unfortunately have to rob you of your negotiation leverage and tell you that it’s already described in our FAQ: http://www.mendeley.com/faq#automatic-metadata-extraction 😉

      Nonetheless, we never ruled out GPLing our code, so it may happen in the future…

  4. Well done, guys! I can’t believe I could live without Mendeley before! And I guess I am not the only one who says that, hence the 1 million articles milestone…. 😉
