Optimizing research documents: The results of a study of 80M research papers at Mendeley.

At Mendeley, we are always looking for ways to make the research process more efficient. We believe in the power of big data and creative analysis to change how research is done, in ways both big and trivial. For example, with the massive number of documents in our database, we have the power to compare successful publications against less successful ones based on the characteristics of the documents, and then incorporate this knowledge into our product to help researchers work better. In solidarity with our brothers at Google, we feel that A/B testing on enough data will always lead to the optimal design choice, even for complex design situations. Read on to see the first fruits of our font legibility study.

What’s the best font for academic publications?

Based on sophisticated A/B testing of reading times over our corpus of 80M documents, we have determined the optimal font for research comprehension, which will now become standard across all Mendeley platforms.
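
For readers who want to see what “A/B testing of reading times” means mechanically, a minimal sketch is below; the fonts, sample sizes, and reading times are simulated placeholders, not our actual data or pipeline.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated per-session reading times in seconds for two font variants.
# These numbers are invented for illustration only.
times_font_a = rng.normal(loc=310.0, scale=40.0, size=5_000)
times_font_b = rng.normal(loc=305.0, scale=40.0, size=5_000)

# Welch's t-test: do mean reading times differ between the two variants?
t_stat, p_value = stats.ttest_ind(times_font_a, times_font_b, equal_var=False)

print(f"mean A = {times_font_a.mean():.1f}s, mean B = {times_font_b.mean():.1f}s")
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```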

This is just a start. We will be conducting further studies along these lines, such as using a Naïve Bayes approach to find the most important components of papers accepted into the top journal in your field. Initial results suggest that by using the optimal amount of industry-specific terminology, engineering your subordinate clauses to precisely the number and length that make the reader feel you're smarter than they are, and choosing the words most favored by a journal's editors, you can increase your chances of acceptance at your target journal by up to 30.13%. For example, using “obviously” in place of “clearly”, and “demonstrate” in place of “show”, is predicted to increase your chances of acceptance into Nature by 5.63%, but only if the manuscript is submitted before 16:00 GMT on a Tuesday when it is raining in London.
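
For the curious, the word-ranking step behind claims like these is nothing exotic. A toy sketch follows; the corpus, labels, and model choice (scikit-learn's multinomial Naive Bayes) are illustrative assumptions, not a description of our production system.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus: abstracts labeled 1 if the paper was accepted, 0 if rejected.
# Both the documents and the labels are invented for illustration.
docs = [
    "we demonstrate a novel method",
    "obviously the results are significant",
    "we show a simple approach",
    "clearly the data suggest otherwise",
]
labels = [1, 1, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(docs)

model = MultinomialNB()
model.fit(X, labels)

# Rank words by the log-odds of appearing in accepted vs. rejected papers
# (classes_ is sorted, so index 1 is the "accepted" class).
log_odds = model.feature_log_prob_[1] - model.feature_log_prob_[0]
for word, score in sorted(zip(vec.get_feature_names_out(), log_odds),
                          key=lambda pair: -pair[1]):
    print(f"{word:12s} {score:+.2f}")
```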

We’ve always said we would change how research is done, and this is only the start.

7 thoughts on “Optimizing research documents: The results of a study of 80M research papers at Mendeley.”

  1. While text analysis can be insightful, simply extracting significant patterns and presenting them to a naive audience without careful explanation can be misleading and dangerous.

    1. I bet you can find all kinds of interesting patterns. But how many of them are due to random chance (the law of large numbers?) is unclear. I believe anyone could write a program to derive tens of thousands of “interesting” patterns at p < 0.05. What about the false discovery rate? Also, how did you estimate the priors and incorporate conditional probability estimates in your Naive Bayes model? I suspect nothing is quite so accurate once you look under the hood.
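
    To make the multiple-comparisons point concrete, here is a small synthetic demonstration (all data invented): run enough tests on pure noise and “significant” patterns appear for free, until you control the false discovery rate, e.g. with a Benjamini-Hochberg correction.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # 10,000 "patterns", each a t-test between two samples drawn from the
    # SAME distribution, so every null hypothesis is true by construction.
    p_values = np.array([
        stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
        for _ in range(10_000)
    ])

    print("naive 'discoveries' at p < 0.05:", int((p_values < 0.05).sum()))

    # Benjamini-Hochberg at q = 0.05: keep the largest k with p_(k) <= k*q/m.
    q = 0.05
    p_sorted = np.sort(p_values)
    m = len(p_sorted)
    passing = np.nonzero(p_sorted <= np.arange(1, m + 1) * q / m)[0]
    n_discoveries = int(passing.max()) + 1 if passing.size else 0
    print("discoveries after BH correction:", n_discoveries)
    ```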

    2. Sometimes descriptive statistics should just be interpreted as they are, and cannot be used for prediction. Before you attempt predictions, can you report the positive predictive value of substituting one word for another in a published article? And what about all the articles that contained the buzzwords and still got rejected: do articles containing such buzzwords get rejected more often in peer review, say at a 100% higher rate than the average article? If so, folks are better off sticking to the language they are comfortable with.
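
    For instance, the positive predictive value in question is just a conditional acceptance rate; with invented counts it might look like this (none of these numbers come from the post):

    ```python
    # Invented counts for illustration: acceptance outcomes for submissions
    # with and without the favored buzzwords.
    with_buzz_accepted, with_buzz_total = 150, 1_000
    no_buzz_accepted, no_buzz_total = 120, 1_000

    # PPV of the buzzword "signal": P(accepted | buzzwords present).
    ppv = with_buzz_accepted / with_buzz_total
    baseline = no_buzz_accepted / no_buzz_total
    print(f"PPV = {ppv:.1%}, baseline = {baseline:.1%}, lift = {ppv / baseline - 1:+.1%}")
    ```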

    I believe science is a business. If you are making $$$, please do not pretend that your science is better.

  2. I have to admit I freaked out for about 2 seconds, then checked the date posted…nice one.

  3. Hey, thanks for those tips, Critique. Like they say, 27.28% of all statistics are made up on the spot, but I think that number probably spikes a bit on April 1. 🙂
