A human-scored research paper recommendation engine?

A few days ago, William Gunn blogged about a fascinating idea for a paper recommendation engine and also described Mendeley’s role in it. His post then generated a lively discussion on FriendFeed.

Perhaps due to our relatively well-known affiliation with Last.fm, our idea for a research paper recommendation engine had always involved tags and collaborative filtering. But William brings up Pandora, another type of recommendation engine, which doesn’t rely on critical mass but instead on scoring music along a fixed set of dimensions.

So I was wondering, how feasible would such a human-scored recommendation engine be for research papers, and how could one do it? If one were to transplant the Pandora approach 1:1, one would have to find suitable dimensions on which to score papers – but what could those be? Epistemological position (e.g. positivist vs. constructivist), academic discipline, methods used? Or would you have to define a slightly different set of dimensions for each academic discipline? As opposed to music, where you can score tracks based on instrumentation, mood, tempo etc., I feel that it would be rather difficult to use this level of abstraction for research paper recommendations, but maybe I’m wrong.

Of course, you could think of tagging as a form of (binary) scoring, too, but without pre-defined dimensions. I thus remain convinced that tagging and collaborative filtering will be a very good starting point for our recommendation engine. However, William’s suggestion made me think of an additional possibility.

Here’s what we might do: We have been planning to gradually add “Paper Pages” to the Mendeley site over the next few weeks. There will be one page for every paper in our database, containing the metadata, the abstract (if possible/available), some usage statistics about the paper, links to the publisher’s page (if available), and (later on) commenting functionality. We were also thinking about crowdsourcing approaches to enable users to correct mistakes in the metadata or merge duplicates.

Incorporating William’s suggestion, we could also give users the option to explicitly link paper pages to each other, and then say “this paper is related to this other paper because ___”. Two papers sharing the same tag may implicitly suggest a relation, but it might also be a case of a homonym – the same tag meaning two completely different things in different disciplines. An explicit link would solve this problem.
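To make the idea concrete, here is a minimal sketch of what such an explicit, reasoned link between paper pages could look like as a data structure. The IDs, field names, and helper function are hypothetical illustrations, not Mendeley’s actual data model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PaperLink:
    source: str   # ID of the paper page the link was created from (hypothetical)
    target: str   # ID of the paper it is linked to
    reason: str   # the user-supplied "related because ___" text

# Illustrative links between made-up paper IDs
links = [
    PaperLink("paper-1", "paper-2", "uses the same survey instrument"),
    PaperLink("paper-1", "paper-3", "rebuts its main argument"),
]

def related_to(paper_id, links):
    """Return (other_paper, reason) pairs for explicit links touching paper_id."""
    out = []
    for link in links:
        if link.source == paper_id:
            out.append((link.target, link.reason))
        elif link.target == paper_id:
            out.append((link.source, link.reason))
    return out
```

Because each link carries its reason, two papers sharing an ambiguous tag would never be conflated: the relation is asserted explicitly, with the “because” preserved.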

I didn’t have much time to fully think this through, and any further ideas would be appreciated!

8 thoughts on “A human-scored research paper recommendation engine?”

  1. Actually, Victor, that’s exactly what I was thinking about. Relatedness in terms of keyword extraction is useful, but so is being able to state, perhaps in a FOAF-like machine-readable manner, that this paper is related to this one because Author X did his PhD in Author Y’s lab, or this paper uses a similar approach to this other one in a related field, or this paper is a follow-up to the questions raised by this other paper.

    Types of features human annotation could add are things like:
    “was trained by”
    “is a colleague of”
    “is a rebuttal of”
    “is an extension of”
    “was first to publish”
    “was most influential to”

    Dates, authors, times cited, are all metadata that’s available, but with the explicit linkage you’re talking about, you could provide what I find most missing in a recommendation system – validation that the paper being recommended is truly an important one, not showing up just because of some keyword co-occurrence.
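The FOAF-like, machine-readable statements the comment describes could be sketched as subject–predicate–object triples restricted to a controlled vocabulary. The IDs and helpers below are hypothetical, a sketch of the idea rather than any actual FOAF vocabulary:

```python
# Controlled vocabulary of relation types, taken from the comment above
RELATIONS = {
    "was trained by", "is a colleague of", "is a rebuttal of",
    "is an extension of", "was first to publish", "was most influential to",
}

# Illustrative triples over made-up author/paper IDs
triples = [
    ("author:X", "was trained by", "author:Y"),
    ("paper:A", "is a rebuttal of", "paper:B"),
]

def assert_relation(subject, predicate, obj, store):
    """Add a triple, rejecting predicates outside the vocabulary."""
    if predicate not in RELATIONS:
        raise ValueError(f"unknown relation: {predicate}")
    store.append((subject, predicate, obj))

def query(store, predicate):
    """All (subject, object) pairs linked by the given relation type."""
    return [(s, o) for s, p, o in store if p == predicate]
```

Constraining statements to a fixed vocabulary is what would make them machine-readable: a recommender could then weight “is a rebuttal of” differently from mere keyword co-occurrence.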

  2. Have a look at PubMed and how it suggests related articles. I don’t think you will have access to the references for each article, but those could be used as a similarity signal as well. I think I would go with some form of clustering on keywords extracted from the abstract, plus citation matching if you had access to it. These would serve as a baseline for article similarity that could then be personalized for each user according to likes/dislikes, tagging (co-similarity with tagged items), etc.
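A baseline like the one this comment proposes could be sketched as keyword overlap between abstracts, with citation overlap mixed in when available. This is a minimal illustration with made-up stopwords and weighting, not PubMed’s or Mendeley’s actual method:

```python
import re

# Tiny illustrative stopword list; a real system would use a fuller one
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "on", "with"}

def keywords(abstract):
    """Crude keyword extraction: lowercase words minus stopwords."""
    words = re.findall(r"[a-z]+", abstract.lower())
    return {w for w in words if w not in STOPWORDS and len(w) > 2}

def similarity(abs_a, abs_b, refs_a=frozenset(), refs_b=frozenset(),
               citation_weight=0.5):
    """Jaccard overlap of abstract keywords, plus optional citation overlap."""
    ka, kb = keywords(abs_a), keywords(abs_b)
    kw = len(ka & kb) / len(ka | kb) if ka | kb else 0.0
    ra, rb = set(refs_a), set(refs_b)
    cite = len(ra & rb) / len(ra | rb) if ra | rb else 0.0
    return kw + citation_weight * cite
```

The combined score could then serve as the comment’s “baseline” that per-user signals (likes/dislikes, tags) adjust.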

  3. I think the best way to find the relation between two papers is the overlap of cited works within them.
    As a score for recommending the papers found in such a way, each paper’s own citation index can be used.

    Another way of recommendation is the one used at Last.fm – find the users with overlapping libraries.

    Overlap of the tag clouds of two papers could be useful too. In contrast to music, you can assign many more tags to a paper (describing methodology, materials used, theory vs. experiment, etc.). But people hardly tag all the papers they keep (I have about 1000 items in my database and have tags on only a few of them).

    Maybe making tagging a more automated step when saving a paper, instead of a manual revision of the library in your spare time, would reduce the barrier to tagging. For example, I’ve noticed that I started tagging bookmarks as soon as Firefox 3 allowed me to do it on the fly.
    With research papers it is harder, since you can’t tag effectively without having read the paper, but I still tend to put 2–3 keywords in the name of the file when I download the PDF (though I’m going to stop doing this, since at some point I will rename the files from within Mendeley). Maybe it’s a good idea to extract those keywords from the filename in Mendeley (but you have to teach users to make a habit of it).
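The Last.fm-style suggestion in the comment above — find users with overlapping libraries — can be sketched as a tiny recommender. The data shape and scoring are hypothetical illustrations, not Mendeley’s actual algorithm:

```python
def recommend(me, libraries, top_n=3):
    """Rank papers held by users whose libraries overlap mine.

    libraries: dict mapping user -> set of paper IDs (hypothetical shape).
    Each candidate paper is scored by the library overlap of the users
    who hold it; ties are broken alphabetically for determinism.
    """
    mine = libraries[me]
    scores = {}
    for user, lib in libraries.items():
        if user == me:
            continue
        overlap = len(mine & lib)          # shared papers = user similarity
        for paper in lib - mine:           # papers I don't have yet
            scores[paper] = scores.get(paper, 0) + overlap
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]
```

This is the collaborative-filtering counterpart to the citation- and tag-overlap signals above: all three reduce to set overlap, just over different sets (libraries, cited works, tags).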

  4. You guys are way ahead of me… here I was thinking I was smart because I know what a Markov engine is LOL

    Good luck
