Mendeley at ACM Recommender Systems 2013

By Mark Levy, Senior Data Scientist at Mendeley

Last week I had the pleasure of travelling to Hong Kong to give two workshop presentations at the ACM Recommender Systems conference.  The art and science of recommender systems has come a long way since "users who like X also like Y" first appeared on an e-commerce site, and this year's conference attracted several hundred delegates from both industry and academia.

Despite its close association with customer satisfaction and the commercial bottom line, as a research topic Recommender Systems occupies a tiny and somewhat recherché niche within the computer science discipline of Machine Learning, which centres on the idea that if you present a computer program with enough examples of past events, it will be able to come up with a formula to make predictions about similar events in the future.  For a recommender system these events record the interaction of a user with an item, for example Alice watched Shaun of the Dead, or Kris read Thinking Fast And Slow, and the program's predictions consist of suggested new books that Alice or Kris might like, or of other movies similar to Shaun of the Dead, and so on.  In our products these scenarios correspond to Mendeley Suggest, currently available only if you subscribe to a Pro, Plus or Max plan, and to the Related Research feature which we recently rolled out to all users in Mendeley Desktop.
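To make that idea concrete, here is a toy sketch of the "users who like X also like Y" principle, using made-up viewing histories rather than any real Mendeley data: count how often two items appear together in the same user's history, then suggest the most frequent companions of an item.

```python
# Toy co-occurrence recommender: "users who like X also like Y".
# The histories below are invented for illustration.
from collections import Counter
from itertools import combinations

# Each entry records one user's interactions (films watched, books read, ...).
histories = {
    "Alice": {"Shaun of the Dead", "Hot Fuzz", "The World's End"},
    "Kris": {"Thinking Fast And Slow", "Hot Fuzz"},
    "Bea": {"Shaun of the Dead", "Hot Fuzz"},
}

# Count how often each pair of items appears in the same user's history.
co = Counter()
for items in histories.values():
    for a, b in combinations(sorted(items), 2):
        co[(a, b)] += 1
        co[(b, a)] += 1

def similar_to(item, k=2):
    """Return the k items most often seen alongside `item`."""
    scores = {b: n for (a, b), n in co.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Real systems work with millions of users and items and far cleverer scoring, but the underlying shape of the data, user-item interaction events, is exactly this.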

One challenge for anyone trying to build a recommender system is that it's hard to tell whether or not your predictions are going to be accurate, at least until you start making them and can see how often your users actually accept your suggestions.  As there is a huge space of possible methods to choose from – far too many to test every possibility on unsuspecting users – ideally we'd like to be able to figure out how well each prediction formula (technically each mathematical model) matches reality before we get to that stage.  If and how that might be possible was a recurring theme of this year's conference, and the subject of my first talk in Hong Kong.
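One common way to estimate accuracy before going live is offline evaluation: hide a known interaction for each user, ask the model for its top-k suggestions, and count how often the hidden item turns up.  A minimal sketch, with hypothetical data and a trivial stand-in recommender:

```python
# Offline evaluation sketch: hit rate at k over held-out interactions.
# The held_out data and the stand-in recommender are hypothetical.
def hit_rate_at_k(recommend, held_out, k=5):
    """Fraction of users whose hidden item appears in their top-k list."""
    hits = sum(1 for user, hidden in held_out.items()
               if hidden in recommend(user, k))
    return hits / len(held_out)

# One hidden item per user, plus a naive recommender that always
# suggests the same popular items regardless of the user.
held_out = {"Alice": "Hot Fuzz", "Kris": "Dune"}
popular = ["Hot Fuzz", "Shaun of the Dead"]
score = hit_rate_at_k(lambda user, k: popular[:k], held_out, k=2)
```

Metrics like this are cheap to compute over historical data, but as the conference discussions made clear, a good offline score is no guarantee that real users will accept the suggestions.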

Surprisingly for a field that has now seen several years of quite intense research interest and hundreds of peer-reviewed publications, most practitioners remain highly sceptical of the results reported even in their own research.  This made it particularly interesting to hear conference presentations from large tech companies such as Google, Microsoft, LinkedIn and eBay, not to mention Chinese counterparts such as Douban, Tencent and Alibaba, which were new names to me but also operate at colossal scale.  These organisations have both the scientific expertise to develop cutting edge methods and the opportunity to test the results on significant numbers of real users.  You might be surprised to learn quite how much sophisticated research has gone into recommending which game to play next on your Xbox.

At Mendeley we use a great deal of wonderful open source software, so we're very happy that the work the Data Science team did for my other presentation at the conference also gave us a chance to give something back to the developer community in the form of mrec, a library written in the very popular Python programming language and intended to make it easier to do reproducible research on recommender systems, even if you'll still need to test your new algorithm on real people to convince most of us that it actually works.

Mendeley Mini-Conference on Recommender Systems

Last week, Mendeley hosted an all-day mini-conference on Academic-Industrial Collaborations for Recommender Systems.  As we're fast running out of space in our London office, we rented a nearby venue called Headrooms.  With friendly staff looking after everyone's needs and great start-up décor, we'll definitely be coming back for future Mendeley events.  In the morning and early afternoon we were treated to seven talks from a variety of speakers who shared their experiences of academic-industrial collaborations and recommender systems.  We finished the afternoon by splitting into smaller groups to discuss the challenges involved in making such collaborations a success and sharing useful advice with one another.  The day then finished, as all good days do, with a quick trip to the funkily named Giant Robot, to taste some of their excellent cocktails. Our Chief Data Scientist Kris Jack, who masterminded this great event, shares some of the day's highlights:

Presentations

Seven presentations were delivered by our eight speakers, one of them being an entertaining double act.  We tried to film as much of the event as we could so that we could share the talks with you, so click on the links below to watch the presentations!

First off, Jagadeesh Gorla began with a presentation entitled A Bi-directional Unified Model.  Jagadeesh talked about the results presented in his WWW 2013 paper on group recommendations via Information Matching, a new probabilistic model based on ideas from the field of Information Retrieval, which learns probabilities expressing the match between arbitrary user and item features: this makes it both flexible and powerful.  He is currently working on developing an online implementation for deployment in an online gaming platform.

Our double act, Nikos Manouselis and Christoph Trattner then followed with the intriguingly entitled presentation Je t’aime… moi non plus: reporting on the opportunities, expectations and challenges of a real academic-industrial collaboration.  They gave an honest and candid reflection of their expectations for working together and how some of their past experiences in other collaborations weren’t as successful as hoped.  It was great material that fed into the discussions later in the day.

Heimo Gursch then gave some Thoughts on Access Control in Enterprise Recommender Systems.  While his project is still in the early stages, he had quite a few experiences that he could share from working with industry partners from the perspective of an academic.  He was working on designing a system that would allow employees in a company to effectively share their access control rights with one another rather than relying on a top-down authority to provide them.  It's also the first time that I've seen a presenter give his car keys to a member of the audience.  I do hope that he got them back.

Maciej Dabrowski delivered an exciting presentation Towards Near Real-Time Social Recommendations in an Enterprise.  He and his team have been working on a cross-domain recommendation system that works in a federated manner.  It exploits semantic data from linked data repositories to generate recommendations that span multiple domains.

Mark Levy, from our team here at Mendeley, then presented some of the work that he has been doing in a talk entitled Item Similarity Revisited.  The presentation was filled with useful advice from an industrial perspective on what makes a good recommender system.  He also explored the idea that simple algorithms may be more useful than complex ones in an industry setting, showing some impressive results to back it up.
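As a flavour of quite how simple an item-similarity recommender can be, here is a minimal sketch (with illustrative data, not Mendeley's actual system) that represents each item by the set of users who interacted with it and scores item pairs with the cosine measure over those binary vectors:

```python
# Item-item cosine similarity over binary user-interaction vectors.
# The items and user sets below are invented for illustration.
import math

# Each item maps to the set of users who interacted with it.
item_users = {
    "paper_a": {"u1", "u2", "u3"},
    "paper_b": {"u2", "u3"},
    "paper_c": {"u4"},
}

def cosine(a, b):
    """Cosine similarity between two items' user sets.

    For binary vectors this is |A ∩ B| / sqrt(|A| * |B|).
    """
    inter = len(item_users[a] & item_users[b])
    return inter / math.sqrt(len(item_users[a]) * len(item_users[b]))
```

Despite (or because of) its simplicity, this kind of neighbourhood method remains a very strong baseline in practice, which was very much the spirit of the talk.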

Benjamin Habegger then took us on a rollercoaster ride exploring some of his successes and failures in his last startup, 109Lab: Feedback from a Start-up experience in Collaboration with Academia.  He reflected on many of his experiences co-founding a start-up and on learning from the mistakes that were made.  Although he worked with academia during the process, he wasn't clear about the value that it actually brought.

Finally, Thomas Stone presented Venture Rounds, NDAs and Toolkits – experiences in Applying Recommender Systems to Venture Finance.  Thomas had some nightmare experiences with NDAs during his PhD.  So much so that he's still unclear what he has the right to publish in his thesis.  He also gave a nice introduction to PredictionIO, an open source machine learning server.

Discussion Groups

Once the presentations were given, everyone was invited to think about the challenges and difficulties that they had faced in working in academic-industrial collaborations and to write down some topics on a flip chart.  We then split into three groups and, using these topics as guidance, discussed the issues faced and presented some solutions.

A number of issues were identified including:

  • Prototypes vs production code – do the partners know what is expected from whom?
  • How to find the right partners
  • Access to data (e.g. NDA issues)
  • Evaluating systems
  • Best practices

After the three groups discussed the points, we all gathered back together to share our thoughts and conclusions.  In general, we all seemed to share similar problems in making academic-industrial collaborations successful.  We agreed that there should always be a clear set of expectations from the outset and that partners should know their roles.  Communication lines should be kept open and the spirit of collaboration encouraged.  What's more, it can help to have members of the teams working together in the same physical location, even if it's just for a brief period, in order to work well together.

Working in academic-industrial collaborations is hugely rewarding but it can be tough.  Finding the right partners who understand each other’s goals and constraints is important from the outset.  We can all learn from one another but we need to put in some effort in order to enjoy the rewards.

I'd like to thank everyone who put in the effort to make the workshop a success and, as I follow up on the several e-mails that I've received, I hope to start some new and fruitful collaborations!