Finding Better Ways of Mining Scientific Publications

TDM Workshop

Mendeley is supporting the 3rd edition of the International Workshop on Mining Scientific Publications, which will take place on the 12th September 2014 in London. The event will bring together researchers and practitioners from across industry, government, digital libraries and academia to address the latest challenges in the field of mining data from scientific publications.

Kris Jack, Chief Data Scientist at Mendeley, is part of the organizing committee, which also includes The Open University and The European Library. Following a very successful call for papers, he is now looking forward to a very busy and productive day of presentations and discussions:

“We’ve had a record number of high-quality submissions this year, so we were really spoiled for choice in putting together the agenda, which combines long papers, short papers, demonstrations and various presentations. We also worked with Elsevier to engage directly with the research community, which is really fantastic.”

As part of that ongoing outreach, Gemma Hersh, Policy Director at Elsevier, will be giving a brief presentation and answering questions from the participants regarding the company’s recently updated Text and Data Mining policy, and how it can best support the evolving needs of the research community.

As in previous years, this workshop is run in conjunction with the Digital Libraries conference – DL 2014 – and participants can register on the City University London website to attend the entire conference or just the workshops/tutorials.

See the full programme below, and for the latest updates be sure to follow @WOSP2014 or send any questions to @_krisjack or @alicebonasio on Twitter.

 

PROGRAM

| Time | Session | Title | Speakers/Authors |
|---|---|---|---|
| 09:00-09:10 | Introduction | | |
| 09:10-09:45 | Keynote talk | Information Extraction and Data Mining for Scholarly Big Data | Dr. C. Lee Giles |
| 09:45-10:10 | Long paper | A Comparison of two Unsupervised Table Recognition Methods from Digital Scientific Articles | Stefan Klampfl, Kris Jack and Roman Kern |
| 10:10-10:30 | Short paper | A Keyquery-Based Classification System for CORE | Michael Völske, Tim Gollub, Matthias Hagen and Benno Stein |
| 10:30-10:50 | Short paper | Discovering and visualizing interdisciplinary content classes in scientific publications | Theodoros Giannakopoulos, Ioannis Foufoulas, Eleftherios Stamatogiannakis, Harry Dimitropoulos, Natalia Manola and Yannis Ioannidis |
| 10:50-11:10 | Break | | |
| 11:10-11:35 | Long paper | Efficient blocking method for a large scale citation matching | Mateusz Fedoryszak and Łukasz Bolikowski |
| 11:35-12:00 | Long paper | Extracting Textual Descriptions of Mathematical Expressions in Scientific Papers | Giovanni Yoko Kristianto, Goran Topic and Akiko Aizawa |
| 12:00-12:20 | Short paper | Towards a Marketplace for the Scientific Community: Accessing Knowledge from the Computer Science Domain | Mark Kröll, Stefan Klampfl and Roman Kern |
| 12:20-12:40 | Short paper | Experiments on Rating Conferences with CORE and DBLP | Irvan Jahja, Suhendry Effendy and Roland Yap |
| 12:40-13:00 | Short paper | A new semantic similarity based measure for assessing research contribution | Petr Knoth and Drahomira Herrmannova |
| 13:00-13:10 | Presentation | Elsevier’s Text and Data Mining Policy | Gemma Hersh |
| 13:10-14:00 | Lunch | | |
| 14:00-14:35 | Keynote talk | Developing benchmark datasets of scholarly documents and investigating the use of anchor text in physics retrieval | Birger Larsen |
| 14:35-14:50 | Demo paper | AMI-diagram: Mining Facts from Images | Peter Murray-Rust, Richard Smith-Unna and Ross Mounce |
| 14:50-15:05 | Demo paper | Annota: Towards Enriching Scientific Publications with Semantics and User Annotations | Michal Holub, Róbert Móro, Jakub Ševcech, Martin Lipták and Maria Bielikova |
| 15:05-15:20 | Demo paper | The ContentMine scraping stack: literature-scale content mining with community maintained collections of declarative scrapers | Richard Smith-Unna and Peter Murray-Rust |
| 15:20-15:35 | Break | | |
| 15:35-16:00 | Long paper | GROTOAP2 – The methodology of creating a large ground truth dataset of scientific articles | Dominika Tkaczyk, Pawel Szostek and Łukasz Bolikowski |
| 16:00-16:25 | Long paper | The Architecture and Datasets of Docear’s Research Paper Recommender System | Joeran Beel, Stefan Langer, Bela Gipp, and Andreas Nürnberger |
| 16:25-16:50 | Long paper | Social, Political and Legal Aspects of Text and Data Mining | Michelle Brook, Peter Murray-Rust and Charles Oppenheim |
| 16:50-17:00 | Closing | | |

Submit your paper for Mining Scientific Publications Workshop!

Data Mining Workshop

The 3rd International Workshop on Mining Scientific Publications will take place from the 8th to the 12th September in London, and is a cross-disciplinary workshop for researchers, industry practitioners, digital library developers, and open access enthusiasts. Kris Jack, Chief Data Scientist here at Mendeley, is co-organizing the event along with CORE, the Open University, Athena Research and Innovation Center, and the European Library/Europeana.

The aim is to bring together people from different backgrounds to explore the possibilities around data mining tools, and how they can be used to save researchers’ time by finding and processing huge amounts of information quickly and easily.

We’re asking for submissions before the 13th July 2014 from those interested in analysing and mining databases of scientific publications, developing systems to enable such analysis, or designing new technologies to improve research and the free availability of research data. Researchers should submit their papers online for inclusion in the programme. Both long papers (up to eight pages in the ACM style) and short papers (not exceeding four pages) are welcome, as are practical demonstrations and presentations of systems and methods (demonstration submissions should consist of a two-page description of the system, method or tool).

“We’re looking to attract researchers from across academia and industry to work through the amazing possibilities and challenges around mining scientific content. The collaborations that come from these initiatives always yield really interesting results, so I’m looking forward to seeing what submissions we get through this year,” says Kris.

The workshop will be structured around three main themes:

  1. The whole ecosystem of infrastructures, including repositories, aggregators, text- and data-mining facilities, impact monitoring tools, datasets, services and APIs that enable analysis of large volumes of scientific publications.
  2. Semantic enrichment of scientific publications by means of text-mining, crowdsourcing or other methods.
  3. Analysis of large databases of scientific publications to identify research trends, high impact, cross-fertilisation between disciplines, research excellence, etc.

This year, we have also put together a CORE publications dataset containing a large array of publications from various research areas. It includes full text as well as enriched versions of metadata, with the aim of providing workshop participants with a framework for developing and testing methods and tools around the workshop topics. You can access this data through the CORE portal.

If you have any questions or comments, leave them below or tweet @WOSP2014.

Papers aren't just for people





There should be copyright exemptions for text mining in research.

There is a fundamental shift happening now in how research is conducted, and it is affecting all fields of academic endeavor. Some fields have already shifted and some are just beginning to, but the shift has a common cause: the growing amount of research output. At a certain point, the amount of research output exceeds researchers’ ability to consume it all as it is published. In the biological sciences the shift has already begun, but the difficulties reach all the way to the (digital) humanities.

At Mendeley, we’re building tools to address this problem. Mendeley Suggest is designed to suggest relevant research to you, in effect showing you the results of searches you haven’t run yet. Searching the Mendeley catalog allows you to find papers in smarter ways than keyword matching alone, by ranking the results according to how widely read each paper is and by showing you groups and other concepts related to the paper. In the end, though, there has to be a researcher reading the paper and using the knowledge to inform their research, and this just doesn’t scale. We need to be smarter about this. However useful these tools are, they only stem the flood, when what we should be doing is building boats.
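To make the idea of ranking by readership concrete, here is a minimal sketch in Python. It is not how the Mendeley catalog actually works: the `Paper` class, the `readers` field, the `rank_by_readership` function and the scoring formula are all assumptions made up for illustration. It simply keeps the papers whose titles share a word with the query and puts the more widely read ones first.

```python
import math
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    readers: int  # hypothetical count of people who have the paper in their library


def rank_by_readership(query: str, papers: list[Paper]) -> list[Paper]:
    """Return papers matching the query, most relevant and widely read first."""
    terms = set(query.lower().split())

    def overlap(paper: Paper) -> int:
        # Number of query words that also appear in the title.
        return len(terms & set(paper.title.lower().split()))

    def score(paper: Paper) -> float:
        # Assumed formula: keyword overlap, boosted by the log of the reader
        # count so that a hugely popular paper does not swamp everything else.
        return overlap(paper) * (1.0 + math.log1p(paper.readers))

    matches = [p for p in papers if overlap(p) > 0]
    return sorted(matches, key=score, reverse=True)


catalog = [
    Paper("Text mining for biology", readers=10),
    Paper("Text mining at scale", readers=500),
    Paper("Protein folding", readers=9000),
]
for paper in rank_by_readership("text mining", catalog):
    print(paper.title, paper.readers)
```

Even in this toy version, the limit described above is visible: the ranking only decides which paper a person reads first. Someone still has to read it.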