The Future of Research Communications and e-Scholarship

NLP methods for finding precedent research in the legacy literature

Beyond the PDF 2 (2013)

Problem: Because authors are unaware of past work, new research duplicates previously published research or fails to take it adequately into account. Reasons: (1) the inaccessibility of the legacy literature; (2) laziness, poor literature searching, the "not on Google" effect, and the false belief that all but the most recent research is irrelevant.

Solution to (2): A precedent-finding system takes the text of an author's early draft (or of a submitted manuscript) and applies NLP-based text-similarity metrics to find potentially related ideas in published work. These metrics take into account differing terminologies, synonymy, paraphrase, semantic roles, the structure of argumentation, and citations.

NB: This is neither conventional search nor plagiarism detection.
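The core matching step (draft in, ranked list of possibly related published texts out) can be illustrated very loosely with a bag-of-words sketch. This is not the system proposed here: it swaps in plain cosine similarity over term counts, handles synonymy only through a small hypothetical synonym table (SYNONYMS), and ignores paraphrase, semantic roles, argumentation structure, and citations. The names normalize, find_precedents, STOPWORDS, and the toy corpus are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table: maps variant terms onto one canonical term.
# A crude stand-in for handling differing terminologies and synonymy.
SYNONYMS = {
    "automobile": "car",
    "vehicle": "car",
    "earlier": "prior",
    "precedent": "prior",
}

# Small illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "to", "we", "is"}


def normalize(text):
    """Lower-case, split into alphabetic tokens, drop stop words, map synonyms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(SYNONYMS.get(t, t) for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_precedents(draft, corpus, top_k=3):
    """Rank (doc_id, text) pairs in corpus by similarity to the draft text."""
    draft_vec = normalize(draft)
    scored = [(cosine(draft_vec, normalize(text)), doc_id) for doc_id, text in corpus]
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


# Toy "legacy literature" and draft (invented for illustration only).
corpus = [
    ("legacy-1987", "A model of automobile traffic flow on highways."),
    ("legacy-1994", "Parsing English sentences with a chart parser."),
    ("recent-2012", "Simulating vehicle congestion on urban roads."),
]
draft = "We propose a new model of car traffic on highways."

results = find_precedents(draft, corpus)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The 1987 abstract ranks first, and its match on "car" exists only because the synonym table maps "automobile" onto it; a real precedent finder would need much richer matching to catch related ideas expressed in entirely different vocabulary or argument structure.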

Graeme Hirst


