The Future of Research Communications and e-Scholarship

Hackathon: extracting meaningful, machine-interpretable data from scholarly publications

Hi all, we would like to organize a hackathon during ESWC.



We plan to hold it on Monday, May 27, in Montpellier, France.

We will probably use https://www.hackerleague.org/ to manage signups.



Theme:



The ability to extract meaningful, machine-interpretable data from scholarly publications in PDF form is a big challenge. Several open source libraries attempt to automate this process, but more work is needed to improve their accuracy and reliability. Some specific and relevant challenges include:



Ability to automatically identify and tokenize citations from the PDF (or more accurately, from a string of text).

Ability to automatically identify those blocks of text that represent the narrative in a PDF.

Ability to identify references within the narrative, extract their scope, and associate them with citation information in the PDF.
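
To make the first and third challenges a bit more concrete, here is a minimal Python sketch. It is only an illustration under strong assumptions: reference strings follow a simple "Authors (Year). Title. Venue." layout, in-text references are numeric markers such as [1] or [2, 3], and the text has already been extracted from the PDF. The names tokenize_citation and link_references are hypothetical and do not come from any existing library. Real reference lists are far messier, and the sketch does not attempt to extract the scope of a reference.

```python
import re

# Assumed reference layout: "Authors (Year). Title. Venue."
CITATION_RE = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>[^.]+)\.\s+(?P<venue>.+?)\.?$"
)

# Assumed in-text marker style: [1] or [2, 3]
MARKER_RE = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def tokenize_citation(text):
    """Split one reference string into labeled fields.

    Returns a dict with authors, year, title and venue,
    or None when the string does not fit the assumed layout.
    """
    match = CITATION_RE.match(text.strip())
    return match.groupdict() if match else None


def link_references(narrative, bibliography):
    """Pair each numeric marker in the narrative with bibliography entries.

    bibliography maps reference numbers to already tokenized citations.
    Numbers missing from the bibliography are silently skipped.
    """
    links = []
    for marker in MARKER_RE.finditer(narrative):
        numbers = [int(n) for n in marker.group(1).split(",")]
        entries = [bibliography[n] for n in numbers if n in bibliography]
        links.append((marker.group(0), entries))
    return links
```

For example, tokenize_citation could be applied to each line of a reference section, and the resulting dicts, keyed by their position in the list, passed to link_references together with the narrative text. A string that does not fit the assumed layout simply yields None, which is exactly where the accuracy work described above would start.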



Anybody interested is welcome to join us. We will announce more precise details later this week. We are also looking for a co-organizer for the meeting, ideally someone based in Montpellier or elsewhere in France.



Best.

Archive: https://archive.force11.net/node/4359
