The Future of Research Communications and e-Scholarship

Why may Google textmine but Scientists may not?

Author: Piwowar, Heather

"Check out the robots.txt files for PMC for /pmc/articles/ and notice that GoogleBot is allowed, Bing and a few others are allowed, but User-Agent:* (the rest of us) are not.  The same is true for ScienceDirect robots.txt:  Google may textmine everything, experimenting scientists, nothing.  (hat tip to Alf Eaton on Twitter)

Is this defensible on the grounds that Google knows what it is doing but The Rest Of Us Can Not Be Trusted?  I sure hope not.  Scientists are routinely trusted with a lot more than writing a script that won’t bring down a server.  There are other ways to ensure someone won’t bring down a server than a global robots.txt ban."

http://researchremix.wordpress.com/2013/03/13/why-google/

Archive: https://archive.force11.net/node/6435
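
For readers who have not looked inside a robots.txt file, here is a minimal sketch of the asymmetry the post describes, using Python's standard urllib.robotparser module. The rules in SAMPLE_ROBOTS_TXT are hypothetical and written only for illustration; they are not a copy of the PMC or ScienceDirect files. The helper name may_fetch, the agent name my-textmining-script, and the article path are likewise assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only. This is NOT the actual
# PMC or ScienceDirect robots.txt; it only mimics the pattern the post
# describes: named search crawlers allowed, everyone else disallowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /pmc/articles/

User-agent: bingbot
Allow: /pmc/articles/

User-agent: *
Disallow: /pmc/articles/
"""


def may_fetch(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if the given rules permit user_agent to fetch path."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Placeholder article path, not a real identifier.
    path = "/pmc/articles/PMC0000000/"
    for agent in ("Googlebot", "bingbot", "my-textmining-script"):
        print(agent, may_fetch(SAMPLE_ROBOTS_TXT, agent, path))
```

Under these sample rules the two named crawlers are permitted and the generic script falls through to the catch-all entry and is refused. To check a live site instead, RobotFileParser also offers set_url() and read(), which download the file before can_fetch() is called.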
