In the US, what are the regulations regarding caching other people's content in a company's content library (SharePoint or another CMS, for example)?
I would like to have indexable references to certain scientific papers from public Internet sites (such as scientific journals) and to corporate IT and programming whitepapers, and I would like to ensure that, should those papers' links go dead with no redirect, our employees would still be able to read them. Would this be a proper thing to do, or would it require something like contacting every paper's author or journal for permission (or restricting ourselves to data sources with APIs)?
This would not be site scraping; the papers (links) would be chosen by the employees themselves to become part of the library. For development reasons, we will be self-hosting within our own domain and/or using services like Box and Slack via their APIs.
There would be proper attribution to the author and publisher, modeled into the architecture of our intended indexing and search subsystem.
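For illustration, here is a minimal sketch of the kind of metadata record we have in mind for each cached paper; the `CachedPaper` class and its field names are hypothetical, not an existing schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CachedPaper:
    """Hypothetical metadata record for one cached paper in the library.

    Attribution fields live alongside the cached copy so that the author
    and publisher are always shown with search results, even after the
    original link goes dead.
    """
    title: str
    authors: list[str]       # as credited by the publisher
    publisher: str           # journal or hosting organization
    source_url: str          # original public URL, kept for provenance
    retrieved_on: date       # when the employee added it to the library
    local_path: str          # where the cached copy lives in our CMS
    license_note: str = ""   # any stated license or permission terms
```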