I'm currently working on a program that scrapes information from websites and stores it locally in a database. The program fetches articles from the IT field, and I have a specific list of journals that I'm supposed to cover. It works like this: I take a DOI from the database (we got it from a site called DBLP), open a connection with it, and get redirected to the site where the article can be found. My problem is with this DOI:

Crossref

As you can see if you click the link, I land on a Crossref page that lets me choose between 2 different locations for the article. Because I only have a translator that can scrape one of these sites, I want to go to the IEEE Xplore site. The problem is that I don't know how to tell my program to go to IEEE Xplore. My current code looks something like this:

public static void Scan(Article article) throws Exception
{
    // When the program runs, it creates an error log file inside the Java project folder.
    // A FileWriter opened in append mode creates the file if it doesn't exist yet.
    File file = new File("errorlogg.txt");
    FileWriter fileWriter = new FileWriter(file, true);

    // Build the URL for the HttpURLConnection from the article's DOI link
    URL urlDoi = new URL (article.GetElectronicEdition());

    //Transform from URL to String
    String doiCheck = urlDoi.toString();

    //Check what Journals
    String JournalsWanted = article.GetJournal();

    //Used to see if DOI changed 
    System.out.println("New DOI: " + urlDoi);

    HttpURLConnection connDoi = (HttpURLConnection) urlDoi.openConnection();

    // Disable automatic redirects so the logic below can detect redirections
    connDoi.setInstanceFollowRedirects(false);  

    String doi = "{\"url\":\"" + connDoi.getHeaderField("Location") + "\",\"sessionid\":\"abc123\"}";

    //Setting up an URL to translation-server
    URL url = new URL("http://127.0.0.1:1969/web");
    URLConnection conn = url.openConnection();
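
One detail in the snippet above: `getHeaderField("Location")` returns `null` when the response has no `Location` header, and the string concatenation would then send the literal text `null` as the URL. A minimal sketch of a guard for that case follows. The class name `TranslationRequest` and its methods are hypothetical and not part of the original program, and the `url` and `sessionid` fields are simply the ones used in the snippet.

```java
// Hypothetical helper that builds the request body for the translation server.
final class TranslationRequest {
    private TranslationRequest() {}

    /** Escapes quotes, backslashes and control characters for use inside a JSON string. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':
                    sb.append("\\\"");
                    break;
                case '\\':
                    sb.append("\\\\");
                    break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds the JSON body, refusing to continue when the DOI did not redirect. */
    static String body(String location, String sessionId) {
        if (location == null) {
            throw new IllegalArgumentException("No Location header: the DOI did not redirect");
        }
        return "{\"url\":\"" + jsonEscape(location)
                + "\",\"sessionid\":\"" + jsonEscape(sessionId) + "\"}";
    }
}
```

With this, the line that builds `doi` could become `TranslationRequest.body(connDoi.getHeaderField("Location"), "abc123")`, so that a missing redirect fails loudly instead of reaching the translation server.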

I was thinking of doing something simple, like an `if` that changes the URL when I see that it won't lead me to the right place. Something like this:

if (doiCheck.startsWith("http://dx.")) { /* change the URL here */ }
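
A prefix check like the one above only recognizes the old `http://dx.` form, while the same resolver is also reachable as `https://doi.org/`. A minimal sketch of a more tolerant check, which also pulls the DOI itself out of the link, might look like this. The class `DoiUrls` and the method `extractDoi` are hypothetical names, and `10.1000/xyz123` in the usage note is only a sample DOI.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original program.
final class DoiUrls {
    // Matches http(s)://doi.org/... and http(s)://dx.doi.org/..., capturing the DOI.
    private static final Pattern DOI_URL =
            Pattern.compile("^https?://(?:dx\\.)?doi\\.org/(.+)$", Pattern.CASE_INSENSITIVE);

    private DoiUrls() {}

    /** Returns the DOI part of a resolver URL, or empty if the URL is not a DOI link. */
    static Optional<String> extractDoi(String url) {
        Matcher m = DOI_URL.matcher(url.trim());
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

For example, `DoiUrls.extractDoi("http://dx.doi.org/10.1000/xyz123")` and `DoiUrls.extractDoi("https://doi.org/10.1000/xyz123")` both yield `10.1000/xyz123`, while a URL on any other host yields an empty `Optional`.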

The problem is that I don't understand how the Crossref site knows which article I'm after, because if I click the IEEE Xplore link there, the URL looks like this:

IEEE Xplore

It doesn't seem to use the DOI to find the article, or am I missing something? How can I tell my program to change the URL so that it finds the article on IEEE Xplore?

I'm sorry if the question isn't very clear or easy to understand, but I did my best to explain my problem.

anderssinho
  • use firebug and inspect the headers. probably the target is there – Leo Jan 08 '16 at 13:12
  • @Leo yeah, I can see that specific link if I use Firebug. But I have to understand how the Crossref site knows where to send me for each article. I thought I could always use the DOI, but I don't understand how I should do it in this case – anderssinho Jan 08 '16 at 13:14
  • Basically, your script must detect if it reached a crossref site, in this case, your script must decide which link to follow. Not sure if that was your doubt. – Leo Jan 08 '16 at 13:25
  • Yeah that is the problem how I can tell my script how to follow the link. Do you have any suggestions? – anderssinho Jan 08 '16 at 13:27
  • Not really, sorry! But a note about your `doiCheck.startsWith`: Please note that the preferred DOI resolver has been [`https://doi.org/`](https://www.doi.org/doi_handbook/3_Resolution.html#3.8) for a few years now. Maybe use a reg-ex like `https?://(dx\.)?doi.org/` rather than the old string ;-) – Katrin Leinweber Jan 11 '19 at 19:17
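
The idea from the comments (detect the Crossref chooser page, then decide which link to follow) could be sketched roughly as below. This assumes the chooser page lists the candidate locations as ordinary `<a href="...">` links and that the page's HTML has already been downloaded into a string, neither of which is confirmed by the question. The class `ChooserLinks` and the method `firstLinkOnHost` are hypothetical names. A regular expression is used only to keep the sketch dependency-free; a real HTML parser would be more robust.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that picks one target link out of a chooser page.
final class ChooserLinks {
    // Captures the value of href="..." or href='...' attributes.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    private ChooserLinks() {}

    /** Returns the first link in the HTML whose host equals wantedHost, if any. */
    static Optional<String> firstLinkOnHost(String html, String wantedHost) {
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // Undo the HTML escaping of '&' inside attribute values.
            String link = m.group(1).replace("&amp;", "&");
            try {
                String host = URI.create(link).getHost();
                if (host != null && host.equalsIgnoreCase(wantedHost)) {
                    return Optional.of(link);
                }
            } catch (IllegalArgumentException e) {
                // Not a parseable URI, so skip this link.
            }
        }
        return Optional.empty();
    }
}
```

Calling `ChooserLinks.firstLinkOnHost(html, "ieeexplore.ieee.org")` on the chooser page's HTML would then give the URL to hand to the translation server, or an empty `Optional` when the page offers no IEEE Xplore link.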

0 Answers