Is it possible to get the information displayed on the page linked below using PHP? I want all the text content displayed on the page to be copied to a variable or to a file.

http://www.ncbi.nlm.nih.gov/nuccore/24655740?report=fasta&format=text

I have tried cURL too, but it didn't work, whereas cURL worked with a few other sites I know. Even so, if there are solutions using cURL, please post them. I may not have tried every way in which cURL can be used.

SRKR

2 Answers

Use cURL to get the page content and then parse it - extract the <pre> section.

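$ch = curl_init();

// Full URL including the query data (the link from @ovitinho's answer)
curl_setopt($ch, CURLOPT_URL, 'http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=24655740&db=nuccore&dopt=fasta&extrafeat=0&fmt_mask=0&maxplex=1&sendto=t&withmarkup=on&log$=seqview&maxdownloadsize=1000000');

// Return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Give up after 3 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 3);
$content = curl_exec($ch);
curl_close($ch);
if ($content === false) {
    die('cURL request failed');
}
$content = trim($content);
// show ALL the content
print $content;

// If the response contains a <pre> section, extract it; otherwise keep everything
$start_index = strpos($content, '<pre>');
$end_index = strpos($content, '</pre>');
if ($start_index !== false && $end_index !== false) {
    $start_index += 5; // skip past the '<pre>' tag itself
    $your_text = substr($content, $start_index, $end_index - $start_index);
} else {
    $your_text = $content;
}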

UPDATE

Using the link from @ovitinho's answer - it now works :)
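
The question also asks about copying the text to a file. A minimal sketch of that step, assuming `$your_text` holds the text from the code above; `save_text` and the file name `sequence.fasta` are hypothetical names used only for illustration:

```php
<?php
// Hypothetical helper: write the fetched text to a file and
// return the number of bytes written, or throw if writing fails.
function save_text(string $text, string $path): int
{
    $bytes = file_put_contents($path, $text);
    if ($bytes === false) {
        throw new RuntimeException("Could not write to $path");
    }
    return $bytes;
}

// Usage, assuming $your_text was filled in as shown earlier:
// save_text($your_text, 'sequence.fasta');
```

Checking the return value against `false` matters here because `file_put_contents` does not throw on failure by itself.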

Nir Alfasi
  • This code isn't working for me... I am just getting a grey background. – SRKR Aug 02 '13 at 18:14
  • @SRKR Using the link from ovitinho's answer - it now works :) – Nir Alfasi Aug 02 '13 at 18:19
  • I used this to get it: `function get_content($URL) { $ch = curl_init(); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt($ch, CURLOPT_URL, $URL); $data = curl_exec($ch); curl_close($ch); return $data; }` and used @ovitinho's link – SRKR Aug 02 '13 at 18:29
  • @SRKR yes, that's almost the exact same code - it should work :) – Nir Alfasi Aug 02 '13 at 18:33

You need to request the URL that the page's form uses to load this result via JavaScript.

I found this final URL:

http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=24655740&db=nuccore&dopt=fasta&extrafeat=0&fmt_mask=0&maxplex=1&sendto=t&withmarkup=on&log$=seqview&maxdownloadsize=1000000

Note that the val parameter in this request is the ID 24655740 from your first link.

You can then fetch this URL with cURL.
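
As a rough sketch of how this could look in PHP: the helper below rebuilds the URL above from an ID and a second helper fetches it with cURL. The names `build_sviewer_url` and `fetch_url` are hypothetical, and the sketch assumes the endpoint accepts other IDs in `val` with the remaining parameters unchanged, which is not confirmed here. Note that `http_build_query` encodes the `$` in `log$` as `%24`, which is the standard URL encoding of the same parameter name.

```php
<?php
// Hypothetical helper: build the viewer URL for a given sequence ID.
// The parameter values are copied from the URL found via the Network tab.
function build_sviewer_url(string $id): string
{
    $params = [
        'val' => $id,
        'db' => 'nuccore',
        'dopt' => 'fasta',
        'extrafeat' => 0,
        'fmt_mask' => 0,
        'maxplex' => 1,
        'sendto' => 't',
        'withmarkup' => 'on',
        'log$' => 'seqview',
        'maxdownloadsize' => 1000000,
    ];
    return 'http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?'
        . http_build_query($params, '', '&');
}

// Hypothetical helper: fetch a URL with cURL and return the body as a string.
function fetch_url(string $url, int $timeout = 10): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body, do not print
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    $data = curl_exec($ch);
    if ($data === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    curl_close($ch);
    return $data;
}

// Usage (performs a network request, so it is left commented out):
// $text = fetch_url(build_sviewer_url('24655740'));
```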

Vitor Almeida
  • You are having trouble getting this content because the text is loaded via JavaScript (which triggers a form submission) – Vitor Almeida Aug 02 '13 at 18:06
  • That link is awesome, it worked like magic... May I know where you got that link from? – SRKR Aug 02 '13 at 18:12
  • I used Chrome's element inspector. In the Network tab you can find the requests sent to the server. Just copy the link :) – Vitor Almeida Aug 02 '13 at 18:14
  • Can you also get me such a link from this page please: http://www.ncbi.nlm.nih.gov/nuccore/NM_166356.1?report=fasta&log$=seqview&format=text – SRKR Aug 02 '13 at 18:18
  • @SRKR Sure :) http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=24655740&db=nuccore&dopt=fasta&extrafeat=0&fmt_mask=0&maxplex=1&sendto=t&withmarkup=on&log$=seqview&maxdownloadsize=1000000 – Nir Alfasi Aug 02 '13 at 18:20
  • If you need other URLs, you can easily find them this way. Open Chrome and enter your URL. Press F12 on the keyboard, then find the Network tab at the top of the window that opens. Next, select the XHR filter. Then reload your page to see the requests in this panel. Please accept my answer if it's OK for you. Thanks – Vitor Almeida Aug 02 '13 at 18:22