
I manually go to PubMed and search for my topic, for example http://www.ncbi.nlm.nih.gov/pubmed/?term=Cancer+TFF, and then get the PMIDs from the summary. I then try to retrieve all the abstracts using the following command.

I want to know two things: how can I avoid doing the first part manually (i.e. do it through scripting as well), and how can I save the abstracts?

# Retrieve abstracts for a list of PubMed IDs (pmid.txt, one PMID per line)
count=1
while read -r pmid; do
    # Print a running number, a tab and the PMID, then the abstract fetched from TogoWS
    printf '%s\t%s\n' "$count" "$pmid"
    curl -s "http://togows.dbcls.jp/entry/ncbi-pubmed/$pmid/abstract"
    printf '\n'
    ((count++))
done < pmid.txt
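The first, manual step (running the search on the website and copying the PMIDs) can be scripted by sending the query to NCBI's E-utilities `esearch` endpoint, which is the route the Update in the answer below takes. A minimal Ruby sketch of building that URL, assuming the https form of the endpoint; the helper name `esearch_url` and the default `retmax` of 100 are illustrative choices, not part of the original:

```ruby
require 'uri'

# E-utilities esearch endpoint (assumed https form of the URL used in the answer's Update)
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical helper: build an esearch URL for a free-text PubMed query.
# URI.encode_www_form percent-encodes the values, so "Cancer TFF" becomes "Cancer+TFF".
def esearch_url(query, retmax: 100)
  params = { "db" => "pubmed", "term" => query, "retmax" => retmax }
  "#{ESEARCH_URL}?#{URI.encode_www_form(params)}"
end

puts esearch_url("Cancer TFF")
```

The response body can then be fetched with the standard library, e.g. `Net::HTTP.get(URI(esearch_url("Cancer TFF")))` after `require 'net/http'`, and the PMIDs read from its `<IdList>` element.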
Edited by Harsh Trivedi; asked by nik.
  • You are getting this error because you haven't installed the `mechanize` library. Please run `gem install mechanize`. Make sure that you can run `require "mechanize"` in irb without errors, and only then try the script. – Harsh Trivedi Jul 03 '16 at 02:55
  • Btw @nik, you don't need to do `chmod +x reterive_abstract.rb`. It doesn't do any harm, but it is only required if you run the script as `./reterive_abstract.rb` rather than `ruby reterive_abstract.rb`. In that case you would also have to add a shebang line (e.g. `#!/usr/bin/env ruby`) as the first line of the script so the system knows to run it with Ruby. – Harsh Trivedi Jul 03 '16 at 03:32

1 Answer


You can get the list of PMIDs by web scraping with the mechanize gem in Ruby. Run `gem install mechanize`, and then you can get the required result by running the Ruby script below:

require 'mechanize'

agent = Mechanize.new

# Scrape the PMIDs from the search results page: the last child element of each .rprtid holds the ID
elements = agent.get('http://www.ncbi.nlm.nih.gov/pubmed/?term=Cancer+TFF').search(".rprtid").to_a
pmids = elements.map { |x| x.elements.last.text }
puts "List of pmids:"
puts pmids

# Fetch each abstract from TogoWS and write it to a text file
File.open("output_pmid_abstracts.txt", "w") do |file|
  pmids.each do |pmid|
    puts "Getting Abstract for PMID: #{pmid}"
    abstract = agent.get("http://togows.dbcls.jp/entry/ncbi-pubmed/#{pmid}/abstract").body
    file.puts "pmid:#{pmid}"
    file.puts abstract
    file.puts ""
  end
end
puts "Done"

This creates the file output_pmid_abstracts.txt in your current directory, which will look something like this:

pmid:27220894
BACKGROUND & AIMS: Gastric cancer has familial clustering in incidence, and the familial relatives of gastric ...
...
pmid:26479350
Trefoil factor family (TFF) peptides are a group of molecules bearing a characteristic three-loop trefoil domain ...
...
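Since each record in this file is a `pmid:` line, the abstract and a blank line, the abstracts can also be read back later for further processing. A minimal sketch with two hypothetical helpers: `write_record` issues the same three `puts` calls as the script, and `read_records` parses the file into a Hash. It assumes no line of an abstract itself starts with `pmid:`:

```ruby
require 'stringio'

# Hypothetical helper: write one record in the answer's format ("pmid:<id>", abstract, blank line).
def write_record(io, pmid, abstract)
  io.puts "pmid:#{pmid}"
  io.puts abstract
  io.puts ""
end

# Hypothetical helper: read the records back into a Hash of PMID (String) => abstract text.
# Assumes no abstract line begins with "pmid:".
def read_records(io)
  records = {}
  current = nil
  io.each_line do |line|
    if line.start_with?("pmid:")
      current = line.sub("pmid:", "").strip
      records[current] = +""
    elsif current
      records[current] << line
    end
  end
  # Drop the trailing blank line (and surrounding whitespace) of each record
  records.transform_values(&:strip)
end

# Round trip through an in-memory buffer instead of output_pmid_abstracts.txt
buffer = StringIO.new
write_record(buffer, 27220894, "BACKGROUND & AIMS: ...")
write_record(buffer, 26479350, "Trefoil factor family (TFF) peptides ...")
buffer.rewind
p read_records(buffer)
```

With the real file, `File.open("output_pmid_abstracts.txt") { |f| read_records(f) }` would be used in place of the StringIO buffer.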

PS: You must install the mechanize gem first! Otherwise you will get the error `require': cannot load such file -- mechanize (LoadError)`, because Ruby cannot find the required library/gem. If you still get this error after `gem install mechanize`, run `sudo gem install mechanize` and try again.
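To turn that LoadError into a friendlier message, the script can check for the library before using it. A minimal sketch; `library_available?` is a hypothetical helper that simply attempts the `require` and rescues `LoadError`:

```ruby
# Hypothetical helper: true if `require name` succeeds, false if Ruby cannot find the library.
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

unless library_available?("mechanize")
  warn "mechanize is not installed; run `gem install mechanize` (or `sudo gem install mechanize`)"
end
```

The script can then exit early with this message instead of failing on the first `require`.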

Update 1:

As nik pointed out in the comments, this code only loads the first page of the search results (20 entries), even when there are more. So I have updated the code to fix this problem. Some of the URLs are different now.
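Besides asking for one large batch, esearch also accepts a `retstart` offset (0-based index of the first ID returned) alongside `retmax`, so the IDs can be fetched page by page; the total number of hits is given by the `<Count>` element of an esearch response. A minimal sketch that only builds the per-page URLs; the helper names and the page size of 500 are assumptions:

```ruby
require 'uri'

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical helper: retstart offsets needed to cover `total` hits in pages of `page_size`.
def page_offsets(total, page_size)
  (0...total).step(page_size).to_a
end

# Hypothetical helper: one esearch URL per page of results.
def paged_esearch_urls(query, total, page_size: 500)
  page_offsets(total, page_size).map do |start|
    params = { "db" => "pubmed", "term" => query, "retstart" => start, "retmax" => page_size }
    "#{ESEARCH_URL}?#{URI.encode_www_form(params)}"
  end
end

puts paged_esearch_urls("Cancer TFF", 1200)
```

For 1,200 hits this yields three URLs with `retstart` 0, 500 and 1000; the IDs from each page are then concatenated.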

It first gets the list of all PMIDs through an API (NCBI E-utilities `esearch`) and then looks up each PMID's abstract by web scraping.

require 'mechanize'

agent = Mechanize.new

search_terms = "Cancer+TFF"

# Ask the E-utilities esearch API for every matching PMID (up to retmax), not just the first page
url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=#{search_terms}&retmax=10000"
all_pmids = agent.get(url).search("IdList Id").map { |id| id.text.strip.to_i }

puts "List of pmids:"
puts all_pmids

File.open("output_pmid_abstracts.txt", "w") do |file|
  all_pmids.each do |pmid|
    puts "Extracting Abstract for pmid: #{pmid}"
    abstract_url = "http://www.ncbi.nlm.nih.gov/pubmed/#{pmid}"
    # Scrape the abstract from the article page; if it is missing, write " " instead
    abstract = agent.get(abstract_url).search(".abstr").children[1].text rescue " "
    file.puts "pmid:#{pmid}"
    file.puts abstract
    file.puts ""
  end
end
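The esearch response is a small XML document, so the IDs can also be read from the individual `<Id>` elements instead of splitting text on newlines. A minimal sketch using Ruby's REXML in place of mechanize's `search`; the helper name `parse_esearch` and the trimmed sample response are assumptions:

```ruby
require 'rexml/document'

# Hypothetical helper: return [hit count, PMIDs] from an esearch XML response.
def parse_esearch(xml)
  doc = REXML::Document.new(xml)
  count_el = doc.elements["eSearchResult/Count"]
  count = count_el ? count_el.text.to_i : 0
  ids = doc.elements.to_a("eSearchResult/IdList/Id").map { |e| e.text.to_i }
  [count, ids]
end

# Trimmed, hand-written sample in the shape of an esearch response
sample = <<~XML
  <eSearchResult>
    <Count>2</Count><RetMax>2</RetMax><RetStart>0</RetStart>
    <IdList>
      <Id>27220894</Id>
      <Id>26479350</Id>
    </IdList>
  </eSearchResult>
XML

count, ids = parse_esearch(sample)
puts "#{count} hits: #{ids.inspect}"
```

Comparing `count` with `ids.size` shows whether `retmax` cut the list short.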

PS: Some papers have no abstract at all, e.g. PMID 16376814; for those the script writes an empty abstract.
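As an alternative to scraping each article page, E-utilities `efetch` can return several PubMed records as XML in one request, and a missing abstract then shows up as a missing `<Abstract>` element rather than a scraping error. A minimal sketch, assuming the `efetch.fcgi` endpoint with `retmode=xml` and the `PubmedArticle/MedlineCitation/Article/Abstract/AbstractText` layout; the helper names and the hand-written sample are hypothetical:

```ruby
require 'rexml/document'
require 'uri'

# Hypothetical helper: efetch URL for a batch of PMIDs (ids sent comma-separated in one request).
def efetch_url(pmids)
  params = { "db" => "pubmed", "id" => pmids.join(","), "retmode" => "xml" }
  "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?#{URI.encode_www_form(params)}"
end

# Concatenate all text inside an element, including text inside inline tags such as <i>.
def deep_text(element)
  element.children.map do |node|
    case node
    when REXML::Text then node.value
    when REXML::Element then deep_text(node)
    else ""
    end
  end.join
end

# Hypothetical helper: map each PMID to its abstract, or nil when the record has none.
# Structured abstracts have several AbstractText parts; they are joined with spaces.
def abstracts_from_efetch(xml)
  doc = REXML::Document.new(xml)
  doc.elements.to_a("PubmedArticleSet/PubmedArticle").to_h do |article|
    pmid = article.elements["MedlineCitation/PMID"].text
    parts = article.elements.to_a("MedlineCitation/Article/Abstract/AbstractText")
    abstract = parts.empty? ? nil : parts.map { |part| deep_text(part).strip }.join(" ")
    [pmid, abstract]
  end
end

# Hand-written sample mimicking the efetch XML layout (not real API output)
sample = <<~XML
  <PubmedArticleSet>
    <PubmedArticle>
      <MedlineCitation>
        <PMID>26479350</PMID>
        <Article>
          <Abstract><AbstractText>Trefoil factor family (TFF) peptides ...</AbstractText></Abstract>
        </Article>
      </MedlineCitation>
    </PubmedArticle>
    <PubmedArticle>
      <MedlineCitation>
        <PMID>16376814</PMID>
        <Article></Article>
      </MedlineCitation>
    </PubmedArticle>
  </PubmedArticleSet>
XML

p abstracts_from_efetch(sample)
```

A `nil` value makes papers without an abstract explicit, so the caller can decide whether to write a blank entry or skip them.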

Hope it helps : )

Answered by Harsh Trivedi.
  • @nik I have updated the Ruby script so that it does everything you need. Just save the script, install the mechanize gem, and the rest will be done. Hope it helps you :) – Harsh Trivedi Jul 02 '16 at 18:50
  • Please comment if you still have problems or are new to Ruby :) – Harsh Trivedi Jul 02 '16 at 18:55
  • @z_- I have answered the question rather than asked it. I didn't understand what you meant. – Harsh Trivedi Jul 02 '16 at 19:02
  • @nik does this answer your question? That is what I meant, lol – z atef Jul 02 '16 at 19:07
  • @nik you get this error because you haven't installed the mechanize gem (library). In the very first line of my answer I mentioned that you first need to install this library with `gem install mechanize`. Make sure `require "mechanize"` works in your irb. Only then try the script I suggested. – Harsh Trivedi Jul 03 '16 at 02:53
  • @nik does this answer your question now? – Harsh Trivedi Jul 03 '16 at 05:43
  • @nik OK, no problem. That can be handled. Just do this and let me know if it runs successfully: `gem install nokogiri` or `sudo gem install nokogiri`. I feel that you are pretty new to Ruby, but it's good that you are making an effort. – Harsh Trivedi Jul 03 '16 at 07:51
  • I never had any problem installing mechanize on my Mac, but you could try these links: http://superuser.com/questions/385270/ruby-gem-mechanize-missing-libxml2-on-mac-os-x-10-7-2-lion, http://stackoverflow.com/questions/704544/installing-mechanize-gem-on-mac-os-x-10-4-11-gives-failed-to-build-gem-native-e, http://stackoverflow.com/questions/10768586/cannot-installing-mechanize-for-ruby-on-mac – Harsh Trivedi Jul 03 '16 at 07:58
  • I understand your situation! I already have all the libraries, so it's difficult to reproduce the problem on my machine. Still, I hope you find the solution soon :) – Harsh Trivedi Jul 03 '16 at 08:09
  • I hope you get it working soon, and that afterwards you find the answer above correct and accept/upvote it :-) – Harsh Trivedi Jul 03 '16 at 08:28
  • @nik thanks :) I am really glad that it helped you. I hope you were finally able to install mechanize. Now, I really wonder why you are getting an error with that new URL, because I am not getting any error. I am updating/adding a comment in line 4. Can you check whether your change was exactly the same and let me know? – Harsh Trivedi Jul 03 '16 at 13:10
  • Oh, I should have caught this. The code extracts the PMIDs from the page, but the page displays only the first 20 of them! Things might get a little messier to handle this problem. Give me some time; I will help you with that and update :) Btw, I have used PubMed a few times before, and I don't think it blocks IPs. – Harsh Trivedi Jul 03 '16 at 13:16
  • @nik I have been working on this problem for the last 1-2 hours. Finally, I have the solution. Check it, I have updated the answer :) – Harsh Trivedi Jul 03 '16 at 15:09
  • @nik Welcome :) I am really glad it finally helped you :) Btw, which example are you referring to removing? I didn't understand! – Harsh Trivedi Jul 03 '16 at 15:54
  • @Harsh Trivedi nothing, now it is great. Look, I have an issue; do you think you can help me solve it using Ruby? – nik Jul 03 '16 at 15:57
  • What is the issue? Do you mean in the answer, or something else? – Harsh Trivedi Jul 03 '16 at 15:58
  • @Harsh Trivedi it is something else, but I cannot post any more questions! It does not allow me :-D – nik Jul 03 '16 at 16:12
  • @Harsh Trivedi then I will probably ask tomorrow, because I cannot ask a question for 24 hours. Yes, there is a limit, and that is why I cannot ask again :-D – nik Jul 03 '16 at 16:17
  • @nik I am sure you are doing or looking at something wrong, because it's working perfectly on my machine. What is the issue? – Harsh Trivedi Jul 04 '16 at 01:07
  • @Harsh Trivedi run the code on the df1 and df2 I sent you, then go to df3 and search, for example, for the first string that got an _integer suffix: O00422_52_1. Then look at df1 line 52: you will see it contains only one string, so why should this string get _1? The same happens for the others. – nik Jul 04 '16 at 03:43
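On the question of PubMed blocking IPs raised in the comments: NCBI's E-utilities documentation limits clients without an API key to 3 requests per second (10 with a key), and requests above that rate receive an error. When a script loops over many PMIDs against the E-utilities, a small throttle keeps it under that limit. A minimal sketch; the class name `Throttle` and its 1/3-second default interval are assumptions:

```ruby
# Minimal rate limiter: consecutive calls start at least `interval` seconds apart.
# The 1/3 s default assumes NCBI's limit of 3 requests/second without an API key.
class Throttle
  def initialize(interval = 1.0 / 3)
    @interval = interval
    @last = nil
  end

  # Sleep if the previous call was too recent, then run the block and return its value.
  def call
    if @last
      wait = @interval - (monotonic_now - @last)
      sleep(wait) if wait > 0
    end
    @last = monotonic_now
    yield
  end

  private

  def monotonic_now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

throttle = Throttle.new
%w[27220894 26479350].each do |pmid|
  throttle.call { puts "would fetch #{pmid} here" }
end
```

In the scripts above, each request would be wrapped the same way, e.g. `throttle.call { agent.get(url) }`.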