I have been trying to write a simple Ruby program that parses a simple PDF file and extracts the text I am interested in. I found that pdf-reader is quite a good gem for parsing PDF files. I have read through the examples that come with the gem and some tutorials about it.
I tried the callback approach and was able to get all the text from my PDF file, but I do not understand the arguments passed to some of the callbacks.
For example, suppose my PDF has a simple table with 3 columns and 2 rows: the header row is (Name, Address, Age) and the first data row is (Arun, Hoskote, 22). When I run the following Ruby script:
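require 'pdf-reader'

# RegisterReceiver records every callback it receives while a page is walked
receiver = PDF::Reader::RegisterReceiver.new
reader = PDF::Reader.new("Arun.pdf")
reader.pages.each do |page|
  page.walk(receiver)
  # Print each recorded callback as a hash with :name and :args
  receiver.callbacks.each do |cb|
    puts cb.inspect
  end
end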
It prints a series of callbacks. The interesting ones, show_text_with_positioning, look like this:
{:name=>:show_text_with_positioning, :args=>[["N", 5, "am", -4, "e"]]}
{:name=>:show_text_with_positioning, :args=>[[" "]]}
{:name=>:show_text_with_positioning, :args=>[["Ad", 6, "d", 3, "ress"]]}
{:name=>:show_text_with_positioning, :args=>[[" "]]}
{:name=>:show_text_with_positioning, :args=>[["Age"]]}
{:name=>:show_text_with_positioning, :args=>[[" "]]}
{:name=>:show_text_with_positioning, :args=>[["Ar", 4, "u", 3, "n"]]}
{:name=>:show_text_with_positioning, :args=>[[" "]]}
{:name=>:show_text_with_positioning, :args=>[["H", 3, "o", -5, "sk", 9, "o", -5, "te"]]}
{:name=>:show_text_with_positioning, :args=>[[" "]]}
{:name=>:show_text_with_positioning, :args=>[["22"]]}
{:name=>:show_text_with_positioning, :args=>[[" "]]}
In the callbacks above, what does args represent with respect to the PDF file? If I want to extract only the name value ('Arun' here, but it could be anything) or the age value ('22' here, but it could be any value), how can I do that in a Ruby program? Is there any pdf-reader API or Ruby API to get only the specific value(s) I am interested in from a PDF file?
How can I write a Ruby program that picks out the particular callback containing the text I want?
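To make the goal concrete, here is a rough sketch of what I am trying to end up with. It rests on two assumptions I have not confirmed: that the strings inside args are fragments of the displayed text, and that the numbers between them are only spacing adjustments that can be skipped. The helper name cell_texts is my own invention and not part of pdf-reader, and the callbacks are hard-coded from the output above so the snippet runs without the PDF.

```ruby
# Callbacks copied from the printed output above.
CALLBACKS = [
  { name: :show_text_with_positioning, args: [["N", 5, "am", -4, "e"]] },
  { name: :show_text_with_positioning, args: [[" "]] },
  { name: :show_text_with_positioning, args: [["Ad", 6, "d", 3, "ress"]] },
  { name: :show_text_with_positioning, args: [[" "]] },
  { name: :show_text_with_positioning, args: [["Age"]] },
  { name: :show_text_with_positioning, args: [[" "]] },
  { name: :show_text_with_positioning, args: [["Ar", 4, "u", 3, "n"]] },
  { name: :show_text_with_positioning, args: [[" "]] },
  { name: :show_text_with_positioning, args: [["H", 3, "o", -5, "sk", 9, "o", -5, "te"]] },
  { name: :show_text_with_positioning, args: [[" "]] },
  { name: :show_text_with_positioning, args: [["22"]] },
  { name: :show_text_with_positioning, args: [[" "]] }
].freeze

# Hypothetical helper (not a pdf-reader method): keeps only the String
# elements of each args array, joins them, and drops whitespace-only cells.
# Assumption: the Numeric elements are spacing adjustments, not text.
def cell_texts(callbacks)
  callbacks
    .select { |cb| cb[:name] == :show_text_with_positioning }
    .map { |cb| cb[:args].first.grep(String).join }
    .reject { |text| text.strip.empty? }
end

cells = cell_texts(CALLBACKS)

# Assumption: the table has exactly 3 columns, header row first.
headers, values = cells.each_slice(3).to_a
row = headers.zip(values).to_h

puts row["Name"]
puts row["Age"]
```

With the real file I would pass receiver.callbacks instead of the hard-coded array. This only works if the callbacks arrive in reading order and the column count is known in advance, which is why I am asking whether there is a more reliable way.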