
I'm trying to access the strings in a PDF loaded remotely in a UIWebView. Since Apple lists the copy/define options among the UIMenuController items, I thought it would be as easy as calling stringByEvaluatingJavaScriptFromString from my custom sharedMenuController item. That approach works across most web pages and text ranges, with the exception of remote PDFs. How does Apple access these strings to copy the selected ranges to the clipboard or define them?
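
For reference, here is a minimal sketch of the approach described above, written in Swift. The class name SelectionViewController, the shareSelection method, and the "Share" title are illustrative assumptions, not code from my project; the only parts taken from the question are the custom menu item and the JavaScript evaluation call.

```swift
import UIKit

// Hypothetical view controller; class and method names are illustrative only.
final class SelectionViewController: UIViewController {
    let webView = UIWebView()

    override func viewDidLoad() {
        super.viewDidLoad()
        webView.frame = view.bounds
        view.addSubview(webView)

        // Add a custom item to the shared edit menu shown for selected text.
        let shareItem = UIMenuItem(title: "Share", action: #selector(shareSelection))
        UIMenuController.shared.menuItems = [shareItem]
    }

    // Allow the custom action so the menu can offer it.
    override func canPerformAction(_ action: Selector, withSender sender: Any?) -> Bool {
        if action == #selector(shareSelection) { return true }
        return super.canPerformAction(action, withSender: sender)
    }

    @objc func shareSelection() {
        // Ask the loaded document for the current selection as plain text.
        let script = "window.getSelection().toString()"
        let text = webView.stringByEvaluatingJavaScript(from: script) ?? ""
        // On ordinary HTML pages this holds the selected text; with a remote
        // PDF it does not yield the selection, which is the problem described.
        print("Selected text: \(text)")
    }
}
```

The script runs against the document the web view has loaded, which is presumably why it only helps when that document is an HTML page with a DOM selection to query.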


Correction:

I just tested the 'define' sharedMenuController item that Apple provides when text is selected in a PDF displayed in a UIWebView, and an error occurs:

+[_UIDictionaryManager _availableDefinitionDictionaries] returned nil. Error: Error Domain=ASError Code=21 "The operation couldn’t be completed. (ASError error 21 - Unable to copy asset information)"

I've also noticed that you can't search PDF text for a user search term in Safari, so I suppose they have trouble extracting it themselves. It could be memory issues, I'm not sure, but they are still able to copy the selected text to the clipboard. How would we emulate that and handle the copied text ourselves?

soulshined
  • Please note the reason for bounty, it's canonical. Please don't leave a frivolous answer just to try and get the bounty or a share of it with up votes. This answer can potentially help out a TON of people. So please, be clear, concise and educational. I'm here to learn, so, don't just copy/paste code, please explain what's going on so I can actually learn something from it. If the answer is outstanding I am willing to invest more reputation to the person who answers it correctly and as requested because it can help out a shit ton of people – soulshined Jul 06 '15 at 23:57
  • Can you please clarify a couple of items for me? What do you mean by "remote PDFs"? Are those PDFs opened from a URL in a UIWebView? And this is more of a clarification on the question itself - are you looking for a way to get a string extracted, and placed into an appropriate variable, out of said PDFs? Along with some detailed explanation on why this works. – Vel Genov Jul 08 '15 at 18:54
  • Correct @VelGenov remote as in not local. And yes, thought I was clear. I know how to extract strings using the method stated in my question, but that method doesn't work for PDFs. So in short, how would I emulate a string extraction for remote PDFs as Apple does for their copy/paste (sharedMenuController item) feature. Thanks for the feedback - hope you can help. And explanation is great. Like I said, I'm here to learn – soulshined Jul 08 '15 at 21:37
  • @VelGenov any ideas? – soulshined Jul 12 '15 at 08:42
  • A user can search for text in PDF in Mobile Safari. –  Jul 12 '15 at 21:23
  • That comment isn't very useful to the core issue of the question, @MichaelL. That's not the question; I threw that in there as an example of what I've observed through trial and error trying to rectify the issue. Take this website: https://manuals.info.apple.com/MANUALS/1000/MA1565/en_US/iphone_user_guide.pdf I cannot search for text in the URL bar. I typed iPhone, something that's all over the place, and it returned no matches. This is on a real device; I don't know if it works on the simulator, but it doesn't work on any of my 4 devices. Can you provide an example where you got it to work? – soulshined Jul 12 '15 at 21:28
  • Wow, this really should work. I will file a bug report. I know this has worked in the past. –  Jul 12 '15 at 21:35
  • Not sure it's a bug, @MichaelL, because web views don't use pages, so it would be hard for Apple to scroll to the next page where a search term is located. I have been looking for this feature since iOS 6 and the result has been the same. I wish it were available to us to at least copy text. Perhaps it worked prior to that. – soulshined Jul 12 '15 at 21:37
  • Hey guys I wrote this stuff before I retired from Apple. If it is not working it is a bug. –  Jul 12 '15 at 21:38
  • @MichaelL "not working" is relative to the intended features. That feature is for general websites and searching through those; PDFs are structured very differently than a normal character in an HTML string. You're more than welcome to submit a bug report, but I would like to emphasize that's not the core issue. I just want to copy the text, not search through it – soulshined Jul 12 '15 at 21:43
  • Yes, see my answer below. The ability to extract text in PDFs is quite complex. There was talk within Apple about exposing API calls to enable App developers to access text in PDF, but this was not done up until I left a year ago. Maybe they are working on it now? If you file your own bug report as a feature request, it will add to the pressure to make this stuff externally available. –  Jul 12 '15 at 21:50
  • @MichaelL thank you for your feedback. Yes, I knew that, and I have already noted it in my question. It just sucks: such a beautiful machine inside and out, and yet such a pain sometimes – soulshined Jul 12 '15 at 22:01
  • @soulshined I tried to get this working, but wasn't able to. Extracting text from a PDF can be quite difficult, when there isn't a specific API exposed. On a side note, something else might be playing a role here since you are working with a remote PDF that's being loaded from the web. The PDF might not be completely downloaded when you start extracting the text. That will contribute to the difficulty of the task. – Vel Genov Jul 13 '15 at 19:22

1 Answer


Unfortunately, what you have asked cannot be done in iOS with the current UIWebView API. Apple uses a private framework to extract text from PDFs, and this framework is not available to external apps.

  • Thank you, but complex doesn't = impossible. I don't feel like this answer is 100% accurate, or it might just be the phrasing. I know you can extract strings from a UIWebView regardless of what document type (.doc, .pages etc.) it is, so that doesn't necessarily make it private API. It's how Apple does it their way that makes it private. I knew this wasn't easy and, per our discussion above, I have been searching for this for 2 years, but you never know if someone has a solution. If you can create a PDF from a UIWebView, there is a way to retrieve the text, but that's beyond me. – soulshined Jul 12 '15 at 22:11
  • No, not impossible, but we used very complicated algorithms. You can search for US patents on "PDF Reconstruction" if you want to see more details. –  Jul 13 '15 at 22:44
  • Right, so my point stands that it _can_ be done, even with the current UIWebView; it's just not a simple two-lines-of-code kind of done, which is why I've posted an inquiry on SO. But thank you for your feedback, Michael. I'll see if anyone has anything progressive to say in the Developer forums – soulshined Jul 13 '15 at 22:52