
Resources: Java, Selenium, TestNG, Maven

Issue: I would like to validate a page that contains only PDF content.

What I am able to assert so far:

  1. After clicking the button on the parent page, I am redirected and can switch the window handle away from the parent window.

  2. The URL syntax is correct.

  3. The URL returns an HTTP 200 status code.

  4. The element located by By.xpath("//*[@type='application/pdf']") is visible under the DOM body.

Note: My developer told me there are two kinds of PDF: (1) text-based and (2) image-based.

Mine might be image-based, which (I guess) is why I am not able to read it with the PDFBox API.

What I tried (PDFBox): the code below works with another test PDF URL, but when I run the same code against my company's PDF URL I get an "End-of-File" error. I tried to handle the EOF error and searched for it, without success.

driver.get("http://www.axmag.com/download/pdfurl-guide.pdf");
String getURL = driver.getCurrentUrl();

// Download the document from the current URL
URL urlOfPdf = new URL(getURL);
BufferedInputStream fileToParse = new BufferedInputStream(urlOfPdf.openStream());

// Parse it with PDFBox and extract the text
PDDocument document = PDDocument.load(fileToParse);
String output = new PDFTextStripper().getText(document);
document.close();

Assert.assertTrue(output.contains("some text"));

http://www.testingdiaries.com/selenium-webdriver-read-pdf-content/

EOF error trace:

java.io.IOException: Error: End-of-File, expected line
    at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1124)
    at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2589)
    at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2560)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1222)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1122)
    at com.ecompany.tests.RunOcrTest.validatePdfRunSummary1(RunOcrTest.java:135)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:567)
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:124)
    at org.testng.internal.Invoker.invokeMethod(Invoker.java:583)
    at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:719)
    at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:989)
    at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
    at org.testng.TestRunner.privateRun(TestRunner.java:648)
    at org.testng.TestRunner.run(TestRunner.java:505)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:455)
    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:450)
    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:415)
    at org.testng.SuiteRunner.run(SuiteRunner.java:364)
    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:84)
    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1208)
    at org.testng.TestNG.runSuitesLocally(TestNG.java:1137)
    at org.testng.TestNG.runSuites(TestNG.java:1049)
    at org.testng.TestNG.run(TestNG.java:1017)
    at org.testng.remote.AbstractRemoteTestNG.run(AbstractRemoteTestNG.java:115)
    at org.testng.remote.RemoteTestNG.initAndRun(RemoteTestNG.java:251)
    at org.testng.remote.RemoteTestNG.main(RemoteTestNG.java:77)
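
The trace stops in COSParser.parseHeader, so parsing fails on the very first bytes of the download. As a minimal sketch (assuming Java 11 or later and the standard library only), the start of the downloaded data could be checked for the `%PDF-` marker before it is handed to PDFBox. The class and method names here are illustrative and not part of PDFBox:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class PdfHeaderCheck {

    // A well-formed PDF file starts with the ASCII marker "%PDF-" followed by a version.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given bytes start with the "%PDF-" marker. */
    static boolean looksLikePdf(byte[] head) {
        if (head == null || head.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (head[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads at most 'limit' bytes from the stream so the start can be inspected. */
    static byte[] readHead(InputStream in, int limit) throws IOException {
        return in.readNBytes(limit);
    }
}
```

If `looksLikePdf` returns false, printing the first few hundred bytes as text usually shows quickly whether the server sent HTML (for example a login or error page) instead of a PDF.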

What I want to know: how can such a scenario be handled, and how should such a test case be validated?

Thanks

Mike ASP
  • 1) Please clarify what you mean by "validating a page". 2) Is the "End of File error" a second question, or is it the real question? Is there a stack trace for this EOF? – Tilman Hausherr Apr 10 '20 at 07:25
  • Instead of loading a `PDDocument` from the `fileToParse` stream, copy the contents of that stream to a file and inspect that file. Start by viewing it in a text editor: does it look like HTML or like something else? If it looks like HTML, your way of retrieving the data is wrong. – mkl Apr 10 '20 at 09:09
  • @TilmanHausherr Thanks for your reply. (1) When a PDF document is displayed in the browser, what should be tested through automation to make sure the PDF functionality works? (2) I have added the EOF error to the question. – Mike ASP Apr 11 '20 at 18:28
  • Re 1: If it opens with load(), then it's a (mostly) valid PDF (if you want to verify contents, then it's more complicated but still possible). 2) The stack trace suggests that the file is empty or does not start with %PDF. I suggest you do what mkl mentioned. – Tilman Hausherr Apr 12 '20 at 08:33
  • @mkl I streamed the content to a file and I am seeing HTML. It does not contain the data I expect. I then tried the "itextpdf" Maven library and found https://stackoverflow.com/questions/8655027/how-to-solve-pdf-header-signature-not-found-error FYI, it is a Salesforce application that generates the PDF. – Mike ASP Apr 12 '20 at 19:11
  • @TilmanHausherr I actually see the EOF error on the line "PDDocument document = PDDocument.load(fileToParse);", so the load itself does not succeed. FYI, it is a Salesforce application that generates the PDF. – Mike ASP Apr 12 '20 at 19:12
  • You see html content. Thus, for the given url you don't get a pdf file at all. Now you should look at the html. Is it a login page? Then you apparently forgot to add credentials when opening the url for reading. Is it some error page? Then you probably used an inappropriate agent header (in your code by default). Etc. Etc. Etc. – mkl Apr 12 '20 at 20:02
  • @mkl I log in first and then open the PDF page; without a successful login I can't open the PDF page. Which header are you referring to? Thanks for your reply. – Mike ASP Apr 13 '20 at 01:11
  • To retrieve the contents of `getURL`, you use a connection completely independent from the one in Selenium for which you have logged in. Unless the authorization is part of the URL (e.g. via some session id parameter), your retrieval of the URL contents is unauthorized. Which headers? That depends on the exact architecture of the site you test. You should talk with your Salesforce application developer to find out how you can re-use your authorization in `URL.openStream()`. – mkl Apr 13 '20 at 08:19
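
As a rough sketch of the last comment's point, one way to re-use the browser's login for a separate download is to copy the session cookies onto the new request. This assumes the site authorizes requests by cookie alone, which may not hold for a given Salesforce application. The class name, method names and the cookie map are hypothetical; in a Selenium test the map could be filled from `driver.manage().getCookies()` using each cookie's `getName()` and `getValue()`:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Map;
import java.util.StringJoiner;

final class AuthenticatedDownload {

    /** Joins cookie name/value pairs into a single "Cookie" request header value. */
    static String toCookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    /**
     * Opens the URL on a new connection, sending the given cookies (assumed to be
     * copied from the logged-in browser session), and returns the response body.
     */
    static InputStream open(String url, Map<String, String> cookies) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        if (!cookies.isEmpty()) {
            conn.setRequestProperty("Cookie", toCookieHeader(cookies));
        }
        return conn.getInputStream();
    }
}
```

The stream returned by `open` could then be passed to `PDDocument.load` in place of `urlOfPdf.openStream()`. If the response is still HTML, the site probably expects something other than cookies (for example a specific header or token), which is the point at which the application's developers need to be asked.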
