
I'm parsing a PDF content stream and found that the TJ operator callback receives an array of strings, so I pop the array and iterate over it to get the string values like so:

static void Op_TJ(CGPDFScannerRef s, void *info)
{
    CGPDFArrayRef array;
    bool success = CGPDFScannerPopArray(s, &array);
    if(success) {
        NSMutableString *actualString = [[NSMutableString alloc] init];
        NSLog(@"array count:%zu",CGPDFArrayGetCount(array));
        for(size_t i = 0; i < CGPDFArrayGetCount(array); i++) {
            CGPDFStringRef string;
            CGPDFArrayGetString(array, i, &string);
            NSString *stringData = (NSString *)CGPDFStringCopyTextString(string);
            [actualString appendString:stringData];
            NSLog(@"string Data:%@",stringData);
        }
        NSLog(@"actual string:%@",actualString);
    }
}

The only problem is that most of the strings show up twice in my output:

2013-01-11 12:39:49.895 WinPCS Mobile[1617:c07] began text object
2013-01-11 12:39:49.895 WinPCS Mobile[1617:c07] array count:7
2013-01-11 12:39:49.896 WinPCS Mobile[1617:c07] string Data:In
2013-01-11 12:39:49.896 WinPCS Mobile[1617:c07] string Data:In
2013-01-11 12:39:49.896 WinPCS Mobile[1617:c07] string Data:it
2013-01-11 12:39:49.896 WinPCS Mobile[1617:c07] string Data:it
2013-01-11 12:39:49.897 WinPCS Mobile[1617:c07] string Data:ia
2013-01-11 12:39:49.897 WinPCS Mobile[1617:c07] string Data:ia
2013-01-11 12:39:49.897 WinPCS Mobile[1617:c07] string Data:ls
2013-01-11 12:39:49.898 WinPCS Mobile[1617:c07] actual string:InInititiaials
2013-01-11 12:39:49.898 WinPCS Mobile[1617:c07] ended text object

I've resorted to skipping every other index in the for loop, but this is extremely sloppy and seems inefficient, so I'm wondering if anyone has a solution or any idea what the underlying problem might be. I've tried multiple PDF files with the same results.

My simple quick fix was to change the for loop from this:

for(int i = 0; i < CGPDFArrayGetCount(array); i++)

to this:

for(int i = 0; i < CGPDFArrayGetCount(array); i+=2)
Ptemple

1 Answer


CGPDFArrayGetString is defined to return a bool that is true if there is a PDF string at the specified index, and false otherwise.

You're not checking the return value!

My guess is that every other element of the array is not a PDF string (so the function returns false).

In those cases the function doesn't overwrite the string variable, so it still holds the value from the previous iteration.

Just a guess..
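To make the guess above concrete, here is a small self-contained C model of the behaviour. It does not use Core Graphics at all: `Item`, `item_get_string`, `join_strings` and the `sample_tj` data (including the -10 spacing values) are hypothetical stand-ins invented for illustration, and `item_get_string` only imitates the contract described above (return false and leave the out-parameter untouched when the element is not a string). Treat it as a sketch of the mechanism, not as the actual library implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one TJ array element: a string or a number.
   These types are NOT part of Core Graphics. */
typedef enum { ITEM_STRING, ITEM_NUMBER } ItemKind;

typedef struct {
    ItemKind kind;
    const char *text;  /* meaningful only when kind == ITEM_STRING */
    double number;     /* meaningful only when kind == ITEM_NUMBER */
} Item;

/* Imitates the contract described in the answer: returns true and sets *out
   only when element i is a string; otherwise returns false and does not
   touch *out. */
static bool item_get_string(const Item *items, size_t count, size_t i,
                            const char **out)
{
    if (i >= count || items[i].kind != ITEM_STRING) {
        return false;
    }
    *out = items[i].text;
    return true;
}

/* Concatenates the string elements into dst (dst_size must be > 0).
   With check_result == false the return value is ignored, as in the
   question's loop, so a stale pointer from the previous iteration is
   appended again. The pointer is declared outside the loop here to make
   the stale value deterministic in this model. */
static void join_strings(const Item *items, size_t count, bool check_result,
                         char *dst, size_t dst_size)
{
    const char *current = "";
    dst[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        bool ok = item_get_string(items, count, i, &current);
        if (check_result && !ok) {
            continue;  /* not a string (e.g. a spacing number): skip it */
        }
        strncat(dst, current, dst_size - strlen(dst) - 1);
    }
}

/* Made-up data shaped like the array in the question: 7 elements,
   strings alternating with numbers. */
static const Item sample_tj[] = {
    { ITEM_STRING, "In", 0.0 }, { ITEM_NUMBER, NULL, -10.0 },
    { ITEM_STRING, "it", 0.0 }, { ITEM_NUMBER, NULL, -10.0 },
    { ITEM_STRING, "ia", 0.0 }, { ITEM_NUMBER, NULL, -10.0 },
    { ITEM_STRING, "ls", 0.0 },
};
```

Calling `join_strings(sample_tj, 7, false, buf, sizeof buf)` reproduces the doubled text from the question's log, while passing `true` yields the intended word. In the real callback the equivalent change would be to append the text only when CGPDFArrayGetString returns true, instead of stepping the index by 2, which silently assumes that strings and numbers always alternate.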

Paolo
  • The argument of the "TJ" operator is an array of *strings* or *numbers*, e.g. "[ (A) 120 (W) 120 (A) 95 (Y again) ] TJ". The numbers are used for glyph positioning. - So this seems to be the correct answer (if you replace "PDF stream" by "PDF string"). – Martin R Jan 11 '13 at 19:20
  • In the apple documentation they talk about "PDF stream". See https://developer.apple.com/library/mac/#documentation/graphicsimaging/reference/CGPDFArray/Reference/reference.html – Paolo Jan 11 '13 at 19:26
  • But the function CGPDFArrayGet**String** returns a reference to a PDF string. – Martin R Jan 11 '13 at 19:29
  • Perfect! Solved the problem! Thank you. :) – Ptemple Jan 11 '13 at 19:34