
I'm extracting text from a PDF using Poppler and used the following code to print the text:

for (std::vector<poppler::text_box>::iterator it = currpg.begin(); it != currpg.end(); ++it)
{
    const char *txt = it->text().to_latin1().c_str();
    printf("%s\n", txt);
}

It worked fine for all but one string: "Exemptions/Allowances:", which came out as Ы`L/V.

I then tried the following code and the string printed properly:

for (std::vector<poppler::text_box>::iterator it = currpg.begin(); it != currpg.end(); ++it)
{
    std::string txt = it->text().to_latin1();
    printf("%s\n", txt.c_str());
}

For that one particular string, why does the conversion to c_str inside printf yield a different result than when conversion is done outside printf? I thought maybe the "/" was causing an issue but there were date strings that also had "/" and printed properly.

c2po

1 Answer


The pointer txt outlives the temporary string whose buffer it points into.

it->text().to_latin1() // returns a temporary std::string
const char *txt = it->text().to_latin1().c_str(); // stores a pointer to the temporary's internal buffer; the temporary is destroyed at the end of this statement
printf("%s\n", txt); // the temporary is already gone, so this reads through a dangling pointer

The first example therefore has undefined behaviour: it can appear to work for some strings and print garbage for others.

Your question is a duplicate. See std::string::c_str() and temporaries.


If you used C++ streams instead of printf, the code would be shorter and safer. Compare

std::string txt = it->text().to_latin1();
printf("%s\n", txt.c_str());

and

std::cout << it->text().to_latin1() << "\n";
273K
  • FWIW `printf("%s\n", it->text().to_latin1().c_str());` would work as well. – dxiv Mar 31 '21 at 04:57
  • @S.M. that makes perfect sense...thank you! – c2po Mar 31 '21 at 13:13
  • @dxiv I originally did it that way, but when I added 4 additional attributes to print from the text_box (not shown in post), the printf became unwieldy so used variables. Glad I did for educational purposes :) – c2po Mar 31 '21 at 13:25
  • Not sure if it's appropriate to ask follow-up in comments, but here goes: why doesn't the compiler recognize the unsafe assignment to a temporary and complain? Does the safety depend on how c_str() is executed and is unknown at compile time? – c2po Mar 31 '21 at 15:16