
I want to extract character data from a file, and it must be directly convertible to a 4-byte int. Does anyone know how to convert a 1-byte char into a 4-byte char?

Background:

I'm extracting stream data from a PDF file. The data is encoded only with LZW. When I extract it into a char (before any decoding), the largest integer value I can get is 255, for obvious reasons (a 1-byte char has only 256 possible values). If I could extract the data straight into an integer, without an intermediate char to catch it (as in my example below), I think it would get past this problem and show the correct numerical values, which for LZW-compressed data exceed 255.

Basically I want to be able to do this.

char FourBiteChar; // I can't use the char data type, not sure how else to do this?
int MyInteger;

while (input >> FourBiteChar)
{
    MyInteger = FourBiteChar;
    MyVector.push_back(MyInteger);
}
Stefan
domonica
  • Add the language tag. – DontVoteMeDown Nov 22 '13 at 11:41
  • You want to convert a `char` to a `4 byte int`, or a `1 byte char` into a `4 byte char`? – SajjadHashmi Nov 22 '13 at 11:42
  • Do you want to collect 4 chars and move them into an integer or just a char to int? – Jekyll Nov 22 '13 at 11:42
  • I want to change a 1 byte char into a 4 byte char. Is that even possible? – domonica Nov 22 '13 at 12:51
  • Great. People ask for clarification regarding what "convert a 1 byte char into a 4 byte char" is supposed to mean, and the clarification says that it means to "change a 1 byte char into a 4 byte char". Magnificent. – R. Martinho Fernandes Nov 22 '13 at 12:53
  • @domonica - no, that's not possible. Please tell us what you're trying to achieve, not how. – DarkWanderer Nov 22 '13 at 12:58
  • Well, I'm extracting stream data from a PDF file. That data is encoded only with LZW. When extracting the data, if I use a char (this is before the decoding part), the maximum integer value the data will provide is 255, for obvious reasons (a 1-byte char has only 256 possible values). If I could extract the data straight into an integer without an intermediate char to catch the data (like my example above), it would probably get past this problem and display the correct numerical values (akin to LZW compressed data), which are in excess of 255. – domonica Nov 22 '13 at 13:27
  • A `char` is 1 byte by definition. There is no such thing as a "4 byte `char`". – Keith Thompson Nov 22 '13 at 15:47
  • As I understand it, LZW compression uses variable-length integers, not generally 8-bit-aligned. Is that true, domonica? – TonyK Nov 25 '13 at 08:05

1 Answer


You are probably looking for std::stringstream:

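#include <sstream>
#include <string>

std::string tempstr;
int MyInteger;

// Parse one integer from each text line of the stream.
while (std::getline(input, tempstr))
{
    std::stringstream tempss(tempstr);
    if (tempss >> MyInteger)          // skip lines that do not start with a number
        MyVector.push_back(MyInteger);
}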

As for the fact that your file is binary (PDF) rather than ASCII text, you might want to check these questions:

  • Reading text from binary file like PDF
  • C++ Reading a PDF file
  • Is there a C++ library to extract text from a PDF file like PDFBox for Java?

Community
Federico
  • Hmm, thanks for the effort. I have tried this before, and just so we're on the same page, 'input' is an input file stream, 'infilestream'. I'm still playing around with this; maybe I can get it to work. Any other suggestions welcome. – domonica Nov 22 '13 at 14:55
  • I get an error on the while (input >> tempss) part. Cannot overload >> operator. – domonica Nov 22 '13 at 15:02
  • I probably should have come straight out and asked this, but can anyone tell me: if I am extracting data from an LZW-compressed stream in a PDF, is the code I have written above enough? That is, is it enough to extract the stream (in its compressed form) character by character? That's all I really need to clear up for now. Or is there a special method I need to use for extracting data streams compressed with LZW? I am using the Rosetta Code implementation of LZW. It works, but not on PDF stream data, hence my question. I thought maybe I am extracting the data incorrectly? – domonica Nov 24 '13 at 14:17
  • @domonica: you have to read text from a pdf with appropriate code (as I thought you were already doing): read the linked questions. – Federico Nov 25 '13 at 07:39
  • I'm aware of that. That is what I'm doing. The LZW decoder works fine with other output, but for some reason it won't decode the output I have extracted from the file. It's very frustrating. I've tried creating a 12-bit character using a bitfield, as I understand that I should be reading in 12 bits at a time, but it's still not decoding. Honestly, I thought that just by extracting each char from the compressed stream, converting it to an int, and then passing it to the LZW decompress algorithm in a vector, it would work. For some reason it's not working with the data I take from the PDF stream. – domonica Nov 26 '13 at 06:22
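
On the 12-bit point raised in the comments: as far as the PDF specification describes the LZWDecode filter, codes are packed into the byte stream most significant bit first, start at 9 bits wide, and grow up to 12 bits as the table fills, with 256 reserved as the clear-table code and 257 as end-of-data. If that applies here, one byte is not one code, which would explain why feeding bytes (each at most 255) to a decoder that expects a vector of codes fails. The sketch below shows only the bit-unpacking step. `BitReader` is a hypothetical class written for illustration; it does not implement the decoder or the rule for when the width increases.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: pulls groups of bits out of a byte buffer,
// most significant bit first.
class BitReader
{
public:
    explicit BitReader(std::vector<unsigned char> data)
        : data_(std::move(data)) {}

    // Reads `width` bits (1..16) and returns them as an int,
    // or -1 when fewer than `width` bits remain.
    int read(int width)
    {
        // Pull in whole bytes until enough bits are buffered.
        while (bit_count_ < width)
        {
            if (pos_ >= data_.size())
                return -1;
            buffer_ = (buffer_ << 8) | data_[pos_++];
            bit_count_ += 8;
        }
        // The next code sits in the top `width` of the buffered bits.
        bit_count_ -= width;
        const std::uint32_t mask = (1u << width) - 1u;
        return static_cast<int>((buffer_ >> bit_count_) & mask);
    }

private:
    std::vector<unsigned char> data_;
    std::size_t pos_ = 0;
    std::uint32_t buffer_ = 0;  // most recently loaded bits, newest at the low end
    int bit_count_ = 0;         // number of unread bits still held in buffer_
};
```

For example, calling `read(9)` repeatedly on the nine bytes 80 0B 60 50 22 0C 0C 85 01 (hex) yields the codes 256, 45, 258, 258, 65, 259, 66, 257, several of which are above 255 even though every input byte is at most 255. A real decoder would pass the current code width to `read` and widen it as its table grows.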