I am trying to extract an HTML file that is one of the files inside a tar file. I have an approach in mind, but I don't know whether it is correct, so please point out where I am wrong. My idea is:

I create a stream from the tar file and read that stream into a buffer so that I have its contents. I then use strstr to search for the HTML file inside the tar data. I know that in my tar file the HTML contents start at "<!doctype html" and end at "</html>", so I would load everything between those two markers, which is the HTML file. Is my approach right?

The problem is that when I make the buffer very large (but still smaller than the tar file, which contains the HTML plus many other files), I get a stack overflow when debugging. When I use a small size, the buffer shows the contents of some other file that sits at the start of the tar file. I checked by opening the tar file in Notepad: those contents really are at the start of the tar file, and the HTML file is in the middle, so reaching it needs a very large buffer, and that is what triggers the stack overflow. My code is:

    HRESULT AMEPreviewHandler::CreateHtmlPreview(IStream *m_pStream) // called from somewhere else
    {
        ULONG CbRead = 0;
        const int Size = 115000;
        // Buffer lives on the stack; raising Size much further causes the
        // stack overflow described above.
        char Buffer[Size + 1];
        m_pStream->Read(Buffer, Size, &CbRead);
        Buffer[CbRead] = '\0';
        // The HTML file's contents start with this marker inside the tar file.
        const char *compare = "<!doctype html";
        // With a small Size, StartPosition shows as a bad pointer in the debugger,
        // and Buffer only holds contents found at the start of the tar file.
        char *StartPosition = strstr(Buffer, compare);
        __int64 count = 0;
        while (StartPosition != NULL)
        {
            MessageBox(m_hwndPreview, L"hurr inside the while loop", L"BTN WND", MB_ICONINFORMATION);
            count = StartPosition - Buffer + 1; // location of "<!doctype html"
            // Move past this match; without this the loop never terminates.
            StartPosition = strstr(StartPosition + 1, compare);
        }

        MessageBox(m_hwndPreview, L"after the while loop in CreateHtmlPreview", L"BTN WND", MB_ICONINFORMATION);
        return S_OK;
    }

Is my approach to getting the contents of the HTML file inside the tar file correct? And why do I get a stack overflow when I make the buffer large enough to reach contents located in the middle of the tar file, even though the size I declare is smaller than the size of the tar file?

Sss

1 Answer

The stack has a limited size, so allocating an arbitrarily large array on it will not work. You have two options. Either limit the buffer to a size that fits in the available stack and loop over the reads, which gets fun if your "needle" string (what you are looking for) straddles the gap between two blocks, although that can be overcome (see below). Or simply don't use the stack, and use new to allocate enough memory to hold the whole file. Of course, if the file is VERY large, that won't work: files can be larger than the total memory of your computer, and then you are stuck and have to go back to reading a bit at a time. Reading the entire file into memory only to throw most of it away is also wasteful in terms of resources.

One solution, using a single buffer, is to add the length of the "needle" to the size of the buffer. On the second and every later read, copy the last needle-length bytes from the end of the buffer to its beginning, read the new data into the buffer starting just after those copied bytes, and then search from the beginning of the buffer. As long as the buffer is fairly large compared to the "needle", the overhead of searching the same part of the buffer twice is not going to matter.

Mats Petersson