
I have a buffer

    char buffer[size];

which I use to store the contents of a stream (say, pStream):

    HRESULT hr = pStream->Read(buffer, size, &cbRead);

Now the buffer holds all the contents of the stream (size bytes). I know that two strings,

"<!doctortype html" and ".html>"

are present somewhere inside the buffer (I don't know their locations), and I want to copy just the contents of the buffer from

"<!doctortype html" to ".html>"

into another buffer, buffer2[SizeWeDontKnow], whose size I don't know yet.

How can I do that? (The content between these two locations is an HTML file, and I want to extract only that HTML file from the buffer.) Any ideas?

Sss

4 Answers


You can use the strnstr function (a BSD extension, not standard C) to find the right positions in your buffer. After you've found the starting and ending tags, you can extract the text in between using strncpy (which does not append a terminating '\0' when it copies exactly n characters, so add it yourself), or use the text in place if performance is an issue.
You can calculate the needed size from the positions of the tags and the length of the first tag:

    nLength = nPosEnd - nPosStart - nStartTagLength
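Visual C++ does not provide strnstr, so here is a minimal sketch of the same idea using std::search from <algorithm>, which searches a [begin, end) range and therefore needs no terminating zero. ExtractBetween is a hypothetical helper name, not a library function. The nPosStart/nPosEnd/nStartTagLength names follow the formula above. When searching data read from the stream, it is safer to pass cbRead (the byte count IStream::Read actually reports) than size, because a read can return fewer bytes than requested.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns the text between startTag and endTag inside
// the first len bytes of buf, or an empty string if either tag is missing.
// buf does not need to be zero-terminated, so this stands in for strnstr.
std::string ExtractBetween(const char* buf, std::size_t len,
                           const char* startTag, const char* endTag)
{
    const char* bufEnd = buf + len;
    const std::size_t nStartTagLength = std::strlen(startTag);
    const std::size_t nEndTagLength = std::strlen(endTag);

    // std::search returns bufEnd when the tag is not found
    const char* start = std::search(buf, bufEnd, startTag, startTag + nStartTagLength);
    if (start == bufEnd)
        return std::string();

    // look for the end tag only after the start tag
    const char* end = std::search(start + nStartTagLength, bufEnd,
                                  endTag, endTag + nEndTagLength);
    if (end == bufEnd)
        return std::string();

    const std::size_t nPosStart = static_cast<std::size_t>(start - buf);
    const std::size_t nPosEnd = static_cast<std::size_t>(end - buf);
    const std::size_t nLength = nPosEnd - nPosStart - nStartTagLength;
    return std::string(buf + nPosStart + nStartTagLength, nLength);
}
```

Usage would look like `std::string html = ExtractBetween(buffer, cbRead, "<!doctortype html", ".html>");`. Note that the sketch returns an empty string both when a tag is missing and when nothing lies between the tags; return the positions instead if you need to tell those cases apart.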

Bojan Hrnkas
  • Does Visual C++ support the strnstr function, if you have any idea? And what is nStartTagLength? As I understand you, nLength is the size of the whole HTML content, nPosStart is "<!doctortype html" and nPosEnd is ".html>". We only need those, so why have you used nStartTagLength? – Sss Jul 23 '13 at 14:04
  • Visual C++ does not have strnstr, but if you are not sure that the Read function gives you a zero-terminated string, you can put the terminating zero in yourself: `char buffer[size+1]; ZeroMemory(buffer,size+1);` nPosStart and nPosEnd are the positions of the start tag "<!doctortype html" and the end tag ".html>". nStartTagLength is the length of "<!doctortype html". I am sorry, but what you are asking about are basic algorithms, and that is beyond the scope of this forum. – Bojan Hrnkas Jul 23 '13 at 14:14
  • So do you know an equivalent function for Visual C++? And do you think I can accomplish it using strtok() here? - while (pch != NULL) { pch = strtok (NULL, "!doctortype html" return 0; } Can I get the position of "!doctortype html" using this? – Sss Jul 23 '13 at 14:15
  • You can use strstr if the buffer is zero-terminated. If it is not, you can make it zero-terminated, like I explained in my previous comment. – Bojan Hrnkas Jul 23 '13 at 14:25
  • If I do it like this, strstr gives error C2665: 'strstr' : none of the 2 overloads could convert all the argument types – Sss Jul 23 '13 at 14:29
  • Look up the C++ reference for [strstr](http://www.cplusplus.com/reference/cstring/strstr/). It does not take a size parameter the way strnstr does. Please do learn the tools before using them. – Bojan Hrnkas Jul 23 '13 at 14:32
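A minimal sketch of the zero-terminated approach from these comments, assuming the data contains no embedded '\0' bytes before the tags (strstr stops at the first one). ExtractWithStrstr is a hypothetical name. It copies into a std::vector<char> because an array like `char buffer[size+1]` with a run-time size is not standard C++ and Visual C++ rejects it. As noted above, strstr takes exactly two arguments and no size.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copies cbRead bytes into a vector, appends the
// terminating zero that strstr needs, then extracts the text between tags.
std::string ExtractWithStrstr(const char* buffer, std::size_t cbRead,
                              const char* prefix, const char* suffix)
{
    std::vector<char> text(buffer, buffer + cbRead);
    text.push_back('\0');

    // strstr(haystack, needle): both must be zero-terminated strings
    const char* start = std::strstr(&text[0], prefix);
    if (start == NULL)
        return std::string();
    start += std::strlen(prefix);   // skip past the start tag

    // search for the end tag only after the start tag
    const char* end = std::strstr(start, suffix);
    if (end == NULL)
        return std::string();

    return std::string(start, end);
}
```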

Look for an HTML parser for C/C++.

Another way is to keep a char pointer that starts at the beginning of the buffer and check each character after it to see whether it matches what you are looking for.
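A minimal sketch of the pointer-walking idea, which also covers comparing a whole string rather than a single character: at each position the pointer checks the first character and then uses memcmp to compare the full tag. FindTag is a hypothetical name, and the buffer does not need to be zero-terminated.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: walks a char pointer through the first len bytes of
// buffer and returns a pointer to the first full match of tag, or NULL.
const char* FindTag(const char* buffer, std::size_t len, const char* tag)
{
    const std::size_t tagLen = std::strlen(tag);
    if (tagLen == 0 || tagLen > len)
        return NULL;

    // last position where a complete tag can still fit
    const char* last = buffer + (len - tagLen);
    for (const char* b = buffer; b <= last; ++b)
    {
        // cheap first-character check, then compare the whole tag at once
        if (*b == tag[0] && std::memcmp(b, tag, tagLen) == 0)
            return b;
    }
    return NULL;
}
```

To find the end tag, call FindTag again starting just past the start tag, with the length reduced by the number of bytes already skipped. This is a simple O(n·m) scan, which is usually fine for a single buffer of this kind.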

ctrl-shift-esc
  • I am interested in the second idea, but could you please explain: with a character pointer from the start I can compare only one character at a time, whereas I have to compare a whole string the size of **"<!doctortype html"** (not a single character). That's the problem. Do you understand what I mean? – Sss Jul 23 '13 at 13:22
  • `buffer` (which decays to a pointer to its first element) points to your buffer area. Now create another char pointer as `char *b = buffer;`. Now b points to the start of buffer, and you can move it around without losing your buffer. If the first character in your buffer is '<', then `*b` is '<'. Keep incrementing b and you can read one char at a time. Keep comparing against the string you want, and you have what you want. – ctrl-shift-esc Jul 23 '13 at 13:39

Are you limited to C, or can you use C++?

The C library reference (string.h) lists plenty of useful functions for tokenising strings and searching for matches:

http://www.cplusplus.com/reference/cstring/

Using C++ I would do the following (using buffer and size variables from your code):

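    #include <iostream>
    #include <string>

    // copy the char array (size bytes) into a std::string
    std::string text(buffer, buffer + size);

    // define what we're looking for
    std::string begin_text("<!doctortype html");
    std::string end_text(".html>");

    // find the start of the text we need to extract
    size_t begin_pos = text.find(begin_text);
    if (begin_pos != std::string::npos)
    {
        begin_pos += begin_text.length();   // skip past the start tag

        // look for the end tag only after the start tag
        size_t end_pos = text.find(end_text, begin_pos);
        if (end_pos != std::string::npos)
        {
            // substr takes (position, length), not (position, end position)
            std::string extract = text.substr(begin_pos, end_pos - begin_pos);

            // test that we got the extract
            std::cout << extract << std::endl;
        }
    }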

If you need C string compatibility, you can use the following (the pointer stays valid only as long as extract exists and is not modified):

    const char* tmp = extract.c_str();
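The question asks for buffer2[SizeWeDontKnow]. Since that size is only known at run time, a plain array won't work in standard C++, and c_str() gives a read-only pointer. A minimal sketch, assuming you need a writable, zero-terminated copy: MakeBuffer2 is a hypothetical helper that sizes a std::vector<char> from the extracted string.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: turns the extracted text into a writable,
// zero-terminated char buffer whose size is decided at run time.
std::vector<char> MakeBuffer2(const std::string& extract)
{
    std::vector<char> buffer2(extract.begin(), extract.end());
    buffer2.push_back('\0');   // so &buffer2[0] can be used as a C string
    return buffer2;
}
```

With `std::vector<char> buffer2 = MakeBuffer2(extract);`, `&buffer2[0]` is a `char*` you can pass to C-style functions, and the vector frees the memory automatically.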
Simon Bosley
  • I am using Visual C++. Do you think strtok is good for me? I have to find the locations of "<!doctortype html" and ".html>" and then store the contents between them. What do you suggest? – Sss Jul 23 '13 at 13:38
  • I've updated my answer with a C++ example that I've tested using the g++ compiler: g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3). Let me know if it works for you – Simon Bosley Aug 09 '13 at 11:25
  • Glad you've got an answer. Many thanks, Simon. – Simon Bosley Aug 09 '13 at 14:30
  • If you want, you can look at this link, where I have written a solution for finding a particular file (".html") inside a buffer/stream: http://stackoverflow.com/questions/17920081/how-to-skip-a-file-inside-the-tar-file-to-get-a-particular-file/17928714#17928714 It may be useful for you in future. – Sss Aug 09 '13 at 14:40

If that's the only operation in your app that works on HTML code, you could use the solution below (you can also test it online here). However, if you are going to do more complicated parsing, I suggest using an external library.

    #include <iostream>
    #include <cstdio>
    #include <cstring>

    using namespace std;

    int main()
    {
        const char* beforePrefix = "asdfasdfasdfasdf";
        const char* prefix = "<!doctortype html";
        const char* suffix = ".html>";
        const char* postSuffix = "asdasdasd";

        // the array size must be a compile-time constant in standard C++
        const unsigned size = 1024;
        char buf[size];
        sprintf(buf, "%s%sTHE STRING YOU WANT TO GET%s%s", beforePrefix, prefix, suffix, postSuffix);

        cout << "Before: " << buf << endl;

        // find the prefix, then look for the suffix only after it
        const char* firstOccurenceOfPrefixPtr = strstr(buf, prefix);
        const char* firstOccurenceOfSuffixPtr =
            firstOccurenceOfPrefixPtr ? strstr(firstOccurenceOfPrefixPtr + strlen(prefix), suffix) : NULL;

        if (firstOccurenceOfPrefixPtr && firstOccurenceOfSuffixPtr)
        {
            // number of characters between the end of the prefix and the start of the suffix
            unsigned textLen = (unsigned)(firstOccurenceOfSuffixPtr - firstOccurenceOfPrefixPtr - strlen(prefix));
            char newBuf[size];
            strncpy(newBuf, firstOccurenceOfPrefixPtr + strlen(prefix), textLen);
            newBuf[textLen] = 0;   // strncpy does not add the terminator here

            cout << "After: " << newBuf << endl;
        }

        return 0;
    }

EDIT: I get it now :). You should use strstr to find the first occurrence of the prefix. I've edited the code above and updated the link.

podkova
  • Without knowing the locations of the prefix and suffix I can't get the data between them, so "THE STRING YOU WANT TO GET" is not possible. I think you still haven't understood my question. – Sss Jul 23 '13 at 13:54
  • Yep, you are right, I've missed this detail. I've just updated the answer :) – podkova Jul 23 '13 at 14:10
  • So do you have any idea how to get the location in Visual C++? In C we can do it using strnstr, but not here. Any ideas? – Sss Jul 23 '13 at 14:21
  • The code is still the same. yourTextLen should be calculated like this: `int nPosStart = strstr(buf,prefix) - buf; int nPosEnd = strstr(buf,suffix) - buf; int yourTextLen = nPosEnd - nPosStart - strlen(prefix);` – Bojan Hrnkas Jul 23 '13 at 14:22
  • Thanks. In the previous edit, I updated only the link, now the code is updated too. – podkova Jul 23 '13 at 14:26
  • The length calculation is still wrong. You should use the positions found using strstr instead of the lengths. – Bojan Hrnkas Jul 23 '13 at 14:28
  • If I use strstr, it gives error C2665: 'strstr' : none of the 2 overloads could convert all the argument types (no instance of overloaded function "strstr" matches the argument list) – Sss Jul 23 '13 at 14:33
  • @BojanHrnkas: thanks man, the answer is corrected one more time :). – podkova Jul 23 '13 at 14:35