C++ splitting unicode delimited string using wstring

Question

I am trying to split a Unicode-delimited string using wstring, but my code doesn't split it. Here is the main function:

#define SQL_TEXT Latin_Text
#include <iostream>
#define SQL_TEXT Latin_Text
#include <sqltypes_td.h>
#include "Split.h"
#include <string>
#include <stdio.h>
#include <vector>
#include <cstring>

using namespace std;
int main ()
{
    VARCHAR_LATIN *result = new VARCHAR_LATIN[512];
    wchar_t *s1 = (wchar_t *)"Myýnameýisýzeeshan";
    splitstringwc s(s1);
    vector<wstring> flds = s.splitwc((wchar_t)'ý');
    wstring rs = flds[1];
    wcout<<rs<<endl;
    for (int k = 0; k < flds.size(); k++)
        cout << k << " => " << flds[k].data() << endl;

    cout<<result;
    return 0;
}

The code for the splitstringwc class is as follows:

public:
splitstringwc(wchar_t *s) : wstring(s) { };
vector<wstring>& splitwc(wchar_t delim, int rep=0);
};


vector<wstring>& splitstringwc::splitwc(wchar_t delim, int rep) {
    if (!flds1.empty()) flds1.clear();  // empty vector if necessary
    wstring ws = data();
    wcout<<ws<<endl;
    //wcout<<delim<<endl;

    //wstring ws;
    //int j = StringToWString(ws, work);
    wstring buf = (wchar_t *)"";
    int i = 0;
    while (i < ws.size()) {
        if (ws.at(i) != delim)
            buf += ws.at(i);
        else if (rep == 1) {
            flds1.push_back(buf);
            buf = (wchar_t *)"";
        } else if (buf.size() > 0) {
            flds1.push_back(buf);
            buf = (wchar_t *)"";
        }
        i++;
    }
    if (!buf.empty())
        flds1.push_back(buf);
    return flds1;
}

The code doesn't split the input string. When I debug it, I get a segmentation fault at: wstring ws = data();

Please help.

Related: http://www.utf8everywhere.org/, http://www.joelonsoftware.com/articles/Unicode.html – JoeG, Mar 08 '13 at 15:07

score 1 · Answer 1 · answered Mar 09 '13 at 07:04

Using strtok instead of my own split function splits the string on the Unicode delimiter.

The code is as follows:

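char str[] = "Myýnameýisýzeeshan";  // writable array: strtok modifies its input
char *pch;
// strtok treats each byte of "ý" as a separate delimiter
pch = strtok(str, "ý");
while (pch != NULL)
{
    printf("%s\n", pch);
    pch = strtok(NULL, "ý");
}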

Please note that str consists of ANSI strings separated by a Unicode delimiter.
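
One caveat with strtok: its second argument is a set of single-byte delimiters, so if ý is stored as more than one byte (as it is in UTF-8), each of those bytes acts as a delimiter on its own. A minimal sketch of a split that matches the delimiter as a whole byte sequence follows. The name split_on is hypothetical and not from the thread, and it only assumes the caller passes the delimiter in the same encoding as the data.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): split `text` on every occurrence
// of the complete byte sequence `delim`. Empty fields are kept.
std::vector<std::string> split_on(const std::string& text, const std::string& delim) {
    std::vector<std::string> fields;
    if (delim.empty()) {            // nothing to split on: return the input whole
        fields.push_back(text);
        return fields;
    }
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = text.find(delim, start)) != std::string::npos) {
        fields.push_back(text.substr(start, pos - start));
        start = pos + delim.size(); // skip the whole delimiter, not just one byte
    }
    fields.push_back(text.substr(start));  // field after the last delimiter
    return fields;
}
```

For UTF-8 data the delimiter would be passed as "\xC3\xBD", the two-byte encoding of U+00FD. A lone 0xC3 byte that belongs to some other character is then left alone, which strtok would not guarantee.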

score 0 · Answer 2 · answered Mar 08 '13 at 14:28


You can't use normal string and character literals when dealing with wide-character strings. The literals have to be wide-character too, for example:

const wchar_t *s1 = L"Myýnameýisýzeeshan";

Notice the L in front of the literal; it makes the string a wide-character string.

The same is used for character literals:

s.splitwc(L'ý')
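
Putting both literals together, a minimal sketch of a wide-character split might look like the following. The names split_wide and split_sample are hypothetical, and this is a free function rather than the asker's splitstringwc class. The delimiter is written as the universal character name \u00FD instead of a raw ý, on the assumption that this sidesteps problems when the compiler misreads the source file's encoding.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not the asker's splitstringwc class.
// Splits `text` on `delim`. When keep_empty is false, empty fields between
// consecutive delimiters are skipped (the rep == 0 behaviour in the question).
std::vector<std::wstring> split_wide(const std::wstring& text, wchar_t delim,
                                     bool keep_empty = false) {
    std::vector<std::wstring> fields;
    std::wstring current;
    for (wchar_t ch : text) {
        if (ch != delim) {
            current += ch;
        } else if (keep_empty || !current.empty()) {
            fields.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        fields.push_back(current);  // last field has no trailing delimiter
    return fields;
}

// The sample string from the question, with ý spelled as \u00FD.
std::vector<std::wstring> split_sample() {
    return split_wide(L"My\u00FDname\u00FDis\u00FDzeeshan", L'\u00FD');
}
```

Both the string and the delimiter are wide literals here, so no cast is involved and every element of the input really is one wchar_t.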

answered Mar 08 '13 at 14:28

Some programmer dude


OK, but I am casting the normal string to a wide string like this: wchar_t *s1 = (wchar_t *)"Myýnameýisýzeeshan"; Won't this work? – Zeeshan Arif Mar 08 '13 at 14:37
@MuhammadZeeshanArif No, it won't work with just a cast. A wide character is, well, wide, and takes up more than the single byte a normal narrow character uses. If you cast a normal string to a wide-character string, the function will behave as if two or more characters of that string were a single wide character. – Some programmer dude Mar 08 '13 at 14:44
Anyway, I tried s.splitwc(L'ý') but it says: error: converting to execution character set: Invalid argument – Zeeshan Arif Mar 08 '13 at 14:49
@JP Your answer is incorrect. There is no built-in support in the standard C++ distribution for handling Unicode; for this I'll have to use a third-party library like ICU. That's why I got the invalid argument error. – Zeeshan Arif Mar 08 '13 at 17:48
@MuhammadZeeshanArif You're right that Unicode support is not good in plain C++, but if you're going to use wide-character variables, then you have to use wide-character literals. – Some programmer dude Mar 08 '13 at 17:59
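
To illustrate the point about casting made in these comments: a cast only reinterprets the same bytes, whereas a real conversion builds a new string with one wchar_t per character. The sketch below assumes the narrow data is ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point. The thread does not confirm the encoding, and widen_latin1 is a hypothetical name.

```cpp
#include <string>

// Hypothetical helper: widen a byte string one character at a time.
// Assumption: the input is ISO-8859-1 (Latin-1), so byte value == code point.
// This is NOT correct for UTF-8 or other multi-byte encodings.
std::wstring widen_latin1(const std::string& bytes) {
    std::wstring wide;
    wide.reserve(bytes.size());
    for (char c : bytes) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return wide;
}
```

The result owns properly sized wide characters, so it can be split on L'\u00FD' without reading past the end of the original buffer.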
