
I need to write a small function that finds occurrences of a character in a wchar_t string. The function takes a wchar_t* pointer to the string as input, but because the string is Unicode, the value of each character is displayed as a number.

Is there an elegant way to do this without walking through every character in the string and comparing Unicode code values? Also, when I pass the pointer to the function, the function only sees the first character. Why?

BenMorel
UncleSax
  • The "C++ way" to do this would probably be to construct an `std::wstring` instance from the `wchar_t *` and use the `find()` wstring method, if I understand you correctly. Clarifying your problem, perhaps by posting some code, would be helpful. – cdhowie Sep 02 '14 at 22:58
  • It sounds like you're trying to pass a wide string to a function that expects a narrow string. Showing an example of what you're trying to do would make things much more clear. – Retired Ninja Sep 02 '14 at 22:58
  • I don't understand at all, why does a function that finds an occurrence display anything? What does "Parsing" have to do with anything? How are you passing the pointer, and to what function? – Mooing Duck Sep 02 '14 at 23:51

1 Answer


std::wstring and the wide streams (std::wistream, std::wifstream, std::wcout) should do the job, provided that the locale is set correctly:

#include <iostream>
#include <fstream>
#include <locale>
#include <string>

using namespace std;

static void searchAndReport(const wstring &line) {
    wstring::size_type pos = line.find(L"な"); // hiragana "na"
    if (wstring::npos == pos) {
        wcout << L"見つかりません" << endl; // not found
        return;
    }
    for (bool first = true; wstring::npos != pos; pos = line.find(L"な", pos + 1)) {
        if (first)
            first = false;
        else
            wcout << L", " ;
        wcout << L"第" << pos << L"桁" ; // the pos-th column
    }
    wcout << endl;
}

static void readLoop(wistream &is) {
    wstring line;

    for (int cnt = 0; getline(is, line); ++cnt) {
        wcout << L"第" << cnt << L"行目: " ; // the cnt-th line:
        searchAndReport(line);
    }
}

int main(int argc, char *argv[]) {
//  locale::global(std::locale("ja_JP.UTF-8"));
    locale::global(std::locale(""));

    if (1 < argc) {
        wcout << L"入力ファイル: [" << argv[1] << "]" << endl; // input file
        wifstream ifs( argv[1] );
        readLoop(ifs);
    } else {
        wcout << L"標準入力を使用します" << endl; // using the standard input
        readLoop(wcin);
    }
}

Transcript:

$ cat scenery-by-bocho-yamamura.txt
いちめんのなのはな
いちめんのなのはな
いちめんのなのはな
いちめんのなのはな
いちめんのなのはな
いちめんのなのはな
いちめんのなのはな
かすかなるむぎぶえ
いちめんのなのはな
$ ./wchar_find scenery-by-bocho-yamamura.txt
入力ファイル: [scenery-by-bocho-yamamura.txt]
第0行目: 第5桁, 第8桁
第1行目: 第5桁, 第8桁
第2行目: 第5桁, 第8桁
第3行目: 第5桁, 第8桁
第4行目: 第5桁, 第8桁
第5行目: 第5桁, 第8桁
第6行目: 第5桁, 第8桁
第7行目: 第3桁
第8行目: 第5桁, 第8桁

All files are in UTF-8.
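
If the text is already in memory as a wchar_t* rather than coming from a stream, the same std::wstring::find loop can be applied directly. The following is only a minimal sketch: the helper name findAll is hypothetical and not part of the program above, and it assumes the pointer refers to a null-terminated wide string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the program above): returns the zero-based
// index of every occurrence of `target` in the null-terminated wide string
// `text`. A null pointer yields an empty result.
std::vector<std::size_t> findAll(const wchar_t *text, wchar_t target) {
    std::vector<std::size_t> positions;
    if (!text)
        return positions;

    // Copy into a std::wstring so that find() does the scanning.
    const std::wstring s(text);
    for (std::wstring::size_type pos = s.find(target);
         pos != std::wstring::npos;
         pos = s.find(target, pos + 1)) {
        positions.push_back(pos);
    }
    return positions;
}
```

Called as findAll(L"\u3044\u3061\u3081\u3093\u306E\u306A\u306E\u306F\u306A", L'\u306A'), i.e. with the first line of the poem and the character "な", this should give the indices 5 and 8, matching the transcript. The comparison is done per wchar_t, so no locale setup is needed for the search itself; the locale only matters for reading and printing.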

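Be careful not to mix cout and wcout in the same program: once the first output operation has given stdout a narrow or wide orientation, output of the other kind on it is not reliable.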

Environment:

$ lsb_release -a
LSB Version:    core-2.0-amd64: [...snip...]
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.5 LTS
Release:        12.04
Codename:       precise
$ env | grep -i ja
LANGUAGE=ja:en
GDM_LANG=ja
LANG=ja_JP.UTF-8
$ g++ --version
g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
nodakai