
First of all, sorry for my bad English. I have done my research, but I couldn't find any related answers that solve my problem. I have read up on code pages, UTF-8 and related topics in C and C++, and I know that a std::string can hold UTF-8. My development machine runs English Windows XP with the console code page set to 1254 (Windows Turkish). I can use the Turkish extended characters (İığşçüö) in a std::string, count them, and send them to the mysqlpp API to write them to databases without any problem. But when I use curl to fetch some HTML and write it to a std::string, my problem starts.

#include <iostream>
#include <windows.h>
#include <wincon.h>
#include <curl.h>
#include <string>
int main()
{
   SetConsoleCP(1254);
   SetConsoleOutputCP(1254);
   std::string s;
   std::cin>>s;
   std::cout<<s<<std::endl;
   return 0;
}

When I run this and type ğşçöüİı, the output is the same: ğşçöüİı.

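#include <iostream>
#include <windows.h>
#include <wincon.h>
#include <curl.h>
#include <string>

// libcurl write callback: appends each received chunk to the std::string
// passed via CURLOPT_WRITEDATA and returns the number of bytes handled.
size_t writer(char *data, size_t size, size_t nmemb, std::string *buffer)
{
   size_t res = 0; // returning 0 for a non-empty chunk makes curl abort the transfer
   if(buffer!=NULL)
   {
      buffer->append(data,size*nmemb);
      res=size*nmemb;
   }
   return res;
}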
int main()
{
   SetConsoleOutputCP(1254);
   std::string html;
   CURL *curl;
   CURLcode result;
   curl=curl_easy_init();
   if(curl)
   {
      curl_easy_setopt(curl, CURLOPT_URL, "http://site.com");
      curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, writer);
      curl_easy_setopt(curl, CURLOPT_WRITEDATA, &html);
      result=curl_easy_perform(curl);
      if(result==CURLE_OK)
      {
         std::cout<<html<<std::endl;
      }
      curl_easy_cleanup(curl); // release the easy handle
   }
   return 0;
}

When I compile and run this:

if the HTML contains 'ı', cmd prints 'Ä±'; 'ö' prints as 'Ã¶', 'ğ' prints as 'ÄŸ', 'İ' prints as 'Ä°', etc.

If I change the code page to 65000:

...
SetConsoleOutputCP(65000);//For utf8
...

the result is the same, so the cause of the problem isn't the cmd code page.

The HTTP response headers indicate that the charset is set to utf-8, and the HTML meta tag says the same.

As far as I understand, the source of the problem is the "writer" function or curl itself. The incoming data is split into chars, so extended characters like ı, İ, ğ arrive as 2 chars each and are stored that way in the std::string's char array. As a result, the code page equivalents of these half characters are what gets printed, or used anywhere else in the code (such as when mysqlpp writes that string to the db).
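The two-chars-per-letter observation can be checked directly. Below is a minimal sketch, assuming the string really holds UTF-8; `utf8_length` is a hypothetical helper name, not part of curl or the standard library. It counts code points by skipping UTF-8 continuation bytes, which always have the bit pattern 10xxxxxx.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points in s.
// Every byte that is NOT a continuation byte (10xxxxxx) starts a new code point.
std::size_t utf8_length(const std::string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the UTF-8 bytes of "ğı" (0xC4 0x9F 0xC4 0xB1), `size()` reports 4 while `utf8_length` reports 2. The bytes themselves are intact; they only look wrong when each one is displayed as a separate code page 1254 character.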

I don't know how to solve this, or what to do in the writer function or anywhere else. Am I thinking about this correctly? If so, what can I do about this problem? Or is the source of the problem somewhere else?

I'm using mingw32 with the Code::Blocks IDE on 32-bit Windows XP.

uoakinci
  • Welcome to stackoverflow! Don't worry about your English, it will get better with the time you spend here. I know mine did :) – Konerak Nov 27 '11 at 15:46
  • Sorry for off-topic, but what language is it (ğşçöüİı, I mean)? – Violet Giraffe Nov 27 '11 at 15:54
  • ğçşçöüİı are the special letters in Turkish that differ from English; also, the letters x, w and q aren't present in the Turkish alphabet. – uoakinci Nov 27 '11 at 16:57

2 Answers


The correct codepage for UTF-8 is 65001, not 65000.

Also, have you checked if setting the codepage succeeds? The SetConsoleOutputCP function indicates success or failure by its return value.
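A minimal sketch of that check, assuming a Windows console program; `use_utf8_console` is a hypothetical helper name. `SetConsoleOutputCP` returns nonzero on success, and `CP_UTF8` is the Windows constant for code page 65001. The non-Windows branch exists only so the sketch compiles elsewhere.

```cpp
#include <iostream>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switches console output to UTF-8 and reports failure.
bool use_utf8_console()
{
#ifdef _WIN32
    if (!SetConsoleOutputCP(CP_UTF8)) // CP_UTF8 == 65001
    {
        std::cerr << "SetConsoleOutputCP failed, error " << GetLastError() << std::endl;
        return false;
    }
    // Read the value back to confirm the change took effect.
    return GetConsoleOutputCP() == CP_UTF8;
#else
    return true; // assumption: nothing to configure outside Windows in this sketch
#endif
}
```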

sth
  • Sorry for my error. GetConsoleOutputCP() returns the same value that I set. When I set 65001, writing the string to the output stops at an arbitrary point around the meta tags. The app uses 0 CPU and has not hung: the cursor is blinking and I can input chars, but the statements after this point aren't executed. The extended chars printed before this point are displayed correctly, though. So we have a new problem: why does the app stop in the middle of nowhere while printing out the string? – uoakinci Nov 27 '11 at 16:54
  • Are there any strange characters in the html at the point where it stops printing? But I don't know much about windows console output. You'd best post a new question for this new problem. – sth Nov 27 '11 at 17:03
  • I have checked the html it stops at... – uoakinci Nov 27 '11 at 17:15

The returned string is utf-8, so you should set the console code page to 65001 (as recommended by sth). Or convert the string to 1254 and use the 1254 code page for console output, as you did before.
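The second option can be sketched in portable C++ as below. This is a deliberately limited illustration, not a full converter: `utf8_to_cp1254` is a hypothetical helper that handles ASCII, the characters U+00A0..U+00FF that code page 1254 shares with Latin-1, and the six Turkish letters Ğ, ğ, İ, ı, Ş, ş. Everything else becomes '?'. On Windows, the usual route would be the Win32 functions MultiByteToWideChar (with CP_UTF8) followed by WideCharToMultiByte (with 1254).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-8 text to Windows-1254 bytes.
// Unsupported or malformed input is replaced with '?'.
std::string utf8_to_cp1254(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        if (b0 < 0x80) // ASCII is identical in both encodings
        {
            out += static_cast<char>(b0);
            ++i;
            continue;
        }
        // Two-byte sequence: 110xxxxx 10xxxxxx
        if ((b0 & 0xE0) == 0xC0 && i + 1 < in.size() &&
            (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80)
        {
            unsigned cp = ((b0 & 0x1Fu) << 6) |
                          (static_cast<unsigned char>(in[i + 1]) & 0x3Fu);
            i += 2;
            switch (cp)
            {
                case 0x011E: out += '\xD0'; break; // Ğ
                case 0x011F: out += '\xF0'; break; // ğ
                case 0x0130: out += '\xDD'; break; // İ
                case 0x0131: out += '\xFD'; break; // ı
                case 0x015E: out += '\xDE'; break; // Ş
                case 0x015F: out += '\xFE'; break; // ş
                // These Latin-1 letters have no slot in 1254 (replaced by the Turkish ones).
                case 0xD0: case 0xDD: case 0xDE:
                case 0xF0: case 0xFD: case 0xFE:
                    out += '?';
                    break;
                default:
                    // U+00A0..U+00FF otherwise map to the same byte value.
                    if (cp >= 0xA0 && cp <= 0xFF)
                        out += static_cast<char>(static_cast<unsigned char>(cp));
                    else
                        out += '?';
            }
            continue;
        }
        // Longer or malformed sequence: skip the lead byte and its continuation bytes.
        ++i;
        while (i < in.size() &&
               (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80)
            ++i;
        out += '?';
    }
    return out;
}
```

With code page 1254 active, printing `utf8_to_cp1254(html)` instead of `html` should then show ğ, ö and the rest as single characters, since each is now a single byte.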

Mihai Nita