
In Linux I created a file containing Turkish characters and changed its character set to ISO-8859-9. With the C++ code below I am trying to convert it to UTF-8, but the output buffer is empty after calling `iconv`. Yet `iconv` leaves `inbytesleft` at 0, which means the whole input was consumed. What could be the mistake here?

My Linux file format:

[root@osst212 cod]# file test.txt
test.txt: ISO-8859 text

With PuTTY's character set configured as ISO-8859-9, `cat` shows:

[root@osst212 cod]# cat test.txt
fıstıkçı şahap

#include <string>
#include <iostream>
#include <locale>
#include <clocale>
#include <cstdlib>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <iconv.h>
#include <cstring>
#include <cerrno>
#include <csignal>

using namespace std;

int main()
{

const char* lna = getenv("LANG");
cout << "LANG is " << lna << endl;
setlocale(LC_ALL, "tr_TR.ISO8859-9");

ifstream fsl("test.txt",ios::in);
string myString;
if ( fsl.is_open() ) {
        getline(fsl,myString); }

size_t ret;
size_t inby = sizeof(myString);                   /*inbytesleft for iconv */
size_t outby = 2 * inby;                          /*outbytesleft for iconv*/

char* input = new char [myString.length()+1];     /* input buffer to be translated to UTF-8 */
strcpy(input,myString.c_str());
char* output = (char*) calloc(outby,sizeof(char)); /* output buffer */

iconv_t iconvcr = iconv_open("UTF-8", "ISO-8859-9");
if ((ret = iconv(iconvcr,&input,&inby,&output,&outby)) == (size_t) -1) {
        fprintf(stderr,"Could not convert to UTF-8 and error detail is %s\n",strerror(errno)); }

cout << output << endl;
raise(SIGINT);
iconv_close(iconvcr);

}

These are the local variables after `iconv` is called, when I run the program under gdb. As you can see, `output` is empty.

(gdb) bt
#0  0x00007ffff7224387 in raise () from /lib64/libc.so.6
#1  0x0000000000401155 in main () at stack.cpp:41
(gdb) frame 1
#1  0x0000000000401155 in main () at stack.cpp:41
41      raise(SIGINT);
(gdb) info locals
lna = 0x7fffffffef72 "en_US.UTF-8"
fsl = <incomplete type>
ret = 0
inby = 0
outby = 4
myString = "f\375st\375k\347\375 \376ahap"
input = 0x606268 " \376ahap"
output = 0x60628c ""
iconvcr = 0x606a00
Now, look at what all the values are *before* calling `iconv`, figure out if all those values are what you think they should be, and the problem should be fairly obvious. You should also figure out which exact line in your code makes sure that whatever ends up in `output` is properly `'\0'` terminated, so that the `<<` operator works correctly for a plain character pointer (but that's just an additional problem). – Sam Varshavchik Dec 12 '21 at 18:08

1 Answer


From `man 3 iconv`:

The iconv() function converts one multibyte character at a time, and for each character conversion it increments *inbuf and decrements *inbytesleft by the number of converted input bytes, it increments *outbuf and decrements *outbytesleft by the number of converted output bytes.

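So `iconv` advances `output` to point at the next unused byte of the originally allocated buffer. After the call, `output` no longer points at the start of the converted text; it points just past it, at bytes that `calloc` zeroed, which is why it prints as an empty string.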

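The proper usage is to let `iconv` advance a separate pointer and keep `output` pointing at the start of the buffer: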

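char* nextoutput = output;   // iconv advances nextoutput; output keeps the buffer start
if ((ret = iconv(iconvcr, &input, &inby, &nextoutput, &outby)) == (size_t) -1) {
    fprintf(stderr, "Could not convert to UTF-8 and error detail is %s\n", strerror(errno)); }
cout << output << endl;      // prints the converted text from the start of the buffer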