There are tons of examples of reading a Unicode file in C, but I could not find anything specific to using the GLib library in a platform-independent way. I am a total newbie at this (coming from the .NET world). My requirement is to read a Unicode file using GLib. I am using gcc as my compiler.

halfer
Anindya Chatterjee

1 Answer

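Perhaps you're having trouble because Unicode has several different encodings, and reading each one is a bit different. The most popular these days is UTF-8, and for that you can use something like g_data_input_stream_read_line. For other encodings you could use g_data_input_stream_read_upto: pass the newline bytes as stop_chars and their count as stop_chars_len. Note that read_upto stops at the first occurrence of any one of the stop_chars bytes and does not consume it, so with a multi-byte newline (such as UTF-16's) it will also stop at bytes that are not part of a newline.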

nemequ
  • That's a nice idea, but I am inclined not to use GObject. – Anindya Chatterjee Jun 12 '13 at 04:31
  • Why? The only legitimate reason I can think of for avoiding GObject is performance, but for disk I/O I really doubt that is an issue, since the actual reading is likely to take several orders of magnitude longer than constructing a GObject. – nemequ Jun 20 '13 at 17:56
  • Performance is the main concern here. I am actually working on an interpreter-like program and trying to avoid overhead such as constructing GObjects. The program has to be fast. So is there any other straightforward alternative you can suggest? – Anindya Chatterjee Jun 20 '13 at 18:26
  • Performance of GObject isn't really a concern here. The time required to construct a GObject will be insignificant compared to the time required for I/O, even on an SSD. That said, you could probably use GIOChannel (g_io_channel_read_line, g_io_channel_read_line_string, etc.) after calling g_io_channel_set_encoding. Or you could just roll your own; it's a pretty trivial function to write as long as you know the encoding. – nemequ Jun 27 '13 at 01:31
  • I tried to read the lines using g_io_channel_read_line. Lines with plain ASCII characters read just fine, but a line with UTF-8 characters gives me GError = 0x8005b278 "Invalid byte sequence in conversion input". Before reading, I called g_io_channel_set_encoding on the channel with UTF-8. Any idea? – Anindya Chatterjee Jun 29 '13 at 13:44
  • Did you set it to utf-8 or UTF-8 (case matters)? Are you sure the input is actually valid UTF-8? – nemequ Jun 30 '13 at 07:48
  • I have posted another question about this: http://stackoverflow.com/questions/17383930/getting-error-while-reading-unicode-file-in-c/17387394?noredirect=1#17387394 – Anindya Chatterjee Jun 30 '13 at 10:23