
The docs on this are rather lacking, so I'm hoping the community can run a simple test and post the results here, so that I (and anybody else) have a reference.

#include <cwchar>
#include <cstdio>
int main() { std::printf("%u\n", static_cast<unsigned>(sizeof(std::mbstate_t))); }

If you could post the results here and also mention which compiler you are using, I would be very grateful.

On VS2010 it's declared as `typedef int mbstate_t;` and its size is 4 bytes for both 32- and 64-bit builds.

I'm asking this because `streampos` contains an `mbstate_t` member. I need to use this member to store the conversion state of an encoding. The minimum space I can get away with is 3 bytes, so I need to know whether any implementation is going to break my code.

Thanks in advance.

Twifty
  • @Jeyaram `mbstate_t` is also C; it's used with the `mbsxxx()` set of functions. – Twifty Jul 24 '13 at 06:40
  • The size and structure of `mbstate_t` is irrelevant (and unspecified). This means that you should not need to worry about the size or how it is defined, and most definitely not about any of its internal structure. It should *always* be "big enough" to store whatever internal state is needed by the runtime library. – Some programmer dude Jul 24 '13 at 06:41
  • @JoachimPileborg I need to store data inside it, so its size is very relevant. – Twifty Jul 24 '13 at 06:42
  • Then you are doing it wrong! Unless you are making your own runtime library, of course... But for normal programs you should never store your own data in it; only use it with the functions/classes taking it as arguments/template parameters. – Some programmer dude Jul 24 '13 at 06:44
  • @JoachimPileborg I have made an `fstream` drop-in which handles the encoding of files automatically. `seekpos` and `seekoff` both use the `streampos` structure, which I need to modify. The structure holds not only a position but also the current conversion state. So I am using it correctly. – Twifty Jul 24 '13 at 06:47
  • You are still using it wrong even in that case. You don't even know if it *is* an integer type; it may be an array of characters, a structure, or something completely different. If you use it in a way that works on one compiler, it's not portable and will most likely break if you try it on another platform or even a different version of the same compiler. – Some programmer dude Jul 24 '13 at 06:50
  • @Waldermort Okiee... I will learn :) – Jeyaram Jul 24 '13 at 06:53
  • @JoachimPileborg The docs state it to be a POD type, so it can be cast to something more useful, in my case `unsigned long`. I'm asking this question to avoid casting it to something of too large a size. – Twifty Jul 24 '13 at 06:53



gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 on x86_64

size = 8

gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 on armv7l

size = 8
neuro
  • Thanks. I wasn't expecting an 8 bytes size, but there you go. – Twifty Jul 24 '13 at 06:43
  • If I remember correctly, the size of int depends on the architecture. On x86_64, 8 is to be expected. If std::mbstate_t is typedefed to int, that seems sensible. – neuro Jul 24 '13 at 06:46
  • @neuro Actually the `int` type is still 32 bits even on 64-bit systems. The `long` type changes on *some* platforms though. – Some programmer dude Jul 24 '13 at 06:47
  • @joachim, hmm, now I know why I never use int when I do low-level things. That was my K&R memory, from long before 64-bit systems :) – neuro Jul 24 '13 at 06:54

You just want to know the result of the sizeof?

Qt 5.1 with GCC x86 32bit under Debian:

size = 8


From the C11 specification (7.29.1/2):

   mbstate_t

which is a complete object type other than an array type that can hold the conversion state information necessary to convert between sequences of multibyte characters and wide characters;

So while I was wrong in that it can be an array, it could be anything else (including a structure containing an array). The specification says nothing about how it should be implemented, only that it's "a complete object type other than an array type".


From the C++11 specification (multiple places, for example 21.2.3.1/4):

The type mbstate_t is defined in <cwchar> and can represent any of the conversion states that can occur in an implementation-defined set of supported multibyte character encoding rules.


In conclusion, you can not rely on mbstate_t being an integer type, or of a specific size, if you want to be portable. If you want to be portable, you have to let the standard library manage the state for you.

Some programmer dude
  • So, what you are saying is, one shouldn't use `mbstate_t` directly? So how would one go about creating a `codecvt` which relies on `mbstate_t` to hold the conversion state? – Twifty Jul 24 '13 at 07:15
  • Also, `mbstate_t` is ported from C. So even though it may be a struct, C doesn't have constructors or virtual methods. This in itself implies that it is a POD type. – Twifty Jul 24 '13 at 07:18
  • @Waldermort Ah yes, you're right about it being a POD type. As for `codecvt` creation, I did a small test program a couple of years ago to convert between UTF-8 and `wchar_t`; you can find it [in this old blog post of mine](http://pileborg.org/b2e/blog5.php/2010/06/13/unicode-utf-8-and-wchar_t). I don't know if it will help. Handling code conversions in C++ is not easy; you might want to consider using libraries such as [`libiconv`](http://www.gnu.org/software/libiconv/) (Windows version [available here](http://gnuwin32.sourceforge.net/packages/libiconv.htm)). – Some programmer dude Jul 24 '13 at 07:35
  • No worries. I have created something similar, but it's an `fstream` drop-in. Internally it reads the BOM, loads the correct `basic_filebuf` with the required `locale`, then converts the file to whatever data type the client wants to work with, i.e. it can load a UTF-8/UTF-16BE file and output it as `wchar_t`. The problem I'm trying to overcome is seeking within the file. I'm hoping the `streampos` structure will help solve this by storing the number of code sequences and the remaining code units in its `mbstate_t` member. – Twifty Jul 24 '13 at 07:44