
I am implementing a C-based programming language and I would like to implement a compilation mode that is agnostic to whether it runs in 32-bit or 64-bit mode. All my data types have explicit width, so binary compatibility there is not a problem, the only problematic aspect is pointers.

So what if I go for an explicit 64-bit implementation of pointers even under 32-bit mode? IIRC pretty much all memory controllers are at least 64-bit, so reads and writes will still be a single cycle, but what about integer arithmetic?

Besides the increase of memory footprint, are there any potential drawbacks to such an approach? Any other potential caveats?

Edit:

Let me clarify the scenario context - the original question was a little off. I need that "binary agnostic mode" for an interpreter bytecode to be able to dynamically bridge different native binaries. Naturally, there is little to no point in using a pointer from a 64-bit binary in a 32-bit binary, but the width of the pointers affects the offsets for the locations of the other data, which is what will primarily be interchanged. So in short, the idea is the following - waste a bit of space for the sake of making a data structure binary compatible with both 32- and 64-bit binaries.

phuclv
  • Your goal is not clear. Do you want source-level or binary-level compatibility? You cannot have the latter, at least on Intel architectures. The former is possible but I don't see what it would buy you. – n. m. could be an AI Nov 21 '13 at 10:42
  • Why would binary compatibility be impossible? –  Nov 21 '13 at 10:44
  • You cannot run a 64-bit executable, in any language, on a 32-bit OS. The CPU simply won't go to a 64-bit mode. It's a privileged transition and the OS will not let it happen. You would have to write a kernel-mode component to switch back and forth between 32 and 64 bit modes upon each system call or context switch. – n. m. could be an AI Nov 21 '13 at 11:01
  • Yes, this goes without saying, I target a more specific use scenario - besides compiling to native my language also supports compilation to bytecode for interpretation, which would be the fabric to bridge between 32 and 64 bit binaries as long as I retain full binary compatibility in the program layout. –  Nov 21 '13 at 11:03
  • The notions of "32-bit mode" and "64-bit mode" are not necessarily applicable to interpreted code. These modes are properties of the architecture *specifications*. You can (very inefficiently) implement AMD64 architecture specs on a 2-bit FPGA and it will run 64-bit Windows (slowly). For interpreted code, your architecture specifications are your interpreter specifications. It doesn't matter how your interpreter is implemented. It could be a 32-bit executable, a 64-bit executable, or a Turing machine made of rusty railway cars. All that matters is its interface towards the outside world. – n. m. could be an AI Nov 21 '13 at 11:13
  • Java would be an example of an interpreted language that is "bitness-agnostic". But it achieves that not by specifying that pointers are 64 or 128 or whatever bits wide. It could have done that, but instead it hides the notion of "pointer size" (and "object size") from the programmer altogether. I recommend you consider this route. – n. m. could be an AI Nov 21 '13 at 11:20
  • Yes, but I want to provide pointers as a language feature, not to mention my native code is actually produced by compiling generated C code; alas, I have neither the resources nor the knowledge to create decent compilers directly to assembly for each architecture I want to target. I'd rather reuse existing compilers. –  Nov 21 '13 at 11:23
  • Java the language might be partially hiding pointers from the programmer, but JVM the architecture does not (it calls them "references" but they are the same thing). What they both do hide is pointer *size*. There's no `sizeof` in Java. Your language might hide sizes and withhold `sizeof` too, it's not hard to manage without. – n. m. could be an AI Nov 21 '13 at 11:28
  • I don't plan on practically hiding sizes, just to discourage using them by burying them a little deeper so that the language can still be extendable. I think `sizeof` was mandated mainly by `malloc`, which is ugly, slow to type and inconvenient compared to C++'s `new`, which does exactly the same - sweeps `sizeof` under the rug. –  Nov 21 '13 at 11:32
  • If you design your language such that sizeof is available but not actually needed for anything, *and* design your byte code such that it can be loaded in both 32 and 64 bit modes, and discourage exposing raw memory images, then your programs would be *mostly* bitness-agnostic. – n. m. could be an AI Nov 21 '13 at 11:50
  • Only the "dynamic" aspects will be shared as raw memory images - e.g. the bytecode and associated user data; the native binaries are not intended to be shared directly, only through the interpreted layer. That is why I need it to output only datatypes with guaranteed width, so there is no mismatch in offsets if a pointer for some reason exists in such a data structure. It may not make sense for a pointer to be put there in the first place, but I also want to make the native and dynamic layers conceptually compatible, so the same data can actually exist in both forms. –  Nov 21 '13 at 11:57
  • If there are no pointers the layouts should be compatible. Sharing pointers is a bad idea anyway so you may just disallow it. – n. m. could be an AI Nov 21 '13 at 13:17

2 Answers


You can use the uintptr_t type.

It is an unsigned integer type guaranteed to be able to hold any object pointer converted to it, and to convert back to an equal pointer (strictly speaking, that is a round-trip guarantee, not a same-size guarantee).

Its definition is standard in C99 (in the `<stdint.h>` header) and in C++11 (in `<cstdint>`), though the type is formally optional for implementations.

If you want the pointer slot to always be 64 bits wide, you can use `uint64_t`. However, this would be unsafe on a system with pointers wider than 64 bits.

Sergey K.
  • Since the question is about C, it's also part of C99, defined in `<stdint.h>`. – Oswald Nov 21 '13 at 10:35
  • Any example of 128 bit machines? –  Nov 21 '13 at 10:43
  • @user2341104: http://en.wikipedia.org/wiki/128-bit C/C++ standards do not forbid it. And yes, you will have to patch this kind of code once you want to build for it. – Sergey K. Nov 21 '13 at 10:49
  • @SergeyK. - surely, modern processors support 128, 256 even 512 bit operations through SIMD, but if I am not mistaken this does not concern pointers. Also, I don't intend for my language to target more than current intel, amd and arm processors, which are all either 32 or 64 bit. –  Nov 21 '13 at 10:52
  • Machines with 128-bit pointers probably do not exist. There's no need. – n. m. could be an AI Nov 21 '13 at 10:54
  • @user2341104: then ``uint64_t`` is your choice. – Sergey K. Nov 21 '13 at 10:55

It is not the case that 64 bit access is atomic on all 32 bit machines. As for arithmetic, typical 32 bit machines do not have 64 bit arithmetic units and so the compilers implement 64 bit arithmetic using runtime support functions built on top of 32 bit arithmetic units.

I suppose that if you held a 32-bit pointer in a 64-bit data type, you would just ignore half of the bits. When running in 32-bit mode you would perform 32-bit operations on the 64-bit pointer. There's clearly no point in performing 64-bit operations when only 32 of the bits have meaning. In which case you'd just have a 32-bit type, stored in a 64-bit slot, with half of the bits wasted.

In light of this, what you are describing seems pointless to me. It seems to me that you have decided that it is desirable for all data types to have the same size irrespective of whether you are on 32 or 64 bit. That may be a desirable goal for some data types, but it's not so for pointers.

David Heffernan
  • So I should be practically safe in single-threaded scenarios? And I think I can still lock 64-bit data types in 32-bit mode using a 32-bit atomic? Correct me if I am wrong, I'm just making rather uneducated assumptions. –  Nov 21 '13 at 11:04
  • I don't really understand that comment. It seems to me that what you are attempting is pointless. Can you give a good reason why you want to store 32 bit pointers in a 64 bit type? Is your goal to make the resulting code run really slowly? – David Heffernan Nov 21 '13 at 11:05
  • The goal is to create data structures that let an interpreter serve as a fabric between 32- and 64-bit native binaries, so both can operate on the same data without any offset mismatch. –  Nov 21 '13 at 11:07
  • I see absolutely no reason for you to cripple your program in order to meet that goal. It can easily enough be met without such a wasteful storage of pointers. – David Heffernan Nov 21 '13 at 11:09
  • It is not intended for big programs, just for tiny bridges in shared memory between native code. My hope is that the ease of interfacing between different-architecture binaries will outweigh the potential overheads. –  Nov 21 '13 at 11:16
  • So use 64 bits if you wish. But obviously you would not perform 64 bit arithmetic, or 64 bit reads/writes when executing in 32 bit mode. Right? – David Heffernan Nov 21 '13 at 11:22
  • Well, not on 64 bit pointers anyway, but in the case of user data - yes. This part of my project is still at the "design" stage, thus the associated research to avoid as many issues down the line as possible. –  Nov 21 '13 at 11:29
  • The question is all about pointers isn't it? Anyway, you can certainly store a 32 bit pointer in a 64 bit integer, perform 32 bit arithmetic on the 32 bit part of that 64 bit integer. It will work. It just seems wrong to me. – David Heffernan Nov 21 '13 at 11:30
  • Well, it concerns pointers, but not their usage, just their binary footprint and its effect on the overall binary footprint. It doesn't even make sense for a 32bit binary to do any work on a 64bit pointer, but it must be able to do work on other data, whose location will depend on how wide the pointer is. Maybe I did not formulate the OP well enough. –  Nov 21 '13 at 11:35
  • All you need is an abstraction layer, and you can avoid all of this. – David Heffernan Nov 21 '13 at 11:40
  • I considered this "compilation mode" to be that abstraction layer. Maybe you can direct me to a better approach - it will be much appreciated. –  Nov 21 '13 at 11:43
  • I cannot supply details. Only you know your problem. I've just answered what I think to be the question that you asked. – David Heffernan Nov 21 '13 at 11:44