My understanding is that an Int value is a pointer to a thunk (double indirection) and an unboxed Int# is just a pointer to a 32/64 bit int. Is that correct? How does the pointer encode the fact that it's referring to an unboxed value?

The Haskell standard states that an Int is "A fixed-precision integer type with at least the range [-2^29 .. 2^29-1]". Is there some optimization in GHC where those extra bits are used to eliminate the indirections?

Daniel Fischer
Daniel
  • http://stackoverflow.com/a/3256825/83805 and http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects – Don Stewart Jun 28 '13 at 07:33
  • @DonStewart That answer just says that Int# takes a word of storage, and is not _explicit_ about the fact that word is just the integer itself, nothing else, which is what the OP needed to know. – AndrewC Jun 29 '13 at 13:27

1 Answer

The GHC documentation has some good information. You're roughly correct about Int: a value of type Int is a pointer to a heap object, which is either an unevaluated thunk or, once evaluated, a box (the `I#` constructor) holding the raw machine integer. An unboxed value, however, is not a pointer to the unboxed value; it is the unboxed value itself. Also, the Haskell report only gives a lower bound on the range of Int. In GHC, Int is a full machine word (32 or 64 bits), so it has more than 30 bits.

I don't think GHC uses the extra bits of unboxed types to store any metadata, but it does use the bits of pointers to do so. See this page for more details.
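To make the boxed/unboxed distinction concrete, here is a minimal sketch using GHC's `MagicHash` extension. The names `addUnboxed` and `addBoxed` are made up for illustration; `I#`, `Int#` and `+#` are the real definitions exported from `GHC.Exts`.

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import Data.Bits (finiteBitSize)
import GHC.Exts (Int (I#), Int#, (+#))

-- Hypothetical helper: operates directly on raw machine words.
-- An Int# is the integer itself, not a pointer to it.
addUnboxed :: Int# -> Int# -> Int#
addUnboxed x y = x +# y

-- Hypothetical helper: pattern matching on I# forces each argument
-- and exposes the raw word inside the box; the result is re-boxed.
addBoxed :: Int -> Int -> Int
addBoxed (I# x) (I# y) = I# (addUnboxed x y)

main :: IO ()
main = do
  print (addBoxed 40 2)
  -- Width of Int on this platform; a full machine word in GHC.
  print (finiteBitSize (0 :: Int))
```

With optimisation on, GHC's strictness analysis and worker/wrapper transformation often perform this kind of unboxing for you, so hand-written `Int#` code is rarely needed.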

  • Correct. In GHC `Int` tends to be 32 bits (ex: on x86 systems) or 64 bits (ex: on x86_64). If the difference between 29, 32, and 64 bit Ints is important then users should leverage `Int32`, `Int64`, or perhaps Integer. – Thomas M. DuBuisson Jun 28 '13 at 01:25
  • So how does the runtime distinguish between a boxed and unboxed Int? Where is the information that the value is a raw int stored? – Daniel Jun 28 '13 at 01:37
  • The information is in the type (and kind) -- an unboxed Int has type `Int#` in GHC, and is a second-class citizen (e.g. you can't pass it to a polymorphic function directly, because a polymorphic function expects a boxed value). – shachaf Jun 28 '13 at 01:39
  • When you have an unpacked strict Int -- as in `data Foo = Foo {-# UNPACK #-} !Int` -- that'll also be stored unboxed, as part of a `Foo` value directly. But GHC will transparently box and unbox it when necessary so you don't generally have to worry about that. (And the information is again known at compile-time.) – shachaf Jun 28 '13 at 01:43
  • OK then where is the type information stored? – Daniel Jun 28 '13 at 16:59
  • It doesn't need to be stored at all, per se; the compiled code already knows what's boxed and what's unboxed. – Louis Wasserman Jun 28 '13 at 17:07
  • Louis is dead right: the joy of static typing is that it's all handled at compile time. There's no need to store type information at runtime, because you can't cast or mix types anyway. If it's in the right place, it's the right type. – AndrewC Jun 29 '13 at 13:24
  • OK, then I'm confused: why does OCaml need the 31-bit Int hack? – Daniel Jun 29 '13 at 19:51
  • @DanielVelkov in GHC, unboxed integers are second-class citizens, you cannot pass them to a polymorphic function. (You can pass a pointer to them, and then the information that they are integers not to be traversed is stored in the header of the pointing block). In particular, they can only flow in places where the compiler knows statically that they are there, and generates the right GC-handling code for them. OCaml integers can be passed to polymorphic functions, so a polymorphic function can't know statically whether its argument values are integers or pointers, so it needs a tag. – gasche Oct 23 '17 at 14:01