
I have a `Dictionary<Int64, byte>` that gets used a lot, I mean in a loop that runs for days during a big data load. The Int64 comes from two Int32 values. The byte is the distance (count) between those two Int32 values across many very long lists.

What I need to do in this loop is:

  • Generate the key
  • If the key does not exist in the Dictionary, insert the key and value
  • If the key does exist and the new value (byte) is less than the existing value, replace the existing value with the new value
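
The three steps above map onto a single `TryGetValue` lookup per item. This is a minimal sketch, assuming the key is built by shifting IntA into the high 32 bits and IntB into the low 32 bits; `DistanceMap`, `MakeKey`, and `AddOrKeepMin` are hypothetical names, not code from the question:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keeps the smallest distance seen for each packed key.
public static class DistanceMap
{
    // Packs two Int32 values into one Int64: a in the high 32 bits, b in the low 32 bits.
    public static long MakeKey(int a, int b) => ((long)a << 32) | (uint)b;

    // Inserts the key if it is absent; otherwise keeps the smaller of the old and new distance.
    public static void AddOrKeepMin(Dictionary<long, byte> map, int a, int b, byte distance)
    {
        long key = MakeKey(a, b);
        if (map.TryGetValue(key, out byte existing))
        {
            if (distance < existing)
                map[key] = distance; // replace only when the new value is smaller
        }
        else
        {
            map.Add(key, distance);
        }
    }
}
```

On the update path this costs at most two hash lookups (`TryGetValue`, then the indexer); whether that matters for a multi-day load is something to measure rather than assume.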

Right now I am using straight math to generate the key. I know there is a faster way, but I cannot figure it out. I tagged the question with shift because I think that is how to optimize it.

Then, when the loop is complete, I need to extract the two Int32 values from the Int64 to insert the data into a database.

Thanks

Per the comments, here is the math I use to combine two Int32 values into one Int64:

        Int64 BigInt;
        Debug.WriteLine(Int32.MaxValue);
        Int32 IntA = 0;
        Int32 IntB = 1;
        BigInt = ((Int64)IntA * Int32.MaxValue) + IntB;
        Debug.WriteLine(BigInt.ToString());
        IntA = 1;
        IntB = 0;
        BigInt = ((Int64)IntA * Int32.MaxValue) + IntB;
        Debug.WriteLine(BigInt.ToString());
        IntA = 1;
        IntB = 1;
        BigInt = ((Int64)IntA * Int32.MaxValue) + IntB;
        Debug.WriteLine(BigInt.ToString());
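
A quick check of the multiply-based key, as a sketch: `Int32.MaxValue` is 2^31 - 1 = 2,147,483,647, so `IntA * Int32.MaxValue + IntB` is only unique while 0 <= IntB < Int32.MaxValue. For example, (0, Int32.MaxValue) and (1, 0) both give 2147483647. Shifting by 32 bits instead gives each Int32 its own half of the Int64. `KeySchemes`, `MultiplyKey`, and `ShiftKey` are hypothetical names:

```csharp
using System;

// Compares the multiply-based key from the question with a shift-based key.
public static class KeySchemes
{
    // The question's scheme: IntA * Int32.MaxValue + IntB.
    public static long MultiplyKey(int a, int b) => ((long)a * int.MaxValue) + b;

    // Shift scheme: a in the high 32 bits, b (as its raw bits) in the low 32 bits.
    public static long ShiftKey(int a, int b) => ((long)a << 32) | (uint)b;
}
```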

And the best key may not be an Int64. What I have is two Int32 values that together form a key, plus a byte value. I need fast lookup on that composite key. Dictionary is fast, but as far as I know it does not support a composite key, so I create a single key that is actually a composite key. In SQL, Int32A and Int32B form the PK.
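
For comparison, `Dictionary<TKey, TValue>` accepts any key type that provides value equality and a matching hash code, so a small struct can act as the composite key directly. A minimal sketch; `PairKey` is a hypothetical name, and the hash mix is one arbitrary choice rather than a tuned one:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical composite key: two Int32 values compared as an ordered pair.
public readonly struct PairKey : IEquatable<PairKey>
{
    public readonly int A;
    public readonly int B;

    public PairKey(int a, int b) { A = a; B = b; }

    // Value equality on both fields; Dictionary's default comparer calls this without boxing.
    public bool Equals(PairKey other) => A == other.A && B == other.B;

    public override bool Equals(object obj) => obj is PairKey other && Equals(other);

    // Simple mix of the two fields; unchecked so overflow wraps instead of throwing.
    public override int GetHashCode() => unchecked((A * 397) ^ B);
}
```

On newer frameworks a `ValueTuple` key such as `Dictionary<(int, int), byte>` works the same way. Whether either beats a packed Int64 key for this workload is something to measure.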

The reason I don't use a composite key is that I want the lookup speed of Dictionary, and to my knowledge Dictionary does not support a composite key. This is production code. In the SQL table there is actually a third key column (Int32 sID, Int32 IntA, Int32 IntB). In this parser I am only dealing with one sID at a time (and sIDs are processed in order). I started with composite-key lookups against SQL (billions in a run). When I pulled IntA and IntB out into a Dictionary to process a single sID, and then loaded to SQL at the completion of each sID, I got a 100:1 performance improvement. Part of that improvement is on the insert side: when I insert from the Dictionary I can insert in PK order. The parse does not produce the new IntA and IntB values in sorted order, so inserting directly into SQL would severely fragment the index, and I would need to rebuild the index at the end of a run.
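
Extracting rows in PK order, as a sketch: this assumes the key was packed as `((long)IntA << 32) | (uint)IntB` and that IntB is never negative, in which case sorting the Int64 keys orders the rows by (IntA, IntB). `PkOrder`, `Unpack`, and `InPkOrder` are hypothetical names, and the actual database insert is left out:

```csharp
using System;
using System.Collections.Generic;

public static class PkOrder
{
    // Splits a packed key back into (IntA, IntB): the high and low 32 bits.
    public static (int IntA, int IntB) Unpack(long key) =>
        ((int)(key >> 32), unchecked((int)key));

    // Returns the dictionary contents ordered by (IntA, IntB).
    // Assumption: IntB >= 0, so sorting the signed Int64 keys matches
    // the SQL ordering on (IntA, IntB).
    public static List<(int IntA, int IntB, byte Distance)> InPkOrder(Dictionary<long, byte> map)
    {
        var keys = new long[map.Count];
        map.Keys.CopyTo(keys, 0);
        Array.Sort(keys);

        var rows = new List<(int IntA, int IntB, byte Distance)>(keys.Length);
        foreach (long key in keys)
        {
            var (a, b) = Unpack(key);
            rows.Add((a, b, map[key]));
        }
        return rows;
    }
}
```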

paparazzo
  • What do you mean by "straight math"? Please show some code to illustrate the relationship between the two int32s and the int64. – Oliver Charlesworth Apr 01 '12 at 17:43
  • @OliCharlesworth I added a simple sample of the straight math I use. – paparazzo Apr 01 '12 at 17:59
  • `Int32.MaxValue` is 2^31-1. Are you sure that's what you want? – Oliver Charlesworth Apr 01 '12 at 18:00
  • @OliCharlesworth Please propose a better way to generate a key for Dictionary where that key is actually a composite of two Int32. And then extract the two Int32 from that key. – paparazzo Apr 01 '12 at 18:07
  • @Blam: Both my answer and Bas's combine 2 Int32 values in a somewhat simpler-to-understand manner than what you've got. I suspect you were *aiming* for something like what we've got, but didn't quite get there. Do you have anything *against* just using the two sets of 32 bits entirely orthogonally? – Jon Skeet Apr 01 '12 at 18:13
  • @JonSkeet See the update at the end of my question. I am looking for the speed of a Dictionary lookup, and from what I can tell a Dictionary does not support a composite key. – paparazzo Apr 01 '12 at 18:38
  • @Blam: Right, so something like the solutions given would be fine, as far as I can tell... – Jon Skeet Apr 01 '12 at 18:38
  • @JonSkeet What do you mean by orthogonally? Like `Dictionary<Int32, Dictionary<Int32, byte>>`? – paparazzo Apr 01 '12 at 19:25
  • @Blam: I mean that each `Int32` *just* affects 32 bits of outcome - and those are distinct sets of bits. – Jon Skeet Apr 01 '12 at 19:26
  • Oh, and that second set is sparse, around 200 values. A word only occurs next to a subset of all the words, and I am only looking for distances of 20 or less. – paparazzo Apr 01 '12 at 19:29

3 Answers


If you want to convert back and forth between Int32s and Int64s, you can use a struct with explicit layout:

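using System;
using System.Runtime.InteropServices;

// All three fields overlay the same 8 bytes of memory.
[StructLayout(LayoutKind.Explicit)]
struct Int64ToInt32
{
    [FieldOffset(0)]
    public Int64 Int64Value;
    // Offset 0: the low 32 bits of Int64Value on little-endian hardware (x86/x64).
    [FieldOffset(0)]
    public Int32 LeftInt32;
    // Offset 4: the high 32 bits of Int64Value on little-endian hardware.
    [FieldOffset(4)]
    public Int32 RightInt32;
}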

Just set and get the values through the fields.
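
A usage sketch for the struct (repeated here so the block stands alone). Which Int32 ends up in the high half depends on byte order: on little-endian hardware such as x86/x64, the field at offset 0 holds the low 32 bits. `OverlayDemo`, `Combine`, and `Split` are hypothetical names:

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct Int64ToInt32
{
    [FieldOffset(0)] public Int64 Int64Value;
    [FieldOffset(0)] public Int32 LeftInt32;   // low half on little-endian CPUs
    [FieldOffset(4)] public Int32 RightInt32;  // high half on little-endian CPUs
}

public static class OverlayDemo
{
    // Writes the two halves, then reads the same 8 bytes back as one Int64.
    public static long Combine(int left, int right)
    {
        var v = new Int64ToInt32();
        v.LeftInt32 = left;
        v.RightInt32 = right;
        return v.Int64Value;
    }

    // Writes the Int64, then reads the two halves back.
    public static (int Left, int Right) Split(long value)
    {
        var v = new Int64ToInt32();
        v.Int64Value = value;
        return (v.LeftInt32, v.RightInt32);
    }
}
```

The round trip `Split(Combine(x, y))` returns the original pair on any byte order; only the numeric value of the combined Int64 is endian-dependent.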

Bas
  • Note that for the sake of language interoperability, .NET naming conventions would recommend `LeftInt32`, `RightInt32`, `Int64Value`, and `Int64ToInt32` as names. – Jon Skeet Apr 01 '12 at 17:50
  • FYI, the OP's updated question includes code that implies that it's not just simple bit-munging... – Oliver Charlesworth Apr 01 '12 at 18:08
  • Oh! This is beautiful! It simply makes me happy. Thank you! – Carl R Jul 05 '13 at 23:04
  • That's really neat, thanks for this. A question: if, in the constructor, I assign `LeftInt` and `RightInt` (from constructor parameters), is there a way to avoid the "field (`LongValue` in this case) must be fully assigned before control is returned to the caller" compiler error (other than assigning zero to `LongValue`)? – tigrou Feb 20 '15 at 18:11
  • @tigrou I don't think there is any way; however, if you set LongValue to zero in a field initializer or in the constructor, there should be no performance impact, since this is the default behavior for a struct anyway. – Bas Feb 20 '15 at 19:07
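
Following Bas's reply, a sketch of such a constructor: assigning the overlapping `Int64Value` first satisfies the compiler's definite-assignment check, and the next two assignments overwrite all eight bytes. The parameter names `left` and `right` are hypothetical:

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct Int64ToInt32
{
    [FieldOffset(0)] public Int64 Int64Value;
    [FieldOffset(0)] public Int32 LeftInt32;
    [FieldOffset(4)] public Int32 RightInt32;

    public Int64ToInt32(Int32 left, Int32 right)
    {
        // Every field must be assigned before the constructor returns;
        // this zero is immediately overwritten by the two lines below.
        Int64Value = 0;
        LeftInt32 = left;
        RightInt32 = right;
    }
}
```

Writing `this = default(Int64ToInt32);` as the first statement of the constructor is another way to satisfy the same rule.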

Sounds like you just want a shift. Personally I find it simpler to think about bitshifting when using unsigned types instead of signed ones:

// Note: if you're in a checked context by default, you'll want to make this
// explicitly unchecked
uint u1 = (uint) int1;
uint u2 = (uint) int2;

ulong unsignedKey = (((ulong) u1) << 32) | u2;
long key = (long) unsignedKey;

And to reverse:

ulong unsignedKey = (ulong) key;
uint lowBits = (uint) (unsignedKey & 0xffffffffUL);
uint highBits = (uint) (unsignedKey >> 32);
int i1 = (int) highBits;
int i2 = (int) lowBits;
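
The two snippets above, wrapped as a pair of helper methods with an explicit `unchecked` block so the casts never throw in a checked build, and with the reverse cast written as `(ulong) key`. A sketch; `PairKeys`, `Pack`, and `Unpack` are hypothetical names:

```csharp
using System;

public static class PairKeys
{
    // Packs int1 into the high 32 bits and int2 into the low 32 bits.
    public static long Pack(int int1, int int2)
    {
        unchecked
        {
            uint u1 = (uint)int1;
            uint u2 = (uint)int2;
            ulong unsignedKey = (((ulong)u1) << 32) | u2;
            return (long)unsignedKey;
        }
    }

    // Reverses Pack: the high 32 bits become int1, the low 32 bits become int2.
    public static (int Int1, int Int2) Unpack(long key)
    {
        unchecked
        {
            ulong unsignedKey = (ulong)key;
            uint lowBits = (uint)(unsignedKey & 0xffffffffUL);
            uint highBits = (uint)(unsignedKey >> 32);
            return ((int)highBits, (int)lowBits);
        }
    }
}
```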

It's entirely possible that you don't need all these conversions to unsigned types. It's more for my sanity than anything else :)

Note that you need to cast u1 to a ulong so that the shifting works in the right space: shifting a uint by 32 bits would do nothing, because C# uses only the low 5 bits of the shift count for 32-bit operands.
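
A small sketch of that shift-count rule: for a 32-bit left operand C# masks the count to its low 5 bits, so `<< 32` behaves like `<< 0`, while widening to `ulong` first makes it a real 32-bit shift. `ShiftWidth` and its methods are hypothetical names:

```csharp
using System;

public static class ShiftWidth
{
    // 32-bit operand: the count is masked with 0x1F, so a count of 32 shifts by 0.
    public static uint ShiftUInt(uint value, int count) => value << count;

    // 64-bit operand: the count is masked with 0x3F, so a count of 32 really shifts.
    public static ulong ShiftWidened(uint value, int count) => ((ulong)value) << count;
}
```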

Note that this is one way of combining two 32-bit integers to get a 64-bit integer. It's not the only way by any means.

(Side-note: Bas's solution works perfectly well - I'm just always somewhat uncomfortable with that sort of approach, for no specific reason.)

paparazzo
Jon Skeet
  • FYI, the OP's updated question includes code that implies that it's not just simple bit-munging... – Oliver Charlesworth Apr 01 '12 at 18:08
  • @OliCharlesworth: I suspect that's more of an artifact of "this looked like it might work" than a deliberate decision. Have added a comment to check though. – Jon Skeet Apr 01 '12 at 18:14
  • @OliCharlesworth That math is what I am using today; I am just looking to make it faster. Moving the lookup from SQL to a Dictionary was a big performance improvement, and I am looking to optimize the Dictionary or to find a better approach. – paparazzo Apr 01 '12 at 18:42
  • Thanks, tested at the bottom, the top, and in between. See the cosmetic edit I proposed. – paparazzo Apr 01 '12 at 19:10
  • @Blam: Yup, that's fine - sorry for getting things the wrong way round :) – Jon Skeet Apr 01 '12 at 19:25

You can use bit shifting to store two 32-bit values in one 64-bit variable.

I shall give a small example:

int a = 10;
int b = 5;
long c;

// To pack the two values into one variable:
c = (long)a << 32;
c = c + (long)b;
// The 32 most significant bits now contain a, the 32 least significant bits contain b
// (this holds for b >= 0; a negative b borrows 1 from the upper half).

// To retrieve the two values:
int a2 = (int)(c >> 32);               // == a
int b2 = (int)(c - ((c >> 32) << 32)); // == b
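
A sketch of the caveat with the add-based packing: the retrieval above is exact when b is non-negative, but a negative b, once sign-extended and added, borrows 1 from the upper half. Masking b to its raw 32 bits with `(uint)b` and OR-ing avoids that. `AddVsMask` and its methods are hypothetical names:

```csharp
using System;

public static class AddVsMask
{
    // Packing as in the answer: shift a up, then add b (sign-extended when negative).
    public static long PackByAdd(int a, int b) => ((long)a << 32) + b;

    // Masking b to its 32 raw bits keeps a negative b from borrowing from a.
    public static long PackByMask(int a, int b) => ((long)a << 32) | (uint)b;

    // Reads back the high 32 bits and the low 32 bits.
    public static int High(long c) => (int)(c >> 32);
    public static int Low(long c) => unchecked((int)c);
}
```

With a = 10 and b = -5, `High(PackByAdd(10, -5))` gives 9, while `High(PackByMask(10, -5))` gives 10.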

Edit: I see I am a bit late to the party; I just wanted to check in VS that I hadn't made a mistake :)

Roy T.