How can I get the numeric representation of a string in C#? To be clear, I do not want the address of the pointer, and I do not want to parse an int from a string. I want the numeric representation of the value of the string.

The reason I want this is because I am trying to generate a hash code based on a file path (path) and a number (line). I essentially want to do this:

String path;
int line;

public override int GetHashCode() {
    return line ^ (int)path;
}

I'm open to suggestions for a better method, but because I'm overriding the Equals() method for the type I'm creating (to check that both objects' path and line are the same), I need to reflect that in the override of GetHashCode.

Edit: Obviously this method is bad; that has been pointed out to me and I get that. The answer below is perfect. However, it does not entirely answer my question. I am still curious whether there is a simple way to get an integer representation of the value of a string. I know that I could iterate through the string, append the binary representation of each char to a StringBuilder and convert that string to an int, but is there a cleaner way?

Edit 2: I'm aware that this is a strange and very limited question. Converting in this method limits the size of the string to 2 chars (2 16 bit char = 1 32 bit int), but it was the concept I was getting at, and not the practicality. Essentially, the method works, regardless of how obscure and useless it may be.

Nealon
  • You could use ToCharArray to get the number of each character. The hash of a string needs to be calculated by looking at each character. – the_lotus Jan 29 '14 at 14:36
  • "The numeric representation of the value of the string" does not really mean anything. – Jon Jan 29 '14 at 14:36
  • @Jon, every single value on a computer breaks down into binary, and every single binary number can be converted into an integer. It might be impractical for long strings, but it still works – Nealon Jan 29 '14 at 14:38
  • @Nealon not really; if a string is 200 characters (400 bytes), tell me: what integer is that? – Marc Gravell Jan 29 '14 at 14:38
  • @MarcGravell A ridiculously large number? – Nealon Jan 29 '14 at 14:39
  • @Nealon: So what numeric value are you *expecting* to get for some sample strings? If you can't give us sample input and output, it's hard to help you. Given the context, I think taking the hash code really is what you want, but I'm hoping that this question can be used to encourage you to ask a better question next time... – Jon Skeet Jan 29 '14 at 14:39
  • @Nealon that isn't an `int`, though – Marc Gravell Jan 29 '14 at 14:40
  • @Nealon: Then it would be more accurate to say "the decimal representation of the raw bytes making up the string, converted into a numeric type", but even if practical that would be a horrible choice for a hash because the range of this value would be insanely greater than that of `line` which you are XORing it with. – Jon Jan 29 '14 at 14:41
  • @JonSkeet, Hello = 448378203247 – Nealon Jan 29 '14 at 14:41
  • @Nealon: And *why* should it be that? How did you get that number? (It might be best to give some 1 and 2 character examples - and please put them in the *question*.) – Jon Skeet Jan 29 '14 at 14:42
  • @Nealon I make it 7494100832430877696 - don't forget that each character is 2 bytes natively, unless you are taking the utf-8 encoding – Marc Gravell Jan 29 '14 at 14:43
  • @JonSkeet convert each char to binary, convert the overall string of binary to decimal. I realize it's stupidly huge, which is why the answer below is infinitely better. – Nealon Jan 29 '14 at 14:44
  • You've still not explained *how* you're converting each character to binary, or how you're combining them (you realize each char is 16 bits wide, right?) - and you still haven't explained this *in the question*. – Jon Skeet Jan 29 '14 at 14:46
  • Re the edit: no, for all the reasons in the comments above – Marc Gravell Jan 29 '14 at 14:49
  • @JonSkeet using the same method sites like this do, http://www.binarytranslator.com/. obviously I'm missing something here, please tell me what. – Nealon Jan 29 '14 at 14:51
  • Never even seen that site before, but I suspect it's not doing what you would actually expect it to. Again, do you understand that a char in C# is 16 bits? – Jon Skeet Jan 29 '14 at 14:53
  • @JonSkeet I did not, but by saying that do you imply that the conversion would be limited to 2 characters before it would exceed the size of `int`? – Nealon Jan 29 '14 at 14:56
  • @Nealon: Yes, absolutely. – Jon Skeet Jan 29 '14 at 15:01
  • @JonSkeet Ok, so again, I maintain that while it is impractical, it still works. Thanks for walking through it with me though. I appreciate the clarification. – Nealon Jan 29 '14 at 15:02
  • Do you see now why I wanted concrete examples (in the question, not in comments) and how an example of one or two characters is better than an example of five characters? I've answered the question as asked now, but please bear this in mind for your next question - a question should be "ready to answer" without having to tease out a lot more information. – Jon Skeet Jan 29 '14 at 15:08
  • @JonSkeet yes, I didn't understand the amount of information needed due to a misconception on my part. I generally try to be thorough in asking questions. – Nealon Jan 29 '14 at 15:10

2 Answers

If all you want is a hash code, why not get the hash code of the string too? Every object in .NET has a GetHashCode() method:

public override int GetHashCode() {
    return line ^ path.GetHashCode();
}
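
As a rough sketch of how this fits together with the Equals() override mentioned in the question, the type might look something like the following. The class name `SourceLocation` is hypothetical, and the multiply-and-add combination is a common alternative to plain XOR that is swapped in here for illustration; it is not part of the original answer.

```csharp
using System;

// Hypothetical type holding the two fields from the question.
public sealed class SourceLocation : IEquatable<SourceLocation>
{
    private readonly string path;
    private readonly int line;

    public SourceLocation(string path, int line)
    {
        if (path == null) throw new ArgumentNullException("path");
        this.path = path;
        this.line = line;
    }

    // Two locations are equal when both path and line match.
    public bool Equals(SourceLocation other)
    {
        return other != null
            && line == other.line
            && string.Equals(path, other.path, StringComparison.Ordinal);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as SourceLocation);
    }

    // Built from the same fields that Equals compares, so equal objects
    // always produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked // let the arithmetic wrap instead of throwing on overflow
        {
            int hash = 17;
            hash = hash * 31 + line;
            hash = hash * 31 + StringComparer.Ordinal.GetHashCode(path);
            return hash;
        }
    }
}
```

The only hard requirement is that objects which compare equal return the same hash code; collisions between unequal objects are allowed and are resolved by Equals().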
w5l
  • that sir, is a fantastic idea. – Nealon Jan 29 '14 at 14:36
  • @Nealon just note that this a: has collisions, and b: can change between runs (no, seriously: in .NET 4.5 there is intentionally a hash-code scrambler built in, so you get different values in different app-domains) - make sure this is only needed inside a single run – Marc Gravell Jan 29 '14 at 14:39
  • Good point, but he states in the question already that he is overriding `Equals()` for the final comparison. – w5l Jan 29 '14 at 14:41
  • @MarcGravell, thanks, but yes, I'm aware of that, I'm only doing this because I need it to match my implementation of `Equals()` as @willemDuncan stated – Nealon Jan 29 '14 at 14:43
  • @Nealon fair enough; note that in a `GetHashCode()` method, it is pretty routine to find yourself calling other downstream `GetHashCode()` methods. – Marc Gravell Jan 29 '14 at 14:48

For the purposes of GetHashCode, you should absolutely call GetHashCode. However, to answer the question as asked (after clarification in comments) here are two options, returning BigInteger (as otherwise you'd only get two characters in before probably overflowing):

static BigInteger ConvertToBigInteger(string input)
{
    byte[] bytes = Encoding.BigEndianUnicode.GetBytes(input);
    // BigInteger constructor expects a little-endian byte array
    Array.Reverse(bytes);
    return new BigInteger(bytes);
}

static BigInteger ConvertToBigInteger(string input)
{
    BigInteger sum = 0;
    foreach (char c in input)
    {
        sum = (sum << 16) + (int) c;
    }
    return sum;
}

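(These two approaches give the same result as long as the first character is below U+8000; at or above that, the byte-array constructor reads the top bit as a sign bit and the first version returns a negative number. The first is more efficient, but the second is probably easier to understand.)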
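
A minimal sketch of the two approaches side by side, with distinct names so they can live in one class. The names `StringNumber`, `FromBytes` and `FromShifts` are made up for illustration. One assumption differs from the code above: `FromBytes` appends a zero byte, because the `BigInteger(byte[])` constructor reads the array as a little-endian two's-complement value, so a first character of U+8000 or higher would otherwise come out negative.

```csharp
using System;
using System.Numerics;
using System.Text;

public static class StringNumber
{
    // Byte-array approach: encode as UTF-16 big-endian, reverse into the
    // little-endian order BigInteger expects, then add a trailing zero
    // byte so the most significant bit is never read as a sign bit.
    public static BigInteger FromBytes(string input)
    {
        byte[] bytes = Encoding.BigEndianUnicode.GetBytes(input);
        Array.Reverse(bytes);
        Array.Resize(ref bytes, bytes.Length + 1); // new last byte is 0
        return new BigInteger(bytes);
    }

    // Shift approach: make room for 16 more bits, then add the next char.
    public static BigInteger FromShifts(string input)
    {
        BigInteger sum = BigInteger.Zero;
        foreach (char c in input)
        {
            sum = (sum << 16) + (int)c;
        }
        return sum;
    }
}
```

For example, "Hi" is the two UTF-16 code units 0x0048 and 0x0069, so both methods return 0x00480069, which is 4718697 in decimal.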

Jon Skeet