85

I am a university student and our task is to create a search engine. I am having difficulty generating a unique ID to assign to each URL when it is added to the frontier. I have tried the SHA-256 hashing algorithm as well as Guid. Here is the code that I used to implement the Guid approach:

public string generateID(string url_add)
{
    long i = 1;

    foreach (byte b in Guid.NewGuid().ToByteArray())
    {
        i *= ((int)b + 1);
    }

    string number = String.Format("{0:d9}", (DateTime.Now.Ticks / 10) % 1000000000);

    return number;
}
Mat
strange_developer

7 Answers

136

Why not just use ToString?

public string generateID()
{
    return Guid.NewGuid().ToString("N");
}

If you would like it to be based on a URL, you could simply do the following:

public string generateID(string sourceUrl)
{
    return string.Format("{0}_{1:N}", sourceUrl, Guid.NewGuid());
}

If you want to hide the URL, you could use some form of SHA1 on the sourceURL, but I'm not sure what that might achieve.

Jaime Torres
  • This worked. I initially wanted the ID to be based on the URL, but this seems to work fine. Will it be able to generate a large number of unique keys? The search engine will be working with a large quantity of URLs. – strange_developer Jul 03 '12 at 15:41
  • 19
    This will be able to produce approximately [5,316,911,983,139,663,491,615,228,241,121,400,000](http://answers.google.com/answers/threadview/id/553194.html) unique values. – Jaime Torres Jul 03 '12 at 15:44
  • Thanks a lot! That's more than enough, because URLs are removed from the frontier as they are retrieved. – strange_developer Jul 03 '12 at 15:47
  • Thanks! Basing it on the URL worked as well! I figure that basing it on the URL will make it more unique, with less chance of collision. Thanks a lot! – strange_developer Jul 03 '12 at 15:58
  • Oh no! I tested the URL-based code again and it didn't work. But the first version works perfectly fine! – strange_developer Jul 03 '12 at 16:01
  • I did mistakenly have it as string.format instead of string.Format... was that the source of your issue? – Jaime Torres Jul 03 '12 at 16:07
  • When generating the ID without basing it on the URL, will the result always have a fixed length? I am asking because of the database structure. – strange_developer Jul 03 '12 at 19:29
  • Yes, [a guid is a well defined structure](http://msdn.microsoft.com/en-us/library/windows/desktop/aa373931(v=vs.85).aspx). The string returned by Guid.ToString("N") is always 32 characters long. – Jaime Torres Jul 03 '12 at 19:33
  • For great justice, use `String.Format("{0}_{1:N}", sourceUrl, Guid.NewGuid())` – abatishchev Feb 15 '14 at 07:00
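To make the fixed-length point from the comments concrete, here is a minimal sketch comparing two of the standard Guid format specifiers. The class and method names (`GuidIdFormats`, `NewCompactId`, `NewHyphenatedId`) are made up for illustration and are not part of the answer.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class GuidIdFormats
{
    // "N" format: 32 hexadecimal digits, no hyphens or braces.
    public static string NewCompactId() => Guid.NewGuid().ToString("N");

    // "D" format (the default for ToString()): 32 digits plus 4 hyphens, 36 characters.
    public static string NewHyphenatedId() => Guid.NewGuid().ToString("D");
}
```

Under these assumptions a fixed-width column of 32 characters would fit the "N" form, and 36 characters the default hyphenated form.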
39

Why not use a GUID?

Guid guid = Guid.NewGuid();
string str = guid.ToString();
abatishchev
34

Here is a YouTube-video-ID-style generator that produces IDs such as "UcBKmq2XE5a":

// Requires: using System; using System.Linq; using System.Text;
StringBuilder builder = new StringBuilder();
Enumerable
    .Range(65, 26)                                                       // 'A'..'Z'
    .Select(e => ((char)e).ToString())
    .Concat(Enumerable.Range(97, 26).Select(e => ((char)e).ToString()))  // 'a'..'z'
    .Concat(Enumerable.Range(0, 10).Select(e => e.ToString()))           // "0".."9"
    .OrderBy(e => Guid.NewGuid())                                        // shuffle the 62 characters
    .Take(11)
    .ToList().ForEach(e => builder.Append(e));
string id = builder.ToString();

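It creates random IDs that are 11 characters long. You can change the length by changing the argument to the Take method. Because the 62-character alphabet is shuffled and then truncated, no character repeats within an ID, and the length cannot exceed 62.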

0.001% duplicates in 100 million.
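As a variation on the idea above, the following sketch picks each character independently, so characters may repeat and the ID space for a given length is larger. It is not the answer's method. It assumes .NET Core 3.0 or later, where `RandomNumberGenerator.GetInt32` is available, and the names `ShortId` and `Generate` are hypothetical.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper, for illustration only.
public static class ShortId
{
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    public static string Generate(int length = 11)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));

        var chars = new char[length];
        for (int i = 0; i < length; i++)
        {
            // GetInt32 returns a uniformly distributed index in [0, Alphabet.Length).
            chars[i] = Alphabet[RandomNumberGenerator.GetInt32(Alphabet.Length)];
        }
        return new string(chars);
    }
}
```

Short random IDs can still collide, so if they are used as keys it is probably safest to keep a uniqueness check (for example a unique index in the database) and regenerate on conflict.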

Ashraf Ali
  • Do you think it's OK to use this as an order number for an e-commerce site? Is there a chance that two orders will get the same ID using this method, considering that there may be 1K or 10K orders per day? – Mahamad Husen May 05 '20 at 14:43
  • I wouldn't recommend using the above approach in your case. Best option is to use Guid. Also have a look at this https://github.com/dotnet/aspnetcore/blob/master/src/Servers/Kestrel/shared/CorrelationIdGenerator.cs – Ashraf Ali May 05 '20 at 15:27
  • 1
    Well, in my case I need something exactly like your solution (an alphanumeric string of about 8 characters) to use as an OrderNo in an e-commerce app. I added your solution to my project and check each new ID for duplicates against the DB; if one exists, I generate a new one. Does that CorrelationIdGenerator class fit my scenario? – Mahamad Husen May 05 '20 at 15:34
9

Why not build a unique ID as shown below?

We can combine DateTime.Now.Ticks and Guid.NewGuid().ToString() to make a unique ID.

Because DateTime.Now.Ticks is included, we can later recover the date and time at which the unique ID was created (ticks are 100-nanosecond intervals).

Please see the code:

var ticks = DateTime.Now.Ticks;
var guid = Guid.NewGuid().ToString();
var uniqueSessionId = ticks.ToString() + '-' + guid; // ID created by combining ticks and GUID

var datetime = new DateTime(ticks); // for checking: the creation time recovered from the ticks
var datetimenow = DateTime.Now;     // the current time, slightly later than the recovered time

We can even extract the ticks portion of the unique ID later to recover the creation date and time for future reference.
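Recovering the timestamp from such an ID could look like the sketch below. It assumes the `<ticks>-<guid>` layout shown above and uses UTC rather than local time, which is a deviation from the answer's code. The names `TimestampedId`, `Create` and `GetCreationTimeUtc` are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, for illustration only.
public static class TimestampedId
{
    public static string Create()
    {
        long ticks = DateTime.UtcNow.Ticks;
        return ticks.ToString(CultureInfo.InvariantCulture) + "-" + Guid.NewGuid().ToString("N");
    }

    public static DateTime GetCreationTimeUtc(string id)
    {
        // The ticks are everything before the first hyphen.
        int separator = id.IndexOf('-');
        if (separator <= 0) throw new FormatException("Expected '<ticks>-<guid>'.");

        long ticks = long.Parse(id.Substring(0, separator), CultureInfo.InvariantCulture);
        return new DateTime(ticks, DateTimeKind.Utc);
    }
}
```

Because only the first hyphen is used as the separator, the same parsing would also work if the GUID part were written in its default hyphenated format.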

6

If you want to use SHA-256 (a GUID would be faster), then you would need to do something like:

SHA256 shaAlgorithm = new SHA256Managed();
byte[] shaDigest = shaAlgorithm.ComputeHash(ASCIIEncoding.ASCII.GetBytes(url));
return BitConverter.ToString(shaDigest);

Of course, the encoding doesn't have to be ASCII, and any other hashing algorithm would work as well.
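A variant of the snippet above, wrapped in a method, might look like this. It is a sketch under a few assumptions: UTF-8 instead of ASCII so that non-ASCII characters in a URL are not lost, `SHA256.Create()` instead of constructing `SHA256Managed` directly, and hyphens stripped so the result is a fixed 64-character string. The names `UrlId` and `FromUrl` are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, for illustration only.
public static class UrlId
{
    public static string FromUrl(string url)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(url));

            // BitConverter.ToString yields "AB-CD-..."; strip the hyphens and
            // lower-case the result to get a 64-character hexadecimal ID.
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Unlike a GUID, this ID is deterministic: the same URL string always maps to the same ID, which may be useful for spotting URLs that are already in the frontier.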

daz-fuller
6

This question seems to be answered; however, for completeness, I would add another approach.

You can use a unique ID number generator which is based on Twitter's Snowflake id generator. C# implementation can be found here.

var id64Generator = new Id64Generator();

// ...

public string generateID(string sourceUrl)
{
    return string.Format("{0}_{1}", sourceUrl, id64Generator.GenerateId());
}

Note that one very nice feature of this approach is the possibility of running multiple generators on independent nodes (probably useful for a search engine), each generating globally unique identifiers in real time.

// node 0
var id64Generator = new Id64Generator(0);

// node 1
var id64Generator = new Id64Generator(1);

// ... node 10
var id64Generator = new Id64Generator(10);
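To illustrate why per-node generators do not collide, here is a rough sketch of a Snowflake-style layout: a millisecond timestamp, a 10-bit node number and a 12-bit per-millisecond sequence packed into one 64-bit value. This is not the implementation of `Id64Generator`; the class name `SnowflakeSketch`, the epoch and the clock handling are all assumptions made for the example.

```csharp
using System;

// Hypothetical Snowflake-style generator, for illustration only.
public sealed class SnowflakeSketch
{
    private const int NodeBits = 10;
    private const int SequenceBits = 12;
    private const long MaxSequence = (1L << SequenceBits) - 1;

    // Assumed custom epoch; any fixed instant in the past would do.
    private static readonly DateTime Epoch = new DateTime(2020, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    private readonly object _lock = new object();
    private readonly long _nodeId;
    private long _lastMs = -1;
    private long _sequence;

    public SnowflakeSketch(int nodeId)
    {
        if (nodeId < 0 || nodeId >= (1 << NodeBits))
            throw new ArgumentOutOfRangeException(nameof(nodeId));
        _nodeId = nodeId;
    }

    public long NextId()
    {
        lock (_lock)
        {
            // Never go backwards, even if the system clock does.
            long now = Math.Max(CurrentMs(), _lastMs);

            if (now == _lastMs)
            {
                _sequence = (_sequence + 1) & MaxSequence;
                if (_sequence == 0)
                {
                    // Sequence exhausted for this millisecond: wait for the next one.
                    do { now = CurrentMs(); } while (now <= _lastMs);
                }
            }
            else
            {
                _sequence = 0;
            }

            _lastMs = now;
            return (now << (NodeBits + SequenceBits)) | (_nodeId << SequenceBits) | _sequence;
        }
    }

    private static long CurrentMs() => (long)(DateTime.UtcNow - Epoch).TotalMilliseconds;
}
```

Two generators constructed with different node numbers can never produce the same value, because the node bits differ, and a single generator never repeats a timestamp and sequence pair.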
Tom
  • Thanks for the tip! Exactly what I was looking for. – Sudhanshu Mishra May 15 '16 at 05:07
  • There's a NuGet with code at https://github.com/RobThree/IdGen that also does similar snowflake-based ids. Is the codeplex code for FlakeId owned by you? I'd like to get it to github and do a nuget if that's ok? – Sudhanshu Mishra May 15 '16 at 05:17
  • @dotnetguy, yes, I own that one. Sure, you can follow with github migration and nuget package. – Tom May 16 '16 at 11:04
-5

We can do something like this

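// Note: the first 10 digits of the 18-digit tick count change only about every 10 seconds,
// so IDs generated within the same 10-second window are identical.
string TransactionID = "BTRF" + DateTime.Now.Ticks.ToString().Substring(0, 10);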
Mohsin Khan