I'm working on an image-mining project, and I used a HashSet instead of an array to avoid adding duplicate URLs while gathering them. I reached the point in the code where I iterate over the HashSet that contains the main URLs; within the iteration I download the page of each main URL, add the URLs it contains to the HashSet, and so on. During the iteration I should exclude every scanned URL, and also exclude (remove) every URL that ends with "jpg", until the HashSet's count reaches 0. The question is that I ran into endless looping in this iteration. Say I get a URL (let's call it X):
1. I scan the page of URL X
2. I get all the URLs of page X (by applying filters)
3. I add those URLs to the HashSet using UnionWith
4. I remove the scanned URL X
The problem comes when one of those URLs, say Y, brings X back again when scanned, so X is re-added and the cycle never ends.
Shall I use a Dictionary and mark each URL as "scanned"? Sorry, it came to my mind right after I posted the question; I will try it and post the result here.
Update: I managed to solve it for one URL, but other URLs still seem to generate the same loop. So how do I handle the HashSet to avoid duplicates even after removing the links? I hope my point is clear. My current code is below, and a sketch of the "mark as scanned" idea I'm considering comes after it.
while (URL_Can.Count != 0)
{
    tempURL = URL_Can.First();

    if (tempURL.EndsWith("jpg"))
    {
        // Image link: keep it for downloading later, don't scan it.
        URL_CanToSave.Add(tempURL);
        URL_Can.Remove(tempURL);
    }
    else
    {
        // Download the page once and reuse the extracted links below.
        var extracted = ExtractUrlsfromLink(client, tempURL, filterlink1);

        if (extracted.Contains(toAvoidLoopinLinks))
        {
            // This page links back to the previously scanned URL, so drop
            // both. Note this only guards against the single most recently
            // scanned URL, which is why other loops still slip through.
            URL_Can.Remove(tempURL);
            URL_Can.Remove(toAvoidLoopinLinks);
        }
        else
        {
            // Merge the newly found URLs, then remove the scanned one.
            URL_Can.UnionWith(extracted);
            URL_Can.UnionWith(ExtractUrlsfromLink(client, tempURL, filterlink2));
            URL_Can.Remove(tempURL);
            richTextBox2.PerformSafely(() => richTextBox2.AppendText(tempURL + "\n"));
        }
    }

    // Remember the URL just processed as the one to avoid next time.
    toAvoidLoopinLinks = tempURL;
}
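
Here is a rough sketch of the "mark as scanned" idea, to show what I mean: keep a second HashSet (seenURLs, a name I made up) of every URL ever added, and only enqueue extracted URLs that have never been seen, so removing a scanned URL from the work set can never let Y re-add X. It reuses client, filterlink1, filterlink2, ExtractUrlsfromLink, URL_Can and URL_CanToSave from my code above, assumes ExtractUrlsfromLink returns a collection of strings, and I haven't fully tested it:

// Every URL ever queued, scanned or not. Seed it with the initial
// contents so the starting URLs can't be re-added either.
var seenURLs = new HashSet<string>(URL_Can);

while (URL_Can.Count != 0)
{
    string tempURL = URL_Can.First();
    URL_Can.Remove(tempURL);   // remove up front; seenURLs still remembers it

    if (tempURL.EndsWith("jpg"))
    {
        URL_CanToSave.Add(tempURL);
        continue;
    }

    // Collect the links found by both filters.
    var found = new HashSet<string>(ExtractUrlsfromLink(client, tempURL, filterlink1));
    found.UnionWith(ExtractUrlsfromLink(client, tempURL, filterlink2));

    foreach (string url in found)
    {
        // Add returns false if the URL was seen before, so X can
        // never re-enter the work set through Y.
        if (seenURLs.Add(url))
            URL_Can.Add(url);
    }

    richTextBox2.PerformSafely(() => richTextBox2.AppendText(tempURL + "\n"));
}

The point is that HashSet<string>.Add returns false when the item is already present, so the membership test and the insert happen in one call.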