I found that the web requests in my web scraping project keep using the same proxy over and over.

public static List<string> proxyLogs = new List<string>();
private static Random random = new Random();

public static string randomizeProxy(List<string> proxies = null)
{
    if (proxies == null)
       proxies = proxyLogs;

    return proxies[random.Next(proxies.Count)];
}

Parallel.ForEach(concurrentLogs, new ParallelOptions { MaxDegreeOfParallelism = 4 }, log =>
{
    // my http requests
    string proxyLog = randomizeProxy(proxyLogs);
    Console.WriteLine(proxyLog);
});

With MaxDegreeOfParallelism set to 4, the 4 concurrent requests keep using the same proxy over and over instead of a different one for each thread.

What seems to be the best approach for this?

Saeid Babaei
Anna P.
  • https://learn.microsoft.com/en-us/dotnet/api/system.random?view=netframework-4.8#ThreadSafety –  Dec 20 '19 at 18:29
  • Hi @Amy, my concern is that when running on multiple threads, the value of proxyLog keeps repeating. – Anna P. Dec 20 '19 at 18:40
  • I understand your concern. Read what I linked carefully and protect access to the Random class. –  Dec 20 '19 at 18:41
  • @AnnaP. Welcome to Stack Overflow. Please take the [tour] to learn how Stack Overflow works and read [ask] on how to improve the quality of your question. Then [edit] your question to include your full source code you have as a [mcve], which can be compiled and tested by others. You have not shown what `getRandomProxy()` does, how `proxyLogs` are filled or how big `concurrentLogs` is. – Progman Dec 20 '19 at 18:44
  • You don't want a random item, because random choices can repeat. What you really want is to take a list, randomize its order, then go through the list in the new randomized order. – Scott Chamberlain Dec 20 '19 at 19:48
  • I agree with Scott's comment. Apart from that, your code looks OK. – Asım Gündüz Dec 20 '19 at 19:51
  • What is the reason for using the `Parallel.ForEach` method? Do you have any intensive [CPU-bound](https://stackoverflow.com/questions/868568/what-do-the-terms-cpu-bound-and-i-o-bound-mean) work to do? – Theodor Zoulias Dec 20 '19 at 19:52
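
The shuffle-then-iterate idea from the comments can be sketched roughly as follows. This is only an illustration under assumptions: `ProxyRotation`, `Shuffle`, `Assign`, and the sample proxy and log strings are hypothetical names that do not appear in the question, and wrapping around when there are more logs than proxies is one possible policy, not something the commenter specified.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original question.
public static class ProxyRotation
{
    // Returns a shuffled copy of the input (Fisher-Yates).
    // The Random instance is only used by the calling thread.
    public static List<string> Shuffle(IEnumerable<string> source, Random rng)
    {
        var result = source.ToList();
        for (int i = result.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            string tmp = result[i];
            result[i] = result[j];
            result[j] = tmp;
        }
        return result;
    }

    // Pairs each log with a proxy, walking the shuffled list in order
    // and wrapping around when there are more logs than proxies.
    public static List<(string Log, string Proxy)> Assign(
        IReadOnlyList<string> logs, IReadOnlyList<string> shuffledProxies)
    {
        if (shuffledProxies.Count == 0)
            throw new ArgumentException("At least one proxy is required.", nameof(shuffledProxies));

        return logs
            .Select((log, i) => (Log: log, Proxy: shuffledProxies[i % shuffledProxies.Count]))
            .ToList();
    }
}

public static class RotationDemo
{
    public static void Main()
    {
        // Placeholder data standing in for proxyLogs and concurrentLogs.
        var proxyLogs = new List<string> { "proxy-a:8080", "proxy-b:8080", "proxy-c:8080", "proxy-d:8080" };
        var concurrentLogs = new List<string> { "log1", "log2", "log3", "log4", "log5", "log6" };

        // All random work happens here, on a single thread.
        var shuffled = ProxyRotation.Shuffle(proxyLogs, new Random());
        var work = ProxyRotation.Assign(concurrentLogs, shuffled);

        Parallel.ForEach(work, new ParallelOptions { MaxDegreeOfParallelism = 4 }, item =>
        {
            // The HTTP request for item.Log through item.Proxy would go here.
            Console.WriteLine($"{item.Log} -> {item.Proxy}");
        });
    }
}
```

With this arrangement, consecutive logs get different proxies as long as the proxy list has more than one distinct entry, and no `Random` call happens inside the parallel loop.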

1 Answer

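Anything that doesn't need to run in parallel should go outside the ForEach. There is no reason for the random selection to happen inside it, especially since Random isn't thread-safe. Pick the proxies up front on a single thread, then hand the results to the parallel loop: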

var data = concurrentLogs.Select
(
    log => new { Log = log, Proxy = randomizeProxy(proxyLogs) } 
).ToList();
Parallel.ForEach(data, new ParallelOptions { MaxDegreeOfParallelism = 4 }, item =>
{
    var log = item.Log;
    var proxyLog = item.Proxy;
    Console.WriteLine(proxyLog);
});
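
If the selection has to stay inside the parallel loop for some reason, the other option raised in the comments is to protect access to the shared Random. The documentation linked there notes that a `Random` instance used from several threads without synchronization can end up returning 0 from every call, which would explain always getting the same proxy. Below is a minimal sketch using a lock; `SafeProxyPicker`, `Pick`, and the sample data are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original question or answer.
public static class SafeProxyPicker
{
    private static readonly Random random = new Random();
    private static readonly object randomLock = new object();

    // Picks a random proxy; the lock ensures only one thread
    // touches the shared Random instance at a time.
    public static string Pick(IReadOnlyList<string> proxies)
    {
        if (proxies == null || proxies.Count == 0)
            throw new ArgumentException("At least one proxy is required.", nameof(proxies));

        int index;
        lock (randomLock)
        {
            index = random.Next(proxies.Count);
        }
        return proxies[index];
    }
}

public static class SafePickerDemo
{
    public static void Main()
    {
        // Placeholder data standing in for proxyLogs and concurrentLogs.
        var proxyLogs = new List<string> { "proxy-a:8080", "proxy-b:8080", "proxy-c:8080" };
        var concurrentLogs = new List<string> { "log1", "log2", "log3", "log4" };

        Parallel.ForEach(concurrentLogs, new ParallelOptions { MaxDegreeOfParallelism = 4 }, log =>
        {
            string proxyLog = SafeProxyPicker.Pick(proxyLogs);
            Console.WriteLine($"{log} -> {proxyLog}");
        });
    }
}
```

Independent random picks can still land on the same proxy by chance, so this only fixes the thread-safety problem; it does not guarantee a different proxy per request.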
John Wu