0

I am writing a UI application which accept an input from the user -

up to 50,000 entries which he pastes to a Textbox, which I need to convert to List<Uint32> (Distinct)

In the process, I display the Distict list (the output) in the 'Textbox'.

I am splitting the text and converting it to a Distinct list of Uint32 Then I convert the list to array.

private List<UInt32> ConvertTextToList(string TextBoxText)
{
string[] TextBoxSplitted = TextBoxText.Split(new string[] { Environment.NewLine},StringSplitOptions.RemoveEmptyEntries); //Fast
            var TextBoxSplittedAsList = TextBoxSplitted.ToList<string>(); //Fast
            List<UInt32> lp = TextBoxSplittedAsList.ConvertAll(new Converter<string, UInt32>(element => Convert.ToUInt32(element))); //Fast
            List<UInt32> uintList = lp.Distinct<UInt32>().ToList<UInt32>(); //Fast
            UInt32[] uintListArray = uintList.ToArray(); //Fast

            //Slow part (measured 15 sec on core2duo 2.53GHz)
            StringBuilder builder = new StringBuilder();
            Array.ForEach(uintListArray, x => builder.Append(x));                
            //Done slow part

            SomeTextBox.text = builder.ToString();

            return uintList;
}

First I tried with - ListOfHeliostatsText.Text = string.Join(",", uintListArray);

Which was slower (about 25% slower than using StringBuilder)

I feel my function designed wrong, two many conversions.

Is there anyway to improve the performance of this function ?

EDIT: My bad, The slow part is the ListOfHeliostatsText.Text = builder.ToString();

I will continue reading the answers.

Ofiris
  • 6,047
  • 6
  • 35
  • 58
  • 1
    Not directly related to your question... You don't need **both** `uintList` and `uintListArray`. Since you are not adding/removing any elements, just use the array. – Branko Dimitrijevic Jan 13 '13 at 09:05
  • Related: http://stackoverflow.com/questions/10032805/string-join-performance-issue-in-c-sharp – Alvin Wong Jan 13 '13 at 09:12
  • Are you sure, that the user input is exactly correct? – Hamlet Hakobyan Jan 13 '13 at 09:16
  • I tried the code that you are saying is slow, and it runs in about 10 ms. on my computer, and it has the same processor. If I make the array 80 million items intead of 50000 I get close to what you say. How did you measure the time? – Guffa Jan 13 '13 at 09:18
  • My bad, the slow part is in `SomeTextBox.text = builder.ToString()` As @BrankoDimitrijevic mentioned in his answer. – Ofiris Jan 13 '13 at 14:58

3 Answers3

1

You have measured incorrectly. The slow part is not:

StringBuilder builder = new StringBuilder();
Array.ForEach(uintListArray, x => builder.Append(x)); 

The slow part is:

SomeTextBox.Text = builder.ToString();

The problem is that you are feeding one huge line to the text box. If you put each string in its own line....

Array.ForEach(uintListArray, x => builder.AppendLine(x.ToString()));

...you'll observe roughly a 50-fold speedup.

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • I am accepting this answer. Although it doesn't solve my problem, it pointed my measure mistake. I will use [this](http://stackoverflow.com/questions/4398037/is-there-way-to-speed-up-displaying-a-lot-of-text-in-a-winforms-textbox) - Speeding up winforms textbox. Thanks everyone for your answers. – Ofiris Jan 13 '13 at 15:20
  • @Ofiris So splitting to multiple lines is not a solution... then what is the purpose of storing this huge line in the text box? How is the user supposed to make sense of it? I think you are trying to solve a "problem" that should not have been there in the first place (through a better UI design)... – Branko Dimitrijevic Jan 13 '13 at 16:21
  • You are right, it might be a bad UI design, the point is the user pastes / load from file the input and I have to display it, usually it will be small inputs, but I have no control how large the input is. – Ofiris Jan 13 '13 at 16:35
  • @Ofiris The input is not a problem, performance-wise. Do you have the control of the format of the _output_? Can you make it the same format as the input (i.e. delimit individual values by new lines)? – Branko Dimitrijevic Jan 13 '13 at 16:41
  • I try to process the text into a `uint` array, I can display the text in any format I choose, but I thought to beautify it, E.g `1 2,3;4,1,1` should be--> `1,2,3,4` – Ofiris Jan 13 '13 at 16:46
0

Can you try this please:

private List<UInt32> ConvertTextToList(string TextBoxText)
{
   ....
    var TextBoxSplittedAsList = TextBoxSplitted.ToList<string>(); //Fast

    TextBoxSplittedAsList.Select(int.Parse).ToList();
    TextBoxSplittedAsList.Distinct().ToList(); // to get the distinct values
bonCodigo
  • 14,268
  • 1
  • 48
  • 91
0

With potentially that many entries, I don't think using the string splitting operations to get intermediate values into an intermediate array will not help. That's a lot of overhead. If speed and efficiency is your goal, you should tokenize it by effectively reading through the string generating the items as they are read. This way, you won't have nor need an intermediate array holding all those values.

If you want to get all the distinct values, you can throw everything into a HashSet<T>. However the example I will show here will use some LINQ and the Distinct() method (which has its own overheads).

// a naive tokenizing iterator
IEnumerable<string> Tokenize(string str, string separator)
{
    var current = 0;
    while (current < str.Length)
    {
        // we're effectively scanning through the string
        var next = str.IndexOf(separator, current);
        if (next == -1)
        {
            next = str.Length;
        }
        var token = str.Substring(current, next - current);
        yield return token;
        current = next + 1;
    }
}

List<uint> ConvertTextToList(string text)
{
    return Tokenize(text, ",")
        .Select(token => Convert.ToUInt32(token))
        .Distinct()
        .ToList();
}

And take my advice, don't make that method do anything more than just generate that list. You can fill that text box outside of that function, it doesn't belong there.

Jeff Mercado
  • 129,526
  • 32
  • 251
  • 272