I found something interesting while doing a HW question.

The homework question asks us to code the Median Maintenance algorithm.

The formal statement is as follows:

The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting xi denote the ith number of the file, the kth median mk is defined as the median of the numbers x1,…,xk. (So, if k is odd, then mk is the ((k+1)/2)th smallest number among x1,…,xk; if k is even, then mk is the (k/2)th smallest number among x1,…,xk.)

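To get a good running time (O(log n) per arriving number, so O(n log n) overall), this should obviously be implemented using heaps. However, the deadline was close and I needed an answer right away, so I coded it by brute force (at least O(n²), since the whole list is re-sorted for every new number) with the following steps: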

  1. Read data in
  2. Sort array
  3. Find Median
  4. Add it to the running sum
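
For comparison, the heap-based approach mentioned above could look roughly like the sketch below. It is not the code from the question: the class name `MedianSketch` and the method `SumOfMedians` are hypothetical, and it assumes .NET 6 or later for `PriorityQueue<TElement, TPriority>`. Negating the priority to get max-heap behavior also assumes the values are nowhere near `int.MinValue`, which holds for the integers 1 to 10000.

```csharp
using System;
using System.Collections.Generic;

public static class MedianSketch
{
    // Returns the sum of the running medians m1..mn of the stream.
    public static long SumOfMedians(IEnumerable<int> stream)
    {
        // 'low' holds the smaller half. Priorities are negated so that the
        // largest element of that half sits at the top (max-heap behavior).
        var low = new PriorityQueue<int, int>();
        // 'high' holds the larger half as an ordinary min-heap.
        var high = new PriorityQueue<int, int>();
        long sum = 0; // explicit integer type, so the total stays exact

        foreach (int x in stream)
        {
            if (low.Count == 0 || x <= low.Peek())
                low.Enqueue(x, -x);
            else
                high.Enqueue(x, x);

            // Rebalance so that low.Count equals high.Count or high.Count + 1.
            if (low.Count > high.Count + 1)
            {
                int moved = low.Dequeue();
                high.Enqueue(moved, moved);
            }
            else if (high.Count > low.Count)
            {
                int moved = high.Dequeue();
                low.Enqueue(moved, -moved);
            }

            // Under this invariant the top of 'low' is the ((k+1)/2)th smallest
            // for odd k and the (k/2)th smallest for even k.
            sum += low.Peek();
        }

        return sum;
    }
}
```

Each arriving number costs a constant number of heap operations, so the work per element is O(log n) instead of a full re-sort.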

I ran the algorithm through several test cases (with known answers) and got the correct results. However, when I ran the same algorithm on a larger data set, I got the wrong answer. I was doing all the operations using Int64 to represent the data. Then I tried switching to Int32 and magically got the correct answer, which makes no sense to me.

The code is below, and it is also found here (the data is in the repo). The algorithm starts to give erroneous results after the 3810 index:

    private static void Main(string[] args)
    {
        MedianMaintenance("Question2.txt");
    }

    private static void MedianMaintenance(string filename)
    {
        var txtData = File.ReadLines(filename).ToArray();
        var inputData32 = new List<Int32>();
        var medians32 = new List<Int32>();
        var sums32 = new List<Int32>();
        var inputData64 = new List<Int64>();
        var medians64 = new List<Int64>();
        var sums64 = new List<Int64>();
        var sum = 0;
        var sum64 = 0f;
        var i = 0;
        foreach (var s in txtData)
        {
            //Add to sorted list
            var intToAdd = Convert.ToInt32(s);

            inputData32.Add(intToAdd);
            inputData64.Add(Convert.ToInt64(s));

            //Compute sum
            var count = inputData32.Count;
            inputData32.Sort();
            inputData64.Sort();
            var index = 0;

            if (count%2 == 0)
            {
                //Even number of elements
                index = count/2 - 1;
            }
            else
            {
                //Number is odd
                index = ((count + 1)/2) - 1;
            }
            var val32 = Convert.ToInt32(inputData32[index]);
            var val64 = Convert.ToInt64(inputData64[index]);
            if (i > 3810)
            {
                var t = sum;
                var t1 = sum + val32;
            }
            medians32.Add(val32);
            medians64.Add(val64);
            //Debug.WriteLine("Median is {0}", val);
            sum += val32;
            sums32.Add(Convert.ToInt32(sum));
            sum64 += val64;
            sums64.Add(Convert.ToInt64(sum64));
            i++;
        }
        Console.WriteLine("Median Maintenance result is {0}", (sum).ToString("N"));
        Console.WriteLine("Median Maintenance result is {0}", (medians32.Sum()).ToString("N"));

        Console.WriteLine("Median Maintenance result is {0} - Int64", (sum64).ToString("N"));
        Console.WriteLine("Median Maintenance result is {0} - Int64", (medians64.Sum()).ToString("N"));
    }

What's more interesting is that the running sum (in the sum64 variable) yields a different result than summing all items in the list with LINQ's Sum() function.

The results (the third one is the one that's wrong): [image: Console Application Results]

These are the computer details: [image: Computer details]

I'd appreciate it if someone could give me some insight into why this is happening.

Thanks,

Matt Burland
lopezbertoni
  • Despite the name, the `Convert` class is not a good way to convert data between known types. Are you ever getting different results from `Convert.ToInt32` and `Convert.ToInt64`? If you use `Int32.TryParse` and `Int64.TryParse` instead, do you get any failures? Any differences then? – Ben Voigt Mar 09 '15 at 18:42
  • `0f` is initializing a 32-bit float variable; you meant `0d` or `0.0` to get a 64-bit floating point. – jtimperley Mar 09 '15 at 18:43
  • @BenVoigt The results from the Convert.ToInt32/64 yield the same number, the sum is the one that's different. And I haven't tried to convert using TryParse yet. – lopezbertoni Mar 09 '15 at 18:45
  • @jtimperley: They shouldn't need a floating point at all. All the numbers are integers, so they should keep everything in integers. They probably want `0L`, or else just use explicit typing instead of `var`. – Matt Burland Mar 09 '15 at 18:45
  • As for LINQ, you'll probably get better results if you use strongly typed lists: `new List<int>()` and `new List<long>()`. To the last comment, they shouldn't need a bunch of this, but I am not doing their homework, just pointing out their error. – jtimperley Mar 09 '15 at 18:46
  • @jtimperley you're correct, that's a bug in my code. Please post that as an answer and I will happily accept it. Thanks – lopezbertoni Mar 09 '15 at 18:47

2 Answers


`0f` initializes a 32-bit float variable; you meant `0d` or `0.0` to get a 64-bit floating-point value.

As for LINQ, you'll probably get better results if you use strongly typed lists.

    new List<int>()
    new List<long>()
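
To make the literal suffixes concrete, here is a small sketch of what `var` infers for each of them. The class name `VarInferenceDemo` and the method `InferredTypes` are made up for illustration and are not part of the question's code.

```csharp
using System;

public static class VarInferenceDemo
{
    // Reports the runtime type that 'var' picks for each literal form.
    public static string[] InferredTypes()
    {
        var a = 0;   // plain integer literal
        var b = 0L;  // 'L' suffix
        var c = 0f;  // 'f' suffix, as in the question's sum64
        var d = 0d;  // 'd' suffix

        return new[]
        {
            a.GetType().Name, // "Int32"
            b.GetType().Name, // "Int64"
            c.GetType().Name, // "Single" (32-bit float)
            d.GetType().Name  // "Double" (64-bit float)
        };
    }
}
```

So `var sum64 = 0f;` declares a `float`, regardless of the `64` in the variable name.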
jtimperley
  • Pretty sure they want `0L` or just `Int64 sum64 = 0`. And they are using typed lists. – Matt Burland Mar 09 '15 at 18:49
  • @matt-burland They had a floating point, I gave a floating point... Still not doing their homework. – jtimperley Mar 09 '15 at 18:50
  • @matt-burland Incorrect is relative; either data type could produce the same result. My answer is intentionally agnostic to intentions. There is a huge difference between fixing an issue with someone's solution and writing a better implementation. – jtimperley Mar 09 '15 at 18:55
  • I would hardly describe including such a minor detail as "writing a better implementation". If you insist on simply pointing out the problem and not helping with "homework", why suggest an alternate data type at all? I agree it's a bit misleading. – aw04 Mar 09 '15 at 19:01

The first thing I notice is what the commenter did: `var sum64 = 0f` initializes `sum64` as a `float`. Since the median of a collection of `Int64`s is itself an `Int64` (the specified rules don't take the mean of the two midpoint values when the collection has even cardinality), you should instead declare this variable explicitly as a `long`. In fact, I would go ahead and replace all usages of `var` in this code example; here the convenience of `var` is outweighed by the type-related bugs it is causing.
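
As a rough sketch of that suggestion, with an explicitly typed `long` accumulator the running sum and LINQ's `Sum()` should agree. The names `ExplicitTypingDemo` and `SumBothWays` are hypothetical and only stand in for the relevant part of the question's loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExplicitTypingDemo
{
    // Sums the medians with a running total and with LINQ, for comparison.
    public static (long Running, long Linq) SumBothWays(IReadOnlyList<long> medians)
    {
        // Explicit type: writing 'long running = 0f;' here would be a
        // compile-time error instead of a silent switch to floating point.
        long running = 0;
        foreach (long m in medians)
        {
            running += m;
        }

        // Enumerable.Sum over long values also returns a long.
        return (running, medians.Sum());
    }
}
```

Both totals use integer arithmetic throughout, so no rounding can occur.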

KeithS
  • Understood, the problem was I was adding a float to a long. Any insights on why the sum starts to fail? Or is it just a precision error that gets carried over through the running sum? – lopezbertoni Mar 09 '15 at 19:14
  • If the numbers get big enough, definitely. Floating-point numbers have a fixed "significand"; only the first X binary digits can be accurately represented regardless of the magnitude of the number. So, at high enough values (for a "single-precision" float that's 24 bits or about 7 decimal digits), you'll start to see the least significant digits get chopped off. – KeithS Mar 09 '15 at 19:19
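
The 24-bit limit described in the comment above can be seen with a minimal sketch like the following. The class name `FloatPrecisionDemo` and the method `AddOneAtLimit` are made up for illustration, and the result assumes a runtime that performs IEEE 754 single-precision arithmetic for `float`, as current .NET versions do.

```csharp
using System;

public static class FloatPrecisionDemo
{
    // Adds 1 to 2^24 using a float accumulator and a long accumulator.
    public static (float FloatSum, long LongSum) AddOneAtLimit()
    {
        float f = 16777216f; // 2^24, the last point where every integer fits in a float
        long l = 16777216L;

        f += 1; // 16,777,217 needs 25 significant bits, so it rounds back to 2^24
        l += 1; // exact: 16,777,217

        return (f, l);
    }
}
```

Once a `float` running sum grows past 2^24, odd totals can no longer be stored exactly, so small rounding errors can accumulate with each addition while the `long` sum stays exact.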