
I have a program that reads approximately 2 million rows from a database into a List. Each row is a location that contains information such as geographic co-ordinates.

Once data is added to the List I use a foreach loop and grab the co-ordinates to create a kml file. The loop throws an OutOfMemoryException when the number of rows is large (but works perfectly otherwise).

Any suggestions on how to handle this so that the program can work with very large sets of data? The kml library is SharpKML.

I am still new to C# so please go easy!

This is the loop:

    using (SqlConnection conn = new SqlConnection(connstring))
    {
        conn.Open();
        SqlCommand cmd = new SqlCommand(select, conn);

        using (cmd)
        {
            SqlDataReader reader = cmd.ExecuteReader();
            while (reader.Read())
            {
                double lat = reader.GetDouble(1);
                double lon = reader.GetDouble(2);
                string country = reader.GetString(3);
                string county = reader.GetString(4);
                double TIV = reader.GetDouble(5);
                double cnpshare = reader.GetDouble(6);
                double locshare = reader.GetDouble(7);

                //Add results to list
                results.Add(new data(lat, lon, country, county, TIV, cnpshare, locshare));
            }
            reader.Close();
        }
        conn.Close();
    }

    int count = results.Count;
    Console.WriteLine("number of rows in results = " + count.ToString());

    //This code segment generates the kml point plot

    Document doc = new Document();
    try
    {
        foreach (data l in results)
        {
            Point point = new Point();
            point.Coordinate = new Vector(l.lat, l.lon);

            Placemark placemark = new Placemark();
            placemark.Geometry = point;
            placemark.Name = Convert.ToString(l.tiv);

            doc.AddFeature(placemark);
        }
    }
    catch (OutOfMemoryException e)
    {
        throw e;
    }

This is the class used in the List:

    public class data
    {
        public double lat { get; set; }
        public double lon { get; set; }
        public string country { get; set; }
        public string county { get; set; }
        public double tiv { get; set; }
        public double cnpshare { get; set; }
        public double locshare { get; set; }

        public data(double lat, double lon, string country, string county, double tiv, double cnpshare,
            double locshare)
        {
            this.lat = lat;
            this.lon = lon;
            this.country = country;
            this.county = county;
            this.tiv = tiv;
            this.cnpshare = cnpshare;
            this.locshare = locshare;
        }
    }
– Richard Todd

4 Answers


Why do you need to store all the data before writing it? Rather than adding each row to a list, you should process each row as it is read, then forget about it.

For instance, try rolling your code together like this:

    Document doc = new Document();
    while (reader.Read())
    {
        // read from db
        double lat = reader.GetDouble(1);
        double lon = reader.GetDouble(2);
        string country = reader.GetString(3);
        string county = reader.GetString(4);
        double TIV = reader.GetDouble(5);
        double cnpshare = reader.GetDouble(6);
        double locshare = reader.GetDouble(7);

        var currentData = new data(lat, lon, country, county, TIV, cnpshare, locshare);

        // write to file
        Point point = new Point();
        point.Coordinate = new Vector(currentData.lat, currentData.lon);

        Placemark placemark = new Placemark();
        placemark.Geometry = point;
        placemark.Name = Convert.ToString(currentData.tiv);

        doc.AddFeature(placemark);
    }

This will only work if Document is implemented sensibly though.
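For completeness, here is a minimal sketch of the write step with SharpKML (assuming the SharpKml.Dom and SharpKml.Engine namespaces from the library). Note that KmlFile still serializes the whole in-memory document, so this shows the save call rather than reducing memory use:

    using System.IO;
    using SharpKml.Dom;
    using SharpKml.Engine;

    // Wrap the finished Document in a root <kml> element and stream it to disk.
    var kml = new Kml { Feature = doc };
    KmlFile kmlFile = KmlFile.Create(kml, false);
    using (var stream = File.Create("output.kml"))
    {
        kmlFile.Save(stream);
    }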

– Oliver

Oliver is right (up-vote from me). Performance-wise you can do some other things as well. First, do not query for fields you are not going to use. Then, move all the variable declarations (in Oliver's code) before the while statement. Finally, instead of waiting for your SQL server to collect and send back all the records at once, fetch them progressively in steps. For example, if your records have a UID and you retrieve them in UID order, start with a local C# variable "var lastID = 0", change your select statement to something like "select top 1000 ... where UID > lastID", and repeat the query until it returns nothing, or fewer than 1000 records. A sketch of this batching follows.
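Here is one possible sketch of that batching; the table and column names (Locations, UID) are placeholders for the asker's real schema, and UID is assumed to be a bigint:

    long lastId = 0;
    using (var conn = new SqlConnection(connstring))
    {
        conn.Open();
        while (true)
        {
            using (var cmd = new SqlCommand(
                "SELECT TOP 1000 UID, lat, lon, country, county, TIV, cnpshare, locshare " +
                "FROM Locations WHERE UID > @lastId ORDER BY UID", conn))
            {
                cmd.Parameters.AddWithValue("@lastId", lastId);
                int rowsInBatch = 0;
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        rowsInBatch++;
                        lastId = reader.GetInt64(0);
                        // ...build and add the Placemark for this row, as in the other answers...
                    }
                }
                if (rowsInBatch < 1000)
                    break; // final (partial or empty) batch reached
            }
        }
    }

Each round-trip reuses the open connection, and the WHERE UID > @lastId predicate with ORDER BY UID keeps each batch cheap as long as UID is indexed.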


Since there is no big delay in populating the list from the database, and you did not mention any problems with that step, why not create your Point and Placemark objects immediately as each row is read? The code is below.

    var doc = new Document();

    using (SqlConnection conn = new SqlConnection(connstring))
    {
        conn.Open();
        SqlCommand cmd = new SqlCommand(select, conn);

        using (cmd)
        {
            var reader = cmd.ExecuteReader();
            while (reader.Read())
            {
                double lat = reader.GetDouble(1);
                double lon = reader.GetDouble(2);
                string country = reader.GetString(3);
                string county = reader.GetString(4);
                double TIV = reader.GetDouble(5);
                double cnpshare = reader.GetDouble(6);
                double locshare = reader.GetDouble(7);

                var point = new Point();
                point.Coordinate = new Vector(lat, lon);

                var placemark = new Placemark();
                placemark.Geometry = point;
                placemark.Name = Convert.ToString(TIV);

                doc.AddFeature(placemark);
            }
            reader.Close();
        }
        conn.Close();
    }

If there is no good reason to pull so much data into memory, try a lazy-loading approach; one possible sketch follows.
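As one illustration of that lazy loading, the reader loop can be wrapped in an iterator so rows are materialized one at a time instead of being collected into a list (ReadLocations is a hypothetical helper name; connstring, select, and data are the asker's existing variables and class):

    using System.Collections.Generic;
    using System.Data.SqlClient;

    static IEnumerable<data> ReadLocations(string connstring, string select)
    {
        using (var conn = new SqlConnection(connstring))
        using (var cmd = new SqlCommand(select, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Yield one row per iteration; nothing is buffered in a list.
                    yield return new data(
                        reader.GetDouble(1), reader.GetDouble(2),
                        reader.GetString(3), reader.GetString(4),
                        reader.GetDouble(5), reader.GetDouble(6),
                        reader.GetDouble(7));
                }
            }
        }
    }

The KML loop can then iterate foreach (data l in ReadLocations(connstring, select)) directly, so the two-million-element list never exists.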

– Minja
  • Thanks for the awesome suggestions. I do ideally need the other fields in the query, as these will at some point be used to label the KML points or do other KML operations such as drawing polygons. I'll spend a few hours this evening going over the suggestions here and report back. Thanks all. – Richard Todd Jun 12 '12 at 17:28
  • Gave this a go. The reader seems to work as if streaming. However, I now get an OutOfMemoryException when attempting to write the KML file. This will be an extremely big KML file (e.g. 50mb) but should still not be enough to cause an OutOfMemoryException. Is there another way to more efficiently implement some sort of stream? Perhaps I will need to split the files and then join them later. – Richard Todd Jun 12 '12 at 19:54
  • Do you really need such a big KML file? You can try @drdigit's solution http://stackoverflow.com/a/11001300/1433917 and append 1000 rows to the existing KML file in every iteration. However, from my GIS experience, a 50MB KML file will likely be slow for Google Maps or Google Earth to process; why not split it by some rule into several smaller KML files? That will also shorten your query execution. – Minja Jun 12 '12 at 20:55

@drdigit,

I would avoid executing queries in a loop. One query should always return as much data as is needed at that moment. In this case you would have 1000 queries, each returning 1000 rows. Maybe that is better for quickly showing the first 1000 rows, but I'm not sure it will be faster to execute 1000 quick queries in a loop instead of executing only one query. Maybe I'm wrong...

I do think your approach is a good fit for lazy loading, if that is needed in this situation.

– Minja
  • The number 1000 is just an example. In most cases (if the DB indexes are right for the queries) the difference in performance is outstanding, as long as the queries are executed over the same connection. A factor that may affect performance is network latency, but it can be balanced (up to a point) against the number of round-trips, which depends on that sample number of 1000. Anyway, in this case it seems we are talking about a localhost environment, meaning there is probably no network latency. –  Jun 12 '12 at 17:30
  • Oops. My apologies for not seeing your first answer in time. It is correct and exactly on the same track as Oliver's. So an upvote for you, since this is not a competition for the fastest typing "machine". –  Jun 12 '12 at 17:38
  • I'm not so good with databases, since I'm mostly programming-oriented and have not had a situation like this to test performance issues. My answer pointed to a good (must-have) programming practice: never put a query inside a loop. But if we talk about good programming practice, querying for 2 million rows surely isn't one either. – Minja Jun 12 '12 at 17:47