
I have a pretty great SqlDataReader wrapper with which I can map the output into a strongly typed list.

What I am finding now is that on larger datasets with many columns, performance could probably be a bit better if I optimized my mapping.

In thinking about this, there is one section in particular that I am concerned about, as it seems to be the heaviest hitter.

What I would really like to know is whether there is a way to make this loop asynchronous. I feel that will make all the difference in the world with this beast :)

Here is the entire Map method, in case anyone can see where I can make further improvements to it...

IEnumerable<T> Map<T>

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data.Common;
using System.Data.SqlClient;
using System.Linq;
using System.Reflection;
using System.Threading.Tasks;

namespace o7th.Class.Library.Data
{
public class WrapperTest
{

    public static string Message { set { _Msg = value; } get { return _Msg; } }
    private static string _Msg;

    // Instantiate our caching methods
    internal static Common.CustomCache _Cache = new Common.CustomCache();

    private static IEnumerable<T> Map<T>(SqlDataReader dr) where T : new()
    {
        var enumerableDataReader = dr.Cast<DbDataRecord>().AsEnumerable();
        var tObj = new T();
        PropertyInfo[] propertyInfo = tObj.GetType().GetProperties();
        var batches = enumerableDataReader.Batch(10000);
        var resultCollection = new ConcurrentBag<List<T>>();
        Parallel.ForEach(batches, batch => resultCollection.Add(MapThis<T>(propertyInfo, batch)));
        return resultCollection.SelectMany(m => m.Select(x => x));
    }

    private static List<T> MapThis<T>(PropertyInfo[] propertyInfo, IEnumerable<DbDataRecord> batch) where T : new()
    {
        var list = new List<T>();
        batch.AsParallel().ForAll(record =>
        {
            T obj = new T();
            foreach (PropertyInfo prop in propertyInfo)
            {
                var dbVal = record[prop.Name];
                if (!Equals(dbVal, DBNull.Value))
                {
                    prop.SetValue(obj, dbVal, null);
                }
            }
            list.Add(obj);
        });
        return list;
    }


    public static IEnumerable<T> GetResults<T>(string _Qry, System.Data.CommandType _QryType,
                                        string[] _ParamNames = null,
                                        object[] _ParamVals = null,
                                        System.Data.SqlDbType[] _ParamDTs = null,
                                        bool _ShouldCache = false,
                                        string _CacheID = "") where T : new()
    {
        // Create a reference to a potential already cached IList
        IEnumerable<T> _CachedItem = _Cache.Get<IEnumerable<T>>(_CacheID);
        // If we're already cached, there's no need to fire up the data access objects, so return the cached item instead
        if (_CachedItem != null && _ShouldCache)
        {
            return _CachedItem;
        }
        else
        {
            // Fire up our data access object
            using (Access db = new Access())
            {
                try
                {
                    // create a new ilist reference of our strongly typed class
                    IEnumerable<T> _Query = null;
                    // set the query type
                    db.QueryType = _QryType;
                    // set the query text
                    db.Query = _Qry;
                    // make sure we've got some parameters, if we do the set them to our db access object
                    if (_ParamNames != null)
                    {
                        // set the parameter names
                        db.ParameterNames = _ParamNames;
                        // set the parameter values
                        db.ParameterValues = _ParamVals;
                        // set the parameter data types
                        db.ParameterDataTypes = _ParamDTs;
                    }
                    // start using our db access :)  Fire off the GetResults method and return back a SqlDataReader to work on
                    using (SqlDataReader r = db.GetResults())
                    {
                        // make sure the data reader actually exists and contains some results
                        if (r != null && r.HasRows)
                        {
                            // map the data reader to our strongly type(s)
                            _Query = Map<T>(r);
                        }
                    }
                    // check if we should cache the results
                    if (_ShouldCache)
                    {
                        // if so, set the query object to the cache
                        _Cache.Set<IEnumerable<T>>(_Query, _CacheID);
                    }
                    // return our strongly typed list
                    return _Query;
                }
                catch (Exception ex)
                {
                    // Catch an exception if any, an write it out to our logging mechanism, in addition to adding it our returnable message property
                    _Msg += "Wrapper.GetResults Exception: " + ex.Message + db.Message;
                    ErrorReporting.WriteEm.WriteItem(ex, "o7th.Class.Library.Data.Wrapper.GetResults", _Msg);
                    // make sure this method returns a default List
                    return default(IList<T>);
                }
            }
        }
    }
}

public static class Extensions
{
    /// <summary>
    /// Take a collection and split it into smaller collections
    /// </summary>
    /// <typeparam name="T">The Type</typeparam>
    /// <param name="collection">The collection to split</param>
    /// <param name="batchSize">The size of each batch</param>
    /// <returns></returns>
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> collection, int batchSize)
    {
        var nextbatch = new List<T>(batchSize);
        if (collection == null)
        {
            yield break;
        }
        foreach (T item in collection)
        {
            nextbatch.Add(item);
            if (nextbatch.Count != batchSize)
            {
                continue;
            }
            yield return nextbatch;
            nextbatch = new List<T>(batchSize);
        }
        if (nextbatch.Count > 0)
        {
            yield return nextbatch;
        }
    }
}
}

db.GetResults() is a simple ExecuteReader call that returns a SqlClient.SqlDataReader.
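
For reference, it boils down to roughly this (a simplified sketch, not the actual Access class):

// Simplified sketch of what db.GetResults() does internally;
// the real Access class also wires up the parameters.
public SqlDataReader GetResults()
{
    SqlCommand cmd = new SqlCommand(Query, _Connection);
    cmd.CommandType = QueryType;
    _Connection.Open();
    // CloseConnection ties the connection's lifetime to the returned reader
    return cmd.ExecuteReader(System.Data.CommandBehavior.CloseConnection);
}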

P.S. This is my first C# project. I am a long-time BASIC/QBasic/VB programmer =)

Here's my Test ConsoleApp:

Test

using o7th.Class.Library.Data;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

namespace Testing
{
class Program
{
    static void Main(string[] args)
    {
        long startTime = DateTime.Now.Ticks;

        IList<Typing> _T = Wrapper.GetResults<Typing>("List.ZipSearch",
                                                    System.Data.CommandType.StoredProcedure,
                                                    new string[]{"@ZipCode", "@RadiusMile"},
                                                    new object[] { "01020", 10000 },
                                                    new System.Data.SqlDbType[] { System.Data.SqlDbType.VarChar, System.Data.SqlDbType.Float},
                                                    true, "TestCache1");
        long endTime = DateTime.Now.Ticks;
        TimeSpan timeTaken = new TimeSpan(endTime - startTime);
        Console.WriteLine("Task Took: " + timeTaken + " for: " + _T.Count + " records.");

        Thread.Sleep(2000);

        long startTime2 = DateTime.Now.Ticks;
        IEnumerable<Typing> _T2 = WrapperTest.GetResults<Typing>("List.ZipSearch",
                                                    System.Data.CommandType.StoredProcedure,
                                                    new string[] { "@ZipCode", "@RadiusMile" },
                                                    new object[] { "01020", 10000 },
                                                    new System.Data.SqlDbType[] { System.Data.SqlDbType.VarChar, System.Data.SqlDbType.Float },
                                                    true, "TestCache2");

        long endTime2 = DateTime.Now.Ticks;
        TimeSpan timeTaken2 = new TimeSpan(endTime2 - startTime2);
        Console.WriteLine("Task Took: " + timeTaken2 + " for: " + _T2 + " records.");
        Console.WriteLine("");
        Console.WriteLine("Press any key to continue...");

        Console.ReadKey();
    
    }

    partial class Typing {
        public long ZipID { get; set; }
        public string ZipCode { get; set; }
        public string City { get; set; }
        public string State { get; set; }
        public string County { get; set; }
        public double Mileage { get; set; }
    }

}

}
Kevin
  • I am not looking to make the method asynchronous. I am looking to make the loop that is mapping the data returned – Kevin Oct 21 '13 at 14:20
  • Little help please? Found a better way to do this (well... sort of): http://codereview.stackexchange.com/questions/33128/dal-efficiency-help – Kevin Oct 25 '13 at 19:37

2 Answers


A tiny change I'd make if I used that code would be to change your if so that it only sets the value via PropertyInfo when required (newObject is already default(T)):

if ((info != null) && info.CanWrite && !(_Rdr.GetValue(i) is DBNull))
{
    info.SetValue(newObject, _Rdr.GetValue(i), null);
    break;
}

That will save you an extra call to default(T), and it will also save you from overwriting newObject with its own default value. It's a TINY optimization. Also, you seem to overwrite newObject multiple times, which makes me think that your if is only true once, so I've added a break to save you extra enumerations; assuming a large dataset, that could save you some time too.

How about this?

    var readerValue = _Rdr.GetValue(i);
    if ((info != null) && info.CanWrite && !(readerValue is DBNull))
    {
        info.SetValue(newObject, readerValue, null);
        break;
    }

*Edit to add more code.

Not sure whether this would improve things or not:

using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Linq;

namespace ConsoleApplication1
{
    internal class Program
    {
        private static readonly SqlToObjectReflectionMappingService MappingService = new SqlToObjectReflectionMappingService();

        private static void Main(string[] args)
        {
            // Call ConvertTable here...
        }

        private static IEnumerable<T> ConvertTable<T>(DataTable dataTable) where T : new()
        {
            return MappingService.DataTableToObjects<T>(dataTable);

        }

        public class SqlToObjectReflectionMappingService : ISqlToObjectMappingService
        {
            public T DataRowToObject<T>(DataRow row, PropertyDescriptorCollection propertyDescriptorCollection)
                where T : new()
            {
                var obj = new T();
                foreach (PropertyDescriptor propertyDescriptor in propertyDescriptorCollection)
                {
                    propertyDescriptor.SetValue(obj, row[propertyDescriptor.Name]);
                }
                return obj;
            }

            public IEnumerable<T> DataTableToObjects<T>(DataTable table) where T : new()
            {
                var obj = new T();
                var props = TypeDescriptor.GetProperties(obj);
                return table.AsEnumerable().AsParallel().Select(m => DataRowToObject<T>(m, props));
            }
        }

        public interface ISqlToObjectMappingService
        {
            T DataRowToObject<T>(DataRow row, PropertyDescriptorCollection propertyDescriptorCollection) where T : new();
            IEnumerable<T> DataTableToObjects<T>(DataTable table) where T : new();
        }
    }
}
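
To drive that, you would fill a DataTable from the database first. A minimal sketch, assuming a connectionString and the Typing type from the question (needs using System.Data.SqlClient):

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("List.ZipSearch", conn))
using (var adapter = new SqlDataAdapter(cmd))
{
    cmd.CommandType = CommandType.StoredProcedure;
    // Fill opens and closes the connection itself
    var table = new DataTable();
    adapter.Fill(table);
    IEnumerable<Typing> results = ConvertTable<Typing>(table);
}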

*Edit to add even more code.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data.Common;
using System.Data.SqlClient;
using System.Linq;
using System.Reflection;
using System.Threading.Tasks;

namespace ConsoleApplication1
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            // Call ConvertTable here
        }

        private static IEnumerable<T> Map<T>(SqlDataReader dr) where T : new()
        {
            var enumerableDataReader = dr.Cast<DbDataRecord>().AsEnumerable();
            var tObj = new T();
            PropertyInfo[] propertyInfo = tObj.GetType().GetProperties();
            var batches = enumerableDataReader.Batch(10000);

            var resultCollection = new ConcurrentBag<List<T>>();
            Parallel.ForEach(batches, batch => resultCollection.Add(MapThis<T>(propertyInfo, batch)));

            return resultCollection.SelectMany(m => m.Select(x => x));
        }

        private static List<T> MapThis<T>(PropertyInfo[] propertyInfo, IEnumerable<DbDataRecord> batch) where T : new()
        {
            var list = new List<T>();
            batch.AsParallel().ForAll(record =>
            {
                T obj = new T();
                foreach (PropertyInfo prop in propertyInfo)
                {
                    var dbVal = record[prop.Name];
                    if (!Equals(dbVal, DBNull.Value))
                    {
                        prop.SetValue(obj, dbVal, null);
                    }
                }
                list.Add(obj);
            });
            return list;
        }
    }

    public static class Extensions
    {
        /// <summary>
        /// Take a collection and split it into smaller collections
        /// </summary>
        /// <typeparam name="T">The Type</typeparam>
        /// <param name="collection">The collection to split</param>
        /// <param name="batchSize">The size of each batch</param>
        /// <returns></returns>
        public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> collection, int batchSize)
        {
            var nextbatch = new List<T>(batchSize);
            if (collection == null)
            {
                yield break;
            }
            foreach (T item in collection)
            {
                nextbatch.Add(item);
                if (nextbatch.Count != batchSize)
                {
                    continue;
                }
                yield return nextbatch;
                nextbatch = new List<T>(batchSize);
            }
            if (nextbatch.Count > 0)
            {
                yield return nextbatch;
            }
        }
    }
}
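
One caveat with MapThis as written: List<T>.Add is not thread-safe, so calling it from ForAll across threads can silently drop entries, which would line up with the differing record counts mentioned in the comments below. A minimal thread-safe variant using ConcurrentBag<T>:

private static List<T> MapThis<T>(PropertyInfo[] propertyInfo, IEnumerable<DbDataRecord> batch) where T : new()
{
    // ConcurrentBag<T> is safe for concurrent writers, unlike List<T>
    var bag = new ConcurrentBag<T>();
    batch.AsParallel().ForAll(record =>
    {
        T obj = new T();
        foreach (PropertyInfo prop in propertyInfo)
        {
            var dbVal = record[prop.Name];
            if (!Equals(dbVal, DBNull.Value))
            {
                prop.SetValue(obj, dbVal, null);
            }
        }
        bag.Add(obj);
    });
    return bag.ToList();
}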
Faraday
  • gotcha. that part seems to work pretty well... not much of an improvement. attempting the `.ForAll` bit now and getting `A first chance exception of type 'System.InvalidOperationException' occurred in System.Data.dll` inside the .ForAll – Kevin Oct 21 '13 at 17:49
  • Aye, got about 5% increase. Never heard of ConcurrentBag? – Kevin Oct 21 '13 at 18:48
  • Yeah, there now :) Seems to be almost exactly like an IList. I am getting a couple errors though now. `Index was outside the bounds of the array.` Updating code in a sec... – Kevin Oct 21 '13 at 18:56
  • I'm a tool. `var dbVal = record[prop.Name];` durr – Kevin Oct 21 '13 at 18:59
  • It significantly slowed the process down by over 1.5 seconds for 81178 records. – Kevin Oct 21 '13 at 19:01
  • I'm guessing your units of work are too small to be made parallel. Perhaps we could batch your datareader in to groups of 10k or so and call a method similar to this for each batch (in parallel)? – Faraday Oct 21 '13 at 19:05
  • but @Vijay... 10000 < 81178 ;) I appreciate your help with this :) – Kevin Oct 21 '13 at 19:08
  • @o7thWebDesign - See updated code. It's a first pass at a DataTable implementation. Not sure if it would be faster or slower than your current implementation... – Faraday Oct 21 '13 at 19:48
  • nice thanks. Seems to be a bit slower, however it does allow a better implementation of TPL – Kevin Oct 21 '13 at 19:51
  • Try changing `return table.AsEnumerable().AsParallel().Select(m => DataRowToObject(m, props));` to `return from DataRow row in table.Rows select DataRowToObject(row);` – Faraday Oct 21 '13 at 20:00
  • @o7thWebDesign - I added another approach which might be slightly faster. – Faraday Oct 21 '13 at 20:04
  • much obliged. I will try this out later on. I like the idea of the batches... could probably use that in some kind of 'paging' add-on as well :) I'll let you know results once I am able to try it out – Kevin Oct 22 '13 at 12:17
  • Seems to almost double the speed! But it may just be SQL server caching the SP I'm running. It's a zipcode radius lookup... selecting all results from a 10000 mile radius from my current zipcode. I'll post my test set up in the question – Kevin Oct 22 '13 at 12:52
  • I spoke too soon. It's not faster; in fact, the same query returns 2 different record counts and is .3sec slower. It also doesn't return the same number of records... – Kevin Oct 22 '13 at 12:56
  • @o7thWebDesign - Which approach are you talking about? I'm not sure either can vary the record count. Check your SQL, and/or check the datatable before/after. Could be a threading issue, but I'm not sure why/how that would cause missing rows. – Faraday Oct 22 '13 at 13:51
  • using your bottom code directly with the return datareader. i'll update the question – Kevin Oct 22 '13 at 20:14
  • p.s. Method 1 (the original) returns 81178 records @ 02.3881secs. Method 2 returns 80895 @ 2.9512secs. Running the query against the db directly results in 81178 records – Kevin Oct 22 '13 at 20:20
  • Also just tried this methodology using a DataTable instead of the data reader. Considerably slower... by about 5 seconds. And it still returns fewer records than expected. – Kevin Oct 22 '13 at 20:55
  • For the one with batching, you would benefit from profiling the code to find your optimal batch size. It would likely depend on the data being mapped and the size of your data. Perhaps even remove the batching altogether and see how that performs? – Faraday Oct 22 '13 at 21:05
  • I think I got it. I added some "proper" `async` and `await`s to both my access and Map methods (see the sketch after these comments). I can't tell if it's creating multiple threads though. I am watching the CPU usage in Task Manager, and both the non-async and the async methods seem to use all 4 CPU cores... also, for these 81k records, the async methods seem to take slightly longer: non-async takes 2.272 seconds, where the async method takes 2.600 seconds. I may ask a new question about this though... I would normally say it's my system... but a 4-core 4G AMD with 16GB of RAM... – Kevin Oct 22 '13 at 21:53
  • I am, however, getting better results in terms of timing when I tone down the number of records to return: about 50% faster utilizing `async`/`await`. – Kevin Oct 22 '13 at 22:05
  • Could you post your best approach? I'll see if I can spot anything else worth trying... – Faraday Oct 22 '13 at 23:23
  • is there something like jsfiddle I could post it to? – Kevin Oct 23 '13 at 11:53
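
For reference, the async/await variant described in the comments above would look roughly like this. A hypothetical sketch only: cmd is assumed to be an already-configured SqlCommand and mapRow maps a single row (not the actual Access/Map code; requires .NET 4.5 and using System.Threading.Tasks):

// Hypothetical async read loop; 'cmd' and 'mapRow' are assumptions,
// not the code under review.
private static async Task<List<T>> MapAsync<T>(SqlCommand cmd, Func<SqlDataReader, T> mapRow)
{
    var results = new List<T>();
    using (SqlDataReader r = await cmd.ExecuteReaderAsync())
    {
        // ReadAsync frees the thread while waiting on I/O;
        // the mapping itself is still CPU-bound, which is why
        // async alone won't speed up the per-row work.
        while (await r.ReadAsync())
        {
            results.Add(mapRow(r));
        }
    }
    return results;
}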

Are you aware that each time you call String.ToUpper() you are creating a new string just to throw it away? And doing so for each record?

I presume you're using a Hashtable; you might be better off with:

_ht = new Dictionary<string, PropertyInfo>(StringComparer.OrdinalIgnoreCase);

Then you could just use it like this:

PropertyInfo info = _ht[_Rdr.GetName(i)];
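
Putting those two pieces together, here is a sketch (reusing your _Rdr / newObject names) of building the lookup once and using it per row:

// Build the case-insensitive lookup once per mapped type...
var _ht = new Dictionary<string, PropertyInfo>(StringComparer.OrdinalIgnoreCase);
foreach (PropertyInfo p in typeof(T).GetProperties())
{
    _ht[p.Name] = p;   // keys compare case-insensitively, no ToUpper() needed
}

// ...then resolve each column by name with no extra string allocations.
for (int i = 0; i < _Rdr.FieldCount; i++)
{
    PropertyInfo info;
    object val = _Rdr.GetValue(i);
    if (_ht.TryGetValue(_Rdr.GetName(i), out info) && info.CanWrite && !(val is DBNull))
    {
        info.SetValue(newObject, val, null);
    }
}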

You might want to look at Parallel.For or Parallel.ForEach if you want to parallelize it.

But all that wouldn't avoid massive use of reflection.

But what I really think you should do is build a mapper (and, probably, cache it).

If you don't want to go the route of emitting IL, you might want to use expression trees:
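
Something along these lines; a minimal sketch of a compiled expression-tree mapper (the names are illustrative, and you'd cache one Mapper<T> per type):

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq.Expressions;
using System.Reflection;

public static class Mapper<T> where T : new()
{
    // Property setters compiled once per type via expression trees,
    // replacing per-row reflection (PropertyInfo.SetValue) calls.
    private static readonly Dictionary<string, Action<T, object>> Setters = BuildSetters();

    private static Dictionary<string, Action<T, object>> BuildSetters()
    {
        var setters = new Dictionary<string, Action<T, object>>(StringComparer.OrdinalIgnoreCase);
        foreach (PropertyInfo prop in typeof(T).GetProperties())
        {
            if (!prop.CanWrite) { continue; }
            ParameterExpression target = Expression.Parameter(typeof(T), "target");
            ParameterExpression value = Expression.Parameter(typeof(object), "value");
            // Equivalent to: target.Prop = (PropType)value;
            BinaryExpression assign = Expression.Assign(
                Expression.Property(target, prop),
                Expression.Convert(value, prop.PropertyType));
            setters[prop.Name] =
                Expression.Lambda<Action<T, object>>(assign, target, value).Compile();
        }
        return setters;
    }

    public static List<T> Map(SqlDataReader reader)
    {
        var results = new List<T>();
        while (reader.Read())
        {
            var obj = new T();
            for (int i = 0; i < reader.FieldCount; i++)
            {
                Action<T, object> set;
                object val = reader.GetValue(i);
                if (!(val is DBNull) && Setters.TryGetValue(reader.GetName(i), out set))
                {
                    set(obj, val);
                }
            }
            results.Add(obj);
        }
        return results;
    }
}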

Paulo Morgado