
We've got a system that seems to be consuming a lot of data; it uses Dapper for database queries and Seq for logging. Other than with SQL Profiler, is there a way to add logging to Dapper that records the size of the returned dataset in MB, so we can flag large datasets for review?

This question has been asked a while ago, but I was wondering whether there is now a way of doing it without Wireshark, and ideally without iterating over the rows/cells.

Tim

4 Answers


I would configure Provider Statistics for SQL Server on the connection in the base repository class. You can add a config setting to switch it on, and easily save this information off to a log file or wherever else you want.

Example code from MSDN

using System;
using System.Collections;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

namespace CS_Stats_Console_GetValue
{
  class Program
  {
    static void Main(string[] args)
    {
      string connectionString = GetConnectionString();

      using (SqlConnection awConnection = 
        new SqlConnection(connectionString))
      {
        // StatisticsEnabled is False by default.
        // It must be set to True to start the 
        // statistic collection process.
        awConnection.StatisticsEnabled = true;

        string productSQL = "SELECT * FROM Production.Product";
        SqlDataAdapter productAdapter = 
          new SqlDataAdapter(productSQL, awConnection);

        DataSet awDataSet = new DataSet();

        awConnection.Open();

        productAdapter.Fill(awDataSet, "ProductTable");
        // Retrieve the current statistics as
        // a collection of values at this point
        // and time.
        IDictionary currentStatistics =
          awConnection.RetrieveStatistics();

        Console.WriteLine("Total Counters: " +
          currentStatistics.Count.ToString());
        Console.WriteLine();

        // Retrieve a few individual values
        // related to the previous command.
        long bytesReceived =
            (long) currentStatistics["BytesReceived"];
        long bytesSent =
            (long) currentStatistics["BytesSent"];
        long selectCount =
            (long) currentStatistics["SelectCount"];
        long selectRows =
            (long) currentStatistics["SelectRows"];

        Console.WriteLine("BytesReceived: " +
            bytesReceived.ToString());
        Console.WriteLine("BytesSent: " +
            bytesSent.ToString());
        Console.WriteLine("SelectCount: " +
            selectCount.ToString());
        Console.WriteLine("SelectRows: " +
            selectRows.ToString());

        Console.WriteLine();
        Console.WriteLine("Press any key to continue");
        Console.ReadLine();
      }

    }
    private static string GetConnectionString()
    {
      // To avoid storing the connection string in your code,
      // you can retrieve it from a configuration file.
      return "Data Source=localhost;Integrated Security=SSPI;" + 
        "Initial Catalog=AdventureWorks";
    }
  }
}
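
Applied to the question's Dapper/Seq setup, a minimal sketch might look like the following. It assumes Serilog with its Seq sink is already configured; the BaseRepository class, its Query method, and the log message are illustrative, not an established pattern:

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using Dapper;
using Serilog;

public class BaseRepository
{
    private readonly string _connectionString;

    public BaseRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    public IEnumerable<T> Query<T>(string sql, object param = null)
    {
        using (var connection = new SqlConnection(_connectionString))
        {
            // Could be driven by a config setting so the overhead
            // is only paid when you want to collect sizes.
            connection.StatisticsEnabled = true;
            connection.Open();

            var results = connection.Query<T>(sql, param).ToList();

            // BytesReceived is the TDS payload for this connection,
            // so it approximates the size of the returned dataset.
            var stats = connection.RetrieveStatistics();
            long bytesReceived = (long)stats["BytesReceived"];
            double sizeMb = bytesReceived / (1024.0 * 1024.0);

            // With the Seq sink configured, this lands in Seq where
            // large datasets can be flagged for review.
            Log.Information("Query returned {RowCount} rows ({SizeMB:F2} MB): {Sql}",
                results.Count, sizeMb, sql);

            return results;
        }
    }
}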
codingbadger

Not really a complete answer, but it might help you out.

sys.dm_exec_query_stats and sys.dm_exec_connections might help you trace large result sets. For example:

SELECT * FROM sys.dm_exec_query_stats 
CROSS APPLY sys.dm_exec_sql_text(sql_handle)

(The I/O columns, such as total_logical_reads, are measured in 8 KB pages.)

This sort of gives you what you're using Wireshark for at the moment (kinda :)

SELECT * FROM sys.dm_exec_connections
CROSS APPLY sys.dm_exec_sql_text(most_recent_sql_handle)
ORDER BY num_writes DESC

https://msdn.microsoft.com/en-us/library/ms189741.aspx

https://msdn.microsoft.com/en-AU/library/ms181509.aspx

Liesel
  • Thanks Les, ideally we need it in context of the underlying data connection but that's a really useful snippet – Tim Aug 08 '16 at 11:19

You can estimate the size required by a row by summing up each column's type size, then multiplying by the number of rows. This should be reasonably accurate as long as you don't have variable-length types such as TEXT / VARCHAR in your query:

int rowSize = 0;
foreach (DataColumn dc in Dataset1.Tables[0].Columns)
{
    // sizeof() only works with compile-time type names, so use
    // Marshal.SizeOf to get the size of the column's runtime type.
    // (Throws for reference types such as strings.)
    rowSize += System.Runtime.InteropServices.Marshal.SizeOf(dc.DataType);
}
int dataSize = rowSize * Dataset1.Tables[0].Rows.Count;

In case you need a more accurate figure, sum up the size of each individual value using Marshal.SizeOf:

int dataSize = 0;
foreach (DataRow dr in Dataset1.Tables[0].Rows)
{
    int rowSize = 0;
    for (int i = 0; i < Dataset1.Tables[0].Columns.Count; i++)
    {
        // Note: Marshal.SizeOf works for blittable value types; it
        // throws for reference types such as strings, so variable-
        // length columns still need special handling.
        rowSize += System.Runtime.InteropServices.Marshal.SizeOf(dr[i]);
    }
    dataSize += rowSize;
}

Ideas for performance gain if high accuracy is not a concern:

  1. Compute the size of just a sample. For example, instead of iterating through all rows, pick 1 in every 100, then multiply your result by 100 at the end (see the sketch after this list).
  2. Use Marshal.SizeOf (https://msdn.microsoft.com/en-us/library/y3ybkfb3.aspx) to compute the size of each DataRow dr instead of iterating through all its values. It will give you a higher number, since a DataRow object has additional properties, but that's something you can tweak by subtracting the size of an empty DataRow.
  3. Know the average size of a single row beforehand, from its columns, and just multiply by the number of rows.
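
A minimal sketch of idea 1 (the stride of 100 is arbitrary; accuracy depends on how much row sizes vary):

int stride = 100;
int sampleBytes = 0;
int sampledRows = 0;
DataTable table = Dataset1.Tables[0];
for (int r = 0; r < table.Rows.Count; r += stride)
{
    for (int c = 0; c < table.Columns.Count; c++)
    {
        // Same caveat as above: Marshal.SizeOf throws for strings.
        sampleBytes += System.Runtime.InteropServices.Marshal.SizeOf(table.Rows[r][c]);
    }
    sampledRows++;
}
// Extrapolate the average sampled row size to the whole table.
int estimatedSize = sampledRows == 0
    ? 0
    : (sampleBytes / sampledRows) * table.Rows.Count;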
Paulo Amaral

We got around this limitation by simply capping/limiting the query size, which saves us from having to worry about the size or from running double queries. The PR we used, which you could also use, is https://github.com/DapperLib/Dapper/pull/1758/files
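
Independent of that PR, a hedged sketch of the same idea enforced at the SQL level (the Product type, row cap, and logging here are illustrative, not what the PR implements; connection is an open SqlConnection):

const int maxRows = 10000; // hypothetical cap
// Fetch one extra row so we can tell whether the cap was exceeded.
var rows = connection.Query<Product>(
    "SELECT TOP (@Limit) * FROM Production.Product",
    new { Limit = maxRows + 1 }).ToList();

if (rows.Count > maxRows)
{
    Log.Warning("Result set exceeded the row cap of {MaxRows}", maxRows);
    rows.RemoveAt(rows.Count - 1); // keep only the capped rows
}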

Andrew G