I have an Access Database with a table having around 3,500,000 rows. One of its column is a 10-digit 'Phone_No'. Performing a search on such a huge table would take too long. So i decided to split that one table into multiple tables based on first 5 digits of the 'Phone_No'.
For example, if I have a 'Phone_No' value as 1234500000 then I will create a new table named 12345 and insert into it all rows which have 'Phone_No' starting with 12345 i.e. values like 1234500001, 1234500002, etc.
This way if I have to search for a 'Phone_No' I just need to do a 'SELECT * FROM 12345 WHERE Phone_No = 1234500000'. This would narrow down my search to a very small dataset comparatively.
The sample code is as shown below
int iRow = 0;
int iRowCount;
string strIndex;
string strOutputTableName;
DataTable dtOutput;
DataRow dr;
try
{
// Get staging row count
iRowCount = dtStaging.Rows.Count;
dsOutput = new DataSet();
// Iterate each row for splitting
while (dtStaging.Rows.Count > 0)
{
// Increment row count
iRow++;
// Get current Datarow
dr = dtStaging.Rows[0];
// Get column to be indexed by splitting
strIndex = dr["Phone_No"].ToString();
// Set output table name
strOutputTableName = strIndex.Substring(0, 5);
try
{
// Create new datatable
dtOutput = new DataTable(strOutputTableName);
// Add columns
dtOutput.Columns.Add("Phone_No", typeof(string));
dtOutput.Columns.Add("Subscriber_Name", typeof(string));
dtOutput.Columns.Add("Address_Line1", typeof(string));
dtOutput.Columns.Add("Address_Line2", typeof(string));
dtOutput.Columns.Add("Address_Line3", typeof(string));
dtOutput.Columns.Add("City", typeof(string));
dtOutput.Columns.Add("Pin_Code", typeof(string));
dtOutput.Columns.Add("SIM_Activation_Date", typeof(DateTime));
// Add datatable to dataset
dsOutput.Tables.Add(dtOutput);
}
// catch table already exists error and proceed
catch
{
}
// Import current datarow from staging to output table
dsOutput.Tables[strOutputTableName].ImportRow(dr);
// Report progress
bwSplit.ReportProgress((iRow * 100) / iRowCount);
// Remove current datarow to release some memory
dtStaging.Rows.RemoveAt(0);
}
}
catch
{
throw;
}
finally
{
dtStaging = null;
}
Once my Dataset dsOutput is prepared, I insert it into another Access Database file. The problem is this splitting process takes ages. I am looking for an optimized version of the above code. Or any different approach which would make the splitting process faster.