
I wrote this in VB.NET, but I am comfortable with C# as well. I have a list of files that I want to find on a Windows file system, and the file name determines which directory I need to look in. The list is compiled at the beginning of the program (that part works) and is stored in an unsorted DataTable. Here is my approach.

DataTable List of Files (this can vary from day to day, sometimes in the 1,000s+)

- a_111.txt
- a_222.txt
- b_333.txt
- a_444.txt
- c_555.txt
- b_666.txt

Directories to look into based on file name

C:\a\ -- for files beginning with a (variable name is A_folder)
C:\b\ -- for files beginning with b (variable name is B_folder)
C:\c\ -- for files beginning with c (variable name is C_folder)

Code:

If DataTableofFiles IsNot Nothing AndAlso DataTableofFiles.Rows.Count > 0 Then 
  For Each row as DataRow In DataTableofFiles.Rows
    If row("FILENAME").ToString.StartsWith("a") Then
      Dim a_WriteResultstoA as String = "a.csv"
      functionfindfiles(A_folder, row("FILENAME").ToString, a_WriteResultstoA)
    ElseIf row("FILENAME").ToString.StartsWith("b") Then
      Dim b_WriteResultstoB as String = "b.csv"
      functionfindfiles(B_folder, row("FILENAME").ToString, b_WriteResultstoB)
    ElseIf row("FILENAME").ToString.StartsWith("c") Then
      Dim c_WriteResultstoC as String = "c.csv"
      functionfindfiles(C_folder, row("FILENAME").ToString, c_WriteResultstoC)
    End If
  Next
End If

Private Sub functionfindfiles(sourcefolder As String, filename as String, writetofile As String)
        Try
            For Each f As String In Directory.EnumerateFiles(sourcefolder, "*.*", SearchOption.AllDirectories)  '<-- file enumeration
                    If Path.GetFileName(f).Equals(filename, StringComparison.OrdinalIgnoreCase) Then
                        Using fs As New FileStream(writetofile, FileMode.Append, FileAccess.Write, FileShare.Write)
                            Using sw As StreamWriter = New StreamWriter(fs)
                                If Not New FileInfo(writetofile).Length > 0 Then
                                    For i As Integer = 0 To DataTableofFiles.Columns.Count - 1 Step 1
                                        sw.Write(DataTableofFiles.Columns(i).ToString)

                                        If i < DataTableofFiles.Columns.Count - 1 Then
                                            sw.Write(",")
                                        End If
                                    Next

                                    sw.WriteLine()
                                End If

                                For Each row As DataRow In DataTableofFiles.Rows
                                    If row("FILENAME").ToString = filename Then
                                        For i As Integer = 0 To DataTableofFiles.Columns.Count - 1 Step 1
                                            If Not Convert.IsDBNull(row(i)) Then
                                                sw.Write(row(i).ToString.Replace(vbLf, "").Replace(",", ";"))
                                            End If

                                            If i < DataTableofFiles.Columns.Count - 1 Then
                                                sw.Write(",")
                                            End If
                                        Next

                                        sw.WriteLine()
                                    End If
                                Next
                            End Using
                        End Using
                    Else
                        'write results that are not found here to a file
                    End If
            Next
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
End Sub

In this case, the file system is enumerated 6 times, once per row. Execution can take a really long time if there are a lot of files in the directories. Is there a better approach that reduces the number of file enumerations? Are there other areas in the code where operations are performed more often than needed? Any advice is greatly appreciated. Thanks!

Mark
Jayarikahs
  • Reverse your `foreach` loops. Right now you do `foreach (fileName in list){ foreach (file in EnumerateFiles){ }}`, but you could just reverse them to `foreach (file in EnumerateFiles) { foreach (fileName in list){ }}`. You'll have to change the architecture of what your methods do and how you call them, but logically the double `foreach` is all you're doing, and if the code were all inlined you could trivially reverse the order of the loops to enumerate only once. – Quantic Dec 07 '16 at 22:48
  • Thanks for the reply. If I reversed the order, I will not know what "folder path" to pass for the ForEach to search through. – Jayarikahs Dec 07 '16 at 23:17
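One way to keep the folder path known while still building the lists up front is to map each prefix to its folder and group the file names by folder. The following is only a sketch under assumptions: `GroupFileNamesByFolder` and `prefixToFolder` are hypothetical names that do not appear in the question, and the prefix is assumed to be the first character of `FILENAME`.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

Module FolderGrouping
    ' Hypothetical helper (not from the original post): groups the FILENAME
    ' values of the table by the folder their first character maps to.
    Function GroupFileNamesByFolder(table As DataTable, prefixToFolder As Dictionary(Of String, String)) As Dictionary(Of String, HashSet(Of String))
        Dim result As New Dictionary(Of String, HashSet(Of String))(StringComparer.OrdinalIgnoreCase)
        If table Is Nothing Then Return result

        For Each row As DataRow In table.Rows
            ' Read the cell once; DBNull converts to an empty string.
            Dim name As String = row("FILENAME").ToString()
            If name.Length = 0 Then Continue For

            ' Skip names whose prefix has no folder assigned.
            Dim folder As String = Nothing
            If Not prefixToFolder.TryGetValue(name.Substring(0, 1), folder) Then Continue For

            Dim names As HashSet(Of String) = Nothing
            If Not result.TryGetValue(folder, names) Then
                names = New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)
                result(folder) = names
            End If
            names.Add(name)
        Next

        Return result
    End Function
End Module
```

A caller could build the map as `New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase) From {{"a", A_folder}, {"b", B_folder}, {"c", C_folder}}` and then search each folder in the result exactly once, so the folder path is always available alongside its list of names.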

1 Answer


You are not enumerating everything 6 times in your example: you are enumerating folder A 3 times, folder B 2 times and folder C 1 time. To reduce these extra enumerations you could pre-process the data table to build a list of filenames for each folder, then modify your method to work on a list of filenames instead of a single filename. I don't write VB, so here's an answer that mixes in C# code (sorry, I couldn't fit my ideas into a comment; this is a poor answer as it doesn't compile).

Note that all I did to your method was add `foreach (var filename in listOfFileNames)` and change the signature to accept a `List<string> listOfFileNames` instead of just `string filename`. The caller now builds the lists and finishes the data table loop completely before calling the method once for each folder.

If DataTableofFiles IsNot Nothing AndAlso DataTableofFiles.Rows.Count > 0 Then 

  ' Declared before the loop so they are still in scope for the calls below
  Dim a_WriteResultstoA as String = "a.csv"
  Dim b_WriteResultstoB as String = "b.csv"
  Dim c_WriteResultstoC as String = "c.csv"

  List<string> allAFileNames = new List<string>();
  List<string> allBFileNames = new List<string>();
  List<string> allCFileNames = new List<string>();

  For Each row as DataRow In DataTableofFiles.Rows
    If row("FILENAME").ToString.StartsWith("a") Then

      allAFileNames.Add(row("FILENAME").ToString);

    ElseIf row("FILENAME").ToString.StartsWith("b") Then

      allBFileNames.Add(row("FILENAME").ToString);

    ElseIf row("FILENAME").ToString.StartsWith("c") Then

      allCFileNames.Add(row("FILENAME").ToString);

    End If
  Next

  if (allAFileNames.Count > 0)
  {
        functionfindfiles(A_folder, allAFileNames, a_WriteResultstoA);
  }

  if (allBFileNames.Count > 0)
  {
        functionfindfiles(B_folder, allBFileNames, b_WriteResultstoB);
  }

  if (allCFileNames.Count > 0)
  {
        functionfindfiles(C_folder, allCFileNames, c_WriteResultstoC);
  }

End If

Private Sub functionfindfiles(sourcefolder As String, List<string> listOfFileNames, writetofile As String)
        Try
            For Each f As String In Directory.EnumerateFiles(sourcefolder, "*.*", SearchOption.AllDirectories)  '<-- file enumeration

                    foreach (var filename in listOfFileNames)
                    {

                    If Path.GetFileName(f).Equals(filename, StringComparison.OrdinalIgnoreCase) Then
                        Using fs As New FileStream(writetofile, FileMode.Append, FileAccess.Write, FileShare.Write)
                            Using sw As StreamWriter = New StreamWriter(fs)
                                If Not New FileInfo(writetofile).Length > 0 Then
                                    For i As Integer = 0 To DataTableofFiles.Columns.Count - 1 Step 1
                                        sw.Write(DataTableofFiles.Columns(i).ToString)

                                        If i < DataTableofFiles.Columns.Count - 1 Then
                                            sw.Write(",")
                                        End If
                                    Next

                                    sw.WriteLine()
                                End If

                                For Each row As DataRow In DataTableofFiles.Rows
                                    If row("FILENAME").ToString = filename Then
                                        For i As Integer = 0 To DataTableofFiles.Columns.Count - 1 Step 1
                                            If Not Convert.IsDBNull(row(i)) Then
                                                sw.Write(row(i).ToString.Replace(vbLf, "").Replace(",", ";"))
                                            End If

                                            If i < DataTableofFiles.Columns.Count - 1 Then
                                                sw.Write(",")
                                            End If
                                        Next

                                        sw.WriteLine()
                                    End If
                                Next
                            End Using
                        End Using
                    Else
                        'write results that are not found here to a file
                    End If
                    }
            Next
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
End Sub
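
For reference, the single-pass search could be sketched in VB.NET alone roughly as follows. This is an illustration, not a drop-in replacement: `FindWantedFiles` is a hypothetical name, it replaces the inner loop over the list with a case-insensitive `HashSet(Of String)` lookup, and it assumes that one hit per file name is enough (it stops early once every name has been found, so duplicate files in other subfolders are not reported).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module SinglePassSearch
    ' Hypothetical helper: enumerates sourceFolder once and returns the
    ' wanted file names that exist somewhere beneath it.
    Function FindWantedFiles(sourceFolder As String, wantedNames As IEnumerable(Of String)) As HashSet(Of String)
        Dim wanted As New HashSet(Of String)(wantedNames, StringComparer.OrdinalIgnoreCase)
        Dim found As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)

        For Each f As String In Directory.EnumerateFiles(sourceFolder, "*", SearchOption.AllDirectories)
            Dim name As String = Path.GetFileName(f)

            ' Set lookup instead of looping over every wanted name.
            If wanted.Contains(name) Then
                found.Add(name)

                ' Assumption: one hit per name is enough, so stop once all are found.
                If found.Count = wanted.Count Then Exit For
            End If
        Next

        Return found
    End Function
End Module
```

With the found names in hand, the CSV could be written once per folder after the enumeration, opening the `StreamWriter` a single time instead of once per match. Any name in the wanted list that is missing from the returned set is a "not found" result.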
Quantic
  • I see what you mean now. It makes more sense to compile the lists based on file names first and then pass it in 1 go. Optionally, because I wanted to find a specific file in a specific folder, I re-wrote it to use FileInfo(filename).Exists. I will test your code and will see how both options perform against 1,000s of files. Thanks again for all your help. – Jayarikahs Dec 07 '16 at 23:50
  • If you aren't reusing the `FileInfo` then it may be more performant to just do a `File.Exists()` as it doesn't have to build up an entire `FileInfo` view of the file. Also I don't know much about data tables but if `row("FILENAME")` does an actual lookup and if you know the rows aren't changing then it should be more performant to store the value once and recall the stored value: `string thisFilename = row("FILENAME")`. I.e., `row("FILENAME")` may take 2 seconds to run each time, but recalling the stored variable is basically free. – Quantic Dec 07 '16 at 23:56
  • I tested both FileInfo().Exists and File.Exists() and the results are very similar. Sometimes FileInfo was faster and other times File was faster, but with each trial it was only by 1 second. I tested against 1,000 give or take records. Perhaps if the volume was much larger, the result may differ. Thanks again for all your help! – Jayarikahs Dec 09 '16 at 23:02
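The direct-lookup variant discussed in these comments could look roughly like the sketch below. It assumes each file is expected directly inside its folder, so unlike the `SearchOption.AllDirectories` enumeration it does not look in subfolders. `RowsWithExistingFile` is a hypothetical name, and the `FILENAME` cell is read once per row as suggested above.

```vbnet
Imports System.Collections.Generic
Imports System.Data
Imports System.IO

Module DirectLookup
    ' Hypothetical helper: returns the rows whose file exists directly
    ' inside folder. No directory enumeration and no subfolder search.
    Function RowsWithExistingFile(table As DataTable, folder As String) As List(Of DataRow)
        Dim matches As New List(Of DataRow)()

        For Each row As DataRow In table.Rows
            ' Store the cell value once instead of indexing the row repeatedly.
            Dim name As String = row("FILENAME").ToString()

            If name.Length > 0 AndAlso File.Exists(Path.Combine(folder, name)) Then
                matches.Add(row)
            End If
        Next

        Return matches
    End Function
End Module
```

This trades one enumeration of the whole tree for one existence check per row, which tends to pay off when the folders hold far more files than the list does.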