
I need to copy files from one directory to another, depending on whether each file's name exists in a table of a SQL database.

For this I use the following code:

using (SqlConnection connection = new SqlConnection("connection string / data source"))
{
  connection.Open();

  using (SqlCommand cmd = new SqlCommand("SELECT idPic, namePicFile FROM DocPicFiles", connection))
  using (SqlDataReader reader = cmd.ExecuteReader())
  {
    while (reader.Read())
    {
      // picList is an array that contains all the file names in a directory
      string namePicFile = reader["namePicFile"].ToString();
      string sourceFile = picList.FirstOrDefault(s => s.Contains(namePicFile));

      if (sourceFile != null)
      {
        File.Copy(sourceFile, Path.Combine("the destination directory", namePicFile + ".jpg"), false);
      }
    }
  }
}

Is there any way this can be done in less time? It takes 1 hour to do that for 20,876 records.

ger
  • How much of this time is used by the File.Copy and how much is used to loop and find the names? If you don't measure, you cannot figure out where the problem is and search for viable (and existing) solutions. – Steve Aug 26 '15 at 21:18
  • Split your method into one that returns a List containing all the files you want to copy, then send that list into another method which performs the copying. Then you can measure both methods and figure out where the bottleneck is. – Kvam Aug 26 '15 at 21:21
  • if you can run cmd here then my solution will help. https://stackoverflow.com/questions/4743094/how-can-i-move-all-the-files-from-one-folder-to-another-using-the-command-line – pankaj Aug 13 '20 at 17:27

4 Answers


File.Copy is as fast as it gets. Keep in mind that you depend on the file-transfer speed dictated by your hardware, and with 20,000 files the latency of data access also comes into play. If you are doing this on an HDD, you could see a big improvement after switching to an SSD or some other fast medium.

For this case alone, most likely the hardware is your bottleneck.

EDIT: I consider keeping the database connection open for such a long time to be bad practice. I suggest you fetch all the needed data into some in-memory cache (array, list, whatever) and then iterate through that as you copy the files. A db connection is a precious resource, and in applications that must handle high concurrency (but not only those), releasing the connection quickly is a must.
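
For illustration, a minimal sketch of that "read first, copy afterwards" idea could look like the following (the table and column names come from the question; sourceDir, destDir and the connection string are placeholders, and the appended ".jpg" just mirrors the original code):

string sourceDir = "source directory";          // placeholder paths, not real values
string destDir = "destination directory";
List<string> fileNames = new List<string>();

// Phase 1: read all the file names, then release the connection.
using (SqlConnection connection = new SqlConnection("connection string"))
using (SqlCommand cmd = new SqlCommand("SELECT namePicFile FROM DocPicFiles", connection))
{
  connection.Open();
  using (SqlDataReader reader = cmd.ExecuteReader())
  {
    while (reader.Read())
    {
      fileNames.Add(reader.GetString(0));
    }
  }
} // the connection is closed here, before any file I/O starts

// Phase 2: do the slow copying without holding any database resources.
foreach (string name in fileNames)
{
  string source = Path.Combine(sourceDir, name);
  if (File.Exists(source))
  {
    File.Copy(source, Path.Combine(destDir, name + ".jpg"), false);
  }
}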

Mihai Caracostea
  • Yeah! Keeping the connection open was eating up resources. I changed to a list of "DocPicFiles" objects, and that change cut the time down significantly: the copy now takes 35 minutes!! – ger Aug 27 '15 at 14:18
  • @ger: It was surprising for me to see such a big difference in time from that little change. If possible, can you run your code again with and without keeping the connection open? This way we will know that there's not any other factor at play. – displayName Aug 31 '15 at 03:06

Allow me to make a guess - No. There is no way to do it faster.

How come I am so confident? Because file copying requires talking to the disk, and that is a horribly slow operation. Even further, if you try to go for multi-threading, the results will be slower instead of faster, because the mechanical operation of moving the head over the disk is no longer sequential, which it may have been earlier by chance.

See answers to this question I asked earlier.

Switch to SSDs if you aren't yet using them, otherwise you are getting the best already.

Below is something to put into perspective what "slow" means for disk writes compared to caches: if a cache access took 10 minutes, a read from disk would take about 2 years on the same scale. All the access times are shown in the image below. Clearly, when your code executes, the bottleneck will be the disk writes. The best you can do is to let the disk writes stay sequential.

[Image: access latencies of the memory hierarchy (caches, RAM, disk) rescaled to human time]
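
In code, "keep the writes sequential" mostly means: don't parallelize, and copy in one ordered pass. A rough, hypothetical sketch (the pairs list of source/destination paths is assumed to have been built from the database beforehand):

// pairs is assumed to be a List<(string Source, string Destination)> built beforehand
// from the database query. Sorting by source path is only a crude proxy for on-disk
// order, but a single-threaded, sorted pass is about as sequential as you can ask for.
foreach (var pair in pairs.OrderBy(p => p.Source, StringComparer.OrdinalIgnoreCase))
{
  File.Copy(pair.Source, pair.Destination, false);
}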

displayName
  • https://stackoverflow.com/questions/4743094/how-can-i-move-all-the-files-from-one-folder-to-another-using-the-command-line – pankaj Aug 13 '20 at 17:28

Since your I/O subsystem is almost certainly the bottleneck here, using the Task Parallel Library is probably about as good as it gets:

static void Main(string[] args)
{
  DirectoryInfo source      = new DirectoryInfo( args[0] ) ;
  DirectoryInfo destination = new DirectoryInfo( args[1] ) ;

  HashSet<string> filesToBeCopied = new HashSet<string>( ReadFileNamesFromDatabase() , StringComparer.OrdinalIgnoreCase ) ;

  // you'll probably have to play with MaxDegreeOfParallelism so as to avoid swamping the i/o system
  ParallelOptions options= new ParallelOptions { MaxDegreeOfParallelism = 4 } ;

  Parallel.ForEach( filesToBeCopied.SelectMany( fn => source.EnumerateFiles( fn ) ) , options , fi => {
      string destinationPath = Path.Combine( destination.FullName , Path.ChangeExtension( fi.Name , ".jpg") ) ;
      fi.CopyTo( destinationPath , false ) ;
  }) ;

}

public static IEnumerable<string> ReadFileNamesFromDatabase()
{
  using ( SqlConnection connection = new SqlConnection( "connection-string" ) )
  using ( SqlCommand cmd = connection.CreateCommand() )
  {
    cmd.CommandType = CommandType.Text ;
    cmd.CommandText = @"
      select idPic ,
             namePicFile
      from DocPicFiles
      " ;

    connection.Open() ;
    using ( SqlDataReader reader = cmd.ExecuteReader() )
    {
      while ( reader.Read() )
      {
        yield return reader.GetString(1) ;
      }
    }
    connection.Close() ;

  }
}
Nicholas Carey
  • Do you have any time measurements to support that TPL will be faster than OP's current approach? I have seen the multithreaded way perform slower than a single thread. – displayName Aug 26 '15 at 22:37
  • @nicholas-carey I have problems with this line: **Parallel.ForEach( filesToBeCopied.SelectMany( fn => source.EnumerateFiles( fn ) ) , options , fi => { string destinationPath = Path.Combine( destination.FullName , Path.ChangeExtension( fi.Name , ".jpg") ) ; fi.CopyTo( destinationPath , false ) ;** – ger Aug 27 '15 at 14:57

I addressed this problem by creating a single compressed file (.zip) using the option to just store the files (no compression). Creating the single .zip file, moving that single file, and then expanding it at the destination proved to be 2x faster when dealing with thousands of files.
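
For reference, a minimal sketch of that technique with the built-in ZipFile class might look like this (it assumes a reference to System.IO.Compression.FileSystem, and every path is a placeholder):

// Sketch only: assumes using System.IO; using System.IO.Compression;
// and a project reference to System.IO.Compression.FileSystem.

// 1. Pack the source folder into one archive without compressing it (store only).
ZipFile.CreateFromDirectory(@"C:\source", @"C:\temp\pics.zip",
                            CompressionLevel.NoCompression, includeBaseDirectory: false);

// 2. Move the single archive to the destination drive as one large sequential transfer.
File.Move(@"C:\temp\pics.zip", @"D:\dest\pics.zip");

// 3. Unpack it at the destination.
ZipFile.ExtractToDirectory(@"D:\dest\pics.zip", @"D:\dest");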

Keith Gresham
  • Hi @keith gresham, but I need to copy the files one by one, not the whole package of files! Thanks! – ger Mar 20 '18 at 13:25
  • @ger I believe Keith's solution is referring to the problem of moving a large number of files ***when the source and destination are on different physical drives***. In this specific case, the technique he mentions here can speed things up by consolidating the destination MFT writes, which in turn can reduce cache misses during that complex activity. – Glenn Slayden Jul 24 '19 at 02:02