I am developing a genetics application. The user selects a database file to search against.

The system should:

  1. Find the previous version of the database file.
  2. Find the differences between the two database files.
  3. Write the differences to a third file.
  4. Perform the search against that third file.
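
For illustration, steps 2 and 3 could be sketched as a single streaming pass. This is only a sketch under strong assumptions that the question does not confirm: the database is line-oriented text, and both versions are already sorted in byte order. The function name `write_new_lines` and the file paths are hypothetical. Python is used here for brevity; the same merge logic carries over to Perl or PHP.

```python
def write_new_lines(old_path, new_path, out_path):
    """Write lines found in new_path but not in old_path to out_path.

    Assumption: both inputs are sorted in byte order, so a single merge-style
    pass is enough and only one line per file is held in memory.
    """
    written = 0
    with open(old_path, "rb") as old, open(new_path, "rb") as new, \
            open(out_path, "wb") as out:
        old_iter = (line.rstrip(b"\r\n") for line in old)
        old_line = next(old_iter, None)
        for raw in new:
            new_line = raw.rstrip(b"\r\n")
            # Skip old lines that sort before the current new line:
            # they were removed in the new version.
            while old_line is not None and old_line < new_line:
                old_line = next(old_iter, None)
            if old_line is not None and old_line == new_line:
                # Unchanged line: consume it from the old file too.
                old_line = next(old_iter, None)
            else:
                # Line exists only in the new version: keep it.
                out.write(new_line + b"\n")
                written += 1
    return written
```

If the files are not sorted, or if a record spans several lines, this approach does not apply as written and the comparison would have to work on whole records instead.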

The database file is large (300 GB of genomics data). I am using Windows as the operating system, and my application is written in JavaScript, PHP, Perl and HTML.

What is the best way to find the differences between those files?

tripleee
Alaa
  • Since the files are large, they cannot be loaded into memory. I think the best way would be to call the `rdiff` utility from PHP, although I'm not sure whether the generated delta file would suit your purposes in step 4. –  Mar 27 '16 at 08:18
  • @user2570380: Is there an rdiff build for Windows? – Borodin Mar 27 '16 at 08:47
  • Are differences to be determined by a line-by-line comparison? If yes, where is the problem? If no, the way the files must be compared dictates how to do it, and that needs to be known in order to find "the best way". – laune Mar 27 '16 at 08:51
  • @Borodin No, Cygwin must be used. –  Mar 27 '16 at 08:53
  • @user2570380 Can you please explain more about using the rdiff utility from PHP? – Alaa Mar 27 '16 at 09:26
  • @user2570380 Do you mean the diff utility in Cygwin? – Alaa Mar 27 '16 at 09:36
  • @Alaa: You can read more about `rdiff` here: https://librsync.sourcefrog.net/rdiff.html. But you may also find the answers to this question helpful: http://stackoverflow.com/questions/688504/binary-diff-tool-for-very-large-files –  Mar 27 '16 at 15:54
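
To make the `rdiff` suggestion in the comments concrete, here is a minimal sketch of the two-step signature/delta workflow, driven from a script. It assumes an `rdiff` executable is on the PATH (for example through Cygwin, as mentioned above); the helper names and file paths are hypothetical. Python is used for brevity, and the same two commands could be launched from PHP or Perl.

```python
import subprocess


def rdiff_commands(old_path, new_path, sig_path, delta_path):
    """Return the two rdiff invocations that produce a delta from old to new."""
    return [
        # Step 1: summarise the old file as block checksums.
        ["rdiff", "signature", old_path, sig_path],
        # Step 2: compare the new file against that signature.
        ["rdiff", "delta", sig_path, new_path, delta_path],
    ]


def make_delta(old_path, new_path, sig_path, delta_path):
    """Run both steps; raises CalledProcessError if rdiff fails."""
    for command in rdiff_commands(old_path, new_path, sig_path, delta_path):
        subprocess.run(command, check=True)
```

Note that the delta is a binary patch format meant to be applied with `rdiff patch`, not a plain list of new records, so it may not be directly searchable in step 4.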
