Struggling to consolidate multiple files into multi-page TIFFs using PyLiff, Libtiff, or anything that will work.

I'm consolidating thousands of files, each named with an arbitrary eight-digit ID followed by a page number:

Directory (named by date): \images\1999\10\14

Group of files 1: 01234567_0001.tif, 01234567_0002.tif, 01234567_0003.tif

Group of files 2: 07654321_0001.tif, 07654321_0002.tif

Following conversion I'd like to have two files:

  1. 01234567.tif (3 page multipage TIFF file)
  2. 07654321.tif (2 page multipage TIFF file)

And so on. Please give guidance on a script that groups the files by their first eight digits, merges each group into a single multipage file, and names the consolidated file after the eight-digit ID (for example, 01234567.tif).

I realize this looks like it should be effortless. I have many disparate scripts and partial approaches, but posting them all would only muddy the waters here.

Herbert
  • When you say "...or anything that will work", would you consider using Java? :-) If so, you could use the approach I outlined in this answer: http://stackoverflow.com/questions/22856461/scale-multi-page-tiff-image-in-java (instead of reading a multipage TIFF, you'd read multiple single-page TIFFs, but the writing would be equivalent). This assumes the file naming convention is not your problem. – Harald K Apr 11 '14 at 07:31
  • Nice piece of code. I appreciate the suggestion. But the consolidation is a crucial part of what I'm doing here. The files will be exported from our database directly and must be in multiple page TIFF format. Thanks though! I'm always looking for tools to manipulate images. That code will help on other projects :) – Herbert Apr 11 '14 at 23:20

1 Answer

Ruby ended up providing the solution (ImageMagick does the actual conversion). I hope this is helpful to others:

#!/usr/bin/env ruby

require 'find'
require 'set'

# Input and output directories. File.expand_path resolves "~" and
# normalizes separators to forward slashes, which Ruby accepts on Windows too.
start_path = File.expand_path(ARGV.shift || "~/Desktop/in")
output_path = File.expand_path(ARGV.shift || "~/Desktop/out")

unless File.directory? start_path
  raise "cannot catalog contents; '#{start_path}' does not exist"
end

unless File.directory? output_path
  raise "cannot write output; '#{output_path}' does not exist"
end

documents = Set.new

# look at each file
Find.find(start_path) do |file_path|
  # look for the document pattern: 8-digit ID, underscore, 4-digit page number
  if file_path =~ /\A(.*\/(\d{8}))_\d{4}\.tif\z/
    # track it so we can get the unique document list
    documents.add({:path => $1, :name => $2})
  end
end

documents.each do |doc|
  # collect this document's pages in page-number order
  pages = Dir.glob("#{doc[:path]}_*.tif").sort
  output_file = File.join(output_path, "#{doc[:name]}.tif")
  puts (["convert"] + pages + [output_file]).join(" ")
  # pass arguments separately so paths containing spaces need no quoting
  system("convert", *pages, output_file)
end
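
The grouping step can also be looked at on its own, without calling ImageMagick. The following is only a minimal sketch: `group_pages` and `PAGE_PATTERN` are hypothetical names that are not part of the script above, and it assumes every page file is named as an eight-digit ID, an underscore, a four-digit page number, and a `.tif` extension.

```ruby
# Hypothetical helper (not part of the original script): groups page files
# by their eight-digit document ID and orders the pages within each group.
PAGE_PATTERN = /\A(\d{8})_(\d{4})\.tif\z/

def group_pages(paths)
  groups = Hash.new { |hash, id| hash[id] = [] }
  paths.each do |path|
    match = PAGE_PATTERN.match(File.basename(path))
    next unless match # skip files that do not follow the naming convention
    groups[match[1]] << path
  end
  # Order each group by its four-digit page number so pages merge in sequence
  groups.each_value do |pages|
    pages.sort_by! { |page| File.basename(page)[9, 4].to_i }
  end
  groups
end

# Example input using the file names from the question
files = [
  "images/1999/10/14/01234567_0002.tif",
  "images/1999/10/14/07654321_0001.tif",
  "images/1999/10/14/01234567_0001.tif",
  "images/1999/10/14/01234567_0003.tif",
  "images/1999/10/14/07654321_0002.tif"
]

group_pages(files).each do |id, pages|
  puts "#{id}.tif <- #{pages.map { |page| File.basename(page) }.join(', ')}"
end
```

Each key of the returned hash would become one output file name, and the ordered list under it is what would be handed to `convert` as the input pages.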
Herbert