
I am trying to group the related files that belong to each .tif image. As the list below shows, each group has 7 related files. I am looking for a way to group these files so that I can move each group into a different folder with shutil.move().

The following script is my unsuccessful attempt, based on this answer. Instead of splitting the files into groups, it returns a single group containing every file from the input list.

How can I tweak this to perform the groupings I am after?

import os, itertools

files = ['F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511759_sw.tfw',
       'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511759_sw.tif',
       'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511759_sw.tif.aux.xml',
       'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511759_sw.tif.ovr',
       'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511759_sw.tif.vat.cpg',
       'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511759_sw.tif.vat.dbf',
       'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511759_sw.tif.xml',
       'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511760_sw.tfw',
       'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511760_sw.tif',
       'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511760_sw.tif.aux.xml',
       'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511760_sw.tif.ovr',
       'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511760_sw.tif.vat.cpg',
       'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511760_sw.tif.vat.dbf',
       'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511760_sw.tif.xml']

test = sorted(files)
grouped = [list(g) for _, g in itertools.groupby(test, lambda x: x.split('_')[1])]
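To see why this produces one big group, it helps to print what split('_') returns for a single path. A short sketch using two of the paths above (written as raw strings so the backslashes stay literal):

```python
import itertools

# Two of the paths from the question, as raw strings.
paths = [
    r'F:\juniper_project\data\raster\deliverables\OR\reclass\reclass_4511759_sw.tif',
    r'F:\juniper_project\data\raster\deliverables\OR\reclass\reclass_4511760_sw.tif',
]

parts = paths[0].split('_')
# The underscore in "juniper_project" is the first split point, so
# parts[0] is 'F:\juniper', parts[1] is the directory chunk ending in
# 'reclass\reclass', parts[2] is '4511759' and parts[3] is 'sw.tif'.
print(parts)

# parts[1] is identical for every file, so every file gets the same key
# and groupby returns a single group containing all of them.
bad = [list(g) for _, g in itertools.groupby(sorted(paths), lambda x: x.split('_')[1])]
print(len(bad))
```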

Intended output:

[['F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511759_sw.tfw',
  'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511759_sw.tif',
  'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511759_sw.tif.aux.xml',
  'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511759_sw.tif.ovr',
  'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511759_sw.tif.vat.cpg',
  'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511759_sw.tif.vat.dbf',
  'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511759_sw.tif.xml'],
 ['F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511760_sw.tfw',
  'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511760_sw.tif',
  'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511760_sw.tif.aux.xml',
  'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511760_sw.tif.ovr',
  'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511760_sw.tif.vat.cpg',
  'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511760_sw.tif.vat.dbf',
  'F:\\juniper_project\\data\\raster\\deliverables\\OR\\reclass\\reclass_4511760_sw.tif.xml']]
Asked by Borealis; edited by ekad.

1 Answer


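Close! The index into the split list is off by one; it should be 2, because the underscore in `juniper_project` (part of the directory path) is also a split point.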

Try

grouped = [list(g) for _, g in itertools.groupby(test, lambda x: x.split('_')[2])]
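Once `grouped` holds one list per tile, each group can be moved as a unit, which was the original goal. A minimal sketch, assuming each group should go into its own subfolder named after the file stem (for example `reclass_4511759_sw`) under a destination root you choose; `move_groups` and `dest_root` are illustrative names, not part of the question. It is meant to run on the machine where the files live (Windows, given the `F:\` paths), since os.path follows the local OS's path rules.

```python
import os
import shutil


def move_groups(grouped, dest_root):
    """Move each group of related files into its own folder under dest_root.

    The folder name is the first file's base name up to its first '.',
    e.g. 'reclass_4511759_sw' for 'reclass_4511759_sw.tfw'.
    """
    for group in grouped:
        stem = os.path.basename(group[0]).split('.', 1)[0]
        target = os.path.join(dest_root, stem)
        # Create the per-group folder if it does not exist yet
        os.makedirs(target, exist_ok=True)
        for path in group:
            # When the destination is an existing directory,
            # shutil.move places the file inside it under its own name
            shutil.move(path, target)

# Hypothetical usage with the answer's grouped list:
# move_groups(grouped, r'F:\juniper_project\data\raster\sorted')
```

Naming the folder from the file stem rather than from the group's position keeps the result stable even if the input order changes.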
Answer by canyon289
  • Thanks, I notice that it works for this example, but if there is a file named `4511760_se`, it would be lumped into `files[1]` because the direction `se` is not taken into account. I suppose one (less flexible) work-around would be to slice the path: `[list(g) for _, g in itertools.groupby(test, lambda x: x[63:73])]` – Borealis May 21 '15 at 21:11
  • If you're sure your key will always be numeric you can filter for that in your list instead of trying to select the key by index – canyon289 May 21 '15 at 22:04
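Following the suggestion to filter on the numeric part, one option is a regular expression that captures both the tile number and the direction, so `4511760_se` and `4511760_sw` end up in different groups. A sketch, assuming every name looks like `<prefix>_<digits>_<lowercase direction>.<extensions>`; the pattern and the `group_by_tile` helper are illustrative, not from the thread. It uses a dict instead of groupby, so the input does not need to be sorted first:

```python
import re
from collections import defaultdict

# Assumed naming scheme: '_<digits>_<direction>.' somewhere in the path.
# The directory part ('juniper_project') does not match because no digits
# follow its underscore.
TILE_RE = re.compile(r'_(\d+)_([a-z]+)\.')


def group_by_tile(files):
    """Group paths by (tile number, direction); non-matching paths are skipped."""
    groups = defaultdict(list)
    for path in files:
        m = TILE_RE.search(path)
        if m:
            # m.groups() is a tuple such as ('4511760', 'sw')
            groups[m.groups()].append(path)
    # Sort keys and members so the output order is predictable
    return [sorted(groups[key]) for key in sorted(groups)]
```

Applied to the question's `files` list, this should give the two 7-file groups shown as the intended output, and any `_se` files would form groups of their own.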