
I created a list of the files in a directory using os.listdir(), and I'm trying to move percentages of those files (which are images) to different folders: 70%, 15%, and 15% of the files to three different target folders.

Here is a slice of the file list:

print(cnv_list[0:5])
['CNV-9890872-5.jpeg', 'CNV-9911627-97.jpeg', 'CNV-9935363-11.jpeg', 'CNV-9911627-15.jpeg', 'CNV-9935363-118.jpeg']

So, I'm trying to send 70% of these files to one folder, 15% of them to another folder, and 15% to a third folder.

I found the code below in another answer here, "Moving all files from one directory to another using Python". It shows how to move files, but not how to move a percentage of them:

import shutil
import os
    
source_dir = '/path/to/source_folder'
target_dir = '/path/to/dest_folder'
    
file_names = os.listdir(source_dir)
    
for file_name in file_names:
    shutil.move(os.path.join(source_dir, file_name), target_dir)
Yogesh Riyat
  • Percentage by count or by size? – Alex Reynolds Oct 30 '21 at 01:07
  • It isn't clear in your code what target_dirs would receive the files. Just one? – tudopropaganda Oct 30 '21 at 01:08
  • By count. The target directories should receive certain percentages of the total count of files from the source directory. I also edited my question, and I hope it is helpful. – Yogesh Riyat Oct 30 '21 at 01:47
  • ...so the question is just how to partition a list, or is there any part _you don't already know how to do_ that's specific to moving files? (In general, we ask questions to be narrowly focused around a _specific technical problem_, with the parts of your program you already know how to implement factored out). – Charles Duffy Oct 30 '21 at 01:48
  • @CharlesDuffy Thank you for that guidance. You helped me realize that if I can figure out how to partition a list by percentages that would solve my problem. – Yogesh Riyat Oct 30 '21 at 01:55

2 Answers


If you can partition a list 70/30, and partition a list 50/50, then you can get 70/15/15 just by partitioning twice (once 70/30, once 50/50).

def partition_pct(lst, point):
    idx = int(len(lst) * point)
    return lst[:idx], lst[idx:]

l = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
l_70, l_30 = partition_pct(l, 0.7)
l_15_1, l_15_2 = partition_pct(l_30, 0.5)
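As a quick check of the two-step split, here is a minimal sketch (the wrapper name split_70_15_15 and the 11-item case are hypothetical, added only for illustration). The 20-item list comes out as 14/3/3, and because int() truncates, any remainder lands in the second slice, so no element is ever dropped:

```python
def partition_pct(lst, point):
    # int() truncates, so leftover items always go to the second slice
    idx = int(len(lst) * point)
    return lst[:idx], lst[idx:]

def split_70_15_15(lst):
    # 70/30 first, then split the 30% part in half
    l_70, l_30 = partition_pct(lst, 0.7)
    l_15_1, l_15_2 = partition_pct(l_30, 0.5)
    return l_70, l_15_1, l_15_2

for n in (20, 11):
    parts = split_70_15_15(list(range(1, n + 1)))
    print(n, [len(p) for p in parts])
# 20 [14, 3, 3]
# 11 [7, 2, 2]
```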

If l comes from os.listdir(), you get filenames instead of numbers. So, given your existing cnv_list of filenames:

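import os
import shutil

source_dir = '/path/to/source_folder'  # the directory cnv_list was read from

cnv_list_70, cnv_list_30 = partition_pct(cnv_list, .7)
cnv_list_15_1, cnv_list_15_2 = partition_pct(cnv_list_30, .5)

for (file_list, dirname) in ((cnv_list_70, 'dst_70'),
                             (cnv_list_15_1, 'dst_15_1'),
                             (cnv_list_15_2, 'dst_15_2')):
    os.makedirs(dirname, exist_ok=True)  # create the destination if missing
    for f in file_list:
        # os.listdir() returns bare names, so join them with their directory
        shutil.move(os.path.join(source_dir, f), dirname)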

...will move 70% of your files to the directory dst_70, 15% to dst_15_1, and another 15% to dst_15_2.
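os.listdir() returns names in arbitrary order, so the groups above follow whatever order the directory listing happens to produce. If the 70/15/15 groups are meant to be a reproducible random sample (an assumption; the question doesn't say so), one option is to sort the names and shuffle them with a seeded generator before partitioning. The function name shuffled_split, the sample file names and the seed 42 are hypothetical placeholders:

```python
import random

def partition_pct(lst, point):
    idx = int(len(lst) * point)
    return lst[:idx], lst[idx:]

def shuffled_split(names, seed=42):
    # sort first so the result doesn't depend on listing order,
    # then shuffle with a private, seeded generator
    items = sorted(names)
    random.Random(seed).shuffle(items)
    part_70, part_30 = partition_pct(items, 0.7)
    part_15_1, part_15_2 = partition_pct(part_30, 0.5)
    return part_70, part_15_1, part_15_2

# hypothetical stand-in for cnv_list
names = [f'CNV-{i}-1.jpeg' for i in range(20)]
groups = shuffled_split(names)
print([len(g) for g in groups])  # [14, 3, 3]
```

The same seed gives the same split on every run, and each group can then be moved with the loop above.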

Charles Duffy
  • I like your answer, but I'm getting an error. See my edited question for details. – Yogesh Riyat Oct 30 '21 at 02:52
  • @YogeshRiyat, that "no such file or directory" means your filenames aren't valid relative to the current working directory. It's a problem with your list of names, not with this answer's code. You could fix it by converting them to absolute paths, or by using the `pathlib` module instead of `os.listdir()`, or even by using `os.chdir()` to _change_ the Python interpreter's current working directory to be the same directory that the files are coming from (though you'll need to make sure the output directories are relative to that new location as well). – Charles Duffy Oct 30 '21 at 02:56
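The `pathlib` route mentioned in the comment can be sketched as follows. This is a minimal sketch, not the answer's code: move_split and dest_root are hypothetical names, the folder names dst_70, dst_15_1 and dst_15_2 come from the answer, and the demo creates empty placeholder files in a temporary directory instead of touching real images:

```python
import shutil
import tempfile
from pathlib import Path

def partition_pct(lst, point):
    idx = int(len(lst) * point)
    return lst[:idx], lst[idx:]

def move_split(source_dir, dest_root):
    """Move 70/15/15 of the files in source_dir into three subfolders of dest_root."""
    # iterdir() yields paths that already include source_dir,
    # unlike os.listdir(), which returns bare names
    files = sorted(p for p in Path(source_dir).iterdir() if p.is_file())
    files_70, files_30 = partition_pct(files, 0.7)
    files_15_1, files_15_2 = partition_pct(files_30, 0.5)
    for group, name in ((files_70, 'dst_70'),
                        (files_15_1, 'dst_15_1'),
                        (files_15_2, 'dst_15_2')):
        target = Path(dest_root) / name
        target.mkdir(parents=True, exist_ok=True)
        for f in group:
            shutil.move(str(f), str(target / f.name))

# demo in a throwaway directory with 20 empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'source'
    src.mkdir()
    for i in range(20):
        (src / f'CNV-{i}-1.jpeg').touch()
    move_split(src, Path(tmp) / 'out')
    counts = {d.name: len(list(d.iterdir())) for d in (Path(tmp) / 'out').iterdir()}
    remaining = len(list(src.iterdir()))

print(counts, remaining)
```

Because every path carries its directory, the moves work regardless of the current working directory, so no `os.chdir()` is needed.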

I don't know if there's a better way, but this is what I have:

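def split(lst, weights):
    sizes = []
    fractions = []
    for i in weights:
        # round each share to the nearest whole number of items...
        sizes.append(round(i * len(lst)))
        # ...and remember its fractional part for the correction below
        fractions.append((i * len(lst)) % 1)
    # rounding can leave the total short or over; adjust one group by one item
    if sum(sizes) < len(lst):
        i = max(range(len(fractions)), key=fractions.__getitem__)
        sizes[i] += 1
    elif sum(sizes) > len(lst):
        i = min(range(len(fractions)), key=fractions.__getitem__)
        sizes[i] -= 1
    # hand out consecutive items from the list to each group
    it = iter(lst)
    return [[next(it) for _ in range(size)] for size in sizes]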

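It takes two lists as arguments: the list to split and a list of weights. The correction step adds or removes only a single item, so it repairs a rounding error of one; with many weights the rounded group sizes can still miss the list length by more than that. For example: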

print(split(range(19), [.1,.5,.4]))

Outputs:

[[0, 1], [2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18]]

Note that the weights are floats that should sum to 1.
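Because split() corrects at most one item, a variant that always hands out every leftover item is the largest-remainder method (a swapped-in technique, not this answer's code). The sketch below assumes integer weights such as (70, 15, 15), which keeps the arithmetic exact and avoids float rounding; split_by_weights is a hypothetical name:

```python
def split_by_weights(items, weights):
    """Split items into len(weights) consecutive groups using integer weights."""
    total = sum(weights)
    n = len(items)
    # exact integer floor of each share, plus what was cut off
    sizes = [n * w // total for w in weights]
    remainders = [n * w % total for w in weights]
    leftover = n - sum(sizes)  # always smaller than len(weights)
    # give one extra item to the groups with the largest remainders;
    # sorted() is stable, so ties go to the earlier group
    order = sorted(range(len(weights)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    groups, start = [], 0
    for size in sizes:
        groups.append(items[start:start + size])
        start += size
    return groups

print(split_by_weights(list(range(19)), [10, 50, 40]))
# [[0, 1], [2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18]]
print(split_by_weights(list(range(2)), [1, 1, 1, 1]))
# [[0], [1], [], []]
```

The weights don't need to sum to 1 or 100; only their ratios matter.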

Pedro Maia