
Greetings.

1 - Let's say I have about 500 folders of variable size, with a total size of 100 GB.

2 - I want to distribute these folders automatically into other folders, filling each one up to 700 MB while making the best possible use of the space.

Example: in folder "CD--01" I want to fit the maximum number of folders possible without exceeding the 700 MB limit, and so on for "CD--02", "CD--03"...

Is there a tool that allows me to do this "on the fly" or will I have to code one myself?

Thanks

Joao Heleno
  • Doing this optimally is the knapsack problem. Can't be solved for any non-trivial dataset in a reasonable amount of time. Non-optimally is viable. – Sparr Dec 28 '08 at 01:52
  • no it's not, the files don't have a value. that makes a big difference. – msb Feb 24 '17 at 01:43

5 Answers


Ultimately you're asking for a solution to the Knapsack Problem, which comes in many forms.

A simple approach is the following pseudocode, but this greedy strategy will not produce optimal solutions for all inputs (see the Knapsack Problem articles above).

while (there are unallocated files) {
    create a new, empty directory
    set remaining space to 700,000,000
    while (the size of the smallest unallocated file is at most (<=) the remaining space) {
        copy into the current directory the largest unallocated file with size at most the remaining space
        subtract that file's size from the remaining space
        remove that file from the set of unallocated files
    }
    burn the current directory
}

(Of course, this assumes that no single file will be greater than 700MB in size. If that's possible, be sure to remove any such files from the unallocated list, else the above will produce infinitely many empty directories! ;-)
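For illustration, a rough (untested) bash sketch along the same lines, a first-fit-decreasing variant, might look like this; it assumes GNU du/sort, that the folders to pack are immediate subdirectories of the current directory, and that names contain no newlines (CD_1, CD_2, ... are just placeholder names):

#!/bin/bash
# Greedy packing sketch: take folders largest-first and put each one into
# the first CD_<n> directory that still has room; open a new one when none do.
limit=700000000               # 700 MB in bytes
cd_count=0
declare -a cd_used            # bytes already allocated to each CD directory

du -sb -- */ | sort -rn | while read -r size name; do
    # Anything larger than a single disc can never be placed.
    if [ "$size" -gt "$limit" ]; then
        echo "skipping $name: larger than the limit on its own" >&2
        continue
    fi
    placed=0
    # Try every existing CD directory in order (first fit).
    for (( i = 1; i <= cd_count; i++ )); do
        if (( cd_used[i] + size <= limit )); then
            mv -- "$name" "CD_$i"
            cd_used[i]=$(( cd_used[i] + size ))
            placed=1
            break
        fi
    done
    # Nothing had room: start a new CD directory.
    if (( ! placed )); then
        cd_count=$(( cd_count + 1 ))
        mkdir "CD_$cd_count"
        mv -- "$name" "CD_$cd_count"
        cd_used[cd_count]=$size
    fi
done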

joel.neely

This is a very naive and poorly coded solution, but it works. My bash-fu is not strong, but a shell script seems like the best way to approach this problem.

#!/bin/bash
# Naive first-fit packing: move each item into CD_<n>; if that pushes the
# directory over 700 MB, move it back out and start a new directory.
dirnum=1
for i in *; do
    # Skip anything that is larger than a single disc on its own.
    if [ "$(du -b -s "$i" | cut -f 1)" -gt 700000000 ]; then
        echo "$i is too big for a single folder, skipping"
        continue
    fi
    if [ ! -d "CD_$dirnum" ]; then
        echo "creating directory CD_$dirnum"
        mkdir "CD_$dirnum"
    fi
    echo "moving $i to CD_$dirnum"
    mv "$i" "CD_$dirnum"
    # If the move pushed the current directory over the limit, undo it
    # and move the item into a fresh directory instead.
    if [ "$(du -b -s "CD_$dirnum" | cut -f 1)" -gt 700000000 ]; then
        echo "CD_$dirnum is too big now"
        mv "CD_$dirnum/$i" .
        let "dirnum += 1"
        if [ ! -d "CD_$dirnum" ]; then
            echo "creating directory CD_$dirnum"
            mkdir "CD_$dirnum"
        fi
        echo "moving $i to CD_$dirnum"
        mv "$i" "CD_$dirnum"
    fi
done
Sparr
  • Thanks Sparr.... I'm not under UNIX.... but I can always share a folder between Win and a Unix virtual machine and run that script. I'll give it a try. – Joao Heleno Dec 28 '08 at 12:46
  • bash is available on windows via cygwin, although some consideration must be given to issues such as drive letters and \ vs / – Sparr Dec 29 '08 at 20:21
  • Also, as joel.neely's answer points out, one obvious improvement is to look for smaller things to move into an almost-full directory, instead of creating a new one as soon as the next item won't fit into the current one. – Sparr Dec 29 '08 at 20:22

If you're on UNIX (including Mac OS X) you can script something like

tar cvzf allfolders.tgz ./allfolders
split -b 700m allfolders.tgz

This will create a (compressed) archive of all the folders and then split it into 700M sized chunks. However, you'll need to recombine all the pieces and extract them again using tar when you want to reconstitute the original folder set.
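For reference, reassembly later would be something along these lines (assuming split's default xaa, xab, ... piece names):

cat x?? > allfolders.tgz
tar xvzf allfolders.tgz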

If you want to keep them as individual OS folders on the CD, that's fairly difficult (in fact I think it's a kind of knapsack problem, which is NP-hard).

frankodwyer

There are tools that will do this - similar to frankodwyer's answer, WinZip will take your 100 GB, zip it up and split it into 'chunks' of any size you'd like, e.g. ~700 MB.

Here's the page for the WinZip split feature.

Andrew

I'm a little late to the party, but here's how I solved the problem:

#!/usr/bin/env bash
# First-fit packing: copy each file into the first disk_<n> directory that
# still has room for it, creating new directories as needed.

sourcedir="$1"
destdir_prefix="./disk_"
destdir_suffix=""
mblimit=4100
# bytelimit=$(( mblimit * 1024 * 1024 )) # MB as measured by the OS (MiB)
bytelimit=$(( mblimit * 1000 * 1000 )) # MB as measured by marketeers
disk=() # bytes used so far on each disk
find "${sourcedir}" -type f |
  while IFS= read -r file; do

    file_size="$( stat --printf="%s" "${file}" )"

    # A single file larger than the limit can never be placed; skip it.
    if [[ ${file_size} -ge ${bytelimit} ]]; then
      echo "skipping ${file}: larger than ${bytelimit} bytes" >&2
      continue
    fi

    disk_number=0
    stored=false
    while [[ "${stored}" == "false" ]]; do

      # First time this disk is considered: initialise its used-space counter.
      if [[ "${disk[$disk_number]}" == "" ]]; then
        disk[$disk_number]=0
      fi

      if [[ $(( disk[disk_number] + file_size )) -lt ${bytelimit} ]]; then
        # It fits: recreate the file's directory structure under disk_<n>
        # and copy the file there.
        dir="${destdir_prefix}${disk_number}${destdir_suffix}"
        filedir="$( dirname "${file}" )"
        mkdir -p "${dir}/${filedir}"
        disk[$disk_number]=$(( disk[disk_number] + file_size ))
        echo "${disk[$disk_number]} ${dir}/${file}"
        cp "${file}" "${dir}/${file}"
        stored=true
      else
        # No room on this disk; try the next one.
        disk_number=$(( disk_number + 1 ))
      fi
    done
  done

This will create folders called disk_0, disk_1, etc. For each file, it tries to fit the file into disk_0, and if it won't fit, it tries disk_1, etc.
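Invocation would be something like the following, with the source directory as the only argument (the script name here is just a placeholder):

./pack_into_disks.sh ./my_100GB_of_folders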

user187557