I have 207 directories full of .WAV files, where each directory contains the files recorded on one day (the number of files varies from directory to directory). The directory names are dates in YYYYMMDD format, and the files have already been renamed to ‘HHMMSS.WAV’ format (the time the recording was taken, e.g. 024545.WAV). Each directory covers a different recording period; for example, directory1 contains files recorded on one day between 2am and 11am, while directory2 contains files recorded on one day between 11am and 6pm, etc.
I need to concatenate the files by hourly interval. For example, directory1 contains 1920 clips, and I need to move the files from each hourly interval into a separate subdirectory, so directory1 ends up with one new subdirectory per hourly interval present in it (e.g. directory1_00-01 for all the files recorded between 00:00 and 01:00, directory1_01-02 for all the files recorded between 01:00 and 02:00, etc.; if directory1 covers 6 hourly intervals, I need 6 subdirectories). I need these separate directories because it’s the only way I’ve figured out to concatenate .WAV files (see Script 2). The concatenated files should go into one more directory that holds all the stitched files for directory1.
Currently, I’m doing everything manually using two Python scripts, and it’s getting extremely cumbersome since I’m editing the numbers and intervals by hand for every hour (silly, I know):
Script 1 (to move all files in an hour into another directory; in this particular bit of code, I'm finding all the clips between 01am and 02am and moving them to the subdirectory within directory1 so that the subdirectory only contains files from 01am to 02am):
import os

origin = r'PATH/TO/DIRECTORY1'
destination = r'PATH/TO/DIRECTORY1/DIRECTORY1_01-02'
startswith_ = '01'  # two-digit hour prefix of the files to move

# Move every file whose name starts with the chosen hour into the hourly subdirectory.
for i in os.listdir(origin):
    if i.startswith(startswith_):
        os.rename(os.path.join(origin, i), os.path.join(destination, i))
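The hour prefix does not have to be typed in by hand: since the filenames are in HHMMSS.WAV format, the first two characters of each name give the hour. Below is a minimal sketch of that grouping step, assuming every clip name starts with a two-digit hour. The helper names `group_by_hour` and `interval_label` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from pathlib import Path


def group_by_hour(day_dir):
    """Map each two-digit hour ('00'..'23') to the sorted clips recorded in it."""
    groups = defaultdict(list)
    for path in Path(day_dir).iterdir():
        # Assumption: clips are named HHMMSS.WAV, so the first two characters
        # are the hour. Subdirectories and other files are skipped.
        if path.is_file() and path.suffix.lower() == ".wav" and path.name[:2].isdigit():
            groups[path.name[:2]].append(path)
    # Sorting by filename also sorts by recording time.
    return {hour: sorted(paths) for hour, paths in sorted(groups.items())}


def interval_label(day_name, hour):
    """Build a name such as 'directory1_01-02' for the hour starting at `hour`."""
    return f"{day_name}_{hour}-{(int(hour) + 1) % 24:02d}"
```

Looping over the result of `group_by_hour` gives one entry per hourly interval that is actually present, so directories with different recording periods need no special handling.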
Script 2 (to concatenate all files in the folder and write the output to another directory; in this particular bit of code, I'm in the subdirectory from Script 1, concatenating all the files within it, and saving the output file "directory1_01-02.WAV" in another subdirectory of directory1 called "directory1_concatenated"):
import os
import glob
from pydub import AudioSegment

os.chdir("PATH/TO/DIRECTORY1/DIRECTORY1_01-02")
# Sort so that the HHMMSS filenames are joined in chronological order.
# Note: on a case-sensitive filesystem, "*.wav" does not match names ending in .WAV.
wav_segments = [AudioSegment.from_wav(wav_file) for wav_file in sorted(glob.glob("*.wav"))]
combined = AudioSegment.empty()
for clip in wav_segments:
    combined += clip
combined.export('PATH/TO/DIRECTORY1/DIRECTORY1_CONCATENATED/DIRECTORY1_01-02.WAV', format="wav")
The idea is that by the end of it, "directory1_concatenated" should contain all the concatenated files from each hour interval within directory1.
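Concatenation itself only needs an ordered list of clip paths, so the clips do not strictly have to be moved into their own directory first. The sketch below uses the standard-library `wave` module instead of pydub (a swapped-in technique, not what Script 2 does), and it assumes that all clips are uncompressed PCM with the same channel count, sample width and sample rate. The function name `concatenate_wavs` is hypothetical.

```python
import wave


def concatenate_wavs(clip_paths, out_path):
    """Write all clips in `clip_paths`, in the given order, into one WAV file."""
    clip_paths = list(clip_paths)
    if not clip_paths:
        raise ValueError("no clips to concatenate")
    first_format = None
    with wave.open(str(out_path), "wb") as out:
        for clip_path in clip_paths:
            with wave.open(str(clip_path), "rb") as clip:
                # (channels, sample width, frame rate) of this clip
                clip_format = clip.getparams()[:3]
                if first_format is None:
                    # The output takes its audio format from the first clip.
                    first_format = clip_format
                    out.setparams(clip.getparams())
                elif clip_format != first_format:
                    raise ValueError(f"{clip_path} has a different audio format")
                out.writeframes(clip.readframes(clip.getnframes()))
```

Passing the paths in sorted filename order keeps the output chronological, because the names are HHMMSS timestamps.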
Can anyone please help me somehow automate this process so I don’t have to do it manually for all 207 directories? Feel free to ask any questions about the process just in case I haven't explained myself very well (sorry!).
Edit: Figured out how to automate Script 1 to run thanks to the os.walk suggestions :) Now I have a follow-up question about Script 2. How do you increment the saved files so that they're numbered? When I try the following, I get an "invalid syntax" error.
rootdir = 'PATH/TO/DIRECTORY1'
for root, dirs, files in os.walk(rootdir):
    for i in dirs:
        wav_segments = [AudioSegment.from_wav(wav_file) for wav_file in glob.glob("*.wav")]
        combined = AudioSegment.empty()
        for clip in wav_segments:
            combined += clip
        combined.export("PATH/TO/DIRECTORY1/DIRECTORY1_CONCATENATED/DIRECTORY1_%s.wav", format = "wav", % i)
        i++
I've been reading some other Stack Overflow questions, but they all seem to deal with specific files? Or maybe I'm just not understanding os.walk fully yet - sorry, beginner here.
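Two things in the loop above are invalid Python: the `%` operator has to follow the string it formats directly (`"DIRECTORY1_%s.wav" % i`), not appear after a comma in the argument list, and Python has no `++` operator. The loop variable `i` is already the name of each subdirectory, so nothing needs incrementing. A minimal sketch of building one output path per hourly subdirectory follows. The function name `hourly_output_paths` and the output folder name `concatenated` are assumptions made for illustration.

```python
import os


def hourly_output_paths(day_dir, out_dir_name="concatenated"):
    """Return (hour_subdir, output_wav_path) pairs for the subdirectories of day_dir."""
    out_dir = os.path.join(day_dir, out_dir_name)
    pairs = []
    # next(os.walk(...)) yields only the top level: (root, dirs, files).
    root, dirs, _files = next(os.walk(day_dir))
    for name in sorted(dirs):
        if name == out_dir_name:
            continue  # do not treat the output folder as an input folder
        # The % operator is applied to the string itself, before it is passed on.
        out_path = os.path.join(out_dir, "%s.wav" % name)
        pairs.append((os.path.join(root, name), out_path))
    return pairs
```

Each pair could then be fed to the concatenation step, for example by globbing `os.path.join(hour_subdir, "*.wav")` rather than relying on the current working directory, which `os.walk` does not change.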