
I am trying to create a separate array for each pass of the for loop to store the values of 'signal' that are returned by the wavfile.read function.

Some background on how the code works and how I'd like it to work:

I have the following directory structure:

Root directory 
    Labeled directory
        Irrelevant multiple directories
            Multiple .wav files stored in these subdirectories

    Labeled directory
        Irrelevant multiple directories
            Multiple .wav files stored in these subdirectories

Now, for each labeled directory, I'd like to create an array that holds the values of all the .wav files contained in its subdirectories.

This is what I attempted:

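import os
from scipy.io import wavfile  # assumption: wavfile is scipy.io.wavfile

count = 0  # number of .wav files read so far

for label in df.index:

    # Walk every subdirectory below the labeled directory
    for path, directories, files in os.walk('voxceleb1/wav_dev_files/' + label):
        for file in files:
            if file.endswith('.wav'):
                count = count + 1
                rate, signal = wavfile.read(os.path.join(path, file))

print(count)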

[Image: a snapshot of dataframe df]

Ultimately, the reason for these arrays is that I would like to calculate the mean duration of the .wav files contained in each labeled directory and add this as a column to the dataframe.

Note that the index of the dataframe corresponds to the directory names. I appreciate any and all help!

DIB98
  • I don't fully follow what you are trying to do, but suspect you'd be better off collecting the file values in a list or dictionary, not a numpy array. An empty list is `[]`, and list append is an efficient way of adding objects to a list. – hpaulj Oct 18 '19 at 20:36

1 Answer


The code snippet you've posted can be simplified and modernized a bit. Here's what I came up with:

I've got the following directory structure:

I'm using text files instead of wav files in my example, because I don't have any wav files on hand. In my root, I have A and B (these are supposed to be your "labeled directories"). A has two text files. B has one immediate text file and one subfolder with another text file inside (this is meant to simulate your "irrelevant multiple directories").

The code:

def main():

    from pathlib import Path

    root_path = Path("./root/")
    labeled_directories = [path for path in root_path.iterdir() if path.is_dir()]

    txt_path_lists = []

    # Generate lists of txt paths
    for labeled_directory in labeled_directories:
        txt_path_list = list(labeled_directory.glob("**/*.txt"))
        txt_path_lists.append(txt_path_list)

    # Print the lists of txt paths
    for txt_path_list in txt_path_lists:
        print(txt_path_list)

    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

The output:

[WindowsPath('root/A/a_one.txt'), WindowsPath('root/A/a_two.txt')]
[WindowsPath('root/B/b_one.txt'), WindowsPath('root/B/asdasdasd/b_two.txt')]

As you can see, we generated two lists of text file paths, one for each labeled directory. The glob pattern I used (**/*.txt) handles multiple nested directories, and recursively finds all text files. All you have to do is change the extension in the glob pattern to have it find .wav files instead.
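To get from these path lists to what the question ultimately asks for (the mean length of the .wav files in each labeled directory), something along these lines might work. Treat it as a sketch under a few assumptions: the files are uncompressed PCM .wav files that the standard-library wave module can open, and the helper names wav_duration_seconds and mean_duration_per_label are made up here for illustration. It reads only the header information (frame count and sample rate) rather than loading each signal into an array, since the duration is all that is needed.

```python
import wave
from pathlib import Path
from statistics import mean


def wav_duration_seconds(wav_path):
    """Return the duration of a PCM .wav file in seconds (frames / sample rate)."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def mean_duration_per_label(root_path):
    """Map each labeled directory name to the mean duration of its .wav files."""
    means = {}
    for labeled_directory in sorted(Path(root_path).iterdir()):
        if not labeled_directory.is_dir():
            continue
        # "**/*.wav" searches the labeled directory and all nested subdirectories
        durations = [
            wav_duration_seconds(wav_path)
            for wav_path in labeled_directory.glob("**/*.wav")
        ]
        if durations:  # skip labeled directories that contain no .wav files
            means[labeled_directory.name] = mean(durations)
    return means
```

Because the dataframe index corresponds to the directory names, the resulting dictionary could then be attached as a column with something like df["mean_duration"] = df.index.map(mean_duration_per_label("voxceleb1/wav_dev_files")), where the column name mean_duration is just a placeholder.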

Paul M.