I am trying to extract the "Item 1A. Risk Factors" section from each 10-K filing. I have already downloaded the filings and saved them as .txt files:
```
'/content/drive/My Drive/Colab Notebooks/10/BKR/1.txt'
'/content/drive/My Drive/Colab Notebooks/10/BKR/2.txt'
```
So the folder 10 contains several subfolders, and each subfolder (such as BKR) contains several 10-K filings saved as .txt files.
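A minimal sketch of listing every filing in that layout, assuming the .txt files sit exactly one subfolder below the top-level folder. The helper name `list_filings` is hypothetical and only used for illustration.

```python
from pathlib import Path

def list_filings(root):
    """Return every .txt file exactly one subfolder below root, sorted by path."""
    root = Path(root)
    # '*/*.txt' matches <root>/<subfolder>/<name>.txt, e.g. 10/BKR/1.txt
    return sorted(p for p in root.glob('*/*.txt') if p.is_file())

# Example call (the path is the one from the question):
# filings = list_filings('/content/drive/My Drive/Colab Notebooks/10')
```

Each returned item is a `Path`, so `p.parent.name` gives the subfolder name (for example BKR) and `p.read_text()` gives the file contents.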
I tried the code below to extract the Item 1A Risk Factors section, but it failed. I would appreciate any suggestions.
```
import os
import re

PATH = '/content/drive/My Drive/Colab Notebooks/10/BKR'
conclusions = []
# PATH must be a directory; names are case-sensitive (path != PATH).
for file in os.listdir(PATH):
    with open(os.path.join(PATH, file)) as f:
        data = f.read()
    # re.search returns None when nothing matches, so check before .group().
    match = re.search('1a: (.*?)([A-Z]{2,})', data)
    if match:
        conclusions.append(match.group(1))
```
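A rough sketch of a more tolerant pattern for this section, under two assumptions that are not confirmed by the files themselves: the heading reads "Item 1A" followed by "Risk Factors", and the section ends at the next "Item 1B" or "Item 2" heading. The names `ITEM_1A` and `extract_risk_factors` are hypothetical.

```python
import re

# Assumption: the section starts at an "Item 1A ... Risk Factors" heading and
# ends just before the next "Item 1B" or "Item 2" heading. Filings vary.
ITEM_1A = re.compile(
    r'item\s+1a\.?\s*risk\s+factors(.*?)(?=item\s+(?:1b|2)\b)',
    re.IGNORECASE | re.DOTALL,  # DOTALL lets '.' span line breaks
)

def extract_risk_factors(text):
    """Return the longest candidate for the Item 1A body, or None if no match."""
    candidates = [m.group(1).strip() for m in ITEM_1A.finditer(text)]
    # A table of contents tends to match as well; the real section is longer.
    return max(candidates, key=len) if candidates else None
```

Taking the longest match is a heuristic to skip a table-of-contents entry. A cross-reference such as "see Item 2" inside the section would cut the match short, so the result is worth spot-checking against a few filings.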
The error message I got:
```
---------------------------------------------------------------------------
NotADirectoryError Traceback (most recent call last)
<ipython-input-12-051ca10fbeb3> in <module>()
5
6 conclusions = []
----> 7 for file in os.listdir(path):
8 with open(os.path.join(PATH, file)) as f:
9 data = f.read()
NotADirectoryError: [Errno 20] Not a directory: '/content/drive/My Drive/Colab Notebooks/10/APA/1.txt'
```
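The traceback calls `os.listdir(path)` with a lowercase `path`, and the path in the message ends in `1.txt`. That suggests, though the post does not confirm it, that `path` was a different variable pointing at a file rather than the `PATH` folder. A small sketch of how `os.listdir` behaves in each case, using a temporary folder so it does not depend on the Drive layout (the helper `listdir_error` is hypothetical):

```python
import os
import tempfile

def listdir_error(path):
    """Return the name of the exception os.listdir(path) raises, or None."""
    try:
        os.listdir(path)
    except OSError as exc:
        return type(exc).__name__
    return None

with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, '1.txt')
    open(file_path, 'w').close()
    print(listdir_error(folder))     # a directory is listed without error
    print(listdir_error(file_path))  # a regular file is rejected by os.listdir
```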