You have multiple text files, each holding data like this: a line of Urdu text, and its English translation on a line in square brackets.
So start with a function that reads a single file of that type:
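def read_single_file(filename: str) -> tuple[str, str]:
    urdu = ""
    english = ""
    # Assumes the files are UTF-8; being explicit avoids platform-default
    # encodings mangling the Urdu text.
    with open(filename, encoding="utf-8") as f:
        for line in f:
            line = line.strip()  # remove newlines etc.
            if not line:  # ignore empty lines
                continue
            if line.startswith("["):
                english = line.strip("[]")  # drop the surrounding brackets
            else:
                urdu = line
    return (urdu, english)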
Then, loop over your files; I'll assume they're just *.txt:
import glob
results = [read_single_file(filename) for filename in glob.glob("*.txt")]
Now that you have a list of 2-tuples, you can just create a dataframe out of it:
import pandas as pd
df = pd.DataFrame(results, columns=["urdu", "english"])
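
Note that read_single_file keeps only the last Urdu/English pair it sees, so it fits files with exactly one pair each. If your files can hold several pairs, a minimal sketch of a variant could look like this. It assumes each bracketed English line directly follows its Urdu line; read_pairs and the sample file contents are made up for illustration:

```python
import tempfile
from pathlib import Path


def read_pairs(filename: str) -> list[tuple[str, str]]:
    """Return every (urdu, english) pair found in one file."""
    pairs = []
    urdu = ""
    with open(filename, encoding="utf-8") as f:  # assumes UTF-8 files
        for line in f:
            line = line.strip()
            if not line:  # ignore empty lines
                continue
            if line.startswith("["):
                # A bracketed line closes the pair opened by the last Urdu line.
                pairs.append((urdu, line.strip("[]")))
                urdu = ""
            else:
                urdu = line
    return pairs


# Tiny demo with a made-up two-pair file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("سلام\n[Hello]\n\nشکریہ\n[Thank you]\n", encoding="utf-8")
    demo_pairs = read_pairs(str(sample))

print(demo_pairs)
```

To use it with many files, flatten the per-file lists, e.g. results = [pair for filename in glob.glob("*.txt") for pair in read_pairs(filename)], and pass results to pd.DataFrame exactly as above.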