How to read the correct lines from this text file with a Python program, and then create a .py file filled in with the data extracted from the .txt file?

Question

Text file to be read (the real one contains more numbers), called number_info.txt

veinti tres
23

veinti dos
22

veinti uno
21

veinte
20

tres
3

dos
2

uno
1

This is the code (I need help with this)

import re

def auto_coding_text_to_number():

    with open('number_info.txt', 'r') as f:

        #lines 0, 3, 6, 9, 12, 15, 18, ...
        coloquial_numbers = []

        #lines 0+1, 3+1, 6+1, 9+1, 12+1, 15+1, 18+1, ... 
        symbolic_numbers = []


    n = 0
    with open('number_to_text.py', 'w') as f:
        f.write('import re\n\ndef number_to_text_func(input_text):\n')
       
        #write replacement lines based on regex
        if(" " in coloquial_numbers[n]):
            #for example write this line:   "    input_text = re.sub(r"veinti[\s|-|]*tres", "23", input_text)"
        
        if not (" " in coloquial_numbers[n]):
            #for example write this line:   "    input_text = re.sub("tres", "3", input_text)"
            
        f.write("    return(input_text)\n    input_text = str(input())\n 
   print(number_to_text_func(input_text))")

        n = n + 1

auto_coding_text_to_number()

And this is the correct file, called number_to_text.py, that should be written by the other script

import re

def number_to_text_func(input_text):
    input_text = re.sub(r"veinti[\s|-|]*tres", "23", input_text)
    input_text = re.sub(r"veinti[\s|-|]*dos", "22", input_text)
    input_text = re.sub(r"veinti[\s|-|]*uno", "21", input_text)
    input_text = re.sub("tres", "3", input_text)
    input_text = re.sub("dos", "2", input_text)
    input_text = re.sub("uno", "1", input_text)

    return(input_text)

input_text = str(input())
print(number_to_text_func(input_text))

EDIT:

The lines inside the .txt file are structured like this

"veinti tres"  <---- line 0
"23"           <---- line 1
"veinti dos"   <---- line 2
"22"           <---- line 3
"veinti uno"   <---- line 4
"21"           <---- line 5
"veinte"       <---- line 6
"20"           <---- line 7
"tres"         <---- line 8
"3"            <---- line 9

Then I suggested separating them into 2 groups and storing them in 2 lists

#lines 0, 3, 6, 9, 12, 15, 18, ...
coloquial_numbers = ["veinti tres", "veinti dos", "veinti uno", "veinte", "tres"]

#lines 0+1, 3+1, 6+1, 9+1, 12+1, 15+1, 18+1, ...
symbolic_numbers = ["23", "22", "21", "20", "3"]


body_template = """    input_text = re.sub(r"{}", "{}", input_text)\n"""

And then the body of the function should be structured like this

input_text = re.sub(coloquial_numbers[n].replace(' ', '[\s|-|]'), symbolic_numbers[n], input_text)

Getting something like this in the function body of the output file

def number_to_text(input_text):
    input_text = re.sub(r"veinti[\s|-|]*tres", "23", input_text)
    input_text = re.sub(r"veinti[\s|-|]*dos", "22", input_text)
    input_text = re.sub(r"veinti[\s|-|]*uno", "21", input_text)
    input_text = re.sub("tres", "3", input_text)

    return(input_text)

What is/are the problem(s)? Do you know how to populate the lists from `number_info.txt`? What is `n` and what does it do? Isn't a loop missing? — cards, Sep 08 '22 at 11:58
1) The **problem** is that I don't know how to create the loop so that the first code writes the second code to a file. 2) Regarding **the lists**, how to fill them will depend on how the loop is set up (I was able to perhaps change the lists 3 by 3, perhaps functions, but I'm not sure). 3) I tried to use the variable `n` **to iterate through the lists**. — Matt095, Sep 08 '22 at 17:57
I don't get the 2nd part. Here is a way to get both lists: `coloquial_numbers, symbolic_numbers = zip(*re.findall(r'\n*([a-z\s]+)\n(\d+)', f.read()))`. If you are already using regexes, just use them! — cards, Sep 08 '22 at 18:34
I got the 2nd part, but it would be helpful to know the rule (I don't know Spanish). If the `number_info.txt` file contained `treinta y uno\n31`, what would happen? Would it be `input_text = re.sub(r"treinta[\s|-|]*y[\s|-|]*uno", "31", input_text)`? — cards, Sep 08 '22 at 18:43
@cards I meant that perhaps the regex are more useful to delimit, but it is better to identify the words by the order of the lines within the txt. That is why I have indicated that to store `colloquial_numbers` it must be extracted from `#lines 0, 3, 6, 9, 12, 15, 18, ...` — Matt095, Sep 08 '22 at 20:31

cards · Accepted Answer · 2022-09-09T06:15:35.890

I omitted the read/write steps for the sake of simplicity. No rule for building the body of the meta-function was given, so I made a guess.

import re 

# body-component of the meta-code
body_template = """    input_text = re.sub(r"{}", "{}", input_text)\n"""

# read from file
with open('number_info.txt', 'r') as fd:
    text = fd.read()

# update body
body = ''
for n_text, n in re.findall(r'\n*([a-z\s]+)\n(\d+)', text):
    # raw string so that \s is not treated as an (invalid) string escape
    body += body_template.format(n_text.replace(' ', r'[\s|-|]'), n)

# other components of the meta-code
header = """import re

def number_to_text_func(input_text):
"""

tail = """\n    return(input_text)

input_text = str(input())
print(number_to_text_func(input_text))"""

# merge together texts to be saved to file
meta_code = header + body + tail
print(meta_code)
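
The write step left out above could be sketched roughly as follows. The helper name `save_meta_code` is hypothetical, and the `compile()` check is my own addition rather than part of the original answer: it only verifies that the generated text parses as Python before the file is overwritten, and it does not execute the code.

```python
from pathlib import Path


def save_meta_code(meta_code: str, path: str = "number_to_text.py") -> None:
    """Check that the generated source parses, then write it to `path`."""
    # compile() raises SyntaxError if the generated text is not valid Python,
    # so a broken template is caught before the target file is touched.
    compile(meta_code, path, "exec")
    # Normalise the end of the file to exactly one trailing newline.
    Path(path).write_text(meta_code.rstrip("\n") + "\n", encoding="utf-8")
```

With this in place, `save_meta_code(meta_code)` would take the place of `print(meta_code)`.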

Output (content of number_to_text.py)

import re

def number_to_text_func(input_text):
    input_text = re.sub(r"treinta[\s|-|]y[\s|-|]uno", "31", input_text) # <-
    input_text = re.sub(r"veinti[\s|-|]tres", "23", input_text)
    input_text = re.sub(r"veinti[\s|-|]dos", "22", input_text)
    input_text = re.sub(r"veinti[\s|-|]uno", "21", input_text)
    input_text = re.sub(r"veinte", "20", input_text)
    input_text = re.sub(r"tres", "3", input_text)
    input_text = re.sub(r"dos", "2", input_text)
    input_text = re.sub(r"uno", "1", input_text)

    return(input_text)

input_text = str(input())
print(number_to_text_func(input_text))
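
One detail worth checking in the generated patterns: inside a character class, `|` is a literal character rather than alternation, and `|-|` is parsed as a range from `|` to `|`. So `[\s|-|]` matches whitespace or a pipe, but not a hyphen. A minimal sketch of the difference, assuming the intent was "optional spaces or hyphens between the parts" (the class `[\s-]*` is my suggestion, not something taken from the question or the answer):

```python
import re

original = r"veinti[\s|-|]*tres"   # class = whitespace or "|"
suggested = r"veinti[\s-]*tres"    # class = whitespace or "-" (assumed intent)

for sample in ("veinti tres", "veintitres", "veinti-tres"):
    with_original = re.sub(original, "23", sample)
    with_suggested = re.sub(suggested, "23", sample)
    print(f"{sample!r}: original -> {with_original!r}, suggested -> {with_suggested!r}")
```

Only the hyphenated sample behaves differently: the original pattern leaves `veinti-tres` untouched, while the suggested one replaces it.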

From the comments:

read file per line, no regex

with open('number_info.txt', 'r') as fd:
    lines = fd.read().split('\n')

symbolic_numbers, coloquial_numbers = [], []
for i, line in enumerate(lines):
    if i % 3 == 0:
        coloquial_numbers.append(line)
    elif i % 3 == 1:
        symbolic_numbers.append(line)

or read file with slices

with open('number_info.txt', 'r') as fd:
    lines = fd.read().split('\n')

coloquial_numbers = lines[::3]
symbolic_numbers = lines[1::3]
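
Either way of splitting gives two parallel lists, so the body of the generated function can be built by walking both at once with `zip`, which is the line-position variant asked for in the comments. A minimal sketch, assuming the three-line layout (words, digits, blank line) and using an inline string in place of the file; the helper name `build_body` is hypothetical, and the `*` after the character class follows the expected output shown in the question:

```python
body_template = '    input_text = re.sub(r"{}", "{}", input_text)\n'


def build_body(text: str) -> str:
    """Build the re.sub lines from text laid out as words / digits / blank."""
    lines = text.split("\n")
    coloquial_numbers = lines[::3]    # lines 0, 3, 6, ...
    symbolic_numbers = lines[1::3]    # lines 1, 4, 7, ...

    body = ""
    # zip stops at the shorter list, so an unpaired trailing line is ignored
    for words, digits in zip(coloquial_numbers, symbolic_numbers):
        if not words.strip():
            continue  # skip entries produced by stray blank lines
        pattern = words.replace(" ", r"[\s|-|]*")
        body += body_template.format(pattern, digits)
    return body


sample = "veinti tres\n23\n\ntres\n3\n"
print(build_body(sample), end="")
```

The returned string can be placed between the `header` and `tail` parts in the same way as the regex-based `body`.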

edited Sep 09 '22 at 06:15

answered Sep 08 '22 at 19:14

cards

@Matias Nicolas Rodriguez if something is not clear or if I misunderstood anything, let me know – cards Sep 08 '22 at 19:15
thank you very much, the only thing that remains in my doubt is regarding the subject of the readings of the file – Matt095 Sep 08 '22 at 20:06
I edited with the reading part – cards Sep 08 '22 at 20:11
Instead of using the regex with words and numbers `r'\n*([a-z\s]+)\n(\d+)'` to specify the body of the function, is it possible to do so by reading **lines 0, 3, 6, 9, 12, 15, 18, ...** for colloquial numbers (in letters) and **lines 0+1, 3+1, 6+1, 9+1, 12+1, 15+1, 18+1, ...** for symbolic numbers? Since the program is based on reading a .txt file, it is more convenient to identify the entries by line position – Matt095 Sep 08 '22 at 20:27
I know that the program still works, for this case, but it would really help me to know how to set it by means of the line number within the .txt – Matt095 Sep 08 '22 at 20:32
Use modulo 3 arithmetic. Enumerate each line of the file, then `i % 3 == 0` -> `coloquial_numbers`, `i % 3 == 1` -> `symbolic_numbers` – cards Sep 08 '22 at 20:47
If it's okay, the issue is that those lines then have to be incorporated into the file that prints in the same way as with the regex – Matt095 Sep 08 '22 at 21:46
the `text` variable, you should add: `line 0`, `line 0+1`, `line 3`, `line 3+1`, `line 6`, `line 6+1`, ... – Matt095 Sep 08 '22 at 21:51
I am lost... could you give an example in terms of python structures? where should these `line 0`, `line 0+1`, `line 3` be stored (and why)? Maybe edit your question with an update (writing in the comments is a bit painful) and make clear what is the input and expected output – cards Sep 08 '22 at 22:09
There I have added more information to the structure of the body of the function within the question – Matt095 Sep 08 '22 at 22:28
Still confused because I don't understand the difference from my approaches. You say _"I suggested separating them into 2 groups"_ but you don't say *how*. With modular arithmetic? I will try another way with slices – cards Sep 08 '22 at 22:40
Ah I noticed that there is `r` for raw string which is not for every term. Make sense? – cards Sep 08 '22 at 22:49
the sequence is a line is the number written in letters, the next is the same number but written in numbers, and the next is a blank line, then the same thing is repeated but with another number. The goal is to do the same thing you did with regex but with number of lines, getting this output : `input_text = re.sub(r"line 0", "line 0+1", input_text)` and then `input_text = re.sub(r"line 3", "line 3+1", input_text)`, and then `input_text = re.sub(r"line 6", "line 6+1", input_text)`, ... – Matt095 Sep 08 '22 at 22:51
The `r` only makes sense in cases where the `[\s|-|]` regex is used; for numbers in letters without spaces `" "`, the `r` is not necessary in the output code – Matt095 Sep 08 '22 at 22:53
I changed `lines = fd.read().split()` to `lines = fd.read().split("\n")` to split by lines rather than by whitespace. With your code, `if i % 3 == 0: coloquial_numbers.append(line)` and `elif i % 3 == 1: symbolic_numbers.append(line)`, we have already managed to separate the lines. Best of all, each number in letters sits at the same index in its list as the matching number in digits in the other list, so we can go through both lists in the **same iteration** using `coloquial_numbers[n]` and `symbolic_numbers[n]` – Matt095 Sep 09 '22 at 00:17
Yes! That was the origin of the confusion, sorry about that! I had cleared my editor and recoded from scratch, so it was undetectable for me! – cards Sep 09 '22 at 06:11
I managed to make it work but it only works using the residue criterion through the `%` operator, using the second form of slices `lines[::]` has not worked for me, whether I put those 2 lines of code inside or outside the `with` statement I get the both lists empty. – Matt095 Sep 09 '22 at 07:27
[example](https://onlinegdb.com/Y_SHcqPse) – cards Sep 09 '22 at 12:31
