
AH! I'm new to Python. Trying to get the pattern here, but could use some assistance to get unblocked.

Scenario:

  • testZip.zip file with test.rpt files inside
  • The .rpt files have multiple areas of interest ("AOI") to parse
  • AOI1: Line starting with $$
  • AOI2: Multiple lines starting with a single $

Goal:

  • To get the AOIs into tabular format for upload to SQL

Sample file:

$$ADD ID=TEST BATCHID='YEP' PASSWORD=NOPE
###########################################################################################
$KEY= 9/21/2020 3:53:55 PM/2002/B0/295.30/305.30/4FAOA973_3.0_v2.19.2.0_20150203_1/20201002110149
$TIMESTAMP= 20201002110149
$MORECOLUMNS=  more columns
$YETMORE = yay

Tried so far:

import zipfile

def get_aoi1(zip):
    z = zipfile.ZipFile(zip)
    for f in z.namelist():
        with z.open(f, 'r') as rptf:
            for l in rptf.readlines():
                if l.find(b"$$") != -1:
                    return l

def get_aoi2(zip):
    z = zipfile.ZipFile(zip)
    for f in z.namelist():
        with z.open(f, 'r') as rptf:
            for l in rptf.readlines():
                if l.find(b"$") != -1:
                    return l


aoi1 = get_aoi1('testZip.zip')
aoi2 = get_aoi2('testZip.zip')

print(aoi1)
print(aoi2)

Results:

  • I get the same results for both functions
b"$$ADD ID=TEST BATCHID='YEP' PASSWORD=NOPE\r\n"
b"$$ADD ID=TEST BATCHID='YEP' PASSWORD=NOPE\r\n"

How do I get the results in text instead of bytes (b) and remove the \r\n from AOI1?

  • There doesn't seem to be a text-mode option for z.open() (it always returns bytes)
  • I've been unsuccessful with .strip()
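One way to read archive members as text is to wrap the binary file object in `io.TextIOWrapper`. This is only a sketch: the helper name `first_line_as_text` and the `utf-8` encoding are assumptions, not something taken from the real .rpt files.

```python
import io
import zipfile

def first_line_as_text(zip_path, encoding="utf-8"):
    """Return {member name: first line as str} with line endings removed.

    The encoding is an assumption; adjust it to match the real .rpt files.
    """
    result = {}
    with zipfile.ZipFile(zip_path) as z:
        for name in z.namelist():
            # z.open() always yields bytes; TextIOWrapper decodes on the fly
            with z.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding=encoding)
                result[name] = text.readline().rstrip("\r\n")
    return result
```

`zipfile.ZipFile` accepts a path or a file-like object, so the same helper should work on an in-memory archive too.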

EDIT 1:

  • Thanks for the pep @furas!
  • return l.strip().decode() worked for removing the new line and b
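For reference, here is what that call does on the AOI1 line from the results above. `decode()` with no argument assumes UTF-8, which may or may not match the real files.

```python
raw = b"$$ADD ID=TEST BATCHID='YEP' PASSWORD=NOPE\r\n"

# strip() drops the trailing \r\n, decode() turns bytes into str
text = raw.strip().decode()

print(text)  # $$ADD ID=TEST BATCHID='YEP' PASSWORD=NOPE
```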

How do I get the correct results from AOI2 (lines with a single $ in a tabular format)?

EDIT 2:

  • @furas 2021!
  • Adding the following logic to aoi2 function worked great.
col_array = []
for l in rptf.readlines():
    if not l.startswith(b"$$") and l.startswith(b"$"):
        col_array.append(l)
return col_array
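Putting both functions together, a rough sketch of getting one row per .rpt file might look like the following. The names `parse_rpt` and `read_zip` are made up for illustration, and it assumes every single-`$` line has the form `$NAME=value` (as in the sample file) and that the files are UTF-8.

```python
import zipfile

def parse_rpt(lines):
    """Split decoded lines into the $$ header (AOI1) and $NAME=value fields (AOI2)."""
    header = None
    fields = {}
    for line in lines:
        line = line.strip()
        if line.startswith("$$"):
            header = line
        elif line.startswith("$"):
            # Assumes "$NAME=value"; split on the first "=" only
            key, _, value = line[1:].partition("=")
            fields[key.strip()] = value.strip()
    return header, fields

def read_zip(zip_path, encoding="utf-8"):
    """Return one dict per archive member, ready to load into a table."""
    rows = []
    with zipfile.ZipFile(zip_path) as z:
        for name in z.namelist():
            with z.open(name) as rptf:
                lines = [raw.decode(encoding) for raw in rptf]
            header, fields = parse_rpt(lines)
            rows.append({"file": name, "header": header, **fields})
    return rows
```

Each dict in `rows` has the same keys when the files share the same `$` fields, so the list can be handed to `csv.DictWriter` or a database insert.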
ericOnline
  • `text = b"...".decode()` – furas Jan 07 '21 at 00:10
  • The string `$` is also found inside `$$`, so `get_aoi2` gives you the first line instead of the second – furas Jan 07 '21 at 00:12
  • **RE:** `text = b"...".decode()`: Where in the function is this placed? – ericOnline Jan 07 '21 at 00:13
  • What does `unsuccessful` mean? Did you get an error? Why didn't you show it in the question? – furas Jan 07 '21 at 00:13
  • Use `text = b"...".decode()` whenever you want to convert bytes to string/text – furas Jan 07 '21 at 00:14
  • `text = b"...".decode()` is only an example; in your situation it could be e.g. `text = aoi1.decode()` – furas Jan 07 '21 at 00:15
  • If your data always has `$$` in the first line and `$` in the third and following lines, then `lines = rptf.readlines()`, `aoi1 = lines[0]` and `aoi2 = b"\t".join(lines[2:])` – furas Jan 07 '21 at 00:17
  • The `$$` and `$` will not reliably be on static lines. – ericOnline Jan 07 '21 at 00:21
  • Ideas for how to differentiate between a single `$` and multiple `$$`? Do you think `re` is the answer here? – ericOnline Jan 07 '21 at 00:35
  • I would first check for `$$` and then for `$`; that way I would know whether there is only one `$`. To make sure, I would use `(not l.startswith(b'$$')) and l.startswith(b'$')` – furas Jan 07 '21 at 00:35
  • or `if l[:1] == b"$" and l[1:2] != b"$"` – furas Jan 07 '21 at 00:43
  • Interesting... `if not l.startswith(b"$$") and l.startswith(b"$"): return l.strip().decode()` only returns the first row `$KEY=...`, no subsequent rows (`$TIMESTAMP=`, `$MORECOLUMNS`, etc.) – ericOnline Jan 07 '21 at 00:45
  • Because you use `return`, which exits the function at once, after the first matching line. You should run the loop, add the lines to a list, and exit the function after the loop. – furas Jan 07 '21 at 00:49
  • Bingo! See **Edit 2**. Thanks a bunch! You really helped. – ericOnline Jan 07 '21 at 00:54
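To make the two points from the comments concrete, here is a small sketch on two shortened sample lines (the helper name `is_single_dollar` is made up): `find(b"$")` matches the `$$` line as well, while a `startswith` pair tells the two apart, and collecting into a list keeps every match instead of returning on the first one.

```python
line1 = b"$$ADD ID=TEST\r\n"
line2 = b"$KEY= 1\r\n"

# find() reports the position of the first "$", so the $$ line matches too
print(line1.find(b"$"))  # 0

def is_single_dollar(line):
    """True only for lines that start with exactly one '$'."""
    return line.startswith(b"$") and not line.startswith(b"$$")

# Collect every matching line rather than returning inside the loop
matches = [l for l in (line1, line2) if is_single_dollar(l)]
print(matches)  # [b'$KEY= 1\r\n']
```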
