
1. I have a file named rexp.txt with the following content:

adf fdsf hh  h fg h 1995-11-23
dasvsbh 2000-04-12 gnym,mnbv 2001-02-17
dascvfbsn
bjhmndgfh
xgfdjnfhm244-44-2255  fgfdsg gfjhkh
fsgfdh 455-44-6577 dkjgjfkld
sgf
dgfdhj 
sdg 192.6.8.02 fdhdlk dfnfghr
fisdhfih dfhghihg 154.56.2.6 fdhusdgv
aff fjhgdf 
fdfdnfjgkpg
 fdf hgj  fdnbk gjdhgj 

dfdfg raeh95@gmail.com efhidhg  fdfuga reg@gmail.com
ergudfi rey@gmail.com iugftudfh dgufidjfdg
teeeee@gmail.comugfuhlfhs fgufif p

2. I want to extract the SSNs, dates, and e-mail addresses line by line. I'm expecting code that loops through every line and returns the matching strings.

3. Please correct my Python code:

import re
def cfor_date(str):
    t=re.search(r'(\d{4}-\d{2}-\d{2})',str)
    return t

def cfor_ssn(str):
    f=re.search(r'(\d{3}-\d{2}-\d{4})',str)
    return f

def cfor_gm(str):
    g=re.search(r'([\w\.-]+@gmail[\w{3}\.-]+)',str)
    return g

f = open("rexp.txt","r").read()
lines = f.splitlines()
for line in iter(lines):
    x=line.split(" ")
    print x
    if (cfor_date(x)) != None: # i feel problem here
        r=cfor_ssn(x)
        print r
Dinesh Pundkar
  • 4,160
  • 1
  • 23
  • 37
raenish
  • 1
  • 1

1 Answer

  • You open the file, read it completely, split the result into a list with splitlines(), and then iterate over that list. That is longer and more complicated than it needs to be, and the file is never closed after it is read.
  • Instead, open the file with the with construct and read it with readlines(). There is then no need to split the lines yourself or to worry about closing the file.
  • In your code, once you start iterating line by line, you split each line again on single spaces and pass the result of split(), which is a list, to the functions that extract the date, e-mail, or SSN. This is where the problem is: re.search() expects a string, not a list.
  • There is no need to split the line on spaces. Pass the line directly to the extraction functions.
  • Your regular expressions are fine; I did not modify them.
  • I replaced search() with findall(). The difference between the two is shown in the example below.
 >>> import re
 >>> a = "Dinesh 123"

 >>> t = re.search(r"\d+",a)
 >>> t
 <_sre.SRE_Match object at 0x01FE3918>
 >>> t.group()
 '123'

 >>> x = re.findall(r'\d+',a)
 >>> x
 ['123']
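
The sample string above contains only one number, so both calls find the same text. A minimal sketch on one of the lines from rexp.txt, which holds two dates, may make the difference clearer. The variable names here are illustrative only.

```python
import re

line = "dasvsbh 2000-04-12 gnym,mnbv 2001-02-17"
date_pattern = r'(\d{4}-\d{2}-\d{2})'

# search() stops at the first hit and returns a match object (or None).
first = re.search(date_pattern, line)
# findall() scans the whole string and returns a list of strings (possibly empty).
every = re.findall(date_pattern, line)

print(first.group())
print(every)
```

Because findall() returns an empty list when nothing matches, its result can be tested with a plain truth check, whereas the result of search() has to be compared against None before calling group().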

For more help, check this link.
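
The problem with splitting the line, described in the points above, can be reproduced in isolation. The following is a small sketch, not the original program: it passes the list produced by split() to re.search(), which raises a TypeError, and then passes the unsplit line instead. The variable names are illustrative.

```python
import re

line = "fsgfdh 455-44-6577 dkjgjfkld"
ssn_pattern = r'(\d{3}-\d{2}-\d{4})'

# As in the question's loop: split() turns the line into a list of words.
words = line.split(" ")

try:
    re.search(ssn_pattern, words)   # a list is not a valid subject for a regex
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)

# Passing the line itself works as intended.
from_line = re.findall(ssn_pattern, line)
print(from_line)
```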

All of the above points are applied in the code below:

Code:

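import re

def cfor_date(tmp_line):
    # Return every YYYY-MM-DD style date in the line (empty list if none).
    return re.findall(r'(\d{4}-\d{2}-\d{2})', tmp_line)

def cfor_ssn(tmp_line):
    # Return every NNN-NN-NNNN style number in the line.
    return re.findall(r'(\d{3}-\d{2}-\d{4})', tmp_line)

def cfor_gm(tmp_line):
    # Return every gmail address in the line (pattern unchanged from the question).
    return re.findall(r'([\w\.-]+@gmail[\w{3}\.-]+)', tmp_line)

# "xyz.txt" is the input file here; use "rexp.txt" for the file from the question.
with open("xyz.txt", "r") as fh:
    for line in fh.readlines():
        date_list = cfor_date(line)
        ssn_list = cfor_ssn(line)
        gm_list = cfor_gm(line)

        # An empty list is falsy, so only non-empty results are printed.
        if ssn_list:
            print(ssn_list)
        if date_list:
            print(date_list)
        if gm_list:
            print(gm_list)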

Output:

C:\Users\dinesh_pundkar\Desktop>python c.py
['1995-11-23']
['2000-04-12', '2001-02-17']
['244-44-2255']
['455-44-6577']
['raeh95@gmail.com', 'reg@gmail.com']
['rey@gmail.com']
['teeeee@gmail.comugfuhlfhs']

C:\Users\dinesh_pundkar\Desktop>
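
The last line of the output shows the e-mail pattern picking up the text glued to the address ('teeeee@gmail.comugfuhlfhs'). Inside square brackets, {3} is not a quantifier; it only adds the literal characters {, 3 and } to the set, so [\w{3}\.-]+ keeps matching word characters after ".com". If only the address itself is wanted, a stricter pattern might look like the sketch below. The pattern named stricter is an assumption that every address ends in exactly "gmail.com"; it is not part of the original answer.

```python
import re

# Pattern from the question and answer, unchanged.
original = r'([\w\.-]+@gmail[\w{3}\.-]+)'
# Assumption: only addresses ending in "gmail.com" are of interest.
stricter = r'[\w.-]+@gmail\.com'

line = "teeeee@gmail.comugfuhlfhs fgufif p"

loose_hits = re.findall(original, line)
strict_hits = re.findall(stricter, line)

print(loose_hits)
print(strict_hits)
```

Whether the trailing text belongs to the address cannot be decided from the data alone, so the stricter pattern is only appropriate if the domain is known in advance.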
Dinesh Pundkar
  • Can you share data where you get repeated lists? – Dinesh Pundkar Sep 22 '16 at 15:58
  • Share the file on Google Drive or Dropbox and just give me the link. – Dinesh Pundkar Sep 22 '16 at 16:15
  • After reading the file and printing the results as in the previous program, I need a separate user-defined function that loops over those lines and gives me ['1995-11-23', '244-44-2255'] for the first line, ['2000-04-12', '2001-02-17', '455-44-6577'] for the second line, and so on. – raenish Sep 27 '16 at 16:09
  • Can you please give the line which has the above data? – Dinesh Pundkar Sep 28 '16 at 04:41
  • Output of the previous program: ['1995-11-23'] ['2000-04-12', '2001-02-17'] ['244-44-2255'] ['455-44-6577']. I need a separate user-defined function that picks ['1995-11-23', '244-44-2255'] for the first line, ['2000-04-12', '2001-02-17', '455-44-6577'] for the second line, and so on. – raenish Sep 28 '16 at 13:42
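
One possible reading of the request in the comments above is that the first list of dates should be joined with the first list of SSNs, the second with the second, and so on, even though they come from different lines of the file. The sketch below follows that reading only; it is an assumption, and the names pair_dates_with_ssns, DATE_RE and SSN_RE are hypothetical. The sample lines are copied from rexp.txt.

```python
import re

DATE_RE = re.compile(r'\d{4}-\d{2}-\d{2}')
SSN_RE = re.compile(r'\d{3}-\d{2}-\d{4}')

def pair_dates_with_ssns(lines):
    """Join the k-th non-empty date result with the k-th non-empty SSN result."""
    date_hits = []
    ssn_hits = []
    for line in lines:
        dates = DATE_RE.findall(line)
        ssns = SSN_RE.findall(line)
        if dates:
            date_hits.append(dates)
        if ssns:
            ssn_hits.append(ssns)
    # zip() stops at the shorter list, so unmatched leftovers are dropped.
    return [dates + ssns for dates, ssns in zip(date_hits, ssn_hits)]

sample = [
    "adf fdsf hh  h fg h 1995-11-23",
    "dasvsbh 2000-04-12 gnym,mnbv 2001-02-17",
    "xgfdjnfhm244-44-2255  fgfdsg gfjhkh",
    "fsgfdh 455-44-6577 dkjgjfkld",
]

combined = pair_dates_with_ssns(sample)
for row in combined:
    print(row)
```

To run this against the real file, the list sample could be replaced by the open file handle, since the function only needs something that yields lines.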