1

I want to parse my log file by URL, but some of the URLs have missing parameters, and when I iterate through the lines of the log I get an error for the missing parameter. I need to append a blank or null value to the parse list so that I can transform it into a data frame.

My data file (log file):

"GET /pixel.gife=heartbeat&creative_id=33548&in_view_time=290"
"GET/pixel.gife=heartbeat&creative_id=33548&in_view_time=23988"
"GET /pixel.gif?e=heartbeat&creative_id=33548&in_view_time=19183"
"GET /pixel.gif?e=ad_load&creative_id=33548"

Desired output:

   E         | Creative ID | IN VIEW TIME
   heartbeat | 33548       | 290
   heartbeat | 33548       | 23988
   ad_load   | 33548       | null

My Code:

parselist = []
for eachline in log.readlines():
    ip_regex = re.findall(r'(\d{18})', eachline)
    date = re.findall(r'([0-9]{4}\-[0-9]{2}\-[0-9]{2})',eachline)
    url = eachline
    parsed = urlparse.urlparse(url)
    parselist.append(ip_regex)
    parselist.append(date)
    parselist.append(urlparse.parse_qs(parsed.query)['e'])
    parselist.append(urlparse.parse_qs(parsed.query)['account_id'])
    parselist.append(urlparse.parse_qs(parsed.query)['impression_id'])
    parselist.append(urlparse.parse_qs(parsed.query)['campaign_id'])
    parselist.append(urlparse.parse_qs(parsed.query)['creative_id'])
    parselist.append(urlparse.parse_qs(parsed.query)['in_view_time'])

The error I get when the in_view_time parameter is missing from a line (here the ad_load line):

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-405c1bfb329e> in <module>()
     12     parselist.append(urlparse.parse_qs(parsed.query)['campaign_id'])
     13     parselist.append(urlparse.parse_qs(parsed.query)['creative_id'])
---> 14     parselist.append(urlparse.parse_qs(parsed.query)['in_view_time'])

KeyError: 'in_view_time'
ni3

2 Answers

0

You can use try and except:

import re
import urlparse  # Python 2 module; in Python 3 it is urllib.parse

parselist = []
for eachline in log.readlines():  # log is the already-opened log file
    ip_regex = re.findall(r'(\d{18})', eachline)
    date = re.findall(r'([0-9]{4}\-[0-9]{2}\-[0-9]{2})', eachline)
    parsed = urlparse.urlparse(eachline)
    params = urlparse.parse_qs(parsed.query)  # parse the query string once
    parselist.append(ip_regex)
    parselist.append(date)
    # Catch only KeyError, so that unrelated errors are not hidden.
    try:
        parselist.append(params['e'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(params['account_id'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(params['impression_id'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(params['campaign_id'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(params['creative_id'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(params['in_view_time'])
    except KeyError:
        parselist.append('Null')

Or, in a more compact way:

parselist = []
for eachline in log.readlines():
    ip_regex = re.findall(r'(\d{18})', eachline)
    date = re.findall(r'([0-9]{4}\-[0-9]{2}\-[0-9]{2})', eachline)
    parsed = urlparse.urlparse(eachline)
    params = urlparse.parse_qs(parsed.query)  # parse the query string once
    parselist.append(ip_regex)
    parselist.append(date)

    for key in ['e', 'account_id', 'impression_id', 'campaign_id', 'creative_id', 'in_view_time']:
        try:
            parselist.append(params[key])
        except KeyError:  # parameter missing from this line
            parselist.append('Null')

As a suggestion, instead of 'Null' you can append None.
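
A minimal sketch of the None suggestion, assuming Python 3 (where the `urlparse` module became `urllib.parse`) and assuming each log line has the quoted "GET /path?query" shape shown in the question. The helper name `parse_line` and the `FIELDS` tuple are made up for illustration. It uses `dict.get` with a default instead of try/except and builds one row per log line rather than one flat list:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical field list, taken from the sample data in the question.
FIELDS = ("e", "creative_id", "in_view_time")


def parse_line(line):
    """Return one dict per log line; missing parameters become None."""
    # Assumption: the line looks like '"GET /pixel.gif?e=...&k=v"'.
    # Strip whitespace and the surrounding quotes, then drop the HTTP method.
    request = line.strip().strip('"')
    parts = request.split(None, 1)
    url = parts[-1] if parts else ""
    params = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; take the first one.
    return {key: params.get(key, [None])[0] for key in FIELDS}


sample = [
    '"GET /pixel.gif?e=heartbeat&creative_id=33548&in_view_time=19183"\n',
    '"GET /pixel.gif?e=ad_load&creative_id=33548"\n',
]
rows = [parse_line(line) for line in sample]
```

A list of dicts like `rows` can then be handed to `pandas.DataFrame(rows)`, which should give one column per field, with the absent `in_view_time` appearing as a missing value.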

Diego
-1
  1. Why are you creating a list? You lose the keys and store only the values.
  2. If you are only interested in the values, you can simply write the following:
for v in urlparse.parse_qs(parsed.query).values():
    parselist.append(v)
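
One caveat with appending `.values()` directly: a line with fewer parameters yields fewer values, so positions no longer line up with columns. A sketch of an alternative that keeps the keys, again assuming Python 3's `urllib.parse`; the function name `query_to_row` is hypothetical:

```python
from urllib.parse import urlparse, parse_qs


def query_to_row(url):
    """Keep every parameter present in the URL, keyed by name."""
    params = parse_qs(urlparse(url).query)
    # Each value is a list; collapse single-item lists to a plain string.
    return {k: v[0] if len(v) == 1 else v for k, v in params.items()}


rows = [
    query_to_row("/pixel.gif?e=heartbeat&creative_id=33548&in_view_time=290"),
    query_to_row("/pixel.gif?e=ad_load&creative_id=33548"),
]

# The second row simply has no "in_view_time" key, so collect the union
# of keys and fill the gaps with None when laying the rows out as a table.
all_keys = sorted(set().union(*rows))
table = [[row.get(k) for k in all_keys] for row in rows]
```

Because the keys are kept, each value can be matched to its column even when a parameter is absent from a line.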
Dhia
Ronak S
  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/low-quality-posts/18234094) – Ivan Semochkin Dec 12 '17 at 19:42
  • FYI, urlparse.parse_qs(parsed.query) returns a dictionary, so you can test for the key before reading it: `qs = urlparse.parse_qs(parsed.query)`, then `parselist.append(qs['in_view_time'] if 'in_view_time' in qs else 'Null')`. You can do this in Diego's compact version too. – Ronak S Dec 12 '17 at 20:45