
I'm trying to split a string that contains leading zeros, and I get a lot of "invalid token" errors. I understand that Python interprets a leading 0 as an octal number. What I want is to split "Jan-01,2005" into ['Jan-01','2005']. I tried converting it to a string, but I still get the same error. What I did was:

def split_fileB(line):
     first=str(line.split(',')) 
     return first

Does anyone know how to keep the format?

Two-Bit Alchemist
  • Your question contradicts itself. "What I want to do is split" -- which is a string operation -- "`'Jan-01,2005'`" -- which looks like a string, then suddenly: "I tried to convert it to string". Well what is it? – Two-Bit Alchemist Nov 19 '15 at 18:17
  • `"Jan-01,2005".split(',')` works fine. If this is not the actual data you are seeing, you need to give more information. You need to give an example of a full line, because this works just fine. – Dan Nov 19 '15 at 18:17
  • Also `str(some_string.split(','))` is going to get you a stringified list, which is almost certainly not what you want. – Two-Bit Alchemist Nov 19 '15 at 18:19
  • I see what's wrong here. I just entered jan-01,2005 instead of "jan-01,2005". Thanks! – Qianqi Shen Nov 19 '15 at 18:22

1 Answer


You can use Python's `re` module to split the string on runs of one or more digits. Because the pattern contains a capturing group, the digit runs themselves are kept in the result.

import re
pattern = re.compile("([0-9]+)")

s = "foo bar Jan-01 03-56, blah"
toks = pattern.split(s)

# toks is ['foo bar Jan-', '01', ' ', '03', '-', '56', ', blah']

If your format is exactly "MMM-DD,YYYY", then you may use something like this (adapted from the question). I assume you are trying to extract the day out of this?

def get_day_number(line):
    month_day, year = line.split(",", 1)  # '1' splits at most once
    month, day = month_day.split("-", 1)
    return int(day, 10)
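For the split the question itself asks about, here is a minimal sketch, assuming the line arrives as a quoted string of the form "MMM-DD,YYYY". The helper name `split_date` is made up for illustration; it returns the list directly rather than wrapping it in `str()`.

```python
def split_date(line):
    # Hypothetical helper: split "MMM-DD,YYYY" at the first comma only.
    return line.split(",", 1)

parts = split_date("Jan-01,2005")
print(parts)  # ['Jan-01', '2005']

# Wrapping the result in str() yields one string that merely looks like a list.
as_text = str(parts)
print(type(as_text).__name__)  # str
```

The leading zero in "01" survives untouched because it never leaves the string.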

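The octal problem you mention does not come from splitting, and it does not come from `int(s)` either: `int` defaults to base 10, so `int("010")` is 10. It only arises when text with a leading zero is parsed as Python source code, for example an unquoted literal such as 010 (octal 8 in Python 2, a syntax error in Python 3) or 08 (an invalid token). Passing the base explicitly is still a reasonable habit, because it makes the intent obvious.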

s = "010"
i = int(s, 10)
print(i)  # 10
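A small sketch, assuming Python 3, of where leading zeros matter: they are harmless inside a string and only cause trouble when the text is parsed as source code. The helper name `parses_as_literal` is hypothetical.

```python
def parses_as_literal(text):
    # Hypothetical helper: True if `text` compiles as a Python expression.
    try:
        compile(text, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# Strings with leading zeros convert as plain decimal numbers.
print(int("010"))     # 10
print(int("08", 10))  # 8

# As source code, Python 3 rejects a decimal literal with a leading zero.
print(parses_as_literal("010"))  # False on Python 3
print(parses_as_literal("10"))   # True
```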
init_js