I am working on a dataset from Kaggle and I want to extract the titles from a pandas column of names. I use the following code:

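    import re

    def extract_patt(patt, linea):
        # Return the first capture group in lower case, or "" if nothing matches.
        matchObj = re.match(patt, linea)
        if matchObj:
            return matchObj.group(1).lower()
        return ""

    def extract_title(linea):
        # Raw string so that \s and \. reach the regex engine unchanged.
        return extract_patt(r'^.+,\s(.+)\..+', linea)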

    titles = dataframe1["Name"].apply(extract_title)

    title_mapping = {"": 0, "mr": 1, "miss": 2, "mrs": 3, "master": 4, "dr": 5, "rev": 6, "major": 7, "col": 7, "mlle": 8, "mme": 8, "don": 9, "lady": 10, "countess": 10, "jonkheer": 10, "sir": 9, "capt": 7, "ms": 2}

    for k in title_mapping:
        titles[titles == k] = title_mapping[k]

    dataframe1["Title"] = titles

However, when I run this code on the Azure Machine Learning platform as a Python script, I get the following error:

Error 0085: The following error occurred during script evaluation, please view the output log for more information:
 ---------- Start of error message from Python interpreter ----------
 data:text/plain,Caught exception while executing function: Traceback (most recent call last):
   File "C:\server\invokepy.py", line 176, in batch
     rutils.RUtils.DataFrameToRFile(outlist[i], outfiles[i])
   File "C:\server\RReader\rutils.py", line 28, in DataFrameToRFile
     rwriter.write_attribute_list(attributes)
   File "C:\server\RReader\rwriter.py", line 59, in write_attribute_list
     self.write_object(value);
   File "C:\server\RReader\rwriter.py", line 121, in write_object
     write_function(flags, value.values())
   File "C:\server\RReader\rwriter.py", line 104, in write_objects
     self.write_object(value)
   File "C:\server\RReader\rwriter.py", line 121, in write_object
     write_function(flags, value.values())
   File "C:\server\RReader\rwriter.py", line 71, in write_integers
     self.write_integer(value)
   File "C:\server\RReader\rwriter.py", line 147, in write_integer
     self.writer.WriteInt32(value)
   File "C:\server\RReader\BinaryIO\binarywriter.py", line 23, in WriteInt32
     self.WriteData(self.Int32Format, data)
   File "C:\server\RReader\BinaryIO\binarywriter.py", line 14, in WriteData
     self.stream.write(pack(format, data))
 error: cannot convert argument to integer

 ---------- End of error message from Python  interpreter  ----------
Start time: UTC 09/29/2015 07:47:02
End time: UTC 09/29/2015 07:47:13

The problem seems to be in the mapping code: if I remove it, I get a column with the titles instead of the integers.

Edit: I also tried the following instead of the for loop to map, but I had the same error:

    dataframe1["Title"].replace(title_mapping, inplace=True)
cchamberlain
Tasos

2 Answers

In my experience, the problem is the expression titles == k in the line titles[titles == k] = title_mapping[k]. That expression evaluates to boolean values.

In Python, the boolean type is a kind of integer: False equals 0 and True equals 1, and any non-zero integer is treated as true.

But the key used to index titles should be a string, which is why the error message is "cannot convert argument to integer".
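To check this reasoning, it may help to look at what the mask and the mapped column actually contain. The following is only a minimal sketch with made-up data: the sample series and the shortened mapping are assumptions, not taken from the dataset. It shows that the mask is boolean, that the loop leaves any title missing from the mapping as a string (so the column keeps dtype object), and one possible alternative that forces an integer column. Whether a leftover string is what trips the Azure writer is not confirmed here.

```python
import pandas as pd

# Hypothetical sample; "the countess" is deliberately missing from the mapping.
titles = pd.Series(["mr", "miss", "the countess", ""], dtype=object)
title_mapping = {"": 0, "mr": 1, "miss": 2}

# The comparison yields a boolean mask, one entry per row.
mask = titles == "mr"
print(mask.dtype)

# Loop-style replacement as in the question, applied to a copy.
looped = titles.copy()
for key, code in title_mapping.items():
    looped[looped == key] = code

# Titles without a mapping entry are still strings, so the column is mixed.
leftover = [value for value in looped if isinstance(value, str)]
print(looped.dtype, leftover)

# Alternative: map every title, send unknown ones to 0, and cast to integer.
mapped = titles.map(title_mapping).fillna(0).astype("int64")
print(mapped.dtype, mapped.tolist())
```

With the cast at the end, every value in the column is guaranteed to be an integer, which makes it easy to rule the mapping in or out as the source of the error.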

Best Regards.

Peter Pan

I've run into the same problem, also with the Titanic data set. I first dropped the ticket and cabin number columns using Azure's built-in 'Project Columns', then passed the data into a Python script, and now it works.

I don't know what in those columns caused the problem. Someone posted a message elsewhere saying that empty values in the first row might be a problem, and MS says a bug fix is coming.

Linda MacPhee-Cobb
  • Indeed, the first line was empty (NaN) and when I manually removed it before uploading the dataset, the error is not there anymore. – Tasos Dec 01 '15 at 09:41
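If empty values in the first row are indeed the trigger, a quick check before handing the frame back might look like the sketch below. The small frame and the fill value "unknown" are assumptions made up for illustration; only the idea of dropping or cleaning the affected columns comes from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose first row has a missing Cabin value.
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
    "Cabin": [np.nan, "C85"],
    "Title": [1, 2],
})

# Find the columns that are empty (NaN) in the first row.
first_row_missing = df.iloc[0].isna()
cols_with_nan_first = first_row_missing[first_row_missing].index.tolist()
print(cols_with_nan_first)

# Option 1: drop those columns, similar to using 'Project Columns'.
cleaned = df.drop(columns=cols_with_nan_first)

# Option 2: keep the columns but fill the gaps with a placeholder.
filled = df.fillna({"Cabin": "unknown"})
```

Either option leaves a first row without NaN, which matches the workaround reported in the answer and the comment.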