
I am working on exporting data from Python to an SQL database, and for performance reasons I'm trying to ensure that the data I'm exporting is registered as having the correct type. Therefore, I'm trying to create a Pandas Series of my data with the correct data type. I assume that calling dtype on a pd.Series object yields the data type of its underlying elements. I'm having trouble getting this to work for string data.

Here's a code sample demonstrating the problem:

import pandas as pd

orig_data_string = ['abc'] * 10
pd_data_string = pd.Series(orig_data_string)
pd_data_string.dtype

Running the above in a Python console yields dtype('O'), which I take to indicate an object type. I would like this to be a string type instead. I can do something similar with numerical values:

orig_data_float = [1.23] * 10
pd_data_float = pd.Series(orig_data_float)
pd_data_float.dtype

and in this case, I get the result dtype('float64'), so Pandas has correctly inferred the data type from the list input. If I try pd.Series(orig_data_string).astype(str), I still get dtype('O'). How can I create a Pandas Series object with underlying data type str from a list of strings?

Alexander Sokol
  • Strings are represented as `O` in Series. So if you get `dtype('O')`, it means it IS a string. – Mohit Motwani Nov 20 '18 at 12:39
  • Are you sure? The top-rated answer to this question: https://stackoverflow.com/questions/37561991/what-is-dtypeo seems to indicate that `dtype('S')` would indicate a string? – Alexander Sokol Nov 20 '18 at 12:42
  • @AlexanderSokol - This is the difference between `dtypes` and `types`; see the linked answer for details: [link](https://stackoverflow.com/a/42672574/2901002) – jezrael Nov 20 '18 at 12:46

1 Answer

Pandas stores Python strings under the object (O) dtype, so dtype('O') on a Series of strings is the expected result. Please refer to the example below.

>>> df = pd.DataFrame({'float': [1.0],
...                    'int': [1],
...                    'datetime': [pd.Timestamp('20180310')],
...                    'string': ['foo']})
>>> df.dtypes
float              float64
int                  int64
datetime    datetime64[ns]
string              object
dtype: object

Reference: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.dtypes.html
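
To check this against the Series from the question, a minimal sketch like the one below may help. It inspects the elements themselves rather than the dtype label, and then requests the dedicated "string" extension dtype. That second part is an assumption about your environment: it needs pandas 1.0 or later, which is newer than the 0.23 documentation linked above. The variable names all_str and pd_data_typed are only illustrative.

```python
import pandas as pd

orig_data_string = ['abc'] * 10

# Default construction: whatever dtype label this pandas version reports,
# the individual elements are ordinary Python str objects.
pd_data_string = pd.Series(orig_data_string)
print(pd_data_string.dtype)
print(type(pd_data_string.iloc[0]))
all_str = all(isinstance(value, str) for value in pd_data_string)
print(all_str)

# Assumption: pandas >= 1.0, where the "string" extension dtype exists.
# Older versions will raise an error on this line.
pd_data_typed = pd.Series(orig_data_string, dtype="string")
print(pd_data_typed.dtype)
```

Whether the "string" dtype changes anything for the SQL export depends on the database driver and the export call you use, so it is worth testing against your actual target before relying on it.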

Srce Cde