
I want to find the data type of every variable (column) in a CSV file using Python. In R we can do this with the str() command:

str(data_frame)

This gives output like this:

> str(train)
'data.frame':   891 obs. of  12 variables:
 $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
 $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
 $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
 $ Name       : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
 $ Sex        : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
 $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
 $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
 $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
 $ Ticket     : Factor w/ 681 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
 $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
 $ Cabin      : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
 $ Embarked   : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...

Is there a similar way to do this in Python?

PM 2Ring
Arun
  • *all the variables in a table* What is *a table* ? – Remi Guan Nov 14 '15 at 02:57
  • @KevinGuan - it is a csv file. – Arun Nov 14 '15 at 02:59
  • There's no automated way to do this in Python but you can do it manually with the .format() directives – Silvio Mayolo Nov 14 '15 at 03:11
  • @SilvioMayolo - thank you for the information – Arun Nov 14 '15 at 03:16
  • I think `df.info()` with `pandas` is similar. More here http://pandas.pydata.org/pandas-docs/version/0.17.0/dsintro.html#console-display – Pierre L Nov 14 '15 at 03:22
  • @PierreLafortune - Thank you. It was a good pointer for me. – Arun Nov 14 '15 at 03:35
  • Here are three similar questions with answers [first](http://stackoverflow.com/questions/27749573/is-there-a-python-equivalent-of-rs-str-returning-only-the-structure-of-an-ob), [second](http://stackoverflow.com/questions/27637281/what-are-python-pandas-equivalents-for-r-functions-like-str-summary-and-he), [third](http://stackoverflow.com/questions/28161621/how-to-inspect-a-numpy-pandas-object-i-e-str-in-r) – Pierre L Nov 14 '15 at 03:36
  • In future, please mention that you are using pandas (and/or tag the question with the pandas tag) when you ask a pandas question. – PM 2Ring Nov 15 '15 at 15:45

2 Answers


You probably want the dtypes attribute of a pandas DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({'foo': [1, 2, 3], 'bar': [1.0, 2.0, 3.0], 'baz': ['qux', 'quux', 'quuux']})
>>> df.dtypes
bar    float64
baz     object
foo      int64
dtype: object
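
For a CSV file, the same attribute is available on the frame returned by pd.read_csv. A minimal sketch, assuming pandas is installed; the inline sample data is made up for illustration and stands in for a real file path:

```python
import io

import pandas as pd

# Hypothetical sample standing in for a real CSV file on disk.
csv_text = """PassengerId,Name,Age,Fare
1,Braund,22,7.25
2,Cumings,38,71.28
3,Heikkinen,,7.92
"""

# read_csv infers one dtype per column.
# For a real file, pass its path instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)  # one dtype per column
df.info()         # dtypes plus row and non-null counts, closer to R's str()
```

Note that the Age column comes back as a float type because it contains a missing value, much like the num column with NA in the R output.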
mattexx

An easy way to tell whether a string represents a valid int is to attempt the conversion and catch the ValueError that is raised if it isn't a legal int. The same approach works for float. Here's a brief demo in Python 2:

data = 'string 37 3.14159 word -5 0 -1.4142 text'

def datatype(s):
    try:
        int(s)
    except ValueError:
        try:
            float(s)
        except ValueError:
            return 'string'
        else:
            return 'float'
    else:
        return 'int'

for s in data.split():
    print('%-15r: %s' % (s, datatype(s)))

Output:

'string'       : string
'37'           : int
'3.14159'      : float
'word'         : string
'-5'           : int
'0'            : int
'-1.4142'      : float
'text'         : string

However, typical Python code wouldn't use a function quite like that. It would assume that the data is correct and wrap the conversion itself in a simple try: ... except ValueError: ... else: block, rather than using that crazy nested structure to test the data before you're ready to process it.
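
That pattern might look like the following sketch. The function name total_fare and the sample values are hypothetical: the field is converted directly, the failure is handled where it happens, and the normal processing sits in the else branch.

```python
def total_fare(fields):
    """Sum the fields that parse as floats and collect the ones that don't."""
    total = 0.0
    rejected = []
    for field in fields:
        try:
            value = float(field)
        except ValueError:
            # Conversion failed: set the bad field aside and move on.
            rejected.append(field)
        else:
            # Conversion succeeded: process the value.
            total += value
    return total, rejected


total, rejected = total_fare(['7.25', '71.28', 'n/a', '8.05'])
print(total, rejected)
```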

A sensible CSV won't have different datatypes in random positions, so your code shouldn't need to guess what type of data is in a given field. OTOH, not all CSVs are well-designed... :)
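
For a file that is less tidy, the per-value check can be extended to whole columns using only the standard library. This is a sketch under assumptions: the helper names value_type and column_types, the sample data, and the rule that empty fields count as missing values are illustrative choices, not something taken from the question.

```python
import csv
import io


def value_type(s):
    """Classify one field as 'int', 'float' or 'string'."""
    for name, convert in (('int', int), ('float', float)):
        try:
            convert(s)
        except ValueError:
            continue
        return name
    return 'string'


def column_types(lines):
    """Return {column: type}, keeping the widest type seen in each column."""
    rank = {'int': 0, 'float': 1, 'string': 2}
    reader = csv.reader(lines)
    header = next(reader)
    result = dict.fromkeys(header)  # None means no data seen yet
    for row in reader:
        for name, field in zip(header, row):
            if field == '':  # assumption: an empty field is a missing value
                continue
            seen = value_type(field)
            if result[name] is None or rank[seen] > rank[result[name]]:
                result[name] = seen
    return result


# Hypothetical sample; for a real file use open(path, newline='') instead.
sample = io.StringIO(
    "PassengerId,Name,Age,Fare\n"
    "1,Braund,22,7.25\n"
    "2,Cumings,,71.28\n"
    "3,Heikkinen,26.5,8\n"
)
print(column_types(sample))
```

A column is reported as int only if every non-empty value is an int; a single float widens it to float, and a single non-numeric value widens it to string.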

PM 2Ring
  • Also, take a look at [`ast.literal_eval`](https://docs.python.org/3/library/ast.html#ast.literal_eval) to handle all Python literals (without using unsafe `eval`). Lets you do a single call to convert to either `int` or `float` as appropriate. – ShadowRanger Nov 14 '15 at 03:23
  • The question talks about a simple CSV file, so I'm just showing a simple solution that doesn't require advanced modules that a new Python programmer shouldn't need to bother with (like `ast`), or 3rd-party modules. If they wanted a solution using `pandas` (for example), then they could have mentioned that in the question (and added the appropriate tag). – PM 2Ring Nov 14 '15 at 03:34
  • `ast` as a whole: Nuts, agreed. `ast.literal_eval` specifically is incredibly useful. Not saying it needs to be in the answer, but people should know about it since it's not at all easy to find otherwise. – ShadowRanger Nov 14 '15 at 03:46
  • @ShadowRanger: Good point, `ast.literal_eval` certainly does have its uses. OTOH, new Python programmers need to know about the standard techniques using `try:..except`, and the [EAFP](http://stackoverflow.com/a/11360880/4014959) principle, IMHO. – PM 2Ring Nov 14 '15 at 03:50
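
The `ast.literal_eval` approach mentioned in these comments can be sketched as follows. The helper name literal_type is hypothetical, and treating anything that is not a Python literal as a plain string is an assumption about how unquoted CSV text should be handled.

```python
import ast


def literal_type(s):
    """Return the type name of the Python literal in s, or 'str' if it isn't one."""
    try:
        value = ast.literal_eval(s)
    except (ValueError, SyntaxError):
        # Bare words such as 'text' are not literals; treat them as strings.
        return 'str'
    return type(value).__name__


for s in 'string 37 3.14159 word -5 0 -1.4142 text'.split():
    print('%-15r: %s' % (s, literal_type(s)))
```

Unlike the int/float check, this also recognises other literals, so a field containing True or None is reported as bool or NoneType rather than as a string.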