Questions tagged [pandas]

Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data science libraries in Python.

The name pandas derives from "panel data" (PAN-el DA-ta). The library is implemented primarily using NumPy and Cython, and it is designed to integrate easily with NumPy-based scientific libraries such as statsmodels.

To create a reproducible Pandas example:
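
One possible shape for such an example is sketched below. The column names and values are invented for illustration; the point is only that the data is small, typed out in full, and paired with the code that was tried.

```python
import numpy as np
import pandas as pd

# Small, hand-typed data that anyone can paste and run.
# (Column names and values are made up for this sketch.)
df = pd.DataFrame({
    "key": ["a", "b", "a", "c"],
    "value": [1.0, np.nan, 3.0, 4.0],
})

# The exact code that was tried, kept as short as possible.
result = df.groupby("key")["value"].sum()
print(result)

# In the question text, also state the output you expected,
# so readers can see where it differs from what you got.
```

A question built this way can usually be answered without anyone having to guess at the data or retype it.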

Main Features:

  • Data structures: for one- and two-dimensional labeled datasets (respectively Series and DataFrames). Some of their main features include:
    • Automatic data alignment and interpolation
    • Handling missing observations in calculations
    • Convenient slicing and reshaping ("reindexing") functions
    • Categorical data types
    • 'Group by' aggregation and transformation functionality
    • Tools for merging and joining together data sets
    • Simple Matplotlib integration for plotting and graphing
    • Multi-indexing, which gives indices a structure that allows representation of an arbitrary number of dimensions
  • Date tools: objects for expressing date offsets or generating date ranges. Dates can be aligned to a specific time zone and converted or compared at will
  • Statistical models: older releases included convenient ordinary least squares and panel OLS implementations for in-sample or rolling time series and cross-sectional regressions. These have since been removed from pandas; statsmodels is the usual place for such models
  • Cython offloading: performance-critical computations run in compiled Cython code, which keeps them fast
  • Static and moving statistical tools: mean, standard deviation, correlation, and covariance
  • Rich User Documentation, using Sphinx
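
A few of the features listed above can be sketched in a handful of lines. This is a minimal illustration rather than a tour of the API, and all data, labels, and names in it are made up.

```python
import numpy as np
import pandas as pd

# Two Series with partially overlapping labels.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Arithmetic aligns on labels; labels found in only one operand give NaN.
aligned_sum = s1 + s2

# Reductions skip missing observations by default.
total = aligned_sum.sum()

# 'Group by' aggregation on a DataFrame.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "amount": [100, 150, 200, 50],
})
per_region = sales.groupby("region")["amount"].sum()

# Merging two data sets on a shared key column.
managers = pd.DataFrame({"region": ["north", "south"], "manager": ["Ann", "Bob"]})
merged = sales.merge(managers, on="region", how="left")

# Date tools: a daily range in one time zone, converted to another.
days = pd.date_range("2024-01-01", periods=3, freq="D", tz="UTC")
days_ny = days.tz_convert("America/New_York")
```

Here `aligned_sum` has NaN at the labels "a" and "d", since each appears in only one of the two Series.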

Asking Questions:

  • Before asking a question, work through the "10 Minutes to pandas" introduction, which covers the basic functionality of pandas.
  • See this question on asking good questions: How to make good reproducible pandas examples
  • Please provide the versions of pandas and NumPy, plus platform details (if appropriate), in your questions.
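
One way to collect the version and platform details is sketched below. The `versions` dictionary is just a convenience name for this example.

```python
import platform

import numpy as np
import pandas as pd

# Gather the details that are usually worth pasting into a question.
versions = {
    "pandas": pd.__version__,
    "NumPy": np.__version__,
    "Python": platform.python_version(),
    "Platform": platform.platform(),
}

for name, value in versions.items():
    print(f"{name}: {value}")
```

For a fuller report, `pd.show_versions()` prints the versions of pandas and its dependencies in one go.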

282843 questions
630 votes, 17 answers

How to replace NaN values by Zeroes in a column of a Pandas Dataframe?

I have a Pandas Dataframe as below: itm Date Amount 67 420 2012-09-30 00:00:00 65211 68 421 2012-09-09 00:00:00 29424 69 421 2012-09-16 00:00:00 29877 70 421 2012-09-23 00:00:00 30990 71 421 2012-09-30…
George Thompson
626 votes, 13 answers

how to sort pandas dataframe from one column

I have a data frame like this: print(df) 0 1 2 0 354.7 April 4.0 1 55.4 August 8.0 2 176.5 December 12.0 3 95.5 February 2.0 4 85.6 January 1.0 5 152 July 7.0 6 238.7 …
Sachila Ranawaka
608 votes, 5 answers

How can I pivot a dataframe?

What is pivot? How do I pivot? Long format to wide format? I've seen a lot of questions that ask about pivot tables, even if they don't know it. It is virtually impossible to write a canonical question and answer that encompasses all aspects of…
piRSquared
606 votes, 8 answers

Create new column based on values from other columns / apply a function of multiple columns, row-wise in Pandas

I want to apply my custom function (it uses an if-else ladder) to these six columns (ERI_Hispanic, ERI_AmerInd_AKNatv, ERI_Asian, ERI_Black_Afr.Amer, ERI_HI_PacIsl, ERI_White) in each row of my dataframe. I've tried different methods from other…
Dave
591 votes, 5 answers

How to check if a column exists in Pandas

How do I check if a column exists in a Pandas DataFrame df? A B C 0 3 40 100 1 6 30 200 How would I check if the column "A" exists in the above DataFrame so that I can compute: df['sum'] = df['A'] + df['C'] And if "A" doesn't…
npires
584 votes, 7 answers

Filter dataframe rows if value in column is in a set list of values

I have a Python pandas DataFrame rpt: rpt MultiIndex: 47518 entries, ('000002', '20120331') to ('603366', '20091231') Data columns: STK_ID 47518 non-null values STK_Name …
bigbug
573 votes, 12 answers

Remap values in pandas column with a dict, preserve NaNs

I have a dictionary which looks like this: di = {1: "A", 2: "B"} I would like to apply it to the col1 column of a dataframe similar to: col1 col2 0 w a 1 1 2 2 2 NaN to get: col1 col2 0 w a 1 …
TheChymera
572 votes, 18 answers

Convert Python dict into a dataframe

I have a Python dictionary like the following: {u'2012-06-08': 388, u'2012-06-09': 388, u'2012-06-10': 388, u'2012-06-11': 389, u'2012-06-12': 389, u'2012-06-13': 389, u'2012-06-14': 389, u'2012-06-15': 389, u'2012-06-16': 389, …
anonuser0428
563 votes, 8 answers

Selecting a row of pandas series/dataframe by integer index

I am curious as to why df[2] is not supported, while df.ix[2] and df[2:3] both work. In [26]: df.ix[2] Out[26]: A 1.027680 B 1.514210 C -1.466963 D -0.162339 Name: 2000-01-03 00:00:00 In [27]: df[2:3] Out[27]: A …
user1642513
553 votes, 11 answers

Get list from pandas dataframe column or row?

I have a dataframe df imported from an Excel document like this: cluster load_date budget actual fixed_price A 1/1/2014 1000 4000 Y A 2/1/2014 12000 10000 Y A 3/1/2014 36000 2000 Y B 4/1/2014 15000 10000 …
yoshiserry
552 votes, 16 answers

How to group dataframe rows into list in pandas groupby

I have a pandas data frame df like: a b A 1 A 2 B 5 B 5 B 4 C 6 I want to group by the first column and get second column as lists in rows: A [1,2] B [5,5,4] C [6] Is it possible to do something like this using pandas groupby?
Abhishek Thakur
550 votes, 13 answers

Pandas read_csv: low_memory and dtype options

df = pd.read_csv('somefile.csv') ...gives an error: .../site-packages/pandas/io/parsers.py:1130: DtypeWarning: Columns (4,5,7,16) have mixed types. Specify dtype option on import or set low_memory=False. Why is the dtype option related to…
Josh
549 votes, 14 answers

How to select all columns except one in pandas?

I have a dataframe that looks like this: a b c d 0 0.418762 0.042369 0.869203 0.972314 1 0.991058 0.510228 0.594784 0.534366 2 0.407472 0.259811 0.396664 0.894202 3 0.726168 0.139531 0.324932 …
markov zain
548 votes, 8 answers

How can I use the apply() function for a single column?

I have a pandas dataframe with multiple columns. I want to change the values of only the first column without affecting the other columns. How can I do that using apply() in pandas?
Amani
548 votes, 3 answers

How to reset index in a pandas dataframe?

I have a dataframe from which I remove some rows. As a result, I get a dataframe in which index is something like that: [1,5,6,10,11] and I would like to reset it to [0,1,2,3,4]. How can I do it? The following seems to work: df =…
Roman