Questions tagged [pandas]

Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data science libraries in Python.

Pandas is a Python library for PAN-el DA-ta manipulation and analysis, e.g. multidimensional time series and cross-sectional data sets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is implemented primarily using NumPy and Cython; it is intended to integrate easily with NumPy-based scientific libraries, such as statsmodels.

To create a reproducible Pandas example:
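A minimal sketch of what such an example might contain: a small hard-coded input, the code that was tried, and the expected result. The column names echo the first question listed on this page, but the values and the NaN are illustrative assumptions, not data from any real question.

```python
import numpy as np
import pandas as pd

# Small, hard-coded input that anyone can paste and run.
# The values (including the NaN) are made up for illustration.
df = pd.DataFrame(
    {
        "itm": [420, 421, 421],
        "Amount": [65211.0, np.nan, 29877.0],
    }
)

# The code that was tried.
result = df["Amount"].fillna(0)

# The expected output, stated explicitly so answers can be checked against it.
expected = pd.Series([65211.0, 0.0, 29877.0], name="Amount")

print(df)
print(result.equals(expected))
```

Keeping the input tiny and constructing it in code means readers do not need any external file to reproduce the problem.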

Main Features:

Data structures: for one- and two-dimensional labeled datasets (Series and DataFrames, respectively). Some of their main features include:
- Automatic data alignment and interpolation
- Handling of missing observations in calculations
- Convenient slicing and reshaping ("reindexing") functions
- Categorical data types
- 'Group by' aggregation or transformation functionality
- Tools for merging and joining data sets
- Simple Matplotlib integration for plotting and graphing
- Multi-indexing, which gives indices a structure that can represent an arbitrary number of dimensions
Date tools: objects for expressing date offsets or generating date ranges. Dates can be aligned to a specific time zone and converted or compared at will.
Statistical models: convenient ordinary least squares and panel OLS implementations for in-sample or rolling time series and cross-sectional regressions. These will hopefully be the starting point for implementing models.
Intelligent Cython offloading: complex computations run quickly thanks to these optimizations.
Static and moving statistical tools: mean, standard deviation, correlation, and covariance.
Rich user documentation, built with Sphinx.
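A few of the features above can be sketched in a handful of lines. This is only an illustration under assumed data: every Series, DataFrame, column name, and value below is hypothetical.

```python
import pandas as pd

# Automatic alignment: values are matched on index labels, not on position.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])
aligned_sum = s1 + s2  # "a" and "d" have no partner, so they become NaN

# Missing observations are skipped by default in reductions such as sum().
total = aligned_sum.sum()

# 'Group by' aggregation.
df = pd.DataFrame({"key": ["A", "A", "B"], "value": [1, 2, 5]})
group_totals = df.groupby("key")["value"].sum()

# Merging two data sets on a shared column.
labels = pd.DataFrame({"key": ["A", "B"], "label": ["first", "second"]})
merged = df.merge(labels, on="key", how="left")

# Date tools: a daily range in UTC, converted to another time zone.
dates = pd.date_range("2012-09-09", periods=3, freq="D", tz="UTC")
dates_ny = dates.tz_convert("America/New_York")

print(aligned_sum)
print(group_totals)
print(merged)
print(dates_ny)
```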

Asking Questions:

Before asking a question, make sure you have gone through the 10 Minutes to pandas introduction, which covers the basic functionality of Pandas.
See this question on asking good questions: How to make good reproducible pandas examples
Please provide the versions of Pandas and NumPy, and platform details (if appropriate), in your questions.
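One possible way to collect the version and platform details mentioned above is sketched here. The dictionary name `version_info` is just an illustrative choice.

```python
import platform

import numpy as np
import pandas as pd

# Gather the details that are usually worth quoting in a question.
version_info = {
    "pandas": pd.__version__,
    "numpy": np.__version__,
    "python": platform.python_version(),
    "platform": platform.platform(),
}

for name, value in version_info.items():
    print(f"{name}: {value}")

# pd.show_versions() prints a fuller report that also covers optional dependencies.
```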

Answering Questions:

How can I effectively load data on Stack Overflow questions using Pandas read_clipboard? (useful for copy pasting data from questions into your terminal as DataFrames)
Copying MultiIndex dataframes with pd.read_clipboard?
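As a rough sketch of the clipboard workflow: `pd.read_clipboard()` passes the clipboard text to `read_csv`, by default with a whitespace separator. Because a clipboard is not always available (for example, on a headless machine), the example below reproduces the same parse from a string instead. The sample text is the small frame from the "column exists" question listed on this page; the variable names are assumptions.

```python
import io

import pandas as pd

# Text as it might be copied from a question.
pasted = """\
   A   B    C
0  3  40  100
1  6  30  200
"""

# With that text on the clipboard, the interactive call would be:
#     df = pd.read_clipboard()
# The equivalent parse without a clipboard uses the same whitespace separator:
df = pd.read_csv(io.StringIO(pasted), sep=r"\s+")

print(df)
```

Since the data rows have one more field than the header, the first field of each row is used as the index.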


282843 questions

630 votes, 17 answers

How to replace NaN values by Zeroes in a column of a Pandas Dataframe?

I have a Pandas Dataframe as below: itm Date Amount 67 420 2012-09-30 00:00:00 65211 68 421 2012-09-09 00:00:00 29424 69 421 2012-09-16 00:00:00 29877 70 421 2012-09-23 00:00:00 30990 71 421 2012-09-30…

asked Nov 08 '12 at 18:50 by George Thompson (reputation 6,627; badges 4, 16, 16)

626 votes, 13 answers

how to sort pandas dataframe from one column

I have a data frame like this: print(df) 0 1 2 0 354.7 April 4.0 1 55.4 August 8.0 2 176.5 December 12.0 3 95.5 February 2.0 4 85.6 January 1.0 5 152 July 7.0 6 238.7 …

python pandas dataframe sorting time

asked Jun 13 '16 at 10:44 by Sachila Ranawaka (reputation 39,756; badges 7, 56, 80)

608 votes, 5 answers

How can I pivot a dataframe?

What is pivot? How do I pivot? Long format to wide format? I've seen a lot of questions that ask about pivot tables, even if they don't know it. It is virtually impossible to write a canonical question and answer that encompasses all aspects of…

python pandas group-by pivot

asked Nov 07 '17 at 08:00 by piRSquared (reputation 285,575; badges 57, 475, 624)

606 votes, 8 answers

Create new column based on values from other columns / apply a function of multiple columns, row-wise in Pandas

I want to apply my custom function (it uses an if-else ladder) to these six columns (ERI_Hispanic, ERI_AmerInd_AKNatv, ERI_Asian, ERI_Black_Afr.Amer, ERI_HI_PacIsl, ERI_White) in each row of my dataframe. I've tried different methods from other…

python pandas dataframe numpy apply

asked Nov 12 '14 at 12:08 by Dave (reputation 6,968; badges 7, 26, 32)

591 votes, 5 answers

How to check if a column exists in Pandas

How do I check if a column exists in a Pandas DataFrame df? A B C 0 3 40 100 1 6 30 200 How would I check if the column "A" exists in the above DataFrame so that I can compute: df['sum'] = df['A'] + df['C'] And if "A" doesn't…

python pandas dataframe

asked Jul 21 '14 at 16:43 by npires (reputation 6,093; badges 2, 13, 9)

584 votes, 7 answers

Filter dataframe rows if value in column is in a set list of values

I have a Python pandas DataFrame rpt: rpt MultiIndex: 47518 entries, ('000002', '20120331') to ('603366', '20091231') Data columns: STK_ID 47518 non-null values STK_Name …

python pandas dataframe

asked Aug 22 '12 at 03:16 by bigbug (reputation 55,954; badges 42, 77, 96)

573 votes, 12 answers

Remap values in pandas column with a dict, preserve NaNs

I have a dictionary which looks like this: di = {1: "A", 2: "B"} I would like to apply it to the col1 column of a dataframe similar to: col1 col2 0 w a 1 1 2 2 2 NaN to get: col1 col2 0 w a 1 …

python pandas dataframe dictionary remap

asked Nov 27 '13 at 18:56 by TheChymera (reputation 17,004; badges 14, 56, 86)

572 votes, 18 answers

Convert Python dict into a dataframe

I have a Python dictionary like the following: {u'2012-06-08': 388, u'2012-06-09': 388, u'2012-06-10': 388, u'2012-06-11': 389, u'2012-06-12': 389, u'2012-06-13': 389, u'2012-06-14': 389, u'2012-06-15': 389, u'2012-06-16': 389, …

python pandas dataframe

asked Sep 16 '13 at 21:02 by anonuser0428 (reputation 11,789; badges 22, 63, 86)

563 votes, 8 answers

Selecting a row of pandas series/dataframe by integer index

I am curious as to why df[2] is not supported, while df.ix[2] and df[2:3] both work. In [26]: df.ix[2] Out[26]: A 1.027680 B 1.514210 C -1.466963 D -0.162339 Name: 2000-01-03 00:00:00 In [27]: df[2:3] Out[27]: A …

python pandas dataframe indexing

asked Apr 19 '13 at 03:14 by user1642513

553 votes, 11 answers

Get list from pandas dataframe column or row?

I have a dataframe df imported from an Excel document like this: cluster load_date budget actual fixed_price A 1/1/2014 1000 4000 Y A 2/1/2014 12000 10000 Y A 3/1/2014 36000 2000 Y B 4/1/2014 15000 10000 …

python pandas list dataframe

asked Mar 12 '14 at 03:12 by yoshiserry (reputation 20,175; badges 35, 77, 104)

552 votes, 16 answers

How to group dataframe rows into list in pandas groupby

I have a pandas data frame df like: a b A 1 A 2 B 5 B 5 B 4 C 6 I want to group by the first column and get second column as lists in rows: A [1,2] B [5,5,4] C [6] Is it possible to do something like this using pandas groupby?

python pandas list aggregate pandas-groupby

asked Mar 06 '14 at 08:31 by Abhishek Thakur (reputation 16,337; badges 15, 66, 97)

550 votes, 13 answers

Pandas read_csv: low_memory and dtype options

df = pd.read_csv('somefile.csv') ...gives an error: .../site-packages/pandas/io/parsers.py:1130: DtypeWarning: Columns (4,5,7,16) have mixed types. Specify dtype option on import or set low_memory=False. Why is the dtype option related to…

python parsing numpy pandas dataframe

asked Jun 16 '14 at 19:56 by Josh (reputation 11,979; badges 17, 60, 96)

549 votes, 14 answers

How to select all columns except one in pandas?

I have a dataframe that looks like this: a b c d 0 0.418762 0.042369 0.869203 0.972314 1 0.991058 0.510228 0.594784 0.534366 2 0.407472 0.259811 0.396664 0.894202 3 0.726168 0.139531 0.324932 …

python pandas dataframe select

asked Apr 21 '15 at 05:24 by markov zain (reputation 11,987; badges 13, 35, 39)

548 votes, 8 answers

How can I use the apply() function for a single column?

I have a pandas dataframe with multiple columns. I want to change the values of only the first column without affecting the other columns. How can I do that using apply() in pandas?

python pandas dataframe numpy apply

asked Jan 23 '16 at 10:04 by Amani (reputation 16,245; badges 29, 103, 153)

548 votes, 3 answers

How to reset index in a pandas dataframe?

I have a dataframe from which I remove some rows. As a result, I get a dataframe in which index is something like that: [1,5,6,10,11] and I would like to reset it to [0,1,2,3,4]. How can I do it? The following seems to work: df =…

python indexing pandas dataframe

asked Dec 10 '13 at 09:12 by Roman (reputation 124,451; badges 167, 349, 456)
