Questions tagged [pandas]

Pandas is a Python library for data manipulation and analysis, built around structures such as dataframes, multidimensional time series, and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data science libraries in Python.

Pandas is a Python library for PAN-el DA-ta manipulation and analysis, e.g. multidimensional time series and cross-sectional data sets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is implemented primarily using NumPy and Cython; it is intended to integrate easily with NumPy-based scientific libraries, such as statsmodels.

To create a reproducible Pandas example:
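A minimal sketch of what a reproducible example might look like, assuming a small hard-coded DataFrame is enough to show the problem. The column names `key` and `value` and the data are made up for illustration; the point is that input, attempted code, and desired output are all stated and can be pasted and run as-is.

```python
import numpy as np
import pandas as pd

# Small, hard-coded input that anyone can paste and run (illustrative data).
df = pd.DataFrame(
    {
        "key": ["a", "b", "a", "b"],
        "value": [1.0, np.nan, 3.0, 4.0],
    }
)

# The code that was tried.
result = df.groupby("key")["value"].sum()

# The desired output, written out explicitly so answers can be checked against it.
expected = pd.Series(
    [4.0, 4.0],
    index=pd.Index(["a", "b"], name="key"),
    name="value",
)

print(df)
print(result)
```

Keeping the data inline rather than in an attached file or screenshot lets answerers reproduce the behavior without guessing at the input.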

Main Features:

Data structures: for one- and two-dimensional labeled datasets (Series and DataFrames, respectively). Some of their main features include:
- Automatic data alignment and interpolation
- Handling of missing observations in calculations
- Convenient slicing and reshaping ("reindexing") functions
- Categorical data types
- 'Group by' aggregation or transformation functionality
- Tools for merging and joining data sets
- Simple Matplotlib integration for plotting and graphing
- Multi-indexing, which gives indices a structure that allows for the representation of an arbitrary number of dimensions
Date tools: objects for expressing date offsets or generating date ranges. Dates can be localized to a specific time zone and converted or compared at will.
Statistical models: earlier pandas releases included convenient ordinary least squares and panel OLS implementations for in-sample or rolling time series and cross-sectional regressions. These have since been removed from pandas; statsmodels is the usual place for such models.
Intelligent Cython offloading: complex computations are performed rapidly due to these optimizations.
Static and moving statistical tools: mean, standard deviation, correlation, and covariance.
Rich user documentation, built with Sphinx.
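Several of the features listed above can be sketched in a few lines. This is an illustrative tour under made-up data, not an exhaustive reference; the names `sales`, `regions`, and the city values are hypothetical.

```python
import numpy as np
import pandas as pd

# Automatic alignment: arithmetic matches on index labels, not position.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0], index=["b", "c"])
aligned = s1 + s2  # label "a" has no partner in s2, so its result is NaN

# Missing observations are skipped by default in reductions.
total = aligned.sum()

# Group-by aggregation on a DataFrame.
sales = pd.DataFrame({"city": ["Oslo", "Oslo", "Rome"], "amount": [5, 7, 11]})
per_city = sales.groupby("city")["amount"].sum()

# Merging two data sets on a shared key column.
regions = pd.DataFrame({"city": ["Oslo", "Rome"], "region": ["north", "south"]})
merged = sales.merge(regions, on="city", how="left")

# Date tools: a daily range in UTC, converted to another time zone.
days = pd.date_range("2024-01-01", periods=3, freq="D", tz="UTC")
local = days.tz_convert("Europe/Oslo")

# Moving statistics: a 2-period rolling mean (the first value is NaN).
rolling_mean = sales["amount"].rolling(window=2).mean()

print(aligned, total, per_city, merged, local, rolling_mean, sep="\n\n")
```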

Asking Questions:

Before asking a question, make sure you have gone through the 10 Minutes to pandas introduction. It covers all the basic functionality of Pandas.
For advice on asking good questions, see: How to make good reproducible pandas examples
Please provide the versions of Pandas and NumPy, and platform details (if appropriate), in your questions.
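One way to collect the version and platform details mentioned above is sketched here. The `versions` dictionary is just an illustrative container; the attributes and functions it reads are standard.

```python
import platform

import numpy as np
import pandas as pd

# Gather the details that are usually worth pasting into a question.
versions = {
    "pandas": pd.__version__,
    "numpy": np.__version__,
    "python": platform.python_version(),
    "platform": platform.platform(),
}

for name, value in versions.items():
    print(f"{name}: {value}")
```

For a fuller report that also covers optional dependencies, `pd.show_versions()` prints one directly.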

Answering Questions:

How can I effectively load data on Stack Overflow questions using Pandas read_clipboard? (useful for copy pasting data from questions into your terminal as DataFrames)
Copying MultiIndex dataframes with pd.read_clipboard?
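As a rough sketch of the `read_clipboard` workflow referenced above: `pd.read_clipboard()` reads the clipboard text and hands it to `pd.read_csv`, splitting on whitespace by default. Since a clipboard is not available everywhere, the example below parses the same kind of text from a string instead; the sample table is made up.

```python
import io

import pandas as pd

# Text as it might be copied from a question (made-up sample data).
copied = """\
   x    y
A  1  2.5
B  3  4.5
C  5  6.5
"""

# With this text on the clipboard, the call would be: df = pd.read_clipboard()
# Here the same whitespace-separated parse is done from a string instead.
df = pd.read_csv(io.StringIO(copied), sep=r"\s+")

print(df)
```

Because the header row has one fewer field than the data rows, the first column is used as the index.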

282,843 questions

778 votes, 11 answers

Get statistics for each group (such as count, mean, etc) using pandas GroupBy?

I have a dataframe df and I use several columns from it to groupby: df['col1','col2','col3','col4'].groupby(['col1','col2']).mean() In the above way, I almost get the table (dataframe) that I need. What is missing is an additional column that…

asked Oct 15 '13 at 15:00 by Roman (124,451 reputation; 167 gold, 349 silver, 456 bronze badges)

772 votes, 24 answers

Set value for particular cell in pandas DataFrame using index

I have created a Pandas DataFrame df = DataFrame(index=['A','B','C'], columns=['x','y']) and have got this: x y A NaN NaN B NaN NaN C NaN NaN. Now, I would like to assign a value to a particular cell, for example to row C and column x. I…

python pandas dataframe cell nan

asked Dec 12 '12 at 14:40 by Mitkp (7,800 reputation; 3 gold, 14 silver, 8 bronze badges)

766 votes, 23 answers

Constructing pandas DataFrame from values in variables gives "ValueError: If using all scalar values, you must pass an index"

This may be a simple question, but I cannot figure out how to do this. Let's say that I have two variables as follows. a = 2 b = 3 I want to construct a DataFrame from this: df2 = pd.DataFrame({'A':a,'B':b}) This generates an error: ValueError:…

python pandas dataframe scalar

asked Jul 24 '13 at 16:40 by Nilani Algiriyage (32,876 reputation; 32 gold, 87 silver, 121 bronze badges)

751 votes, 20 answers

Import multiple CSV files into pandas and concatenate into one DataFrame

I would like to read several CSV files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out though. Here is what I have so far: import glob import pandas as pd # Get data file names path =…

python pandas csv dataframe concatenation

asked Jan 03 '14 at 15:00 by jonas (13,559 reputation; 22 gold, 57 silver, 75 bronze badges)

720 votes, 15 answers

How to apply a function to two columns of Pandas dataframe

Suppose I have a df which has columns of 'ID', 'col_1', 'col_2'. And I define a function: f = lambda x, y: my_function_expression. Now I want to apply f to df's two columns 'col_1', 'col_2' to calculate a new column 'col_3' element-wise,…

python pandas dataframe

asked Nov 11 '12 at 13:48 by bigbug (55,954 reputation; 42 gold, 77 silver, 96 bronze badges)

718 votes, 6 answers

How to avoid pandas creating an index in a saved csv

I am trying to save a csv to a folder after making some edits to the file. Every time I use pd.to_csv('C:/Path of file.csv') the csv file has a separate column of indexes. I want to avoid printing the index to csv. I tried: pd.read_csv('C:/Path to…

python csv indexing pandas

asked Dec 30 '13 at 18:24 by Alexis (8,531 reputation; 5 gold, 19 silver, 21 bronze badges)

697 votes, 11 answers

Difference between map, applymap and apply methods in Pandas

Can you tell me when to use these vectorization methods with basic examples? I see that map is a Series method whereas the rest are DataFrame methods. I got confused about apply and applymap methods though. Why do we have two methods for applying a…

python pandas dataframe vectorization

asked Nov 05 '13 at 20:20 by marillion (10,618 reputation; 19 gold, 48 silver, 63 bronze badges)

697 votes, 19 answers

How can I get a value from a cell of a dataframe?

I have constructed a condition that extracts exactly one row from my dataframe: d2 = df[(df['l_ext']==l_ext) & (df['item']==item) & (df['wn']==wn) & (df['wd']==1)] Now I would like to take a value from a particular column: val = d2['col_name'] But…

python pandas dataframe

asked May 24 '13 at 07:17 by Roman (124,451 reputation; 167 gold, 349 silver, 456 bronze badges)

693 votes, 28 answers

How to check if any value is NaN in a Pandas DataFrame

In Python Pandas, what's the best way to check whether a DataFrame has one (or more) NaN values? I know about the function pd.isnan, but this returns a DataFrame of booleans for each element. This post right here doesn't exactly answer my question…

python pandas dataframe nan

asked Apr 09 '15 at 05:09 by hlin117 (20,764 reputation; 31 gold, 72 silver, 93 bronze badges)

691 votes, 16 answers

Convert pandas dataframe to NumPy array

How do I convert a pandas dataframe into a NumPy array? DataFrame: import numpy as np import pandas as pd index = [1, 2, 3, 4, 5, 6, 7] a = [np.nan, np.nan, np.nan, 0.1, 0.1, 0.1, 0.1] b = [0.2, np.nan, 0.2, 0.2, 0.2, np.nan, np.nan] c = [np.nan,…

python arrays pandas numpy dataframe

asked Nov 02 '12 at 00:57 by Mister Nobody (6,927 reputation; 3 gold, 13 silver, 3 bronze badges)

683 votes, 25 answers

UnicodeDecodeError when reading CSV file in Pandas

I'm running a program which is processing 30,000 similar files. A random number of them are stopping and producing this error... File "C:\Importer\src\dfman\importer.py", line 26, in import_chr data = pd.read_csv(filepath, names=fields) File…

python pandas csv dataframe unicode

asked Aug 11 '13 at 12:06 by TravisVOX (20,342 reputation; 13 gold, 37 silver, 41 bronze badges)

673 votes, 12 answers

Converting a Pandas GroupBy output from Series to DataFrame

I'm starting with input data like this df1 = pandas.DataFrame( { "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"] } ) Which when printed…

python pandas dataframe pandas-groupby multi-index

asked Apr 29 '12 at 16:10 by saveenr (8,439 reputation; 3 gold, 19 silver, 20 bronze badges)

665 votes, 49 answers

Python Pandas Error tokenizing data

I'm trying to use pandas to manipulate a .csv file but I get this error: pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12 I have tried to read the pandas docs, but found nothing. My code is…

python csv pandas

asked Aug 04 '13 at 01:54 by abuteau (6,963 reputation; 4 gold, 16 silver, 20 bronze badges)

649 votes, 6 answers

How to delete rows from a pandas DataFrame based on a conditional expression

I have a pandas DataFrame and I want to delete rows from it where the length of the string in a particular column is greater than 2. I expect to be able to do this (per this answer): df[(len(df['column name']) < 2)] but I just get the…

python pandas

asked Dec 13 '12 at 01:28 by sjs (8,830 reputation; 3 gold, 19 silver, 19 bronze badges)

637 votes, 5 answers

How to check whether a pandas DataFrame is empty?

How to check whether a pandas DataFrame is empty? In my case I want to print some message in terminal if the DataFrame is empty.

python pandas dataframe

asked Nov 07 '13 at 05:45 by Nilani Algiriyage (32,876 reputation; 32 gold, 87 silver, 121 bronze badges)
