Questions tagged [pandas]

Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data science libraries in Python.

The name derives from "panel data" (PAN-el DA-ta). pandas is implemented primarily using NumPy and Cython, and it is designed to integrate easily with NumPy-based scientific libraries such as statsmodels.

To create a reproducible pandas example, see: How to make good reproducible pandas examples

Main Features:

  • Data structures for one- and two-dimensional labeled datasets (Series and DataFrame, respectively). Some of their main features include:
    • Automatic data alignment and interpolation
    • Handling of missing observations in calculations
    • Convenient slicing and reshaping ("reindexing") functions
    • Categorical data types
    • 'Group by' functionality for aggregation or transformation
    • Tools for merging and joining data sets
    • Simple Matplotlib integration for plotting and graphing
    • Multi-indexing, which gives structure to indices so that they can represent an arbitrary number of dimensions
  • Date tools: objects for expressing date offsets or generating date ranges. Dates can be aligned to a specific time zone and converted or compared at will.
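  • Statistical models: convenient ordinary least squares and panel OLS implementations for in-sample or rolling time series and cross-sectional regressions, intended as a starting point for implementing further models. (These lived in the pandas.stats module of early releases, which has since been removed; statsmodels covers this ground.)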
  • Cython offloading: performance-critical computations run in compiled Cython code.
  • Static and moving statistical tools: mean, standard deviation, correlation, and covariance
  • Rich user documentation, built with Sphinx
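
A minimal sketch of a few of the features listed above: label alignment, missing-data handling, group-by aggregation, and merging. All column names and values here are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two Series with partially overlapping labels: arithmetic aligns on the index.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0], index=["b", "c"])
aligned_sum = s1 + s2  # "a" has no partner in s2, so the result there is NaN

# Missing observations are skipped by default in reductions such as sum().
total = aligned_sum.sum()

# Group-by aggregation on a small, made-up DataFrame.
sales = pd.DataFrame(
    {"region": ["north", "south", "north", "south"], "units": [5, 7, 3, 1]}
)
units_per_region = sales.groupby("region")["units"].sum()

# Merging two data sets on a shared key column.
managers = pd.DataFrame({"region": ["north", "south"], "manager": ["Ann", "Bo"]})
merged = sales.merge(managers, on="region", how="left")
```
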

Asking Questions:

  • Before asking a question, make sure you have gone through the 10 Minutes to pandas introduction. It covers the basic functionality of pandas.
  • See this question on asking good questions: How to make good reproducible pandas examples
  • Please state the versions of pandas and NumPy, plus platform details where relevant, in your question.

282,843 questions
778 votes, 11 answers

Get statistics for each group (such as count, mean, etc) using pandas GroupBy?

I have a dataframe df and I use several columns from it to groupby: df['col1','col2','col3','col4'].groupby(['col1','col2']).mean() In the above way, I almost get the table (dataframe) that I need. What is missing is an additional column that…
Roman (reputation 124,451; badges 167, 349, 456)
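
One way to get several statistics per group, sketched on a made-up frame (the data and the output column names mean_col3 and n_rows are assumptions for illustration, not taken from the question):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "b"],
        "col2": ["x", "x", "y", "y", "z"],
        "col3": [1.0, 3.0, 2.0, 4.0, 6.0],
    }
)

# Named aggregation: one output column per statistic.
# "size" counts rows per group (including NaN), "mean" averages col3.
stats = (
    df.groupby(["col1", "col2"])
    .agg(mean_col3=("col3", "mean"), n_rows=("col3", "size"))
    .reset_index()
)
```
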
772 votes, 24 answers

Set value for particular cell in pandas DataFrame using index

I have created a Pandas DataFrame df = DataFrame(index=['A','B','C'], columns=['x','y']) and have got this x y A NaN NaN B NaN NaN C NaN NaN Now, I would like to assign a value to particular cell, for example to row C and column x. I…
Mitkp (reputation 7,800; badges 3, 14, 8)
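
A short sketch of setting a single cell by label, using the frame from the question. The assigned values 10 and 20 are arbitrary.

```python
import pandas as pd

df = pd.DataFrame(index=["A", "B", "C"], columns=["x", "y"])

# .at is the label-based accessor for a single scalar cell.
df.at["C", "x"] = 10

# .loc also works, and additionally accepts slices or lists of labels.
df.loc["A", "y"] = 20
```
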
766 votes, 23 answers

Constructing pandas DataFrame from values in variables gives "ValueError: If using all scalar values, you must pass an index"

This may be a simple question, but I cannot figure out how to do this. Let's say that I have two variables as follows. a = 2 b = 3 I want to construct a DataFrame from this: df2 = pd.DataFrame({'A':a,'B':b}) This generates an error: ValueError:…
Nilani Algiriyage (reputation 32,876; badges 32, 87, 121)
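
Two common ways around the "all scalar values" error, sketched with the variables from the question (the names df_from_lists and df_with_index are illustrative):

```python
import pandas as pd

a = 2
b = 3

# Option 1: wrap each scalar in a list so every column has length 1.
df_from_lists = pd.DataFrame({"A": [a], "B": [b]})

# Option 2: keep the scalars and pass an explicit index.
df_with_index = pd.DataFrame({"A": a, "B": b}, index=[0])
```
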
751 votes, 20 answers

Import multiple CSV files into pandas and concatenate into one DataFrame

I would like to read several CSV files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out though. Here is what I have so far: import glob import pandas as pd # Get data file names path =…
jonas (reputation 13,559; badges 22, 57, 75)
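
A self-contained sketch of the read-and-concatenate pattern. Since the path in the question is elided, it first writes two small made-up CSV files to a temporary directory; the file names and columns are assumptions.

```python
import glob
import os
import tempfile

import pandas as pd

# Create two small CSV files so the sketch is runnable anywhere.
tmp_dir = tempfile.mkdtemp()
pd.DataFrame({"id": [1, 2], "value": [0.1, 0.2]}).to_csv(
    os.path.join(tmp_dir, "part1.csv"), index=False
)
pd.DataFrame({"id": [3], "value": [0.3]}).to_csv(
    os.path.join(tmp_dir, "part2.csv"), index=False
)

# Read every CSV in the directory and stack the frames row-wise.
csv_paths = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))
frames = [pd.read_csv(path) for path in csv_paths]
combined = pd.concat(frames, ignore_index=True)
```

Passing ignore_index=True gives the combined frame a fresh 0..n-1 index instead of repeating each file's own row numbers.
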
720 votes, 15 answers

How to apply a function to two columns of Pandas dataframe

Suppose I have a df which has columns 'ID', 'col_1', 'col_2'. And I define a function: f = lambda x, y: my_function_expression. Now I want to apply f to df's two columns 'col_1', 'col_2' to element-wise calculate a new column 'col_3',…
bigbug (reputation 55,954; badges 42, 77, 96)
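
A sketch of two ways to combine two columns into a third. The body of f is a placeholder standing in for the asker's my_function_expression, and the data is made up.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3], "col_1": [1, 2, 3], "col_2": [10, 20, 30]})


def f(x, y):
    # Placeholder for my_function_expression; any two-argument function works.
    return x * 2 + y


# Row-wise apply: flexible, but calls f once per row.
df["col_3"] = df.apply(lambda row: f(row["col_1"], row["col_2"]), axis=1)

# If f only uses operations that work on whole columns, call it directly.
df["col_3_vectorized"] = f(df["col_1"], df["col_2"])
```

The vectorized call is usually much faster on large frames, but it only applies when f does not depend on Python-level branching per element.
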
718 votes, 6 answers

How to avoid pandas creating an index in a saved csv

I am trying to save a csv to a folder after making some edits to the file. Every time I use pd.to_csv('C:/Path of file.csv') the csv file has a separate column of indexes. I want to avoid printing the index to csv. I tried: pd.read_csv('C:/Path to…
Alexis (reputation 8,531; badges 5, 19, 21)
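
A minimal sketch of writing CSV output without the index column. Note that to_csv is a DataFrame method rather than a top-level pd function. The frame here is made up, and the output goes to a string instead of a file so the result can be inspected.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# index=False omits the row labels; with no path given, to_csv returns a string.
csv_text = df.to_csv(index=False)
```
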
697 votes, 11 answers

Difference between map, applymap and apply methods in Pandas

Can you tell me when to use these vectorization methods with basic examples? I see that map is a Series method whereas the rest are DataFrame methods. I got confused about apply and applymap methods though. Why do we have two methods for applying a…
marillion (reputation 10,618; badges 19, 48, 63)
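
A basic side-by-side sketch of the three methods on a made-up frame. DataFrame.applymap was renamed to DataFrame.map in pandas 2.1, so the last step picks whichever the installed version offers.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Series.map: element-wise on a single column.
a_squared = df["a"].map(lambda v: v ** 2)

# DataFrame.apply: the function receives a whole column (or a row with axis=1).
col_ranges = df.apply(lambda col: col.max() - col.min())

# Element-wise over the whole frame: applymap, or map on pandas 2.1 and later.
elementwise = df.map if hasattr(df, "map") else df.applymap
as_text = elementwise(lambda v: f"<{v}>")
```
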
697 votes, 19 answers

How can I get a value from a cell of a dataframe?

I have constructed a condition that extracts exactly one row from my dataframe: d2 = df[(df['l_ext']==l_ext) & (df['item']==item) & (df['wn']==wn) & (df['wd']==1)] Now I would like to take a value from a particular column: val = d2['col_name'] But…
Roman (reputation 124,451; badges 167, 349, 456)
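
A sketch of pulling a scalar out of a one-row selection. The frame is a simplified stand-in for the one in the question (only the item, wd and col_name columns, with made-up values).

```python
import pandas as pd

df = pd.DataFrame({"item": ["pen", "ink"], "wd": [1, 2], "col_name": [9.5, 7.0]})

# A boolean filter returns a DataFrame, even when only one row matches.
d2 = df[(df["item"] == "pen") & (df["wd"] == 1)]

# Selecting a column gives a one-element Series; take its first value.
val = d2["col_name"].iloc[0]

# Series.item() does the same but raises unless there is exactly one element.
val_strict = d2["col_name"].item()
```
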
693 votes, 28 answers

How to check if any value is NaN in a Pandas DataFrame

In Python Pandas, what's the best way to check whether a DataFrame has one (or more) NaN values? I know about the function pd.isnan, but this returns a DataFrame of booleans for each element. This post right here doesn't exactly answer my question…
hlin117 (reputation 20,764; badges 31, 72, 93)
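
A sketch of reducing the element-wise boolean mask to a single answer, on a made-up frame. DataFrame.isna produces the mask; the rest is aggregation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# Collapse the boolean mask to one flag for the whole frame.
has_nan = bool(df.isna().to_numpy().any())

# Per-column counts help locate where the missing values are.
nan_per_column = df.isna().sum()
```
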
691 votes, 16 answers

Convert pandas dataframe to NumPy array

How do I convert a pandas dataframe into a NumPy array? DataFrame: import numpy as np import pandas as pd index = [1, 2, 3, 4, 5, 6, 7] a = [np.nan, np.nan, np.nan, 0.1, 0.1, 0.1, 0.1] b = [0.2, np.nan, 0.2, 0.2, 0.2, np.nan, np.nan] c = [np.nan,…
Mister Nobody (reputation 6,927; badges 3, 13, 3)
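
A minimal sketch using DataFrame.to_numpy (available since pandas 0.24) on a shortened, made-up version of the frame from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, 0.1], "B": [0.2, np.nan]}, index=[1, 2])

# to_numpy() returns the values as an ndarray; index and column labels are dropped.
arr = df.to_numpy()
```
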
683 votes, 25 answers

UnicodeDecodeError when reading CSV file in Pandas

I'm running a program which is processing 30,000 similar files. A random number of them are stopping and producing this error... File "C:\Importer\src\dfman\importer.py", line 26, in import_chr data = pd.read_csv(filepath, names=fields) File…
TravisVOX (reputation 20,342; badges 13, 37, 41)
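
The traceback in the question is truncated, so the actual cause is not visible here. One frequent cause of a UnicodeDecodeError in read_csv is a file that is not UTF-8 encoded. The sketch below assumes that case, with made-up Latin-1 data held in memory instead of a file:

```python
import io

import pandas as pd

# Bytes as they would appear in a Latin-1 encoded file (hypothetical content).
raw = "name,city\nRené,Zürich\n".encode("latin-1")

# Telling read_csv the real encoding avoids decoding the bytes as UTF-8.
df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
```
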
673 votes, 12 answers

Converting a Pandas GroupBy output from Series to DataFrame

I'm starting with input data like this df1 = pandas.DataFrame( { "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"] } ) Which when printed…
saveenr (reputation 8,439; badges 3, 19, 20)
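
A sketch of turning grouped counts back into a flat DataFrame, using the input from the question. The output column name "count" is an arbitrary choice.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    }
)

# size() returns a Series indexed by (Name, City); reset_index turns the
# index levels back into columns and names the counts column.
counts = df1.groupby(["Name", "City"]).size().reset_index(name="count")
```
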
665 votes, 49 answers

Python Pandas Error tokenizing data

I'm trying to use pandas to manipulate a .csv file but I get this error: pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12 I have tried to read the pandas docs, but found nothing. My code is…
abuteau (reputation 6,963; badges 4, 16, 20)
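
The asker's code is truncated, so the sketch below only covers one possible cause: preamble lines with fewer fields than the data, so the parser infers too few columns. The text content and the choice of skiprows=2 are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical file: two preamble lines, then a 3-column header and data.
text = "Report title\nexported,today\na,b,c\n1,2,3\n4,5,6\n"

# Skipping the preamble lets the parser take the real header line.
df = pd.read_csv(io.StringIO(text), skiprows=2)
```

Other causes, such as a different delimiter or stray quote characters, need different fixes (for example the sep parameter).
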
649 votes, 6 answers

How to delete rows from a pandas DataFrame based on a conditional expression

I have a pandas DataFrame and I want to delete rows from it where the length of the string in a particular column is greater than 2. I expect to be able to do this (per this answer): df[(len(df['column name']) < 2)] but I just get the…
sjs (reputation 8,830; badges 3, 19, 19)
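
A sketch of filtering by string length on a made-up column. len(df['column name']) returns the number of rows in the column, not per-element lengths, which is why the attempt in the question does not work; the .str.len() accessor computes the length of each string.

```python
import pandas as pd

df = pd.DataFrame({"column name": ["a", "bb", "ccc", "dddd"]})

# Keep rows whose string length is at most 2, i.e. drop those longer than 2.
short_rows = df[df["column name"].str.len() <= 2]
```
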
637 votes, 5 answers

How to check whether a pandas DataFrame is empty?

How to check whether a pandas DataFrame is empty? In my case I want to print some message in terminal if the DataFrame is empty.
Nilani Algiriyage (reputation 32,876; badges 32, 87, 121)
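
A minimal sketch using the DataFrame.empty property. The frame and the message text are made up.

```python
import pandas as pd

# A frame with columns but no rows counts as empty.
df = pd.DataFrame(columns=["x", "y"])

if df.empty:
    message = "DataFrame is empty"
else:
    message = "DataFrame has rows"

print(message)
```
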