Questions tagged [pandas]

Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data science libraries in Python.

The name pandas comes from "panel data". pandas is implemented primarily using NumPy and Cython, and it is intended to integrate easily with NumPy-based scientific libraries such as statsmodels.

Main Features:

  • Data structures: for one- and two-dimensional labeled datasets (respectively Series and DataFrames). Some of their main features include:
    • Automatically aligning data and interpolation
    • Handling missing observations in calculations
    • Convenient slicing and reshaping ("reindexing") functions
    • Categorical data types
    • 'Group by' aggregation and transformation functionality
    • Tools for merging and joining together data sets
    • Simple Matplotlib integration for plotting and graphing
    • Multi-indexing, which gives structure to indices and allows data with an arbitrary number of dimensions to be represented
  • Date tools: objects for expressing date offsets or generating date ranges. Dates can be aligned to a specific time zone and converted or compared at will
  • Statistical models: early releases included convenient ordinary least squares and panel OLS implementations for in-sample or rolling time series and cross-sectional regressions. These were later removed from pandas; statsmodels is the usual place to look for such models
  • Intelligent Cython offloading; complex computations are performed rapidly due to these optimizations.
  • Static and moving statistical tools: mean, standard deviation, correlation, and covariance
  • Rich User Documentation, using Sphinx
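
A few of the features listed above can be sketched in a short script. This is only an illustrative sketch: the data, the column names (`key`, `value`, `label`) and the variable names are made up for the example, and it assumes a reasonably recent pandas with time zone data available.

```python
import pandas as pd

# Two Series with partially overlapping labels (made-up data).
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Arithmetic aligns on index labels; labels present in only one
# operand produce NaN in the result.
total = s1 + s2

# Reductions skip missing values by default.
total_sum = total.sum()

# 'Group by' aggregation on a small DataFrame.
df = pd.DataFrame({"key": ["x", "y", "x", "y"], "value": [1, 2, 3, 4]})
group_means = df.groupby("key")["value"].mean()

# Merge with a lookup table on the shared "key" column.
labels = pd.DataFrame({"key": ["x", "y"], "label": ["first", "second"]})
merged = df.merge(labels, on="key", how="left")

# A timezone-aware daily date range, converted to another time zone.
dates = pd.date_range("2024-01-01", periods=3, freq="D", tz="UTC")
dates_ny = dates.tz_convert("America/New_York")

print(total)
print(group_means)
print(merged)
print(dates_ny)
```

Label alignment is the reason `total` contains missing values for `a` and `d`: each of those labels exists in only one of the two inputs.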

Asking Questions:

  • Before asking a question, make sure you have gone through the 10 Minutes to pandas introduction, which covers the basic functionality of pandas.
  • See this question on asking good questions: How to make good reproducible pandas examples
  • Please provide the version of Pandas, NumPy, and platform details (if appropriate) in your questions
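
As a rough sketch of what such a question might include, the snippet below prints version details and a small, copy-pasteable input. The DataFrame contents and column names are invented for illustration; `pd.show_versions()` is an alternative that prints a fuller environment report.

```python
import platform

import numpy as np
import pandas as pd

# Version and platform details worth pasting into the question.
print("pandas:", pd.__version__)
print("numpy:", np.__version__)
print("python:", platform.python_version(), "on", platform.platform())

# A small, self-contained input that readers can copy and run
# (made-up data; keep it as small as the problem allows).
df = pd.DataFrame({"a": ["C", "C", "A"], "b": ["C", "A", "A"]})

# Two ways to show the data: a readable table, and a dict that can be
# passed straight back to pd.DataFrame(...) by someone answering.
print(df.to_string())
as_dict = df.to_dict(orient="list")
print(as_dict)
```

Posting the dict form alongside the expected output lets answerers rebuild the exact frame without retyping it.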


282843 questions
47
votes
5 answers

Pandas Dataframe Find Rows Where all Columns Equal

I have a dataframe that has characters in it - I want a boolean result by row that tells me if all columns for that row have the same value. For example, I have df = [ a b c d 0 'C' 'C' 'C' 'C' 1 'C' 'C' 'A' 'A' 2 'A' 'A'…
Lisa L
47
votes
2 answers

pandas - how to access cell in pandas, equivalent of df[3,4] in R

If I have a pandas DataFrame object, how do I simply access a cell? In R, assuming my data.frame is called df, I can access the 3rd row and 4th column by df[3,4] What is the equivalent in python?
bill999
47
votes
8 answers

Pandas dataframe hide index functionality?

Is it possible to hide the index when displaying pandas DataFrames, so that only the column names appear at the top of the table? This would need to work for both the html representation in ipython notebook and to_latex() function (which I'm using…
J Grif
47
votes
7 answers

Pandas: Get duplicated indexes

Given a dataframe, I want to get the duplicated indexes, which do not have duplicate values in the columns, and see which values are different. Specifically, I have this dataframe: import pandas as pd wget…
Olga Botvinnik
47
votes
3 answers

Unpivot Pandas Data

I currently have a DataFrame laid out as: Jan Feb Mar Apr ... 2001 1 12 12 19 2002 9 ... 2003 ... and I would like to "unpivot" the data to look like: Date Value Jan 2001 1 Feb 2001 1 Mar 2001 12 ... Jan 2002 …
Alex Rothberg
47
votes
11 answers

Pandas ParserError EOF character when reading multiple csv files to HDF5

Using Python3, Pandas 0.12 I'm trying to write multiple csv files (total size is 7.9 GB) to a HDF5 store to process later onwards. The csv files contain around a million of rows each, 15 columns and data types are mostly strings, but some floats.…
Matthijs
47
votes
5 answers

Get column name where value is something in pandas dataframe

I'm trying to find, at each timestamp, the column name in a dataframe for which the value matches with the one in a timeseries at the same timestamp. Here is my dataframe: >>> df col5 col4 col3 col2 …
leroygr
47
votes
3 answers

Python pandas, Plotting options for multiple lines

I want to plot multiple lines from a pandas dataframe and setting different options for each line. I would like to do something…
Joerg
47
votes
4 answers

Filter out groups with a length equal to one

I am creating a groupby object from a Pandas DataFrame and want to select out all the groups with > 1 size. Example: A B 0 foo 0 1 bar 1 2 foo 2 3 foo 3 The following doesn't seem to work: grouped =…
Abhi
47
votes
12 answers

Creating dummy variables in pandas for python

I'm trying to create a series of dummy variables from a categorical variable using pandas in python. I've come across the get_dummies function, but whenever I try to call it I receive an error that the name is not defined. Any thoughts or other…
user1074057
46
votes
5 answers

pandas convert from datetime to integer timestamp

Considering a pandas dataframe in python having a column named time of type integer, I can convert it to a datetime format with the following instruction. df['time'] = pandas.to_datetime(df['time'], unit='s') so now the column has entries like:…
roschach
46
votes
2 answers

Pandas filter data frame rows by function

I want to filter a dataframe by a more complex function based on different values in the row. Is there a possibility to filter DF rows by a boolean function like you can do it e.g. in ES6 filter function? Extreme simplified example to illustrate the…
Karl Adler
46
votes
3 answers

pandas, melt, unmelt preserve index

I've got a table of clients (coper) and asset allocation (asset) A = [[1,2],[3,4],[5,6]] idx = ['coper1','coper2','coper3'] cols = ['asset1','asset2'] df = pd.DataFrame(A,index = idx, columns = cols) so my data look like asset1 …
Mohammad Athar
46
votes
2 answers

Pandas add column with value based on condition based on other columns

I have the following pandas dataframe: import pandas as pd import numpy as np d = {'age' : [21, 45, 45, 5], 'salary' : [20, 40, 10, 100]} df = pd.DataFrame(d) and would like to add an extra column called "is_rich" which captures if a person…
Rutger Hofste
46
votes
5 answers

Read JSON to pandas dataframe - ValueError: Mixing dicts with non-Series may lead to ambiguous ordering

I am trying to read in the JSON structure below into pandas dataframe, but it throws out the error message: ValueError: Mixing dicts with non-Series may lead to ambiguous ordering. Json data: { "status": { "statuscode": 200, …
userPyGeo