Questions tagged [pandas]

Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data science libraries in Python.

Pandas is a Python library for PAN-el DA-ta manipulation and analysis, e.g. multidimensional time series and cross-sectional data sets commonly found in statistics, experimental science results, econometrics, or finance. It is implemented primarily with NumPy and Cython and is designed to integrate easily with NumPy-based scientific libraries such as statsmodels.

Main Features:

  • Data structures for one- and two-dimensional labeled datasets (Series and DataFrames, respectively). Their main features include:
    • Automatic data alignment and interpolation
    • Handling of missing observations in calculations
    • Convenient slicing and reshaping ("reindexing") functions
    • Categorical data types
    • 'Group by' aggregation and transformation functionality
    • Tools for merging and joining data sets
    • Simple Matplotlib integration for plotting and graphing
    • Multi-indexing, which gives indices a hierarchical structure that can represent an arbitrary number of dimensions
  • Date tools: objects for expressing date offsets or generating date ranges. Dates can be aligned to a specific time zone and converted or compared at will
  • Statistical models: convenient ordinary least squares and panel OLS implementations for in-sample or rolling time series and cross-sectional regressions. These shipped with early pandas releases and have since been removed; statsmodels now covers this functionality
  • Intelligent Cython offloading; complex computations are performed rapidly due to these optimizations.
  • Static and moving statistical tools: mean, standard deviation, correlation, and covariance
  • Rich User Documentation, using Sphinx
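A few of the features listed above can be sketched in a handful of lines. This is only an illustrative sketch: the data, column names, and variable names below are made up for the example and do not come from the tag wiki.

```python
import numpy as np
import pandas as pd

# Two Series with partially overlapping labels: arithmetic aligns on the index.
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])
total = a + b  # "x" has no partner in b, so the result there is NaN

# Missing observations are skipped by default in reductions such as mean().
mean_ignoring_nan = total.mean()

# 'Group by' aggregation on a small, hypothetical DataFrame.
sales = pd.DataFrame(
    {"region": ["north", "south", "north", "south"], "units": [3, 5, 4, 1]}
)
units_per_region = sales.groupby("region")["units"].sum()

# Merging two data sets on a shared key column.
managers = pd.DataFrame({"region": ["north", "south"], "manager": ["Ana", "Bo"]})
merged = sales.merge(managers, on="region", how="left")

# Date tools: a daily range in UTC, converted to another time zone.
days = pd.date_range("2024-01-01", periods=3, freq="D", tz="UTC")
days_berlin = days.tz_convert("Europe/Berlin")

print(total)
print(units_per_region)
print(merged)
print(days_berlin)
```

The time zone conversion assumes a time zone database is available on the system, which is normally the case wherever pandas is installed.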

Asking Questions:

  • Before asking the question, make sure you have gone through the 10 Minutes to pandas introduction. It covers all the basic functionality of Pandas.
  • See this question on asking good questions: How to make good reproducible pandas examples
  • Please provide the version of Pandas, NumPy, and platform details (if appropriate) in your questions
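As a rough illustration of the points above, a question might include a snippet along these lines. It is a sketch under assumptions, not a required template: the column names and values are hypothetical, and the `env` dictionary is just one way to report versions.

```python
import platform

import numpy as np
import pandas as pd

# Build the input inline rather than reading a private file,
# so readers can copy, paste, and run the snippet as-is.
df = pd.DataFrame(
    {"date": ["01/02/2020", "15/03/2020"], "value": [1.5, np.nan]}
)

# Show the data and dtypes exactly as pandas sees them.
print(df)
print(df.dtypes)

# Report the versions and platform details the question should mention.
env = {
    "pandas": pd.__version__,
    "numpy": np.__version__,
    "python": platform.python_version(),
    "platform": platform.platform(),
}
print(env)
```

Calling `pd.show_versions()` prints a fuller report of installed dependencies if more detail is needed.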

282843 questions
467 votes, 6 answers

Convert DataFrame column type from string to datetime

How can I convert a DataFrame column of strings (in dd/mm/yyyy format) to datetime dtype?
perigee
  • 9,438
  • 11
  • 31
  • 35
466 votes, 7 answers

Convert Pandas Column to DateTime

I have one field in a pandas DataFrame that was imported as string format. It should be a datetime variable. How do I convert it to a datetime column, and then filter based on date? Example: raw_data = pd.DataFrame({'Mycol':…
Chris
  • 12,900
  • 12
  • 43
  • 65
460 votes, 24 answers

Normalize columns of a dataframe

I have a dataframe in pandas where each column has different value range. For example: df: A B C 1000 10 0.5 765 5 0.35 800 7 0.09 Any idea how I can normalize the columns of this dataframe where each value is between 0 and 1? My…
ahajib
  • 12,838
  • 29
  • 79
  • 120
458 votes, 11 answers

How can I display full (non-truncated) dataframe information in HTML when converting from Pandas dataframe to HTML?

I converted a Pandas dataframe to an HTML output using the DataFrame.to_html function. When I save this to a separate HTML file, the file shows truncated output. For example, in my TEXT column, df.head(1) will show The film was an excellent…
Amy
  • 4,693
  • 3
  • 12
  • 7
454 votes, 14 answers

Converting between datetime, Timestamp and datetime64

How do I convert a numpy.datetime64 object to a datetime.datetime (or Timestamp)? In the following code, I create a datetime, timestamp and datetime64 objects. import datetime import numpy as np import pandas as pd dt = datetime.datetime(2012, 5,…
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
451 votes, 13 answers

How to reversibly store and load a Pandas dataframe to/from disk

Right now I'm importing a fairly large CSV as a dataframe every time I run the script. Is there a good solution for keeping that dataframe constantly available in between runs so I don't have to spend all that time waiting for the script to run?
jeffstern
  • 4,706
  • 4
  • 16
  • 10
449 votes, 7 answers

Remove pandas rows with duplicate indices

How to remove rows with duplicate index values? In the weather DataFrame below, sometimes a scientist goes back and corrects observations -- not by editing the erroneous rows, but by appending a duplicate row to the end of a file. I'm reading some…
Paul H
  • 65,268
  • 20
  • 159
  • 136
447 votes, 7 answers

How to add pandas data to an existing csv file?

I want to know if it is possible to use the pandas to_csv() function to add a dataframe to an existing csv file. The csv file has the same structure as the loaded data.
Ayoub Ennassiri
  • 4,606
  • 3
  • 13
  • 9
423 votes, 10 answers

Creating a Pandas DataFrame from a Numpy array: How do I specify the index column and column headers?

I have a Numpy array consisting of a list of lists, representing a two-dimensional array with row labels and column names as shown below: data = np.array([['','Col1','Col2'],['Row1',1,2],['Row2',3,4]]) I'd like the resulting DataFrame to have Row1…
user3132783
  • 5,275
  • 3
  • 15
  • 7
417 votes, 15 answers

pandas: filter rows of DataFrame with operator chaining

Most operations in pandas can be accomplished with operator chaining (groupby, aggregate, apply, etc), but the only way I've found to filter rows is via normal bracket indexing df_filtered = df[df['column'] == value] This is unappealing as it…
duckworthd
  • 14,679
  • 16
  • 53
  • 68
414 votes, 10 answers

How to invert the x or y axis

I have a scatter plot graph with a bunch of random x, y coordinates. Currently the Y-Axis starts at 0 and goes up to the max value. I would like the Y-Axis to start at the max value and go up to 0. points = [(10,5), (5,11), (24,13), (7,8)] x_arr…
DarkAnt
  • 4,163
  • 2
  • 17
  • 6
412 votes, 12 answers

Convert a Pandas DataFrame to a dictionary

I have a DataFrame with four columns. I want to convert this DataFrame to a python dictionary. I want the elements of first column be keys and the elements of other columns in the same row be values. DataFrame: ID A B C 0 p 1 3 …
Prince Bhatti
  • 4,671
  • 4
  • 18
  • 24
411 votes, 10 answers

How to get/set a pandas index column title or name?

How do I get the index column name in Python's pandas? Here's an example dataframe: Column 1 Index Title Apples 1 Oranges 2 Puppies 3 Ducks 4 What I'm trying to do is…
Radical Edward
  • 5,234
  • 5
  • 21
  • 33
406 votes, 17 answers

pandas get rows which are NOT in other dataframe

I've two pandas data frames that have some rows in common. Suppose dataframe2 is a subset of dataframe1. How can I get the rows of dataframe1 which are not in dataframe2? df1 = pandas.DataFrame(data = {'col1' : [1, 2, 3, 4, 5], 'col2' : [10, 11, 12,…
think nice things
  • 4,315
  • 3
  • 14
  • 12
406 votes, 27 answers

What does axis in pandas mean?

Here is my code to generate a dataframe: import pandas as pd import numpy as np dff = pd.DataFrame(np.random.randn(1,2),columns=list('AB')) then I got the dataframe: +------------+---------+--------+ | | A | B …
jerry_sjtu
  • 5,216
  • 8
  • 29
  • 42