Questions tagged [dataframe]

A data frame is a 2D tabular data structure. Usually, rows are observations and columns are variables, and the columns may be of different types (as distinct from an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, Deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.


The sections below cover each language that uses this term, and each is written for readers familiar only with that language.

`data.frame` in R

Data frames (object class data.frame) are one of the basic tabular data structures in the R language, alongside matrices. Unlike matrices, each column can be a different data type. In terms of implementation, a data frame is a list of equal-length column vectors.

Type ?data.frame for help constructing a data frame. An example:

data.frame(
  x = letters[1:5], 
  y = 1:5, 
  z = (1:5) > 3
)
#   x y     z
# 1 a 1 FALSE
# 2 b 2 FALSE
# 3 c 3 FALSE
# 4 d 4  TRUE
# 5 e 5  TRUE

Related functions include is.data.frame, which tests whether an object is a data.frame, and as.data.frame, which coerces many other data structures to data.frame (through S3 dispatch; see ?S3). Base R data frames have been extended or modified by several R packages to create new data structures, including data.table and tibble. For further reading, see the section on data frames in the CRAN manual An Introduction to R.

DataFrame in Python's pandas library

The pandas library in Python is the canonical tabular data framework on the SciPy stack, and the DataFrame is its two-dimensional data object. It is essentially a rectangular array like a 2D numpy ndarray, but with associated indices on each axis which can be used for alignment. As in R, from an implementation perspective, columns are somewhat prioritized over rows: the DataFrame resembles a dictionary with column names as keys and Series (pandas' one-dimensional data structure) as values.

After importing numpy and pandas under the usual aliases (import numpy as np, import pandas as pd), we can construct a DataFrame in several ways, such as passing a dictionary of column names and values:

>>> pd.DataFrame({"x": list("abcde"), "y": range(1,6), "z": np.arange(1,6) > 3})
   x  y      z
0  a  1  False
1  b  2  False
2  c  3  False
3  d  4   True
4  e  5   True
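
The dictionary-like structure and the index alignment described above can be sketched in a few lines. This is only a minimal illustration; the Series `a` and `b` and their row labels `r1` to `r3` are made up for the example.

```python
import numpy as np
import pandas as pd

# Same frame as in the example above.
df = pd.DataFrame({"x": list("abcde"), "y": range(1, 6), "z": np.arange(1, 6) > 3})

# Dictionary-like view: column names act as keys, each value is a Series.
print(list(df.keys()))         # the column labels
print(type(df["y"]).__name__)  # Series

# Each column keeps its own dtype, unlike a homogeneous 2D ndarray.
print(df.dtypes)

# Index alignment: arithmetic pairs values by label, not by position.
# `a` and `b` are hypothetical Series with the same labels in a different order.
a = pd.Series([1, 2, 3], index=["r1", "r2", "r3"])
b = pd.Series([10, 20, 30], index=["r3", "r2", "r1"])
total = a + b
print(total["r1"])  # 1 + 30 -> 31
```

Because `a` and `b` carry the same labels in a different order, the sum matches values by label rather than by position.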

DataFrame in Apache Spark

A Spark DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases, or existing RDDs. (source)

DataFrame in Maple

A DataFrame is one of the basic data structures in Maple. A data frame is a list of variables, known as DataSeries, displayed in a rectangular grid. Every column (variable) in a DataFrame has the same length; however, each variable can have a different type, such as integer, float, string, name, or boolean.

When printed, data frames resemble matrices in that they are displayed as a rectangular grid, but a key difference is that the first row holds the column (variable) names and the first column holds the row (individual) names. This header row and column are treated as meta-information and are not part of the data. Moreover, the data stored in a DataFrame can be accessed using these header names, as well as by the standard numbered index. For more details, see the Guide to DataFrames in the online Maple Programming Help.

143674 questions

451

votes

13 answers

How to reversibly store and load a Pandas dataframe to/from disk

Right now I'm importing a fairly large CSV as a dataframe every time I run the script. Is there a good solution for keeping that dataframe constantly available in between runs so I don't have to spend all that time waiting for the script to run?

python pandas dataframe

asked Jun 13 '13 at 23:05

jeffstern


449

votes

7 answers

Remove pandas rows with duplicate indices

How to remove rows with duplicate index values? In the weather DataFrame below, sometimes a scientist goes back and corrects observations -- not by editing the erroneous rows, but by appending a duplicate row to the end of a file. I'm reading some…

python pandas dataframe duplicates

asked Oct 23 '12 at 17:11

Paul H


447

votes

7 answers

How to add pandas data to an existing csv file?

I want to know if it is possible to use the pandas to_csv() function to add a dataframe to an existing csv file. The csv file has the same structure as the loaded data.

python pandas csv dataframe

asked Jul 08 '13 at 15:33

Ayoub Ennassiri


445

votes

10 answers

Extracting specific columns from a data frame

I have an R data frame with 6 columns, and I want to create a new dataframe that only has three of the columns. Assuming my data frame is df, and I want to extract columns A, B, and E, this is the only command I can figure out: …

r dataframe r-faq

asked Apr 10 '12 at 02:24

Aren Cambre


431

votes

13 answers

Sample random rows in dataframe

I am struggling to find the appropriate function that would return a specified number of rows picked up randomly without replacement from a data frame in R language? Can anyone help me out?

r dataframe random r-faq

asked Nov 25 '11 at 19:08

nikhil


423

votes

10 answers

Creating a Pandas DataFrame from a Numpy array: How do I specify the index column and column headers?

I have a Numpy array consisting of a list of lists, representing a two-dimensional array with row labels and column names as shown below: data = np.array([['','Col1','Col2'],['Row1',1,2],['Row2',3,4]]) I'd like the resulting DataFrame to have Row1…

python pandas dataframe list numpy

asked Dec 24 '13 at 15:09

user3132783


417

votes

15 answers

pandas: filter rows of DataFrame with operator chaining

Most operations in pandas can be accomplished with operator chaining (groupby, aggregate, apply, etc), but the only way I've found to filter rows is via normal bracket indexing df_filtered = df[df['column'] == value] This is unappealing as it…

python pandas dataframe

asked Aug 08 '12 at 17:25

duckworthd


412

votes

12 answers

Convert a Pandas DataFrame to a dictionary

I have a DataFrame with four columns. I want to convert this DataFrame to a python dictionary. I want the elements of first column be keys and the elements of other columns in the same row be values. DataFrame: ID A B C 0 p 1 3 …

python pandas dictionary dataframe

asked Nov 03 '14 at 14:47

Prince Bhatti


411

votes

10 answers

How to get/set a pandas index column title or name?

How do I get the index column name in Python's pandas? Here's an example dataframe: Column 1 Index Title Apples 1 Oranges 2 Puppies 3 Ducks 4 What I'm trying to do is…

python pandas dataframe

asked Aug 02 '13 at 17:30

Radical Edward


406

votes

17 answers

pandas get rows which are NOT in other dataframe

I've two pandas data frames that have some rows in common. Suppose dataframe2 is a subset of dataframe1. How can I get the rows of dataframe1 which are not in dataframe2? df1 = pandas.DataFrame(data = {'col1' : [1, 2, 3, 4, 5], 'col2' : [10, 11, 12,…

python pandas dataframe

asked Mar 06 '15 at 15:10

think nice things


406

votes

27 answers

What does axis in pandas mean?

Here is my code to generate a dataframe: import pandas as pd import numpy as np dff = pd.DataFrame(np.random.randn(1,2),columns=list('AB')) then I got the dataframe: +------------+---------+--------+ | | A | B …

python pandas numpy dataframe

asked Mar 03 '14 at 14:41

jerry_sjtu


400

votes

9 answers

Combining two Series into a DataFrame in pandas

I have two Series s1 and s2 with the same (non-consecutive) indices. How do I combine s1 and s2 to being two columns in a DataFrame and keep one of the indices as a third column?

python pandas series dataframe

asked Aug 05 '13 at 15:37

user7289


398

votes

12 answers

What is the most efficient way to loop through dataframes with pandas?

I want to perform my own complex operations on financial data in dataframes in a sequential manner. For example I am using the following MSFT CSV file taken from Yahoo Finance: Date,Open,High,Low,Close,Volume,Adj…

python pandas performance dataframe for-loop

asked Oct 20 '11 at 14:46

Muppet


398

votes

18 answers

Convert data.frame columns from factors to characters

I have a data frame. Let's call him bob: > head(bob) phenotype exclusion GSM399350 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119- GSM399351 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119- GSM399352 3-…

r dataframe

asked May 17 '10 at 16:52

Mike Dewar


397

votes

13 answers

Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

I have a large spreadsheet file (.xlsx) that I'm processing using python pandas. It happens that I need data from two tabs (sheets) in that large file. One of the tabs has a ton of data and the other is just a few square cells. When I use…

python excel pandas dataframe xlsx

asked Oct 23 '14 at 04:21

HaPsantran


