Questions tagged [dataframe]

A data frame is a 2D tabular data structure. Usually, its rows are observations and its columns are variables, and the columns may be of different types (unlike an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, Deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.

The sections below cover each language that uses this term, and each is written for readers familiar only with that language.

`data.frame` in R

Data frames (object class data.frame) are one of the basic tabular data structures in the R language, alongside matrices. Unlike matrices, each column can be a different data type. In terms of implementation, a data frame is a list of equal-length column vectors.

Type ?data.frame for help constructing a data frame. An example:

data.frame(
  x = letters[1:5], 
  y = 1:5, 
  z = (1:5) > 3
)
#   x y     z
# 1 a 1 FALSE
# 2 b 2 FALSE
# 3 c 3 FALSE
# 4 d 4  TRUE
# 5 e 5  TRUE

Related functions include is.data.frame, which tests whether an object is a data.frame, and as.data.frame, which coerces many other data structures to data.frame (through S3 dispatch, see ?S3). Several R packages, including data.table and tibble, extend or modify base R data frames to create new data structures. For further reading, see the section on data frames in the CRAN manual An Introduction to R.

DataFrame in Python's pandas library

The pandas library in Python is the canonical tabular data framework on the SciPy stack, and the DataFrame is its two-dimensional data object. It is essentially a rectangular array, like a 2D numpy ndarray, but with an index on each axis that can be used for alignment. As in R, from an implementation perspective, columns are somewhat prioritized over rows: the DataFrame resembles a dictionary with column names as keys and Series (pandas' one-dimensional data structure) as values.

After importing numpy and pandas under the usual aliases (import numpy as np, import pandas as pd), we can construct a DataFrame in several ways, such as passing a dictionary of column names and values:

>>> pd.DataFrame({"x": list("abcde"), "y": range(1,6), "z": np.arange(1,6) > 3})
   x  y      z
0  a  1  False
1  b  2  False
2  c  3  False
3  d  4   True
4  e  5   True
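
The dictionary-like behaviour and index alignment described above can be sketched as follows. This is a minimal illustration, not an exhaustive tour of the API; the Series s and its values are made up for the example.

```python
import numpy as np
import pandas as pd

# Same frame as in the construction example above.
df = pd.DataFrame({"x": list("abcde"), "y": range(1, 6), "z": np.arange(1, 6) > 3})

# Dictionary-like access: column names act as keys, each value is a Series.
print(list(df.keys()))
print(type(df["y"]).__name__)

# Each column keeps its own dtype, unlike a homogeneous 2D ndarray.
print(df.dtypes)

# Index alignment: arithmetic matches rows by index label, not by position.
# s is a hypothetical Series covering only labels 4, 0 and 2.
s = pd.Series([10, 20, 30], index=[4, 0, 2])
aligned = df["y"] + s
print(aligned)
```

Labels present in only one operand (here 1 and 3) come out as NaN in the result, which is why the sum is stored as floating point.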

DataFrame in Apache Spark

A Spark DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases, or existing RDDs. (source)

DataFrame in Maple

A DataFrame is one of the basic data structures in Maple. A data frame is a list of variables, known as DataSeries, displayed in a rectangular grid. Every column (variable) in a DataFrame has the same length; however, each variable can have a different type, such as integer, float, string, name, or boolean.

When printed, data frames resemble matrices in that they are viewed as a rectangular grid, but a key difference is that the first row corresponds to the column (variable) names, and the first column corresponds to the row (individual) names. This header row and column are treated as meta-information and are not part of the data. Moreover, the data stored in a DataFrame can be accessed using these header names, as well as by the standard numbered index. For more details, see the Guide to DataFrames in the online Maple Programming Help.

143674 questions

392

votes

10 answers

Get column index from column name in python pandas

In R when you need to retrieve a column index based on the name of the column you could do idx <- which(names(my_data)==my_colum_name) Is there a way to do the same with pandas dataframes?

python pandas dataframe indexing

asked Oct 22 '12 at 23:48

ak3nat0n

6,060
6
36
59

391

votes

13 answers

Select DataFrame rows between two dates

I am creating a DataFrame from a csv as follows: stock = pd.read_csv('data_in/' + filename + '.csv', skipinitialspace=True) The DataFrame has a date column. Is there a way to create a new DataFrame (or just overwrite the existing one) which only…

python pandas dataframe date

asked Mar 31 '15 at 13:38

darkpool

13,822
16
54
89

389

votes

12 answers

How does one reorder columns in a data frame?

How would one change this input (with the sequence: time, in, out, files): Time In Out Files 1 2 3 4 2 3 4 5 To this output (with the sequence: time, out, in, files)? Time Out In Files 1 3 2 4 2 4…

r sorting dataframe r-faq

asked Apr 11 '11 at 11:55

Catherine

5,345
11
30
28

387

votes

5 answers

Pandas DataFrame to List of Dictionaries

I have the following DataFrame: customer item1 item2 item3 1 apple milk tomato 2 water orange potato 3 juice mango chips which I want to translate it to list of dictionaries per…

python list dictionary pandas dataframe

asked Apr 23 '15 at 06:12

Mohamad Ibrahim

5,085
9
31
45

382

votes

10 answers

Add column to dataframe with constant value

I have an existing dataframe which I need to add an additional column to which will contain the same value for every row. Existing df: Date, Open, High, Low, Close 01-01-2015, 565, 600, 400, 450 New df: Name, Date, Open, High, Low, Close abc,…

python pandas dataframe

asked Apr 08 '15 at 14:09

darkpool

13,822
16
54
89

382

votes

5 answers

Pandas read in table without headers

Using pandas, how do I read in only a subset of the columns (say 4th and 7th columns) of a .csv file with no headers? I cannot seem to be able to do so using usecols.

python pandas dataframe csv

asked Mar 26 '15 at 19:27

user308827

21,227
87
254
417

381

votes

11 answers

How do I Pandas group-by to get sum?

I am using this dataframe: Fruit Date Name Number Apples 10/6/2016 Bob 7 Apples 10/6/2016 Bob 8 Apples 10/6/2016 Mike 9 Apples 10/7/2016 Steve 10 Apples 10/7/2016 Bob 1 Oranges 10/7/2016 Bob 2 Oranges 10/6/2016 Tom …

python pandas dataframe group-by aggregate

asked Oct 07 '16 at 17:36

Trying_hard

8,931
29
62
85

379

votes

18 answers

Detect and exclude outliers in a pandas DataFrame

I have a pandas data frame with few columns. Now I know that certain rows are outliers based on a certain column value. For instance column 'Vol' has all values around 12xx and one value is 4000 (outlier). Now I would like to exclude those rows…

python pandas filtering dataframe outliers

asked Apr 21 '14 at 14:51

AMM

17,130
24
65
77

374

votes

8 answers

Pandas Replace NaN with blank/empty string

I have a Pandas Dataframe as shown below: 1 2 3 0 a NaN read 1 b l unread 2 c NaN read I want to remove the NaN values with an empty string so that it looks like so: 1 2 3 0 a "" read 1 b l …

python pandas dataframe nan

asked Nov 10 '14 at 06:29

user1452759

8,810
15
42
58

371

votes

12 answers

How to drop columns by name in a data frame

I have a large data set and I would like to read specific columns or drop all the others. data <- read.dta("file.dta") I select the columns that I'm not interested in: var.out <- names(data)[!names(data) %in% c("iden", "name", "x_serv",…

r dataframe subset

asked Mar 08 '11 at 14:56

leroux

3,830
3
16
8

371

votes

11 answers

Convert floats to ints in Pandas?

I've been working with data imported from a CSV. Pandas changed some columns to float, so now the numbers in these columns get displayed as floating points! However, I need them to be displayed as integers or without comma. Is there a way to convert…

python pandas dataframe floating-point integer

asked Jan 22 '14 at 18:42

MJP

5,327
6
18
18

368

votes

13 answers

Opposite of %in%: exclude rows with values specified in a vector

A categorical variable V1 in a data frame D1 can have values represented by the letters from A to Z. I want to create a subset D2, which excludes some values, say, B, N and T. Basically, I want a command which is the opposite of %in% D2 = subset(D1,…

r dataframe subset

asked Apr 29 '11 at 12:06

user702432

11,898
21
55
70

363

votes

29 answers

Convert Pandas column containing NaNs to dtype `int`

I read data from a .csv file to a Pandas dataframe as below. For one of the columns, namely id, I want to specify the column type as int. The problem is the id series has missing/empty values. When I try to cast the id column to integer while…

python pandas dataframe nan dtype

asked Jan 22 '14 at 15:51

Zhubarb

11,432
18
75
114

363

votes

11 answers

How to split a dataframe string column into two columns?

I have a data frame with one (string) column and I'd like to split it into two (string) columns, with one column header as 'fips' and the other 'row' My dataframe df looks like this: row 0 00000 UNITED STATES 1 01000 ALABAMA 2 …

python dataframe pandas

asked Feb 07 '13 at 06:30

a k

3,633
3
13
5

362

votes

27 answers

Split (explode) pandas dataframe string entry to separate rows

I have a pandas dataframe in which one column of text strings contains comma-separated values. I want to split each CSV field and create a new row per entry (assume that CSV are clean and need only be split on ','). For example, a should become…

python pandas numpy dataframe

asked Oct 01 '12 at 20:42

Vincent

15,809
7
37
39
