Questions tagged [dataframe]

A data frame is a 2D tabular data structure. Usually, it contains data where rows are observations and columns are variables, and the columns are allowed to be of different types (as distinct from an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, Deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.

Each section below covers one language that uses this term and is written for readers familiar only with that language.

`data.frame` in R

Data frames (object class data.frame) are one of the basic tabular data structures in the R language, alongside matrices. Unlike matrices, each column can be a different data type. In terms of implementation, a data frame is a list of equal-length column vectors.

Type ?data.frame for help constructing a data frame. An example:

data.frame(
  x = letters[1:5], 
  y = 1:5, 
  z = (1:5) > 3
)
#   x y     z
# 1 a 1 FALSE
# 2 b 2 FALSE
# 3 c 3 FALSE
# 4 d 4  TRUE
# 5 e 5  TRUE

Related functions include is.data.frame, which tests whether an object is a data.frame, and as.data.frame, which coerces many other data structures to data.frame (through S3 dispatch; see ?S3). Base R data frames have been extended or modified to create new data structures by several R packages, including data.table and tibble. For further reading, see the section on data frames in the CRAN manual "An Introduction to R".

DataFrame in Python's pandas library

The pandas library in Python is the canonical tabular data framework on the SciPy stack, and the DataFrame is its two-dimensional data object. It is basically a rectangular array like a 2D NumPy ndarray, but with associated indices on each axis which can be used for alignment, and with columns that may each hold a different data type. As in R, from an implementation perspective, columns are somewhat prioritized over rows: the DataFrame resembles a dictionary with column names as keys and Series (pandas' one-dimensional data structure) as values.

After importing numpy and pandas under the usual aliases (import numpy as np, import pandas as pd), we can construct a DataFrame in several ways, such as passing a dictionary of column names and values:

>>> pd.DataFrame({"x": list("abcde"), "y": range(1,6), "z": np.arange(1,6) > 3})
   x  y      z
0  a  1  False
1  b  2  False
2  c  3  False
3  d  4   True
4  e  5   True
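
As a minimal sketch of the two properties described above (dictionary-like column access and alignment on the index), the following reuses the same example frame. The `bonus` Series and the `y_plus_bonus` column name are hypothetical and introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"x": list("abcde"), "y": range(1, 6), "z": np.arange(1, 6) > 3}
)

# Dictionary-like behaviour: indexing by column name returns a Series,
# and keys() returns the column labels.
y = df["y"]
print(type(y).__name__)  # Series
print(list(df.keys()))   # ['x', 'y', 'z']

# Index alignment: values are matched by row label, not by position.
# `bonus` (hypothetical) only has entries for row labels 4 and 0.
bonus = pd.Series([10, 100], index=[4, 0])
df["y_plus_bonus"] = df["y"] + bonus

# Rows 0 and 4 receive a value; rows without a matching label become NaN.
print(df[["y", "y_plus_bonus"]])
```

Because the addition is aligned on labels, the order of the entries in `bonus` does not matter, and rows with no matching label are filled with NaN rather than raising an error.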

DataFrame in Apache Spark

A Spark DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases, or existing RDDs. (source)

DataFrame in Maple

A DataFrame is one of the basic data structures in Maple. A data frame is a list of variables, known as DataSeries, which are displayed in a rectangular grid. Every column (variable) in a DataFrame has the same length; however, each variable can have a different type, such as integer, float, string, name, boolean, etc.

When printed, data frames resemble matrices in that they are viewed as a rectangular grid, but a key difference is that the first row corresponds to the column (variable) names, and the first column corresponds to the row (individual) names. This header row and header column are treated as meta-information and are not part of the data. Moreover, the data stored in a DataFrame can be accessed using these header names, as well as by the standard numbered index. For more details, see the Guide to DataFrames in the online Maple Programming Help.

143674 questions

784 votes, 32 answers

How do I count the NaN values in a column in pandas DataFrame?

I want to find the number of NaN in each column of my data.

python pandas dataframe

asked Oct 08 '14 at 21:00

user3799307

778 votes, 11 answers

Get statistics for each group (such as count, mean, etc) using pandas GroupBy?

I have a dataframe df and I use several columns from it to groupby: df['col1','col2','col3','col4'].groupby(['col1','col2']).mean() In the above way, I almost get the table (dataframe) that I need. What is missing is an additional column that…

python pandas dataframe group-by statistics

asked Oct 15 '13 at 15:00

Roman

772 votes, 24 answers

Set value for particular cell in pandas DataFrame using index

I have created a Pandas DataFrame df = DataFrame(index=['A','B','C'], columns=['x','y']) and have got this x y A NaN NaN B NaN NaN C NaN NaN Now, I would like to assign a value to particular cell, for example to row C and column x. I…

python pandas dataframe cell nan

asked Dec 12 '12 at 14:40

Mitkp

766 votes, 23 answers

Constructing pandas DataFrame from values in variables gives "ValueError: If using all scalar values, you must pass an index"

This may be a simple question, but I can not figure out how to do this. Lets say that I have two variables as follows. a = 2 b = 3 I want to construct a DataFrame from this: df2 = pd.DataFrame({'A':a,'B':b}) This generates an error: ValueError:…

python pandas dataframe scalar

asked Jul 24 '13 at 16:40

Nilani Algiriyage

751 votes, 20 answers

Import multiple CSV files into pandas and concatenate into one DataFrame

I would like to read several CSV files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out though. Here is what I have so far: import glob import pandas as pd # Get data file names path =…

python pandas csv dataframe concatenation

asked Jan 03 '14 at 15:00

jonas

720 votes, 15 answers

How to apply a function to two columns of Pandas dataframe

Suppose I have a df which has columns of 'ID', 'col_1', 'col_2'. And I define a function : f = lambda x, y : my_function_expression. Now I want to apply the f to df's two columns 'col_1', 'col_2' to element-wise calculate a new column 'col_3' ,…

python pandas dataframe

asked Nov 11 '12 at 13:48

bigbug

697 votes, 11 answers

Difference between map, applymap and apply methods in Pandas

Can you tell me when to use these vectorization methods with basic examples? I see that map is a Series method whereas the rest are DataFrame methods. I got confused about apply and applymap methods though. Why do we have two methods for applying a…

python pandas dataframe vectorization

asked Nov 05 '13 at 20:20

marillion

697 votes, 19 answers

How can I get a value from a cell of a dataframe?

I have constructed a condition that extracts exactly one row from my dataframe: d2 = df[(df['l_ext']==l_ext) & (df['item']==item) & (df['wn']==wn) & (df['wd']==1)] Now I would like to take a value from a particular column: val = d2['col_name'] But…

python pandas dataframe

asked May 24 '13 at 07:17

Roman

693 votes, 28 answers

How to check if any value is NaN in a Pandas DataFrame

In Python Pandas, what's the best way to check whether a DataFrame has one (or more) NaN values? I know about the function pd.isnan, but this returns a DataFrame of booleans for each element. This post right here doesn't exactly answer my question…

python pandas dataframe nan

asked Apr 09 '15 at 05:09

hlin117

691 votes, 16 answers

Convert pandas dataframe to NumPy array

How do I convert a pandas dataframe into a NumPy array? DataFrame: import numpy as np import pandas as pd index = [1, 2, 3, 4, 5, 6, 7] a = [np.nan, np.nan, np.nan, 0.1, 0.1, 0.1, 0.1] b = [0.2, np.nan, 0.2, 0.2, 0.2, np.nan, np.nan] c = [np.nan,…

python arrays pandas numpy dataframe

asked Nov 02 '12 at 00:57

Mister Nobody

683 votes, 25 answers

UnicodeDecodeError when reading CSV file in Pandas

I'm running a program which is processing 30,000 similar files. A random number of them are stopping and producing this error... File "C:\Importer\src\dfman\importer.py", line 26, in import_chr data = pd.read_csv(filepath, names=fields) File…

python pandas csv dataframe unicode

asked Aug 11 '13 at 12:06

TravisVOX

673 votes, 12 answers

Converting a Pandas GroupBy output from Series to DataFrame

I'm starting with input data like this df1 = pandas.DataFrame( { "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"] } ) Which when printed…

python pandas dataframe pandas-groupby multi-index

asked Apr 29 '12 at 16:10

saveenr

665 votes, 11 answers

The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe

R provides two different methods for accessing the elements of a list or data.frame: [] and [[]]. What is the difference between the two, and when should I use one over the other?

r list dataframe extract r-faq

asked Jul 23 '09 at 03:33

Sharpie

637 votes, 5 answers

How to check whether a pandas DataFrame is empty?

How to check whether a pandas DataFrame is empty? In my case I want to print some message in terminal if the DataFrame is empty.

python pandas dataframe

asked Nov 07 '13 at 05:45

Nilani Algiriyage

634 votes, 26 answers

Convert a list to a data frame

I have a nested list of data. Its length is 132 and each item is a list of length 20. Is there a quick way to convert this structure into a data frame that has 132 rows and 20 columns of data? Here is some sample data to work with: l <- replicate( …

r list dataframe

asked Nov 19 '10 at 16:40

Btibert3
