Questions tagged [dataframe]

A data frame is a 2D tabular data structure. Usually, rows are observations and columns are variables, and the columns may be of different types (unlike an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.

The sections below cover each language that uses this term. Each one is written for readers who are familiar only with that language.

`data.frame` in R

Data frames (object class data.frame) are one of the basic tabular data structures in the R language, alongside matrices. Unlike matrices, each column can be a different data type. In terms of implementation, a data frame is a list of equal-length column vectors.

Type ?data.frame for help constructing a data frame. An example:

data.frame(
  x = letters[1:5], 
  y = 1:5, 
  z = (1:5) > 3
)
#   x y     z
# 1 a 1 FALSE
# 2 b 2 FALSE
# 3 c 3 FALSE
# 4 d 4  TRUE
# 5 e 5  TRUE

Related functions include is.data.frame, which tests whether an object is a data.frame, and as.data.frame, which coerces many other data structures to data.frame (through S3 dispatch; see ?S3). Base R data frames have been extended or modified by several R packages, including data.table and tibble, to create new data structures. For further reading, see the paragraph on data frames in the CRAN manual Intro to R.

DataFrame in Python's pandas library

The pandas library in Python is the canonical tabular data framework on the SciPy stack, and the DataFrame is its two-dimensional data object. It is essentially a rectangular array, like a 2D NumPy ndarray, but with an index on each axis that can be used for alignment. As in R, columns take priority over rows in the implementation: the DataFrame resembles a dictionary with column names as keys and Series (pandas' one-dimensional data structure) as values.

After importing numpy and pandas under the usual aliases (import numpy as np, import pandas as pd), we can construct a DataFrame in several ways, such as passing a dictionary of column names and values:

>>> pd.DataFrame({"x": list("abcde"), "y": range(1,6), "z": np.arange(1,6) > 3})
   x  y      z
0  a  1  False
1  b  2  False
2  c  3  False
3  d  4   True
4  e  5   True
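
The dictionary-like behaviour and index alignment described above can be illustrated with a minimal sketch. The Series a and b and their row labels r1 to r3 are made up for illustration; only the frame itself comes from the example above.

```python
import numpy as np
import pandas as pd

# Same frame as in the example above.
df = pd.DataFrame({"x": list("abcde"), "y": range(1, 6), "z": np.arange(1, 6) > 3})

# Dictionary-like behaviour: keys are column labels, values are Series.
column_names = list(df.keys())   # the same labels as df.columns
y = df["y"]                      # selecting one column returns a Series
has_x = "x" in df                # membership tests look at column labels

# Each column keeps its own dtype, unlike a homogeneous 2D ndarray.
dtypes = df.dtypes

# Index alignment: arithmetic pairs values by label, not by position.
# The labels r1..r3 are hypothetical.
a = pd.Series([1, 2, 3], index=["r1", "r2", "r3"])
b = pd.Series([10, 20, 30], index=["r3", "r2", "r1"])
aligned_sum = a + b              # r1: 1 + 30, r2: 2 + 20, r3: 3 + 10
```

Because a and b list their labels in opposite orders, a positional sum would give 11, 22 and 33. The label-aligned result differs, which is the practical meaning of "indices which can be used for alignment".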

DataFrame in Apache Spark

A Spark DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources, such as structured data files, tables in Hive, external databases, or existing RDDs. (source)

DataFrame in Maple

A DataFrame is one of the basic data structures in Maple. A data frame is a list of variables, known as DataSeries, which are displayed in a rectangular grid. Every column (variable) in a DataFrame has the same length, but each variable can have a different type, such as integer, float, string, name, or boolean.

When printed, data frames resemble matrices in that they are displayed as a rectangular grid, but a key difference is that the first row holds the column (variable) names and the first column holds the row (individual) names. This row and column are treated as header meta-information and are not part of the data. Moreover, the data stored in a DataFrame can be accessed using these header names, as well as by the standard numbered index. For more details, see the Guide to DataFrames in the online Maple Programming Help.

143674 questions

votes

1 answer

"IndexError: positional indexers are out-of-bounds" when they're demonstrably not

Here's an MWE of some code I'm using. I slowly whittle down an initial dataframe via slicing and some conditions until I have only the rows that I need. Each block of five rows actually represents a different object so that, as I whittle things…

python pandas dataframe conditional-statements

asked May 22 '17 at 22:28

Arnold

votes

3 answers

How to add dictionaries to a DataFrame as a row?

I have a DataFrame with following columns: columns = ['Autor', 'Preţul', 'Suprafaţa totală', 'Etaj', 'Etaje', 'Tipul casei', 'Tipul de camere','Numărul de camere','Starea apartamentului', 'Planificare', 'Tipul clădirii', 'Sectorul', 'Strada', …

python pandas dictionary dataframe

asked Mar 06 '17 at 17:59

Sinchetru

votes

2 answers

Splitting a list in a Pandas cell into multiple columns

I have a really simple Pandas dataframe where each cell contains a list. I'd like to split each element of the list into its own column. I can do that by exporting the values and then creating a new dataframe. This doesn't seem like a good way to…

python python-2.7 list pandas dataframe

asked Dec 02 '16 at 03:32

user2242044

votes

2 answers

Keeping columns in the specified order when using UseCols in Pandas Read_CSV

I have a csv file with 50 columns of data. I am using Pandas read_csv function to pull in a subset of these columns, using the usecols parameter to choose the ones I want: cols_to_use = [0,1,5,16,8] df_ret = pd.read_csv(filepath, index_col=False,…

python pandas dataframe

asked Oct 13 '16 at 14:53

AButkov

votes

3 answers

How do I turn a dataframe into a series of lists?

I have had to do this several times and I'm always frustrated. I have a dataframe: df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], ['a', 'b'], ['A', 'B', 'C', 'D']) print df A B C D a 1 2 3 4 b 5 6 7 8 I want to turn df…

python list pandas dataframe series

asked Aug 02 '16 at 06:29

Brian

votes

5 answers

pandas, multiply all the numeric values in the data frame by a constant

How to multiply all the numeric values in the data frame by a constant without having to specify column names explicitly? Example: In [13]: df = pd.DataFrame({'col1': ['A','B','C'], 'col2':[1,2,3], 'col3': [30, 10,20]}) In [14]: df Out[14]: col1…

python pandas dataframe

asked Jul 23 '16 at 15:13

CentAu

votes

1 answer

How to know if the data is a list or data.frame in R

How do I know if my data in R is a list or a data.frame? If I use typeof(x) it says list, if I use class(x) it says data.frame?

r list dataframe

asked Jul 23 '16 at 08:13

carlosmaria

votes

2 answers

Pandas.dataframe.query() - fetch not null rows (Pandas equivalent to SQL: "IS NOT NULL")

I am fetching the rows with some values from a pandas dataframe with the following code. I need to convert this code to pandas.query(). results = rs_gp[rs_gp['Col1'].notnull()] When I convert to: results = rs_gp.query('Col1!=None') It gives me the…

python pandas dataframe

asked Jun 16 '16 at 15:38

Rtut

votes

3 answers

Using lambda if condition on different columns in Pandas dataframe

I have simple dataframe: import pandas as pd frame = pd.DataFrame(np.random.randn(4, 3), columns=list('abc')) Thus for example: a b c 0 -0.813530 -1.291862 1.330320 1 -1.066475 0.624504 1.690770 2 1.330330 -0.675750 …

python pandas numpy dataframe lambda

asked May 25 '16 at 16:38

PeterL

votes

8 answers

How to create a DataFrame from a text file in Spark

I have a text file on HDFS and I want to convert it to a Data Frame in Spark. I am using the Spark Context to load the file and then try to generate individual columns from that file. val myFile = sc.textFile("file.txt") val myFile1 =…

scala apache-spark dataframe apache-spark-sql rdd

asked Apr 21 '16 at 10:06

Rahul

votes

4 answers

pandas multiple conditions based on multiple columns

I am trying to color points of a pandas dataframe depending on TWO conditions. Example: IF value of col1 > a AND value of col2 - value of col3 < b THEN value of col4 = string ELSE value of col4 = other string. I have tried so many different ways…

python pandas dataframe numpy conditional-statements

asked Apr 13 '16 at 15:25

Robert

votes

1 answer

Select rows from a DataFrame based on multiple values in a column in pandas

This is not a repetitive question, yet similar to "Select rows from a DataFrame based on values in a column in pandas". The answer in the previous link is based on only one criterion. What if I have more than one criterion? I would like to…

python pandas dataframe

asked Apr 04 '16 at 18:23

rsc05

votes

2 answers

How to create new column and insert row values while iterating through pandas data frame

I am trying to create a function that iterates through a pandas dataframe row by row. I want to create a new column based on row values of other columns. My original dataframe could look like this: df: A B 0 1 2 1 3 4 2 2 2 Now I…

python pandas iteration dataframe

asked Dec 07 '15 at 08:10

sequence_hard

votes

1 answer

Using pandas.Dataframe.groupby without alphabetical ordering

I have a dataframe that I want to alter (according to the code right below) but it puts all the 'Experiment' name values in alphabetical order. Is there a way to leave the order as it is after calling pandas.Dataframe.groupby? df =…

python pandas group-by dataframe alphabetical-sort

asked Jul 29 '15 at 00:06

anonymous

votes

4 answers

Is there a pythonic way to do a contingency table in Pandas?

Given a dataframe that looks like this: A B 2005-09-06 5 -2 2005-09-07 -1 3 2005-09-08 4 5 2005-09-09 -8 2 2005-09-10 -2 -5 2005-09-11 -7 9 2005-09-12 2 8 2005-09-13 6 -5 2005-09-14 6 -5 Is there a…

python python-2.7 pandas dataframe

asked Apr 27 '15 at 16:41

hernanavella
