Questions tagged [dataframe]

A data frame is a 2D tabular data structure. Usually, rows are observations and columns are variables, and the columns may be of different types (unlike an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages and libraries (R, Apache Spark, Deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.

The sections below cover each language that uses this term. Each is written for readers who are familiar only with that language.

`data.frame` in R

Data frames (object class data.frame) are one of the basic tabular data structures in the R language, alongside matrices. Unlike matrices, each column can be a different data type. In terms of implementation, a data frame is a list of equal-length column vectors.

Type ?data.frame for help constructing a data frame. An example:

data.frame(
  x = letters[1:5], 
  y = 1:5, 
  z = (1:5) > 3
)
#   x y     z
# 1 a 1 FALSE
# 2 b 2 FALSE
# 3 c 3 FALSE
# 4 d 4  TRUE
# 5 e 5  TRUE

Related functions include is.data.frame, which tests whether an object is a data.frame, and as.data.frame, which coerces many other data structures to data.frame (through S3 dispatch; see ?S3). Base R data frames have been extended or modified to create new data structures by several R packages, including data.table and tibble. For further reading, see the section on data frames in the CRAN manual An Introduction to R.

DataFrame in Python's pandas library

The pandas library in Python is the canonical tabular data framework on the SciPy stack, and the DataFrame is its two-dimensional data object. It is essentially a rectangular array like a 2D numpy ndarray, but with associated indices on each axis that can be used for alignment. As in R, from an implementation perspective, columns are somewhat prioritized over rows: the DataFrame resembles a dictionary with column names as keys and Series (pandas' one-dimensional data structure) as values.

After importing numpy and pandas under the usual aliases (import numpy as np, import pandas as pd), we can construct a DataFrame in several ways, such as passing a dictionary of column names and values:

>>> pd.DataFrame({"x": list("abcde"), "y": range(1,6), "z": np.arange(1,6) > 3})
   x  y      z
0  a  1  False
1  b  2  False
2  c  3  False
3  d  4   True
4  e  5   True
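
The dictionary-like behaviour and index alignment described above can be sketched as follows. This is a minimal illustration rather than a complete tour of the API; the names `df`, `shifted` and `aligned` are chosen here for the example and do not come from the pandas documentation.

```python
import numpy as np
import pandas as pd

# Same frame as in the example above.
df = pd.DataFrame({"x": list("abcde"), "y": range(1, 6), "z": np.arange(1, 6) > 3})

# Dictionary-like access: the keys are column names, the values are Series.
column_names = list(df.keys())   # ['x', 'y', 'z']
y = df["y"]                      # a pandas Series
has_y = "y" in df                # membership tests check column names

# Alignment: arithmetic matches values by index label, not by position.
# Labels present on only one side produce NaN in the result.
shifted = pd.Series([10, 20], index=[3, 4])
aligned = df["y"] + shifted      # NaN, NaN, NaN, 14.0, 25.0
```

Only the rows labelled 3 and 4 exist in both operands, so only those two positions receive a numeric result.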

DataFrame in Apache Spark

A Spark DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases, or existing RDDs. (source)

DataFrame in Maple

A DataFrame is one of the basic data structures in Maple. A data frame is a list of variables, known as DataSeries, which are displayed in a rectangular grid. Every column (variable) in a DataFrame has the same length; however, each variable can have a different type, such as integer, float, string, name, or boolean.

When printed, data frames resemble matrices in that they are viewed as a rectangular grid, but a key difference is that the first row corresponds to the column (variable) names, and the first column corresponds to the row (individual) names. This row and column are treated as header meta-information and are not part of the data. Moreover, the data stored in a DataFrame can be accessed using these header names, as well as by the standard numbered index. For more details, see the Guide to DataFrames in the online Maple Programming Help.

143674 questions

5 answers

Removing rows from R data frame

I have the following data frame: > str(df) 'data.frame': 3149 obs. of 9 variables: $ mkod : int 5029 5035 5036 5042 5048 5050 5065 5071 5072 5075 ... $ mad : Factor w/ 65 levels "Akgün Kasetçilik ",..: 58 29 59 40 56 11 33 34 19 20…

r dataframe rows

asked Oct 27 '11 at 11:24

Mehper C. Palavuzlar

9 answers

How to treat '<attribute 'dtype' of 'numpy.generic' objects>' error?

After installing pypfopt and u-numpy, dataframe.info() command shows this error. TypeError: Cannot interpret '<attribute 'dtype' of 'numpy.generic' objects>' as a data type

python-3.x pandas dataframe typeerror

asked Mar 10 '21 at 11:07

A V

3 answers

Is there an efficient method of checking whether a column has mixed dtypes?

Consider np.random.seed(0) s1 = pd.Series([1, 2, 'a', 'b', [1, 2, 3]]) s2 = np.random.randn(len(s1)) s3 = np.random.choice(list('abcd'), len(s1)) df = pd.DataFrame({'A': s1, 'B': s2, 'C': s3}) df A B C 0 1 1.764052 …

python pandas numpy dataframe typechecking

asked Dec 12 '18 at 15:11

cs95

4 answers

Pandas: how to merge two dataframes on a column by keeping the information of the first one?

I have two dataframes df1 and df2. df1 contains the information of the age of people, while df2 contains the information of the sex of people. Not all the people are in df1 nor in df2 df1 Name Age 0 Tom 34 1 Sara 18 2 Eva …

python pandas dataframe

asked Oct 26 '18 at 13:59

emax

3 answers

Python - Delete duplicates in a dataframe based on two columns combinations?

I have a dataframe with 3 columns in Python: Name1 Name2 Value Juan Ale 1 Ale Juan 1 and would like to eliminate the duplicates based on columns Name1 and Name2 combinations. In my example both rows are equal (but they are in different…

python pandas sorting dataframe

asked Jul 05 '18 at 01:10

Juan

1 answer

pandas dataframe Shape of passed values is (1, 4), indices imply (4, 4)

I am trying to create a pandas dataframe with one row using and ended up testing the following simple line of code: df = pd.DataFrame([1,2,3,4], columns=['a', 'b', 'v', 'w']) Although this seems very simple i get the following error Shape of passed…

pandas dataframe shapes

asked Jun 15 '18 at 10:52

saias

5 answers

Python Dataframes: Describing a single column

Is there a way I can apply df.describe() to just an isolated column in a DataFrame. For example if I have several columns and I use df.describe() - it returns and describes all the columns. From research, I understand I can add the following: "A…

python dataframe describe

asked May 04 '18 at 01:38

Gitliong

4 answers

Convert list of arrays to pandas dataframe

I have a list of numpy arrays that I'm trying to convert to DataFrame. Each array should be a row of the dataframe. Using pd.DataFrame() isn't working. It always gives the error: ValueError: Must pass 2-d input. Is there a better way to do…

python python-3.x pandas numpy dataframe

asked Mar 28 '18 at 17:40

Marcos Santana

4 answers

Transpose DataFrame Without Aggregation in Spark with scala

I looked at a number of different solutions online, but could not find what I am trying to achieve. Please help me on this. I am using Apache Spark 2.1.0 with Scala. Below is my dataframe: +-----------+-------+ |COLUMN_NAME| VALUE…

scala apache-spark dataframe transpose

asked Mar 20 '18 at 19:30

Maruti K

3 answers

Pandas DataFrame.groupby() to dictionary with multiple columns for value

type(Table) pandas.core.frame.DataFrame Table ======= ======= ======= Column1 Column2 Column3 0 23 1 1 5 2 1 2 3 1 19 5 2 56 1 2 22 2 3 2 4 3 14 5 4 59…

python pandas dictionary dataframe jupyter

asked Feb 27 '18 at 20:13

Micks Ketches

2 answers

What does offset mean in a pandas rolling window?

The rolling window function pandas.DataFrame.rolling takes a window argument that is described as follows: window : int, or offset Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be…

python python-3.x pandas dataframe datetime

asked Feb 18 '18 at 18:57

ascripter

3 answers

multiple if else conditions in pandas dataframe and derive multiple columns

I have a dataframe like below. import pandas as pd import numpy as np raw_data = {'student':['A','B','C','D','E'], 'score': [100, 96, 80, 105,156], 'height': [7, 4,9,5,3], 'trigger1' : [84,95,15,78,16], 'trigger2' :…

python pandas if-statement dataframe

asked Feb 01 '18 at 18:08

Kumar AK

5 answers

How to drop multiple column names given in a list from Spark DataFrame?

I have a dynamic list which is created based on value of n. n = 3 drop_lst = ['a' + str(i) for i in range(n)] df.drop(drop_lst) But the above is not working. Note: My use case requires a dynamic list. If I just do the below without list it…

dataframe apache-spark pyspark apache-spark-sql

asked Dec 15 '17 at 10:58

GeorgeOfTheRF

5 answers

How to merge/combine columns in pandas?

I have a (example-) dataframe with 4 columns: data = {'A': ['a', 'b', 'c', 'd', 'e', 'f'], 'B': [42, 52, np.nan, np.nan, np.nan, np.nan], 'C': [np.nan, np.nan, 31, 2, np.nan, np.nan], 'D': [np.nan, np.nan, np.nan, np.nan, 62, 70]} df =…

python pandas dataframe merge multiple-columns

asked Oct 04 '17 at 11:35

mati

6 answers

Returning a dataframe in python function

I am trying to create and return a data frame from a Python function def create_df(): data = {'state': ['Ohio','Ohio','Ohio','Nevada','Nevada'], 'year': [2000,2001,2002,2001,2002], 'pop': [1.5,1.7,3.6,2.4,2.9]} df =…

python-3.x function dataframe return

asked Aug 08 '17 at 23:36

Manoj Agrawal
