Questions tagged [dataframe]

A data frame is a 2D tabular data structure. Usually, it contains data where rows are observations and columns are variables, which are allowed to be of different types (as distinct from an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages and libraries (R, Apache Spark, deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.

The sections below cover several of the languages that use this term; each is aimed at readers familiar only with that language.

data.frame in R

Data frames (object class data.frame) are one of the basic tabular data structures in the R language, alongside matrices. Unlike matrices, each column can be a different data type. In terms of implementation, a data frame is a list of equal-length column vectors.

Type ?data.frame for help constructing a data frame. An example:

data.frame(
  x = letters[1:5], 
  y = 1:5, 
  z = (1:5) > 3
)
#   x y     z
# 1 a 1 FALSE
# 2 b 2 FALSE
# 3 c 3 FALSE
# 4 d 4  TRUE
# 5 e 5  TRUE

Related functions include is.data.frame, which tests whether an object is a data.frame, and as.data.frame, which coerces many other data structures to data.frame (through S3 dispatch; see ?S3). Base data.frames have been extended or modified by several R packages to create new data structures. For further reading, see the section on data frames in the CRAN manual An Introduction to R.


DataFrame in Python's pandas library

The pandas library in Python is the canonical tabular data framework on the SciPy stack, and the DataFrame is its two-dimensional data object. It is essentially a rectangular array like a 2D NumPy ndarray, but with associated indices (labels) on each axis that can be used for alignment. As in R, from an implementation perspective, columns are somewhat prioritized over rows: a DataFrame resembles a dictionary with column names as keys and Series (pandas' one-dimensional data structure) as values. See the pandas documentation for the DataFrame object.

After importing numpy and pandas under the usual aliases (import numpy as np, import pandas as pd), we can construct a DataFrame in several ways, such as passing a dictionary of column names and values:

>>> pd.DataFrame({"x": list("abcde"), "y": range(1,6), "z": np.arange(1,6) > 3})
   x  y      z
0  a  1  False
1  b  2  False
2  c  3  False
3  d  4   True
4  e  5   True
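To illustrate the dictionary-like column access and the index alignment described above, here is a minimal sketch that reuses the same frame. The `bonus` Series and its labels are made up purely for illustration; the point is that pandas matches values by index label rather than by position, so labels present on only one side produce NaN.

```python
import numpy as np
import pandas as pd

# Same frame as in the example above
df = pd.DataFrame({"x": list("abcde"), "y": range(1, 6), "z": np.arange(1, 6) > 3})

# Dict-like access: indexing by a column name returns a Series
y = df["y"]
print(type(y).__name__)   # Series
print(list(df.columns))   # ['x', 'y', 'z']

# Index alignment: values are matched by index label, not by position.
# Hypothetical 'bonus' has labels 4, 2, 0 (in that order); labels 1 and 3 are absent.
bonus = pd.Series([100, 200, 300], index=[4, 2, 0])
total = df["y"] + bonus   # labels missing from either side become NaN
print(total)
```

Reordering the labels in `bonus` would not change `total`, since matching is done by label.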

DataFrame in Apache Spark

A Spark DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases, or existing RDDs. (source)


DataFrame in Maple

A DataFrame is one of the basic data structures in Maple. A data frame is a list of variables, known as DataSeries, displayed in a rectangular grid. Every column (variable) in a DataFrame has the same length; however, each variable can have a different type, such as integer, float, string, name, boolean, etc.

When printed, data frames resemble matrices in that they are viewed as a rectangular grid, but a key difference is that the first row corresponds to the column (variable) names, and the first column corresponds to the row (individual) names. This row and column are treated as header meta-information and are not part of the data. Moreover, the data stored in a DataFrame can be accessed using these header names as well as by the standard numbered index. For more details, see the Guide to DataFrames in the online Maple Programming Help.

143674 questions
24 votes, 3 answers

How to reverse a 2-dimensional table (DataFrame) into a 1 dimensional list using Pandas?

I am looking in Python/Pandas for a tip that reverses a 2-dimensional table into a 1-dimensional list. I usually leverage an Excel function to do it, but I believe that there is a smart Python way to do it. Step More details of the Excel…
Ning Chen
24 votes, 1 answer

Difference between `names(df[1]) <- ` and `names(df)[1] <- `

Consider the following: df <- data.frame(a = 1, b = 2, c = 3) names(df[1]) <- "d" ## First method ## a b c ##1 1 2 3 names(df)[1] <- "d" ## Second method ## d b c ##1 1 2 3 Both methods didn't return an error, but the first didn't change the…
David Arenburg
24 votes, 2 answers

Duplicate a column in data frame and rename it to another column name

I have a data frame like the sample below. I would like to duplicate a column in the data frame and rename it to another column name. Name Age Rate Aira 23 90 Ben 32 98 Cat 27 95 Desired output is: Name Age Rate …
Ianthe
24 votes, 2 answers

Get count of values across columns-Pandas DataFrame

I have a Pandas DataFrame like the following: A B C 0 192.168.2.85 192.168.2.85 124.43.113.22 1 192.248.8.183 192.248.8.183 192.168.2.85 2 192.168.2.161 NaN 192.248.8.183 3 66.249.74.52 …
Nilani Algiriyage
24 votes, 3 answers

Python Pandas - Date Column to Column index

I have a table of data imported from a CSV file into a DataFrame. The data contains around 10 categorical fields, 1 month column (in date time format) and the rest are data series. How do I convert the date column into an index across the…
MrHopko
24 votes, 7 answers

How to split a number into digits in R

I have a data frame with a numerical ID variable which identifies the Primary, Secondary and Ultimate Sampling Units from a multistage sampling scheme. I want to split the original ID variable into three new variables, identifying the different…
jrs-x
24 votes, 4 answers

Delete a column in a data frame within a list

I made a list out of my dataframe, based on the factor levels in column A. In the list I would like to remove that column. My head is saying lapply, but not anything else :P $A ID Test A 1 A 1 $B ID Test B 1 B 3 B 5 Into this …
ego_
23 votes, 5 answers

Strategies for formatting JSON output from R

I'm trying to figure out the best way of producing a JSON file from R. I have the following dataframe tmp in R. > tmp gender age welcoming proud tidy unique 1 1 30 4 4 4 4 2 2 34 4 2 4 4 3 1…
djq
23 votes, 6 answers

Replace all NA with FALSE in selected columns in R

I have a question similar to this one, but my dataset is a bit bigger: 50 columns with 1 column as UID and other columns carrying either TRUE or NA, I want to change all the NA to FALSE, but I don't want to use explicit loop. Can plyr do the trick?…
lokheart
23 votes, 3 answers

Appending row to dataframe with concat()

I have defined an empty data frame with df = pd.DataFrame(columns=['Name', 'Weight', 'Sample']) and want to append rows in a for loop like this: for key in my_dict: ... row = {'Name':key, 'Weight':wg, 'Sample':sm} df = pd.concat(row,…
mahmood
23 votes, 7 answers

What is the best way to transpose a data.frame in R and to set one of the columns to be the header for the new transposed table?

What is the best way to transpose a data.frame in R and to set one of the columns to be the header for the new transposed table? I have coded up a way to do this below. As I am still new to R, I would like suggestions to improve my code as well as…
themartinmcfly
23 votes, 3 answers

How can I Export Pandas DataFrame to Google Sheets using Python?

I managed to read data from a Google Sheet file using this method: # ACCES GOOGLE SHEET googleSheetId = 'myGoogleSheetId' workSheetName = 'mySheetName' URL = 'https://docs.google.com/spreadsheets/d/{0}/gviz/tq?tqx=out:csv&sheet={1}'.format( …
23 votes, 2 answers

Check if Pandas DataFrame cell contains certain string

Suppose I have the following Pandas DataFrame: a b 0 NAN BABA UN EQUITY 1 NAN 2018 2 NAN 2017 3 NAN 2016 4 NAN NAN 5 NAN 700 HK EQUITY 6 NAN 2018 7 NAN …
turtle101
23 votes, 5 answers

Assign a Dictionary Value to a DataFrame Column Based on Dictionary Key

I'm looking to map the value in a dict to one column in a DataFrame where the key in the dict is equal to a second column in that DataFrame For example: If my dict is: dict = {'abc':'1/2/2003', 'def':'1/5/2017', 'ghi':'4/10/2013'} and my DataFrame…
Windstorm1981
23 votes, 2 answers

How to add numpy matrix as new columns for pandas dataframe?

I have a NxM dataframe and a NxL numpy matrix. I'd like to add the matrix to the dataframe to create L new columns by simply appending the columns and rows in the same order they appear. I tried merge() and join(), but I end up with errors: assign()…
Booley