
Are Rows and Columns treated essentially the same as a data object? For example, in the following:

import pandas as pd

df = pd.DataFrame([
    {"Title": "Titanic",    "ReleaseYear": 1997, "Director": "James Cameron"},
    {"Title": "Spider-Man", "ReleaseYear": 2002, "Director": "Sam Raimi"},
])

# Select a single column by name
title_column = df['Title']
print(title_column)
print(type(title_column))

# Select a single row by index label
row_one = df.loc[0]
print(row_one)
print(type(row_one))

They both return a Series, where the column Series is indexed by the row labels (0, 1) and the row Series is indexed by the column names:

0       Titanic
1    Spider-Man
Name: Title, dtype: object
<class 'pandas.core.series.Series'>

Title                Titanic
ReleaseYear             1997
Director       James Cameron
Name: 0, dtype: object
<class 'pandas.core.series.Series'>

And then, as soon as more than one column or row is selected, it becomes a DataFrame. Are the Row and Column basically the same type of object, or what are the differences between them, in how they're used?

David542
  • A [Series](https://pandas.pydata.org/pandas-docs/stable/reference/series.html) is a _One-dimensional ndarray with axis labels (including time series)_, whether it comes from a single column or a single row. A [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) is a _Two-dimensional, size-mutable, potentially heterogeneous tabular data._ – Trenton McKinney Aug 16 '20 at 05:13
  • `Are the Row and Column basically the same type of object` - I think yes. For the second sub-question, could you clarify what you mean? – jezrael Aug 16 '20 at 05:13

1 Answer


The docs describe a DataFrame as follows:

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects

And a Series:

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).
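The "dict of Series" description can be seen directly. The following is only a small sketch: it rebuilds part of the question's data column by column, and the loop is just one way to look at it. Each column comes back as a Series with its own dtype.

```python
import pandas as pd

# Build a DataFrame explicitly as a dict of Series (one Series per column).
df = pd.DataFrame({
    "Title": pd.Series(["Titanic", "Spider-Man"]),
    "ReleaseYear": pd.Series([1997, 2002]),
})

# items() yields (column name, Series) pairs,
# and each column Series carries its own dtype.
for name, column in df.items():
    print(name, type(column).__name__, column.dtype)
```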

So if you select by one index label or one column name (and that label or name is not duplicated), you always get a Series.
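The caveat about duplicates can be illustrated with a small sketch. The frames below are made up for the demonstration (they are not the question's data): one has a repeated index label, the other a repeated column name.

```python
import pandas as pd

# Hypothetical frame whose index label 1 appears twice.
df = pd.DataFrame(
    {"Title": ["Titanic", "Spider-Man", "Avatar"],
     "ReleaseYear": [1997, 2002, 2009]},
    index=[0, 1, 1],
)

single = df.loc[0]   # unique label: a Series
dupes = df.loc[1]    # duplicated label: a DataFrame with two rows

print(type(single).__name__)
print(type(dupes).__name__)

# The same happens with a duplicated column name.
wide = pd.DataFrame([[1, 2]], columns=["A", "A"])
print(type(wide["A"]).__name__)
```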

I think there are not many differences between the two kinds of Series (row or column). The obvious one comes from a general DataFrame having columns (Series) of different types, as here: the ReleaseYear column holds integers, while the other two columns hold strings.

So the difference shows up when you check Series.dtype. Within a column all values share one type, so the dtype is object for the string columns and int64 for ReleaseYear. A Series taken from a row holds mixed types: the first value is a string, the second an integer and the third a string, so its dtype ends up as object. You can check this element by element with .apply(type), as in the output at the end of this answer.

Note:

If all columns have the same type, there is no such difference.
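A quick way to see this, assuming a hypothetical all-float frame called nums (not part of the question's data): the row Series keeps the common dtype instead of falling back to object.

```python
import pandas as pd

# Hypothetical frame in which every column has the same dtype.
nums = pd.DataFrame({"a": [1.5, 2.5], "b": [3.0, 4.0]})

column = nums["a"]
row = nums.loc[0]

print(column.dtype)
print(row.dtype)
print(column.dtype == row.dtype)  # the row and column dtypes match
```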

Note 2:

Of course, it is also possible to create a column filled with mixed data. A Series created from such a column then has object dtype too, just like a Series created from a row.
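As a rough sketch of that case, with a made-up column named Mixed that holds a string, an integer and a float:

```python
import pandas as pd

# Hypothetical column holding values of three different Python types.
mixed_df = pd.DataFrame({"Mixed": ["Titanic", 1997, 3.5]})
mixed_column = mixed_df["Mixed"]

print(mixed_column.dtype)        # object, as for a row of mixed values
print(mixed_column.apply(type))  # per-element types
```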

year_column = df['ReleaseYear']
print(year_column)
0    1997
1    2002
Name: ReleaseYear, dtype: int64

print (type(year_column))
<class 'pandas.core.series.Series'>

print (year_column.dtype)
int64

print (year_column.apply(type))
0    <class 'int'>
1    <class 'int'>
Name: ReleaseYear, dtype: object

row_one = df.loc[0]
print(row_one)
Title                Titanic
ReleaseYear             1997
Director       James Cameron
Name: 0, dtype: object

print (type(row_one))
<class 'pandas.core.series.Series'>

print (row_one.dtype)
object

print (row_one.apply(type))
Title                  <class 'str'>
ReleaseYear    <class 'numpy.int64'>
Director               <class 'str'>
Name: 0, dtype: object
jezrael