
I'm attempting to convert a DataFrame into a Series using code which, simplified, looks like this:

import pandas as pd

dates = ['2016-1-{}'.format(i) for i in range(1, 21)]
values = list(range(20))
data = {'Date': dates, 'Value': values}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
ts = pd.Series(df['Value'], index=df['Date'])
print(ts)

However, the printed output looks like this:

Date
2016-01-01   NaN
2016-01-02   NaN
2016-01-03   NaN
2016-01-04   NaN
2016-01-05   NaN
2016-01-06   NaN
2016-01-07   NaN
2016-01-08   NaN
2016-01-09   NaN
2016-01-10   NaN
2016-01-11   NaN
2016-01-12   NaN
2016-01-13   NaN
2016-01-14   NaN
2016-01-15   NaN
2016-01-16   NaN
2016-01-17   NaN
2016-01-18   NaN
2016-01-19   NaN
2016-01-20   NaN
Name: Value, dtype: float64

Where does the NaN come from? Is a view on a DataFrame object not a valid input for the Series class?

I have found the to_series function for pd.Index objects. Is there something similar for DataFrames?

deepbrook
  • Are you starting with the dataframe or is it just an intermediate step? – k-nut Mar 05 '16 at 19:43
  • Starting with a DataFrame, which is why I didn't put it in a Series straight away; the data is loaded from a CSV with multiple columns. – deepbrook Mar 05 '16 at 19:45

3 Answers


I think you can use df['Value'].values, which converts the column Value to a NumPy array and so drops its integer index:

ts = pd.Series(df['Value'].values, index=df['Date'])

import pandas as pd
import numpy as np
import io

dates = ['2016-1-{}'.format(i) for i in range(1, 21)]
values = list(range(20))
data = {'Date': dates, 'Value': values}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
print(df['Value'].values)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]

ts = pd.Series(df['Value'].values, index=df['Date'])
print(ts)
Date
2016-01-01     0
2016-01-02     1
2016-01-03     2
2016-01-04     3
2016-01-05     4
2016-01-06     5
2016-01-07     6
2016-01-08     7
2016-01-09     8
2016-01-10     9
2016-01-11    10
2016-01-12    11
2016-01-13    12
2016-01-14    13
2016-01-15    14
2016-01-16    15
2016-01-17    16
2016-01-18    17
2016-01-19    18
2016-01-20    19
dtype: int64

Or you can use:

ts1 = pd.Series(data=values, index=pd.to_datetime(dates))
print(ts1)
2016-01-01     0
2016-01-02     1
2016-01-03     2
2016-01-04     3
2016-01-05     4
2016-01-06     5
2016-01-07     6
2016-01-08     7
2016-01-09     8
2016-01-10     9
2016-01-11    10
2016-01-12    11
2016-01-13    12
2016-01-14    13
2016-01-15    14
2016-01-16    15
2016-01-17    16
2016-01-18    17
2016-01-19    18
2016-01-20    19
dtype: int64

Thank you @ajcr for a better explanation of why you get NaN:

When you give a Series or DataFrame column to pd.Series, it will reindex it using the index you specify. Since your DataFrame column has an integer index (not a date index), none of the dates match an existing label and you get lots of missing values.
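The reindexing described above can be seen on a smaller frame. This is a minimal sketch, assuming a three-row DataFrame invented for illustration; the names aligned and positional are hypothetical, and .to_numpy() is used as the current equivalent of .values.

```python
import pandas as pd

# Small illustrative frame (not the data from the question).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03"]),
    "Value": [0, 1, 2],
})

# df["Value"] carries the default integer index 0, 1, 2.
# Passing it together with a new index makes pandas look up each date
# label in that integer index; none is found, so every entry is NaN.
aligned = pd.Series(df["Value"], index=df["Date"])

# Converting to a NumPy array drops the index, so the values are
# assigned to the dates by position instead of by label.
positional = pd.Series(df["Value"].to_numpy(), index=df["Date"], name="Value")

print(aligned.isna().all())   # every value is missing
print(positional.tolist())    # the original values survive
```

The only difference between the two calls is whether the data still has an index of its own when it reaches the constructor.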

jezrael
  • And so I can! Cheers, mate! – deepbrook Mar 05 '16 at 19:42
  • Out of curiosity, what is it converting to a list from? I always thought I could treat the value returned by `df['Date']` as an iterable, comparable to a list. Is that not the case? – deepbrook Mar 05 '16 at 19:44
  • @j4ck: When you give a Series or DataFrame column to `pd.Series`, it will *reindex* it using the index you specify. Since your DataFrame column has an integer index (not a date index), you get lots of missing values. – Alex Riley Mar 05 '16 at 19:47
  • @ajcr - Thank you for the explanation. – jezrael Mar 05 '16 at 19:49

You can just do:

s = df.set_index('Date')

This is now a one-column DataFrame indexed by Date.

If you really want it as a Series:

s = df.set_index('Date').Value
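As a rough sketch of the set_index route, assuming a small three-row frame made up for illustration: selecting the column with brackets and calling DataFrame.squeeze on the single-column frame should both give a date-indexed Series. The variable names here are hypothetical.

```python
import pandas as pd

# Small illustrative frame (not the data from the question).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03"]),
    "Value": [0, 1, 2],
})

indexed = df.set_index("Date")            # one-column DataFrame, DatetimeIndex
ts = indexed["Value"]                     # column selection returns a Series
ts_squeezed = indexed.squeeze("columns")  # collapses the single column to a Series

print(type(ts).__name__)
print(ts.equals(ts_squeezed))
```

Bracket selection is used instead of attribute access (.Value) because it also works when a column name clashes with a DataFrame method or is not a valid identifier.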

By the way, NaN is NumPy's Not-a-Number, which pandas uses to mark missing values.

Using your method, you could use:

ts = pd.Series(df['Value'].values, name='Value', index=df['Date'])

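The reason you are getting NaNs is that you are passing a Series, which carries its own integer index, as the data for a new Series. pandas aligns it to the new date index by label instead of assigning the values by position, and no labels match.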

Alexander

If you are only looking to create a Series with those values, you could also have done:

pd.Series(list(range(20)), index=pd.date_range('2016-01-01', periods=20, freq='D'))
k-nut