1

Consider the following file (stock_prices.csv) in CSV format:

Symbol,Price
RY,96.61
NA,58.69
BNS,80.35

When using the odo function in Python to insert the CSV file into a SQLite database table, the NA ticker appears to be inserted as a None value.

from odo import odo, dshape

input_csv = 'stock_prices.csv'
output_sqlite = 'sqlite:///stocks.db::stock_prices'
ds = dshape('var * {Symbol: string, Price: float64}')

odo(input_csv, output_sqlite, dshape=ds)

Here is the code I used to query the SQLite database.

DB_PATH = 'stocks.db'
cn = sqlite3.connect(DB_PATH)
c = cn.cursor()
c.execute("SELECT * FROM stock_prices")
for row in c.fetchall():
    print(row)

Results were as follows:

('RY', 96.61)
(None, 58.69)
('BNS', 80.35)

While I could update each row whose Symbol is None with 'NA', I would rather insert the row correctly the first time.

Note: I am using odo function because for my actual project the files I will be inserting into the table are as large as several gigabytes and contain around 15-20 columns. odo seemed to me to be the quickest way for me to accomplish what I need to do in a short period of time.

bktags
  • 13
  • 3

1 Answers1

0

One way to get around this is to read the file in using pandas and specifying na_filter=False.

>>> import pandas as pd
>>> df = pd.read_csv('stock_prices.csv',na_filter=False)
>>> df
  Ticker  Price
0     RY  96.61
1     NA  58.69
2    BNS  80.35

And import:

odo(df, output_sqlite, dshape=ds)

Results:

>>> for row in c.fetchall():
...     print(row)
...
(u'RY', 96.61)
(u'NA', 58.69)
(u'BNS', 80.35)
mechanical_meat
  • 163,903
  • 24
  • 228
  • 223