
I used to access data from a CSV file in my local directory using Jupyter Notebook; now I want to access a CSV file stored in Google Cloud Storage via Datalab. This is the relevant part of the function as I used to run it:

import csv

def function1(file_name):
    new_file = open("file_name.csv", "w")  # output file for the computed values
    new_file.write("variable" + '\n')      # header row
    with open(file_name, "r") as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        for row in csv_reader:
            values_in_column1 = int(row[0])
            variable = values_in_column1 * 0.6 / 5

How can I change this function so that it works with CSV files stored in Google Cloud Storage and in Datalab?

Datalab lets me load the data of a CSV file into a single variable, but I don't want to load all the data into one variable. I want to load the values from each column into a different variable.

%%gcs read --object gs://bucket-name/file_name.csv --variable variable_name

Does anyone recommend using dictionaries or lists? Or is there an easier way to do this?
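As a rough illustration of the dictionary/list idea, the bytes loaded by the %%gcs read cell above could be parsed with the csv module. This is only a sketch; `variable_name`, `column1` and `variable` follow the names used in the question and are placeholders:

    import csv
    from io import StringIO

    # `variable_name` holds the raw bytes loaded by the %%gcs read cell above
    rows = list(csv.reader(StringIO(variable_name.decode('utf-8')), delimiter=','))

    # One list per column; here column 0 feeds the same calculation as in the question
    column1 = [int(r[0]) for r in rows]
    variable = [v * 0.6 / 5 for v in column1]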

I have also tried using storage from google.cloud, but I can't import it, even though I have updated google-cloud-storage via my terminal.

ImportErrorTraceback (most recent call last)
<ipython-input-6-943e66fe7e46> in <module>()
----> 1 from google.cloud import storage
      2 
      3 storage_client = storage.Client()
      4 bucket = storage_client.get_bucket(bucket_name)
      5 blob = bucket.blob(source_blob_name)

ImportError: cannot import name storage
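For reference, once the google-cloud-storage package is available in the kernel (for example after running `!pip install --upgrade google-cloud-storage` and restarting the kernel), the snippet from the traceback would typically continue roughly as below. `bucket_name` and `source_blob_name` are the placeholders already used in the traceback:

    from google.cloud import storage
    from io import BytesIO
    import pandas as pd

    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)   # e.g. 'bucket-name'
    blob = bucket.blob(source_blob_name)               # e.g. 'file_name.csv'

    # Download the object into memory and hand it to pandas
    df = pd.read_csv(BytesIO(blob.download_as_string()))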
Kiki

2 Answers

  1. I created a notebook instance (link).

  2. I copied a csv file to Google Cloud Storage.

        gsutil cp file.csv gs://my-bucket/
    
  3. Then using pandas:

        import pandas as pd

        df = pd.read_csv('gs://my-bucket/file.csv')

        df
        # cdatetime   address             district  beat  grid  crimedescr                     ucr_ncic_code  latitude   longitude
        # 0 1/1/06 0:00 3108 OCCIDENTAL DR  3        3C    1115  10851(A)VC TAKE VEH W/O OWNER  2404           38.550420  -121.391416
        # 1 1/1/06 0:00 2082 EXPEDITION WAY 5        5A    1512  459 PC BURGLARY RESIDENCE      2204           38.473501  -121.490186

        # You can now access the columns of the dataframe
        df['district']
        # 0      3
        # 1      5
        # 2      2
        # 3      6
        # 4      2

        df['variable'] = df['district'] * 0.6 / 5

        df['variable']
        # 0      0.36
        # 1      0.60
        # 2      0.24
        # 3      0.72
        # 4      0.24
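If the goal is also to produce an output CSV, as in the original function, the computed column can be written back the same way. This is only a sketch: the output path is a placeholder, and reading or writing gs:// paths from pandas relies on the gcsfs package being installed:

    # Write just the computed column back to Cloud Storage (placeholder path)
    df[['variable']].to_csv('gs://my-bucket/output.csv', index=False)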
    
marian.vladoi

I started off by creating a Datalab instance and establishing a connection to localhost through port 8081. I would recommend looking into this link to better understand Datalab's functionality and data-processing capabilities: https://cloud.google.com/datalab/docs/quickstart

I’ve tried this script in Datalab and it worked just fine for me. I managed to read my sample data from one of the objects in my bucket into a dataframe:

import google.datalab.storage as storage
import pandas as pd
from io import BytesIO

# Point at the bucket and the CSV object inside it
mybucket = storage.Bucket('my-test-bucket-1-2-3-4')
data_csv = mybucket.object('test1.csv')

# Read the object's contents into the `data` variable (as bytes)
uri = data_csv.uri
%gcs read --object $uri --variable data

# Parse the bytes into a dataframe
df = pd.read_csv(BytesIO(data))
df.head()
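From that dataframe, each column can then be pulled into its own variable, which is what the question asks about. The column names below are only hypothetical examples:

# Hypothetical column names, adjust to the header of test1.csv
values_in_column1 = df['column1'].tolist()
values_in_column2 = df['column2'].tolist()

variable = [v * 0.6 / 5 for v in values_in_column1]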

See also: How to read data from Google storage cloud to Google cloud datalab

I see that you are also attempting to perform row operations on your data. I would suggest you use pandas.DataFrame.apply to perform such operations. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
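For example, here is a minimal sketch of DataFrame.apply for the calculation in the question, assuming the first CSV column holds the integer values as in the original function:

# Apply the question's formula row by row; the first column is assumed to hold the integers
df['variable'] = df.apply(lambda row: int(row.iloc[0]) * 0.6 / 5, axis=1)

# The same result can also be computed as a vectorized column operation:
# df['variable'] = df[df.columns[0]].astype(int) * 0.6 / 5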

Jan L