I am trying to split a DataFrame into groups and write each group to its own CSV file, like below:

from io import StringIO
import pandas as pd

data = """
A,B,C
87jg,28,3012
h372,28,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data), sep=',')

for key, group in df.groupby(['C','B']):
    group.to_csv(f'df_{key}.csv', index=False)

This exports each group produced by the groupby to a CSV file on the local machine. Is there a way to do the same thing but upload the split CSVs to S3 instead (with something like boto3's put_object)?

Marcin
omdurg

2 Answers

You can use s3fs, which has to be installed separately, for example with pip:

pip install s3fs

Verified example based on your code:

from io import StringIO
import pandas as pd
import s3fs

# I did not use my default AWS profile, so I had to
# provide the key and secret. If you use the default
# AWS profile, `key` and `secret` should not be required.
fs = s3fs.S3FileSystem(
        anon=False,
        key='<access_key>',
        secret='<secret_key>')

data = """
A,B,C
87jg,28,3012
h372,28,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data), sep=',')

for key, group in df.groupby(['C','B']):
    # key is a (C, B) tuple; the with-block closes the handle so the upload is flushed
    with fs.open(f's3://<bucket-name>/df_{key[0]}-M{key[1]}.csv', 'w') as f:
        group.to_csv(f, index=False)

The code correctly uploads the files:

[Screenshot of the uploaded files in the bucket]

Marcin
from io import StringIO
import pandas as pd
import boto3


data = """
A,B,C
87jg,28,3012
h372,28,3011
kj87,27,3011
2yh8,54,3010
802h,53,3010
5d8b,52,3010
"""
df = pd.read_csv(StringIO(data), sep=',')

client = boto3.client('s3')
for key, group in df.groupby(['C', 'B']):
    group.to_csv(f'df_{key}.csv', index=False)
    client.upload_file(f'df_{key}.csv', 'my-another-test-bucket-2',
                       f'df_{key[0]}-M{key[1]}.csv')

S3 Bucket

[Screenshot of the bucket contents]
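
Since the question mentions put_object, here is a minimal sketch of a variation that skips the local file and sends each group's CSV from memory. It is not part of the original answer. The helper names `iter_group_csvs` and `upload_groups` are made up for illustration, and the sketch assumes the same columns `B` and `C` and the same object naming as above. It may be useful where the local file system is not writable.

```python
import pandas as pd


def iter_group_csvs(df):
    """Yield (object_key, csv_bytes) for each (C, B) group without touching disk."""
    for (c, b), group in df.groupby(['C', 'B']):
        # With path_or_buf omitted, to_csv returns the CSV as a string
        body = group.to_csv(index=False).encode('utf-8')
        yield f'df_{c}-M{b}.csv', body


def upload_groups(client, bucket, df):
    """Upload each group as one S3 object via put_object; return the keys written."""
    keys = []
    for name, body in iter_group_csvs(df):
        client.put_object(Bucket=bucket, Key=name, Body=body)
        keys.append(name)
    return keys


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   upload_groups(boto3.client('s3'), '<bucket-name>', df)
```

Passing the client in as an argument keeps the helpers independent of how credentials are set up, and makes it easy to try them with a stand-in object before pointing them at a real bucket.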

CK__
  • @omdurg Yeah, it's possible. I have updated the answer with a screenshot. Try it out. – CK__ Aug 22 '20 at 13:42
  • `[Errno 30] Read-only file system:`? I didn't understand your question. I would suggest asking a fresh question (as this one is marked as a duplicate). Also, let us know which answer worked for you for this particular question; you can upvote or accept it. For the `Read-only` issue, please ask another question with all the relevant details. – CK__ Sep 12 '20 at 02:02