I am fairly new to both S3 as well as boto3. I am trying to read in some data in the following format:
https://blahblah.s3.amazonaws.com/data1.csv
https://blahblah.s3.amazonaws.com/data2.csv
https://blahblah.s3.amazonaws.com/data3.csv
I am importing boto3, and it seems like I would need to do something like:

```python
import boto3
s3 = boto3.client('s3')
```
However, what should I do after creating this client if I want to read in all the files separately, in memory? (I am not supposed to download the data locally.) Ideally, I would like to read each CSV file into its own Pandas DataFrame (which I know how to do once I know how to access the S3 data).
Please understand I’m fairly new to both boto3 and S3, so I don’t even know where to begin.
```python
import boto3
s3 = boto3.resource('s3')
obj = s3.Object(<<bucketname>>, <<itemname>>)
body = obj.get()['Body'].read()
You have two options, both of which you’ve already mentioned:
- Downloading the file locally using `s3.download_file`:

```python
s3.download_file(
    "<bucket-name>",
    "<key-of-file>",
    "<local-path-where-file-will-be-downloaded>",
)
```
- Loading the file contents into memory using `s3.get_object`:

```python
response = s3.get_object(Bucket="<bucket-name>", Key="<key-of-file>")
content_body = response.get("Body")
# The Body is a stream, so you need to read it to get the actual bytes
content = content_body.read()
```
Either approach is fine; just choose the one that fits your scenario better. Since you can’t download the data locally, the second option is the one you want.