Notes: Permissions
Overview
AWS provides several ways to control who can access S3 resources:
- IAM controls access across all AWS services
- Bucket Policies control access to buckets and everything inside them
- ACLs (Access Control Lists) manage access to specific files or objects
- Presigned URLs give temporary access to files
In small setups, where only one person manages everything, we don’t need complex permission systems.
- IAM and Bucket Policies are better for multi-user setups
- For single-user environments, ACLs and presigned URLs are simpler and faster
How AWS Decides Access
When a request is made for an S3 object, AWS checks permissions step by step.
- If the request uses a valid, unexpired presigned URL, temporary access is granted
- If not, AWS checks IAM and bucket policies
- If nothing allows it, access is denied by default
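As a mental model only, this order of checks can be sketched in plain Python (the function and its boolean inputs are illustrative stand-ins, not real AWS calls):
# Illustrative sketch of the evaluation order; not an AWS API
def is_allowed(valid_presigned_url, iam_allows, bucket_policy_allows):
    if valid_presigned_url:              # signed and not expired
        return True                      # temporary access granted
    if iam_allows or bucket_policy_allows:
        return True
    return False                         # denied by default

print(is_allowed(False, False, False))   # False: nothing allows it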
ACLs
ACLs define who can access specific objects in your S3 bucket.
- Each object can have its own ACL
- The two common types are private and public-read
- By default, all uploaded files are private
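To see what an object's ACL currently grants, boto3's get_object_acl can be used (a quick sketch; my-bucket and data.csv are placeholder names):
import boto3

# Inspect the grants attached to a single object
s3 = boto3.client('s3')
acl = s3.get_object_acl(Bucket='my-bucket', Key='data.csv')
for grant in acl['Grants']:
    print(grant['Grantee'], grant['Permission'])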
Changing ACLs on Existing Files
We can change a file’s access using the Boto3 put_object_acl method.
- Default ACL is private when uploaded
- Changing to public-read allows anyone to download it
import boto3

# Create an S3 client and make an existing object publicly readable
s3 = boto3.client('s3')
s3.put_object_acl(Bucket='my-bucket', Key='data.csv', ACL='public-read')
Expected result: The file data.csv in my-bucket can now be accessed publicly.
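The same method reverts the change; passing ACL='private' restores the default (same placeholder bucket and key as above):
# Revoke public access by reapplying the default canned ACL
s3.put_object_acl(Bucket='my-bucket', Key='data.csv', ACL='private')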
Setting ACLs During Upload
You can make an object public at the time of upload using the ExtraArgs parameter.
- Add ACL: 'public-read' to ExtraArgs
- Simplifies workflow by avoiding a second ACL call
# Upload a local file and apply the ACL in the same call
s3.upload_file(
    'local-data.csv',                    # local source file
    'my-bucket',                         # destination bucket
    'uploads/data.csv',                  # destination key
    ExtraArgs={'ACL': 'public-read'}
)
Expected result: The file uploads with a public-read ACL, ready for immediate public download.
Accessing Public Objects
Once an object has a public-read ACL, anyone can view it using a simple URL format.
- Format: https://<bucket-name>.s3.amazonaws.com/<object-key>
- Example: https://reports-bucket.s3.amazonaws.com/2024/traffic.csv
You can now share this link, and anyone can open or download the file directly.
Accessing Public Files
We can easily create public URLs with Python’s format() method.
- Define a URL string with placeholders for bucket and key
- Use format(bucket, key) to fill in values dynamically
bucket = 'reports-bucket'
key = '2024/traffic.csv'
url = "https://{}.s3.amazonaws.com/{}".format(bucket, key)
print(url)
Expected result:
https://reports-bucket.s3.amazonaws.com/2024/traffic.csv
Accessing Private Files
Files in S3 are private by default unless we explicitly make them public. To read private files, we need special methods.
- Private files cannot be accessed via public URLs directly
- Attempting to read a private file publicly gives a 403 Forbidden error
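A quick way to see this is to request a private object's public URL with the requests library (a sketch; the bucket and key are placeholders and assumed private):
import requests

# A private object served over its public URL returns 403
url = 'https://city-data.s3.amazonaws.com/reports/private.csv'
response = requests.get(url)
print(response.status_code)  # 403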
Download Private Files
We can download private files to local storage before processing them.
- Use download_file from Boto3
- Once downloaded, read the file locally with Pandas
- This works well for files that do not change frequently
import pandas as pd

# Download the private object to local storage, then read it with Pandas
s3.download_file('city-data', 'reports/private.csv', 'local/private.csv')
df = pd.read_csv('local/private.csv')
print(df.head())
Access Files Directly from S3
We can also read private files directly without downloading them.
- Use get_object with the bucket name and object key
- The response contains metadata and a Body key
- The Body is a StreamingBody object that streams content without downloading it fully
This method lets us work with private files efficiently without storing local copies.
# Stream the private object and read it straight into a DataFrame
response = s3.get_object(Bucket='city-data', Key='reports/private.csv')
df = pd.read_csv(response['Body'])
print(df.head())
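Note that the StreamingBody can only be consumed once; to read the same object again, call get_object a second time.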
Using Presigned URLs
Presigned URLs provide temporary access to private files in S3.
- Generate a URL that expires after a set time (e.g., 1 hour)
- The URL can be opened in Pandas or a browser
- Useful for sharing files without making them public
# Generate a URL that grants read access for one hour
url = s3.generate_presigned_url(
    ClientMethod='get_object',
    Params={'Bucket': 'city-data', 'Key': 'reports/private.csv'},
    ExpiresIn=3600  # lifetime in seconds
)
print(url)
Expected result: A temporary URL that anyone can use for 1 hour to access private.csv.
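Because the presigned URL is an ordinary HTTPS link, Pandas can read it directly (reusing the url generated above):
import pandas as pd

# Pandas treats the presigned URL like any remote CSV
df = pd.read_csv(url)
print(df.head())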
Multiple Files in One DataFrame
We can combine multiple CSVs from S3 into a single DataFrame.
- Create a list to store individual DataFrames
- Use list_objects with a prefix to find all relevant files
- Loop through the files, read each into a DataFrame, and append it to the list
- Use pd.concat to combine all DataFrames into one
# List every object under the 2019/ prefix, then read each one
files = s3.list_objects(Bucket='city-data', Prefix='2019/')['Contents']
dfs = []
for f in files:
    obj = s3.get_object(Bucket='city-data', Key=f['Key'])
    dfs.append(pd.read_csv(obj['Body']))
combined_df = pd.concat(dfs, ignore_index=True)
print(combined_df.head())
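One caveat: list_objects returns at most 1,000 keys per call. For larger prefixes, a paginator walks every page (a sketch using boto3's built-in paginator, same bucket and prefix as above):
# Paginate when a prefix holds more than 1,000 objects
paginator = s3.get_paginator('list_objects_v2')
dfs = []
for page in paginator.paginate(Bucket='city-data', Prefix='2019/'):
    for f in page.get('Contents', []):
        obj = s3.get_object(Bucket='city-data', Key=f['Key'])
        dfs.append(pd.read_csv(obj['Body']))
combined_df = pd.concat(dfs, ignore_index=True)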