Initial Testing

Updated Jul 29, 2020 ·

Overview

GetItDone is an app used by residents to report public issues.

Residents report potholes or broken streetlights
The data is sent to city systems through AWS
Use Boto3 and other AWS tools to analyze and visualize reports

Pre-requisites

Required:

Note on the IAM Policy: The IAM policy attached to your IAM must have the following permissions:

AmazonS3FullAccess
AmazonSNSFullAccess
AmazonRekognitionFullAccess
TranslateFullAccess
ComprehendFullAccess

1. Create Multiple AWS Clients

Create a client for S3 and SNS.

import boto3

s3 = boto3.client('s3', region_name='us-east-1',
                  aws_access_key_id=AWS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET)

sns = boto3.client('sns', region_name='us-east-1',
                   aws_access_key_id=AWS_KEY_ID,
                   aws_secret_access_key=AWS_SECRET)

buckets = s3.list_buckets()
topics = sns.list_topics()

print(buckets)
print(topics)

2. Create S3 Buckets

Create S3 buckets to organize your data for different parts of your project.

gid-staging
gid-processed
gid-test

import boto3

s3 = boto3.client('s3', region_name='us-east-1',
                  aws_access_key_id=AWS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET)

response_staging = s3.create_bucket(Bucket='gid-staging')
response_processed = s3.create_bucket(Bucket='gid-processed')
response_test = s3.create_bucket(Bucket='gid-test')

print(response_staging)

Next, list all buckets in your AWS account to confirm your setup.

response = s3.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])

Sample output:

gid-staging
gid-processed
gid-test

3. Clean Up Buckets

When a bucket is no longer needed, delete it to stay organized. Delete the gid-test bucket.

s3.delete_bucket(Bucket='gid-test')

response = s3.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])

Sample output:

gid-staging
gid-processed

4. Rename Buckets

If your project name changes, update your bucket names to match. In our case, we need to change the bucket prefix from gim- to gid-.

response = s3.list_buckets()
for bucket in response['Buckets']:
    if 'gim' in bucket['Name']:
        s3.delete_bucket(Bucket=bucket['Name'])

s3.create_bucket(Bucket='gid-staging')
s3.create_bucket(Bucket='gid-processed')

response = s3.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])

5. Upload and Retrieve Files

Once the buckets are ready, upload the files. Make sure to confirm that the upload is successful.

s3.upload_file(Bucket='gid-staging',
               Filename='final_report.csv',
               Key='2019/final_report_01_01.csv')

response = s3.head_object(Bucket='gid-staging',
                          Key='2019/final_report_01_01.csv')
print(response['ContentLength'])

Expected output:

6. Delete Files with Prefix

Record retention has been a huge issue, so we need to ensure that records are not kept past the mandated retention dates. Remove all files from 2018 to stay compliant.

response = s3.list_objects(Bucket='gid-staging', Prefix='2018/final_')

if 'Contents' in response:
    for obj in response['Contents']:
        s3.delete_object(Bucket='gid-staging', Key=obj['Key'])

response = s3.list_objects(Bucket='gid-staging')
for obj in response['Contents']:
    print(obj['Key'])

Sample output:

2019/final_report_01_00
2019/final_report_01_01
2019/final_report_01_02
2019/final_report_01_03
2019/final_report_01_04

7. Upload Public Reports

To ensure transparency, publish the aggregate reports and make them publicly available.

s3.upload_file(
  Filename='./final_report.csv',
  Key='2019/final_report_2019_02_20.csv',
  Bucket='gid-staging',
  ExtraArgs={'ACL': 'public-read'}
)

Additionally, make all aggregated reports since the beginning of 2019 public as well.

response = s3.list_objects(Bucket='gid-staging', Prefix='2019/final_')

for obj in response['Contents']:
    s3.put_object_acl(Bucket='gid-staging', Key=obj['Key'], ACL='public-read')
    print("https://{}.s3.amazonaws.com/{}".format('gid-staging', obj['Key']))

Sample output:

https://gid-staging.s3.amazonaws.com/2019/final_report_01_00.csv
https://gid-staging.s3.amazonaws.com/2019/final_report_01_01.csv
https://gid-staging.s3.amazonaws.com/2019/final_report_01_02.csv

This lets anyone access your reports directly through their browsers.

As a special request, the client asked to prioritize and keep reports from a specific district confidential, as they want to review them before the information goes public.

To handle this, we need to share the private report securely and only for a short time.

share_url = s3.generate_presigned_url(
  ClientMethod='get_object',
  ExpiresIn=3600,
  Params={'Bucket': 'gid-staging', 'Key': 'final_report.csv'}
)
print(share_url)

Sample output:

https://gid-staging.s3.amazonaws.com/final_report.csv?AWSAccessKeyId=FakeKey&Signature=ExampleSignature&Expires=1760943615

9. Read Private Files into Pandas

Add up all requests starting from 2019 to identify the overall trend. To achieve this, you’ll need to read the daily CSV files stored in the gid-requests bucket and combine them into one dataset.

The files in gid-requests are private, meaning only you can access them through your credentials, while they remain hidden from the public.

The boto3 S3 client is already initialized and stored in the s3 variable, and the list of objects in gid-requests is available in the response variable.

df_list = []

for file in response['Contents']:
    obj = s3.get_object(Bucket='gid-requests', Key=file['Key'])
    obj_df = pd.read_csv(obj['Body'])
    df_list.append(obj_df)

df = pd.concat(df_list)
df.head()

Sample output:

        service_name  request_count
72 Hour Violation              8
 Graffiti Removal              2
Missed Collection             12
 Street Light Out             21
          Pothole             33

10. Generate HTML Tables from Pandas

The client asked to generate a table of all the services in the system. The system is dynamic and grows on a weekly basis, adding additional services. The DataFrame of available services are loaded into the services_df variable:

## HTML table with no border and selected columns
services_df.to_html('./services_no_border.html',
                    columns=['service_name', 'link'],
                    border=0)

## HTML table with border and all columns
services_df.to_html('./services_border_all_columns.html', border=1)

11. Hosts HTML Dashboards on S3

A dashboard html file was generated to be used tro schedule the client's staff accordingly.

Make sure the generated HTML File is continuously updated and serve it as a website.

s3.upload_file(Filename='dashboard.html',
               Bucket='getitdone-public',
               Key='index.html',
               ExtraArgs={'ContentType': 'text/html', 'ACL': 'public-read'})

print("http://{}.s3.amazonaws.com/{}".format('getitdone-public', 'index.html'))

Sample output:

http://getitdone-public.s3.amazonaws.com/index.html

Overview​

Pre-requisites​

1. Create Multiple AWS Clients​

2. Create S3 Buckets​

3. Clean Up Buckets​

4. Rename Buckets​

5. Upload and Retrieve Files​

6. Delete Files with Prefix​

7. Upload Public Reports​

8. Share Private Files​

9. Read Private Files into Pandas​

10. Generate HTML Tables from Pandas​

11. Hosts HTML Dashboards on S3​

Overview

Pre-requisites

1. Create Multiple AWS Clients

2. Create S3 Buckets

3. Clean Up Buckets

4. Rename Buckets

5. Upload and Retrieve Files

6. Delete Files with Prefix

7. Upload Public Reports

8. Share Private Files

9. Read Private Files into Pandas

10. Generate HTML Tables from Pandas

11. Hosts HTML Dashboards on S3