Monthly Reports

Updated Jul 29, 2020 ·

Requirement

Each month, you are required to generate a report showing the number of requests made through the GetItDone app, categorized by case type.

The client requests monthly reports for analysis
The report must include request counts by category
Council members should view and access charts on a web page

This setup helps automate report delivery and allows the council to easily visualize the data.

Pre-requisites

Required:

Note on the IAM Policy: The IAM policy attached to your IAM must have the following permissions:

AmazonS3FullAccess
AmazonSNSFullAccess
AmazonRekognitionFullAccess
TranslateFullAccess
ComprehendFullAccess

Steps: Building the Report

To create the full report and publish it online, follow these steps.

Download raw CSV files from S3
Merge them into one complete dataset
Create an aggregated DataFrame for analysis
Write the output to CSV and HTML
Generate visual charts using Bokeh
Upload all files to S3 for online viewing

1. Collect Raw Data

The raw data comes from the gid-requests bucket. Each file represents daily GetItDone requests.

Retrieve all CSVs for the target month
Filter files using the Prefix argument in list_objects
Access files through S3 directly, not via URL, since they are private

To start with, upload new reports for February and update the file listing.

import boto3
s3 = boto3.client('s3')

df_list = []

for file in response['Contents']:
    s3_day_reqs = s3.get_object(Bucket='gid-requests', Key=file['Key'])
    day_reqs = pd.read_csv(s3_day_reqs['Body'])
    df_list.append(day_reqs)

This loads all daily files into memory for aggregation.

2. Combine and Aggregate Data

Now merge the individual datasets into one.

all_reqs = pd.concat(df_list)
all_reqs.head()

Output:

   service_request_id  service_request_parent_id  sap_notification_number   requested_datetime  case_age_days  ...         comm_plan_name park_name          case_origin referred_department  \
           2553572                        NaN                      NaN  2019-04-03T08:58:00            0.0  ...            Kearny Mesa       NaN               Mobile                 NaN   
           2553573                        NaN                      NaN  2019-04-03T08:58:00            0.0  ...  Mid-City:City Heights       NaN  Crew/Self Generated                 NaN   
           2553570                        NaN                      NaN  2019-04-03T08:55:00            0.0  ...          Mission Beach       NaN                Phone                 NaN   
           2553568                  2.538e+06                      NaN  2019-04-03T08:54:00            0.0  ...            Tierrasanta       NaN                  Web                 NaN   
           2553565                        NaN                      NaN  2019-04-03T08:53:00            0.0  ...               Downtown       NaN               Mobile                 NaN   

                                  public_description  
                         72hr plus tire violation  
          Graffiti Reported at 3927 El Cajon Blvd  
                                     Missed trash  
ALL 4 STREET LIGHTS ON THE SIGNAL STANDARDS AR...  
                                              NaN  

[5 rows x 18 columns]

For this sample, we need to aggregate the February reports.

# Write to a CSV and HTML file with no border
agg_df.to_csv('./feb_final_report.csv', index=False)
agg_df.to_html('./feb_final_report.html', border=0)

3. Create and Upload Reports Bucket

As an additional step, we need to create a bucket to store and share the final reports.

s3.create_bucket(Bucket='gid-reports')

s3.upload_file(
    Filename='./feb_final_report.csv', 
    Key='2019/feb/final_report.html', 
    Bucket='gid-reports',
    ExtraArgs={'ACL': 'public-read'}
)

s3.upload_file(
    Filename='./feb_final_report.html', 
    Key='2019/feb/final_report.html', 
    Bucket='gid-reports',
    ExtraArgs={
        'ContentType': 'text/html',
        'ACL': 'public-read'
    }
)

4. Create an Index Page

The client requested that all reports be viewable through the directory listing

objects_list = s3.list_objects(Bucket='gid-reports', Prefix='2019/')
objects_df = pd.DataFrame(objects_list['Contents'])
base_url = "http://gid-reports.s3.amazonaws.com/"
objects_df['Link'] = base_url + objects_df['Key']

# Preview the resulting DataFrame
objects_df.head()

Output:

2019/feb/final_chart.html  2025-10-20 21:18:26+00:00  "ed175e887c447df1ec5932ac7fa67cc2"  7697  STANDARD      
2019/feb/final_report.csv  2025-10-20 21:18:26+00:00  "1519c57b81bb9257756c04dee24dd728"  209   STANDARD      
2019/feb/final_report.html 2025-10-20 21:18:26+00:00  "03baa6b325d75dff02ef83af39a8205f"  536   STANDARD      
2019/jan/final_chart.html  2025-10-20 21:18:26+00:00  "ed175e887c447df1ec5932ac7fa67cc2"  7697  STANDARD      
2019/jan/final_report.csv  2025-10-20 21:18:26+00:00  "1519c57b81bb9257756c04dee24dd728"  209   STANDARD      

                                                                                                  Owner                                                            Link  
{'DisplayName': 'webfile', 'ID': '75aa57f09aa0c8caeab4f8c24e99d10f8e7faeebf76c078efc7c6caea54ba06a'}  http://gid-reports.s3.amazonaws.com/2019/feb/final_chart.html   
{'DisplayName': 'webfile', 'ID': '75aa57f09aa0c8caeab4f8c24e99d10f8e7faeebf76c078efc7c6caea54ba06a'}  http://gid-reports.s3.amazonaws.com/2019/feb/final_report.csv   
{'DisplayName': 'webfile', 'ID': '75aa57f09aa0c8caeab4f8c24e99d10f8e7faeebf76c078efc7c6caea54ba06a'}  http://gid-reports.s3.amazonaws.com/2019/feb/final_report.html  
{'DisplayName': 'webfile', 'ID': '75aa57f09aa0c8caeab4f8c24e99d10f8e7faeebf76c078efc7c6caea54ba06a'}  http://gid-reports.s3.amazonaws.com/2019/jan/final_chart.html   
{'DisplayName': 'webfile', 'ID': '75aa57f09aa0c8caeab4f8c24e99d10f8e7faeebf76c078efc7c6caea54ba06a'}  http://gid-reports.s3.amazonaws.com/2019/jan/final_report.csv 

5. Upload the New Index

Create the HTML index page that links directly to each report and chart.

Next, upload the index file so everyone can view it. Uploading the new file will overwrite the index.html.

objects_df.to_html('report_listing.html',
    render_links=True,
    columns=['Link','LastModified','Size'])

s3.upload_file(
  Filename='./report_listing.html', Key='index.html', 
  Bucket='gid-reports',
  ExtraArgs = {
    'ContentType': 'text/html', 
    'ACL': 'public-read'
  })

You now have a live report site where the client and council members can view and interact with all monthly results. By automating these steps, you can keep reports up-to-date, improve transparency, and make GetItDone’s performance easy to monitor.

Requirement​

Pre-requisites​

Steps: Building the Report​

1. Collect Raw Data​

2. Combine and Aggregate Data​

3. Create and Upload Reports Bucket​

4. Create an Index Page​

5. Upload the New Index​