Ingesting Logs from S3

Updated Dec 30, 2022

Overview

S3 buckets are cloud storage containers provided by Amazon Web Services (AWS) to store and manage data. They can be used for various purposes, including storing logs, backups, and other files.

  • S3 buckets allow for easy storage and retrieval of data.
  • They can store log files from different sources, such as applications or servers.

Logs stored in S3 buckets can be imported into Logstash and transformed into structured data. Once processed, the logs can be sent to Elasticsearch for indexing and analysis.
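
For example, a typical combined-format Apache access log line looks like this (an illustrative line, not taken from the sample file):

    192.168.1.10 - - [30/Dec/2022:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"

The COMBINEDAPACHELOG grok pattern used later in this lab splits such a line into fields like clientip, timestamp, verb, request, response, and bytes.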

Lab Environment

This lab focuses on importing logs stored in an S3 bucket using Logstash and Elasticsearch.

Node     Hostname        IP Address
Node 1   elasticsearch   192.168.56.101
Node 2   logstash        192.168.56.102

Setup details:

  • The nodes are created in VirtualBox using Vagrant (see the Vagrantfile sketch after this list)

  • An SSH key is generated on the Elasticsearch node

  • The Logstash node can reach the Elasticsearch node via port 9200
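
A minimal Vagrantfile for this two-node layout might look like the sketch below. The base box is an assumption; any Linux box that can run the Elastic stack works.

    Vagrant.configure("2") do |config|
      # Base box is an assumption; swap in your preferred distribution
      config.vm.box = "ubuntu/focal64"

      # Elasticsearch node
      config.vm.define "elasticsearch" do |es|
        es.vm.hostname = "elasticsearch"
        es.vm.network "private_network", ip: "192.168.56.101"
      end

      # Logstash node
      config.vm.define "logstash" do |ls|
        ls.vm.hostname = "logstash"
        ls.vm.network "private_network", ip: "192.168.56.102"
      end
    end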

S3 bucket details:

  • An S3 bucket named "prod-logs" is created and set to public access
  • An IAM user "siem" is created and granted Administrator permissions (for testing)
  • Access keys are generated under the siem user
info

If you're using cloud compute instances, you can skip some of the prerequisites.

Prerequisites

Create the AWS Resources

info

You can sign up for a free tier account in AWS.

Log in to your AWS account and create the following resources. Note that you may use a different S3 bucket name and IAM user name.

Resource    Name                          How to
S3 Bucket   sample-logs-<add-name-here>   Using the S3 Console
IAM User    logstash                      Create an IAM user

S3 Bucket:

  1. S3 bucket names must be globally unique.
  2. Set S3 Bucket access to public.
  3. You may use a different S3 Bucket name.
  4. Upload the sample log file to the S3 bucket: access_log.log (an AWS CLI equivalent is sketched after this list)
  5. After the lab, you can delete this S3 Bucket.
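
If you prefer the AWS CLI over the console, the same bucket setup can be scripted roughly as follows. The bucket name is a placeholder, the region matches the one used later in the Logstash config, and the upload assumes access_log.log is in your working directory:

    # Create the bucket (names are globally unique)
    aws s3 mb s3://sample-logs-<add-name-here> --region ap-southeast-1

    # Upload the sample log file
    aws s3 cp access_log.log s3://sample-logs-<add-name-here>/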

IAM User:

  1. Attach the AmazonS3ReadOnlyAccess policy directly to the IAM user (an AWS CLI equivalent is sketched after this list).

  2. After creating the user, select the user > Security credentials > Access keys > Create access key.

  3. Set Access key best practices & alternatives to Other > Next > Create access key.

  4. Copy the Access key and Secret access key > Done.

    info

    Make sure to copy the Access key and Secret access key. You will only see the secret access key once, during creation.

  5. After the lab, you can delete this IAM user.
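
The IAM steps above can also be scripted with the AWS CLI, roughly as sketched below. The user name matches the table above; the create-access-key output contains the secret access key, which is shown only once:

    # Create the user and grant read-only access to S3
    aws iam create-user --user-name logstash
    aws iam attach-user-policy --user-name logstash \
      --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

    # Generate the access key pair (copy both values from the output)
    aws iam create-access-key --user-name logstash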

Configure Logstash

Log in to the Logstash node, switch to the root user, and perform the following:

  1. Create the s3-read.conf file.

    sudo vi /etc/logstash/conf.d/s3-read.conf

    Use the configuration file below. Make sure to set the following:

    • Bucket name
    • Region
    • Access key
    • Secret access key
    • Index name (you can set any name)
    input {
      s3 {
        bucket            => "samplelogs-eden"
        access_key_id     => "AKIA******************"
        secret_access_key => "hW*******************************"
        region            => "ap-southeast-1"
      }
    }

    filter {
      # Parse each line using the combined Apache log pattern
      grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
      # Use the timestamp from the log line as the event timestamp
      date {
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
      mutate {
        add_field => { "debug" => "true" }
      }
    }

    output {
      # Print each event to the console for debugging
      stdout {
        codec => rubydebug
      }
      elasticsearch {
        hosts    => ["${ELASTIC_ENDPOINT}:9200"] ## address of the Elasticsearch node, resolved from the environment
        index    => "s3-logs"
        user     => "elastic"
        password => "enter-password-here"
        ssl      => true
        cacert   => "/usr/share/ca-certificates/elastic-ca.crt" ## shared Elasticsearch CA certificate path
      }
    }
  2. Start Logstash with the updated configuration:

    /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/s3-read.conf
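
    Optionally, validate the configuration syntax first with Logstash's --config.test_and_exit flag, which parses the file and exits without processing any events:

    /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/s3-read.conf --config.test_and_exit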

Verify Data in Elasticsearch

Log in to the Elasticsearch node and switch to the root user:

  1. First, store the Elasticsearch endpoint and credentials in variables:

    ELASTIC_ENDPOINT="https://your-elasticsearch-endpoint"
    ELASTIC_USER="your-username"
    ELASTIC_PW="your-password"
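
    If the cluster uses a self-signed certificate, the curl calls below may also need --cacert (or -k to skip verification in a throwaway lab). The CA path here is an assumption, reusing the shared certificate from the Logstash config:

    curl -s --cacert /usr/share/ca-certificates/elastic-ca.crt -u $ELASTIC_USER:$ELASTIC_PW "$ELASTIC_ENDPOINT:9200"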
  2. Verify that the s3-logs index has been created.

    curl -s -u $ELASTIC_USER:$ELASTIC_PW \
    -H 'Content-Type: application/json' \
    -XGET "$ELASTIC_ENDPOINT:9200/_cat/indices?v"

    Output:

    health status index          uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
    yellow open   demo-csv       bOUUiz2lSpWmeknhKl-H2Q   1   1          4            0     18.6kb         18.6kb       18.6kb
    yellow open   movielens-sql  GhfPWKYBQgumzbDiBPONTQ   1   1       1682            0    282.8kb        282.8kb      282.8kb
    yellow open   demo-json-drop IwgvhAEEThGUYQcJX-cbuA   1   1          3            0     24.1kb         24.1kb       24.1kb
    yellow open   json-split     4CcfiWDVRQWkflMZP1jFlg   1   1          5            0     16.7kb         16.7kb       16.7kb
    yellow open   s3-logs        ZjOQ0u_hT2GMXfR1gz0xjQ   1   1      12875            0       10mb           10mb         10mb
    yellow open   demo-json      2abPTr7ZSPSKCFOgD7ED7Q   1   1         10            0     49.1kb         49.1kb       49.1kb
  3. Run a sample query. This should return the Apache logs from the sample log file uploaded to S3.

    curl -s -u $ELASTIC_USER:$ELASTIC_PW  \
    -H 'Content-Type: application/json' \
    -XGET "$ELASTIC_ENDPOINT:9200/s3-logs/_search?pretty=true" | jq