Ingesting Logs from S3
Overview
S3 buckets are cloud storage containers provided by Amazon Web Services (AWS) for storing and managing data such as logs, backups, and other files.
- S3 buckets allow for easy storage and retrieval of data.
- They can collect log files from different sources, such as applications or servers.
Logs stored in S3 buckets can be imported into Logstash and transformed into structured data. Once processed, the logs can be sent to Elasticsearch for indexing and analysis.
Lab Environment
This lab focuses on importing logs stored in an S3 Bucket using Logstash and Elasticsearch.
| Node | Hostname | IP Address |
|---|---|---|
| Node 1 | elasticsearch | 192.168.56.101 |
| Node 2 | logstash | 192.168.56.102 |
Setup details:
- The nodes are created in VirtualBox using Vagrant
- An SSH key is generated on the Elasticsearch node
- The Logstash node can reach the Elasticsearch node via port 9200 (a quick reachability check follows this list)
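The last point is worth confirming from the Logstash node before configuring anything. A minimal sketch, assuming the IP address from the table above and that SSL is already enabled on Elasticsearch:

# Run on the Logstash node. A 401 authentication error is fine here;
# it still proves the Elasticsearch endpoint on port 9200 is reachable.
curl -k https://192.168.56.101:9200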
S3 Bucket detail:
- S3 Bucket named "prod-logs" is created and set to public access
- IAM User "siem" is created and granted Administrator permissions (for testing)
- Access keys are generated under the siem user
If you're using cloud compute instances, you can skip some of the prerequisites.
Pre-requisites
- Create the nodes in VirtualBox
- Install Elasticsearch on node 1
- Install Logstash on node 2
- Configure SSL on Elasticsearch
- Share Elasticsearch CA cert to Logstash
- Install jq on Elasticsearch node
- AWS Account
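The s3 input used later in this lab comes from the logstash-input-s3 plugin, which is normally bundled with the standard Logstash distribution. If you want to confirm it is available (or add it to a minimal build), a quick check on the Logstash node, assuming the default package install path:

# List installed plugins and look for the s3 input
/usr/share/logstash/bin/logstash-plugin list | grep s3

# Install it only if it is missing
/usr/share/logstash/bin/logstash-plugin install logstash-input-s3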
Create the AWS Resources
You can sign up for a free tier account in AWS.
Log in to your AWS Account and create the following resources. Note that you may use a different S3 Bucket name and IAM User name.
| Resource | Name | How to |
|---|---|---|
| S3 Bucket | sample-logs-<add-name-here> | Using the S3 Console |
| IAM User | logstash | Create an IAM user |
S3 Bucket:
- S3 Bucket names must be globally unique.
- Set S3 Bucket access to public.
- You may use a different S3 Bucket name.
- Upload the sample log file access_log.log to the S3 Bucket, either in the console or with the CLI sketch after this list.
- After the lab, you can delete this S3 Bucket.
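If you prefer the command line over the S3 Console, the bucket creation and upload can be sketched with the AWS CLI. This is a hedged sketch: it assumes the AWS CLI is installed and configured, uses a placeholder bucket name, and reuses the region that appears in the Logstash configuration later in the lab. Making the bucket public, as described above, is easiest in the console and is not shown here.

# Create the bucket (bucket names must be globally unique; replace the placeholder name)
aws s3 mb s3://sample-logs-your-name --region ap-southeast-1

# Upload the sample log file
aws s3 cp access_log.log s3://sample-logs-your-name/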
IAM User:
- Attach the AmazonS3ReadOnlyAccess policy directly to the IAM user (an equivalent CLI sketch follows this list).
- After creating the user, select the user > Security credentials > Access keys > Create access key.
- Set Access key best practices & alternatives to Other > Next > Create access key.
- Copy the Access key and Secret access key > Done.
  Info: Make sure to copy the Access key and Secret access key. You will only see the secret access key once, during creation.
- After the lab, you can delete this IAM user.
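The console steps above can also be approximated with the AWS CLI; a minimal sketch, assuming the CLI is configured with credentials that are allowed to manage IAM and keeping the user name from the table above:

# Create the IAM user
aws iam create-user --user-name logstash

# Attach the read-only S3 policy directly to the user
aws iam attach-user-policy \
  --user-name logstash \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

# Generate the access key pair (the secret is only shown in this output, so copy it now)
aws iam create-access-key --user-name logstash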
Configure Logstash
Log in to the Logstash node, switch to the root user, and perform the following:
- Create the s3-read.conf file:
sudo vi /etc/logstash/conf.d/s3-read.conf
Use the configuration file below. Make sure to set the following:
- Bucket name
- Region
- Access key
- Secret access key
- Index name (you can set any name)
input {
  s3 {
    bucket => "samplelogs-eden"
    access_key_id => "AKIA******************"
    secret_access_key => "hW*******************************"
    region => "ap-southeast-1"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  mutate {
    add_field => { "debug" => "true" }
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["$ELASTIC_ENDPOINT:9200"] ## address of the Elasticsearch node
    index => "s3-logs"
    user => "elastic"
    password => "enter-password-here"
    ssl => true
    cacert => "/usr/share/ca-certificates/elastic-ca.crt" ## shared Elasticsearch CA certificate path
  }
}
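Before starting Logstash in the next step, you can optionally confirm that the file parses cleanly; the --config.test_and_exit flag validates the configuration and exits without running the pipeline:

/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/s3-read.conf --config.test_and_exit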
- Start Logstash with the updated configuration:
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/s3-read.conf
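Running Logstash in the foreground like this is convenient for the lab because the rubydebug output is printed to the console. Since the file already lives under /etc/logstash/conf.d/, an alternative sketch (assuming the package install's default pipelines.yml, which loads every .conf file in that directory) is to run Logstash as a service and follow its logs instead:

sudo systemctl start logstash
sudo journalctl -u logstash -f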
Verify Data in Elasticsearch
Log in to the Elasticsearch node and switch to the root user:
- First, store the Elasticsearch endpoint and credentials in variables:
ELASTIC_ENDPOINT="https://your-elasticsearch-endpoint"
ELASTIC_USER="your-username"
ELASTIC_PW="your-password"
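With the variables set, a quick sanity check of the endpoint and credentials before querying any indices (an optional sketch; jq was installed on this node as a prerequisite):

# A JSON response containing the cluster name confirms connectivity and authentication
curl -s -u $ELASTIC_USER:$ELASTIC_PW "$ELASTIC_ENDPOINT:9200" | jq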
- Verify that the s3-logs index has been created:
curl -s -u $ELASTIC_USER:$ELASTIC_PW \
-H 'Content-Type: application/json' \
-XGET "$ELASTIC_ENDPOINT:9200/_cat/indices?v"
Output:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size dataset.size
yellow open demo-csv bOUUiz2lSpWmeknhKl-H2Q 1 1 4 0 18.6kb 18.6kb 18.6kb
yellow open movielens-sql GhfPWKYBQgumzbDiBPONTQ 1 1 1682 0 282.8kb 282.8kb 282.8kb
yellow open demo-json-drop IwgvhAEEThGUYQcJX-cbuA 1 1 3 0 24.1kb 24.1kb 24.1kb
yellow open json-split 4CcfiWDVRQWkflMZP1jFlg 1 1 5 0 16.7kb 16.7kb 16.7kb
yellow open s3-logs ZjOQ0u_hT2GMXfR1gz0xjQ 1 1 12875 0 10mb 10mb 10mb
yellow open demo-json 2abPTr7ZSPSKCFOgD7ED7Q 1 1 10 0 49.1kb 49.1kb 49.1kb
- Run a sample query. This should return the Apache logs from the sample log file uploaded to S3:
curl -s -u $ELASTIC_USER:$ELASTIC_PW \
-H 'Content-Type: application/json' \
-XGET "$ELASTIC_ENDPOINT:9200/s3-logs/_search?pretty=true" | jq