# Sample Pipeline

## Overview
In this lab, you will ingest a sample Apache web server log file into Elasticsearch using Logstash. You will:

- Configure Logstash to ingest log data from the sample file
- Verify successful indexing in Elasticsearch
- Query the indexed data
## Lab Environment
| Node | Hostname | IP Address |
|---|---|---|
| Node 1 | elasticsearch | 192.168.56.101 |
| Node 2 | logstash | 192.168.56.102 |
Setup details:

- The nodes are created in VirtualBox using Vagrant.
- An SSH key is generated on the Elasticsearch node.
- The Logstash node can reach the Elasticsearch node via port 9200.
## Pre-requisites
- Create the nodes in VirtualBox
- Install Elasticsearch on node 1
- Install Logstash on node 2
- Configure SSL on Elasticsearch
- Share the Elasticsearch CA cert with the Logstash node
- Install jq on Elasticsearch node
## Steps
Log in to the Logstash node, switch to the root user, and perform the following:
- Download the sample Apache web server log file below: `access_log.log`

  You can configure a fileshare in the VM's settings, map it to a local folder on your computer, and place the access log in that folder. Then confirm the VM can access the fileshare and copy the log to `/tmp` within the VM. For more information, please see Setup Fileshare.
- Change the permissions of the sample log file:

  ```shell
  ls -la /tmp/access_log.log
  chmod 644 /tmp/access_log.log
  ```
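As a quick sanity check, you can print the octal mode with `stat` and confirm it reads `644` (the `touch` is only there so the snippet runs on its own if the file is missing):

```shell
# Create the file if it does not exist yet, so this snippet is self-contained
touch /tmp/access_log.log
chmod 644 /tmp/access_log.log
# Print the octal permission bits; 644 means owner rw, group/other read-only
stat -c '%a' /tmp/access_log.log
# prints: 644
```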
- Confirm that Logstash can communicate with Elasticsearch on port 9200:

  ```shell
  $ telnet 192.168.56.101 9200
  Trying 192.168.56.101...
  Connected to 192.168.56.101.
  Escape character is '^]'.
  ```
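If telnet is not installed on the Logstash node, a rough equivalent uses bash's built-in `/dev/tcp` pseudo-device (the IP below is the Elasticsearch node from the Lab Environment table; substitute your own):

```shell
# Attempt a TCP connection to port 9200; the redirect succeeds only if the port is open
if timeout 3 bash -c 'cat < /dev/null > /dev/tcp/192.168.56.101/9200'; then
  echo "port 9200 reachable"
else
  echo "port 9200 unreachable"
fi
```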
- Ensure that the Elasticsearch CA cert has been shared with the Logstash node, then set its permissions and ownership:

  ```shell
  sudo chown root:logstash /usr/share/ca-certificates/elastic-ca.crt
  sudo chmod 640 /usr/share/ca-certificates/elastic-ca.crt
  ```

  Store the Elasticsearch endpoint and credentials in variables:

  ```shell
  ELASTIC_ENDPOINT="https://your-elasticsearch-endpoint"
  ELASTIC_USER="your-username"
  ELASTIC_PW="your-password"
  ```

  Manually verify the certificate works using curl:

  ```shell
  curl --cacert /usr/share/ca-certificates/elastic-ca.crt \
    -u $ELASTIC_USER:$ELASTIC_PW $ELASTIC_ENDPOINT:9200
  ```

  Output:

  ```json
  {
    "name" : "node-1",
    "cluster_name" : "elasticsearch",
    "cluster_uuid" : "Lmfoq9mbRBqis3GvrLVTZw",
    "version" : {
      "number" : "8.17.0",
      "build_flavor" : "default",
      "build_type" : "deb",
      "build_hash" : "2b6a7fed44faa321997703718f07ee0420804b41",
      "build_date" : "2024-12-11T12:08:05.663969764Z",
      "build_snapshot" : false,
      "lucene_version" : "9.12.0",
      "minimum_wire_compatibility_version" : "7.17.0",
      "minimum_index_compatibility_version" : "7.0.0"
    },
    "tagline" : "You Know, for Search"
  }
  ```
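Since jq is installed as a prerequisite, you can pull just the version number out of that response. Here the filter runs against a trimmed copy of the JSON above so the example is self-contained:

```shell
# A trimmed copy of the cluster-info response, saved in a variable for illustration
info='{"name":"node-1","cluster_name":"elasticsearch","version":{"number":"8.17.0"}}'
# -r strips the JSON quotes from the result
echo "$info" | jq -r '.version.number'
# prints: 8.17.0
```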
- Configure Logstash:

  ```shell
  sudo vi /etc/logstash/conf.d/logstash.conf
  ```

  Use the following configuration to process the sample access log file located on the Logstash node. This configuration will:

  - Set the index name to `sample-access-log`
  - Read the file from the start
  - Apply a Grok filter to parse the log entries
  - Send the processed data to Elasticsearch
  - Output the results to the standard output

  Make sure to set the password. Note that Logstash only substitutes environment variables written in the `${VAR}` form, and the variable must be visible to the Logstash process (for example, export `ELASTIC_ENDPOINT` in the same root shell before starting Logstash, or hard-code the host).

  ```
  input {
    file {
      path => "/tmp/access_log.log"
      start_position => "beginning"
    }
  }

  filter {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    mutate {
      add_field => { "debug" => "true" }
    }
  }

  output {
    elasticsearch {
      hosts => ["${ELASTIC_ENDPOINT}:9200"]
      index => "sample-access-log"
      user => "elastic"
      password => "enter-password-here"
      ssl => true
      cacert => "/usr/share/ca-certificates/elastic-ca.crt"
    }
    stdout {
      codec => rubydebug
    }
  }
  ```
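To get a feel for what the `%{COMBINEDAPACHELOG}` grok pattern extracts, here is a rough plain-shell approximation run against one sample line in combined Apache log format (grok itself parses many more fields, as the search output later in this lab shows):

```shell
# One sample line in combined Apache log format
line='174.0.59.42 - - [02/May/2017:02:32:25 +0000] "GET /index.html HTTP/1.1" 200 6450 "-" "Mozilla/5.0"'
clientip=$(echo "$line" | grep -oE '^[0-9.]+')            # leading IP address
verb=$(echo "$line" | grep -oE '"[A-Z]+ ' | tr -d '" ')   # HTTP method after the opening quote
response=$(echo "$line" | grep -oE 'HTTP/[0-9.]+" [0-9]{3}' | grep -oE '[0-9]{3}$')  # status code
echo "clientip=$clientip verb=$verb response=$response"
# prints: clientip=174.0.59.42 verb=GET response=200
```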
- Run Logstash with the updated configuration:

  ```shell
  /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf
  ```

  If successful, Logstash should connect to Elasticsearch without errors:

  ```
  [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"https://elastic:xxxxxx@192.168.56.101:9200/"}
  [[main]-pipeline-manager] elasticsearch - Elasticsearch version determined (8.17.0) {:es_version=>8}
  [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>8}
  ```
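If startup fails with a configuration error instead, you can syntax-check the pipeline file without starting it; `--config.test_and_exit` is a standard Logstash flag that validates the config and exits:

```shell
# Validate the pipeline; prints the error and exits non-zero if the config is invalid
/usr/share/logstash/bin/logstash \
  -f /etc/logstash/conf.d/logstash.conf \
  --config.test_and_exit
```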
Log in to the Elasticsearch node, switch to the root user, and perform the following:

- Check if data has been indexed by Logstash:

  ```shell
  curl -u $ELASTIC_USER:$ELASTIC_PW --insecure \
    -X GET "$ELASTIC_ENDPOINT:9200/_cat/indices?v"
  ```

  Output:

  ```
  health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
  yellow open   sample-access-log 6KvdkUlvT3mdPP0JjyudLw   1   1      31250            0     14.9mb         14.9mb       14.9mb
  ```
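The `docs.count` column is the easiest number to check against expectations. Here is one way to pull it out with awk, applied to a saved copy of the output above so the snippet is self-contained:

```shell
# Header plus one data row from _cat/indices?v, saved in a variable for illustration
cat_out='health status index uuid pri rep docs.count docs.deleted store.size pri.store.size dataset.size
yellow open sample-access-log 6KvdkUlvT3mdPP0JjyudLw 1 1 31250 0 14.9mb 14.9mb 14.9mb'
# Column 3 is the index name, column 7 is docs.count
echo "$cat_out" | awk '$3 == "sample-access-log" { print $7 }'
# prints: 31250
```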
- Check the `sample-access-log` index and confirm that it contains the sample Apache web server log data:

  ```shell
  curl -s -u $ELASTIC_USER:$ELASTIC_PW \
    -H 'Content-Type: application/json' \
    -XGET "$ELASTIC_ENDPOINT:9200/sample-access-log/_search?pretty=true" | jq
  ```

  If the indexing was successful, the output should show something like this:

  ```json
  {
    "took": 217,
    "timed_out": false,
    "_shards": {
      "total": 1,
      "successful": 1,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": {
        "value": 10000,
        "relation": "gte"
      },
      "max_score": 1,
      "hits": [
        {
          "_index": "sample-access-log",
          "_id": "K5xY9JMBNQdWCWQ3sSRr",
          "_score": 1,
          "_ignored": [
            "message.keyword"
          ],
          "_source": {
            "clientip": "174.0.59.42",
            "referrer": "\"http://sundog-soft.com/features/real-time-3d-clouds/?gclid=CKiV8suV0NMCFUqewAodLWgE5A\"",
            "auth": "-",
            "timestamp": "02/May/2017:02:32:25 +0000",
            "verb": "GET",
            "debug": "true",
            "agent": "\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36\"",
            "@version": "1",
            "host": "logstash",
            "@timestamp": "2017-05-02T02:32:25.000Z",
            "request": "/wp-content/plugins/js_composer/assets/js/dist/js_composer_front.min.js?ver=5.1.2",
            "httpversion": "1.1",
            "path": "/tmp/access_log.log",
            "response": "200",
            "message": "174.0.59.42 - - [02/May/2017:02:32:25 +0000] \"GET /wp-content/plugins/js_composer/assets/js/dist/js_composer_front.min.js?ver=5.1.2 HTTP/1.1\" 200 6450 \"http://sundog-soft.com/features/real-time-3d-clouds/?gclid=CKiV8suV0NMCFUqewAodLWgE5A\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36\"",
            "bytes": "6450",
            "ident": "-"
          }
        },
        ...
  ```
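With jq you can also pull individual fields out of the search hits. Here the filter runs against a one-hit sample shaped like the response above, so the example is self-contained:

```shell
# One-hit sample shaped like the _search response, saved for illustration
response='{"hits":{"hits":[{"_source":{"clientip":"174.0.59.42","verb":"GET","response":"200"}}]}}'
# Print "clientip verb response" for every hit, using jq string interpolation
echo "$response" | jq -r '.hits.hits[]._source | "\(.clientip) \(.verb) \(.response)"'
# prints: 174.0.59.42 GET 200
```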