Importing JSON Data
Overview
JSON (JavaScript Object Notation) is a simple data format used for storing and exchanging information. It is easy to read, write, and understand.
- Structured in key/value pairs and ordered lists.
- Keys and values are separated by a colon (:).
- Double quotes enclose keys.
- Key/value pairs are separated by a comma.
- The file extension is .json.
Example JSON:

```
{
  "name": "John",
  "age": 30,
  "city": "New York"
}
```
Logstash has a built-in JSON filter plugin that parses JSON data into a structured Logstash event. Without it, each JSON line is ingested into Elasticsearch as a single unparsed string in the message field.
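As an illustration (a sketch; real events also carry metadata fields such as @timestamp and host), compare the same line with and without the filter:

```
Without the json filter, the whole line lands in one field:
{ "message": "{\"name\": \"John\", \"age\": 30, \"city\": \"New York\"}" }

With the json filter, each key becomes a field on the event:
{ "name": "John", "age": 30, "city": "New York" }
```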
Lab Environment
This lab focuses on importing JSON data using Logstash and Elasticsearch.
| Node | Hostname | IP Address | 
|---|---|---|
| Node 1 | elasticsearch | 192.168.56.101 | 
| Node 2 | logstash | 192.168.56.102 | 
Setup details:
- The nodes are created in VirtualBox using Vagrant.
- An SSH key is generated on the Elasticsearch node.
- The Logstash node can reach the Elasticsearch node via port 9200.
Prerequisites
- Create the nodes in VirtualBox
- Install Elasticsearch on node 1
- Install Logstash on node 2
- Configure SSL on Elasticsearch
- Share Elasticsearch CA cert to Logstash
- Install jq on Elasticsearch node
Importing the Logs
On a computer with internet access:
- Download the sample dataset: sample-json.log
- Transfer the file to your virtual machine. You can configure a fileshare in the VM's settings, map it to a local folder on your computer, and place the log in that folder. Then confirm the VM can access the fileshare and copy the log to /tmp within the VM. For more information, see Setup Fileshare.
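Alternatively, if you already have SSH access to the Logstash node, a direct copy also works (a sketch; the vagrant user and IP address are assumptions based on the lab table):

```
# Hypothetical transfer; adjust the user, address, and destination path as needed.
scp sample-json.log vagrant@192.168.56.102:/tmp/
```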
Configure Logstash
Log in to the Logstash node, switch to the root user, and perform the following:
- Create the json-read.conf file:

  ```
  sudo vi /etc/logstash/conf.d/json-read.conf
  ```

  Use the configuration file below:

  ```
  input {
    file {
      path => "/mnt/fileshare/datasets/sample-json.log"   ## sample JSON file
      start_position => "beginning"
      sincedb_path => "/dev/null"
    }
  }
  filter {
    json {
      source => "message"
    }
  }
  output {
    stdout { codec => json_lines }
    elasticsearch {
      hosts => ["${ELASTIC_ENDPOINT}:9200"]   ## address of the Elasticsearch node
      index => "demo-json"
      user => "elastic"
      password => "enter-password-here"
      ssl => true
      cacert => "/usr/share/ca-certificates/elastic-ca.crt"   ## shared Elasticsearch CA certificate path
    }
  }
  ```
- Start Logstash with the new configuration:

  ```
  /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/json-read.conf
  ```
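Note that Logstash resolves `${ELASTIC_ENDPOINT}` from its environment at startup, so export the variable before launching the pipeline (a sketch; the address is an assumption based on the lab table):

```
# Hypothetical endpoint; replace with your Elasticsearch node's address.
export ELASTIC_ENDPOINT="https://192.168.56.101"
```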
Verify Data in Elasticsearch
Log in to the Elasticsearch node and switch to the root user:
- First, store the Elasticsearch endpoint and credentials in variables:

  ```
  ELASTIC_ENDPOINT="https://your-elasticsearch-endpoint"
  ELASTIC_USER="your-username"
  ELASTIC_PW="your-password"
  ```
- Verify that the demo-json index has been created:

  ```
  curl -s -u $ELASTIC_USER:$ELASTIC_PW \
    -H 'Content-Type: application/json' \
    -XGET $ELASTIC_ENDPOINT:9200/_cat/indices?v
  ```

  Output:

  ```
  health status index           uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
  yellow open   demo-csv        bOUUiz2lSpWmeknhKl-H2Q   1   1          4            0     18.6kb         18.6kb       18.6kb
  yellow open   movielens-sql   GhfPWKYBQgumzbDiBPONTQ   1   1       1682            0    282.8kb        282.8kb      282.8kb
  yellow open   demo-json       2abPTr7ZSPSKCFOgD7ED7Q   1   1          5            0     21.2kb         21.2kb       21.2kb
  yellow open   demo-csv-mutate rOh8AoJVTKqDpq0wrYxB6A   1   1          4            0     24.7kb         24.7kb       24.7kb
  ```
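For a quicker check of just the document count, the _count API can be queried with the same variables (a sketch; the sample file should yield 5 documents, matching docs.count above):

```
curl -s -u $ELASTIC_USER:$ELASTIC_PW \
  -XGET "$ELASTIC_ENDPOINT:9200/demo-json/_count?pretty"
```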
Drop Parameters based on Conditions
We can also drop events or remove fields based on conditions before the data is indexed.
- Create the json-read-drop.conf file:

  ```
  sudo vi /etc/logstash/conf.d/json-read-drop.conf
  ```

  Use the configuration file below:

  ```
  input {
    file {
      path => "/mnt/fileshare/datasets/sample-json.log"   ## sample JSON file
      start_position => "beginning"
      sincedb_path => "/dev/null"
    }
  }
  filter {
    json {
      source => "message"
    }
    if [paymentType] == "Mastercard" {
      drop {}
    }
    mutate {
      remove_field => ["message", "@timestamp", "path", "host", "@version"]
    }
  }
  output {
    stdout { codec => json_lines }
    elasticsearch {
      hosts => ["${ELASTIC_ENDPOINT}:9200"]   ## address of the Elasticsearch node
      index => "demo-json-drop"
      user => "elastic"
      password => "enter-password-here"
      ssl => true
      cacert => "/usr/share/ca-certificates/elastic-ca.crt"   ## shared Elasticsearch CA certificate path
    }
  }
  ```

  This configuration filters and drops data based on specific conditions:
  - Parses JSON from the message field.
  - Drops events where paymentType is "Mastercard".
  - Removes unnecessary fields such as message, @timestamp, path, host, and @version.
- Start Logstash with the updated configuration:

  ```
  /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/json-read-drop.conf
  ```
- Check that the index has been created:

  ```
  curl -s -u $ELASTIC_USER:$ELASTIC_PW \
    -H 'Content-Type: application/json' \
    -XGET $ELASTIC_ENDPOINT:9200/_cat/indices?v
  ```

  Output:

  ```
  health status index           uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
  yellow open   demo-csv        bOUUiz2lSpWmeknhKl-H2Q   1   1          4            0     18.6kb         18.6kb       18.6kb
  yellow open   movielens-sql   GhfPWKYBQgumzbDiBPONTQ   1   1       1682            0    282.8kb        282.8kb      282.8kb
  yellow open   demo-json-drop  IwgvhAEEThGUYQcJX-cbuA   1   1          3            0     23.9kb         23.9kb       23.9kb
  yellow open   demo-json       2abPTr7ZSPSKCFOgD7ED7Q   1   1          5            0     21.3kb         21.3kb       21.3kb
  yellow open   demo-csv-mutate rOh8AoJVTKqDpq0wrYxB6A   1   1          4            0     24.7kb         24.7kb       24.7kb
  ```
- Check the data imported into the index. None of the documents will have a paymentType of "Mastercard":

  ```
  curl -s -u $ELASTIC_USER:$ELASTIC_PW \
    -H 'Content-Type: application/json' \
    -XGET "$ELASTIC_ENDPOINT:9200/demo-json-drop/_search?pretty=true" | jq
  ```
Using the Split Filter
The split filter in Logstash is useful when your data includes arrays (multiple items inside a single field). This filter takes each element from the array and creates a new event for each one. This way, Logstash can process each item individually.
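For example, an event whose pastEvents field holds two elements would be turned into two events, one per element (an illustrative sketch; the id key is a made-up placeholder, not a field from the sample file):

```
Before split (one event):
{ "name": "John", "pastEvents": [ { "id": 1 }, { "id": 2 } ] }

After split (two events):
{ "name": "John", "pastEvents": { "id": 1 } }
{ "name": "John", "pastEvents": { "id": 2 } }
```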
- Create the json-split.conf file:

  ```
  sudo vi /etc/logstash/conf.d/json-split.conf
  ```

  Use the configuration file below:

  ```
  input {
    file {
      path => "/mnt/fileshare/datasets/sample-json.log"   ## sample JSON file
      start_position => "beginning"
      sincedb_path => "/dev/null"
    }
  }
  filter {
    json {
      source => "message"
    }
    split {
      field => "[pastEvents]"
    }
    mutate {
      remove_field => ["message", "@timestamp", "path", "host", "@version"]
    }
  }
  output {
    stdout { }
    elasticsearch {
      hosts => ["${ELASTIC_ENDPOINT}:9200"]   ## address of the Elasticsearch node
      index => "json-split"
      user => "elastic"
      password => "enter-password-here"
      ssl => true
      cacert => "/usr/share/ca-certificates/elastic-ca.crt"   ## shared Elasticsearch CA certificate path
    }
  }
  ```

  This configuration is useful when dealing with arrays; each array element is treated as a separate event for further processing:
  - Splits each value of the pastEvents array into a separate event.
  - Removes unnecessary fields such as message, @timestamp, path, host, and @version.
- Start Logstash with the updated configuration:

  ```
  /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/json-split.conf
  ```
- Check that the json-split index has been created:

  ```
  curl -s -u $ELASTIC_USER:$ELASTIC_PW \
    -H 'Content-Type: application/json' \
    -XGET $ELASTIC_ENDPOINT:9200/_cat/indices?v
  ```

  Output:

  ```
  health status index           uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
  yellow open   demo-csv        bOUUiz2lSpWmeknhKl-H2Q   1   1          4            0     18.6kb         18.6kb       18.6kb
  yellow open   movielens-sql   GhfPWKYBQgumzbDiBPONTQ   1   1       1682            0    282.8kb        282.8kb      282.8kb
  yellow open   demo-json-drop  IwgvhAEEThGUYQcJX-cbuA   1   1          3            0     24.1kb         24.1kb       24.1kb
  yellow open   json-split      4CcfiWDVRQWkflMZP1jFlg   1   1          5            0     16.6kb         16.6kb       16.6kb
  yellow open   demo-json       2abPTr7ZSPSKCFOgD7ED7Q   1   1         10            0       49kb           49kb         49kb
  yellow open   demo-csv-mutate rOh8AoJVTKqDpq0wrYxB6A   1   1          4            0     24.7kb         24.7kb       24.7kb
  ```
- Run the query below. The documents are now split, with a new document created for each past event (each element of the pastEvents array):

  ```
  curl -s -u $ELASTIC_USER:$ELASTIC_PW \
    -H 'Content-Type: application/json' \
    -XGET "$ELASTIC_ENDPOINT:9200/json-split/_search?pretty=true" | jq
  ```