Using Scripts
Updated Dec 30, 2022
Overview
Scripts make importing data into Elasticsearch efficient: they automate bulk operations and keep the imported data consistent.
- Use Python with elasticsearch-py for programmatic imports and transformations.
- Use shell scripts with the _bulk API for fast, structured data uploads.
Other methods:
- Logstash and Beats stream data from logs, S3, and databases.
- AWS services such as Lambda and Kinesis Data Firehose can stream data into Elasticsearch.
- Integrations are available for Kafka, Spark, and other systems.
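As a rough sketch of the elasticsearch-py route, the code below builds bulk actions straight from the CSV and lets the client's helpers.bulk function send them, so no intermediate NDJSON file is needed. It assumes elasticsearch-py 8.x is installed; generate_actions and index_movies are hypothetical names, and for brevity the title is stored unparsed, without the year extraction done by the movies-to-json.py script in this section.

```python
import csv


def generate_actions(rows, index="movies"):
    """Yield one bulk action per CSV row (hypothetical helper).

    Each action carries its own document in _source, the shape accepted by
    elasticsearch.helpers.bulk.
    """
    for row in rows:
        yield {
            "_op_type": "create",  # fail rather than overwrite if the _id exists
            "_index": index,
            "_id": row["movieId"],
            "_source": {
                "id": row["movieId"],
                "title": row["title"],  # unparsed, e.g. "Toy Story (1995)"
                "genre": row["genres"].split("|"),
            },
        }


def index_movies(client, csv_path="ml-latest-small/movies.csv"):
    """Send every movie to Elasticsearch; returns (success_count, errors)."""
    # Imported here so generate_actions() works without the client installed.
    from elasticsearch import helpers

    with open(csv_path, newline="", encoding="utf-8") as f:
        # raise_on_error=False collects failed items instead of raising BulkIndexError
        return helpers.bulk(client, generate_actions(csv.DictReader(f)), raise_on_error=False)
```

With a client such as Elasticsearch("https://your-elasticsearch-endpoint:9200", basic_auth=("your-username", "your-password")), calling index_movies(client) returns the number of successful actions and a list of per-item errors. Because the actions are produced by a generator, the client can stream the file in chunks rather than loading it all into memory.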
Importing Data
You need a running Elasticsearch cluster to test the examples below. You will also need Python 3, unzip, curl, and jq installed, since the steps below use all four.
The script below reads a CSV file of movies, converts each row into the newline-delimited JSON (NDJSON) format used by the _bulk API, and prints the result to standard output.
movies-to-json.py
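import csv
import json
import re

# Convert MovieLens movies.csv into NDJSON for the _bulk API:
# one action line followed by one document line per movie.
with open('ml-latest-small/movies.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for movie in reader:
        # Action line: create a document in the "movies" index, using movieId as _id
        print(json.dumps({"create": {"_index": "movies", "_id": movie["movieId"]}}))

        # Extract title and year, e.g. "Toy Story (1995)" -> "Toy Story", 1995
        raw_title = movie['title'].strip()
        title = re.sub(r" \(.*\)$", "", raw_title)
        year = raw_title[-5:-1]
        if not year.isdigit():
            year = "2016"  # Default year if no valid year is found

        # Extract genres, e.g. "Adventure|Animation" -> ["Adventure", "Animation"]
        genres = movie['genres'].split('|')

        # Document line; json.dumps escapes quotes and other special characters
        print(json.dumps({"id": movie["movieId"], "title": title, "year": int(year), "genre": genres}))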
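The _bulk API reads newline-delimited JSON: each create action line is followed by its document line, and the request body must end with a newline. A minimal sketch for sanity-checking a file such as movies-2.json before uploading it; check_bulk_ndjson is a hypothetical helper, and it assumes only create or index actions, since delete actions carry no document line.

```python
import json


def check_bulk_ndjson(text):
    """Return a list of problems found in a _bulk request body (empty if none).

    Assumes every action is create or index, so lines alternate
    action, document, action, document, ...
    """
    problems = []
    if not text.endswith("\n"):
        problems.append("body must end with a newline")
    lines = text.splitlines()
    if len(lines) % 2 != 0:
        problems.append("odd number of lines: an action is missing its document")
    for number, line in enumerate(lines, start=1):
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {number}: invalid JSON ({err.msg})")
            continue
        # Odd-numbered lines must be actions such as {"create": {...}}
        is_action_line = number % 2 == 1
        if is_action_line and not (isinstance(parsed, dict) and list(parsed) in (["create"], ["index"])):
            problems.append(f"line {number}: expected a create/index action")
    return problems
```

For example, check_bulk_ndjson(open("movies-2.json", encoding="utf-8").read()) should return an empty list for the script's output, because print() ends every line, including the last, with a newline.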
Steps:
- Download the MovieLens dataset (ml-latest-small.zip).
- Unzip the package.
unzip ml-latest-small.zip
- Run the Python script and redirect its output to a file.
python3 movies-to-json.py > movies-2.json
- Import the new dataset into Elasticsearch.
Store the Elasticsearch endpoint and credentials in variables:
ELASTIC_ENDPOINT="https://your-elasticsearch-endpoint"
ELASTIC_USER="your-username"
ELASTIC_PW="your-password"
Then send the file to the _bulk API:
curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
  -H 'Content-Type: application/json' \
  -XPUT "$ELASTIC_ENDPOINT:9200/_bulk?pretty" \
  --data-binary @movies-2.json | jq .
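The _bulk request can succeed at the HTTP level while individual actions fail: the response then has "errors": true at the top level and an error object on each failed item. For example, running the import a second time makes every create action fail with a 409 conflict, because documents with those _id values already exist. A minimal sketch for listing failed items from a saved response (failed_items is a hypothetical helper):

```python
def failed_items(bulk_response):
    """Return (id, error type, reason) for every failed item in a _bulk response."""
    # The top-level "errors" flag is false when every action succeeded
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is keyed by its action type, e.g. {"create": {...}}
        (result,) = item.values()
        if "error" in result:
            error = result["error"]
            failures.append((result.get("_id"), error.get("type"), error.get("reason")))
    return failures
```

To use it, save the response to a file (for example by replacing | jq . with > bulk-response.json) and pass json.load(open("bulk-response.json")) to failed_items.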
Now query the index for a movie title:
curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
  -H 'Content-Type: application/json' \
  -XGET "$ELASTIC_ENDPOINT:9200/movies/_search?q=shrek" | jq .
Output:
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 10.226742,
    "hits": [
      {
        "_index": "movies",
        "_id": "4306",
        "_score": 10.226742,
        "_source": {
          "id": "4306",
          "title": "Shrek",
          "year": 2001,
          "genre": [
            "Adventure",
            "Animation",
            "Children",
            "Comedy",
            "Fantasy",
            "Romance"
          ]
        }
      },
      {
        "_index": "movies",
        "_id": "8360",
        "_score": 8.57909,
        "_source": {
          "id": "8360",
          "title": "Shrek 2",
          "year": 2004,
          "genre": [
            "Adventure",
            "Animation",
            "Children",
            "Comedy",
            "Musical",
            "Romance"
          ]
        }
      },
      {
        "_index": "movies",
        "_id": "53121",
        "_score": 7.3886833,
        "_source": {
          "id": "53121",
          "title": "Shrek the Third",
          "year": 2007,
          "genre": [
            "Adventure",
            "Animation",
            "Children",
            "Comedy",
            "Fantasy"
          ]
        }
      },
      {
        "_index": "movies",
        "_id": "64249",
        "_score": 7.3886833,
        "_source": {
          "id": "64249",
          "title": "Shrek the Halls",
          "year": 2007,
          "genre": [
            "Adventure",
            "Animation",
            "Comedy",
            "Fantasy"
          ]
        }
      },
      {
        "_index": "movies",
        "_id": "78637",
        "_score": 7.3886833,
        "_source": {
          "id": "78637",
          "title": "Shrek Forever After",
          "year": 2010,
          "genre": [
            "Adventure",
            "Animation",
            "Children",
            "Comedy",
            "Fantasy",
            "IMAX"
          ]
        }
      }
    ]
  }
}
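The same search can be run from Python. This sketch assumes elasticsearch-py 8.x, where search() accepts a query= keyword argument, and it uses a match query on the title field instead of the q=shrek URI search, so scores may differ from the output above. search_titles and hit_titles are hypothetical helpers.

```python
def hit_titles(search_response):
    """Return (title, year, score) for each hit in a _search response body."""
    return [
        (hit["_source"]["title"], hit["_source"]["year"], hit["_score"])
        for hit in search_response["hits"]["hits"]
    ]


def search_titles(client, text, index="movies"):
    """Run a match query on the title field and summarize the hits.

    Assumes an elasticsearch-py 8.x client; its response supports
    dictionary-style access such as response["hits"].
    """
    response = client.search(index=index, query={"match": {"title": text}})
    return hit_titles(response)
```

Splitting the response parsing into hit_titles keeps it usable on any saved _search response, such as the JSON printed by the curl command.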