Using Scripts
Updated Dec 30, 2022
Overview
Scripts make importing data into Elasticsearch efficient: they automate bulk operations and keep imports consistent.
- Use Python with elasticsearch-py for programmatic imports and transformations.
- Use shell scripts with the _bulk API for fast, structured data uploads.
Other methods:
- Logstash and Beats stream data from logs, S3, and databases.
- AWS services (Lambda, Kinesis Firehose) enable seamless streaming.
- Integration add-ons for Kafka, Spark, and more.
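As a rough sketch of the Python route mentioned above, the snippet below turns rows into bulk actions and hands them to `helpers.bulk` from elasticsearch-py. The function names, the `example-index` index, and the `id` field are illustrative assumptions, and the `basic_auth` argument assumes an 8.x client (older clients use `http_auth` instead).

```python
from typing import Dict, Iterable, Iterator


def bulk_actions(rows: Iterable[Dict[str, str]], index: str, id_field: str) -> Iterator[dict]:
    """Yield one action per row, in the dict shape that helpers.bulk accepts."""
    for row in rows:
        yield {
            "_op_type": "create",   # fail instead of overwriting an existing document
            "_index": index,
            "_id": row[id_field],
            "_source": dict(row),
        }


def import_rows(rows, endpoint, user, password, index="example-index", id_field="id"):
    """Send the rows to a cluster. Hypothetical helper; needs the elasticsearch package."""
    # Imported here so bulk_actions stays usable without the client installed
    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch(endpoint, basic_auth=(user, password))  # assumes an 8.x client
    # helpers.bulk returns a (success_count, errors) tuple
    return helpers.bulk(client, bulk_actions(rows, index, id_field))


# Inspect the generated actions without contacting a cluster
for action in bulk_actions([{"id": "1", "name": "first"}], "example-index", "id"):
    print(action)
```

Keeping the action builder separate from the network call makes the transformation easy to check before any data is sent.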
Importing Data
Info: You need a running Elasticsearch cluster to test the examples below. You will also need the command-line tools used in the steps: python3, unzip, curl, and jq.
The script below reads a CSV file of movies, formats the data for Elasticsearch bulk indexing, and prints the output to the console.
movies-to-json.py
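import csv
import re

# Read the movies CSV and print two lines per movie:
# a bulk "create" action line, then the document itself.
with open('ml-latest-small/movies.csv', 'r', newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for movie in reader:
        # Action line: create the document in the "movies" index, using the movie ID as _id
        print(f'{{ "create" : {{ "_index": "movies", "_id" : "{movie["movieId"]}" }} }}')

        # Extract title and year: drop double quotes and the trailing "(...)" part
        title = re.sub(r" \(.*\)$", "", movie['title'].replace('"', ''))
        year = movie['title'][-5:-1]
        if not year.isdigit():
            year = "2016"  # Default year if no valid year is found

        # Extract genres from the pipe-separated field
        genres = movie['genres'].split('|')
        genres_json = ', '.join(f'"{genre}"' for genre in genres)

        # Document line: must stay on a single line for the bulk API
        print(f'{{ "id": "{movie["movieId"]}", "title": "{title}", "year": {year}, "genre": [{genres_json}] }}')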
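The script above assembles JSON by string formatting, which is why it has to strip double quotes from titles. As a hedged alternative, the sketch below builds the same two lines per movie with `json.dumps`, which handles escaping. The `bulk_lines` function name and the inline sample row are assumptions for illustration; unlike the original script, this variant keeps double quotes in titles (escaped) rather than removing them.

```python
import csv
import io
import json
import re


def bulk_lines(movie, index="movies"):
    """Return the (action, document) line pair for one CSV row."""
    raw_title = movie["title"]
    title = re.sub(r" \(.*\)$", "", raw_title)   # drop the trailing "(...)" part
    year = raw_title[-5:-1]                      # e.g. "1995" from "Toy Story (1995)"
    action = {"create": {"_index": index, "_id": movie["movieId"]}}
    doc = {
        "id": movie["movieId"],
        "title": title,
        "year": int(year) if year.isdecimal() else 2016,  # same default as the script
        "genre": movie["genres"].split("|"),
    }
    # json.dumps emits one line per object, as the bulk format requires
    return json.dumps(action), json.dumps(doc, ensure_ascii=False)


# Small in-memory sample standing in for ml-latest-small/movies.csv
sample = io.StringIO("movieId,title,genres\n1,Toy Story (1995),Adventure|Animation\n")
for row in csv.DictReader(sample):
    for line in bulk_lines(row):
        print(line)
```

Swapping the `io.StringIO` sample for the opened CSV file should give output that can be redirected to a file in the same way as the original script.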
Steps:
- Download the dataset.
- Unzip the package.
  unzip ml-latest-small.zip
- Run the Python script and redirect its output to a file.
  python3 movies-to-json.py > movies-2.json
- Import the new dataset into Elasticsearch.
  Info: Store the Elasticsearch endpoint and credentials in variables:
  ELASTIC_ENDPOINT="https://your-elasticsearch-endpoint"
  ELASTIC_USER="your-username"
  ELASTIC_PW="your-password"

  curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
    -H 'Content-Type: application/json' \
    -XPUT "$ELASTIC_ENDPOINT:9200/_bulk?pretty" \
    --data-binary @movies-2.json | jq
- Now try to query for a movie title.
  curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
    -H 'Content-Type: application/json' \
    -XGET "$ELASTIC_ENDPOINT:9200/movies/_search?q=shrek" | jq
Output:
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 10.226742,
    "hits": [
      {
        "_index": "movies",
        "_id": "4306",
        "_score": 10.226742,
        "_source": {
          "id": "4306",
          "title": "Shrek",
          "year": 2001,
          "genre": ["Adventure", "Animation", "Children", "Comedy", "Fantasy", "Romance"]
        }
      },
      {
        "_index": "movies",
        "_id": "8360",
        "_score": 8.57909,
        "_source": {
          "id": "8360",
          "title": "Shrek 2",
          "year": 2004,
          "genre": ["Adventure", "Animation", "Children", "Comedy", "Musical", "Romance"]
        }
      },
      {
        "_index": "movies",
        "_id": "53121",
        "_score": 7.3886833,
        "_source": {
          "id": "53121",
          "title": "Shrek the Third",
          "year": 2007,
          "genre": ["Adventure", "Animation", "Children", "Comedy", "Fantasy"]
        }
      },
      {
        "_index": "movies",
        "_id": "64249",
        "_score": 7.3886833,
        "_source": {
          "id": "64249",
          "title": "Shrek the Halls",
          "year": 2007,
          "genre": ["Adventure", "Animation", "Comedy", "Fantasy"]
        }
      },
      {
        "_index": "movies",
        "_id": "78637",
        "_score": 7.3886833,
        "_source": {
          "id": "78637",
          "title": "Shrek Forever After",
          "year": 2010,
          "genre": ["Adventure", "Animation", "Children", "Comedy", "Fantasy", "IMAX"]
        }
      }
    ]
  }
}
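To use the search result in a script rather than reading it by eye, the matching documents can be pulled out of `hits.hits`. The sketch below works on a trimmed copy of the response shown above (first two hits only); the `summarize_hits` name is an assumption for illustration.

```python
import json

# Trimmed copy of the search response above, keeping only the first two hits
response = json.loads("""
{
  "hits": {
    "total": {"value": 5, "relation": "eq"},
    "hits": [
      {"_id": "4306", "_score": 10.226742,
       "_source": {"id": "4306", "title": "Shrek", "year": 2001}},
      {"_id": "8360", "_score": 8.57909,
       "_source": {"id": "8360", "title": "Shrek 2", "year": 2004}}
    ]
  }
}
""")


def summarize_hits(response):
    """Return (id, title, year, score) tuples in the order Elasticsearch ranked them."""
    return [
        (hit["_id"], hit["_source"]["title"], hit["_source"]["year"], hit["_score"])
        for hit in response["hits"]["hits"]
    ]


for movie_id, title, year, score in summarize_hits(response):
    print(f"{score:.2f}  {title} ({year})  id={movie_id}")
```

The indexed fields live under `_source`, while `_id` and `_score` sit on the hit itself, so both levels are needed to build a readable summary.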