Using Client Libraries
Using Client APIs
Elasticsearch provides client libraries for most programming languages. This makes it easier to interact with Elasticsearch without dealing directly with raw JSON.
- Java has an official client maintained by Elastic.
- Python offers the elasticsearch package.
- Ruby supports elasticsearch-ruby.
- Scala has multiple client options.
- Perl uses the elasticsearch.pm module.
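To make concrete what "raw JSON" means here, the sketch below assembles, without sending, the kind of HTTP request one would otherwise write by hand for a title search. It uses only the Python standard library. The endpoint, index name, search term, and the helper name build_raw_search are illustrative assumptions, not values from this guide.

```python
import json
import urllib.request


def build_raw_search(endpoint, index, title):
    """Assemble (but do not send) a raw _search request for a title match."""
    # The query body has to be serialized to JSON by hand.
    body = json.dumps({"query": {"match": {"title": title}}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{endpoint}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and index, used only for illustration.
request = build_raw_search("https://localhost:9200", "ratings", "Star Wars")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

A client library wraps the URL construction, serialization, and headers behind method calls, so application code mostly passes native data structures instead.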
Python Client Library
-
Run an update and install required packages.
sudo apt update
sudo apt install -y python3-pip
sudo pip3 uninstall urllib3 chardet
sudo pip3 install --upgrade requests
sudo pip3 install "urllib3==1.26.6"
sudo pip3 install "chardet==3.0.4"
sudo pip3 install "charset-normalizer==2.0.4"
-
Install the elasticsearch library using pip.
sudo pip3 install elasticsearch
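Which client version pip installs depends on when the command is run, and older releases do not accept every constructor argument used later in this section. A minimal sketch for checking the installed version, using only the standard library (the helper name installed_version is hypothetical):

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None if the package is not installed.
print(installed_version("elasticsearch"))
```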
-
If you haven't already done so, download the dataset from the previous section.
Unzip the package.
unzip ml-latest-small.zip
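Before indexing, it can help to confirm that the CSV files contain the columns the import script reads (movieId and title from movies.csv; userId, movieId, rating, and timestamp from ratings.csv). The sketch below shows one way to check a header with csv.DictReader. It runs against small in-memory stand-ins rather than the real files; the sample rows and the helper name missing_columns are illustrative assumptions.

```python
import csv
import io

# Columns the import script reads from each file.
REQUIRED_MOVIE_COLUMNS = {"movieId", "title"}
REQUIRED_RATING_COLUMNS = {"userId", "movieId", "rating", "timestamp"}


def missing_columns(csvfile, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(csvfile)
    return required - set(reader.fieldnames or [])


# In-memory stand-ins for the real files (sample rows are illustrative).
sample_movies = io.StringIO("movieId,title,genres\n1,Toy Story (1995),Animation\n")
sample_ratings = io.StringIO("userId,movieId,rating,timestamp\n1,1,4.0,964982703\n")

print(missing_columns(sample_movies, REQUIRED_MOVIE_COLUMNS))
print(missing_columns(sample_ratings, REQUIRED_RATING_COLUMNS))
```

To check the real files, pass an open file object, for example open("ml-latest-small/movies.csv", encoding="utf-8", newline=""), in place of the StringIO objects.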
-
The following sample Python script uses the Elasticsearch client library to import movie ratings into Elasticsearch.
Notes:
- SSL is enabled on the Elasticsearch cluster in this example.
- Authentication is done using a username and password, without certificates.
- The default cluster URL is $ELASTIC_ENDPOINT:9200.
import csv
import elasticsearch
from elasticsearch import helpers
import urllib3
import sys
import warnings
import getpass
from collections import deque

# Suppress specific warnings
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
warnings.filterwarnings("ignore", category=UserWarning, module='requests')
warnings.filterwarnings("ignore", category=Warning, module='elasticsearch')


# Read movie titles into a movieId -> title lookup
def readMovies(movies_path):
    titleLookup = {}
    with open(movies_path, 'r', encoding='utf-8', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for movie in reader:
            titleLookup[movie['movieId']] = movie['title']
    return titleLookup


# Read ratings and yield one document per rating row
def readRatings(movies_path, ratings_path):
    titleLookup = readMovies(movies_path)
    with open(ratings_path, 'r', encoding='utf-8', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for line in reader:
            yield {
                'user_id': int(line['userId']),
                'movie_id': int(line['movieId']),
                'title': titleLookup[line['movieId']],
                'rating': float(line['rating']),
                'timestamp': int(line['timestamp']),
            }


def main():
    if len(sys.argv) < 3:
        print("Usage: python3 indexratings.py <path_to_movies.csv> <path_to_ratings.csv>")
        sys.exit(1)

    movies_path = sys.argv[1]
    ratings_path = sys.argv[2]

    # Placeholder: Python does not expand $ELASTIC_ENDPOINT inside a string,
    # so replace this value with your actual cluster URL.
    es_host = "$ELASTIC_ENDPOINT:9200"
    es_username = input("Enter Elasticsearch username: ")
    es_password = getpass.getpass("Enter Elasticsearch password: ")

    es = elasticsearch.Elasticsearch(
        [es_host],
        basic_auth=(es_username, es_password),
        verify_certs=False  # Disable SSL certificate verification
    )

    # Delete any existing index; a 404 (index not found) response is ignored
    es.options(ignore_status=404).indices.delete(index="ratings")

    # parallel_bulk is a lazy generator; deque(..., maxlen=0) consumes it
    # fully without keeping the per-document results
    deque(helpers.parallel_bulk(es, readRatings(movies_path, ratings_path), index="ratings"), maxlen=0)

    # Make the newly indexed documents visible to search
    es.indices.refresh(index="ratings")


if __name__ == "__main__":
    main()
-
To run the code, pass the movies.csv and the ratings.csv files.
python3 indexratings.py ml-latest-small/movies.csv ml-latest-small/ratings.csv
You can also put the script and the CSV files in the same directory and run:
python3 indexratings.py movies.csv ratings.csv
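The script wraps helpers.parallel_bulk in deque(..., maxlen=0). parallel_bulk returns a lazy generator, so no documents are sent until it is iterated, and a zero-length deque is a common idiom for exhausting an iterator while discarding its results. The sketch below illustrates the idiom with a stand-in generator. fake_bulk is hypothetical: it sends nothing to Elasticsearch and only mimics yielding one (ok, info) tuple per action.

```python
from collections import deque

sent = []


def fake_bulk(actions):
    """Stand-in for a lazy bulk helper: does work only while being iterated."""
    for action in actions:
        sent.append(action)    # pretend the document was indexed
        yield (True, action)   # one (ok, info) result per action


actions = [{"user_id": 1, "rating": 4.0}, {"user_id": 2, "rating": 3.5}]

results = fake_bulk(actions)
print(len(sent))  # 0: creating the generator does no work yet

deque(results, maxlen=0)  # iterate to the end, keep nothing
print(len(sent))  # 2: every action has now been processed
```

To inspect failures instead of discarding the results, iterate over the generator in a regular for loop and check the ok flag of each tuple.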
-
Provide username and password when prompted.
Enter Elasticsearch username: elastic
Enter Elasticsearch password:
The Elasticsearch host is set to $ELASTIC_ENDPOINT:9200. If your cluster uses a different address, you may need to update the script accordingly.
-
Run a query to verify that the index has been created and populated.
Store the Elasticsearch endpoint and credentials in variables:
ELASTIC_ENDPOINT="https://your-elasticsearch-endpoint"
ELASTIC_USER="your-username"
ELASTIC_PW="your-password"
curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
-H 'Content-Type: application/json' \
-XGET "$ELASTIC_ENDPOINT:9200/ratings/_search?pretty" | jq
Output:
"hits": [
{
"_index": "ratings",
"_id": "NyP57pMBek9Hxsebqms0",
"_score": 1,
"_source": {
"user_id": 263,
"movie_id": 6708,
"title": "Matchstick Men (2003)",
"rating": 3,
"timestamp": 1090948299
}
},
{
"_index": "ratings",
"_id": "OCP57pMBek9Hxsebqms0",
"_score": 1,
"_source": {
"user_id": 263,
"movie_id": 6711,
"title": "Lost in Translation (2003)",
"rating": 4.5,
"timestamp": 1090948232
}
},
{
"_index": "ratings",
"_id": "OSP57pMBek9Hxsebqms0",
"_score": 1,
"_source": {
"user_id": 263,
"movie_id": 6773,
"title": "Triplets of Belleville, The (Les triplettes de Belleville) (2003)",
"rating": 3.5,
"timestamp": 1090948250
}
},