Using Client Libraries
Using Client APIs
Elasticsearch provides client libraries for many popular programming languages. These make it easier to interact with Elasticsearch without handling raw JSON requests directly.
- Java has an official client maintained by Elastic.
- Python offers the elasticsearch package.
- Ruby supports elasticsearch-ruby.
- Scala has multiple client options.
- Perl uses the elasticsearch.pm module.
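To illustrate what a client library saves you, here is a minimal sketch that builds a raw search request by hand using only the Python standard library. The endpoint URL, the index name, and the helper name build_search_request are hypothetical and chosen for illustration; the request is constructed but never sent.

```python
import json
import urllib.request


def build_search_request(base_url, index, field, value):
    """Build (but do not send) a raw _search request with a match query."""
    # Without a client library, the query DSL is written out as JSON by hand.
    body = {"query": {"match": {field: value}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and index, for illustration only.
request = build_search_request(
    "https://localhost:9200", "ratings", "title", "Matchstick Men"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

A client library wraps these details (serialization, headers, authentication, connection handling) behind method calls, so application code does not have to assemble each request itself.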
Python Client Library
-
Run an update and install required packages.
    sudo apt update
    sudo apt install -y python3-pip
    sudo pip3 uninstall urllib3 chardet
    sudo pip3 install --upgrade requests
    sudo pip3 install "urllib3==1.26.6"
    sudo pip3 install "chardet==3.0.4"
    sudo pip3 install "charset-normalizer==2.0.4"
-
Install the elasticsearch library using pip.
    sudo pip3 install elasticsearch
-
Download the dataset from the previous section (if you haven't done so already), then unzip the package.
    unzip ml-latest-small.zip
-
The sample Python script below uses the Elasticsearch client library to import movie ratings into Elasticsearch.
Notes:
- SSL is enabled on the Elasticsearch cluster in this example.
- Authentication is done using a username and password, without certificates.
- The default cluster URL is $ELASTIC_ENDPOINT:9200.
    import csv
    import elasticsearch
    from elasticsearch import helpers
    import urllib3
    import sys
    import warnings
    import getpass
    from collections import deque

    # Suppress specific warnings
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    warnings.filterwarnings("ignore", category=UserWarning, module='requests')
    warnings.filterwarnings("ignore", category=Warning, module='elasticsearch')

    # Read movie titles into a lookup table keyed by movieId
    def readMovies(movies_path):
        titleLookup = {}
        with open(movies_path, 'r') as csvfile:
            reader = csv.DictReader(csvfile)
            for movie in reader:
                titleLookup[movie['movieId']] = movie['title']
        return titleLookup

    # Read ratings and yield one document per rating, with the movie title attached
    def readRatings(movies_path, ratings_path):
        titleLookup = readMovies(movies_path)
        with open(ratings_path, 'r') as csvfile:
            reader = csv.DictReader(csvfile)
            for line in reader:
                yield {
                    'user_id': int(line['userId']),
                    'movie_id': int(line['movieId']),
                    'title': titleLookup[line['movieId']],
                    'rating': float(line['rating']),
                    'timestamp': int(line['timestamp']),
                }

    def main():
        if len(sys.argv) < 3:
            print("Usage: python3 indexratings.py <path_to_movies.csv> <path_to_ratings.csv>")
            sys.exit(1)

        movies_path = sys.argv[1]
        ratings_path = sys.argv[2]

        # Placeholder: replace with your cluster URL, because Python does not
        # expand the shell variable $ELASTIC_ENDPOINT inside a string.
        es_host = "$ELASTIC_ENDPOINT:9200"
        es_username = input("Enter Elasticsearch username: ")
        es_password = getpass.getpass("Enter Elasticsearch password: ")

        es = elasticsearch.Elasticsearch(
            [es_host],
            basic_auth=(es_username, es_password),
            verify_certs=False  # Disable SSL certificate verification
        )

        # Start from a clean index; a missing index (404) is not an error
        es.indices.delete(index="ratings", ignore=404)
        # parallel_bulk returns a lazy generator; deque(..., maxlen=0) consumes it
        deque(helpers.parallel_bulk(es, readRatings(movies_path, ratings_path), index="ratings"), maxlen=0)
        es.indices.refresh(index="ratings")

    if __name__ == "__main__":
        main()
-
To run the script, pass the paths to movies.csv and ratings.csv as arguments.
    python3 indexratings.py ml-latest-small/movies.csv ml-latest-small/ratings.csv
Note: You can put the script and the files in the same directory and run:
    python3 indexratings.py movies.csv ratings.csv
-
Provide the username and password when prompted.
    Enter Elasticsearch username: elastic
    Enter Elasticsearch password:
Note: The Elasticsearch host is set to $ELASTIC_ENDPOINT:9200. If your cluster uses a different address, update the script accordingly.
-
Run a query to verify that the index has been created and populated.
Note: Store the Elasticsearch endpoint and credentials in variables:
    ELASTIC_ENDPOINT="https://your-elasticsearch-endpoint"
    ELASTIC_USER="your-username"
    ELASTIC_PW="your-password"
    curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
      -H 'Content-Type: application/json' \
      -XGET "$ELASTIC_ENDPOINT:9200/ratings/_search?pretty" | jq .
Output:
    "hits": [
      {
        "_index": "ratings",
        "_id": "NyP57pMBek9Hxsebqms0",
        "_score": 1,
        "_source": {
          "user_id": 263,
          "movie_id": 6708,
          "title": "Matchstick Men (2003)",
          "rating": 3,
          "timestamp": 1090948299
        }
      },
      {
        "_index": "ratings",
        "_id": "OCP57pMBek9Hxsebqms0",
        "_score": 1,
        "_source": {
          "user_id": 263,
          "movie_id": 6711,
          "title": "Lost in Translation (2003)",
          "rating": 4.5,
          "timestamp": 1090948232
        }
      },
      {
        "_index": "ratings",
        "_id": "OSP57pMBek9Hxsebqms0",
        "_score": 1,
        "_source": {
          "user_id": 263,
          "movie_id": 6773,
          "title": "Triplets of Belleville, The (Les triplettes de Belleville) (2003)",
          "rating": 3.5,
          "timestamp": 1090948250
        }
      },
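A search response like this one can also be post-processed in Python. The following is a minimal sketch, assuming the response has already been parsed into a dictionary with the standard hits.hits layout; the sample documents are copied from the output shown in this section (trimmed to two fields), and the helper name average_rating is hypothetical.

```python
import json

# Sample hits copied from the output in this section, trimmed to the fields used here.
response = json.loads("""
{
  "hits": {
    "hits": [
      {"_source": {"title": "Matchstick Men (2003)", "rating": 3}},
      {"_source": {"title": "Lost in Translation (2003)", "rating": 4.5}},
      {"_source": {"title": "Triplets of Belleville, The (Les triplettes de Belleville) (2003)", "rating": 3.5}}
    ]
  }
}
""")


def average_rating(search_response):
    """Return the mean rating across the returned hits, or None if there are none."""
    ratings = [hit["_source"]["rating"] for hit in search_response["hits"]["hits"]]
    return sum(ratings) / len(ratings) if ratings else None


# Print each returned rating next to its movie title.
for hit in response["hits"]["hits"]:
    doc = hit["_source"]
    print(f'{doc["rating"]:>4}  {doc["title"]}')

print("average:", round(average_rating(response), 2))
```

Note that a search returns only a page of results (10 hits by default), so an average computed this way covers the returned hits, not the whole index.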