
Full-Text Search

Updated Dec 30, 2022

Overview

Analyzers in Elasticsearch control how text is broken into searchable terms, both when documents are indexed and when queries run. By choosing how each field is mapped and analyzed, you can tune searches for exact or partial matches and improve accuracy and relevance.

  • Exact Match

    • Matches the exact text.
    • Use keyword mapping instead of text.
    • Ideal for precise searches like product IDs.
  • Partial Match

    • Matches individual words (terms) within the text.
    • Use text mapping, which runs values through an analyzer.
    • Useful for flexible searches like titles or descriptions.

Searching Keywords

info

The example below was tested on a running Elasticsearch 8 cluster and uses a dataset of movie ratings. For more information, see [Importing by Bulk](/docs/018-Observability/020-Elastic-Stack/003-Mapping-and-indexing/013-Updating-Data.md#importing-by-bulk).

First, store the Elasticsearch endpoint and credentials in variables:

ELASTIC_ENDPOINT="https://your-elasticsearch-endpoint"
ELASTIC_USER="your-username"
ELASTIC_PW="your-password"

Search the index for "Star Trek" movies.

curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
  -H 'Content-Type: application/json' \
  -XGET "$ELASTIC_ENDPOINT:9200/movies/_search?pretty" -d '
{
  "query": {
    "match": {
      "title": "Star Trek"
    }
  }
}' | jq

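Notice that this query returns both the "Star Trek" and the "Star Wars" documents. The match query analyzes "Star Trek" into the terms star and trek and, by default, returns any document that contains at least one of them. "Star Trek Beyond" matches both terms and gets a higher score (2.6716127), while the "Star Wars" document matches only star and gets a lower score (0.73723686), indicating that it is less relevant to the search.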

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 2.6716127,
    "hits": [
      {
        "_index": "movies",
        "_id": "135569",
        "_score": 2.6716127,
        "_source": {
          "id": "135569",
          "title": "Star Trek Beyond",
          "year": 2016,
          "genre": [
            "Action",
            "Adventure",
            "Sci-Fi"
          ]
        }
      },
      {
        "_index": "movies",
        "_id": "122886",
        "_score": 0.73723686,
        "_source": {
          "id": "122886",
          "title": "Star Wars: Episode VII - The Force Awakens",
          "year": 2015,
          "genre": [
            "Action",
            "Adventure",
            "Fantasy",
            "Sci-Fi",
            "IMAX"
          ]
        }
      }
    ]
  }
}
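If you only want documents that contain every search term, the match query also accepts an "operator" parameter. The sketch below wraps such a query in a shell function; the name search_title_all_terms is hypothetical, and it assumes the ELASTIC_* variables defined above and a search string without double quotes or backslashes.

```shell
# Hypothetical helper: match query on "title" that requires ALL analyzed terms
# (operator "and") instead of the default, where any single term is enough.
# Assumes ELASTIC_ENDPOINT, ELASTIC_USER and ELASTIC_PW are already set.
search_title_all_terms() {
  local body
  # $1 is pasted into the JSON as-is, so it must not contain " or \.
  body=$(printf '{"query": {"match": {"title": {"query": "%s", "operator": "and"}}}}' "$1")
  curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
    -H 'Content-Type: application/json' \
    -XGET "$ELASTIC_ENDPOINT:9200/movies/_search" \
    -d "$body"
}
```

For example, search_title_all_terms "Star Trek" | jq should no longer return the Star Wars document, since its title has no trek term.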

Try another query. Search for movies in the "sci-fi" genre.

curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
  -H 'Content-Type: application/json' \
  -XGET "$ELASTIC_ENDPOINT:9200/movies/_search?pretty" -d '
{
  "query": {
    "match": {
      "genre": "sci-fi"
    }
  }
}' | jq

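This query returns every movie in the "Sci-Fi" genre, even though the search term is lowercase. Because genre is currently an analyzed text field, the default standard analyzer lowercased "Sci-Fi" at index time and split it at the hyphen into the terms sci and fi. The query string "sci-fi" is analyzed the same way, so its terms match.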

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 0.81092054,
    "hits": [
      {
        "_index": "movies",
        "_id": "1924",
        "_score": 0.81092054,
        "_source": {
          "id": "1924",
          "title": "Plan 9 from Outer Space",
          "year": 1959,
          "genre": [
            "Horror",
            "Sci-Fi"
          ]
        }
      },
      {
        "_index": "movies",
        "_id": "135569",
        "_score": 0.7309394,
        "_source": {
          "id": "135569",
          "title": "Star Trek Beyond",
          "year": 2016,
          "genre": [
            "Action",
            "Adventure",
            "Sci-Fi"
          ]
        }
      },
      {
        "_index": "movies",
        "_id": "122886",
        "_score": 0.61051,
        "_source": {
          "id": "122886",
          "title": "Star Wars: Episode VII - The Force Awakens",
          "year": 2015,
          "genre": [
            "Action",
            "Adventure",
            "Fantasy",
            "Sci-Fi",
            "IMAX"
          ]
        }
      }
    ]
  }
}
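To see the tokens behind this behavior, you can run text through an analyzer with the _analyze API. The sketch below assumes genre uses the default standard analyzer; the function name show_standard_tokens is hypothetical, and the text is pasted into JSON as-is, so it should not contain double quotes or backslashes.

```shell
# Hypothetical helper: asks Elasticsearch which tokens the built-in
# "standard" analyzer produces for the given text.
# Assumes ELASTIC_ENDPOINT, ELASTIC_USER and ELASTIC_PW are already set.
show_standard_tokens() {
  local body
  body=$(printf '{"analyzer": "standard", "text": "%s"}' "$1")
  curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
    -H 'Content-Type: application/json' \
    -XGET "$ELASTIC_ENDPOINT:9200/_analyze" \
    -d "$body"
}
```

Running show_standard_tokens "Sci-Fi" | jq '.tokens[].token' should list sci and fi, the same terms that the query string "sci-fi" produces.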

Enforcing Exact Match

To enforce exact matching on genre, we need to change the index mappings. Elasticsearch does not let you change the type of an existing field in place, so we delete the index, recreate it with new mappings, and reindex the data.
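Before deleting the index, you can check how its fields are currently mapped, and run the same check again after recreating it. A minimal sketch; the function name show_movies_mapping is hypothetical and it assumes the ELASTIC_* variables defined above.

```shell
# Hypothetical helper: prints the current mapping of the "movies" index.
# Assumes ELASTIC_ENDPOINT, ELASTIC_USER and ELASTIC_PW are already set.
show_movies_mapping() {
  curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
    -XGET "$ELASTIC_ENDPOINT:9200/movies/_mapping"
}
```

For example, show_movies_mapping | jq '.movies.mappings.properties.genre' prints the mapping of the genre field alone. If the index was created by dynamic mapping, expect a text field with a keyword sub-field.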

curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
  -H 'Content-Type: application/json' \
  -XDELETE "$ELASTIC_ENDPOINT:9200/movies"

Output:

{"acknowledged":true} 

Next, redefine the mappings. The genre field uses the keyword type for exact matches, while the title field uses the text type to allow partial matches. The title field also uses the built-in english analyzer, which removes English stopwords and applies stemming.

curl -u "$ELASTIC_USER:$ELASTIC_PW" \
  -H 'Content-Type: application/json' \
  -XPUT "$ELASTIC_ENDPOINT:9200/movies" -d '
{
  "mappings": {
    "properties": {
      "id": {"type": "integer"},
      "year": {"type": "date"},
      "genre": {"type": "keyword"},
      "title": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}'

Output:

{"acknowledged":true} 
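To check what the english analyzer does to a title, you can pass text to the _analyze API. A sketch with a hypothetical function name, show_english_tokens, assuming the ELASTIC_* variables defined above and input without double quotes or backslashes:

```shell
# Hypothetical helper: shows the tokens the built-in "english" analyzer
# produces (stopword removal, lowercasing, stemming) for the given text.
# Assumes ELASTIC_ENDPOINT, ELASTIC_USER and ELASTIC_PW are already set.
show_english_tokens() {
  local body
  body=$(printf '{"analyzer": "english", "text": "%s"}' "$1")
  curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
    -H 'Content-Type: application/json' \
    -XGET "$ELASTIC_ENDPOINT:9200/_analyze" \
    -d "$body"
}
```

For example, show_english_tokens "Star Wars: Episode VII - The Force Awakens" | jq '.tokens[].token' should show the stopword "the" removed and the remaining words lowercased and stemmed.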

Now, we reindex the data using the movies.json file.

curl -u "$ELASTIC_USER:$ELASTIC_PW" \
  -H 'Content-Type: application/json' \
  -XPUT "$ELASTIC_ENDPOINT:9200/_bulk?pretty" \
  --data-binary @movies.json
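The bulk API reads newline-delimited JSON: an action line followed by the document it applies to, and the body must end with a newline. The sketch below writes a two-movie file in that layout, assuming movies.json follows the same structure. The file name sample-movies.json is hypothetical, and the two records are the ones shown in the search results on this page.

```shell
# Sketch of the NDJSON layout expected by _bulk (assumed to match movies.json):
# each document is preceded by an action line naming the index and _id.
# The quoted 'EOF' keeps the shell from expanding anything inside the JSON,
# and the heredoc ends with the trailing newline that _bulk requires.
cat > sample-movies.json <<'EOF'
{ "create": { "_index": "movies", "_id": "135569" } }
{ "id": "135569", "title": "Star Trek Beyond", "year": 2016, "genre": ["Action", "Adventure", "Sci-Fi"] }
{ "create": { "_index": "movies", "_id": "122886" } }
{ "id": "122886", "title": "Star Wars: Episode VII - The Force Awakens", "year": 2015, "genre": ["Action", "Adventure", "Fantasy", "Sci-Fi", "IMAX"] }
EOF
```

You can load this file with the same curl command as above by replacing @movies.json with @sample-movies.json.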

Next, search for movies with the "sci-fi" genre.

curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
  -H 'Content-Type: application/json' \
  -XGET "$ELASTIC_ENDPOINT:9200/movies/_search?pretty" -d '
{
  "query": {
    "match": {
      "genre": "sci-fi"
    }
  }
}' | jq

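Because genre is now a keyword field, its values are indexed as-is, without lowercasing or splitting into terms. The query string "sci-fi" must therefore equal a stored value exactly, and since the documents contain "Sci-Fi", the search returns no results.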

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
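Because keyword values are matched exactly, searching for the stored spelling does work. This sketch uses a term query, which does not analyze its input; the function name search_genre_exact is hypothetical, and it assumes the ELASTIC_* variables defined above and a value without double quotes or backslashes.

```shell
# Hypothetical helper: exact, case-sensitive lookup on the keyword "genre" field.
# Assumes ELASTIC_ENDPOINT, ELASTIC_USER and ELASTIC_PW are already set.
search_genre_exact() {
  local body
  body=$(printf '{"query": {"term": {"genre": "%s"}}}' "$1")
  curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
    -H 'Content-Type: application/json' \
    -XGET "$ELASTIC_ENDPOINT:9200/movies/_search" \
    -d "$body"
}
```

With this mapping, search_genre_exact "Sci-Fi" | jq '.hits.total.value' should count the movies tagged Sci-Fi, while search_genre_exact "sci-fi" should still find none.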

Finally, search for "star wars" using the title field.

curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
  -H 'Content-Type: application/json' \
  -XGET "$ELASTIC_ENDPOINT:9200/movies/_search?pretty" -d '
{
  "query": {
    "match": {
      "title": "star wars"
    }
  }
}' | jq

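Because title is an analyzed text field, the query "star wars" is split into separate terms, and the match query returns documents that contain any of them. As a result, both "Star Wars" and "Star Trek" are returned, with "Star Wars" scoring higher because it matches both terms.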

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.5458014,
    "hits": [
      {
        "_index": "movies",
        "_id": "122886",
        "_score": 1.5458014,
        "_source": {
          "id": "122886",
          "title": "Star Wars: Episode VII - The Force Awakens",
          "year": 2015,
          "genre": [
            "Action",
            "Adventure",
            "Fantasy",
            "Sci-Fi",
            "IMAX"
          ]
        }
      },
      {
        "_index": "movies",
        "_id": "135569",
        "_score": 0.8025915,
        "_source": {
          "id": "135569",
          "title": "Star Trek Beyond",
          "year": 2016,
          "genre": [
            "Action",
            "Adventure",
            "Sci-Fi"
          ]
        }
      }
    ]
  }
}
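If you want titles that contain the words next to each other and in order, rather than any of them, a match_phrase query enforces that. A minimal sketch; the function name search_title_phrase is hypothetical, and it assumes the ELASTIC_* variables defined at the top of this page and a search string without double quotes or backslashes.

```shell
# Hypothetical helper: phrase search on "title"; all terms must appear
# adjacent and in the same order as in the query.
# Assumes ELASTIC_ENDPOINT, ELASTIC_USER and ELASTIC_PW are already set.
search_title_phrase() {
  local body
  body=$(printf '{"query": {"match_phrase": {"title": "%s"}}}' "$1")
  curl -s -u "$ELASTIC_USER:$ELASTIC_PW" \
    -H 'Content-Type: application/json' \
    -XGET "$ELASTIC_ENDPOINT:9200/movies/_search" \
    -d "$body"
}
```

For example, search_title_phrase "star wars" | jq should return the Star Wars document but not "Star Trek Beyond", whose title does not contain that phrase.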