Flattened Data Type

Updated Dec 30, 2022 ·

Overview

When dealing with deeply nested or highly dynamic fields in Elasticsearch, you may encounter mapping explosion, where the number of fields in the mapping grows excessively. This can lead to performance issues and make the mapping difficult to manage.

Cluster Mappings

Download the dataset below:

demo-default.json

First, store the Elasticsearch endpoint and credentials in variables:

ELASTIC_ENDPOINT="https://your-elasticsearch-endpoint"
ELASTIC_USER="your-username"
ELASTIC_PW="your-password"

Index the dataset from the demo-default.json file into the Elasticsearch.

curl -s -u $ELASTIC_USER:$ELASTIC_PW \
-H 'Content-Type: application/json' \
-XPUT "$ELASTIC_ENDPOINT:9200/demo-default/_doc/1" \
-d @demo-default.json | jq

Output:

{
  "_index": "demo-default",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

Next, you can check the mappings of the demo-default index:

curl -s -u $ELASTIC_USER:$ELASTIC_PW \
-H 'Content-Type: application/json' \
-XGET "$ELASTIC_ENDPOINT:9200/demo-default/_mapping?pretty=true" | jq 

The returned output shows the automatically assigned field types by Elasticsearch for each field. We didn’t explicitly define any field types, but Elasticsearch inferred the types based on the data provided.

{
  "demo-default": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "fileset": {
          "properties": {
            "name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "host": {
          "properties": {
            "hostname": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "message": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "process": {
          "properties": {
            "name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "pid": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}

Cluster State

You can check the state of the Elasticsearch cluster using the following command:

curl -s -u $ELASTIC_USER:$ELASTIC_PW \
-H 'Content-Type: application/json' \
-XGET "$ELASTIC_ENDPOINT:9200/_cluster/state?pretty=true" | jq >> es-cluster-state.json

As a recap, an Elasticsearch cluster consists of a master node that sends the latest cluster state to all other nodes. When a node receives the updated state, it acknowledges the master node.

When new fields are added to documents, Elasticsearch creates new mappings, which also update the cluster state. Regularly adding fields can cause the cluster state to grow, leading to delays and performance issues.

Too many fields in the mapping can result in a mapping explosion, which can then impact memory and overall cluster performance. To prevent mapping explosions, Elasticsearch introduces the flattened datatype.

Using Flattened Data Type

The flattened data type allows us to index complex, nested JSON objects while keeping the data structure flat.

Flattens complex data for easier querying.
Reduces mapping complexity and improves query performance.

Example: Creating and Mapping the Index

First, create the index demo-flattened:

curl -s -u $ELASTIC_USER:$ELASTIC_PW \
-H 'Content-Type: application/json' \
-XPUT $ELASTIC_ENDPOINT:9200/demo-flattened | jq

Output:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "demo-flattened"
} 

Next, define the mapping for the host field as a flattened type:

curl -s -u $ELASTIC_USER:$ELASTIC_PW \
-H 'Content-Type: application/json' \
-XPUT "$ELASTIC_ENDPOINT:9200/demo-flattened/_mapping" -d '{
  "properties": {
    "host": {
      "type": "flattened"
    }
  }
}' | jq

Indexing the Dataset

Now, index the same demo-default.json file from the previous example into the demo-flattened index:

curl -s -u $ELASTIC_USER:$ELASTIC_PW \
-H 'Content-Type: application/json' \
-XPUT "$ELASTIC_ENDPOINT:9200/demo-flattened/_doc/1" \
-d @demo-default.json | jq

Output;

{
  "_index": "demo-flattened",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
} 

Checking the Mapping

To verify the mapping of the host field, run the following command:

curl -s -u $ELASTIC_USER:$ELASTIC_PW \
-H 'Content-Type: application/json' \
-XGET "$ELASTIC_ENDPOINT:9200/demo-flattened/_mapping?pretty=true" | jq 

You should see the following for the host field:

  "host": {
    "type": "flattened"
  },

Adding Inner Fields to the Flattened Field

Next, let's add new inner fields to the host field:

curl -s -u $ELASTIC_USER:$ELASTIC_PW \
-H 'Content-Type: application/json' \
-X POST "$ELASTIC_ENDPOINT:9200/demo-flattened/_update/1" -d '{
  "doc": {
    "host": {
      "osVersion": "Bionic Beaver",
      "osArchitecture": "x86_64"
    }
  }
}' | jq

You would expect the inner fields to be added to the mapping. However, because the host field is defined as a flattened type, Elasticsearch will not automatically add these new fields to the mapping.

To confirm this, check the mapping again:

curl -s -u $ELASTIC_USER:$ELASTIC_PW \
-H 'Content-Type: application/json' \
-XGET "$ELASTIC_ENDPOINT:9200/demo-flattened/_mapping?pretty=true" | jq 

As seen below, the flattened data type does not dynamically add new fields to the mapping when inner fields are added. This is a limitation of the flattened type, but this also significantly reduces the size of the mapping and reduces the risk of mapping explosions.

  "host": {
    "type": "flattened"
  },

Limitations

The main limitation of the flattened data type is that fields are treated as keywords, meaning no analyzers or tokenizers are applied. This limits the search capability because:

Only exact matches are supported, not partial or full-text search.
Fields aren't tokenized, so complex searches or custom parsing aren't possible.

While flattened fields are great for structured data, they are not ideal for full-text search or complex text processing.

To demonstrate the limitation, run a query on a flattened field from the previous example:

curl -s -u $ELASTIC_USER:$ELASTIC_PW \
-H 'Content-Type: application/json' \
-XGET "$ELASTIC_ENDPOINT:9200/demo-flattened/_search?pretty=true" -d'
{
  "query": {
    "term": {
      "host.osVersion": "Beaver"
    }
  }
}' | jq

The response shows that since the field is flattened, it doesn't allow partial search or text matching:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
} 

Supported Queries

Below are supported queries for the flattened data type:

Query Type	Description
`term`, `terms`, and `terms_set`	Exact matches against the field values.
`prefix`	Finds documents where the field value starts with a specified prefix.
`range`	Supports non-numerical range operations (e.g., matching string values).
`match` and `multi_match`	Exact keyword matches (not for full-text searches).
`query_string` and `simple_query_string`	Allows for complex search expressions with exact keyword matches.
`exists`	Checks if a specific field exists within the document.

Overview​

Cluster Mappings​

Cluster State​

Using Flattened Data Type​

Example: Creating and Mapping the Index​

Indexing the Dataset​

Checking the Mapping​

Adding Inner Fields to the Flattened Field​

Limitations​

Supported Queries​