Architecture
Single-Node
A single-node Elasticsearch setup runs all processes on one machine. It is suitable for small-scale applications or testing.
- Indexing, searching, and storage all happen on the same node.
- Limited scalability and no high availability, because no other node can take over if this one fails.
Recommended node configuration:
Resource | Recommended Value |
---|---|
Memory | Allocate 50% of RAM to the Elasticsearch heap (16 GB to 32 GB) |
CPU | Multi-core processor (4 to 8 cores) |
Disk | Fast SSD storage with sufficient space |
Network | At least a 1 Gbps network connection |
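The memory row can be turned into a quick sizing calculation. The sketch below assumes the common guidance of giving the JVM heap at most half of the machine's RAM and keeping it under roughly 31 GB so that compressed object pointers stay enabled. The function name `recommended_heap_gb` and the exact cap are illustrative assumptions, not values taken from this page.

```python
def recommended_heap_gb(total_ram_gb: float, cap_gb: float = 31.0) -> float:
    """Return a suggested JVM heap size in GB for a single Elasticsearch node.

    Assumption: heap = half of RAM, capped at cap_gb. The remaining RAM is
    left to the operating system's filesystem cache.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return min(total_ram_gb / 2, cap_gb)


for ram in (16, 32, 64, 128):
    print(f"{ram} GB RAM -> {recommended_heap_gb(ram):g} GB heap")
```

On a 32 GB machine this suggests a 16 GB heap; beyond about 62 GB of RAM the cap takes over and the extra memory goes to the filesystem cache instead.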
Clustered
A clustered Elasticsearch setup consists of multiple nodes working together, which makes it ideal for large-scale data and high availability. It distributes tasks across the nodes:
- Indexing, searching, and storage are spread across nodes.
- Nodes are assigned roles (e.g., master, data, or coordinating).
Each node can serve multiple roles. By distributing tasks, the cluster can handle large volumes of data and high query loads efficiently.
Namespace
A cluster in Elasticsearch provides a single namespace, making it easy to interact with data across multiple nodes.
- Combines data from all nodes into one logical view.
- Ensures consistent data access and coordination.
Node Roles
In an Elasticsearch cluster, each node can have a specific role to handle different tasks, helping with performance, scalability, and data management.
The node roles are:
-
Master Node (Master-eligible)
- Manages cluster state and metadata.
- Coordinates changes like adding/removing nodes.
- Responsible for maintaining cluster health.
Note: A cluster can only have one master at a time, but it can have multiple master-eligible nodes.
-
Data Node
- Indexes, stores, and analyzes data.
- Responsible for query execution and storage management.
- Stores data in shards and replicas.
-
Data Ingest
- Prepares and formats data before storing in data nodes.
- Transforms data as it enters the system.
-
Data Content
- Handles the storage and search of content-related data.
- Can be used for managing high-volume data.
- Typically stores unprocessed data for fast search operations.
-
ML Node
- Runs machine learning tasks like anomaly detection.
- Performs predictive analytics and training.
- Optimized for heavy computation and data processing.
-
Transform
- Executes data transformations for analytics.
- Reformats and restructures data for improved analysis.
- Useful for large-scale data aggregation tasks.
-
Remote Cluster Client
- Interacts with remote Elasticsearch clusters.
- Enables cross-cluster search and aggregation.
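In practice, roles are assigned per node through the `node.roles` setting in `elasticsearch.yml`. As a rough sketch, the snippet below maps the role labels used above to their setting values and renders the corresponding config line. The `ROLE_SETTING` table and the `render_node_roles` helper are hypothetical names made up for this illustration.

```python
# Role labels from this page mapped to the values accepted by `node.roles`.
ROLE_SETTING = {
    "Master Node": "master",
    "Data Node": "data",
    "Data Ingest": "ingest",
    "Data Content": "data_content",
    "ML Node": "ml",
    "Transform": "transform",
    "Remote Cluster Client": "remote_cluster_client",
}


def render_node_roles(labels):
    """Build the `node.roles` line for elasticsearch.yml from role labels."""
    unknown = [label for label in labels if label not in ROLE_SETTING]
    if unknown:
        raise ValueError(f"unknown role labels: {unknown}")
    roles = [ROLE_SETTING[label] for label in labels]
    # An empty role list describes a coordinating-only node.
    return "node.roles: [ " + ", ".join(roles) + " ]"


print(render_node_roles(["Master Node", "Data Node"]))
# prints: node.roles: [ master, data ]
```

Keeping the mapping in one table makes it easy to see that a single node can combine several roles, which is the usual choice for small clusters.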
In addition to these roles, Elasticsearch provides specialized data storage nodes based on access frequency:
-
Data Hot
- Stores recently indexed, frequently accessed data.
- Optimized for high-speed searches.
- For data workloads that need quick retrieval.
-
Data Warm
- Stores less frequently accessed data, but still searchable.
- Balances cost and performance for aging data.
- Slower to query than hot data but cheaper to store.
-
Data Cold
- Stores infrequently accessed, archived data.
- Optimized for long-term storage rather than query speed.
- For data that doesn't require frequent searches.
-
Data Frozen
- Stores data that is rarely accessed, with extremely low cost.
- Long-term retention with minimal query requirements.
- Data retrieval is slower than on cold nodes.
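To make the tiering idea concrete, here is a minimal sketch that picks a tier from the age of an index. The age thresholds and the `tier_for_age` helper are hypothetical and chosen only for illustration; real deployments typically set their own thresholds through index lifecycle management policies.

```python
from datetime import timedelta

# Hypothetical minimum ages per tier, ordered from coldest to hottest.
TIER_MIN_AGE = [
    ("data_frozen", timedelta(days=365)),
    ("data_cold", timedelta(days=90)),
    ("data_warm", timedelta(days=7)),
    ("data_hot", timedelta(0)),
]


def tier_for_age(age: timedelta) -> str:
    """Return the first tier whose minimum age the index has reached."""
    if age < timedelta(0):
        raise ValueError("age must not be negative")
    for tier, min_age in TIER_MIN_AGE:
        if age >= min_age:
            return tier
    raise AssertionError("unreachable: data_hot accepts any non-negative age")


for days in (1, 30, 180, 400):
    print(days, "days ->", tier_for_age(timedelta(days=days)))
```

With these assumed thresholds, a one-day-old index stays on hot nodes, while a 400-day-old index ends up on the frozen tier.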
Cluster States
Cluster states show the overall health of an Elasticsearch cluster. They depend on shard allocation for each index, and the cluster state always reflects the worst index state.
-
Green
- All indexes are green
- All primary and replica shards are allocated
- All data is fully available and replicated
-
Yellow
- At least one index is yellow and none are red
- All primary shards are allocated
- Some replica shards are missing
- Data is accessible but not fully replicated, so a further node failure could make some data unavailable
-
Red
- At least one index is red
- Not all primary shards are allocated
- Some data is unavailable, and data loss is possible
- The cluster turns red because the worst index state defines the cluster state
Example:
- If all indexes are green, the cluster is green.
- If one index is yellow, the cluster is yellow.
- If any index turns red, the whole cluster is red.
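The rules above can be sketched in a few lines of code. This is only a simplified model: the `IndexShards` class and its field names are hypothetical, and the functions derive an index state from shard counts and then take the worst index state as the cluster state.

```python
from dataclasses import dataclass

# Ordered from best to worst so that max() picks the worst state.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


@dataclass
class IndexShards:
    """Shard allocation summary for one index (hypothetical structure)."""
    name: str
    primaries_total: int
    primaries_assigned: int
    replicas_total: int
    replicas_assigned: int


def index_health(idx: IndexShards) -> str:
    """Red if a primary is unassigned, yellow if only replicas are missing."""
    if idx.primaries_assigned < idx.primaries_total:
        return "red"
    if idx.replicas_assigned < idx.replicas_total:
        return "yellow"
    return "green"


def cluster_health(indexes) -> str:
    """The cluster state is the worst state among all indexes."""
    states = [index_health(i) for i in indexes]
    return max(states, key=SEVERITY.__getitem__, default="green")


indexes = [
    IndexShards("logs", 3, 3, 3, 3),     # fully allocated
    IndexShards("metrics", 2, 2, 2, 1),  # one replica missing
]
print(cluster_health(indexes))
# prints: yellow
```

On a running cluster the same information is reported in the `status` field of the `GET _cluster/health` API response.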