How to Restore Elasticsearch Snapshot
Elasticsearch snapshots are a critical component of any production-grade data management strategy. Whether you're recovering from accidental deletion, migrating data between clusters, or preparing for disaster recovery, the ability to restore an Elasticsearch snapshot reliably and efficiently can mean the difference between minutes of downtime and hours of operational chaos. A snapshot is a point-in-time backup of your indices, cluster state, and configuration stored in a shared repository—often in object storage like Amazon S3, Azure Blob Storage, HDFS, or a network file system. Restoring from a snapshot is not merely copying files; it involves coordinated operations across the cluster to rehydrate data, validate integrity, and ensure consistency. This guide provides a comprehensive, step-by-step walkthrough of how to restore Elasticsearch snapshots, covering everything from prerequisites and repository configuration to advanced recovery scenarios and optimization techniques. By the end of this tutorial, you’ll have the knowledge and confidence to restore snapshots safely, quickly, and with full awareness of potential pitfalls.
Step-by-Step Guide
Prerequisites Before Restoration
Before initiating any snapshot restoration, ensure the following prerequisites are met to avoid failures or data inconsistencies:
- Elasticsearch version compatibility: The target cluster must be running the same or a newer version of Elasticsearch than the one used to create the snapshot, and in general no more than one major version newer (for example, an 8.x cluster can restore 7.x snapshots, but not 6.x). Restoring a snapshot from a newer version to an older one is not supported.
- Repository accessibility: The snapshot repository must be accessible from the target cluster. This includes proper network connectivity, authentication credentials, and permissions on the underlying storage (e.g., S3 bucket policy, NFS mount permissions).
- Cluster health: The cluster should be in a green or yellow state. Avoid restoring during a red state, as shard allocation failures may occur.
- Index name conflicts: If indices with the same names already exist in the target cluster, restoration will fail unless you explicitly rename them or delete the conflicting indices.
- Enough disk space: Verify that the target nodes have sufficient free disk space to accommodate the restored data. Elasticsearch requires at least 10% free space on data nodes for normal operations.
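The version rule above can be encoded in a small pre-flight helper. This is an illustrative sketch, not an official client call, and the one-major-version rule is a general guideline; always confirm against Elastic's compatibility matrix for your exact versions.

```python
def can_restore(snapshot_version: str, cluster_version: str) -> bool:
    """Rough compatibility check: a cluster can generally restore snapshots
    taken on the same or the immediately previous major version, and never
    snapshots from a newer version. Hypothetical helper for pre-flight checks."""
    snap = tuple(int(p) for p in snapshot_version.split("."))
    clus = tuple(int(p) for p in cluster_version.split("."))
    if snap > clus:
        return False  # snapshots from newer versions are unsupported
    return clus[0] - snap[0] <= 1  # at most one major version behind

print(can_restore("7.17.0", "8.12.0"))  # True
print(can_restore("8.12.0", "7.17.0"))  # False
print(can_restore("6.8.0", "8.12.0"))   # False
```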
Step 1: List Available Snapshots
Begin by listing all snapshots stored in your registered repository. This step confirms that the snapshot you intend to restore exists and provides metadata such as creation time, version, and included indices.
Use the following API request:
GET /_snapshot/my_backup_repository/_all
Replace my_backup_repository with the actual name of your registered snapshot repository. The response will include an array of snapshot objects, each containing:
- snapshot: The name of the snapshot
- version: The Elasticsearch version used to create the snapshot
- indices: List of included indices
- state: Status (e.g., SUCCESS, FAILED)
- start_time_in_millis and end_time_in_millis: Timestamps (human-readable start_time and end_time are also included)
Example response snippet:
{
"snapshots": [
{
"snapshot": "snapshot_2024_05_15",
"version": "8.12.0",
"indices": [
"logs-prod-2024-05",
"metrics-prod"
],
"state": "SUCCESS",
"start_time": "2024-05-15T02:00:00.000Z",
"end_time": "2024-05-15T02:45:30.000Z"
}
]
}
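When scripting recovery, it helps to pick the newest successful snapshot from this response automatically. The sketch below assumes the field names shown in the example above; ISO-8601 start_time strings sort lexicographically, so max works directly. The helper is a local sketch, not part of any client library.

```python
def latest_successful(snapshots):
    """Return the most recent snapshot whose state is SUCCESS, or None."""
    ok = [s for s in snapshots if s.get("state") == "SUCCESS"]
    return max(ok, key=lambda s: s["start_time"], default=None)

# Shape mirrors the snapshot listing response shown above.
response = {
    "snapshots": [
        {"snapshot": "snapshot_2024_05_14", "state": "SUCCESS",
         "start_time": "2024-05-14T02:00:00.000Z"},
        {"snapshot": "snapshot_2024_05_15", "state": "SUCCESS",
         "start_time": "2024-05-15T02:00:00.000Z"},
        {"snapshot": "snapshot_2024_05_16", "state": "FAILED",
         "start_time": "2024-05-16T02:00:00.000Z"},
    ]
}
best = latest_successful(response["snapshots"])
print(best["snapshot"])  # snapshot_2024_05_15
```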
Step 2: Verify Repository Configuration
Ensure your snapshot repository is properly registered and accessible. Use this API call to list all registered repositories:
GET /_snapshot/_all
If your repository does not appear in the response, you must register it first. For example, to register an S3 repository:
PUT /_snapshot/my_backup_repository
{
"type": "s3",
"settings": {
"bucket": "my-elasticsearch-backups",
"region": "us-east-1",
"base_path": "snapshots/",
"access_key": "YOUR_ACCESS_KEY",
"secret_key": "YOUR_SECRET_KEY"
}
}
For production environments, use IAM roles instead of hard-coded credentials. When using a shared file system (e.g., NFS), the path must be identical on all master and data nodes and must be listed in the path.repo setting in elasticsearch.yml on every node:
PUT /_snapshot/my_nfs_repo
{
"type": "fs",
"settings": {
"location": "/mnt/elasticsearch/snapshots",
"compress": true
}
}
After registration, test connectivity by taking a small test snapshot:
PUT /_snapshot/my_backup_repository/test_snapshot
{
"indices": ".kibana_1",
"include_global_state": false
}
Monitor the snapshot status:
GET /_snapshot/my_backup_repository/test_snapshot
Step 3: Identify Indices to Restore
Once you’ve confirmed the snapshot’s existence and repository accessibility, determine which indices you need to restore. You can restore:
- The entire snapshot (all indices and cluster state)
- A subset of indices
- Indices with a new name (rename during restore)
To restore only specific indices, specify them in the restore request. For example, to restore only logs-prod-2024-05 from the snapshot:
POST /_snapshot/my_backup_repository/snapshot_2024_05_15/_restore
{
"indices": "logs-prod-2024-05",
"rename_pattern": "logs-prod-(.+)",
"rename_replacement": "logs-prod-restore-$1"
}
The rename_pattern and rename_replacement parameters use Java regular expressions to dynamically rename indices during restore. This is essential when the original index names conflict with existing ones.
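You can preview the effect of a rename before running the restore. Elasticsearch interprets the pattern with Java regex syntax (capture groups referenced as $1); Python's re module uses \1 instead, but for a simple pattern like this the matching behavior is the same, so a local dry run is a reasonable approximation:

```python
import re

rename_pattern = r"logs-prod-(.+)"
rename_replacement = r"logs-prod-restore-\1"  # written as "$1" in the API request

# Indices that do not match the pattern keep their original names.
for index in ["logs-prod-2024-05", "metrics-prod"]:
    renamed = re.sub(rename_pattern, rename_replacement, index)
    print(index, "->", renamed)
# logs-prod-2024-05 -> logs-prod-restore-2024-05
# metrics-prod -> metrics-prod
```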
Step 4: Initiate the Restore Operation
Now, execute the restore command. The simplest form restores all indices and the cluster state:
POST /_snapshot/my_backup_repository/snapshot_2024_05_15/_restore
For more control, use a comprehensive request body:
POST /_snapshot/my_backup_repository/snapshot_2024_05_15/_restore
{
"indices": "logs-prod-2024-05,metrics-prod",
"ignore_unavailable": true,
"include_global_state": false,
"rename_pattern": "logs-prod-(.+)",
"rename_replacement": "logs-prod-restore-$1",
"index_settings": {
"index.number_of_replicas": 1
},
"include_aliases": true
}
Key parameters explained:
- indices: Comma-separated list of indices to restore. Use * to restore all.
- ignore_unavailable: If true, ignores indices in the request that don’t exist in the snapshot (e.g., if they were deleted after snapshot creation).
- include_global_state: If true, restores cluster-wide persistent settings, index templates, and ingest pipelines. Use with caution: this can overwrite existing cluster configurations.
- rename_pattern and rename_replacement: Regex-based renaming for indices.
- index_settings: Override index settings during restore (e.g., reduce replicas for faster restore).
- include_aliases: Restores index aliases along with the indices.
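For repeatable recoveries, this request body can be assembled in a script rather than by hand. The helper below mirrors the documented restore parameters; the function itself and its defaults are illustrative, not part of any client library.

```python
import json

def restore_body(indices, rename_suffix="restore", replicas=1):
    """Build a restore request body; the "$1" replacement uses
    Elasticsearch's Java-regex capture-group syntax."""
    return {
        "indices": ",".join(indices),
        "ignore_unavailable": True,
        "include_global_state": False,
        "rename_pattern": "(.+)",
        "rename_replacement": f"$1-{rename_suffix}",
        "index_settings": {"index.number_of_replicas": replicas},
        "include_aliases": True,
    }

# Replicas set to 0 for a faster initial restore, as discussed below.
body = restore_body(["logs-prod-2024-05", "metrics-prod"], replicas=0)
print(json.dumps(body, indent=2))
```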
Step 5: Monitor Restore Progress
Restoration is an asynchronous process. Monitor its progress using:
GET /_cat/recovery?v
This returns a table of ongoing and completed shard recoveries. For snapshot restores, the relevant columns include:
- type: The recovery type (snapshot for restores)
- repository and snapshot: The source repository and snapshot name
- index and shard: The index and shard being restored
- stage: The recovery stage (done when finished)
- files_percent and bytes_percent: Progress of file and byte transfer
For detailed per-shard status, use the index recovery API on the restored indices:
GET /logs-prod-restore-2024-05/_recovery
Wait until every shard reports a stage of DONE. Do not interrupt the process; interruption can leave a partial or corrupted restore.
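A monitoring script needs a completion test. The sketch below decides whether a restore has finished from an index recovery response (GET /<index>/_recovery), treating the restore as complete when every snapshot-type shard recovery reports stage DONE. The helper is hypothetical; field names follow the recovery API.

```python
def restore_complete(recovery_response: dict) -> bool:
    """True when all snapshot-type shard recoveries report stage DONE."""
    for index_info in recovery_response.values():
        for shard in index_info.get("shards", []):
            if shard.get("type") == "SNAPSHOT" and shard.get("stage") != "DONE":
                return False
    return True

# Abbreviated recovery response: shard 1 is still copying index files.
sample = {
    "logs-prod-restore-2024-05": {
        "shards": [
            {"id": 0, "type": "SNAPSHOT", "stage": "DONE"},
            {"id": 1, "type": "SNAPSHOT", "stage": "INDEX"},
        ]
    }
}
print(restore_complete(sample))  # False
```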
Step 6: Validate Restored Data
After restoration completes, validate the integrity of your data:
- Check index health: GET /_cat/indices/logs-prod-restore-2024-05?v
- Verify document count: GET /logs-prod-restore-2024-05/_count
- Search sample documents: GET /logs-prod-restore-2024-05/_search?q=*&size=5
- Confirm aliases: GET /_alias/logs-prod-2024-05 (if aliases were restored)
- Check mappings: GET /logs-prod-restore-2024-05/_mapping to ensure field types match expectations
Compare the restored data with a known good reference (e.g., a sample from before the incident) to confirm fidelity.
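A simple count comparison can be scripted for repeatable validation. The counts below would normally come from GET /<index>/_count responses; here they are supplied as plain dicts, and the helper is illustrative.

```python
def validate_counts(expected: dict, actual: dict) -> list:
    """Return a list of human-readable discrepancies (empty means all good)."""
    problems = []
    for index, count in expected.items():
        got = actual.get(index)
        if got != count:
            problems.append(f"{index}: expected {count}, got {got}")
    return problems

expected = {"logs-prod-restore-2024-05": 18_700_000}
actual = {"logs-prod-restore-2024-05": 18_700_000}
print(validate_counts(expected, actual))  # []
```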
Step 7: Update Applications and Aliases
Once validation is complete, update your applications to point to the restored indices. If you used rename patterns, your application may already be configured correctly. If not, you may need to:
- Update index patterns in Kibana
- Modify data source configurations in Logstash or Beats
- Recreate or update aliases to point to the new indices
To create an alias pointing to the restored index:
POST /_aliases
{
"actions": [
{
"add": {
"index": "logs-prod-restore-2024-05",
"alias": "logs-prod"
}
}
]
}
This allows seamless reintegration without requiring code changes in upstream services.
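If the old index still exists (for example, after restoring under a new name rather than recovering from a deletion), the alias can be moved by combining a remove and an add action in one _aliases call; Elasticsearch applies all actions in a single request atomically, so readers never observe a missing alias. The builder below is an illustrative sketch.

```python
def swap_alias(alias: str, old_index: str, new_index: str) -> dict:
    """Build a POST /_aliases body that moves an alias in one atomic step."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

body = swap_alias("logs-prod", "logs-prod-2024-05", "logs-prod-restore-2024-05")
print(body["actions"])
```

If the original index was deleted, drop the remove action and send only the add, as in the example above.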
Best Practices
1. Regularly Test Your Snapshots
Many organizations assume their snapshots are valid because they complete successfully. However, a snapshot can be corrupted, incomplete, or incompatible due to configuration drift. Schedule quarterly restore tests in a non-production environment. Automate this using scripts that:
- Restore a recent snapshot to a test cluster
- Verify document counts and field integrity
- Run a sample search query
- Log success/failure and alert if anomalies are detected
2. Use Incremental Snapshots Wisely
Elasticsearch snapshots are incremental by default: only segments not already present in the repository are copied. Deleting a snapshot is safe for the remaining snapshots, because Elasticsearch only removes files no longer referenced by any other snapshot, but aggressive deletion shrinks your recovery window. Retain at least the last three snapshots for redundancy.
3. Avoid Restoring Cluster State Unless Necessary
The include_global_state flag restores cluster-wide persistent settings, index templates, and ingest pipelines. While convenient, it can overwrite critical production configuration (e.g., persistent cluster settings or templates the target cluster depends on). Unless you’re restoring an entire cluster from scratch, set this to false and manually reapply configurations.
4. Reduce Replicas During Restore for Speed
By default, restored indices inherit their original number of replicas. If you’re restoring to a smaller cluster or need speed, override this setting:
"index_settings": {
"index.number_of_replicas": 0
}
After the restore completes, increase replicas to your desired level using:
PUT /logs-prod-restore-2024-05/_settings
{
"index.number_of_replicas": 1
}
5. Schedule Restores During Low Traffic Windows
Restoration consumes significant I/O and network bandwidth. Schedule restores during maintenance windows or off-peak hours to avoid impacting query performance. To limit unrelated shard movement during the restore, prefer the new_primaries value of cluster.routing.allocation.enable over none, since none would also block the restored primary shards from being allocated:
PUT /_cluster/settings
{
"transient": {
"cluster.routing.allocation.enable": "new_primaries"
}
}
Re-enable after restore:
PUT /_cluster/settings
{
"transient": {
"cluster.routing.allocation.enable": "all"
}
}
6. Monitor Disk Usage and Node Health
Restoring large snapshots can quickly fill disk space. Monitor disk usage during the process:
GET /_cat/allocation?v
If a node crosses the high disk watermark (90% by default), Elasticsearch stops allocating new shards to it and begins relocating shards away; at the flood-stage watermark (95%), affected indices are marked read-only. Consider adding temporary storage or removing non-essential data before initiating the restore.
7. Use Snapshot Lifecycle Management (SLM)
For automated, policy-driven snapshotting and retention, use Elasticsearch’s Snapshot Lifecycle Management (SLM). While SLM primarily automates creation, it ensures consistency and simplifies recovery planning. Define policies to retain daily, weekly, and monthly snapshots, and automate cleanup of expired ones.
Tools and Resources
Elasticsearch Snapshot APIs
The core tools for snapshot management are built into Elasticsearch’s REST API:
- GET /_snapshot — List repositories
- PUT /_snapshot/{repository} — Register a repository
- GET /_snapshot/{repository}/{snapshot} — Get snapshot details
- POST /_snapshot/{repository}/{snapshot}/_restore — Restore a snapshot
- GET /_cat/recovery — Monitor restore progress
- DELETE /_snapshot/{repository}/{snapshot} — Delete a snapshot (use with caution)
Third-Party Tools
Several third-party tools enhance snapshot management:
- Elasticsearch Curator: A Python-based tool for automating snapshot creation, deletion, and restoration based on age or size thresholds. Ideal for managing large volumes of time-series data.
- Logstash + Snapshot Plugins: While Logstash doesn’t manage snapshots directly, it can be used in conjunction with custom scripts to trigger restores based on ingestion pipelines.
- OpenSearch Dashboards (for OpenSearch users): If you’re using OpenSearch, the UI includes a built-in Snapshot & Restore module for visual management.
- Custom Python/Shell Scripts: Automate restore workflows using the requests library in Python or curl in shell scripts. Combine with cron jobs for scheduled recovery drills.
Storage Backend Recommendations
The choice of snapshot repository storage impacts reliability and performance:
- Amazon S3: Highly durable, scalable, and cost-effective. Use with IAM roles for secure access. Recommended for cloud-native deployments.
- Azure Blob Storage: Similar to S3, with native integration for Azure-hosted Elasticsearch clusters.
- Google Cloud Storage: Ideal for GCP environments.
- NFS: Good for on-premises deployments but requires high availability and redundancy. Avoid single-point-of-failure mounts.
- HDFS: Suitable for large-scale Hadoop-integrated environments.
Always enable server-side encryption and audit logs for your storage backend. Avoid using local disk storage on a single node—it defeats the purpose of a backup.
Documentation and Community Resources
- Elasticsearch Official Snapshot & Restore Guide
- Elastic Discuss Forum — Search for “restore snapshot” for real-world troubleshooting
- Elasticsearch Curator GitHub Repository
- Elastic Blog: Backup and Recovery Best Practices
Real Examples
Example 1: Restoring a Corrupted Index After Accidental Deletion
A developer accidentally ran DELETE /logs-prod-2024-05 during a maintenance window. The index contained 2.1TB of operational logs critical for compliance.
Steps taken:
- Confirmed the latest snapshot snapshot_2024_05_15 existed and included the index.
- Used ignore_unavailable: true to avoid failure if other indices were missing.
- Restored with renaming: rename_pattern: "logs-prod-(.+)" → rename_replacement: "logs-prod-restore-$1" to avoid conflicts.
- Set index.number_of_replicas: 0 to speed up the initial restore.
- Monitored progress via _cat/recovery; the restore completed in 42 minutes.
- Verified the document count matched the pre-deletion state (18.7M documents).
- Created alias logs-prod pointing to the restored index.
- Updated the Kibana dashboard to use the new alias.
Result: Zero data loss. Service restored in under an hour.
Example 2: Migrating Data Between Clusters
A company upgraded from Elasticsearch 7.17 to 8.12 and needed to migrate indices from the old cluster to the new one.
Steps taken:
- Created a snapshot on the old cluster using an S3 repository.
- Registered the same S3 repository on the new cluster.
- Ensured version compatibility (8.12 can restore snapshots taken on 7.17).
- Restored all indices with include_global_state: false to preserve the new cluster’s security settings.
- Used index_settings to adjust refresh intervals and merge policies for better performance on the new hardware.
- Recreated index templates and ingest pipelines manually to align with new mappings.
Result: Smooth migration with no downtime. Data integrity verified using checksums on sample documents.
Example 3: Disaster Recovery After Node Failure
A data center outage caused three out of five data nodes to fail. The cluster went into red state.
Steps taken:
- Provisioned a new 5-node cluster with identical configuration.
- Registered the snapshot repository (NFS mounted on all nodes).
- Restored the latest snapshot with include_global_state: true to recover cluster settings and index templates.
- Restricted shard allocation (cluster.routing.allocation.enable) during the restore to prevent premature shard rebalancing.
- After the restore, re-enabled allocation and allowed Elasticsearch to rebalance shards.
- Monitored recovery using _cat/recovery and confirmed all shards were allocated.
Result: Full cluster recovery in 3 hours. No data loss. Business operations resumed with minimal impact.
FAQs
Can I restore a snapshot from a newer Elasticsearch version to an older one?
No. Elasticsearch does not support restoring snapshots created on a newer version to an older version. Always upgrade your target cluster before attempting a restore from a newer snapshot.
What happens if I delete the original index before restoring?
It’s safe and often recommended. Deleting the original index prevents naming conflicts and ensures a clean restore. Use ignore_unavailable: true if you’re unsure whether the index exists.
Does restoring a snapshot overwrite existing data?
No, not silently. If an open index with the same name exists, the restore operation will fail unless you use rename_pattern to assign a new name or remove the conflicting index first. Never assume data will be merged—restores never merge into an existing index.
How long does a snapshot restore take?
Restore time depends on:
- Size of the snapshot
- Network bandwidth between cluster and storage
- Number of shards
- Node disk I/O performance
As a rough estimate: 100GB takes 10–30 minutes on a modern SSD-backed cluster with good network connectivity.
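That range follows from simple throughput arithmetic, sketched below. The 100 MB/s figure is an assumption for illustration; measure your own sustained restore throughput before planning a recovery window.

```python
def estimate_minutes(size_gb: float, throughput_mb_s: float) -> float:
    """Back-of-envelope restore duration: size divided by sustained throughput."""
    return size_gb * 1024 / throughput_mb_s / 60

# 100 GB at ~100 MB/s sustained comes to roughly 17 minutes, inside the
# 10-30 minute range quoted above.
print(round(estimate_minutes(100, 100)))  # 17
```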
Can I restore only specific documents or fields?
No. Snapshots are index-level backups. You cannot restore individual documents or fields. To recover partial data, you must restore the entire index and then use reindexing or scripting to extract subsets.
What’s the difference between snapshot and reindex?
A snapshot is a backup of the entire index at a point in time, stored externally. Reindex copies data from one index to another within the same cluster. Snapshots are for disaster recovery and migration; reindex is for data transformation or cluster internal movement.
Why is my restore stuck at 0%?
Common causes:
- Repository misconfiguration or inaccessible storage
- Insufficient disk space
- Network connectivity issues
- Cluster in red state
Check the Elasticsearch server logs on the master node and verify repository access with the verify repository API (POST /_snapshot/my_backup_repository/_verify) or the test snapshot method described above.
Do snapshots include security settings?
Only partially. Setting include_global_state: true restores cluster-wide settings, index templates, and ingest pipelines. Security data such as users, roles, and API keys lives in the .security system index and is restored like any other index if it was included in the snapshot. Use these options cautiously in production environments.
Can I restore a snapshot to a different cluster with different hardware?
Yes. Elasticsearch snapshots are hardware-agnostic. As long as the version is compatible and the repository is accessible, you can restore to any cluster regardless of CPU, RAM, or disk type.
Conclusion
Restoring an Elasticsearch snapshot is not just a technical operation—it’s a mission-critical resilience strategy. When done correctly, it ensures business continuity, protects against data loss, and provides peace of mind in the face of hardware failure, human error, or cyber incidents. This guide has walked you through the complete lifecycle of snapshot restoration: from verifying repository integrity and selecting the right snapshot, to monitoring progress and validating outcomes. You’ve learned how to avoid common pitfalls, leverage advanced features like index renaming and replica tuning, and apply best practices that align with enterprise-grade data governance.
Remember: the value of a snapshot is not in its creation—it’s in its restoration. Regularly test your recovery procedures, automate where possible, and never assume your backups are working until you’ve proven they can be restored. By treating snapshot restoration as a routine, validated process rather than a last-resort emergency, you transform Elasticsearch from a high-performance search engine into a truly resilient data platform.
Now that you understand how to restore Elasticsearch snapshots, take action: schedule your first restore test this week. Document the process. Share the results. And ensure your team is prepared—not just for the best-case scenario, but for the worst.