Last week I played around with the HCL Connections documentation for backing up Elasticsearch, in the article Backup Elasticsearch Indices in Component Pack.
In the end I found that I couldn’t get the snapshot restored, and that I had to run a command outside of my Kubernetes cluster to get a snapshot on a daily basis. That’s not what I want.
So the first idea was to move the job defined in the helm chart into a Kubernetes cronjob. After changing the definition, the backup now runs from Kubernetes itself.
I added a new default variable:
cronTimes: "0 6,18 * * *"
So without changing this default, the cronjob runs at 6:00 and 18:00 (6 am and 6 pm) after deployment and creates a snapshot.
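As a rough sketch, the cronjob template in the helm chart could look like the following. This is not the actual chart content; the name, image and command are placeholders, only the schedule wiring to the cronTimes value matches the text above:

```yaml
# Hypothetical sketch of the CronJob template; name, image and command are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: elasticsearch-backup
spec:
  schedule: {{ .Values.cronTimes | quote }}   # "0 6,18 * * *" by default
  successfulJobsHistoryLimit: 3               # keep the logs of recent backup runs
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: <your-es-backup-image>     # placeholder
            command: ["/backup.sh"]           # placeholder for the snapshot script
```

Keeping the job history limits above zero is what preserves the backup logs that a plain helm-managed job would lose on reinstall.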
So what happens if we want to restore a snapshot? If we add the restore script to the same helm chart as our backup script, we have to delete the installation and lose all logs of our backup jobs. The snapshots are still there, but the history is gone.
So I created separate helm charts: first a cronjob to create snapshots, and second a job to restore a snapshot. The restore script restores all indices in the snapshot, which fails because some of them are system indices and always open. So the restore failed every time in my tests.
The biggest caveat of the restore script is that it closes all indices first. Each index would automatically reopen after a successful restore, but since the restore fails, all indices stay closed.
I tried adding command options to the delivered restore command to only restore the icmetrics*, orient-me-collection and quickresults indices, but the restore script was too limited for me.
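Instead of the delivered restore script, the Elasticsearch restore API can be called directly and limited to the wanted indices. This is only a sketch using the sendRequest.sh helper from the probe directory; the snapshot name daily-snap-2024.01.01 is a placeholder, list your snapshots first to find a real one:

```shell
cd /opt/elasticsearch-7.10.1/probe

# Close only the indices we want to restore (open indices cannot be restored over)
./sendRequest.sh POST '/icmetrics*,orient-me-collection,quickresults/_close'

# Restore just these indices instead of everything in the snapshot
./sendRequest.sh POST /_snapshot/connectionsmetrics/daily-snap-2024.01.01/_restore -H 'Content-Type: application/json' -d '
{
  "indices": "icmetrics*,orient-me-collection,quickresults",
  "ignore_unavailable": true
}
'
```

Limiting the "indices" field this way avoids touching the system indices that made the delivered script fail.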
If you want to use the backup helm chart, feel free to download it. I’m not responsible for your data and don’t give support for these scripts. Use them at your own risk! I’m still not happy with the restore script, so there is no download for it at the moment.
Use Kibana and create a snapshot policy
In former Component Pack versions, there was a helm chart to deploy the Elastic Stack (Kibana, Logstash and Filebeat). This chart is still contained in the Component Pack package, but the images are missing.
I asked for updated images in a support case and got them from HCL support. As far as I know, this helm chart and the images are not available on FlexNet yet, but I’m confident that support will send them to you on request.
In Kibana we can define policies for automatic snapshots. These can be configured through the web UI, which also shows the HTTP request that is sent to Elasticsearch. So we can configure these snapshots even without installing Kibana.
To create a snapshot in the evening each day:
- Open a shell in one of the es-client pods:
kubectl exec -it -c es-client $(kubectl get pods -l component=elasticsearch7,role=client | awk '/client/{print $1}' | head -n 1 ) -- bash
- The backup store is mounted into all Elasticsearch pods, so there is no need to change anything in the deployments or statefulsets.
cd /opt/elasticsearch-7.10.1/probe
./sendRequest.sh PUT /_slm/policy/daily-snapshot -H 'Content-Type: application/json' -d '
{
  "name": "<daily-snap-{now/d}>",
  "schedule": "0 31 16 * * ?",
  "repository": "connectionsmetrics",
  "config": {
    "indices": [
      "ic*",
      "quickresults",
      "orient-me-collection"
    ],
    "ignore_unavailable": true
  },
  "retention": {
    "expire_after": "3d",
    "min_count": 3,
    "max_count": 5
  }
}
'
This creates a scheduled snapshot of the configured indices (ic*, quickresults and orient-me-collection) at 16:31 UTC each day (the leading 0 in the schedule is the seconds field). The retention settings delete snapshots older than 3 days, but always keep at least 3 and never more than 5 snapshots.
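To check that the policy was stored correctly, or to trigger a snapshot right away instead of waiting for the schedule, the snapshot lifecycle management API can be queried with the same helper; a short sketch:

```shell
cd /opt/elasticsearch-7.10.1/probe

# Show the policy definition and its last success/failure
./sendRequest.sh GET /_slm/policy/daily-snapshot

# Run the policy immediately for a test snapshot
./sendRequest.sh POST /_slm/policy/daily-snapshot/_execute
```

The GET response also reports when the policy last ran, which is handy for verifying the schedule after deployment.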
Elasticsearch cron expressions
Elasticsearch snapshots are automatically deduplicated!
Snapshots are automatically deduplicated to save storage space and reduce network transfer costs. To back up an index, a snapshot makes a copy of the index’s segments and stores them in the snapshot repository. Since segments are immutable, the snapshot only needs to copy any new segments created since the repository’s last snapshot.
Each snapshot is also logically independent. When you delete a snapshot, Elasticsearch only deletes the segments used exclusively by that snapshot. Elasticsearch doesn’t delete segments used by other snapshots in the repository.
So adding more snapshots wastes no disk space. I’m playing around with hourly snapshots at the moment:
cd /opt/elasticsearch-7.10.1/probe
./sendRequest.sh PUT /_slm/policy/hourly-snapshot -H 'Content-Type: application/json' -d '
{
  "name": "<hourly-snap-{now/d}>",
  "schedule": "0 0 * * * ?",
  "repository": "connectionsmetrics",
  "config": {
    "indices": [
      "ic*",
      "quickresults",
      "orient-me-collection"
    ],
    "ignore_unavailable": true
  },
  "retention": {
    "expire_after": "1d",
    "min_count": 6,
    "max_count": 12
  }
}
'
This can be set with Kibana too, but there is no need to deploy it if you don’t need it otherwise. You can use the sendRequest.sh calls above to configure the snapshots.
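To see the results of both policies, the snapshots in the repository can be listed with the _cat API; a quick sketch:

```shell
cd /opt/elasticsearch-7.10.1/probe

# List all snapshots in the connectionsmetrics repository with status and timing
./sendRequest.sh GET '/_cat/snapshots/connectionsmetrics?v'
```

Each snapshot should show up as SUCCESS here; expired ones disappear from the list once retention has cleaned them up.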