Replica not able to recover when disk lost #787

Open
@janhoy

Description

@janhoy
Contributor

I have now seen in several Solr clusters on k8s that a pod experiences complete disk (PVC) loss due to underlying volume provisioning issues. The pod eventually comes back online, but with an empty disk/volume.

In such a case, all the replicas that were on that Solr node (as recorded in collection state) fail recovery and end up in a permanent DOWN state. The solution is to manually call DELETEREPLICA on each of them and then ADDREPLICA to create a new replica. This process has even been scripted: https://gist.github.com/relwell/51aecaf7a435c68a1651872f0febbb5b.
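For illustration, the manual workaround could be sketched roughly as follows. This is not the linked gist and not anything Solr or the operator ships; the function names (`find_down_replicas`, `replacement_calls`, `call_collections_api`) are hypothetical. It assumes the usual CLUSTERSTATUS response layout (`cluster.collections.<name>.shards.<shard>.replicas.<replica>` with `state` and `node_name` fields) and the Collections API `DELETEREPLICA` / `ADDREPLICA` actions with `collection`, `shard`, `replica` and `node` parameters. No authentication, async handling or error handling is included.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def find_down_replicas(cluster_status, node_name):
    """Return (collection, shard, replica) for every 'down' replica on node_name.

    cluster_status is the parsed JSON of a CLUSTERSTATUS call (assumed layout).
    """
    found = []
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                on_node = replica.get("node_name") == node_name
                if on_node and replica.get("state") == "down":
                    found.append((coll_name, shard_name, replica_name))
    return found


def replacement_calls(down_replicas, node_name):
    """Build the DELETEREPLICA + ADDREPLICA parameter sets, in execution order."""
    calls = []
    for coll, shard, replica in down_replicas:
        calls.append({"action": "DELETEREPLICA", "collection": coll,
                      "shard": shard, "replica": replica})
        # Re-create the replica on the same node; omit "node" to let Solr choose.
        calls.append({"action": "ADDREPLICA", "collection": coll,
                      "shard": shard, "node": node_name})
    return calls


def call_collections_api(base_url, params):
    """Send one Collections API request; base_url is e.g. http://host:8983/solr."""
    query = urlencode({**params, "wt": "json"})
    with urlopen(f"{base_url}/admin/collections?{query}") as resp:
        return json.load(resp)
```

A run would fetch the status with `call_collections_api(base_url, {"action": "CLUSTERSTATUS"})`, pass it to `find_down_replicas` together with the affected node name, and then send each entry from `replacement_calls` in order. Since a replica can be DOWN for reasons other than an empty disk, it is probably worth reviewing the list before executing it.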

There may of course be other reasons for a replica being in DOWN state than an empty disk, and some of those may also be solved by deleting the replica and adding a new one.

The question is whether we want either Solr itself or SolrOperator to be able to auto-recover from this situation. It need not be the default action, but could be enabled by configuration. Thoughts?


          Replica not able to recover when disk lost · Issue #787 · apache/solr-operator