How to replace a failed disk on a Ceph node

Asked by Claude Durocher

We need to know how to replace a failed OSD disk in a Ceph cluster deployed by Fuel 6.1. All our Ceph nodes use SSD drives for Ceph journals.

The first steps are quite simple but not complete:

- identify the failed OSD: ceph osd tree | grep down
- identify the journal partition used by the OSD: ceph-disk list (doesn't work, as the partitions are listed but not their relationship to the OSD)
- identify the mount point of the failed OSD: mount (we figured out the Ceph partition is the third one)
- unmount the partition: umount ${CEPH_OSD_PART}
- replace the disk and set up the controller so it recognizes the new disk
- mark the OSD out and stop it: ceph osd out ${CEPH_OSD} && stop ceph-osd id=${CEPH_OSD_NUM}
- remove it from the CRUSH map, delete its auth key, and remove it from the cluster: ceph osd crush remove ${CEPH_OSD} && ceph auth del ${CEPH_OSD} && ceph osd rm ${CEPH_OSD} (these steps are consolidated in the sketch below)
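
For reference, here is a minimal sketch consolidating the removal steps above. The OSD number, the data partition, and the journal lookup through the filestore "journal" symlink are assumptions for illustration, not Fuel-specific commands:

CEPH_OSD_NUM=12                    # hypothetical id of the failed OSD
CEPH_OSD="osd.${CEPH_OSD_NUM}"
CEPH_OSD_PART=/dev/sdd3            # hypothetical data partition (the third one, as noted above)

ceph osd tree | grep down                                 # confirm the OSD is reported down
readlink /var/lib/ceph/osd/ceph-${CEPH_OSD_NUM}/journal   # journal device, if the data dir is still mounted (filestore symlink)
ceph osd out ${CEPH_OSD}
stop ceph-osd id=${CEPH_OSD_NUM}
umount ${CEPH_OSD_PART}
ceph osd crush remove ${CEPH_OSD}
ceph auth del ${CEPH_OSD}
ceph osd rm ${CEPH_OSD}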

What would be the next steps to recreate the OSD and reuse the existing journal partition?

Question information

Language: English
Status: Expired
For: Fuel for OpenStack
Assignee: No assignee
Launchpad Janitor (janitor) said (#1):

This question was expired because it remained in the 'Open' state without activity for the last 15 days.

Claude Durocher (claude-d) said (#2):

Can anyone provide some feedback?

Claude

Fabrizio Soppelsa (fsoppelsa) said (#3):

Greetings Claude,

The standard procedure from Ceph applies: http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

Best,
Fabrizio
Mirantis Fuel Team

Claude Durocher (claude-d) said (#4):

The Ceph standard procedure is a starting point; however, my question was more specific: how do you replace a failed disk when the journal is on a separate (SSD) disk?

Also, Fuel has a special way of partitioning disks before creating the OSD, so a procedure, or even better a tool, should be available to replace or add an OSD with a pattern similar to the one used at creation.

Claude

Fabrizio Soppelsa (fsoppelsa) said (#5):

Claude, even with a journal on an external SSD, the procedure applies. After removing the auth and the OSD, once the cluster health returns to HEALTH_OK, you can remove the disk.
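
For instance, one might poll the cluster state before physically pulling the disk (a sketch using standard Ceph commands; the removed OSD id is hypothetical):

ceph health      # repeat until it reports HEALTH_OK
ceph osd tree    # confirm the removed OSD no longer appears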

After that, you zap the new disk and deploy the configuration, specifying the old journal path:

# ceph-deploy disk zap node:sdX
# ceph-deploy --overwrite-conf osd create node:sdX:/path/to/journal
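
For example, assuming the replacement data disk is sdd on node-5 and the old journal is the partition /dev/sdb2 on the SSD (both hypothetical names), this could look like:

# ceph-deploy disk zap node-5:sdd
# ceph-deploy --overwrite-conf osd create node-5:sdd:/dev/sdb2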

Fabrizio
Mirantis Fuel Team

Claude Durocher (claude-d) said (#6):

Still, how do you find the old journal path? Do you specify the /dev/disk/by-id path or the regular device path?

These may seem like details, but they are important if you wish to keep a uniform configuration with Fuel.

Thanks again for the answers.

Claude

Launchpad Janitor (janitor) said (#7):

This question was expired because it remained in the 'Open' state without activity for the last 15 days.