Ceph Outage on /cwork

In the earlier section on adding new hardware to the cluster, we described how we expanded the number of metadata services supporting the /cwork cluster. As part of that process we enabled directory fragmentation to better support directories containing large numbers of files. Unfortunately, we enabled it on the root file system for /cwork, because we had not created a subvolume for that mount point. Ceph should not have allowed directory fragmentation on the root file system, but a bug let the change through. Fragmenting the root directory has led to catastrophic corruption of the metadata pool for the /cwork volume. /cwork will be unavailable until a full rebuild of all of the metadata is completed, which could take more than a week. The Ceph development team will be putting out a patch to prevent this issue, and we have informed our users. Fortunately, this affects only the /cwork shared volume; the volume designed to store long-term researcher data is a separate storage volume and was unaffected.

Work is underway, with many steps completed. An update will be provided when services are fully restored.


This entry was posted on Sunday, August 18th, 2024 at 12:53 pm.
