Ceph

The Intellectual Merit of the project is the development of a managed ecosystem, based on standard data-center-class hardware and open-source Ceph software, that provides a flexible, high-performance, cost-effective, managed tiered data storage system in support of research.

The proposed system is based on open-source Ceph[11] software running on generic data-center-class hardware and can provide both Network Attached Storage (NAS) and S3-compatible object storage. The system will provide data management and scripting tools for activities such as data tagging and other metadata generation, and for automating workflows, including the packaging and migration of data between storage tiers.
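
As a hypothetical illustration of the data-tagging tools described above (the paths, tag names, bucket, and endpoint are placeholders, not part of the actual design), a short Python script could attach project metadata to a file on the NAS tier via an extended attribute and to the corresponding object on the S3 tier via an object tag:

    #!/usr/bin/env python3
    # Hypothetical sketch: tag a dataset on the CephFS (NAS) tier with a
    # user-namespace extended attribute, and tag the corresponding object on
    # the S3 tier via the S3 object-tagging API (supported by Ceph RGW).
    # All paths, names, and endpoints below are illustrative placeholders.
    import os
    import boto3

    NAS_PATH = "/cephwork/projects/example/dataset.tar"      # placeholder path
    BUCKET, KEY = "research-archive", "example/dataset.tar"   # placeholder names

    # Tag the file on the CephFS/NAS tier.
    os.setxattr(NAS_PATH, "user.project", b"example-project")

    # Tag the matching object on the S3 tier.
    s3 = boto3.client("s3", endpoint_url="https://rgw.example.org")  # placeholder endpoint
    s3.put_object_tagging(
        Bucket=BUCKET,
        Key=KEY,
        Tagging={"TagSet": [{"Key": "project", "Value": "example-project"}]},
    )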

Benchmark Testing

IO500 standard

April 2023

  • The network does not appear to be a bottleneck with the current configuration of 144 OSDs on 7.2K RPM drives.
  • Current testing is unlikely to cause network problems for other network users.
  • We seem to be hitting a limit on storage performance as tested with IO500. I would like to test scaling down the number of OSDs to see whether that reduces overall performance; the *theory* is that CephFS performance will scale linearly with the number of OSDs.
  • Overall performance is not dramatically faster or slower than Isilon/NFS, based on very crude testing.

May 2023

  • Variance: when comparing “identical” benchmark runs (same filesystem, same number of client nodes, and same number of MPI threads), the average variance across IO500 metrics is 0.59%; the largest variance, between 10.04% and 16.24%, occurs on the ‘mdtest-easy-stat’ test. The three primary metrics (Total Bandwidth, Total IOPS, and IO500 Score) vary by at most 2.72%. A sketch of one way to compute this run-to-run spread follows this list.
  • Since making the recommended changes, benchmark scores have significantly increased.
  • The best overall IO500 results are generally seen with runs using 4 or 8 MPI threads per node.
  • The 8+3 erasure-coded filesystem has 11.75% higher scores on average than the replicated filesystem for ‘ior-easy-write’ (sequential write), 8.25% higher scores on ‘mdtest-easy-delete’, and 4.43% higher scores on ‘mdtest-hard-delete’.
  • The replicated filesystem has generally higher scores overall: 16.08% higher Total Bandwidth, 4.35% higher Total IOPS, and a 10.63% higher IO500 Score. Other tests where it scores notably higher are ‘ior-hard-write’ (random write) at 37.98% higher and ‘ior-hard-read’ at 23.16% higher; the remaining tests are within 10% of the erasure-coded filesystem’s scores.
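
The notes above do not state exactly how “variance” was computed; one plausible reading, sketched below in Python, treats it as the run-to-run spread of a metric (max minus min) relative to the mean across repeated, identical runs. The metric names and values are placeholders, not measured results.

    # Hypothetical sketch: per-metric run-to-run spread, expressed as a
    # percentage of the mean across repeated "identical" benchmark runs.
    # The metric names and values below are placeholders, not real results.
    from statistics import mean

    runs = {
        "ior-easy-write":   [10.1, 10.2, 10.1],   # GiB/s, placeholder values
        "mdtest-easy-stat": [55.0, 61.0, 58.0],   # kIOPS, placeholder values
    }

    for metric, values in runs.items():
        spread_pct = (max(values) - min(values)) / mean(values) * 100
        print(f"{metric}: {spread_pct:.2f}% spread across runs")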

Baseline results

IO500 benchmarks were run on 44 client VMs spread across 11 physical hosts (4 VMs per host), with 4 or 8 MPI threads per node. The two CephFS filesystems tested are an 8+3 erasure-coded filesystem (/cephwork_ec) and a 3x replicated filesystem (/cephwork).
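
For context on the two filesystems named above, the sketch below shows one typical way an 8+3 erasure-coded data pool and a 3x replicated data pool can be created and attached to a CephFS filesystem, driving the standard ceph CLI from Python. The profile, pool, and filesystem names and the PG counts are placeholders; this is not necessarily how the actual cluster was configured.

    # Hypothetical sketch: create an 8+3 erasure-coded data pool and a 3x
    # replicated data pool, then attach both to an existing CephFS filesystem.
    # Pool/profile/filesystem names and PG counts are placeholders.
    import subprocess

    def ceph(*args):
        subprocess.run(["ceph", *args], check=True)

    # 8+3 erasure-code profile: 8 data chunks + 3 coding chunks per object.
    ceph("osd", "erasure-code-profile", "set", "ec83", "k=8", "m=3")

    # Erasure-coded data pool; CephFS needs overwrites enabled on EC pools.
    ceph("osd", "pool", "create", "cephwork_ec_data", "128", "128", "erasure", "ec83")
    ceph("osd", "pool", "set", "cephwork_ec_data", "allow_ec_overwrites", "true")

    # 3x replicated data pool (size 3 replicas).
    ceph("osd", "pool", "create", "cephwork_data", "128", "128", "replicated")
    ceph("osd", "pool", "set", "cephwork_data", "size", "3")

    # Attach both as additional data pools of an existing CephFS filesystem.
    ceph("fs", "add_data_pool", "cephfs", "cephwork_ec_data")
    ceph("fs", "add_data_pool", "cephfs", "cephwork_data")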