Disk Allocation Optimization

This page discusses how to allocate disks for the repo.fossology.org cluster, the system that runs the live demo. It might be interesting to people setting up similar clusters.

Hardware

msa1000 with 14 300gb disks, plus an add-on msa30 with 14 300gb disks.

The internal chassis has two U160 SCSI busses and the external is using two U160 SCSI busses.
Total = four U160 busses with 7 300gb disks each

  • The msa1000 allows grouping of full or partial disks into LUNs to be presented to hosts. This means that you can potentially have multiple LUNs on any given disk mechanism, potentially competing.
  • The more disks you can spread a LUN across, the better the performance. This conflicts with the point above; depending on the workload, it might make sense to favor one or the other.
  • The more SCSI busses you can spread a LUN across, the better the performance.
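
The layout above can be tallied with a short sketch. The names (BUSSES, DISKS_PER_BUS, DISK_GB, disks_per_bus_for_lun) are illustrative only and do not come from any MSA tool; sizes are the nominal 300gb per disk, not formatted capacity.

```python
# Illustrative model of the hardware described above; all names are hypothetical.
BUSSES = 4          # two U160 busses in the msa1000, two in the msa30
DISKS_PER_BUS = 7
DISK_GB = 300       # nominal size, not formatted capacity

total_disks = BUSSES * DISKS_PER_BUS
raw_gb = total_disks * DISK_GB


def disks_per_bus_for_lun(lun_disks, busses=BUSSES):
    """Disks a LUN places on each bus when spread evenly, or None if it cannot be even."""
    if lun_disks % busses:
        return None
    return lun_disks // busses


print(total_disks, raw_gb)        # 28 8400
print(disks_per_bus_for_lun(8))   # 2
```

A LUN whose disk count is a multiple of four can be spread evenly over all four busses, which is the symmetric case the rest of this page aims for.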

Goals/assumptions

  • try to have things be symmetric so as to not unevenly burden the repos with other uses on the same disks or bus
  • each of the repo areas on its own disks so they don't interfere with each other
  • non-repo areas not on the repo disks, but can maybe share disks
  • sharing busses is probably ok, as long as it's symmetric
  • make sure we have redundancy

Needed uses

  • 3 repo areas with the same design
  • 1 db area, will be smaller than repo areas
  • 1 scratch area, needs to be able to handle multiple copies of the unpacked/"make prep"'d fedora stuff, and similar for debian
  • 1 fossbazaar area, probably can be <100gb

Questions

  • Can the repos share disks?
      • Is the repo disk activity constant or sporadic?
      • If sporadic, does it happen in sync or would it interleave well?
      • Would it be better to have more mechs per LUN to get the added performance, or isolated mechs to avoid the seek penalty of having multiple LUNs on the same disks?
  • Can the db and repo share disks?
      • What does the db disk access look like?
      • Does postgres' memory cache help?
  • Can the scratch area and the db share disks?
  • Can the scratch area and the repo share disks?
  • Do we want to optimize for speed or usable size?

Potential configs

Countless permutations exist; here are some examples:

1.) Isolation config, all uses on their own disks, raid0+1
repo: 1 whole disk each bus, 4 disks total, raid0+1 = 600gb
3x -> 1800gb total
scratch: 2 whole disks each bus, 8 disks total, raid0+1 = 1200gb
db: 1 whole disk each bus, 4 disks total, raid0+1 = 600gb
FB: 1 whole disk each bus, 4 disks total, raid0+1 = 600gb

2.) Isolation config, all uses on their own disks, raid5
repo: 1 whole disk each bus, 4 disks total, raid5 = 900gb
3x -> 2700gb total
scratch: 2 whole disks each bus, 8 disks total, raid5 = 2100gb
db: 1 whole disk each bus, 4 disks total, raid5 = 900gb
FB: 1 whole disk each bus, 4 disks total, raid5 = 900gb

3.) Repos share, scratch+db share, raid0+1
repo: 100gb of 5 disks each bus, 20 disks total, raid0+1 = 1000gb
3x -> 3000gb total
scratch: 150gb of 2 disks each bus, 8 disks total, raid0+1 = 600gb
db: 150gb of 2 disks each bus, 8 disks total, raid0+1 = 600gb
FB: 1 whole disk each bus, 4 disks total, raid0+1 = 600gb
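
A quick way to sanity-check a layout is to tally how many disks it claims on each bus against the 7 available. The sketch below does that for configs 1 and 3. The grouping used for config 3 is an assumption: the three repos are taken to share the same 5 disks per bus, scratch and db the same 2, and FB its own whole disk. The function and dictionary names are hypothetical.

```python
DISKS_PER_BUS = 7  # from the hardware description


def disks_needed_per_bus(groups):
    """Sum the disks each disk group claims on one bus.

    `groups` maps a label to disks per bus. Uses that share the same
    spindles (the three repos, or scratch+db) count as a single group.
    """
    return sum(groups.values())


# Config 1: every use has its own whole disks.
config1 = {"repo1": 1, "repo2": 1, "repo3": 1, "scratch": 2, "db": 1, "FB": 1}
# Config 3 (assumed grouping): repos share 5 disks, scratch+db share 2, FB has 1.
config3 = {"repos": 5, "scratch+db": 2, "FB": 1}

for name, cfg in (("config 1", config1), ("config 3", config3)):
    need = disks_needed_per_bus(cfg)
    print(name, need, "fits" if need <= DISKS_PER_BUS else "over budget")
```

Under that reading, config 1 uses exactly 7 disks per bus, while config 3 asks for 8, so one group would have to shrink or share disks with another for it to fit.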

4.) Repos share, scratch+db share, raid6/5
repo: 100gb of 5 disks each bus, 20 disks total, raid6 = 1800gb
3x -> 5400gb total
scratch: 150gb of 2 disks each bus, 8 disks total, raid5 = 1050gb
db: 150gb of 2 disks each bus, 8 disks total, raid5 = 1050gb
FB: 1 whole disk each bus, 4 disks total, raid0+1 = 600gb
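
The capacity figures in these configs follow the usual nominal formulas: raid0+1 keeps half the disks for mirror copies, raid5 gives up one disk's worth of space to parity, and raid6 gives up two. A minimal sketch, assuming equal-sized slices on every disk and ignoring controller and filesystem overhead (the function name usable_gb is hypothetical):

```python
def usable_gb(level, disks, gb_per_disk):
    """Nominal usable capacity of one LUN, ignoring controller overhead."""
    if level == "raid0+1":
        if disks % 2:
            raise ValueError("raid0+1 needs an even number of disks")
        return disks // 2 * gb_per_disk    # half the disks hold mirror copies
    if level == "raid5":
        return (disks - 1) * gb_per_disk   # one disk's worth of parity
    if level == "raid6":
        return (disks - 2) * gb_per_disk   # two disks' worth of parity
    raise ValueError(f"unknown level: {level}")


print(usable_gb("raid0+1", 4, 300))    # config 1 repo: 600
print(usable_gb("raid0+1", 20, 100))   # config 3 repo: 1000
print(usable_gb("raid6", 20, 100))     # config 4 repo: 1800
print(usable_gb("raid5", 8, 150))      # config 4 scratch: 1050
```

The second argument is the number of disks in the LUN and the third is the slice of each disk it uses, so partial-disk layouts such as "100gb of 5 disks each bus" can be checked the same way as whole-disk ones.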