Using a sample of roughly 100,000 drives, the study, which was presented last month at the 5th USENIX Conference in San Jose, also found that SATA drives were no less reliable than the faster, more expensive Fibre Channel varieties.
Computerworld reports that a further study presented at the same conference, which used drive samples from Google-run data centres, found, interestingly, that drive temperature appeared to have little effect on reliability rates.
The Carnegie Mellon University study notes that manufacturer data sheets for the sampled drives indicated mean time to failure (MTTF) values of between 1 and 1.5 million hours, from which a worst-case annual failure rate of some 0.88% would follow. The study found, however, that the annual failure rate of the sampled drives, which came from large production systems, Internet service sites and the like, was somewhere between 2% and 4%. Some systems delivered failure rates of an astounding 13%.
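The arithmetic behind that worst-case figure can be sketched as follows (a minimal illustration; the function name is ours, not the study's): a year contains 8,760 hours, so a datasheet MTTF of 1,000,000 hours implies that roughly 8,760 / 1,000,000 ≈ 0.88% of drives would be expected to fail in any given year.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a (non-leap) year

def annual_failure_rate(mttf_hours: float) -> float:
    """Approximate annual failure rate implied by a datasheet MTTF.

    Uses the standard small-rate approximation AFR ~= hours-per-year / MTTF,
    valid when MTTF is much larger than one year.
    """
    return HOURS_PER_YEAR / mttf_hours

# Datasheet MTTF of 1,000,000 hours -> the study's quoted worst case
print(f"{annual_failure_rate(1_000_000):.2%}")  # -> 0.88%
# The 1.5-million-hour end of the quoted range implies an even lower rate
print(f"{annual_failure_rate(1_500_000):.2%}")  # -> 0.58%
```

Either way, the datasheet-implied rate sits well below the 2–4% the study actually observed.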
Commenting on the study, Garth Gibson, an associate professor of computer science and co-author of the study, stressed that its goal was to help manufacturers improve not only the design of drives but also the testing processes used. He went on to make clear that he had no vendor-specific material, and that the study did not necessarily track actual drive failures but rather customer-diagnosed failures, where the customer felt the drive in question required replacement. Lastly, he stated that helping users distinguish between the best and worst vendors was not a goal of the study.
Mr. Gibson went on to echo opinions held by analysts and vendors that perhaps as many as 50% of storage drives returned by customers had no failures, and that failures in general could occur for a multitude of reasons, ranging from the extraordinary environments to which a drive may be subjected, to random or intensive read-write operations that could simply cause premature wear and tear on the mechanical components within the drive.
Several of the drive vendors asked to comment on the study declined the invitation. A spokesperson from California-based Seagate responded via e-mail to say that 'The conditions that surround true drive failures are complicated and require a detailed failure analysis to determine what the failure mechanisms were'. Echoing the points above, the spokesperson went on to state that 'It is important to not only understand the kind of drive being used, but the system or environment in which it was placed and its workload'.
However, perhaps not everyone was surprised by the study's results. In particular, Ashish Nadkarni, a principal consultant at Massachusetts-based storage services provider GlassHouse Technologies Inc., expressed no surprise at the quoted replacement rates, citing the distinct differences between the environments in which drive vendors test drives and the dust, noise and vibration that may be present in a data centre.
Mr. Nadkarni elaborated further, describing how, in his view, the overall quality of drives has been falling over time due to price competition within the industry. He suggested that customers track disk-drive failure records and press vendors to review their internal testing procedures.