On Failure Diagnosis of the Storage Stack
Abstrak
Diagnosing storage system failures is challenging even for professionals. One example is the "When Solid State Drives Are Not That Solid" incident occurred at Algolia data center, where Samsung SSDs were mistakenly blamed for failures caused by a Linux kernel bug. With the system complexity keeps increasing, such obscure failures will likely occur more often. As one step to address the challenge, we present our on-going efforts called X-Ray. Different from traditional methods that focus on either the software or the hardware, X-Ray leverages virtualization to collects events across layers, and correlates them to generate a correlation tree. Moreover, by applying simple rules, X-Ray can highlight critical nodes automatically. Preliminary results based on 5 failure cases shows that X-Ray can effectively narrow down the search space for failures.
Topik & Kata Kunci
Penulis (4)
Duo Zhang
Om Rameshwar Gatla
Runzhou Han
Mai Zheng
Akses Cepat
- Tahun Terbit
- 2020
- Bahasa
- en
- Sumber Database
- arXiv
- Akses
- Open Access ✓