arXiv Open Access 2020

On Failure Diagnosis of the Storage Stack

Duo Zhang, Om Rameshwar Gatla, Runzhou Han, Mai Zheng

Abstract

Diagnosing storage system failures is challenging even for professionals. One example is the "When Solid State Drives Are Not That Solid" incident that occurred at an Algolia data center, where Samsung SSDs were mistakenly blamed for failures caused by a Linux kernel bug. As system complexity keeps increasing, such obscure failures will likely occur more often. As one step toward addressing this challenge, we present our ongoing effort called X-Ray. Unlike traditional methods that focus on either the software or the hardware, X-Ray leverages virtualization to collect events across layers and correlates them to generate a correlation tree. Moreover, by applying simple rules, X-Ray can highlight critical nodes automatically. Preliminary results based on 5 failure cases show that X-Ray can effectively narrow down the search space for failures.
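
The abstract does not specify X-Ray's event format, correlation method, or rules, so the following is only a minimal sketch of the general idea of a correlation tree with rule-based highlighting, not the authors' implementation. Every name here (Event, Node, build_tree, highlight, the layer names, the parent_id link, and the is_error rule) is a hypothetical assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Event:
    """One observed event. All fields are illustrative assumptions."""
    event_id: int
    layer: str                       # hypothetical layer label, e.g. "app", "fs"
    name: str
    parent_id: Optional[int] = None  # event assumed to have triggered this one
    status: str = "ok"               # hypothetical: "ok" or "error"


@dataclass
class Node:
    event: Event
    children: List["Node"] = field(default_factory=list)
    critical: bool = False


def build_tree(events: List[Event]) -> List[Node]:
    """Link events into trees through parent_id and return the root nodes."""
    nodes = {e.event_id: Node(e) for e in events}
    roots = []
    for node in nodes.values():
        parent = nodes.get(node.event.parent_id)
        if parent is None:
            roots.append(node)       # no known cause: treat as a root
        else:
            parent.children.append(node)
    return roots


Rule = Callable[[Node], bool]


def is_error(node: Node) -> bool:
    """Hypothetical rule: flag any event that reported an error."""
    return node.event.status == "error"


def highlight(roots: List[Node], rules: List[Rule]) -> List[Node]:
    """Mark every node matched by at least one rule and return those nodes."""
    flagged = []
    stack = list(roots)
    while stack:
        node = stack.pop()
        if any(rule(node) for rule in rules):
            node.critical = True
            flagged.append(node)
        stack.extend(node.children)
    return flagged


# Made-up events spanning four hypothetical layers.
events = [
    Event(1, "app", "write"),
    Event(2, "fs", "fs_write", parent_id=1),
    Event(3, "block", "block_submit", parent_id=2),
    Event(4, "device", "device_cmd", parent_id=3, status="error"),
    Event(5, "app", "read"),
]
roots = build_tree(events)
flagged = highlight(roots, [is_error])
critical_ids = sorted(n.event.event_id for n in flagged)
```

In this toy input, only the device-layer event is flagged, which illustrates how a few simple rules over a cross-layer tree could narrow the search space; real rules and real event correlation would be considerably more involved.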

Authors (4)

Duo Zhang
Om Rameshwar Gatla
Runzhou Han
Mai Zheng

Citation Format

Zhang, D., Gatla, O.R., Han, R., Zheng, M. (2020). On Failure Diagnosis of the Storage Stack. https://arxiv.org/abs/2005.02547

Journal Information
Publication Year: 2020
Language: en
Source Database: arXiv
Access: Open Access