|Abstract:||For many physical systems (e.g., computer systems, computer networks, industrial plants), one of the most important properties is dependability. Dependability involves several aspects of a system's behavior, including reliability, availability, safety and security. Intuitively, a dependable system is one that does not fail during its regular activity.
From a user's point of view, dependability reflects the extent of the user's confidence that the system will operate as expected and will not fail during normal use.
In practice, system failures are often unavoidable, and they may have widespread effects on other systems and on people connected to the system: system operators and users, but also people indirectly involved in the system environment (for instance, the population living around a potentially dangerous plant). Systems that are not dependable are unreliable, unavailable when needed, unsafe or insecure, and may therefore be rejected by their users. Cost must also be taken into account: if a failure leads to economic losses or physical damage, both the direct impact of the failure and the recovery costs have to be considered.
Reasoning about all of these aspects requires formal models. Such models have to be defined with respect to a system specification, since a failure (with all its consequences) is a deviation from a specification. However, specifications are rarely complete or deterministic, so uncertainty is a central problem in building dependability models. It follows that some of the most relevant problems in system dependability concern the representation and modeling of the system, the quantification of system model parameters, and the representation, propagation and quantification of the uncertainty in the system behavior.
Moreover, addressing a concrete dependability problem requires considering further issues: the temporal dimension, with particular attention to the modeling and analysis of temporal dependencies that can arise among system components; the multi-state nature of many components, which cannot be reduced to the standard working/failed dichotomy; and risk/utility analysis, often related to the definition of suitable control or recovery policies for the system under examination.
Classical approaches to dependability modeling and analysis show several limitations with respect to these issues. Combinatorial approaches (such as Fault Trees or Reliability Block Diagrams) are simple to use and analyze, but they are limited in modeling power. State-space approaches (such as Markov models), on the other hand, pay for their greater representational power with more complex or less efficient analysis techniques, and with the state space explosion problem typical of such models.
The main problem is then to define an approach that captures important dependencies among system components while keeping the analysis task manageable. In the Artificial Intelligence field, similar problems have been addressed and solved by the adoption of Probabilistic Graphical Models. The Bayesian Network and Decision Network formalisms belong to this class. A Bayesian Network is a compact graphical representation of a joint probability distribution. It makes it possible to localize dependencies among the modeled entities (system components or sub-systems in a dependability application) and to exploit those dependencies to reduce the number of probabilistic parameters that must be specified. The result is a sound probabilistic model, built from local specifications, on which different kinds of probabilistic queries can be asked (in particular, posterior probability queries after specific information has been gathered). Several important dependability analysis tasks can be naturally framed as such queries. Temporal aspects and dynamic dependencies can be addressed with dynamic versions of Bayesian Networks, which have the advantage over standard Markov models of working with a factored state space. Finally, Decision Networks extend Bayesian Networks by also modeling external actions and the utility of specific system conditions. This allows the analyst to use a decision-theoretic framework for risk/utility analysis, which, as noted above, is very important in the dependability field.
The aim of the book is to present approaches to the dependability (reliability, availability, risk and safety, security) of systems, using the Artificial Intelligence (AI) framework of Probabilistic Graphical Models. This framework, and in particular the Bayesian Network formalism, has been extensively employed in several sub-fields of AI that are closely related to dependability and reliability issues, such as diagnostic problem solving, intelligent monitoring and recovery planning.
After a survey of the main concepts and methodologies adopted in dependability analysis, the book discusses the main features of Probabilistic Graphical Models, considering Bayesian Networks, Dynamic Bayesian Networks and Decision Networks. Their advantages over classical dependability formalisms, in terms of both modeling and analysis, are discussed in depth. Methodologies for deriving Probabilistic Graphical Models from standard dependability languages (such as Fault Trees or Dynamic Fault Trees) are introduced, together with tools able to support this process.
Several case studies are presented and analyzed in the book to support the claim that Probabilistic Graphical Models are well suited to the study of dependable systems. These case studies concentrate on different facets of the dependability concept, such as standard reliability, dynamic reliability, selection of optimal repair policies, cascading failures, fault detection, identification and recovery, and safety and security assessment. Some of the examples refer to real-world case studies, where the approach based on Probabilistic Graphical Models has proven to be very successful.|
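
As an illustration of the kind of posterior probability query mentioned in the abstract, the following is a minimal sketch, not taken from the book. It assumes a hypothetical system with two independent components, A and B, whose failure is combined through a Fault Tree OR gate; the failure probabilities and all names are invented for the example. The joint distribution is factored as P(A) P(B) P(S | A, B), and the query P(component failed | system failed) is answered by brute-force enumeration, which is only practical for very small models.

```python
from itertools import product

# Hypothetical prior failure probabilities (assumed values, for illustration only).
P_FAIL = {"A": 0.01, "B": 0.05}


def p_system_fails(a_failed, b_failed):
    """Deterministic conditional table for an OR gate: S fails if A or B fails."""
    return 1.0 if (a_failed or b_failed) else 0.0


def joint(a_failed, b_failed, s_failed):
    """Joint probability, factored as P(A) * P(B) * P(S | A, B)."""
    pa = P_FAIL["A"] if a_failed else 1.0 - P_FAIL["A"]
    pb = P_FAIL["B"] if b_failed else 1.0 - P_FAIL["B"]
    ps_fail = p_system_fails(a_failed, b_failed)
    ps = ps_fail if s_failed else 1.0 - ps_fail
    return pa * pb * ps


def posterior_component_failed(component):
    """P(component failed | system failed), computed by enumeration."""
    numerator = 0.0
    evidence = 0.0
    for a_failed, b_failed in product([True, False], repeat=2):
        p = joint(a_failed, b_failed, True)  # evidence: the system has failed
        evidence += p
        if (a_failed if component == "A" else b_failed):
            numerator += p
    return numerator / evidence


if __name__ == "__main__":
    for name in ("A", "B"):
        print(name, posterior_component_failed(name))
```

Only two prior parameters and one local table are specified here, rather than a full table over all joint states, which is the sense in which the factored representation reduces the number of parameters. Dedicated inference algorithms would replace the enumeration loop in any realistically sized model.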