Network of Excellence in Internet Science

Understanding, Modeling, and Impact of Deviations in System Operation due to Social and Organizational Processes

I want to say a few words about JRA7 in this blog. JRA7 is about critical infrastructure research, in particular with respect to resilience, security, and dependability. This is an ICT problem, but not only an ICT problem, as the following description shows. It is one example of the research we want to discuss in the JRA7 context, and we would be interested in contributions to upcoming workshops (e.g. on April 3, 2014 at DRCN in Ghent; check the JRA7 blog http://internet-science.eu/blogs/group/56 for future posts on these workshops). Now to my problem description:

Algorithms and concepts that provide security and resilience in (critical infrastructure) systems may operate on false assumptions. First, the planned concept itself may rest on wrong initial assumptions. Moreover, the implementation of software and hardware may deviate from the plan due to ambiguities and differing levels of detail in specification and realization, as well as due to social processes in the overall development process. Changes in configuration and operational processes introduce further deviations. The behavior of actors such as users may also deviate from the plan; in particular, users and operators learn and adapt their behavior to what seems best from their local perspective. As a consequence, the system ends up operating in a different scenario than the one it was designed for.

To some degree, current Internet technology anticipates small system changes; routing protocols, for example, adapt communication paths in case of failures or topological changes. However, dynamic routing makes it hard to predict actual behavior and to guarantee performance. A routing protocol may also optimize according to the wrong objective when different technologies are deployed than expected, e.g. wireless links instead of wired ones. Another example: software integrity checks may make software updates cumbersome, so operators avoid the hassle and keep running unpatched software with security-relevant flaws. Warnings from the control system may be ambiguous, and operators might check the wrong system component or apply the wrong method because operational procedures have changed. Human factors may further discourage operators from raising warnings and initiating a reaction.
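To make the "wrong objective" point concrete, here is a minimal sketch (not taken from any particular protocol) of shortest-path routing over a small hypothetical topology. The "planned" link costs reflect the design assumption that all links are wired; the "actual" costs reflect that one link was later realized as a lossy wireless link. The protocol, still optimizing the planned metric, prefers a path that performs worse under actual conditions.

```python
import heapq

def shortest_path(graph, src, dst, cost_key):
    """Plain Dijkstra over a dict-of-dicts graph, using the given cost attribute."""
    queue = [(0.0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neigh, attrs in graph[node].items():
            if neigh not in visited:
                heapq.heappush(queue, (cost + attrs[cost_key], neigh, path + [neigh]))
    return float("inf"), []

# Hypothetical 4-node topology: 'planned' costs assume all links are wired,
# 'actual' costs reflect that link A-C turned out to be a lossy wireless link.
graph = {
    "A": {"B": {"planned": 2, "actual": 2}, "C": {"planned": 1, "actual": 5}},
    "B": {"A": {"planned": 2, "actual": 2}, "D": {"planned": 1, "actual": 1}},
    "C": {"A": {"planned": 1, "actual": 5}, "D": {"planned": 1, "actual": 1}},
    "D": {"B": {"planned": 1, "actual": 1}, "C": {"planned": 1, "actual": 1}},
}

# The protocol optimizes the planned metric and picks A-C-D ...
print(shortest_path(graph, "A", "D", cost_key="planned"))  # (2, ['A', 'C', 'D'])
# ... although under actual conditions A-B-D would be the better path.
print(shortest_path(graph, "A", "D", cost_key="actual"))   # (3, ['A', 'B', 'D'])
```

The deviation here is not a failure the protocol can react to; the topology is intact, only the cost model no longer matches reality.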

One way to integrate these kinds of deviations into system analysis could be to model not only the planned system and analyze its properties, but also slightly changed versions of the system. However, this not only massively increases the overall complexity of the analysis; the processes that lead to these deviations are also still largely ignored in the modeling. Many methods are not yet compatible with such ideas: binary classifications like secure vs. insecure, for example, may not be appropriate when security properties fail under some unlikely deviation of the system, because then virtually every system would be labelled insecure. System designs that are resilient against such challenges would be important for the safety and security of networked critical infrastructures. The challenge is to understand these deviations, help to model them, and assess their impact on the algorithms and concepts used in such systems.
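As a rough illustration of what analyzing "slightly changed versions" could look like, the following sketch uses an invented, highly simplified system model and a toy security predicate (both purely hypothetical). It enumerates all variants with up to two deviating components and reports how many of them violate the property, instead of forcing a single secure/insecure verdict.

```python
from itertools import combinations

# Hypothetical planned system state and the deviation each component might show
# (an operator skips a patch, a link comes up wireless, a warning is suppressed, ...).
PLANNED = {
    "patching":  "applied",
    "link_type": "wired",
    "warnings":  "escalated",
    "config":    "as_designed",
}
DEVIATIONS = {
    "patching":  "skipped",
    "link_type": "wireless",
    "warnings":  "suppressed",
    "config":    "ad_hoc_change",
}

def secure(system):
    """Toy predicate: insecure only if unpatched software and suppressed warnings coincide."""
    return not (system["patching"] == "skipped" and system["warnings"] == "suppressed")

def variants(planned, deviations, max_deviations):
    """Yield all system variants in which up to max_deviations components deviate."""
    keys = list(deviations)
    for k in range(max_deviations + 1):
        for deviated in combinations(keys, k):
            variant = dict(planned)
            for key in deviated:
                variant[key] = deviations[key]
            yield deviated, variant

total = failed = 0
for deviated, system in variants(PLANNED, DEVIATIONS, max_deviations=2):
    total += 1
    if not secure(system):
        failed += 1

# Result is a robustness picture rather than a binary label for the planned system.
print(f"{failed} of {total} variants with <= 2 deviations violate the security property")
```

Even in this toy setting the number of variants grows combinatorially with the number of components and allowed deviations, which hints at the complexity problem mentioned above; and which deviations are actually likely depends on exactly the social and organizational processes that current models leave out.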