DSN 2010: Chicago, IL, USA

Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2010, Chicago, IL, USA, June 28 - July 1, 2010. IEEE Computer Society. 【DBLP Link】

Paper Num: 69 || Session Num: 0

1. Scalable RFID systems: a privacy-preserving protocol with constant-time identification.

【Paper Link】 【Pages】:1-10

【Authors】: Basel Alomair ; Andrew Clark ; Jorge Cuéllar ; Radha Poovendran

【Abstract】: In the RFID literature, most “privacy-preserving” protocols require the reader to search all tags in the system in order to identify a single tag. In another class of protocols, the search complexity is reduced to be logarithmic in the number of tags, but this comes with two major drawbacks: it requires a large communication overhead over the fragile wireless channel, and the compromise of a single tag reveals secret information about other, uncompromised, tags in the same system. In this work, we take a different approach to address the time complexity of private identification in large-scale RFID systems. We utilize the special architecture of RFID systems to propose the first symmetric-key privacy-preserving authentication protocol for RFID systems with constant-time identification. Instead of increasing communication overhead, the existence of a large storage device in RFID systems, the database, is utilized to improve the time efficiency of tag identification.

【Keywords】: Radiofrequency identification; Radio frequency; Public key; RFID tags; Public key cryptography; Databases; Data privacy; Wireless application protocol; Large-scale systems; Authentication

2. Detecting selfish carrier-sense behavior in WiFi networks by passive monitoring.

【Paper Link】 【Pages】:11-20

【Authors】: Utpal Paul ; Samir R. Das ; Ritesh Maheshwari

【Abstract】: With the advent of programmability in radios, it is becoming easier for wireless network nodes to cheat to obtain an unfair share of the bandwidth. In this work we study the widely used 802.11 protocol and present a solution to detect selfish carrier-sensing behavior, where a node raises the CCA (clear channel assessment) threshold for carrier-sensing, or simply does not sense carrier (possibly randomly, to avoid detection). Our approach is based on detecting any asymmetry in carrier-sense behavior between node pairs and finding multiple such witnesses to raise confidence. The approach is completely passive. It requires deploying multiple sniffers across the network to capture wireless traffic traces. These traces are then analyzed using a machine learning approach to infer carrier-sense relationships between network nodes. Evaluations using a real testbed as well as ns2 simulation studies demonstrate excellent detection ability. The metric of selfishness used to estimate selfish behavior matches closely with the actual degree of selfishness observed.

【Keywords】: MAC layer misbehavior; 802.11 protocol; Hidden Markov Model

3. Detecting Sybil nodes in wireless networks with physical layer network coding.

【Paper Link】 【Pages】:21-30

【Authors】: Weichao Wang ; Di Pu ; Alexander M. Wyglinski

【Abstract】: Previous research on the security of network coding focuses on the detection of pollution attacks. The capabilities of network coding to detect malicious attacks have not been fully explored. We propose a new mechanism based on physical-layer network coding to detect Sybil nodes. When two signal sequences collide at the receiver, the starting point of the collision is determined by the distances between the receiver and the senders. When the distance between two receivers is large enough, they can combine their interference sequences to recover the original data packets. By contrast, Sybil nodes attached to the same physical device cannot accomplish the data-recovery procedure. We propose several schemes at both the physical and network layers to turn this idea into a practical approach. Our investigation shows that wireless nodes can effectively detect Sybil nodes without special hardware or time synchronization.

【Keywords】: Wireless networks; Physical layer; Network coding; Hardware; Pollution; Communication system security; Data security; Interference; Network topology; Routing protocols

4. Observable non-Sybil quorums construction in one-hop wireless ad hoc networks.

【Paper Link】 【Pages】:31-40

【Authors】: Diogo Mónica ; João Leitão ; Luís E. T. Rodrigues ; Carlos Ribeiro

【Abstract】: The Sybil attack is a serious threat to the secure and dependable operation of wireless ad hoc networks. This paper proposes an algorithm to provide each correct node in a one-hop wireless network with a quorum of non-Sybil identities from its neighbourhood. The quorums provided to different correct nodes may differ, but their intersection is composed of a majority of correct identities with probability arbitrarily close to 1. Therefore, the quorums may be used for different purposes, such as voting. The algorithm is based on the combination of different resource tests to efficiently detect (and exclude) Sybil identities.

【Keywords】: Mobile ad hoc networks; Voting; Ad hoc networks; Wireless networks; System testing; Aggregates; Robustness; Routing protocols; Intrusion detection; Wireless communication

5. Reliable MLC NAND flash memories based on nonlinear t-error-correcting codes.

【Paper Link】 【Pages】:41-50

【Authors】: Zhen Wang ; Mark G. Karpovsky ; Ajay Joshi

【Abstract】: Multi-level cell (MLC) NAND flash memories are very popular storage media because of their power efficiency and high storage density. This paper proposes to use nonlinear t-error-correcting codes to replace linear BCH codes for error detection and correction in MLC NAND flash memories. Compared to linear BCH codes with the same bit-error-correcting capability t, the proposed codes have fewer errors miscorrected by all codewords and nearly no undetectable errors. For example, the proposed (8281, 8201, 11) 5-error-correcting code has no errors of multiplicity six miscorrected by all codewords, while the widely used (8262, 8192, 11) linear shortened BCH code has (11 choose 6) × A11 errors in this class, where A11 ≈ 10^14 is the number of codewords of Hamming weight eleven in the shortened BCH code. Moreover, although the Hamming distance of the proposed code is 2t+1, it can also correct some errors of multiplicity t+1 and t+2 with no extra hardware overhead or latency penalty. In this paper, the constructions and the error-correction algorithm for the nonlinear t-error-correcting codes are presented, and the architectures of the encoder and the decoder are shown. The error-correcting capabilities, hardware overhead, latency, and power consumption of the encoder and the decoder are analyzed and compared to those of linear BCH codes to demonstrate the advantages of the proposed codes for error detection and correction in MLC NAND flash memories.

【Keywords】: Error correction codes; Power system reliability; Hardware; Delay; Decoding; Portable media players; Threshold voltage; Bit error rate; Computer networks; Laboratories
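As a quick check of the miscorrection count quoted in the abstract, the figure can be reproduced in a few lines; this is a sketch using the abstract's own approximation A11 ≈ 10^14, not the authors' code:

```python
from math import comb

# Per the abstract: each of the ~A11 weight-11 codewords of the shortened
# BCH code contributes (11 choose 6) weight-6 error patterns that the
# 5-error-correcting bounded-distance decoder miscorrects.
patterns_per_codeword = comb(11, 6)

A11 = 1e14  # abstract's approximation for the number of weight-11 codewords
miscorrected_weight6 = patterns_per_codeword * A11

print(patterns_per_codeword)            # 462
print(f"{miscorrected_weight6:.2e}")    # 4.62e+16
```

So the shortened BCH code miscorrects on the order of 10^16 weight-6 error patterns, while the proposed nonlinear code miscorrects none in this class.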

6. Code-M: A non-MDS erasure code scheme to support fast recovery from up to two-disk failures in storage systems.

【Paper Link】 【Pages】:51-60

【Authors】: Shenggang Wan ; Qiang Cao ; Changsheng Xie ; Benjamin Eckart ; Xubin He

【Abstract】: In this paper, we present a novel coding scheme that can tolerate up to two-disk failures, satisfying the RAID-6 property. Our coding scheme, Code-M, is a non-MDS (Maximum Distance Separable, i.e., tolerating the maximum number of failures for a given amount of redundancy) code that trades rate for fast recovery times. Code-M is a lowest-density code, and its parity chain length is fixed at 2C − 1 for a strip-set of C columns. The rate of Code-M, i.e., the fraction of disk space occupied by non-parity data, is (C − 1)/C. We perform theoretical analysis and evaluation of the coding scheme under different configurations. Our theoretical analysis shows that Code-M has favorable reconstruction times compared to RDP, another well-established RAID-6 code. The quantitative comparisons of Code-M against RDP demonstrate a recovery performance improvement by a factor of up to 5.18 under single-disk failure and 2.8 under double failures using the same number of disks. Overall, Code-M is a RAID-6 type code supporting fast recovery with reduced I/O complexity.

【Keywords】: Disk drives; Computer networks; Redundancy; Performance evaluation; Performance analysis; Helium; Costs
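The two closed-form Code-M parameters stated in the abstract can be checked directly for a given strip-set width C; a minimal sketch (the function name is illustrative, not from the paper):

```python
def code_m_params(C: int):
    """Return (rate, parity_chain_length) for Code-M with C columns
    per strip-set, using the formulas given in the abstract."""
    rate = (C - 1) / C       # fraction of disk space holding non-parity data
    chain_len = 2 * C - 1    # fixed parity chain length
    return rate, chain_len

print(code_m_params(4))   # (0.75, 7)
```

For example, a strip-set of 4 columns yields a rate of 0.75, below the MDS bound, which is the rate Code-M trades away for faster recovery.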

7. Decoding STAR code for tolerating simultaneous disk failure and silent errors.

【Paper Link】 【Pages】:61-70

【Authors】: Jianqiang Luo ; Cheng Huang ; Lihao Xu

【Abstract】: As storage systems grow in size and complexity, various hardware and software component failures inevitably occur, resulting in disk failures as well as silent errors. Existing techniques and schemes handle failures and silent errors separately. In this paper, we advocate using the STAR code as a unified and systematic mechanism to simultaneously tolerate failures on one disk and silent errors on another. By exploiting the unique geometric structure of the STAR code, we propose a novel efficient decoding algorithm, EEL. Both theoretical and experimental performance evaluations show that EEL consistently outperforms a naive Try-and-Test approach by large factors in overall decoding throughput.

【Keywords】: Decoding; Error correction codes; Error correction; Disk drives; Hardware; Microprogramming; Performance loss; Throughput; Software performance; Computer bugs

8. Diverse Partial Memory Replication.

【Paper Link】 【Pages】:71-80

【Authors】: Ryan M. Lefever ; Vikram S. Adve ; William H. Sanders

【Abstract】: An important approach for software dependability is the use of diversity to detect and/or tolerate errors. We develop and evaluate an approach for automated program diversity called Diverse Partial Memory Replication (DPMR), aimed at detecting memory safety errors. DPMR is an automatic compiler transformation that replicates some subset of an executable's data memory and applies one or more diversity transformations to the replica. DPMR can detect any kind of memory safety error in any part of a program's data memory. Moreover, DPMR is novel because it uses partial replication within a single address space, replicating (and comparing) only a subset of a program's memory. We also perform a detailed study of the diversity mechanisms and state-comparison policies in DPMR (a first of its kind for such diversity approaches), which is valuable for exploiting DPMR's high flexibility.

【Keywords】: experimental evaluation; software memory errors; diversity; replication; fault injection

9. Data recovery for web applications.

【Paper Link】 【Pages】:81-90

【Authors】: Istemi Ekin Akkus ; Ashvin Goel

【Abstract】: Web-based applications store their data at the server side. This design has several benefits, but it can also cause a serious problem because a misconfiguration, bug or vulnerability leading to data loss or corruption can affect many users. While data backup solutions can help resolve some of these issues, they do not help diagnose the events that led to the corruption or the precise set of changes caused by these events. In this paper, we describe the design of a recovery system that helps administrators recover from data corruption caused by bugs in web applications. Our system tracks application requests, helping identify requests that cause data corruption, and reuses undo logs already kept by databases to selectively recover from the effects of these requests. The main challenge is to correlate requests across the multiple tiers of the application to determine the correct recovery actions. We explore using dependencies both within and across requests at three layers (database, application, and client) to help identify data corruption accurately. We evaluate our system using known bugs in popular web applications, including Wordpress, Drupal and Gallery2. Our results show that our system enables recovery from data corruption without loss of critical data and incurs small runtime overhead.

【Keywords】: Computer bugs; Databases; Network servers; Runtime; Manuals; Content management; Pricing; Testing; Blogs

10. Programming support and adaptive checkpointing for high-throughput data services with log-based recovery.

【Paper Link】 【Pages】:91-100

【Authors】: Jingyu Zhou ; Caijie Zhang ; Hong Tang ; Jiesheng Wu ; Tao Yang

【Abstract】: Many applications in large-scale data mining and offline processing are organized as network services, running continuously or for a long period of time. To sustain high throughput, these services often keep their data in memory, making them susceptible to failures. On the other hand, the availability requirement for these services is not as stringent as for online services exposed to millions of users. However, these data-intensive offline or mining applications do require data persistence to survive failures. This paper presents programming and runtime support called SLACH for building multi-threaded high-throughput persistent services. To keep in-memory objects persistent, SLACH employs application-assisted logging and checkpointing for log-based recovery while maximizing throughput and concurrency. SLACH adaptively adjusts checkpointing frequency based on log growth and throughput demand to balance runtime overhead against recovery speed. This paper describes the design and API of SLACH, adaptive checkpoint control, and our experiences and experiments in using SLACH at Ask.com.

【Keywords】: Checkpointing; Runtime; Throughput; Large-scale systems; Data mining; Availability; Buildings; Concurrent computing; Frequency; Programmable control
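The adaptive checkpointing idea described above can be illustrated with a toy policy: checkpoint once replaying the accumulated log would exceed a recovery-time budget. The function, parameters, and threshold rule below are hypothetical illustrations, not SLACH's actual API:

```python
def should_checkpoint(log_bytes: int, replay_rate_bps: float,
                      recovery_budget_s: float) -> bool:
    """Hypothetical policy in the spirit of SLACH's adaptive control:
    trigger a checkpoint when the estimated time to replay the log
    since the last checkpoint exceeds the recovery-time budget."""
    estimated_replay_s = log_bytes / replay_rate_bps
    return estimated_replay_s > recovery_budget_s

# With a 50 MB/s replay rate and a 10 s recovery budget,
# a checkpoint is due once roughly 500 MB of log has accumulated.
print(should_checkpoint(600_000_000, 50_000_000, 10.0))  # True
print(should_checkpoint(400_000_000, 50_000_000, 10.0))  # False
```

Under such a rule, a higher write throughput (faster log growth) automatically yields more frequent checkpoints, which matches the trade-off the abstract describes.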

11. StageWeb: Interweaving pipeline stages into a wearout and variation tolerant CMP fabric.

【Paper Link】 【Pages】:101-110

【Authors】: Shantanu Gupta ; Amin Ansari ; Shuguang Feng ; Scott A. Mahlke

【Abstract】: Manufacture-time process variation and life-time failure projections have become a major industry concern. Consequently, fault tolerance, historically of interest only for mission-critical systems, is now gaining attention in the mainstream computing space. Traditionally reliability issues have been addressed at a coarse granularity, e.g., by disabling faulty cores in chip multiprocessors. However, this is not scalable to higher failure rates. In this paper, we propose StageWeb, a fine-grained wearout and variation tolerance solution, that employs a reconfigurable web of replicated processor pipeline stages to construct dependable many-core chips. The interconnection flexibility of StageWeb simultaneously tackles wearout failures (by isolating broken stages) and process variation (by selectively disabling slower stages). Our experiments show that through its wearout tolerance, a StageWeb chip performs up to 70% more cumulative work than a comparable chip multiprocessor. Further, variation mitigation in StageWeb enables it to scale supply voltage more aggressively, resulting in up to 16% energy savings.

【Keywords】: reliability; permanent faults; process variation; multicore; architecture

12. Architecting reliable multi-core network-on-chip for small scale processing technology.

【Paper Link】 【Pages】:111-120

【Authors】: Xin Fu ; Tao Li ; José A. B. Fortes

【Abstract】: The trend towards multi-/many- core design has made network-on-chip (NoC) a crucial component of future microprocessors. With CMOS processing technologies continuously scaling down to the nanometer regime, effects such as process variation (PV) and negative bias temperature instability (NBTI) significantly decrease hardware reliability and lifetime. Therefore, it is imperative for multi-core architects to consider and mitigate these effects in NoCs implemented using small-scale processing technology. This paper reports on a first step to optimize NoC architecture reliability in light of both PV and NBTI effects. We propose novel techniques that can hierarchically alleviate PV and NBTI effects on NoC while leveraging their benign interaction. Our low-level design improves PV and NBTI efficiency of key components (e.g. virtual channel allocation logics, virtual channels) of critical paths of the pipelined router microarchitecture. Our high-level mechanisms leverage NBTI degradation and PV information from multiple routers to intelligently route packets, delivering optimized performance-power-reliability across the NoC substrate. Experimental results show that our intra-router level techniques (i.e. VA_M1 and VC_M2) reduce guardband by 47% while improving network throughput by 24%. Our inter-router optimization scheme (i.e. IR_M3) results in 50% guardband reduction and 19% network latency improvement.

【Keywords】: Network-on-a-chip; Niobium compounds; Titanium compounds; CMOS technology; Microprocessors; CMOS process; Negative bias temperature instability; Hardware; Channel allocation; Logic design

13. Energy-efficient fault tolerance in chip multiprocessors using Critical Value Forwarding.

【Paper Link】 【Pages】:121-130

【Authors】: Pramod Subramanyan ; Virendra Singh ; Kewal K. Saluja ; Erik Larsson

【Abstract】: Relentless CMOS scaling coupled with lower design tolerances is making ICs increasingly susceptible to wear-out-related permanent faults and transient faults, necessitating on-chip fault tolerance in future chip multiprocessors (CMPs). In this paper we introduce a new energy-efficient fault-tolerant CMP architecture known as Redundant Execution using Critical Value Forwarding (RECVF). RECVF is based on two observations: (i) forwarding critical instruction results from the leading to the trailing core enables the latter to execute faster, and (ii) this speedup can be exploited to reduce energy consumption by operating the trailing core at a lower voltage-frequency level. Our evaluation shows that RECVF consumes 37% less energy than conventional dual modular redundant (DMR) execution of a program. It consumes only 1.26 times the energy of a non-fault-tolerant baseline and has a performance overhead of just 1.2%.

【Keywords】: Energy efficiency; Fault tolerance; Microprocessors; Fault tolerant systems; Hardware; Temperature; Power dissipation; Energy consumption; Fault detection; Proposals

14. A fast and accurate multi-cycle soft error rate estimation approach to resilient embedded systems design.

【Paper Link】 【Pages】:131-140

【Authors】: Mahdi Fazeli ; Seyed Ghassem Miremadi ; Hossein Asadi ; Seyed Nematollah Ahmadian

【Abstract】: In this paper, we propose a very fast and accurate analytical approach to estimate the overall SER and to identify the most vulnerable gates, flip-flops, and paths of a circuit. Using such information, designers can selectively protect the vulnerable parts, resulting in the lower power and area overheads that are the most important factors in embedded systems. Unlike previous approaches, the proposed approach, first, does not rely on fault injection or fault simulation; second, it measures the SER over multiple cycles of circuit operation; third, it accurately computes all three masking factors, namely logical, electrical, and timing masking; and fourth, it accounts for the effects of error propagation in re-convergent fanouts. SERs estimated by the proposed approach for some ISCAS89 benchmark circuits are compared with those estimated by a Monte Carlo (MC) simulation-based fault injection approach. The results show that the proposed approach is about four orders of magnitude faster than MC fault injection while achieving an accuracy of about 97%. This combination of speed and accuracy makes the proposed approach a viable solution for measuring the SER of the very large circuits used in industry.

【Keywords】: Error analysis; Estimation error; Embedded system; Circuit faults; Computational modeling; Circuit simulation; Flip-flops; Power system protection; Electric variables measurement; Timing

15. Bit-slice logic interleaving for spatial multi-bit soft-error tolerance.

【Paper Link】 【Pages】:141-150

【Authors】: Nishant J. George ; Carl R. Elks ; Barry W. Johnson ; John Lach

【Abstract】: Semiconductor devices are becoming more susceptible to single event upsets (SEUs) as device dimensions, operating voltages and frequencies are scaled. The majority of architecture-, logic- and circuit-level techniques that have been developed to address SEUs in logic assume a single-point fault model. This will soon be insufficient as the occurrence of spatial multi-bit errors is becoming prevalent in highly scaled devices. In this paper, we explore this new fault model and evaluate the effectiveness of conventional fault tolerance techniques to mitigate such faults. We also extend the idea of bit interleaving in memory to logic bit slices and explore its utility as an approach to spatial multi-bit error mitigation in logic. We present a comparison of these techniques using a case study of a Brent-Kung adder at a 90-nm process.

【Keywords】: Interleaved codes; Logic devices; Circuit faults; Single event transient; Semiconductor devices; Single event upset; Voltage; Frequency; Logic circuits; Fault tolerance

16. WearMon: Reliability monitoring using adaptive critical path testing.

【Paper Link】 【Pages】:151-160

【Authors】: Bardia Zandian ; Waleed Dweik ; Suk Hun Kang ; Thomas Punihaole ; Murali Annavaram

【Abstract】: As processor reliability becomes a first order design constraint, this research argues for a need to provide continuous reliability monitoring. We present WearMon, an adaptive critical path monitoring architecture which provides accurate and real-time measure of the processor's timing margin degradation. Special test patterns check a set of critical paths in the circuit-under-test. By activating the actual devices and signal paths used in normal operation of the chip, each test will capture up-to-date timing margin of these paths. The monitoring architecture dynamically adapts testing interval and complexity based on analysis of prior test results, which increases efficiency and accuracy of monitoring. Experimental results based on FPGA implementation show that the proposed monitoring unit can be easily integrated into existing designs. Monitoring overhead can be reduced to zero by scheduling tests only when a unit is idle.

【Keywords】: Timing Margins; Reliability; Critical Paths

17. A numerical optimization-based methodology for application robustification: Transforming applications for error tolerance.

【Paper Link】 【Pages】:161-170

【Authors】: Joseph Sloan ; David Kesler ; Rakesh Kumar ; Ali Rahimi

【Abstract】: There have been several attempts at correcting process variation induced errors by identifying and masking these errors at the circuit and architecture level [10, 27]. These approaches take up valuable die area and power on the chip. As an alternative, we explore the feasibility of an approach that allows these errors to occur freely, and handle them in software, at the algorithmic level. In this paper, we present a general approach to converting applications into an error tolerant form by recasting these applications as numerical optimization problems, which can then be solved reliably via stochastic optimization. We evaluate the potential robustness and energy benefits of the proposed approach using an FPGA-based framework that emulates timing errors in the floating point unit (FPU) of a Leon3 processor [11]. We show that stochastic versions of applications have the potential to produce good quality outputs in the face of timing errors under certain assumptions. We also show that good quality results are possible for both intrinsically robust algorithms as well as fragile applications under these assumptions.

【Keywords】: Optimization methods; Robustness; Error correction; Application software; Stochastic processes; Timing; Circuits; Computer architecture; Software algorithms; Potential energy

18. Blue Banana: resilience to avatar mobility in distributed MMOGs.

【Paper Link】 【Pages】:171-180

【Authors】: Sergey Legtchenko ; Sébastien Monnet ; Gaël Thomas

【Abstract】: Massively Multiplayer Online Games (MMOGs) recently emerged as a popular class of applications with millions of users. To offer an acceptable gaming experience, such applications need to render the virtual world surrounding the player with very low latency. However, current state-of-the-art MMOGs based on peer-to-peer overlays fail to satisfy these requirements. This happens because avatar mobility implies many data exchanges through the overlay. As state-of-the-art overlays do not anticipate this mobility, the needed data is not delivered on time, which leads to transient failures at the application level. To solve this problem, we propose Blue Banana, a mechanism that models and predicts avatar movement, allowing the overlay to adapt itself by anticipation to the MMOG's needs. Our evaluation is based on large-scale traces derived from Second Life. It shows that our anticipation mechanism decreases the number of transient failures by 20% with a network overhead of only 2%.

【Keywords】: Resilience; Avatars; Peer to peer computing; Information retrieval; Delay; Large-scale systems; Second Life; Degradation; Online Communities/Technical Collaboration; Scalability

19. Efficient eventual consistency in Pahoehoe, an erasure-coded key-blob archive.

【Paper Link】 【Pages】:181-190

【Authors】: Eric Anderson ; Xiaozhou Li ; Arif Merchant ; Mehul A. Shah ; Kevin Smathers ; Joseph Tucek ; Mustafa Uysal ; Jay J. Wylie

【Abstract】: Cloud computing demands cheap, always-on, and reliable storage. We describe Pahoehoe, a key-value cloud storage system we designed to store large objects cost-effectively with high availability. Pahoehoe stores objects across multiple data centers and provides eventual consistency so as to remain available during network partitions. Pahoehoe uses erasure codes to store objects with high reliability at low cost. Its use of erasure codes distinguishes Pahoehoe from other cloud storage systems, and presents a challenge for efficiently providing eventual consistency. We describe Pahoehoe's put, get, and convergence protocols, convergence being the decentralized protocol that ensures eventual consistency. We use simulated executions of Pahoehoe to evaluate the efficiency of convergence, in terms of message count and message bytes sent, for failure-free and expected failure scenarios (e.g., partitions and server unavailability). We describe and evaluate optimizations to the naïve convergence protocol that reduce the cost of convergence in all scenarios.

【Keywords】: Convergence; Availability; Network servers; Computer crashes; Frequency selective surfaces; Cloud computing; Laboratories; Access protocols; Cost function; Social network services

20. Using correlated surprise to infer shared influence.

【Paper Link】 【Pages】:191-200

【Authors】: Adam J. Oliner ; Ashutosh V. Kulkarni ; Alex Aiken

【Abstract】: We propose a method for identifying the sources of problems in complex production systems where, due to the prohibitive costs of instrumentation, the data available for analysis may be noisy or incomplete. In particular, we may not have complete knowledge of all components and their interactions. We define influences as a class of component interactions that includes direct communication and resource contention. Our method infers the influences among components in a system by looking for pairs of components with time-correlated anomalous behavior. We summarize the strength and directionality of shared influences using a Structure-of-Influence Graph (SIG). This paper explains how to construct a SIG and use it to isolate system misbehavior, and presents both simulations and in-depth case studies with two autonomous vehicles and a 9024-node production supercomputer.

【Keywords】: Production systems; Instruments; Computer science; Costs; Silicon carbide; Data analysis; Remotely operated vehicles; Mobile robots; Supercomputers; Delay effects

21. Keystroke biometrics with number-pad input.

【Paper Link】 【Pages】:201-210

【Authors】: Roy A. Maxion ; Kevin S. Killourhy

【Abstract】: Keystroke dynamics is the process of identifying individual users on the basis of their typing rhythms, which are in turn derived from the timestamps of key-press and key-release events on the keyboard. Many researchers have explored this domain, with mixed results, but few have examined the relatively impoverished territory of digits only, particularly when restricted to a single finger, which might come into play on an automated teller machine, a mobile phone, a digital telephone dial, or a digital electronic security keypad at a building entrance. In this work, 28 users typed the same 10-digit number, using only the right-hand index finger. Employing statistical machine-learning techniques (random forest), we achieved an unweighted correct-detection rate of 99.97% with a corresponding false-alarm rate of 1.51%, using practiced 2-of-3 encore typing with outlier handling. This level of accuracy approaches sufficiency for use as a second authentication factor alongside passwords or PINs.

【Keywords】: Biometrics; Keyboards; Rhythm; Authentication; Telegraphy; Laboratories; Computer science; Telephony; Security; Europe
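The timing features the abstract refers to, derived from key-press and key-release timestamps, can be sketched as follows; the function and event layout are illustrative and not the paper's actual feature set:

```python
def keystroke_features(events):
    """Derive the two feature families keystroke-dynamics work typically
    builds on: hold times (release - press per key) and down-down
    latencies (press-to-press interval between consecutive keys).
    `events` is a list of (key, press_time, release_time) tuples."""
    holds = [release - press for _, press, release in events]
    down_down = [events[i + 1][1] - events[i][1]
                 for i in range(len(events) - 1)]
    return holds, down_down

# Timestamps in milliseconds for typing the digits "9" then "1":
events = [("9", 0, 95), ("1", 180, 260)]
print(keystroke_features(events))  # ([95, 80], [180])
```

Vectors of such per-key timings, collected over repetitions of the same 10-digit number, are the kind of input a classifier such as a random forest would be trained on.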

22. Using Bayesian networks for cyber security analysis.

【Paper Link】 【Pages】:211-220

【Authors】: Peng Xie ; Jason H. Li ; Xinming Ou ; Peng Liu ; Renato Levy

【Abstract】: Capturing the uncertain aspects in cyber security is important for security analysis in enterprise networks. However, there has been insufficient effort in studying what modeling approaches correctly capture such uncertainty, and how to construct the models to make them useful in practice. In this paper, we present our work on justifying uncertainty modeling for cyber security, and initial evidence indicating that it is a useful approach. Our work is centered around near real-time security analysis such as intrusion response. We need to know what is really happening, the scope and severity level, possible consequences, and potential countermeasures. We report our current efforts on identifying the important types of uncertainty and on using Bayesian networks to capture them for enhanced security analysis. We build an example Bayesian network based on a current security graph model, justify our modeling approach through attack semantics and experimental study, and show that the resulting Bayesian network is not sensitive to parameter perturbation.

【Keywords】: Bayesian methods; Computer security; Uncertainty; File servers; Web server; Access protocols; USA Councils; Graphical models; Workstations; Intrusion detection

23. A study of the internal and external effects of concurrency bugs.

【Paper Link】 【Pages】:221-230

【Authors】: Pedro Fonseca ; Cheng Li ; Vishal Singhal ; Rodrigo Rodrigues

【Abstract】: Concurrent programming is increasingly important for achieving performance gains in the multi-core era, but it is also a difficult and error-prone task. Concurrency bugs are particularly difficult to avoid and diagnose, and therefore in order to improve methods for handling such bugs, we need a better understanding of their characteristics. In this paper we present a study of concurrency bugs in MySQL, a widely used database server. While previous studies of real-world concurrency bugs exist, they have centered their attention on the causes of these bugs. In this paper we provide a complementary focus on their effects, which is important for understanding how to detect or tolerate such bugs at run-time. Our study uncovered several interesting facts, such as the existence of a significant number of latent concurrency bugs, which silently corrupt data structures and are exposed to the user potentially much later. We also highlight several implications of our findings for the design of reliable concurrent systems.

【Keywords】: Concurrent computing; Computer bugs; Interleaved codes; Performance gain; Programming profession; Fault detection; Manufacturing processes; Software performance; Software systems

24. AutomaDeD: Automata-based debugging for dissimilar parallel tasks.

Paper Link】 【Pages】:231-240

【Authors】: Greg Bronevetsky ; Ignacio Laguna ; Saurabh Bagchi ; Bronis R. de Supinski ; Dong H. Ahn ; Martin Schulz

【Abstract】: Today's largest systems have over 100,000 cores, with million-core systems expected over the next few years. This growing scale makes debugging the applications that run on them a daunting challenge. Few debugging tools perform well at this scale and most provide an overload of information about the entire job. Developers need tools that quickly direct them to the root cause of the problem. This paper presents AutomaDeD, a tool that identifies which tasks of a large-scale application first manifest a bug at a specific code region and specific program execution point. AutomaDeD statistically models the application's control-flow and timing behavior, grouping tasks and identifying deviations from normal execution, which significantly reduces debugging effort. In addition to a case study in which AutomaDeD locates a bug that occurred during development of MVAPICH, we evaluate AutomaDeD on a range of bugs injected into the NAS parallel benchmarks. Our results demonstrate that AutomaDeD detects the time period when a bug first manifested with 90% accuracy for stalls and hangs and 70% accuracy for interference faults. It identifies the subset of processes first affected by the fault with 80% and 70% accuracy, respectively, and the code region where the fault first manifested with 90% and 50% accuracy, respectively.

【Keywords】: Debugging; Government; Protection; Large-scale systems; Timing; Computer bugs; Fault detection; Interference; Fault diagnosis; Histograms

25. Detecting vulnerabilities in C programs using trace-based testing.

Paper Link】 【Pages】:241-250

【Authors】: Dazhi Zhang ; Donggang Liu ; Yu Lei ; David Chenho Kung ; Christoph Csallner ; Wenhua Wang

【Abstract】: Security testing has gained significant attention recently due to frequent attacks against software systems. This paper presents a trace-based security testing approach. It reuses test cases generated from previous testing methods to produce execution traces. An execution trace is a sequence of program statements exercised by a test case. Each trace is symbolically executed to produce program constraints and security constraints. A program constraint is a constraint imposed by program logic on program variables. A security constraint is a condition on program variables that must be satisfied to ensure system security. A security flaw exists if there is an assignment of values to program variables that satisfies the program constraint but violates the security constraint. This approach detects security flaws even if existing test cases do not trigger them. The novelty of this method is a test model that unifies program constraints and security constraints such that formal reasoning can be applied to detect vulnerabilities. A tool named SecTAC is implemented and applied to 14 benchmark programs and 3 open-source programs. The experiment shows that SecTAC quickly detects all reported vulnerabilities and 13 new ones that have not been detected before.

【Keywords】: Data security; Vehicle crash testing; System testing; Software testing; Open source software; Computer crashes; Runtime; Automatic testing; Computer science; Computer security

26. Mash-IF: Practical information-flow control within client-side mashups.

Paper Link】 【Pages】:251-260

【Authors】: Zhou Li ; Kehuan Zhang ; XiaoFeng Wang

【Abstract】: Mashup is a representative of Web 2.0 technology that needs both convenience of cross-domain access and protection against the security risks it brings in. Solutions proposed by prior research focused on mediating access to the data in different domains, but little has been done to control the use of the data after the access. In this paper, we present Mash-IF, a new technique for information-flow control within mashups. Our approach allows cross-domain communications within a browser, but disallows disclosure of sensitive information to remote parties without the user's permission. It mediates the cross-domain channels in existing mashups and works on the client without collaborations from other parties. Also of particular interest is a novel technique that automatically generates declassification rules for a script by statically analyzing its code. Such rules can be efficiently enforced through monitoring the script's call sequences and DOM operations.

【Keywords】: Information-Flow Control; Web; Browser; Mashup; Protection; Security Model

27. DataGuard: Dynamic data attestation in wireless sensor networks.

Paper Link】 【Pages】:261-270

【Authors】: Dazhi Zhang ; Donggang Liu

【Abstract】: Attestation has become a promising approach for ensuring software integrity in wireless sensor networks. However, current attestation either focuses on static system properties, e.g., code integrity, or requires hardware support such as a Trusted Platform Module (TPM). Yet there are attacks exploiting vulnerabilities that do not violate static system properties, and sensor platforms may not have hardware-based security support. This paper presents a software attestation scheme for dynamic data integrity based on data boundary integrity. It automatically transforms the source code and inserts data guards to track run-time program data. A data guard is unrecoverable once it is corrupted by an attacker, even if the attacker fully controls the system later. The corruption of any data guard at runtime can be remotely detected. A corruption either indicates a software attack or a bug in the software that needs immediate attention. The benefits of the proposed attestation scheme are as follows. First, it does not rely on any additional hardware support, making it suitable for low-cost sensor nodes. Second, it introduces minimal communication cost and has adjustable runtime memory overhead. Third, it works even if sensor nodes use different hardware platforms, as long as they run the same software. The prototype implementation and the experiments on TelosB motes show that the proposed technique is both effective and efficient for sensor networks.

【Keywords】: Wireless sensor networks; Hardware; Runtime; Sensor systems; Data security; Automatic control; Control systems; Costs; Software prototyping; Prototypes

28. iProve: A scalable technique for consumer-verifiable software guarantees.

Paper Link】 【Pages】:271-280

【Authors】: Silviu Andrica ; Horatiu Jula ; George Candea

【Abstract】: Formally proving complex program properties is still considered impractical for systems with over a million lines of code. We present iProve, an approach that enables guaranteeing useful properties in large Java systems. Desired properties are proven in iProve as a combination of two proofs: one of a complex property applied to a small piece of code—a nucleus—using existing theorem provers, and a proof of a simple property applied to the rest of the code—the program body—using iProve. We show how iProve can be used to guarantee properties such as communication security, deadlock immunity, data privacy, and resource usage bounds in Java programs with millions of lines of code. iProve scales well, requires no access to source code, and allows nuclei to be reused with an unlimited number of systems and to be written in verification-friendly languages.

【Keywords】: System recovery; Java; Communication system software; Computer networks; Data privacy; Automatic control

29. Reuse-oriented camouflaging trojan: Vulnerability detection and attack construction.

Paper Link】 【Pages】:281-290

【Authors】: Zhiqiang Lin ; Xiangyu Zhang ; Dongyan Xu

【Abstract】: We introduce the reuse-oriented camouflaging trojan — a new threat to legitimate software binaries. To perform a malicious action, such a trojan identifies and reuses an existing function in a legal binary program instead of implementing the function itself. Furthermore, this trojan is stealthy in that the malicious invocation of a targeted function usually takes place in a location where it is legal to do so, closely mimicking a legal invocation. At the network level, the victim binary can still follow its communication protocol without exhibiting any anomalous behavior. Meanwhile, many closed-source shareware binaries are rich in functions that can be maliciously “reused”, making them attractive targets of this type of attack. In this paper, we present a framework to determine if a given binary program is vulnerable to this attack and to construct a concrete trojan if so. Our experiments with a number of real-world software binaries demonstrate that the reuse-oriented camouflaging trojans are a real threat and vulnerabilities of this type in legal binaries can be effectively revealed and confirmed.

【Keywords】: Law; Legal factors; Payloads; Computer science; Protocols; Concrete; Logic; Security; Computer worms; Sockets

30. Detection of botnets using combined host- and network-level information.

Paper Link】 【Pages】:291-300

【Authors】: Yuanyuan Zeng ; Xin Hu ; Kang G. Shin

【Abstract】: Bots are coordinated by a command and control (C&C) infrastructure to launch attacks that seriously threaten Internet services and users. Most botnet-detection approaches function at the network level and require the analysis of packets' payloads, raising privacy concerns and incurring large computational overheads. Moreover, network traffic analysis alone can seldom provide a complete picture of botnets' behavior. By contrast, in-host detection approaches are useful to identify each bot's host-wide behavior, but are susceptible to the host-resident malware if used alone. To address these limitations, we consider both the coordination within a botnet and the malicious behavior each bot exhibits at the host level, and propose a C&C protocol-independent detection framework that combines host- and network-level information for making detection decisions. The framework is shown to be effective in detecting various types of botnets with low false-alarm rates.

【Keywords】: Web server; Storms; Protocols; Telecommunication traffic; Computer worms; Inspection; Command and control systems; Web and internet services; Counting circuits; Relays

31. Dependable connection setup for network capabilities.

Paper Link】 【Pages】:301-310

【Authors】: Soo Bum Lee ; Virgil D. Gligor ; Adrian Perrig

【Abstract】: Network-layer capabilities offer strong protection against link flooding by authorizing individual flows with unforgeable credentials (i.e., capabilities). However, the capability-setup channel is vulnerable to flooding attacks that prevent legitimate clients from acquiring capabilities; i.e., in Denial of Capability (DoC) attacks. Based on the observation that the distribution of attack sources in the current Internet is highly non-uniform, we provide a router-level scheme that confines the effects of DoC attacks to specified locales or neighborhoods (e.g., one or more administrative domains of the Internet). Our scheme provides precise access guarantees for capability schemes, even in the face of flooding attacks. The effectiveness of our scheme is evaluated by ns2 simulations under different attack scenarios.

【Keywords】: Floods; Internet; Protection; Authorization; Aggregates; Counting circuits; Large-scale systems; Filtering; Filters; Telecommunication traffic

32. Why software hangs and what can be done with it.

Paper Link】 【Pages】:311-316

【Authors】: Xiang Song ; Haibo Chen ; Binyu Zang

【Abstract】: Software hang is an annoying behavior and forms a major threat to the dependability of many software systems. To avoid software hang at the design phase or fix it in production runs, it is desirable to understand its characteristics. Unfortunately, to our knowledge, there is currently no comprehensive study on why software hangs and how to deal with it. In this paper, we study the reported hang-related bugs of four typical open-source software applications, aiming to gain insight into the characteristics of software hang and provide some guidelines to fix such bugs in the first place or remedy them in production runs.

【Keywords】: Computer bugs; Open source software; Concurrent computing; Software systems; Software debugging; Application software; Databases; System recovery; Runtime environment; Network servers

33. Finding stable cliques of PlanetLab nodes.

Paper Link】 【Pages】:317-322

【Authors】: Elias Procópio Duarte Jr. ; Thiago Garrett ; Luis C. E. Bona ; Renato Carmo ; Alexandre P. Züge

【Abstract】: Users of large scale network testbeds often execute experiments that require a set of nodes that behave and communicate among themselves in a reasonably stable pattern. In this work we call such a set of nodes a stable clique, and introduce a monitoring strategy that allows their detection in PlanetLab, a non-trivial task for such a large scale dynamic network. Nodes monitor each other by sampling the RTT (Round-Trip-Time) and computing its variation. Based on this data and a threshold, pairs of nodes are classified as stable or unstable. A set of graphs is generated, on which maximum sized cliques are computed. Three experiments were conducted in which hundreds of nodes were monitored for several days. Results show the unexpected behavior of some nodes, and the size of the maximum stable clique for different time windows and different thresholds.

【Keywords】: Monitoring; Testing; Protocols; Large-scale systems; Communication channels; Internet; Informatics; Sampling methods; Proposals; Stability

34. Who is peeping at your passwords at Starbucks? - To catch an evil twin access point.

Paper Link】 【Pages】:323-332

【Authors】: Yimin Song ; Chao Yang ; Guofei Gu

【Abstract】: In this paper, we consider the problem of “evil twin” attacks in wireless local area networks (WLANs). An evil twin is essentially a phishing (rogue) Wi-Fi access point (AP) that looks like a legitimate one (with the same SSID name). It is set up by an adversary, who can eavesdrop on wireless communications of users' Internet access. Existing evil twin detection solutions are mostly for wireless network administrators to verify whether a given AP is in an authorized list or not, instead of for a wireless client to detect whether a given AP is authentic or evil. Such administrator-side solutions are limited, expensive, and not available for many scenarios. For example, traveling users who use wireless networks at airports, hotels, or cafes need to protect themselves from evil twin attacks (instead of relying on those wireless network providers, which typically may not provide strong security monitoring/management service). Thus, a lightweight and effective solution for these users is highly desired. In this work, we propose a novel user-side evil twin detection technique that outperforms traditional administrator-side detection methods in several aspects. Unlike previous approaches, our technique does not need a known authorized AP/host list, thus it is suitable for users to identify and avoid evil twins. Our technique does not strictly rely on training data of target wireless networks, nor depend on the types of wireless networks. We propose to exploit fundamental communication structures and properties of such evil twin attacks in wireless networks and to design new active, statistical and anomaly detection algorithms. Our preliminary evaluation in real-world widely deployed 802.11b and 802.11g wireless networks shows very promising results. We can identify evil twins with a very high detection rate while keeping a very low false positive rate.

【Keywords】: Wireless networks; Wireless LAN; Wireless communication; Internet; Airports; Protection; Monitoring; Training data; Algorithm design and analysis; Detection algorithms

35. Improving privacy and lifetime of PCM-based main memory.

Paper Link】 【Pages】:333-342

【Authors】: Jingfei Kong ; Huiyang Zhou

【Abstract】: Phase change memory (PCM) is a promising technology for computer memory systems. However, the non-volatile nature of PCM poses serious threats to computer privacy. The low programming endurance of PCM devices also limits the lifetime of PCM-based main memory (PRAM). In this paper, we first adopt counter-mode encryption for privacy protection and show that encryption significantly reduces the effectiveness of some previously proposed wear-leveling techniques for PRAM. To mitigate such adverse impact, we propose simple, yet effective extensions to the encryption scheme. In addition, we propose to reuse the encryption counters as age counters and to dynamically adjust the strength of error correction code (ECC) to extend the lifetime of PRAM. Our experiments show that our mechanisms effectively achieve privacy protection and lifetime extension for PRAM with very low performance overhead.

【Keywords】: Phase change random access memory; Phase change materials; Counting circuits; Elliptic curve cryptography; Protection; Phase change memory; Error correction codes; Delay; Data privacy; Computer science

36. Generic construction of consensus algorithms for benign and Byzantine faults.

Paper Link】 【Pages】:343-352

【Authors】: Olivier Rütti ; Zarko Milosevic ; André Schiper

【Abstract】: The paper proposes a generic consensus algorithm that highlights the basic and common features of known consensus algorithms. The parameters of the generic algorithm encapsulate the core differences between various consensus algorithms, including leader-based and leader-free algorithms, addressing benign faults, authenticated Byzantine faults and Byzantine faults. This leads to the identification of three classes of consensus algorithms. With the proposed classification, Paxos and PBFT indeed belong to the same class, while FaB Paxos belongs to a different class. Interestingly, the classification allowed us to identify a new Byzantine consensus algorithm that requires n > 4b, where b is the maximum number of Byzantine processes.

【Keywords】: Fault diagnosis; Fault tolerance; Distributed computing; Reactive power; Lead; Fault detection; Detectors

37. Scrooge: Reducing the costs of fast Byzantine replication in presence of unresponsive replicas.

Paper Link】 【Pages】:353-362

【Authors】: Marco Serafini ; Péter Bokor ; Dan Dobre ; Matthias Majuntke ; Neeraj Suri

【Abstract】: Byzantine-Fault-Tolerant (BFT) state machine replication is an appealing technique to tolerate arbitrary failures. However, Byzantine agreement incurs a fundamental trade-off between being fast (i.e. optimal latency) and achieving optimal resilience (i.e. 2f + b + 1 replicas, where f is the bound on failures and b the bound on Byzantine failures [9]). Achieving fast Byzantine replication despite f failures requires at least f + b − 2 additional replicas [10, 6, 8]. In this paper we show, perhaps surprisingly, that fast Byzantine agreement despite f failures is practically attainable using only b − 1 additional replicas, which is independent of the number of crashes tolerated. This makes our approach particularly appealing for systems that must tolerate many crashes (large f) and few Byzantine faults (small b). The core principle of our approach is to have replicas agree on a quorum of responsive replicas before agreeing on requests. This is key to circumventing the resilience lower bound of fast Byzantine agreement [6].

【Keywords】: Costs; Protocols; Resilience; Delay; Content addressable storage; Computer crashes; Fault tolerance; Distributed computing; Throughput

38. Zzyzx: Scalable fault tolerance through Byzantine locking.

Paper Link】 【Pages】:363-372

【Authors】: James Hendricks ; Shafeeq Sinnamohideen ; Gregory R. Ganger ; Michael K. Reiter

【Abstract】: Zzyzx is a Byzantine fault-tolerant replicated state machine protocol that outperforms prior approaches and provides near-linear throughput scaling. Using a new technique called Byzantine Locking, Zzyzx allows a client to extract state from an underlying replicated state machine and access it via a second protocol specialized for use by a single client. This second protocol requires just one round-trip and 2f + 1 responsive servers—compared to Zyzzyva, this results in 39–43% lower response times and 2.2–2.9× higher throughput. Furthermore, the extracted state can be transferred to other servers, allowing non-overlapping sets of servers to manage different state. Thus, Zzyzx allows throughput to be scaled by adding servers when concurrent data sharing is not common. When data sharing is common, performance can match that of the underlying replicated state machine protocol.

【Keywords】: Fault tolerance; Throughput; Fault tolerant systems; Access protocols; File servers; Delay; Data mining; Scalability; Physics computing; Computer bugs

39. Doubly-expedited one-step Byzantine consensus.

Paper Link】 【Pages】:373-382

【Authors】: Nazreen Banu ; Taisuke Izumi ; Koichi Wada

【Abstract】: It is known that Byzantine consensus algorithms guarantee one-step decision only in favorable situations (e.g. when all processes propose the same value) and that no one-step algorithm can support two-step decision. This paper presents DEX, a novel one-step Byzantine algorithm that circumvents these impossibilities using the condition-based approach. Algorithm DEX has two distinguishing features: adaptiveness and the double-expedition property. Adaptiveness makes it sensitive only to the actual number of failures, so that it provides fast termination for more inputs when there are fewer failures (a common case in practice). The double-expedition property facilitates two-step decision in addition to one-step decision by running two condition-based mechanisms in parallel. To the best of our knowledge, the double-expedition property is a new concept introduced in this paper, and DEX is the first algorithm having such a feature. Although DEX takes four steps at worst in well-behaved runs while existing one-step algorithms take only three, it is expected to work efficiently because the worst case does not occur often in practice.

【Keywords】: Broadcasting; Computer crashes; Educational institutions; Mechanical factors; Fault tolerant systems; Magnetic heads; Detectors

40. A passive approach to wireless device fingerprinting.

Paper Link】 【Pages】:383-392

【Authors】: Ke Gao ; Cherita L. Corbett ; Raheem A. Beyah

【Abstract】: We propose a passive blackbox-based technique for determining the type of access point (AP) connected to a network. Essentially, a stimulant (i.e., packet train) that emulates normal data transmission is sent through the access point. Since access points from different vendors are architecturally heterogeneous (e.g., chipset, firmware, driver), each AP will act upon the packet train differently. By applying wavelet analysis to the resultant packet train, a distinct but reproducible pattern is extracted allowing a clear classification of different AP types. This has two important applications: (1) as a system administrator, this technique can be used to determine if a rogue access point has connected to the network; and (2) as an attacker, fingerprinting the access point is necessary to launch driver/firmware specific attacks. Extensive experiments were conducted (over 60 GB of data was collected) to differentiate 6 APs. We show that this technique can classify APs with high accuracy (in some cases, we can classify successfully 100% of the time) with as few as 100,000 packets. Further, we illustrate that this technique is independent of the stimulant traffic type (e.g., TCP or UDP). Finally, we show that the AP profile is stable across multiple models of the same AP.

【Keywords】: Fingerprint recognition; Computer science; Pattern analysis; Data communication; Microprogramming; Wavelet analysis; Wavelet packets; Data mining; Traffic control; Data analysis

41. Exploiting diverse observation perspectives to get insights on the malware landscape.

Paper Link】 【Pages】:393-402

【Authors】: Corrado Leita ; Ulrich Bayer ; Engin Kirda

【Abstract】: We are witnessing an increasing complexity in the malware analysis scenario. The usage of polymorphic techniques generates a new challenge: it is often difficult to discern the instance of a known polymorphic malware from that of a newly encountered malware family, and to evaluate the impact of patching and code sharing among malware writers in order to prioritize analysis efforts. This paper offers an empirical study on the value of exploiting the complementarity of different information sources in studying malware relationships. By leveraging real-world data generated by a distributed honeypot deployment, we combine clustering techniques based on static and behavioral characteristics of the samples, and we show how this combination helps in detecting clustering anomalies. We also show how the different characteristics of the approaches can help, once combined, to underline relationships among different code variants. Finally, we highlight the importance of contextual information on malware propagation for getting a deeper understanding of the evolution and the “economy” of the different threats.

【Keywords】: Explosions; Protocols; Character generation; Marine vehicles; Internet; Control systems; Computer crime; Computer worms; Security; Engines

42. Combined performance and risk analysis for border management applications.

Paper Link】 【Pages】:403-412

【Authors】: Mayra Sacanamboy ; Bojan Cukic

【Abstract】: When designing critical applications, trade-offs between different security solutions and their performance implications are common. Unfortunately, understanding the precise implications of such trade-offs early in the system development lifecycle is difficult. This paper proposes a methodology for combined analysis of performance and security risk. We transform system requirements into a Layered Queueing Network (LQN) model that subsequently provides analytical performance analysis feedback when considering a set of security mechanisms and incurred security risks. We quantify security risks using cost curves. The proposed approach is illustrated through a realistic case study of a border management application.

【Keywords】: Risk analysis; Risk management; Security; Performance analysis; Costs; Software performance; Application software; Conference management; Queueing analysis; Delay

43. Experimental validation of a fault tolerant microcomputer system against intermittent faults.

Paper Link】 【Pages】:413-418

【Authors】: Joaquin Gracia-Moran ; Daniel Gil-Tomas ; Luis J. Saiz-Adalid ; Juan Carlos Baraza ; Pedro J. Gil-Vicente

【Abstract】: As technologies shrink, new kinds of faults arise. Intermittent faults are among these new faults, and they are expected to be an increasing challenge in modern VLSI circuits. Up to now, transient and permanent faults used to be injected for the experimental validation of fault tolerance mechanisms. The main objective of this work is to improve the dependability assessment by also injecting intermittent faults. Furthermore, we have compared the impact of intermittent faults with the influence of transient and permanent faults. To carry out this study, we have injected bursts of intermittent faults into a fault-tolerant microcomputer system with some well-known fault detection and recovery mechanisms. The methodology relies on the VHDL-based fault injection technique, which allows a systematic and exhaustive analysis. Results show that intermittent faults have a notable impact on recovery mechanisms. They must be taken into account, besides permanent and transient faults, to implement an accurate dependability assessment.

【Keywords】: Fault tolerant systems; Microcomputers; Circuit faults; Computational modeling; Fault tolerance; Fault detection; Very large scale integration; Manufacturing processes; Temperature; Voltage

44. Evaluating repair strategies for a water-treatment facility using Arcade.

Paper Link】 【Pages】:419-424

【Authors】: Boudewijn R. Haverkort ; Matthias Kuntz ; Anne Remke ; S. Roolvink ; Mariëlle Stoelinga

【Abstract】: The performance and dependability of critical infrastructures, such as water-treatment facilities, are essential. In this paper we use various performance and dependability measures to analyze a simplified model of a water-treatment facility. Building on the existing architectural framework Arcade, a model is derived in XML format and then automatically mapped to the model checker PRISM. Using the stochastic model checking capabilities that PRISM offers, we compare different repair strategies with respect to their costs, system reliability, availability and survivability. For this case study we conclude that using non-preemptive priority scheduling with additional repair crews is the best choice with respect to performance, dependability and costs.

【Keywords】: Stochastic processes; Performance analysis; Costs; Availability; XML; Reliability; Government; Logic design; Buildings; Stochastic systems

45. Application of a fault injection based dependability assessment process to a commercial safety critical nuclear reactor protection system.

Paper Link】 【Pages】:425-430

【Authors】: Carl R. Elks ; Michael Reynolds ; Nishant J. George ; Marko Miklo ; Scott Bingham ; Ronald D. Williams ; Barry W. Johnson ; Michael Waterman ; Jeanne Dion

【Abstract】: Existing nuclear power generation facilities are currently seeking to replace obsolete analog Instrumentation and Control (I&C) systems with contemporary digital and processor based systems. However, as new technology is introduced into existing and new plants, it becomes vital to assess the impact of that technology on plant safety. From a regulatory point of view, the introduction or consideration of new digital I&C systems into nuclear power plants raises concerns regarding the possibility that the fielding of these I&C systems may introduce unknown or unanticipated failure modes. In this paper, we present a fault injection based safety assessment methodology that was applied to a commercial safety grade digital Reactor Protection System. Approximately 10,000 fault injections were applied to the system. This paper presents an overview of the research effort, lessons learned, and the results of the endeavor.

【Keywords】: Safety; Protection; Control systems; Power generation; Inductors; Delay; Application software; Analog computers; Power engineering computing; Power engineering and energy

46. Measurement-based analysis of fault and error sensitivities of dynamic memory.

Paper Link】 【Pages】:431-436

【Authors】: Keun Soo Yim ; Zbigniew Kalbarczyk ; Ravishankar K. Iyer

【Abstract】: This paper presents a measurement-based analysis of the fault and error sensitivities of dynamic memory. We extend a software-implemented fault injector to support data-type-aware fault injection into dynamic memory. The results indicate that dynamic memory exhibits about 18 times higher fault sensitivity than static memory, mainly because of the higher activation rate. Furthermore, we show that errors in a large portion of static and dynamic memory space are recoverable by simple software techniques (e.g., reloading data from a disk). The recoverable data include pages filled with identical values (e.g., ‘0’) and pages loaded from files unmodified during the computation. Consequently, the selection of targets for protection should be based on knowledge of recoverability rather than on error sensitivity alone.

【Keywords】: Protection; Error correction codes; Costs; Hardware; Coordinate measuring machines; Application software; Target tracking; Fault location; Computer networks; Random access memory

47. Representativeness analysis of injected software faults in complex software.

Paper Link】 【Pages】:437-446

【Authors】: Roberto Natella ; Domenico Cotroneo ; João Durães ; Henrique Madeira

【Abstract】: Despite the existence of several techniques for emulating software faults, there are still open issues regarding the representativeness of the faults being injected. An important aspect, not considered by existing techniques, is the non-trivial activation condition (trigger) of real faults, which causes them to elude testing and remain hidden until operation.

【Keywords】: Software testing; Control system synthesis; Computer crashes; Acceleration; Programming profession; Pattern analysis; Computer bugs; Production; Fault tolerant systems; Benchmark testing

48. An empirical investigation of fault types in space mission system software.

Paper Link】 【Pages】:447-456

【Authors】: Michael Grottke ; Allen P. Nikora ; Kishor S. Trivedi

【Abstract】: As space mission software becomes more complex, the ability to effectively deal with faults is increasingly important. The strategies that can be employed for fighting a software bug depend on its fault type. Bohrbugs are easily isolated and removed during software testing. Mandelbugs appear to behave chaotically. While it is more difficult to detect these faults during testing, it may not be necessary to correct them; a simple retry after a failure occurrence may work. Aging-related bugs, a sub-class of Mandelbugs, can cause an increasing failure rate. For these faults, proactive techniques may prevent future failures. In this paper, we analyze the faults discovered in the on-board software for 18 JPL/NASA space missions. We present the proportions of the various fault types and study how they have evolved over time. Moreover, we examine whether or not the fault type and attributes such as the failure effect are independent.

【Keywords】: Space missions; System software; Computer bugs; Testing; Fault detection; Application software; Propulsion; Laboratories; Isolation technology; Space technology

49. Assessing and improving the effectiveness of logs for the analysis of software faults.

Paper Link】 【Pages】:457-466

【Authors】: Marcello Cinque ; Domenico Cotroneo ; Roberto Natella ; Antonio Pecchia

【Abstract】: Event logs are the primary source of data to characterize the dependability behavior of a computing system during the operational phase. However, they are inadequate to provide evidence of software faults, which are nowadays among the main causes of system outages. This paper proposes an approach based on software fault injection to assess the effectiveness of logs to keep track of software faults triggered in the field. Injection results are used to provide guidelines to improve the ability of logging mechanisms to report the effects of software faults. The benefits of the approach are shown by means of experimental results on three widely used software systems.

【Keywords】: Open source software; Software systems; Computer networks; Guidelines; Supercomputers; Runtime; Computer crashes; Computer errors; Fault diagnosis; Web server

50. Statistical guarantees of performance for MIMO designs.

Paper Link】 【Pages】:467-476

【Authors】: Jayanand Asok Kumar ; Shobha Vasudevan

【Abstract】: Sources of noise, such as quantization, introduce randomness into Register Transfer Level (RTL) designs of Multiple Input Multiple Output (MIMO) systems. The performance of these MIMO RTL designs is typically quantified by metrics averaged over simulations. In this paper, we introduce a formal approach to compute these metrics with high confidence. We define best-case, bounded, and average-case performance metrics as properties in a probabilistic temporal logic. We then use probabilistic model checking to verify these properties for MIMO RTL and thereby guarantee the statistical performance. If a property fails, we show a characterization of the error. However, probabilistic model checking is known to encounter the problem of state-space explosion. With respect to the properties of interest, we show sound and efficient reductions that significantly improve the scalability of our approach. We illustrate our approach on different non-trivial components of MIMO system designs.

【Keywords】: MIMO; Acoustic noise; Noise level; Quantization; Registers; Computational modeling; Measurement; Probabilistic logic; State-space methods; Explosions

51. Transient fault models and AVF estimation revisited.

Paper Link】 【Pages】:477-486

【Authors】: Nishant J. George ; Carl R. Elks ; Barry W. Johnson ; John Lach

【Abstract】: Transient faults (also known as soft errors) resulting from high-energy particle strikes on silicon are typically modeled as single bit-flips in memory arrays. Most Architectural Vulnerability Factor (AVF) analyses assume this model. However, accelerated radiation tests on static random access memory (SRAM) arrays built using modern technologies show evidence of clustered upsets resulting from single particle strikes. In this paper, these observations are used to define a scalable fault model capable of representing fault multiplicities. Applying this model, a probabilistic framework for incorporating the vulnerability of SRAM arrays to different fault multiplicities into AVF is proposed. An experimental fault injection setup using a detailed microarchitecture simulation running generic benchmarks was used to demonstrate vulnerability characterization in light of the new fault model. Further, rigorous fault injection is used to demonstrate that conventional methods of AVF estimation overestimate vulnerability by up to 7× for some structures.
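The contrast between the classic single-bit-flip model and the clustered-upset model described above can be sketched as follows. The injector functions are illustrative assumptions, not the paper's actual framework:

```python
import random

def inject_single_bit_flip(mem: bytearray, rng: random.Random) -> None:
    """Classic transient-fault model: flip one randomly chosen bit."""
    byte = rng.randrange(len(mem))
    mem[byte] ^= 1 << rng.randrange(8)

def inject_clustered_upset(mem: bytearray, rng: random.Random,
                           multiplicity: int) -> None:
    """Clustered-upset model: flip `multiplicity` adjacent bits around one
    strike site, approximating a single particle upsetting neighboring cells."""
    site = rng.randrange(len(mem) * 8 - multiplicity + 1)
    for bit in range(site, site + multiplicity):
        mem[bit // 8] ^= 1 << (bit % 8)

rng = random.Random(0)

single = bytearray(64)
inject_single_bit_flip(single, rng)
assert sum(bin(b).count("1") for b in single) == 1  # exactly one upset bit

clustered = bytearray(64)
inject_clustered_upset(clustered, rng, multiplicity=3)
assert sum(bin(b).count("1") for b in clustered) == 3  # one 3-bit cluster
```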

【Keywords】: ACE analysis; SEU; spatial multi-bit upset; fault injection; AVF

52. A unified model for timing speculation: Evaluating the impact of technology scaling, CMOS design style, and fault recovery mechanism.

Paper Link】 【Pages】:487-496

【Authors】: Marc de Kruijf ; Shuou Nomura ; Karthikeyan Sankaralingam

【Abstract】: Due to fundamental device properties, energy efficiency from CMOS scaling is showing diminishing improvements. To overcome the energy efficiency challenges, timing speculation has been proposed to optimize for common-case timing conditions, with errors occurring under worst-case conditions detected and corrected in hardware. Although various timing speculation techniques have been proposed, no general framework exists for reasoning about the trade-offs and high-level design considerations of timing speculation. This paper develops two models to study the end-to-end behavior of timing speculation: a hardware-level efficiency model that considers the effects of process variations on path delays, and a complementary system-level recovery model. When combined, the models are used to assess the impact of technology scaling, CMOS design style, and fault recovery mechanism on the efficiency of timing speculation. Our results show that (1) efficiency gains from timing speculation do not improve as technology scales, (2) ultra-low power (sub-threshold) CMOS designs benefit most from timing speculation — we report a 47% potential energy-delay reduction, and (3) fine-grained fault recovery is key to significant energy improvements. The combined model uses only high-level inputs to derive quantitative energy efficiency benefits without any need for detailed simulation, making it a potentially useful tool for hardware developers.

【Keywords】: CMOS technology; Semiconductor device modeling; Timing; Energy efficiency; Hardware; Power system modeling; Circuit faults; Error correction; Error analysis; Mechanical factors

53. Performance and availability aware regeneration for cloud based multitier applications.

Paper Link】 【Pages】:497-506

【Authors】: Gueyoung Jung ; Kaustubh R. Joshi ; Matti A. Hiltunen ; Richard D. Schlichting ; Calton Pu

【Abstract】: Virtual machine technology enables agile system deployments in which software components can be cheaply moved, replicated, and allocated hardware resources in a controlled fashion. This paper examines how these facilities can be used to provide enhanced solutions to the classic problem of ensuring high availability while maintaining performance. By regenerating software components to restore the redundancy of a system whenever failures occur, we achieve improved availability compared to a system with a fixed redundancy level. Moreover, by smartly controlling component placement and resource allocation using information about application control flow and performance predictions from queuing models, we ensure that the resulting performance degradation is minimized. We consider an environment in which a collection of multitier enterprise applications operates across multiple hosts, racks, clusters, and data centers to maximize failure independence. Simulation results show that our proposed approach provides better availability and significantly lower degradation of system response times compared to traditional approaches.

【Keywords】: Availability; Clouds; Application software; Resource management; Degradation; Virtual machining; Hardware; Control systems; Predictive models; Delay

54. Adaptive on-line software aging prediction based on machine learning.

Paper Link】 【Pages】:507-516

【Authors】: Javier Alonso ; Jordi Torres ; Josep Lluis Berral ; Ricard Gavaldà

【Abstract】: The growing complexity of software systems is resulting in an increasing number of software faults. According to the literature, software faults are becoming one of the main sources of unplanned system outages and have an important impact on company benefits and image. For this reason, many techniques (such as clustering, fail-over techniques, or server redundancy) have been proposed to avoid software failures, and yet they still happen. Many software failures are due to the software aging phenomenon. In this work, we present a detailed evaluation of our chosen machine learning prediction algorithm (M5P) against dynamic and non-deterministic software aging. We have tested our prediction model on a three-tier J2EE web application, achieving acceptable prediction accuracy in complex scenarios with small training data sets. Furthermore, we have found an interesting approach to help determine the root cause of a failure: the model generated by the machine learning algorithm.

【Keywords】: Aging; Machine learning; Machine learning algorithms; Software systems; Redundancy; Clustering algorithms; Prediction algorithms; Software algorithms; Testing; Predictive models

55. Correlating failures with asynchronous changes for root cause analysis in enterprise environments.

Paper Link】 【Pages】:517-526

【Authors】: Manoj K. Agarwal ; Venkateswara Reddy Madduri

【Abstract】: In enterprise environments, it is critical to find the root cause of performance failures as quickly as possible. More often than not, the root cause of these failures is a change event applied deliberately by a system administrator to fix an existing problem in the system. However, these changes may themselves manifest as faults as operating conditions change over time. Hence, the time lag between a change that manifests itself as a fault and the resulting failure may be unbounded. Most existing approaches fail to find such root cause faults because they consider only time-correlated symptom events, so the real root cause event(s) are never considered. In this paper we present a novel approach to identify, from the set of changes made over time, those changes that may be the root cause of current failures. Our system automatically associates failures with these changes without any time window constraint. The approach presented in this paper does not require the existence of a baseline model. To our knowledge, ours is the first system that associates changes with current failures without these assumptions. We have implemented this system in a real-life test bed, and it shows promising results.

【Keywords】: Performance failures; Change Management; Root Cause Analysis; Enterprise Systems

56. Ring Paxos: A high-throughput atomic broadcast protocol.

Paper Link】 【Pages】:527-536

【Authors】: Parisa Jalili Marandi ; Marco Primi ; Nicolas Schiper ; Fernando Pedone

【Abstract】: Atomic broadcast is an important communication primitive often used to implement state-machine replication. Despite the large number of atomic broadcast algorithms proposed in the literature, few papers have discussed how to turn these algorithms into efficient executable protocols. Our main contribution, Ring Paxos, is a protocol derived from Paxos. Ring Paxos inherits the reliability of Paxos and can be implemented very efficiently. We report a detailed performance analysis of Ring Paxos and compare it to other atomic broadcast protocols.

【Keywords】: Broadcasting; Protocols; Switches; Throughput; Unicast; Communication switching; Atomic measurements; Context; Performance analysis; Atomic layer deposition

57. Turquois: Byzantine consensus in wireless ad hoc networks.

Paper Link】 【Pages】:537-546

【Authors】: Henrique Moniz ; Nuno Ferreira Neves ; Miguel Correia

【Abstract】: The operation of wireless ad hoc networks is intrinsically tied to the ability of nodes to coordinate their actions in a dependable and efficient manner. The failure of some nodes and momentary breakdown of communications, whether of accidental or malicious nature, should not result in the failure of the entire system. This paper presents Turquois - an intrusion-tolerant consensus protocol specifically designed for resource-constrained wireless ad hoc networks. Turquois allows an efficient utilization of the broadcasting medium, avoids synchrony assumptions, and refrains from public-key cryptography during its normal operation. The protocol is safe despite the arbitrary failure of f < n/3 processes out of a total of n processes, and unrestricted message omissions. The protocol was prototyped and subjected to a comparative performance evaluation against two well-known intrusion-tolerant consensus protocols. The results show that, as the system scales, Turquois outperforms the other protocols by more than an order of magnitude.
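The f < n/3 resilience bound quoted above implies that tolerating f Byzantine processes requires n = 3f + 1 processes in total. A minimal helper makes the arithmetic concrete (illustrative only, not part of Turquois):

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f satisfying f < n/3, the classic Byzantine resilience bound."""
    return (n - 1) // 3

def min_processes(f: int) -> int:
    """Smallest n that tolerates f Byzantine processes: n = 3f + 1."""
    return 3 * f + 1

assert max_byzantine_faults(4) == 1   # 4 processes tolerate 1 Byzantine fault
assert max_byzantine_faults(9) == 2   # f < 3 must hold strictly, so only 2
assert min_processes(2) == 7          # tolerating 2 faults needs 7 processes
```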

【Keywords】: Mobile ad hoc networks; Cryptographic protocols; Wireless application protocol; Broadcasting; Synchronization; Ad hoc networks; Electric breakdown; Public key cryptography; Prototypes; Wire

58. Diversity-inspired clustering for self-healing MANETs: Motivation, protocol, and performability evaluation.

Paper Link】 【Pages】:547-556

【Authors】: Ann T. Tai ; Kam S. Tso ; William H. Sanders

【Abstract】: Swarm systems, which typically comprise a large number of lightweight mobile components, must be capable of self-healing. In this paper, we propose a self-organizing, self-healing framework called “superimposed” clustering for such systems. The framework makes a significant departure from traditional clustering algorithms that apply a single policy to form clusters through iterations. Specifically, our superimposed clustering protocol (SCP) selects a pair of diversified clustering policies to simultaneously build two sets of clusters, which we view as two cluster layers, one on top of the other. Via redundancy shadowing, SCP is able to extract and combine the complementary portions of the two layers to form a clustered network such that the vast majority of nodes can be organized in a single round. Moreover, SCP exploits shadow redundancy to enable gracefully degradable clustering coverage to mitigate cluster damage caused by node failure, death, or migration. We present the notion of superimposed clustering by devising a protocol and conducting a performability evaluation.

【Keywords】: Protocols; Performance evaluation; Degradation; Redundancy; Clustering algorithms; Robot kinematics; Shadow mapping; Aggregates; Performance loss; Nanobioscience

59. GOOFI-2: A tool for experimental dependability assessment.

Paper Link】 【Pages】:557-562

【Authors】: Daniel Skarin ; Raul Barbosa ; Johan Karlsson

【Abstract】: This paper presents GOOFI-2, a comprehensive fault injection tool for experimental dependability assessment of embedded systems. The tool includes a large number of extensions and improvements over its predecessor, GOOFI. These include support for three widely used fault injection techniques, two target processors, and a variety of new features for storing, disseminating and analyzing experimental data. We report on our experiences and lessons learned from the use and development of GOOFI-2. In particular, we compare and discuss properties of three fault injection techniques: Nexus-based, exception-based and instrumentation-based injection. The comparison relies on several sets of experiments with two target processors, Freescale's MPC565 and MPC5554.

【Keywords】: Debugging; Instruments; Testing; Fault diagnosis; Fault tolerance; ISO standards; Road safety; Computer science; Embedded system; Data analysis

60. Studying application-library interaction and behavior with LibTrac.

Paper Link】 【Pages】:563-568

【Authors】: Eric Bisolfati ; Paul Dan Marinescu ; George Candea

【Abstract】: LibTrac is a tool for studying the program/library boundary and answering questions like: Which library functions are called most often? Are there library usage patterns that distinguish one class of applications from the others? Do programs generally retry failed I/O calls or not?

【Keywords】: Software libraries; Statistics; Runtime; Linux; Computer networks; System testing; Application software; Software testing; Costs; Computer interfaces

61. Experiences with a CANoe-based fault injection framework for AUTOSAR.

Paper Link】 【Pages】:569-574

【Authors】: Patrick E. Lanigan ; Priya Narasimhan ; Thomas E. Fuhrman

【Abstract】: Standardized software architectures, such as AUTomotive Open System ARchitecture (AUTOSAR), are being pursued within the automotive industry in order to reduce the cost of developing new vehicle features. Many of these features will need to be highly dependable. Fault injection plays an important role during the dependability analysis of such software. This work evaluates the feasibility of leveraging the CANoe simulation environment to develop software-based methods for injecting faults into AUTOSAR applications. We describe a proof-of-concept fault-injection framework with example fault-injection scenarios, as well as implementation issues faced and addressed, lessons learned, and the suitability of using CANoe as a fault-injection environment.

【Keywords】: Costs; Vehicles; Automotive engineering; Application software; Computer industry; ISO standards; Hardware; Data structures; Software architecture; Computer architecture

62. Empirical characterization of uncongested optical lambda networks and 10GbE commodity endpoints.

Paper Link】 【Pages】:575-584

【Authors】: Tudor Marian ; Daniel A. Freedman ; Ken Birman ; Hakim Weatherspoon

【Abstract】: High-bandwidth, semi-private optical lambda networks carry growing volumes of data on behalf of large data centers, both in cloud computing environments and for scientific, financial, defense, and other enterprises. This paper undertakes a careful examination of the end-to-end characteristics of an uncongested lambda network running at high speeds over long distances, identifying scenarios associated with loss, latency variations, and degraded throughput at attached end-hosts. We use identical fast commodity source and destination platforms, hence expect the destination to receive more or less what we send. We observe otherwise: degraded performance is common and easily provoked. In particular, the receiver loses packets even when the sender employs relatively low data rates. Data rates of future optical network components are projected to outpace clock speeds of commodity end-host processors, hence more and more end-to-end applications will confront the same issue we encounter. Our work thus poses a new challenge for those hoping to achieve dependable performance in higher-end networked settings.

【Keywords】: Optical fiber networks; High speed optical techniques; Optical losses; Optical network units; Optical receivers; Degradation; Bandwidth; Optical fibers; Computer science; Cloud computing

63. On/off process modeling of IP network failures.

Paper Link】 【Pages】:585-594

【Authors】: Pirkko Kuusela ; Ilkka Norros

【Abstract】: A reliability model for IP networks is considered, where the routers and links are modeled by independent stationary on/off processes. Component downtimes may obey any probability distribution. The model combines component reliability with a topological analysis, taking into account the routing rules of the network. This allows the derivation of on/off processes describing with high accuracy the IP availability delivered to the customers of each access router. The approach also provides estimates of risk in terms of lost traffic and allows analytic comparison of basic strategies for reliability improvement. The methodology is illustrated by studying the Finnish research backbone network.
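For independent stationary on/off components, steady-state availability is the mean uptime over the mean cycle length, and a route is available only when all of its routers and links are up. A minimal sketch of this composition follows; it is an illustrative assumption, not the paper's full topological model:

```python
def component_availability(mean_up: float, mean_down: float) -> float:
    """Steady-state availability of a stationary on/off process:
    fraction of time spent in the 'on' state."""
    return mean_up / (mean_up + mean_down)

def path_availability(components: list[tuple[float, float]]) -> float:
    """Availability of a route whose independent components must all be up:
    the product of the per-component availabilities."""
    a = 1.0
    for mean_up, mean_down in components:
        a *= component_availability(mean_up, mean_down)
    return a

# Two routers and one link, each up for 999 h per 1 h of downtime on average.
route = [(999.0, 1.0)] * 3
assert abs(path_availability(route) - 0.999 ** 3) < 1e-12  # ~0.997
```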

【Keywords】: IP networks; Network topology; Probability distribution; Availability; Telecommunication traffic; Spine; Protection; Routing; Risk analysis; Access protocols

64. An automated technique to support the verification and validation of simulation models.

Paper Link】 【Pages】:595-604

【Authors】: Samuel K. Klock ; Peter Kemper

【Abstract】: Simulation modeling requires model validation and verification to ensure that computed results are worth considering. While we cannot expect a magic solution to the general problem, automated techniques for particular aspects of validation and verification are feasible. In this paper, we propose a technique to deduce model properties automatically from simulation runs performed for verification and validation, and to use those properties for runtime monitoring during production runs. Properties are represented as formulas in linear temporal logic and are limited to functional properties. We demonstrate the applicability of the approach using an extended version of a stochastic Botnet model originally developed by Van Ruitenbeek and Sanders.
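A typical functional property of the kind monitored above, e.g. the LTL response pattern G(request → F response), can be checked over a finite simulation trace with a simple sketch. This is an illustrative assumption, not the authors' monitor:

```python
def eventually_follows(trace: list[str], trigger: str, target: str) -> bool:
    """Finite-trace check of the LTL pattern G(trigger -> F target):
    every occurrence of `trigger` must be followed by a later `target`."""
    pending = False  # True while a trigger awaits its matching target
    for event in trace:
        if event == trigger:
            pending = True
        elif event == target:
            pending = False
    return not pending  # a trigger left pending at trace end is a violation

ok_trace = ["request", "work", "response", "request", "response"]
bad_trace = ["request", "response", "request", "work"]

assert eventually_follows(ok_trace, "request", "response")
assert not eventually_follows(bad_trace, "request", "response")
```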

【Keywords】: Runtime; Formal specifications; Automatic testing; Software testing; Logic; Monitoring; Computational modeling; Software engineering; Performance analysis; Computer simulation

65. Model checking CSLTA with Deterministic and Stochastic Petri Nets.

Paper Link】 【Pages】:605-614

【Authors】: Elvio Gilberto Amparore ; Susanna Donatelli

【Abstract】: CSLTA is a stochastic temporal logic for continuous-time Markov chains (CTMCs) that can verify the probability of following paths specified by a Deterministic Timed Automaton (DTA). A DTA expresses both logic and time constraints over a CTMC path, yielding a very flexible way of describing performance and dependability properties. This paper explores a model checking algorithm for CSLTA based on translation into a Deterministic and Stochastic Petri Net (DSPN). The algorithm has been implemented in a simple model checker prototype that relies on existing DSPN solvers to do the actual numerical computations.

【Keywords】: DSPN; Stochastic Model Checking

66. 1st workshop on fault-tolerance for HPC at extreme scale FTXS 2010.

Paper Link】 【Pages】:615

【Authors】: John T. Daly ; Nathan DeBardeleben

【Abstract】: With the emergence of many-core processors, accelerators, and alternative/heterogeneous architectures, the HPC community faces a new challenge: a scaling in the number of processing elements that supersedes the historical trend of scaling in processor frequencies. The attendant increase in system complexity has first-order implications for fault tolerance. Mounting evidence invalidates traditional assumptions of HPC fault tolerance: faults are increasingly multiple-point instead of single-point and interdependent instead of independent; silent failures and silent data corruption are no longer rare enough to discount; stabilization time consumes a larger fraction of useful system lifetime, with failure rates projected to exceed one per hour on the largest systems; and application interrupt rates are apparently diverging from system failure rates.

【Keywords】: Fault tolerant systems; Fault tolerance; Laboratories; Hardware; Software performance; Error correction; Predictive models; Government; Space technology; Conferences

67. Second workshop on proactive failure avoidance, recovery, and maintenance (PFARM).

Paper Link】 【Pages】:616-618

【Authors】: Miroslaw Malek ; Felix Salfner ; Kishor S. Trivedi

【Abstract】: Proactive approaches to failure avoidance, recovery, and maintenance have recently attracted increased interest among researchers and practitioners from various areas of dependable system design and operation. The first edition of this workshop provided a stimulating and fruitful forum to foster collaboration among researchers working on proactive fault management, to discuss ideas, exchange experiences, and find new answers to the overall challenge of significantly improving system dependability in contemporary computing and communication systems.

【Keywords】:

68. Fourth workshop on dependable and secure nanocomputing.

Paper Link】 【Pages】:619-620

【Authors】: Jean Arlat ; Cristian Constantinescu ; Ravishankar K. Iyer ; Johan Karlsson ; Michael Nicolaidis

【Abstract】: Nanocomputing technologies hold the promise of higher performance, lower power consumption, and increased functionality. However, the dependability of these unprecedentedly small-scale devices remains uncertain. The main sources of concern are:
• Nanometer devices are expected to be highly sensitive to process variations. The guard-bands used today to avoid the impact of such variations will not be a feasible solution in the future, so timing errors may occur more frequently.
• New failure modes, specific to new materials, are expected to raise serious challenges for design and test engineers.
• Environment-induced errors, such as single event upsets (SEUs), are likely to occur more frequently than in conventional semiconductor devices.
• New hardware redundancy techniques are needed to enable the development of energy-efficient systems.
• The increased complexity of systems based on nanotechnology will require improved computer-aided design (CAD) tools, as well as better validation techniques.
• The security of nanocomputing systems may be threatened by malicious attacks targeting new vulnerable areas in the hardware.

【Keywords】:

69. 4th workshop on recent advances in intrusion-tolerant systems WRAITS 2010.

Paper Link】 【Pages】:621-622

【Authors】: Miguel Correia ; Partha P. Pal

【Abstract】: Design and operational vulnerabilities are accepted as inevitable in today's complex computer systems. The distributed and networked nature of the systems currently in use and under development facilitates the discovery and exploitation of these flaws in ever newer and easier ways. Intrusion Tolerance acknowledges that it is impossible to completely prevent attacks and intrusions, and that it is often impossible to accurately detect the act of intrusion and stop it early enough. Intrusion Tolerance research therefore aims to develop technologies that enable computer systems to continue to operate correctly despite attacks, and deny the attacker/intruder the success they seek. For instance, an intrusion-tolerant system may suffer partial loss of service or resources due to the attack, but it will continue to provide critical services in a degraded mode or trigger automatic mechanisms to regain and recover the compromised services and resources. Similar goals are being pursued in Survivability, Byzantine Fault Tolerance, Self-regenerative Systems, and Autonomic Systems.

【Keywords】: