DSN 2013: Budapest, Hungary

2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Budapest, Hungary, June 24-27, 2013. IEEE Computer Society 【DBLP Link】

Paper Num: 72 || Session Num: 0

1. Performability analysis of RAID10 versus RAID6.

【Paper Link】【Pages】:1-2

【Authors】: Fumio Machida ; Jianwen Xiang ; Kumiko Tadano ; Yoshiharu Maeno ; Takashi Horikawa

【Abstract】: Designing the storage system configuration is one of the key issues in providing dependable IT systems. An appropriate RAID storage configuration should consider both performance and availability. To assist the design, this paper presents performability models for RAID10 and RAID6 that can be used to compare the configurations quantitatively. A numerical study, in conjunction with performance benchmark results, reveals a performability advantage of RAID6 over RAID10 in sequential read access.

【Keywords】: Reliability; Benchmark testing; Arrays; Data models; Computational modeling; Numerical models; Analytical models
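The idea of performability (expected reward, such as throughput, weighted by the probability of each failure state) can be illustrated with a deliberately simplified static model. This is not the paper's model, which presumably also captures repair and rebuild. Assumptions: each disk is independently down with a hypothetical probability p, RAID10 pairs disks (0,1), (2,3), and so on, and reward(k) is a hypothetical per-state throughput.

```python
from itertools import combinations


def raid10_survives(failed, n):
    """Disks 2i and 2i+1 form mirror pair i; data is lost only if both fail."""
    return all(not (2 * i in failed and 2 * i + 1 in failed) for i in range(n // 2))


def raid6_survives(failed, n):
    """RAID6 (double parity) tolerates any two concurrent disk failures."""
    return len(failed) <= 2


def performability(n, p, survives, reward):
    """Expected reward over all disk-failure states.

    p: hypothetical probability that one disk is down (disks independent).
    reward(k): hypothetical reward (e.g. throughput) of an array that is still
        operational with k failed disks; failed arrays earn 0.
    """
    total = 0.0
    for k in range(n + 1):
        prob_state = p**k * (1 - p) ** (n - k)  # one specific state with k failures
        for failed in combinations(range(n), k):
            if survives(set(failed), n):
                total += prob_state * reward(k)
    return total


if __name__ == "__main__":
    n, p = 8, 0.01
    for name, rule in [("RAID10", raid10_survives), ("RAID6", raid6_survives)]:
        print(name, performability(n, p, rule, lambda k: 1.0))  # reward 1 => availability
```

With a constant reward of 1 the result is plain availability. Plugging benchmark-derived healthy and degraded throughputs into reward(k) turns it into an expected-throughput figure, which is the kind of quantity the abstract compares.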

2. EagleEye: Towards mandatory security monitoring in virtualized datacenter environment.

【Paper Link】【Pages】:1-12

【Authors】: Yu-Sung Wu ; Pei-Keng Sun ; Chun-Chi Huang ; Sung-Jer Lu ; Syu-Fang Lai ; Yi-Yung Chen

【Abstract】: Virtualized datacenters (VDCs) have become a popular approach to large-scale system consolidation and the enabling technology for infrastructure-as-a-service cloud computing. The consolidation inevitably concentrates on a VDC the security threats once faced by individual systems, and a VDC operator should remain vigilant of the threats at all times. We envision the need for on-demand mandatory security monitoring of critical guest systems as a means to track and deter security threats that could jeopardize the operation of a VDC. Unfortunately, existing VDC security monitoring mechanisms all require pre-installed guest components to operate. The security monitoring would therefore either be left to the discretion of individual tenants or require costly direct management of guest systems by the VDC operator. We propose the EagleEye approach for on-demand mandatory security monitoring in VDC environments, which does not depend on pre-installed guest components. We implement a prototype on-access anti-virus monitor to demonstrate the feasibility of the EagleEye approach. We also identify challenges particular to this approach and provide a set of solutions meant to strengthen future research in this area.

【Keywords】: Security; Monitoring; Virtual machine monitors; Kernel; Engines; Prototypes; Virtualization

3. Application-driven TCP recovery and non-stop BGP.

【Paper Link】【Pages】:1-12

【Authors】: Robert Surton ; Ken Birman ; Robbert van Renesse

【Abstract】: Some network protocols tie application state to underlying TCP connections, leading to unacceptable service outages when an endpoint loses TCP state during fail-over or migration. For example, BGP ties forwarding tables to its control-plane connections, so the failure of a BGP endpoint can lead to widespread routing disruption even if it recovers all of its state except what was encapsulated in its TCP implementation. Although techniques exist for recovering TCP state transparently, they make assumptions that do not hold for applications such as BGP. We introduce application-driven TCP recovery, a technique that separates application recovery from TCP recovery. We evaluate our prototype, TCPR, and show that it outperforms existing BGP recovery techniques.

【Keywords】: Graceful Restart; TCP; fault tolerance; middleware; BGP; non-stop routing

4. PHYS: Profiled-HYbrid Sampling for soft error reliability benchmarking.

【Paper Link】【Pages】:1-12

【Authors】: Jinho Suh ; Murali Annavaram ; Michel Dubois

【Abstract】: In this paper, we introduce PHYS (Profiled-HYbrid Sampling), a sampling framework for soft-error benchmarking of caches. Reliability simulations of caches are much more complex than performance simulations and therefore exhibit large slowdowns (two orders of magnitude) relative to performance simulations. The major problem is that the reliability lifetime of every accessed block must be tracked from beginning to end, on top of simulating the benchmark, in order to count the total number of vulnerability cycles (VCs) between two accesses to the block. Because of the need to track SDCs (silent data corruptions) and to distinguish between true and false DUEs (detected but unrecoverable errors), vulnerability cycles cannot be truncated when data is written back from the cache to main memory. Vulnerability cycles must be tracked even during a block's sojourn in main memory, until program termination, to determine whether corrupted values in the block are used by the processor. PHYS solves this problem by sampling intervals between accesses to each memory block, instead of sampling the execution of the processor in a time interval as is classically done in performance simulations. First, a statistical profiling phase captures the distribution of VCs for every block. This profiling step provides a statistical guarantee of the minimum sampling rate of access intervals needed to meet a desired FIT error target with a given confidence interval. Then, per-cache-set sampling rates are dynamically adjusted to sample VCs with higher merit. We compare PHYS with many other possible sampling methods, some of which are widely used to accelerate performance-centric simulations but have also been applied in the past to track reliability lifetime. We demonstrate the superiority of PHYS in the context of reliability benchmarking through exhaustive evaluations of various sampling techniques.

【Keywords】: simulation sampling; Soft-error; reliability benchmarking
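The core shift described above, sampling access intervals rather than time windows, can be sketched with a uniform-rate estimator. This is only an illustration under stated assumptions: an interval counts as vulnerable only if it ends in a read (a common simplification), and PHYS's profiling phase and per-cache-set adaptive rates are not modeled. The trace format is hypothetical.

```python
import random


def access_intervals(trace):
    """Split an access trace into per-block intervals between consecutive accesses.

    trace: list of (cycle, block, op) tuples sorted by cycle, op in {"R", "W"}.
    Returns (length, vulnerable) pairs. Simplifying assumption: an interval is
    vulnerable only if it ends in a read, since a write overwrites any fault.
    """
    last_access = {}
    intervals = []
    for cycle, block, op in trace:
        if block in last_access:
            intervals.append((cycle - last_access[block], op == "R"))
        last_access[block] = cycle
    return intervals


def total_vc(intervals):
    """Exact vulnerability cycles, as a full reliability simulation would track."""
    return sum(length for length, vulnerable in intervals if vulnerable)


def sampled_vc(intervals, rate, seed=0):
    """Keep each vulnerable interval with probability `rate`, then scale by 1/rate.

    This uniform-rate estimator is unbiased; it is not PHYS's profiled scheme.
    """
    rng = random.Random(seed)
    kept = sum(length for length, vulnerable in intervals
               if vulnerable and rng.random() < rate)
    return kept / rate


if __name__ == "__main__":
    trace = [(0, "A", "W"), (10, "A", "R"), (15, "B", "W"), (30, "A", "W"), (40, "B", "R")]
    iv = access_intervals(trace)
    print(iv, total_vc(iv), sampled_vc(iv, 0.5))
```

Because the unit of sampling is an access interval, a sampled interval can be followed into main memory without simulating every other block's lifetime. That is the cost the abstract says interval sampling avoids.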

5. The Third International Workshop on Dependability of Clouds, Data Centers and Virtual Machine Technology DCDV 2013.

【Paper Link】【Pages】:1-2

【Authors】: Jogesh K. Muppala ; Matti A. Hiltunen ; Roy Campbell ; Paulo Veríssimo

【Abstract】: The Third International Workshop on Dependability of Clouds, Data Centers, and Virtual Machine Technology (DCDV 2013) features papers covering various aspects of dependability and security in Clouds and Data Centers. Its eleven papers are organized into four sessions: Cloud and Data Center Networking, Dependability Evaluation, Mobile and Cloud Computing, and Virtualization and Cloud.

【Keywords】: Security; Cloud; Data Centers; Dependability

6. Towards secure monitoring and control systems: Diversify!

【Paper Link】【Pages】:1-2

【Authors】: Domenico Cotroneo ; Antonio Pecchia ; Stefano Russo

【Abstract】: Cyber attacks have become surprisingly sophisticated over the past fifteen years. While early infections mostly targeted individual machines, recent threats leverage widespread network connectivity to mount complex and highly coordinated attacks involving several distributed nodes [1]. Attackers currently target very diverse domains, e.g., e-commerce systems, corporate networks, datacenter facilities, and industrial systems, to achieve a variety of objectives, ranging from credential compromise to sabotage of physical devices, by means of ever smarter worms and rootkits. Stuxnet is a recent worm that clearly illustrates the technical advances achieved by the attacker community. It was discovered in July 2010 and first affected Iranian nuclear plants [2]. Stuxnet compromises the regular behavior of the supervisory control and data acquisition (SCADA) system by reprogramming the code of programmable logic controllers (PLCs). Once compromised, PLCs can progressively destroy a device (e.g., components of a centrifuge, as in the case of the Iranian plant) by sending malicious control signals. Stuxnet combines a significant number of challenging features: it exploits zero-day vulnerabilities of the Windows OS to affect the nodes connected to the PLC; it propagates either locally (e.g., by means of USB sticks) or remotely (e.g., via shared folders or the print spooler vulnerability); it is able to modify its behavior during the progression of the attack; and it communicates with a remote command and control server. More importantly, Stuxnet can remain undetected for many months [3] because it is able to fool the SCADA system by emulating regular monitoring signals.

【Keywords】: Security; Monitoring; SCADA systems; Smart grids; Sensors; Proposals

7. Evasive bots masquerading as human beings on the web.

【Paper Link】【Pages】:1-12

【Authors】: Jing Jin ; Jeff Offutt ; Nan Zheng ; Feng Mao ; Aaron Koehl ; Haining Wang

【Abstract】: Web bots such as crawlers are widely used to automate various online tasks over the Internet. In addition to the conventional approach of human interactive proofs such as CAPTCHAs, a more recent approach of human observational proofs (HOP) has been developed to automatically distinguish web bots from human users. Its design rationale is that web bots behave intrinsically differently from human beings, allowing them to be detected. This paper escalates the battle against web bots by exploring the limits of current HOP-based bot detection systems. We develop an evasive web bot system based on human behavioral patterns. Then we prototype a general web bot framework and a set of flexible de-classifier plugins, primarily based on application-level event evasion. We further abstract and define a set of benchmarks for measuring our system's evasion performance on contemporary web applications, including social network sites. Our results show that the proposed evasive system can effectively mimic human behaviors and evade detectors by achieving high similarities between human users and evasive bots.

【Keywords】: human observation proofs; Web security; bot; machine learning

8. Redefining web browser principals with a Configurable Origin Policy.

【Paper Link】【Pages】:1-12

【Authors】: Yinzhi Cao ; Vaibhav Rastogi ; Zhichun Li ; Yan Chen ; Alexander Moshchuk

【Abstract】: With the advent of Web 2.0, web developers have designed multiple additions that break the same-origin policy (SOP) boundary, such as splitting and combining traditional web browser protection boundaries (security principals). However, these newly generated principals lack a new label to represent their security properties. To address this inconsistent-label problem, this paper proposes a new way to define a security principal and its labels in the browser. In particular, we propose a Configurable Origin Policy (COP), in which a browser's security principal is defined by a configurable ID rather than a fixed triple <scheme, host, port>. The server-side and client-side code of a web application can create, join, and destroy its own principals. We perform a formal security analysis on COP to ensure session integrity. We also show that COP is compatible with legacy web sites, and that sites utilizing COP are also compatible with legacy browsers.

【Keywords】: Browsers; Servers; Security; Web sites; Ports (Computers); Google; Mashups
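The difference between a triple-based principal and a configurable-ID principal can be shown with a toy access check. This sketch covers only the comparison rule. The `origin_id` field, the legacy fallback, and the example hosts are hypothetical, and COP's create/join/destroy operations and its actual ID format are not modeled.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def sop_principal(url):
    """Classic SOP principal: the <scheme, host, port> triple of a URL."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


@dataclass
class Frame:
    url: str
    origin_id: Optional[str] = None  # hypothetical configurable principal ID


def sop_allows(a, b):
    return sop_principal(a.url) == sop_principal(b.url)


def cop_allows(a, b):
    """Toy COP-style check: principals are compared by configurable ID.

    Assumption for legacy content: two frames without IDs fall back to SOP,
    and a frame with an ID never shares a principal with one without.
    """
    if a.origin_id is None and b.origin_id is None:
        return sop_allows(a, b)
    return a.origin_id == b.origin_id


if __name__ == "__main__":
    app = Frame("https://a.example/app", "p1")
    widget = Frame("https://b.example/widget", "p1")  # combined across hosts
    print(sop_allows(app, widget), cop_allows(app, widget))
```

The same check expresses both directions mentioned in the abstract. Two hosts can be combined into one principal by sharing an ID, and one host can be split into several principals by giving its pages different IDs.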

9. Security implications of memory deduplication in a virtualized environment.

【Paper Link】【Pages】:1-12

【Authors】: Jidong Xiao ; Zhang Xu ; Hai Huang ; Haining Wang

【Abstract】: Memory deduplication has been widely used in various commodity hypervisors. By merging identical memory contents, it allows more virtual machines to run concurrently on top of a hypervisor. However, while this technique improves memory efficiency, it has a large impact on system security. In particular, memory deduplication is usually implemented using a variant of copy-on-write, so writing to a shared page incurs a longer access time than writing to a non-shared one. In this paper, we investigate the security implications of memory deduplication from the perspectives of both attackers and defenders. On the one hand, using the timing artifact above, we demonstrate two new attacks that create a covert channel and detect virtualization, respectively. On the other hand, we also show that memory deduplication can be leveraged to safeguard Linux kernel integrity.

【Keywords】: Virtual machine monitors; Linux
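The covert-channel idea rests on the copy-on-write timing gap described in the abstract. The pure simulation below does not measure real memory. The latencies, noise model, and threshold are hypothetical, and the sketch assumes the hypervisor's merge scan (for example, KSM on Linux) has completed between the sender's and the receiver's phases.

```python
import random

# Hypothetical latencies: a write to a merged page takes a copy-on-write fault.
SHARED_WRITE_NS = 20_000
PRIVATE_WRITE_NS = 300


def timed_write(page_is_shared, rng):
    """Simulated write latency with +/-20% noise (no real timing is performed)."""
    base = SHARED_WRITE_NS if page_is_shared else PRIVATE_WRITE_NS
    return base * rng.uniform(0.8, 1.2)


def sender(bits):
    """Bit 1: keep a page identical to the receiver's so the hypervisor merges it.
    Bit 0: modify the page so no merge happens. Returns the per-page merge state."""
    return [bit == 1 for bit in bits]


def receiver(merge_state, rng, threshold_ns=5_000):
    """Decode each bit by timing a write to the corresponding page."""
    return [1 if timed_write(shared, rng) > threshold_ns else 0 for shared in merge_state]


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 1]
    print(receiver(sender(message), random.Random(42)))
```

A real channel would also need to cope with scan delays and timing noise that can overlap between the two cases. The single fixed threshold here only works because the simulated latency ranges are far apart.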

10. Lilliput meets brobdingnagian: Data center systems management through mobile devices.

【Paper Link】【Pages】:1-6

【Authors】: Saurabh Bagchi ; Fahad A. Arshad ; Jan S. Rellermeyer ; Thomas H. Osiecki ; Michael Kistler ; Ahmed Gheith

【Abstract】: In this paper, we put forward the notion that systems management for large masses of virtual machines in data centers is going to be done differently in the short- to medium-term future: through smartphones and through controlled crowdsourcing to a variety of experts within an organization, rather than by dedicated system administrators alone. We lay out the research and practitioner challenges this model raises and give some preliminary solution directions that are being developed here at IBM and elsewhere.

【Keywords】: Mobile communication; Servers; Security; Monitoring; Smart phones; Throughput

11. simFI: From single to simultaneous software fault injections.

【Paper Link】【Pages】:1-12

【Authors】: Stefan Winter ; Michael Tretter ; Benjamin Sattler ; Neeraj Suri

【Abstract】: Software-implemented fault injection (SWIFI) is an established experimental technique to evaluate the robustness of software systems. While a large number of SWIFI frameworks exist, virtually all are based on a single-fault assumption, i.e., interactions of simultaneously occurring independent faults are not investigated. As software systems containing more than a single fault are more often the norm than the exception [1], and current safety standards require the consideration of “multi-point faults” [2], the validity of this single-fault assumption is questionable for contemporary software systems. To address the issue and support simultaneous SWIFI (simFI), we analyze how independent faults can manifest in a generic software composition model and extend an existing SWIFI tool to support some characteristic simultaneous fault types. We implement three simultaneous fault models and demonstrate their utility in evaluating the robustness of the Windows CE kernel. Our findings indicate that simultaneous fault injections are highly efficient at triggering robustness vulnerabilities.

【Keywords】: robustness testing; Software fault injections; fault models
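Moving from single to simultaneous injections changes the size of the experiment space. The generic enumeration below makes that concrete. The injection points and fault types are hypothetical placeholders, and the three simFI fault models from the paper are not reproduced. Choosing k distinct locations and one fault type per location yields C(n, k) · |F|^k runs, compared with n · |F| runs for single faults.

```python
from itertools import combinations, product
from math import comb


def single_fault_runs(locations, fault_types):
    """One injection per run: |locations| * |fault_types| experiments."""
    return [((loc, ft),) for loc in locations for ft in fault_types]


def simultaneous_fault_runs(locations, fault_types, k):
    """k faults per run at k distinct locations, one fault type per location."""
    runs = []
    for locs in combinations(locations, k):
        for types in product(fault_types, repeat=k):
            runs.append(tuple(zip(locs, types)))
    return runs


if __name__ == "__main__":
    locations = [f"param_{i}" for i in range(10)]  # hypothetical injection points
    fault_types = ["bitflip", "fuzz", "null"]       # hypothetical fault types
    print(len(single_fault_runs(locations, fault_types)))
    for k in (2, 3):
        n_runs = len(simultaneous_fault_runs(locations, fault_types, k))
        assert n_runs == comb(len(locations), k) * len(fault_types) ** k
        print(k, n_runs)
```

This growth is why simultaneous injection needs fault models that select characteristic combinations instead of enumerating all of them.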

12. Increasing network resiliency by optimally assigning diverse variants to routing nodes.

【Paper Link】【Pages】:1-12

【Authors】: Andrew Newell ; Daniel Obenshain ; Thomas Tantillo ; Cristina Nita-Rotaru ; Yair Amir

【Abstract】: Networks with homogeneous routing nodes are constantly at risk, as any vulnerability found in one node could be used to compromise all nodes. Introducing diversity among nodes can mitigate this problem. With few variants, the assignment of variants to nodes is critical to the overall network resiliency. We present the Diversity Assignment Problem (DAP), the assignment of variants to nodes in a network, and we show how to compute the optimal solution in medium-size networks. We also present a greedy approximation to DAP that scales well to large networks. Our solution shows that a high level of overall network resiliency can be obtained even from variants that are weak on their own. For real-world systems that grow incrementally over time, we provide an online version of our solution. Lastly, we provide a variation of our solution that is tunable for specific applications (e.g., BFT).

【Keywords】: Routing; Network topology; Topology; Operating systems; Context; Cloud computing; Approximation methods
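A variant assignment can be scored and optimized on a tiny network with the exhaustive sketch below. The resiliency metric is an assumption, not the paper's DAP objective: an attacker compromises every node running one variant, and the assignment is scored by the worst-case size of the largest surviving connected component. Brute force is only feasible for a handful of nodes, which is why scalable approximations matter.

```python
from collections import deque
from itertools import product


def largest_component(adj, alive):
    """Size of the largest connected component induced by the `alive` nodes."""
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def worst_case_resiliency(adj, assignment, variants):
    """Assumed metric: compromise all nodes of one variant, keep the worst case."""
    return min(
        largest_component(adj, {n for n in adj if assignment[n] != v})
        for v in variants
    )


def optimal_assignment(adj, variants):
    """Exhaustive search over |variants|**|nodes| assignments (tiny graphs only)."""
    nodes = sorted(adj)
    best_score, best = -1, None
    for combo in product(variants, repeat=len(nodes)):
        assignment = dict(zip(nodes, combo))
        score = worst_case_resiliency(adj, assignment, variants)
        if score > best_score:
            best_score, best = score, assignment
    return best_score, best


if __name__ == "__main__":
    ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    print(optimal_assignment(ring, ["A", "B", "C"]))
```

On a 6-node ring with three variants, a homogeneous assignment scores 0 and a neighbor-alternating ABCABC assignment scores 2. Contiguous blocks such as AABBCC score the optimum of 4 under this metric. In other words, which nodes receive which variant matters, not just how many variants are used.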

13. SIDE: Isolated and efficient execution of unmodified device drivers.

【Paper Link】【Pages】:1-12

【Authors】: Yifeng Sun ; Tzi-cker Chiueh

【Abstract】: Buggy device drivers are a major threat to the reliability of their host operating system. There have been myriad attempts to protect the kernel, but most of them either require driver modifications or incur substantial performance overhead. This paper describes an isolated device driver execution system called SIDE (Streamlined Isolated Driver Execution), which focuses specifically on unmodified device drivers and strives to avoid changing the existing kernel code as much as possible. SIDE exploits virtual memory hardware to set up a device driver execution environment that is compatible with existing device drivers and yet is fully isolated from the kernel. SIDE is able to run an unmodified device driver for a Gigabit Ethernet NIC, and the latency and throughput penalty is kept under 1% when it is augmented with a set of performance optimizations designed to reduce the number of protection domain crossings between an isolated device driver and the kernel.

【Keywords】: device driver isolation; fault tolerance

14. Design of event-based Intrusion Detection System on OpenFlow Network.

【Paper Link】【Pages】:1-2

【Authors】: Yung-Li Hu ; Wei-Bing Su ; Li-ying Wu ; Yennun Huang ; Sy-Yen Kuo

【Abstract】: OpenFlow (OF) networking is a novel network architecture that many well-known cloud service providers have used to build their data center networks. The difference between an OF network and a traditional network architecture is the decoupling of the control plane and the data plane for network management. Intrusion detection is very important in cloud computing to improve system security. Because an OF network can improve the response time to an alert by efficiently configuring network flows, we design an event-based Intrusion Detection System (IDS) architecture on an OF network.

【Keywords】: Intrusion Detection Systems; OpenFlow Network

15. Analysis of bugs in Apache Virtual Computing Lab.

【Paper Link】【Pages】:1-6

【Authors】: Flavio Frattini ; Rahul Ghosh ; Marcello Cinque ; Andy Rindos ; Kishor S. Trivedi

【Abstract】: Understanding the bugs in software platforms is extremely valuable for developers, especially during the testing phase. However, this issue has rarely been investigated for open source Cloud platforms to date. In this paper, we present the analysis of 146 bug reports from Apache Virtual Computing Lab (VCL), a representative open source Cloud platform. The analysis is performed by means of an empirical approach tailored to open source Clouds. For the VCL development and test teams, these results provide useful guidelines, e.g., directing volunteers' effort to components where more residual bugs are expected to be found.

【Keywords】: Computer bugs; Iron; Testing; Authentication; Software systems; Engines

16. Guaranteeing Proper-Temporal-Embedding safety rules in wireless CPS: A hybrid formal modeling approach.

【Paper Link】【Pages】:1-12

【Authors】: Feng Tan ; Yufei Wang ; Qixin Wang ; Lei Bu ; Rong Zheng ; Neeraj Suri

【Abstract】: Cyber-Physical Systems (CPS) integrate discrete-time computing and continuous-time physical-world entities, which are often wirelessly interlinked. The use of wireless safety-critical CPS (e.g., in control and healthcare) requires safety guarantees despite communication faults. This paper focuses on one important set of such safety rules: Proper-Temporal-Embedding (PTE). Our solution introduces hybrid automata to formally describe and analyze CPS design patterns. We propose a novel lease-based design pattern, along with closed-form configuration constraints, to guarantee PTE safety rules under arbitrary wireless communication faults. We propose a formal methodology to transform the design-pattern hybrid automata into specific wireless CPS designs. This methodology can effectively prevent physical-world parameters from affecting the PTE safety of the resulting specific designs. We conduct a case study on a laser tracheotomy wireless CPS to show that the resulting system is safe and can withstand communication disruptions.

【Keywords】: Safety; Wireless communication; Automata; Synchronization; Lasers; Computers; Base stations

17. Lightweight message tracing for debugging wireless sensor networks.

【Paper Link】【Pages】:1-12

【Authors】: Vinaitheerthan Sundaram ; Patrick Eugster

【Abstract】: Wireless sensor network (WSN) deployments are not infrequently subject to complex runtime failures that are difficult to diagnose. Alas, debugging techniques for traditional distributed systems are inapplicable because of the extreme resource constraints in WSNs, and existing WSN-specific debugging solutions either address only specific types of failures, focus on individual nodes, or exhibit high overheads that hamper their scalability. Message tracing is a core issue underlying the efficient and effective debugging of WSNs. We propose a message tracing solution that addresses key challenges in WSNs (besides stringent resource constraints, these include out-of-order message arrivals and message losses) while being streamlined for the common case of successful in-order message transmission. Our approach reduces energy overhead significantly (by up to 95%, and by 59% on average) compared to state-of-the-art message tracing approaches that use Lamport clocks. We demonstrate the effectiveness of our approach through case studies of several complex faults in three well-known distributed protocols.

【Keywords】: Wireless sensor networks; Clocks; Debugging; Out of order; Protocols; Runtime; Unicast

18. Hector: Detecting Resource-Release Omission Faults in error-handling code for systems software.

【Paper Link】【Pages】:1-12

【Authors】: Suman Saha ; Jean-Pierre Lozi ; Gaël Thomas ; Julia L. Lawall ; Gilles Muller

【Abstract】: Omitting resource-release operations in the error-handling code of systems software can lead to memory leaks, crashes, and deadlocks. Finding omission faults is challenging due to the difficulty of reproducing system errors, the diversity of system resources, and the lack of appropriate abstractions in the C language. To address these issues, numerous approaches have been proposed that globally scan a code base for common resource-release operations. Such macroscopic approaches are notorious for their many false positives, while also leaving many faults undetected. We propose a novel microscopic approach to finding resource-release omission faults in systems software. Rather than generalizing from the entire source code, our approach focuses on the error-handling code of each function. Using our tool, Hector, we have found over 370 faults in six systems software projects, including Linux, with a 23% false positive rate. Some of these faults allow an unprivileged malicious user to crash the entire system.

【Keywords】: Linux; Computer crashes; Runtime; Kernel; System recovery; Protocols
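The "microscopic" idea, learning from the error-handling code of each function instead of from the whole code base, can be illustrated on hand-abstracted error exits. This is a toy heuristic, not Hector's analysis, which works on real C code. The `ErrorExit` abstraction and the resource names are hypothetical. A resource is flagged when the same function releases it on one error exit but omits the release on another exit where it is still held.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorExit:
    """One error-handling exit of a C function, abstracted by hand (hypothetical)."""
    label: str
    held: set = field(default_factory=set)      # resources acquired before this exit
    released: set = field(default_factory=set)  # resources released along this exit


def find_omissions(exits):
    """Flag a held resource that this function releases on some other error
    exit but not on this one (per-function heuristic)."""
    released_somewhere = set().union(*(e.released for e in exits))
    faults = []
    for e in exits:
        for res in sorted((e.held & released_somewhere) - e.released):
            faults.append((e.label, res))
    return faults


if __name__ == "__main__":
    # Modeled on a C function shaped like:
    #   buf = kmalloc(...);  if (!buf) return -ENOMEM;
    #   dev = get_dev(...);  if (!dev) { kfree(buf); return -ENODEV; }
    #   if (setup(dev) < 0) { put_dev(dev); return -EIO; }   /* buf leaked */
    exits = [
        ErrorExit("enomem"),
        ErrorExit("enodev", {"buf"}, {"buf"}),
        ErrorExit("eio", {"buf", "dev"}, {"dev"}),
    ]
    print(find_omissions(exits))  # the -EIO exit forgets to free buf
```

Because the evidence that `buf` needs releasing comes from the same function, a resource that is never released on any error path (possibly by design) is not reported. That is one way a local approach can avoid the false positives of global scanning.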

19. Stress balancing to mitigate NBTI effects in register files.

【Paper Link】【Pages】:1-10

【Authors】: Hussam Amrouch ; Thomas Ebi ; Jörg Henkel

【Abstract】: Negative Bias Temperature Instability (NBTI) is considered one of the major reliability concerns for transistors in current and upcoming technology nodes and a main cause of their diminished lifetime. We propose a new means to mitigate the effects of NBTI on SRAM-based register files, which are particularly vulnerable due to their small structure size and because they are under continuous voltage stress for prolonged intervals. Results from our technology simulator demonstrate the severity of NBTI effects on SRAM cells, especially when process variation is taken into account. Based on this analysis, we show that NBTI stress in different registers needs to be tackled using different strategies corresponding to their access patterns. To this end, we propose to selectively increase the resilience of individual registers against NBTI. Our technique balances the gate voltage stress of the two PMOS transistors of an SRAM cell such that both are under stress for approximately the same amount of time during operation, thereby minimizing the deleterious effects of NBTI. We present mitigation implementations in both hardware and software, along with the incurred overhead. Across a wide range of applications, we show that our technique reduces the NBTI-induced reliability degradation by 35% on average, which is 22% better than the current state of the art.

【Keywords】: Microarchitecture; Embedded system; Reliability; Aging; Register File; NBTI
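The balancing goal, having both PMOS transistors of a cell stressed for roughly the same time, can be illustrated by computing stress duty cycles from a bit history. The example uses one generic balancing trick: periodically storing the complement. It is not necessarily the paper's hardware or software mechanism. The mapping from stored value to stressed transistor is an assumed convention.

```python
def stress_duty(physical_bits):
    """Fraction of cycles each PMOS of a 6T SRAM cell is under NBTI stress.

    Assumed convention: PMOS_L is stressed while the cell holds 1 and PMOS_R
    while it holds 0 (one of the two is always stressed).
    """
    n = len(physical_bits)
    ones = sum(physical_bits)
    return ones / n, (n - ones) / n


def with_periodic_inversion(logical_bits, period):
    """Store the complement during every other epoch of `period` cycles.
    A per-register flag (not modeled) tells reads whether to re-invert."""
    return [bit ^ ((t // period) % 2) for t, bit in enumerate(logical_bits)]


if __name__ == "__main__":
    idle_register = [0] * 100  # a register that holds 0 for long intervals
    print(stress_duty(idle_register))                               # unbalanced
    print(stress_duty(with_periodic_inversion(idle_register, 10)))  # balanced
```

The same inversion can backfire. A register whose value already flips with the inversion period ends up fully unbalanced, which illustrates why the abstract ties the mitigation strategy to each register's access pattern.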

20. Towards SDN enabled network control delegation in clouds.

【Paper Link】【Pages】:1-6

【Authors】: Muhammad Salman Malik ; Mirko Montanari ; Jun Ho Huh ; Rakesh B. Bobba ; Roy H. Campbell

【Abstract】: In today's IaaS clouds, users only get a logical view of the underlying network and have limited control. Delegating more control to end users would be beneficial but would also raise security concerns for the provider. Emerging Software Defined Networking (SDN) technologies have the capability to facilitate delegation of network controls and to provide some level of network abstraction to end users. However, any delegation solution should balance the level of control delegated to end users against the security constraints of the provider. In this paper, we propose an SDN-based framework that facilitates delegation of some network controls to end users, providing the means to monitor and configure their own slices of the underlying networks. Using two instantiations of this framework, we illustrate the tradeoffs between security and the level of network abstraction provided to end users.

【Keywords】: Control Delegation; SDN; Cloud

21. Distal: A framework for implementing fault-tolerant distributed algorithms.

【Paper Link】【Pages】:1-8

【Authors】: Martin Biely ; Pamela Delgado ; Zarko Milosevic ; André Schiper

【Abstract】: We introduce Distal, a new framework that simplifies turning pseudocode of fault-tolerant distributed algorithms into efficient executable code. Without proper tool support, even small amounts of pseudocode normally end up as several thousand non-trivial lines of Java or C++. Distal is implemented as a library in Scala and consists of two main parts: a domain-specific language (DSL) in which algorithms are expressed, and an efficient messaging layer that deals with low-level issues such as connection management, threading, and (de)serialization. The DSL is designed such that implementations of distributed algorithms closely resemble the pseudocode found in research papers. By writing code that is close to the protocol description, one can be more confident that the implemented system really reflects the protocol specification on paper. Distal not only makes it simple and intuitive to implement distributed algorithms but also leads to efficient implementations.

【Keywords】: SMR; DSL; fault-tolerant distributed algorithms; Paxos

22. An algorithmic approach to error localization and partial recomputation for low-overhead fault tolerance.

【Paper Link】【Pages】:1-12

【Authors】: Joseph Sloan ; Rakesh Kumar ; Greg Bronevetsky

【Abstract】: The increasing size and complexity of massively parallel systems (e.g., HPC systems) is making it increasingly likely that individual circuits will produce erroneous results. For this reason, novel fault tolerance approaches are increasingly needed. Prior fault tolerance approaches often rely on checkpoint-rollback based schemes. Unfortunately, such schemes are primarily limited to scenarios where errors are rare, as their overheads become prohibitive if faults are common. In this paper, we propose a novel approach for algorithmic correction of faulty application outputs. The key insight behind this approach is that, even in high-error scenarios, most of an erroneous algorithm result is still correct. Instead of simply rolling back to the most recent checkpoint and repeating the entire segment of computation, our novel resilience approach uses algorithmic error localization and partial recomputation to efficiently correct the corrupted results. We evaluate our approach in the specific algorithmic scenario of linear algebra operations, focusing on matrix-vector multiplication (MVM) and iterative linear solvers. We develop a novel technique for localizing errors in MVM, show how to achieve partial recomputation within this algorithm, and demonstrate that this approach both improves the performance of the Conjugate Gradient solver in high-error scenarios by 3x-4x and increases the probability that it completes successfully by up to 60%, in parallel experiments with up to 100 nodes.

【Keywords】: sparse linear algebra; algorithmic error correction; partial recomputation; error localization; numerical methods
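Error localization with partial recomputation in MVM can be sketched with classic row-block checksums in the style of algorithm-based fault tolerance. This is a stand-in technique, not necessarily the paper's localization method. The sketch assumes the precomputed checksums are themselves fault-free. For each block of rows, the sum of that block's outputs must equal the block's column-sum vector dotted with x, so a mismatch pinpoints the block to recompute.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def block_checksums(A, block):
    """Column sums of each group of `block` consecutive rows (precomputed once per A)."""
    sums = []
    for start in range(0, len(A), block):
        rows = A[start:start + block]
        sums.append([sum(col) for col in zip(*rows)])
    return sums


def correct_mvm(A, x, y, checks, block, tol=1e-9):
    """Localize corrupted blocks of y = A @ x and recompute only those rows.

    For row block b, sum(y[b]) must equal checks[b] . x; a mismatch localizes
    the error to that block (assumes the checksums themselves are fault-free).
    """
    y = list(y)
    recomputed = []
    for b, check in enumerate(checks):
        start, stop = b * block, min((b + 1) * block, len(A))
        expected = dot(check, x)
        if abs(sum(y[start:stop]) - expected) > tol * max(1.0, abs(expected)):
            for i in range(start, stop):
                y[i] = dot(A[i], x)
            recomputed.append(b)
    return y, recomputed


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6], [7, 8]]
    x = [1, 1]
    checks = block_checksums(A, 2)
    print(correct_mvm(A, x, [3, 7, 100, 15], checks, 2))  # only block 1 is redone
```

Only the mismatching block is recomputed, instead of the whole product, which is the source of savings over rollback. Errors that cancel within a block would go undetected by plain sums. Weighted checksums are a common remedy.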

23. Message from the general chair.

【Paper Link】【Pages】:1

【Authors】: András Pataricza

【Abstract】: It is my distinguished pleasure to welcome all the participants to Budapest and the 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks. The Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) is the most prestigious international forum for presenting research results in the field of dependability and security. It is especially important for us that this conference is organized in our region for the first time in its history.

【Keywords】:

24. Fitting second-order acyclic Marked Markovian Arrival Processes.

【Paper Link】【Pages】:1-12

【Authors】: Andrea Sansottera ; Giuliano Casale ; Paolo Cremonesi

【Abstract】: Markovian Arrival Processes (MAPs) are a tractable class of point processes useful for modeling correlated time series, such as those commonly found in network traces and system logs used in performance analysis and reliability evaluation. Marked MAPs (MMAPs) generalize MAPs by further allowing the modeling of multi-class traces, possibly with cross-correlation between multi-class arrivals. In this paper, we present analytical formulas to fit second-order acyclic MMAPs with an arbitrary number of classes. We first define closed-form formulas to fit second-order MMAPs with two classes, where the underlying MAP is in canonical form. Our approach leverages forward and backward moments, which have been defined recently but never exploited jointly for fitting. Then, we show how to sequentially apply these formulas to fit an arbitrary number of classes. Representative examples and trace-driven simulation using storage traces show the effectiveness of our approach for fitting empirical datasets.

【Keywords】: dependence; Multi-class workload; point process

25. Fault detection and localization in distributed systems using invariant relationships.

【Paper Link】【Pages】:1-8

【Authors】: Abhishek B. Sharma ; Haifeng Chen ; Min Ding ; Kenji Yoshihira ; Guofei Jiang

【Abstract】: Recent advances in sensing and communication technologies enable us to collect round-the-clock monitoring data from a wide array of distributed systems, including data centers, manufacturing plants, transportation networks, and automobiles. Often this data is in the form of time series collected from multiple sensors (hardware- as well as software-based). Previously, we developed an approach based on time-invariant relationships that uses Auto-Regressive models with eXogenous input (ARX) to model this data. A tool based on our approach has been effective for fault detection and capacity planning in distributed systems. In this paper, we first describe our experience in applying this tool in real-world settings. We also discuss the challenges in fault localization that we face when using our tool, and present two approaches that we developed to address this problem: a spatial approach based on invariant graphs and a temporal approach based on expected broken-invariant patterns.

【Keywords】: Time series analysis; Monitoring; Time measurement; Noise; Data models; Servers
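An ARX invariant between two monitored series can be sketched with a first-order model, y[t] = a·y[t-1] + b·x[t], fitted by least squares. The fit is scored with a normalized-residual fitness, and a new sample is flagged when its prediction error exceeds a threshold. The model order, the absence of an intercept, the fitness formula (one common choice), and the threshold are assumptions. The abstract does not give the tool's exact settings.

```python
import math


def fit_arx(x, y):
    """Least-squares fit of y[t] = a*y[t-1] + b*x[t] (first order, no intercept)."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(1, len(y)):
        u1, u2, z = y[t - 1], x[t], y[t]
        s11 += u1 * u1
        s12 += u1 * u2
        s22 += u2 * u2
        r1 += u1 * z
        r2 += u2 * z
    det = s11 * s22 - s12 * s12  # 2x2 normal equations solved by Cramer's rule
    a = (r1 * s22 - s12 * r2) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


def fitness(x, y, a, b):
    """1 - normalized RMS residual (at most 1; near 1 suggests a strong invariant)."""
    target = y[1:]
    mean = sum(target) / len(target)
    resid = sum((y[t] - a * y[t - 1] - b * x[t]) ** 2 for t in range(1, len(y)))
    spread = sum((v - mean) ** 2 for v in target)
    return 1.0 - math.sqrt(resid / spread)


def invariant_broken(a, b, y_prev, x_now, y_now, threshold):
    """Flag a new sample whose prediction error exceeds a chosen threshold."""
    return abs(y_now - (a * y_prev + b * x_now)) > threshold


if __name__ == "__main__":
    x = [2 + math.sin(0.3 * t) for t in range(50)]  # synthetic "input" metric
    y = [0.0]
    for t in range(1, 50):
        y.append(0.5 * y[-1] + 2.0 * x[t])          # synthetic dependent metric
    a, b = fit_arx(x, y)
    print(round(a, 3), round(b, 3), round(fitness(x, y, a, b), 4))
```

In a monitoring setting, such pairwise invariants form the edges of the invariant graph the abstract mentions. The pattern of which edges break then feeds fault localization.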

26. Consistency or latency? A quantitative analysis of replication systems based on replicated state machines.

【Paper Link】【Pages】:1-12

【Authors】: Xu Wang ; Hailong Sun ; Ting Deng ; Jinpeng Huai

【Abstract】: Existing theories like CAP and PACELC have claimed that there are tradeoffs between some pairs of performance measures in distributed replication systems, such as consistency and latency. However, current systems take a very vague view of how to balance those tradeoffs (e.g., eventual consistency). In this work, we provide a quantitative analysis of consistency and latency for widely used replicated state machines (RSMs). Based on our generic RSM model, called RSM-d, we build probabilistic models to quantify consistency and latency. We show that both are affected by d, the number of ACKs received by the coordinator before committing a write request. We further define a payoff model by combining the consistency and latency models. Finally, with Monte Carlo based simulation, we validate our models and show the effectiveness of our solutions in terms of how to obtain an optimal tradeoff between consistency and latency.

【Keywords】: replicated state machine; consistency; write conflict; latency
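
The Monte Carlo sketch below makes the role of d concrete under three assumptions:

- There are N replicas whose ACK delays are independent and exponentially distributed.
- The coordinator commits when the d-th ACK arrives.
- A read issued at commit time is served by one uniformly chosen replica, and it returns the latest value only if that replica is among the d that acknowledged.

These modeling choices and the name `simulate_rsm_d` are illustrative assumptions, not the paper's RSM-d model.

```python
import random


def simulate_rsm_d(n, d, trials=20000, rate=1.0, seed=0):
    """Estimate (mean commit latency, P[read sees latest write]) for RSM-d."""
    if not 1 <= d <= n:
        raise ValueError("need 1 <= d <= n")
    rng = random.Random(seed)
    total_latency = 0.0
    fresh_reads = 0
    for _ in range(trials):
        delays = [rng.expovariate(rate) for _ in range(n)]  # per-replica ACK delay
        order = sorted(range(n), key=lambda i: delays[i])
        total_latency += delays[order[d - 1]]  # commit when the d-th ACK arrives
        acked = set(order[:d])                 # replicas holding the write at commit
        if rng.randrange(n) in acked:          # read hits one random replica
            fresh_reads += 1
    return total_latency / trials, fresh_reads / trials
```

With rate 1, the mean commit latency should approach 1/N + 1/(N-1) + ... + 1/(N-d+1), and the fresh-read probability should approach d/N. Raising d therefore buys consistency at the price of latency.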

27. Improving SSD reliability with RAID via Elastic Striping and Anywhere Parity.

【Paper Link】【Pages】:1-12

【Authors】: Jaeho Kim ; Jongmin Lee ; Jongmoo Choi ; Donghee Lee ; Sam H. Noh

【Abstract】: While the move from SLC to MLC/TLC flash memory technology is increasing SSD capacity at lower cost, it does so at the expense of reliability. One approach to remedy this loss is to apply the RAID architecture across the chips that make up an SSD. However, the traditional RAID approach may have negative effects: parity updates may increase the total number of writes, leading to more P/E cycles and higher bit error rates. Using a technique that we call Elastic Striping and Anywhere Parity (eSAP), we develop eSAP-RAID, a RAID scheme that significantly reduces parity writes while providing reliability better than RAID-5. We derive performance and lifetime models of SSDs employing RAID-5 and eSAP-RAID that show the benefits of eSAP-RAID. We also implement these schemes in SSDs using DiskSim with SSD Extension and validate the models using realistic workloads. Our results show that eSAP-RAID improves reliability considerably while limiting wear. Specifically, the expected lifetime of SSDs employing eSAP-RAID may be as long as that of current ECC-based SSDs. In addition, their reliability can be maintained throughout their entire lifetime at the level that current ECC-based SSDs offer only in their early stages.

【Keywords】: SSD; Flash memory; Reliability; RAID
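
The parity overhead that motivates eSAP can be illustrated with plain RAID-5 arithmetic. The sketch below does three things:

- computes XOR parity;
- rebuilds a lost chunk;
- counts parity page writes when each small write updates parity in place, versus when parity is written once per full stripe.

This is a generic illustration under those assumptions. It does not implement Elastic Striping or Anywhere Parity, and the function names are hypothetical.

```python
from functools import reduce


def xor_parity(chunks):
    """Byte-wise XOR of equal-length data chunks (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild(surviving, parity):
    """Recover the single missing chunk from the surviving chunks and the parity."""
    return xor_parity(list(surviving) + [parity])


def parity_writes(num_page_writes, stripe_width, full_stripe=False):
    """Parity page writes incurred by num_page_writes data-page writes."""
    if full_stripe:
        # One parity write per stripe; ceiling division covers a partial tail stripe.
        return -(-num_page_writes // stripe_width)
    return num_page_writes  # in-place small writes: one parity update each
```

For a 4-wide stripe, 12 in-place page writes cost 12 parity writes, while writing them as whole stripes costs 3. That gap is the extra P/E-cycle pressure the abstract refers to.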

28. DRIP: A framework for purifying trojaned kernel drivers.

【Paper Link】【Pages】:1-12

【Authors】: Zhongshu Gu ; William N. Sumner ; Zhui Deng ; Xiangyu Zhang ; Dongyan Xu

【Abstract】: Kernel drivers are usually provided in the form of loadable kernel extensions, which can be loaded and unloaded dynamically at runtime and execute with the same privilege as the core operating system kernel. Their unrestricted access to the kernel is, however, a double-edged sword: it makes drivers attractive targets for trojan attacks. Given a benign driver, it is now easy to implant malicious logic with existing hacking tools, and once implanted, such logic is difficult to detect. In this paper we propose DRIP, a framework for detecting and eliminating malicious logic embedded in a kernel driver by iteratively eliminating unnecessary kernel API invocations from the driver. Given the binary of a trojaned driver, DRIP generates a purified driver in which benign functionality is preserved and malicious functionality is eliminated. Our evaluation shows that DRIP successfully eliminates the malicious effects of trojaned drivers on the system. The purified drivers maintain, or even improve on, the performance of the trojaned drivers.

【Keywords】: Trojan Detection; System Security; Kernel Drivers

29. SPECTRE: A dependable introspection framework via System Management Mode.

【Paper Link】【Pages】:1-12

【Authors】: Fengwei Zhang ; Kevin Leach ; Kun Sun ; Angelos Stavrou

【Abstract】: Virtual Machine Introspection (VMI) systems have been widely adopted for malware detection and analysis. VMI systems use hypervisor technology for system introspection and to expose malicious activity. However, recent malware can detect the presence of virtualization or corrupt the hypervisor state, thus avoiding detection. We introduce SPECTRE, a hardware-assisted dependability framework that leverages System Management Mode (SMM) to inspect the state of a system. Unlike VMI, our trusted code base is limited to the BIOS and the SMM implementation. SPECTRE is capable of transparently and quickly examining all layers of running system code, including a hypervisor, the OS, and user-level applications. We demonstrate several use cases of SPECTRE, including the detection of heap spray, heap overflow, and rootkits, using real-world attacks on Windows and Linux platforms. In our experiments, full inspection with SPECTRE is 100 times faster than with similar VMI systems because it incurs no virtualization overhead.

【Keywords】: memory attacks; SMM; introspection

30. Fault-tolerance characteristics of data center network topologies using fault regions.

【Paper Link】【Pages】:1-6

【Authors】: Yang Liu ; Jogesh K. Muppala

【Abstract】: Data center networks (DCNs) are inherently failure-prone owing to their many links, switches and servers. Component failures are often correlated, resulting in a set of connected components failing together. This correlated failure behaviour can be captured through the use of fault regions [1]. This paper explores the effect of such failures in DCNs using four topologies, viz., Fat Tree, DCell, FlatNet and BCube. We used two categories of metrics for evaluation:

- connection-oriented metrics, including aggregated bottleneck throughput (ABT), average path length (APL) and routing failure rate (RFR);
- network size-oriented metrics, including Component Decomposition Number (CDN) and Smallest/Largest Component Size (SCS/LCS).

【Keywords】: Evaluation; Data Center Network; Fault Region
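
Average path length and the network size-oriented metrics can be computed on any topology once a fault region is removed. The sketch below relies on three assumptions:

- A fault region is simply a set of nodes that fail together.
- The graph is an adjacency dict.
- APL is averaged over all connected ordered pairs of surviving nodes. The paper may instead restrict it to server pairs.

All names are illustrative.

```python
from collections import deque


def bfs(adj, failed, src):
    """Hop distances from src to every surviving node reachable from it."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in failed and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def fault_region_metrics(adj, failed):
    """CDN, SCS/LCS and APL after the nodes of one fault region fail together."""
    seen, sizes = set(), []
    total_hops = pairs = 0
    for v in adj:
        if v in failed:
            continue
        dist = bfs(adj, failed, v)
        if v not in seen:              # first node of a new connected component
            seen.update(dist)
            sizes.append(len(dist))
        total_hops += sum(dist.values())  # distance to self contributes 0
        pairs += len(dist) - 1
    return {
        "CDN": len(sizes),
        "SCS": min(sizes, default=0),
        "LCS": max(sizes, default=0),
        "APL": total_hops / pairs if pairs else 0.0,
    }
```

Running this over each candidate fault region of a Fat Tree or BCube adjacency list shows how a single correlated failure can split the network and lengthen the remaining paths.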

31. Implementing the ADVISE security modeling formalism in Möbius.

【Paper Link】【Pages】:1-8

【Authors】: Michael D. Ford ; Ken Keefe ; Elizabeth LeMay ; William H. Sanders ; Carol Muehrcke

【Abstract】: The ADversary VIew Security Evaluation (ADVISE) model formalism provides a system security model from the perspective of an adversary. An ADVISE atomic model consists of an attack execution graph (AEG) composed of attack steps, system state variables, and attack goals, as well as an adversary profile that defines the abilities and interests of a particular adversary. The ADVISE formalism has been implemented as a Möbius atomic model formalism in order to leverage the existing set of mature modeling formalisms and solution techniques offered by Möbius. This tool paper explains the ADVISE implementation in Möbius and provides technical details for Möbius users who want to use ADVISE either alone or in combination with other modeling formalisms provided by Möbius.

【Keywords】: Möbius Atomic Model Formalism; Quantitative Security Metrics; State-based Security Model

32. FTSPM: A Fault-Tolerant ScratchPad Memory.

【Paper Link】【Pages】:1-10

【Authors】: Amir Mahdi Hosseini Monazzah ; Hamed Farbeh ; Seyed Ghassem Miremadi ; Mahdi Fazeli ; Hossein Asadi

【Abstract】: ScratchPad Memory (SPM) is an important part of most modern embedded processors. The use of embedded processors in safety-critical applications implies the need for fault tolerance in the design of SPM. This paper proposes a method, called FTSPM, which integrates a multi-priority mapping algorithm with a hybrid SPM structure. The proposed structure divides SPM into three parts:

1) a part built from Non-Volatile Memory (NVM), which is immune to soft errors;
2) a part protected by an Error-Correcting Code;
3) a part protected by parity.

The proposed mapping algorithm distributes program blocks among these three parts according to their vulnerability level. Simulation results show that FTSPM reduces SPM vulnerability by about 7x compared with a pure SRAM-based SPM. In addition, the dynamic energy consumption of the proposed method is 77% and 47% lower than that of a pure NVM-based SPM and a pure SRAM-based SPM, respectively.

【Keywords】: Non-Volatile Memory; Reliability; Mapping of SPM; SPM
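
The mapping step can be pictured as a greedy assignment. The most vulnerable blocks go to the most strongly protected region that still has room: NVM (immune), then ECC (corrects), then parity (detects). This is a simplified sketch under that assumption. It considers vulnerability only and does not model energy or the paper's multi-priority scheme, and `map_blocks` is a hypothetical name.

```python
def map_blocks(blocks, capacity):
    """Greedily place program blocks into SPM regions by vulnerability.

    blocks:   list of (name, size, vulnerability) tuples
    capacity: dict mapping region name -> free size in the same unit
    Returns a dict name -> region, or None if a block fits nowhere.
    """
    # Strongest protection first: NVM (immune), then ECC (corrects), then parity (detects).
    regions = ["NVM", "ECC", "parity"]
    free = dict(capacity)
    placement = {}
    for name, size, _vuln in sorted(blocks, key=lambda b: b[2], reverse=True):
        for region in regions:
            if free.get(region, 0) >= size:
                free[region] -= size
                placement[name] = region
                break
        else:
            placement[name] = None  # left in off-chip memory in this sketch
    return placement
```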

33. Lumpability of fluid models with heterogeneous agent types.

【Paper Link】【Pages】:1-11

【Authors】: Giulio Iacobelli ; Mirco Tribastone

【Abstract】: Fluid models have gained popularity in the performance modeling of computing systems and communication networks. When the model under study consists of many different types of agents, the size of the associated system of ordinary differential equations (ODEs) increases with the number of types, making the analysis more difficult. We study this problem for a class of models where heterogeneity is expressed as a perturbation of certain parameters of the ODE vector field. We provide an a priori bound that relates the solution of the original, heterogeneous model to that of a smaller ODE system, which arises from aggregating the system variables of different types of agents. By showing that this bound grows linearly with the intensity of the perturbation, we provide a formal justification for the intuitive possibility of neglecting small differences in agents' behavior as a means of reducing the dimensionality of the original system.

【Keywords】: Computational modeling; Vectors; Mathematical model; Numerical models; Analytical models; Bandwidth; Jacobian matrices
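
The intuition behind the bound can be checked numerically on a toy model. The sketch assumes two agent types that decay at rates k(1+eps) and k, and aggregates them into a single population that decays at the nominal rate k. Both systems are integrated with forward Euler. The model, parameter values, and function name are illustrative assumptions rather than the class of models treated in the paper.

```python
def max_lumping_error(eps, k=1.0, x0=1.0, t_end=5.0, dt=1e-3):
    """Max |(x1 + x2) - X| over [0, t_end] for a perturbed two-type decay model."""
    x1 = x2 = x0        # heterogeneous model: two agent populations
    lumped = 2 * x0     # aggregated model: one population at the nominal rate k
    worst = 0.0
    for _ in range(int(round(t_end / dt))):
        x1 += dt * (-k * (1 + eps) * x1)  # type 1: rate perturbed by eps
        x2 += dt * (-k * x2)              # type 2: nominal rate
        lumped += dt * (-k * lumped)
        worst = max(worst, abs((x1 + x2) - lumped))
    return worst
```

For this toy model, the worst-case aggregation error grows roughly in proportion to eps (about eps/e for small eps). Doubling eps roughly doubles the error, which is consistent with a bound that is linear in the perturbation intensity.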

34. Message from the DCCS program chair.

【Paper Link】【Pages】:1

【Authors】: George Candea

【Abstract】: Welcome to DCCS 2013: this is a fantastic time to work on dependability. Everyday life has become inherently dependent on computerized networked systems whose complexity is rapidly growing. Providing resilience to malicious attacks, accidental faults, design errors, and unexpected operating conditions is therefore becoming simultaneously more important and more challenging. If these systems are unreliable, insecure or untrustworthy, significant harm can result, and users will abandon them, thus squandering opportunities for progress.

【Keywords】:

35. Locality matters: Reducing Internet traffic graphs using location analysis.

【Paper Link】【Pages】:1-12

【Authors】: Andreas Berger ; Stefan Ruehrup ; Wilfried N. Gansterer ; Oliver Jung

【Abstract】: Representing Internet traffic as connection graphs augments anomaly detection systems by providing insight into structural connection properties, i.e., who talks to whom. However, these graphs are extremely large, and one has to decide in advance which aspect to focus on. In the context of malware detection this is difficult, as malware often mimics legitimate traffic. In this paper, we present a statistical approach for extracting the typical traffic destinations for a set of monitored hosts. From these, we derive a reduced graph that contains only the connections that are anomalous for the respective host. This graph can then be analyzed efficiently. Our system is designed to scale to thousands of monitored hosts. We evaluate our approach using a data set from a real network, and show that we can reliably detect injected malware activity.

【Keywords】: malware detection; Network monitoring; statistical anomaly detection; graph analysis; traffic modeling
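
The reduction step can be sketched with a simple frequency rule. A destination is treated as typical for a host if it accounts for at least a fixed share of that host's past connections, and only atypical edges are kept. The threshold of 5% and the function names are hypothetical. The rule itself is a plain stand-in for the paper's statistical model, not a reproduction of it.

```python
from collections import Counter, defaultdict


def typical_destinations(history, min_share=0.05):
    """history: iterable of (host, dst) pairs. Returns host -> set of typical dsts."""
    counts = defaultdict(Counter)
    for host, dst in history:
        counts[host][dst] += 1
    typical = {}
    for host, per_dst in counts.items():
        total = sum(per_dst.values())
        typical[host] = {d for d, n in per_dst.items() if n / total >= min_share}
    return typical


def reduced_graph(connections, typical):
    """Keep only (host, dst) edges that are atypical for that host."""
    return {(h, d) for h, d in connections if d not in typical.get(h, set())}
```

Hosts with no history contribute all of their edges, so a new or rarely seen host remains visible in the reduced graph.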

36. Reading between the lines of failure logs: Understanding how HPC systems fail.

【Paper Link】【Pages】:1-12

【Authors】: Nosayba El-Sayed ; Bianca Schroeder

【Abstract】: As the component count in supercomputing installations continues to increase, system reliability is becoming one of the major issues in designing HPC systems. These issues will become more challenging in future Exascale systems, which are predicted to include millions of CPU cores. Even with relatively reliable individual components, the sheer number of components will increase failure rates to unprecedented levels. Efficiently running those systems will require a good understanding of how different factors impact system reliability. In this paper we use a decade's worth of field data made available by Los Alamos National Lab to study the impact of a diverse set of factors on the reliability of HPC systems. We provide insights into the nature of correlations between failures. We also investigate the impact on system reliability of factors such as power quality, temperature, fan and chiller reliability, and system usage and utilization, as well as external factors such as cosmic radiation.

【Keywords】: Correlation; Probability; Hardware; Reliability; Program processors; Analytical models

37. A practical characterization of a NASA SpaceCube application through fault emulation and laser testing.

【Paper Link】【Pages】:1-8

【Authors】: John Paul Walters ; Kenneth M. Zick ; Matthew French

【Abstract】: Historically, space-based processing systems have lagged behind their terrestrial counterparts by several processor generations due, in part, to the cost and complexity of implementing radiation-hardened processor designs. Efforts such as NASA's SpaceCube seek to change this paradigm, using higher performance commercial hardware wherever possible. This has the potential to revolutionize onboard data processing, but it cannot happen unless the soft error reliability can be characterized and deemed sufficient. A variety of fault injection techniques are used to evaluate system reliability, most commonly fault emulation, fault simulation, laser testing, and particle beam testing. Combining multiple techniques is more complex and less common. In this study we characterize a real-world application that leverages a radiation-hardening by software (RHBSW) solution for the SpaceCube platform, using two fault injection strategies: laser testing and fault emulation. We describe several valuable lessons learned, and show how both validation techniques can be combined to greater effect.

【Keywords】: Circuit faults; Testing; Emulation; Computer crashes; Laser applications; Field programmable gate arrays

38. Intrusion detection and honeypots in nested virtualization environments.

【Paper Link】【Pages】:1-6

【Authors】: Michael Beham ; Marius Vlad ; Hans P. Reiser

【Abstract】: Several research projects in the past have built intrusion detection systems and honeypot architectures based on virtual machine introspection (VMI). These systems directly benefit from the use of virtualization technology. The VMI approach, however, requires direct interaction with the virtual machine monitor, and is typically not available to clients of current public clouds. Recently, nested virtualization has gained popularity in research. It could enable cloud customers to use virtualization-based solutions within a cloud by nesting two virtual machine monitors, with the inner one under the client's control. In this paper, we compare the performance of existing nested-virtualization solutions and analyze the impact of the performance overhead on VMI-based intrusion detection and honeypot systems.

【Keywords】: Cloud computing; Intrusion detection; Honeypots; Nested virtualization

39. CloudPD: Problem determination and diagnosis in shared dynamic clouds.

【Paper Link】【Pages】:1-12

【Authors】: Bikash Sharma ; Praveen Jayachandran ; Akshat Verma ; Chita R. Das

【Abstract】: In this work, we address problem determination in virtualized clouds. We show that high dynamism, resource sharing, frequent reconfiguration, high propensity to faults and automated management introduce significant new challenges for fault diagnosis in clouds. To address these challenges, we propose CloudPD, a fault management framework for clouds. CloudPD leverages:

(i) a canonical representation of the operating environment to quantify the impact of sharing;
(ii) an online learning process to tackle dynamism;
(iii) correlation-based performance models for higher detection accuracy;
(iv) an integrated end-to-end feedback loop to synergize with a cloud management ecosystem.

Using a prototype implementation with representative cloud batch and transactional workloads (Hadoop, Olio and RUBiS), we show that CloudPD detects and diagnoses faults with low false positives (< 16%) and high accuracy (88%, 83% and 83%, respectively). In an enterprise trace-based case study, CloudPD diagnosed anomalies within 30 seconds and with an accuracy of 77%, demonstrating its effectiveness in real-life operations.

【Keywords】: Hadoop MapReduce; Cloud; Problem Determination; Fault Diagnosis; Virtualization; Performance

40. Geo-replicated storage with scalable deferred update replication.

【Paper Link】【Pages】:1-12

【Authors】: Daniele Sciascia ; Fernando Pedone

【Abstract】: Many current online services are deployed over geographically distributed sites (i.e., datacenters). Such distributed services call for geo-replicated storage, that is, storage distributed and replicated among many sites. Geographical distribution and replication can improve locality and availability of a service. Locality is achieved by moving data closer to the users. High availability is attained by replicating data in multiple servers and sites. This paper considers a class of scalable replicated storage systems based on deferred update replication with transactional properties. The paper discusses different ways to deploy scalable deferred update replication in geographically distributed systems, considers the implications of these deployments on user-perceived latency, and proposes solutions. Our results are substantiated by a series of microbenchmarks and a social network application.

【Keywords】: transactional systems; Database replication; scalable data store; fault tolerance; high performance

41. State-of-the-practice in data center virtualization: Toward a better understanding of VM usage.

【Paper Link】【Pages】:1-12

【Authors】: Robert Birke ; Andrej Podzimek ; Lydia Y. Chen ; Evgenia Smirni

【Abstract】: Hardware virtualization is the prevalent way to share data centers among different tenants. In this paper we present a large-scale workload characterization study that aims at a better understanding of the state of the practice, i.e.:

- how data centers in the private cloud are used by their customers;
- how physical resources are shared among different tenants using virtualization;
- how virtualization technologies are actually employed.

Our study covers all corporate data centers of a major infrastructure provider, which are geographically dispersed across the globe, and reports on their observed usage over a 19-day period. We focus especially on how virtual machines are deployed across physical resources, with an emphasis on processors and memory. In particular, we examine resource sharing and usage of physical resources, virtual machine life cycles, and migration patterns and frequencies. Our study shows a strong tendency to overprovision resources while making only conservative use of the possibilities opened up by virtualization (e.g., migration and co-location). This indicates tremendous potential for developing policies aimed at reducing data center operational costs.

【Keywords】: workload characterization; Data Centers; virtualization; private cloud

42. WirelessHART modeling and performance evaluation.

【Paper Link】【Pages】:1-12

【Authors】: Anne Remke ; Xian Wu

【Abstract】: In the process industries, wired supervisory and control networks are increasingly being replaced by wireless systems. Wireless communication inevitably introduces time delays and message losses, which may degrade system reliability and performance. WirelessHART, as the first international standard for wireless process supervision and control, has received notable academic attention. This paper models WirelessHART networks with link failures using discrete-time Markov chains and evaluates network performance in a typical WirelessHART environment with respect to delay and reachability. The evaluation shows that, although the performance of WirelessHART is influenced by several factors, it is capable of delivering reliable service in typical industrial environments. The proposed model can also be used to predict path performance and to provide routing suggestions.

【Keywords】: Logic gates; Schedules; Wireless communication; Delays; Computational modeling; Uplink; Routing
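
A minimal discrete-time Markov chain of this kind can be sketched for a single multi-hop path. The sketch assumes one transmission attempt per time slot on the current link, an independent link success probability p in every slot, and an absorbing destination state. Real WirelessHART schedules, superframes and retry rules are not modeled, and `reachability` is a hypothetical name.

```python
def reachability(hops, p_success, slots):
    """P[packet crosses all `hops` links within `slots` transmission slots]."""
    # dist[i] = probability that the packet has completed exactly i hops.
    dist = [1.0] + [0.0] * hops
    for _ in range(slots):
        nxt = [0.0] * (hops + 1)
        nxt[hops] = dist[hops]                      # destination is absorbing
        for i in range(hops):
            nxt[i] += dist[i] * (1 - p_success)     # link failed: stay at hop i
            nxt[i + 1] += dist[i] * p_success       # link succeeded: advance
        dist = nxt
    return dist[hops]
```

For a single hop with p = 0.9, the result is 0.9 after one slot and 0.99 after two. Adding hops or lowering p delays the point at which the curve approaches 1, which is the reachability-versus-delay tradeoff the abstract evaluates.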

43. Generative software-based memory error detection and correction for operating system data structures.

【Paper Link】【Pages】:1-12

【Authors】: Christoph Borchert ; Horst Schirmeier ; Olaf Spinczyk

【Abstract】: Recent studies indicate that the number of system failures caused by main memory errors is much higher than expected. In contrast to the commonly used hardware-based countermeasures, such as ECC memory, software-based fault-tolerance measures are much more flexible and can exploit application knowledge, such as the criticality of specific data structures. This paper presents a software-based memory error protection approach, which we used to harden the eCos operating system in a case study. The main benefits of our approach are the flexibility to choose from an extensible toolbox of easily pluggable error detection and correction schemes, and its very low runtime overhead, which ranges from 0.09% to 1.7%. The implementation is based on aspect-oriented programming and exploits the object-oriented program structure of eCos to identify well-suited code locations for the insertion of generative fault-tolerance measures.

【Keywords】: Data structures; Computer crashes
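
As an illustration of a pluggable software detection-and-correction scheme, the sketch below protects one field with triple modular redundancy. It keeps three copies, takes a majority vote on every read, and repairs the outlier. This generic technique was chosen for brevity and is written in Python. It is not the paper's aspect-oriented implementation for eCos, and the class name `TmrWord` is hypothetical.

```python
class TmrWord:
    """Hypothetical protected field: three copies with a majority vote on read."""

    def __init__(self, value):
        self.copies = [value, value, value]

    def write(self, value):
        self.copies = [value, value, value]

    def read(self):
        a, b, c = self.copies
        if a == b or a == c:
            value = a
        elif b == c:
            value = b
        else:
            # A majority no longer exists: the error is detected but not correctable.
            raise RuntimeError("uncorrectable: all three copies differ")
        self.copies = [value, value, value]  # scrub: repair the faulty copy
        return value
```

A single bit flip in any one copy is masked and repaired on the next read. Corruption of two copies to different values is only detected, which illustrates why the choice of scheme per data structure matters.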

44. Increasing the trustworthiness of commodity hardware through software.

【Paper Link】【Pages】:1-6

【Authors】: Kevin Elphinstone ; Yanyan Shen

【Abstract】: Advances in formal software verification have produced an operating system that is mathematically guaranteed to be correct and to enforce access isolation. Such an operating system could potentially consolidate safety- and security-critical software on a single device where previously multiple devices were used. One of the barriers to consolidation on commodity hardware is the lack of hardware dependability features. A hardware fault triggered by cosmic rays, alpha particle strikes, etc. potentially invalidates the strong mathematical guarantees. This paper discusses improving the trustworthiness of commodity hardware so that a verified microkernel can be used in some situations that previously required separate computers. We explore leveraging multicore processors to provide redundancy, and report the results of our initial performance investigation.

【Keywords】: reliability; multicore; kernel

45. Manipulating semantic values in kernel data structures: Attack assessments and implications.

【Paper Link】【Pages】:1-12

【Authors】: Aravind Prakash ; Eknath Venkataramani ; Heng Yin ; Zhiqiang Lin

【Abstract】: Semantic values in kernel data structures are critical to many security applications, such as virtual machine introspection, malware analysis, and memory forensics. However, malware, or more specifically a kernel rootkit, can often directly tamper with the raw kernel data structures in so-called DKOM (Direct Kernel Object Manipulation) attacks, thereby significantly thwarting security analysis. In addition to manipulating pointer fields to hide certain kernel objects, DKOM attacks may also mutate semantic values, which are data values with important semantic meanings. Prior research efforts have been made to defeat pointer manipulation attacks and thus identify hidden kernel objects. However, the space and severity of Semantic Value Manipulation (SVM) attacks are not yet well understood. In this paper, we take a first step toward systematically assessing this attack space. To this end, we devise a new fuzz testing technique, namely duplicate-value directed semantic field fuzzing, and implement a prototype called MOSS. Using MOSS, we evaluate two widely used operating systems: Windows XP and Ubuntu 10.04. Our experimental results show that the space of SVM attacks is vast for both OSes. Our proof-of-concept kernel rootkit further demonstrates that such attacks can evade all the security tools tested in our experiments, including recently proposed robust signature schemes. Moreover, our duplicate-value analysis highlights the difficulty of defeating SVM attacks. For example, intuitively cross-checking duplicate values provides only a marginal improvement in detection. Our study motivates revisiting existing security solutions and calls for more effective defenses against kernel threats.

【Keywords】: Semantics; Kernel; Testing; Heuristic algorithms; Data structures; Security; Support vector machines

46. Seamless kernel updates.

【Paper Link】【Pages】:1-12

【Authors】: Maxim Siniavine ; Ashvin Goel

【Abstract】: Kernel patches are released frequently to fix bugs and security vulnerabilities. However, users and system administrators often delay installing these updates because they require a system reboot, which results in disruption of service and the loss of application state. Unfortunately, the longer a system remains out of date, the higher the likelihood of system failure or a successful attack. Approaches such as dynamic patching and hot swapping have been proposed for updating the kernel, but all of them either limit the types of updates that are supported or require significant programming effort to manage. We have designed a system that checkpoints application-visible state, updates the kernel, and restores the application state, thus minimizing disruption of service. By checkpointing high-level state, our system no longer depends on the precise implementation of a patch and can apply all backward-compatible patches. Our results show that updates to major releases of the Linux kernel can be applied with minimal effort and no observable overhead.

【Keywords】: Kernel; Instruction sets; Data structures; Linux; Protocols; Transfer functions; Reliability

47. Addressing memory exhaustion failures in Virtual Machines in a cloud environment.

【Paper Link】【Pages】:1-6

【Authors】: Jose Antonio Navas Molina ; Shivakant Mishra

【Abstract】: As cloud computing usage expands to a wide range of areas and different kinds of users, cloud providers are exploiting their resources as fully as they can. Memory is the most expensive resource to oversubscribe, and this has resulted in high prices for end users. Furthermore, swapping in Virtual Machines (VMs) is expensive, so cloud providers usually do not offer any swap space for their systems. As a consequence, when a VM runs out of memory, user processes are killed. This scenario is especially critical in the cloud environment, since the user loses all of his/her execution time and, by extension, the money invested in this computation. This paper addresses this critical problem by providing a kernel extension that monitors the memory requirements of a VM and prevents the out-of-memory state by creating swap space dynamically. The paper describes the design and implementation of a preliminary prototype of this kernel extension and evaluates its performance.

【Keywords】: cloud; Virtual machine; memory exhaustion; swapping

48. Availability study on cloud computing environments: Live migration as a rejuvenation mechanism.

【Paper Link】【Pages】:1-6

【Authors】: Matheus Melo ; Paulo Romero Martins Maciel ; Jean Araujo ; Rubens de S. Matos ; Carlos Araújo

【Abstract】: With the increasing adoption of cloud computing environments, studies of high availability in these systems have become increasingly important. Software rejuvenation is an important mechanism for improving system availability. This paper presents a comprehensive availability model to evaluate the use of live migration to enable VMM rejuvenation with minimal service interruption. Live migrations are performed according to a time-based trigger. We evaluate five different scenarios with distinct time intervals for triggering rejuvenation. The results show that live migration can significantly reduce system downtime.

【Keywords】: availability; Cloud computing; software aging and rejuvenation; live migration

49. Mitigating access-driven timing channels in clouds using StopWatch.

【Paper Link】【Pages】:1-12

【Authors】: Peng Li ; Debin Gao ; Michael K. Reiter

【Abstract】: This paper presents StopWatch, a system that defends against timing-based side-channel attacks that arise from coresidency of victims and attackers in infrastructure-as-a-service clouds. StopWatch triplicates each cloud-resident guest virtual machine (VM) and places the replicas so that the three replicas of a guest VM are coresident with nonoverlapping sets of (replicas of) other VMs. StopWatch uses the timing of I/O events at a VM's replicas collectively to determine the timings observed by each one or by an external observer. As a result, observable timing behaviors are similarly likely in the absence of any other individual, coresident VM. We detail the design and implementation of StopWatch in Xen and evaluate the factors that influence its performance. We also address the problem of placing VM replicas in a cloud under the constraints of StopWatch so as to still enable adequate cloud utilization.

【Keywords】: Clocks; Real-time systems; Virtual machine monitors; Hardware; Synchronization; Radiation detectors
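
One way for the replicas to determine the timing of an I/O event "collectively" is to deliver it at the median of the three replicas' local timings. The sketch assumes exactly that rule. Because each replica is coresident with a different set of other VMs, an attacker that perturbs the timing of only one replica cannot move the median outside the range spanned by the other two. The function names are illustrative, and the sketch omits the virtual-time bookkeeping a real hypervisor implementation would need.

```python
def delivery_time(replica_times):
    """Virtual time at which an I/O event is delivered to all three replicas."""
    if len(replica_times) != 3:
        raise ValueError("triplication expects exactly three replica timings")
    return sorted(replica_times)[1]  # median of the three observations


def attacker_influence(honest_pair, attacked_values):
    """Range of delivery times an attacker controlling one replica's timing can induce."""
    outcomes = [delivery_time([a, *honest_pair]) for a in attacked_values]
    return min(outcomes), max(outcomes)
```

However far the attacked replica's timing is pushed, the delivered time stays between the two unaffected replicas' timings.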

50. Evaluating Xilinx SEU Controller Macro for fault injection.

【Paper Link】【Pages】:1-2

【Authors】: Jose Luis Nunes ; João Carlos Cunha ; Raul Barbosa ; Mário Zenha Rela

【Abstract】: This paper presents a preliminary evaluation of the SEU Controller Macro, a VHDL component developed by Xilinx for the detection and recovery of single event upsets, as a building block of an FPGA fault injector. We found that the SEU Controller Macro is extremely effective for injecting single and double bit-flips into the FPGA configuration memory, with precise location, virtually no intrusiveness, and coarse timing accuracy. We also outline how its functionality could be extended to build a fully fledged FPGA fault injector.

【Keywords】: embedded systems; fault injection; FPGA; SEU

51. Network traffic anomaly detection based on growing hierarchical SOM.

【Paper Link】【Pages】:1-2

【Authors】: Shin-Ying Huang ; Yennun Huang

【Abstract】: Network anomaly detection aims to detect patterns in network traffic data that do not conform to established normal behavior. Distinguishing different anomaly patterns in a large amount of data can be a challenge, let alone visualizing them in a comparative perspective. Recently, unsupervised learning methods have been shown to facilitate network anomaly detection [4][5]. These include K-means [3], the self-organizing map (SOM) [2], and the growing hierarchical self-organizing map (GHSOM) [1]. However, no study has addressed both the mining and the detection tasks. This study leverages the advantages of GHSOM to analyze network traffic data and visualize the distribution of attack patterns together with their hierarchical relationships.

In the mining stage, the geometric distances between patterns, together with their descriptive information, are revealed in the topological space. The density and sample size of each node can then help to detect anomalous network traffic. In the detection stage, this study extends the traditional GHSOM and uses a support vector machine (SVM) [6] to classify network traffic data into predefined categories.

The proposed approach aims to:

(1) help understand the behavior of anomalous network traffic;
(2) provide effective classification rules to facilitate network anomaly detection;
(3) accumulate network anomaly detection knowledge for both mining and detection purposes.

A public dataset and a private dataset are used to evaluate the proposed approach. The expected result is to confirm that the proposed approach can help understand network traffic data, and that the detection mechanism is effective for identifying anomalous behavior.

【Keywords】: Visualization; Network anomaly detection; Neural networks; Data Clustering; Data Classification

52. Why is my smartphone slow? On the fly diagnosis of underperformance on the mobile Internet.

【Paper Link】【Pages】:1-8

【Authors】: Chaitrali Amrutkar ; Matti A. Hiltunen ; Trevor Jim ; Kaustubh R. Joshi ; Oliver Spatscheck ; Patrick Traynor ; Shobha Venkataraman

【Abstract】: The perceived end-to-end performance of the mobile Internet can be affected by multiple factors, including websites, devices, and network components. Constant changes in these factors and network complexity make it difficult to identify the root causes of high latency. In this paper, we propose a multidimensional diagnosis technique that uses passive IP flow data collected at ISPs to investigate factors that impact the performance of the mobile Internet. We implement and evaluate our technique over four days of data from a major US cellular provider's network. Our approach identifies several combinations of factors affecting performance. We investigate four combinations in depth to confirm the latency causes identified by our technique. Our findings include:

- a popular gaming website showing poor performance on a specific device type for over 50% of its flows;
- web browser traffic on older devices accounting for 99% of poorly performing traffic.

Our technique can help operators identify the factors with high impact on latency in the mobile Internet.

【Keywords】: Performance evaluation; Internet; Mobile communication; IP networks; Monitoring; Servers; Mobile computing

53. An adaptive approach to dependable circuits for a digital power control.

【Paper Link】【Pages】:1-2

【Authors】: Aromhack Saysanasongkham ; Kenta Imai ; Masayuki Arai ; Satoshi Fukumoto ; Keiji Wada

【Abstract】: Recently, microcomputers and FPGAs have increasingly been used to control power conversion circuits, because they simplify parameter resetting and offer the flexibility of software programming. However, the control circuits are now placed extremely close to the high-current main circuit, so electromagnetic radiation generated near the high-current pulses may affect the control circuit as transient faults. In this study, we focus on transient noise caused by the switching activity of a DC-DC converter and propose a dependable FPGA-based digital power control circuit. The basic idea is to keep the sampling times as far away from the switching times as possible to avoid the effects of transient noise. A control circuit applying the proposed method is designed, and its effectiveness is shown by simulation.

【Keywords】: Noise; Switches; Power control; Field programmable gate arrays; Circuit faults; Transient analysis; Integrated circuit modeling
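The sampling rule described above (keep sampling instants as far from switching instants as possible) can be illustrated with a small calculation. The following is a minimal Python sketch, not the authors' FPGA design: it assumes the switching edges within one PWM period are known in advance and repeat periodically, and the function name `farthest_sampling_time` is hypothetical.

```python
def farthest_sampling_time(switch_times, period):
    """Return the instant in [0, period) farthest from every switching edge.

    Assumes the switching edges repeat every `period`, so the timeline is
    treated as circular: the gap after the last edge wraps to the first.
    """
    if not switch_times:
        raise ValueError("need at least one switching time")
    edges = sorted(t % period for t in switch_times)
    best_time, best_gap = 0.0, -1.0
    for i, start in enumerate(edges):
        if i + 1 < len(edges):
            end = edges[i + 1]
        else:
            end = edges[0] + period  # wrap into the next period
        gap = end - start
        if gap > best_gap:
            # The midpoint of the widest gap maximizes distance to both edges.
            best_gap = gap
            best_time = (start + gap / 2) % period
    return best_time
```

For a period of 10 time units with turn-on at 0 and turn-off at 3 (30% duty), the function picks 6.5, the midpoint of the longer off-interval. Because the best instant moves with the duty cycle, a controller using this rule would have to recompute it whenever the duty cycle changes.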

54. Modeling and analysing operation processes for dependability.

【Paper Link】【Pages】:1-2

【Authors】: Xiwei Xu ; Liming Zhu ; Jim Zhanwen Li ; Len Bass ; Qinghua Lu ; Min Fu

【Abstract】: Application dependability increasingly depends on sophisticated operation-time activities such as deployment, upgrade, scaling out/in, and reactions to various failures. Traditional approaches to improving application dependability focus on artifact-oriented troubleshooting and improvements. In this paper, we present an approach that uses process models to represent and analyze operations, taking exception handling and fault-proneness into account. Our goal is to reduce diagnosis and repair time for application failures that occur during operation activities such as deployment and upgrade.

【Keywords】: Computational modeling; Analytical models; Maintenance engineering; Biological system modeling; Availability; Data models; Educational institutions

55. The architecture of a resilience infrastructure for computing and communication systems.

【Paper Link】【Pages】:1-2

【Authors】: Algirdas Avizienis

【Abstract】: The resilience infrastructure is a physically and functionally separate add-on to a “Client” computing and/or communication system that provides resilience to the Client system. This short paper summarizes the main features of the architecture of a resilience infrastructure.

【Keywords】: fault tolerance; resilience; infrastructure; dependability

56. Operating SECDED-based caches at ultra-low voltage with FLAIR.

【Paper Link】【Pages】:1-11

【Authors】: Moinuddin K. Qureshi ; Zeshan Chishti

【Abstract】: Voltage scaling is often limited by bit failures in large on-chip caches. Prior approaches for enabling cache operation at low voltages rely on correcting cache lines with multi-bit failures. Unfortunately, multi-bit Error Correcting Codes (ECC) incur significant storage overhead and complex logic. Our goal is to develop solutions that enable ultra-low voltage operation while incurring minimal changes to existing SECDED-based cache designs. We exploit the observation that only a small percentage of cache lines have multi-bit failures. We propose FLexible And Introspective Replication (FLAIR), which performs two-way replication for part of the cache during testing to maintain robustness, and disables lines with multi-bit failures after testing. FLAIR leverages the correction features of existing SECDED codes to greatly improve on simple two-way replication. FLAIR provides a Vmin of 485 mV (similar to ECC-8) and maintains robustness to soft errors, while incurring a storage overhead of only one bit per cache line.

【Keywords】: Testing; Error correction codes; Low voltage; Circuit faults; Robustness; Microprocessors

57. DATAFLASKS: An epidemic dependable key-value substrate.

【Paper Link】【Pages】:1-6

【Authors】: Francisco Maia ; Miguel Matos ; Ricardo Manuel Pereira Vilaça ; José Orlando Pereira ; Rui Oliveira ; Etienne Riviere

【Abstract】: Recently, tuple-stores have become pivotal structures in many information systems. Their ability to handle large datasets makes them important in an era in which unprecedented amounts of data are produced and exchanged. However, these tuple-stores typically rely on structured peer-to-peer protocols that assume moderately stable environments. This assumption does not always hold for very large-scale systems comprising thousands of machines. In this paper we present a novel approach to the design of a tuple-store. Our approach follows a stratified design based on an unstructured substrate. We focus on this substrate and on how the use of epidemic protocols allows it to achieve high dependability and scalability.

【Keywords】: Large Scale Data Stores; Dependability; Epidemic Protocols; Distributed Systems
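To give a feel for why epidemic protocols scale, here is a minimal Python sketch of generic push-based gossip dissemination. It is not the DataFlasks protocol; the uniform peer choice and the function name `push_gossip_rounds` are assumptions for illustration. Each informed node pushes the update to one random peer per round, so the informed set can at most double per round, and in practice full coverage takes on the order of log n rounds.

```python
import random


def push_gossip_rounds(n, seed=0):
    """Simulate push gossip and count rounds until all n nodes are informed.

    Each round, every node that already holds the update pushes it to one
    peer chosen uniformly at random (possibly one that already has it, or
    itself). No membership structure is assumed beyond knowing all n ids.
    """
    rng = random.Random(seed)
    informed = {0}  # node 0 originates the update
    rounds = 0
    while len(informed) < n:
        # One push per informed node; duplicates simply collapse in the set.
        targets = {rng.randrange(n) for _ in informed}
        informed |= targets
        rounds += 1
    return rounds
```

Running `push_gossip_rounds(1024)` needs at least 10 rounds (the doubling bound) and typically only modestly more, with no node depending on a structured overlay staying intact.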

58. Model-based performance analysis of local re-execution scheme in offloading system.

【Paper Link】【Pages】:1-6

【Authors】: Qiushi Wang ; Huaming Wu ; Katinka Wolter

【Abstract】: Offloading is a useful approach to saving energy and time on mobile devices by migrating heavy computation to powerful remote servers. However, unreliable wireless networks constrain the implementation of offloading applications, because network failures frequently interrupt execution continuity. To deal with this problem, locally re-executing the pre-determined offloading task on the mobile device is a valid method. The challenge lies in finding the best trade-off between the costs and benefits of Local Re-execution. In this paper, using a Stochastic Activity Network model, we define three metrics to investigate the performance of Local Re-execution when it is triggered at different timeout values. By comprehensively comparing the simulation results, we further explore the optimal timeout value for activating Local Re-execution and conclude that the optimum is mainly controlled by the delay of network recovery.

【Keywords】: timeout; Modeling; performance analysis; Offloading; Stochastic Activity Network

59. Automating the debugging of datacenter applications with ADDA.

【Paper Link】【Pages】:1-12

【Authors】: Cristian Zamfir ; Gautam Altekar ; Ion Stoica

【Abstract】: Debugging data-intensive distributed applications running in datacenters is complex and time-consuming because developers have no practical way of deterministically replaying failed executions. Building such tools is hard for two reasons: non-determinism that may be tolerable on a single node is exacerbated in large clusters of interacting nodes, and datacenter applications produce terabytes of intermediate data exchanged by nodes, making full input recording infeasible. We present ADDA, a replay-debugging system for datacenters that has lower recording and storage overhead than existing systems. ADDA is based on two techniques. First, ADDA provides control plane determinism, leveraging our observation that many typical datacenter applications consist of a separate “control plane” and “data plane”, and that most bugs reside in the former. Second, ADDA does not record “data plane” inputs; instead, it synthesizes them during replay, starting from the application's external inputs, which are typically persisted in append-only storage for reasons unrelated to debugging. We evaluate ADDA and show that it deterministically replays real-world failures in Hypertable and Memcached.

【Keywords】: storage; debugging; record-replay; reliability; data-center

60. A review of cloud deployment models for e-learning systems.

【Paper Link】【Pages】:1-2

【Authors】: Engin Leloglu ; Tolga Ayav ; Burak Galip Aslan

【Abstract】: With the significant growth of cloud-based systems, many industries are turning their attention to cloud computing solutions. E-learning is a promising application area, since its typical requirements, such as dynamic allocation of computation and storage resources, coincide well with cloud characteristics. This paper presents some possible cloud solutions for e-learning environments, emphasizing their pros and cons. It is of paramount importance to choose the most suitable cloud model for an e-learning application or an educational organization in terms of scalability, portability and security. We distinguish various deployment alternatives of cloud computing and discuss their benefits against typical e-learning requirements.

【Keywords】: deployment models; cloud computing; e-learning

61. A view on the past and future of fault injection.

【Paper Link】【Pages】:1-2

【Authors】: Nuno Silva ; Ricardo Barbosa ; João Carlos Cunha ; Marco Vieira

【Abstract】: Fault injection is a well-known technology that enables assessing dependability attributes of computer systems. Much work on fault injection has been carried out in the past, and fault injection has been used in different application domains. This fast abstract briefly reviews previous applications of fault injection, especially for embedded systems, and puts forward ideas on its future use, both in terms of application areas and business markets.

【Keywords】: Fault Models; Fault Injection; Dependability

62. An empirical investigation of fault repairs and mitigations in space mission system software.

【Paper Link】【Pages】:1-8

【Authors】: Javier Alonso ; Michael Grottke ; Allen P. Nikora ; Kishor S. Trivedi

【Abstract】: Faults in software systems can have different characteristics. In an earlier paper, the anomaly reports for a number of JPL/NASA missions were analyzed and the underlying faults were classified as Bohrbugs, non-aging-related Mandelbugs, and aging-related bugs. In another paper, the times to failure for each of these fault types were examined to identify trends within missions as well as across missions. The results of those papers are now starting to provide guidance for improving the dependability of space mission software. Just as there are different types of faults, there are different kinds of mitigations of faults and failures. This paper analyzes the mitigations associated with each fault studied in our previous papers. We identify trends in mitigation type proportions within missions as well as from mission to mission. We also look for relationships between fault types and mitigation types. The results will be used to increase the reliability of space mission software.

【Keywords】: Software; Space vehicles; Instruments; NASA; Software reliability; Testing

63. Complementing static and dynamic analysis approaches for better network defense.

【Paper Link】【Pages】:1-2

【Authors】: Himanshu Pareek ; N. Sarat Chandra Babu

【Abstract】: This paper presents a novel approach to preventing the execution of malicious files on host systems in the network. Blocking a malicious file at the gateway level requires packet reassembly, which in turn reduces network performance by a large margin. This work combines static and dynamic analysis approaches and implements a minimal agent-based system that prevents the execution of malicious files, thereby increasing dependability using existing systems.

【Keywords】: malware; network defense; heuristics

64. A logic for model-checking mean-field models.

【Paper Link】【Pages】:1-12

【Authors】: Anna Kolesnichenko ; Pieter-Tjerk de Boer ; Anne Remke ; Boudewijn R. Haverkort

【Abstract】: Recently the mean-field method has been adopted for analysing systems consisting of a large number of interacting objects in computer science, biology, chemistry, etc. It allows for a quick and accurate analysis of such systems, while avoiding the state-space explosion problem. So far, the method has primarily been used for performance evaluation. In this paper, we use the mean-field method for model-checking. We define and motivate a logic MF-CSL for describing properties of systems composed of many identical interacting objects. The proposed logic allows describing both properties of the overall system and of a random individual object. Algorithms to check the satisfaction relation for all MF-CSL operators are proposed. Furthermore, we explain how the set of all time instances that fulfill a given MF-CSL formula for a certain distribution of objects can be computed.

【Keywords】: Computers

65. Detecting malicious landing pages in Malware Distribution Networks.

【Paper Link】【Pages】:1-11

【Authors】: Gang Wang ; Jack W. Stokes ; Cormac Herley ; David Felstead

【Abstract】: Drive-by download attacks attempt to compromise a victim's computer through browser vulnerabilities. Often they are launched from Malware Distribution Networks (MDNs) consisting of landing pages to attract traffic, intermediate redirection servers, and exploit servers which attempt the compromise. In this paper, we present a novel approach to discovering the landing pages that lead to drive-by downloads. Starting from partial knowledge of a given collection of MDNs, we identify the malicious content on their landing pages using multiclass feature selection. We then query the webpage cache of a commercial search engine to identify landing pages containing the same or similar content. In this way we are able to identify previously unknown landing pages belonging to already identified MDNs, which allows us to expand our understanding of the MDN. We explore both a rule-based and a classifier-based approach to identifying potentially malicious landing pages. We build both systems and use a high-interaction honeypot to independently verify that the newly identified landing pages indeed attempt drive-by downloads. For the rule-based system, 57% of the landing pages predicted as malicious are confirmed, and this success rate remains constant in two large trials spaced five months apart. This extends the known footprint of the MDNs studied by 17%. The classifier-based system is less successful, and we explore possible reasons.

【Keywords】: signature; Drive-by download; malware distribution network

66. Chasing the optimum in replicated in-memory transactional platforms via protocol adaptation.

【Paper Link】【Pages】:1-12

【Authors】: Maria Couceiro ; Pedro Ruivo ; Paolo Romano ; Luís E. T. Rodrigues

【Abstract】: Replication plays an essential role in in-memory distributed transactional platforms, such as NoSQL data grids, since it is the primary means of ensuring data durability. Unfortunately, no single replication technique can ensure optimal performance across a wide range of workloads and system configurations. This paper tackles this problem by presenting MORPHR, a framework that automatically adapts the replication protocol of in-memory transactional platforms to the current operational conditions. MORPHR presents two key innovative aspects. On the one hand, it allows specialized algorithms that regulate switching between arbitrary replication protocols to be plugged in modularly. On the other hand, MORPHR relies on state-of-the-art machine learning techniques to autonomously determine the optimal replication protocol in the face of varying workloads. We integrated MORPHR into a popular open-source in-memory NoSQL data grid and evaluated it by means of an extensive experimental study. The results highlight that MORPHR is accurate in identifying the optimal replication strategy in the presence of complex, realistic workloads, and does so with minimal overhead.

【Keywords】: Protocols

67. Dependability models for designing disaster tolerant cloud computing systems.

【Paper Link】【Pages】:1-6

【Authors】: Bruno Silva ; Paulo Romero Martins Maciel ; Eduardo Tavares ; Armin Zimmermann

【Abstract】: Hundreds of natural disasters occur in many parts of the world every year, causing billions of dollars in damages. This contrasts with the high-availability requirements of cloud computing systems; to protect such systems from unforeseen catastrophes, a recovery plan requires the use of different data centers located far enough apart. However, the time to migrate a VM from one data center to another increases with distance. This work presents dependability models for evaluating distributed cloud computing systems deployed across multiple data centers, taking disaster occurrence into account. Additionally, we present a case study that evaluates several scenarios with different VM migration times and distances between data centers.

【Keywords】: stochastic Petri nets; cloud computing; dependability evaluation

68. Uniform node sampling service robust against collusions of malicious nodes.

【Paper Link】【Pages】:1-12

【Authors】: Emmanuelle Anceaume ; Yann Busnel ; Bruno Sericola

【Abstract】: We consider the problem of achieving uniform node sampling in large-scale systems in the presence of a strong adversary. We first propose an omniscient strategy that processes, on the fly, an unbounded and arbitrarily biased input stream made of node identifiers exchanged within the system, and outputs a stream that preserves Uniformity and Freshness properties. We show through Markov chain analysis that both properties hold despite any arbitrary bias introduced by the adversary. We then propose a knowledge-free strategy and show through extensive simulations that this strategy accurately approximates the omniscient one. We also evaluate its resilience against a strong adversary by studying two representative attacks (flooding and targeted attacks). We quantify the minimum number of identifiers that the adversary must insert in the input stream to prevent uniformity. To our knowledge, such an analysis has never been proposed before.

【Keywords】: randomized approximation algorithm; Data stream; strong adversary; uniform sampling; Markov chains
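The core idea of damping identifiers that an adversary injects too often can be sketched as inverse-frequency admission into a bounded sampling memory. This Python sketch is a simplification under stated assumptions, not the authors' omniscient or knowledge-free strategy: it keeps exact per-identifier counts (which a real system processing an unbounded stream could not afford), and the class name `InverseFrequencySampler` is hypothetical.

```python
import random
from collections import Counter


class InverseFrequencySampler:
    """Hypothetical sampler that damps over-represented identifiers.

    Each incoming id j is admitted to a fixed-size sampling memory with
    probability min_count / count[j], so ids that are flooded into the
    stream are admitted proportionally less often. Exact counts are an
    illustrative assumption; they grow with the number of distinct ids.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.memory = []
        self.counts = Counter()
        self.rng = random.Random(seed)

    def feed(self, node_id):
        self.counts[node_id] += 1
        min_count = min(self.counts.values())
        accept_prob = min_count / self.counts[node_id]
        if self.rng.random() < accept_prob:
            if len(self.memory) < self.capacity:
                self.memory.append(node_id)
            else:
                # Overwrite a uniformly chosen slot to keep the sample fresh.
                self.memory[self.rng.randrange(self.capacity)] = node_id

    def sample(self):
        """Return one identifier drawn uniformly from the sampling memory."""
        return self.rng.choice(self.memory)
```

If identifier 0 appears ten times as often as each other identifier, it is admitted roughly one time in ten, so it ends up occupying about as many memory slots as any other identifier rather than dominating the sample.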

69. Message from the PDS program chair.

【Paper Link】【Pages】:1

【Authors】: Peter Kemper

【Abstract】: PDS is a well-recognized, major international forum for researchers and practitioners to present innovative, high-quality results in the analysis and assessment of computer systems and networks in terms of their performance, dependability, and security. As much as the considered systems vary from extremely small embedded devices and hardware components to large scale networked systems and infrastructures, so do the properties of interest with various aspects of performance, dependability, and security and the assessment approaches ranging from empirical and experimental studies to refined mathematical modeling methods. This year, we received 113 submissions (95 regular papers, 11 practical experience reports, and 7 tool descriptions) from 20 countries, covering various facets of research and practice on these topics.

【Keywords】:

70. Error detector placement for soft computation.

【Paper Link】【Pages】:1-12

【Authors】: Anna Thomas ; Karthik Pattabiraman

【Abstract】: The scaling of silicon devices has exacerbated the unreliability of modern computer systems, and power constraints have necessitated the involvement of software in hardware error detection. At the same time, emerging workloads in the form of soft computing applications (e.g., multimedia applications) can tolerate most hardware errors as long as the erroneous outputs do not deviate significantly from error-free outcomes. We term outcomes that deviate significantly from the error-free outcomes Egregious Data Corruptions (EDCs). In this study, we propose a technique to place detectors for selectively detecting EDC-causing errors in an application. We performed an initial study to formulate heuristics that identify EDC-causing data. Based on these heuristics, we developed an algorithm that uses static analysis to identify program locations for placing high-coverage detectors for EDCs. We evaluate our technique on six benchmarks to measure the EDC coverage under given performance overhead bounds. Our technique achieves an average EDC coverage of 82% under performance overheads of 10%, while detecting 10% of the Non-EDC and benign faults.

【Keywords】: Detectors; Benchmark testing; PSNR; Decoding; Arrays; Measurement; Hardware

71. Practical automated vulnerability monitoring using program state invariants.

【Paper Link】【Pages】:1-12

【Authors】: Cristiano Giuffrida ; Lorenzo Cavallaro ; Andrew S. Tanenbaum

【Abstract】: Despite the growing attention to security concerns and advances in code verification tools, many memory errors still escape testing and plague production applications with security vulnerabilities. We present RCORE, an efficient dynamic program monitoring infrastructure to perform automated security vulnerability monitoring. Our approach is to perform extensive static analysis at compile time to automatically index program state invariants (PSIs). At runtime, our novel dynamic analysis continuously inspects the program state and produces a report when PSI violations are found. Our technique retrofits existing applications and is designed for both offline and production runs. To avoid slowing down production applications, we can perform our dynamic analysis on idle cores to detect suspicious behavior in the background. The alerts raised by our analysis are symptoms of memory corruption or other, potentially exploitable, dangerous behavior. Our experimental evaluation confirms that RCORE can report on several classes of vulnerabilities with very low overhead.

【Keywords】: Systems Security; Program State Invariants; Vulnerability Analysis; Memory Errors

72. Crossing the threshold: Detecting network malfeasance via sequential hypothesis testing.

【Paper Link】【Pages】:1-12

【Authors】: Srinivas Krishnan ; Teryl Taylor ; Fabian Monrose ; John McHugh

【Abstract】: The domain name system plays a vital role in the dependability and security of modern networks. Unfortunately, it has also been widely misused for nefarious activities. Recently, attackers have turned to algorithmically generated domain names (AGDs) in an effort to circumvent network defenses. However, because such domain names are increasingly used in benign applications, this shift has significant implications for techniques that classify AGDs based solely on the format of a domain name. To highlight the challenges these techniques face, we examine contemporary approaches and demonstrate their limitations. We address these shortcomings by proposing an online form of sequential hypothesis testing that classifies clients based solely on the non-existent (NX) responses they elicit. Our evaluations on real-world data show that we outperform existing approaches, and for the vast majority of cases, we detect malware before it is able to successfully rendezvous with its command and control centers.

【Keywords】: Program processors; Engines
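Sequential hypothesis testing decides as soon as the evidence is strong enough rather than after a fixed number of observations. The following Python sketch shows Wald's classic sequential probability ratio test applied per client, under assumptions that are ours rather than the paper's: each DNS lookup outcome is treated as an independent Bernoulli trial (NX or not), and the NX rates `theta0` (benign) and `theta1` (infected) and the error targets are made-up parameters.

```python
import math


def sprt(outcomes, theta0=0.1, theta1=0.6, alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test over per-client DNS outcomes.

    outcomes: iterable of booleans, True when a lookup returned NXDOMAIN.
    theta0 / theta1: assumed NX rates for benign (H0) and infected (H1) hosts.
    alpha / beta: target false-positive and false-negative rates.
    Returns ("infected" | "benign" | "undecided", observations consumed).
    """
    upper = math.log((1 - beta) / alpha)  # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))  # accept H0 at or below this
    step_nx = math.log(theta1 / theta0)  # log-likelihood gain per NX reply
    step_ok = math.log((1 - theta1) / (1 - theta0))  # per successful lookup
    llr, n = 0.0, 0
    for n, is_nx in enumerate(outcomes, start=1):
        llr += step_nx if is_nx else step_ok
        if llr >= upper:
            return "infected", n
        if llr <= lower:
            return "benign", n
    return "undecided", n
```

With these parameters, a client whose first three lookups all fail is flagged after three observations, while a client with six consecutive successful lookups is cleared. The paper classifies clients using only the NX responses they elicit, so its actual test statistic differs from this two-outcome model.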