49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2019, Portland, OR, USA, June 24-27, 2019. IEEE 【DBLP Link】
【Paper Link】 【Pages】:1-12
【Authors】: Georgios Mappouras ; Alireza Vahid ; A. Robert Calderbank ; Daniel J. Sorin
【Abstract】: Racetrack memory is an exciting emerging memory technology with the potential to offer far greater capacity and performance than other non-volatile memories. Racetrack memory has an unusual error model, though, which precludes the error coding techniques that architects typically use. In this paper, we introduce GreenFlag, a coding scheme that combines a new construction for Varshamov-Tenengolts codes with specially crafted delimiter bits that are placed between codewords. GreenFlag is the first coding scheme that is compatible with 3D racetrack, which has the benefit of very high density but the limitation of a single read/write port per track. Based on our implementation of encoding/decoding hardware, we analyze the trade-offs between latency, code length, and code rate; we then use this analysis to evaluate the viability of racetrack at each level of the memory hierarchy.
【Keywords】: Racetrack-Memory; Coding; Fault-Tolerance; Shift-Errors
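A minimal sketch of the code family GreenFlag builds on, assuming the textbook Varshamov-Tenengolts code VT_a(n) (binary words x with sum of i*x_i equal to a mod n+1) and Levenshtein's classic single-deletion decoder. It is not GreenFlag's new construction and omits its delimiter bits; it also assumes that a racetrack shift error shows up as one bit deleted from a codeword. All function names are hypothetical.

```python
from itertools import product

# Hypothetical helpers for the textbook VT_a(n) code (not GreenFlag's construction).

def vt_checksum(bits, modulus):
    """Weighted checksum: sum of the 1-indexed positions that hold a 1, mod `modulus`."""
    return sum(i for i, b in enumerate(bits, start=1) if b) % modulus

def is_vt_codeword(x, a=0):
    """x is in VT_a(n) iff sum(i * x_i) == a (mod n + 1)."""
    return vt_checksum(x, len(x) + 1) == a

def vt_codewords(n, a=0):
    """Brute-force enumeration of VT_a(n); only practical for small n."""
    return [list(x) for x in product((0, 1), repeat=n) if is_vt_codeword(x, a)]

def vt_correct_deletion(y, a=0):
    """Rebuild the VT_a codeword of length len(y) + 1 from which one bit was deleted."""
    n = len(y) + 1
    weight = sum(y)
    deficiency = (a - vt_checksum(y, n + 1)) % (n + 1)
    if deficiency <= weight:
        # A 0 was deleted: put it back with exactly `deficiency` ones to its right.
        for idx in range(n):
            if sum(y[idx:]) == deficiency:
                return y[:idx] + [0] + y[idx:]
    else:
        # A 1 was deleted: put it back with exactly deficiency - weight - 1 zeros to its left.
        zeros_left = deficiency - weight - 1
        for idx in range(n):
            if y[:idx].count(0) == zeros_left:
                return y[:idx] + [1] + y[idx:]
    raise ValueError("not a single-deletion corruption of a VT_a codeword")
```

The decoding rule itself needs only the weight and the weighted checksum of the received word; this sketch favors clarity over speed, so its insertion-point search is quadratic rather than linear.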
【Paper Link】 【Pages】:13-25
【Authors】: Minesh Patel ; Jeremie S. Kim ; Hasan Hassan ; Onur Mutlu
【Abstract】: Experimental characterization of DRAM errors is a powerful technique for understanding DRAM behavior and provides valuable insights for improving overall system performance, energy efficiency, and reliability. Unfortunately, recent DRAM technology scaling issues are forcing manufacturers to adopt on-die error-correction codes (ECC), which pose a significant challenge for DRAM error characterization studies by obfuscating raw error distributions using undocumented, proprietary, and opaque error-correction hardware. As we show in this work, errors observed in devices with on-die ECC no longer follow expected, well-studied distributions (e.g., lognormal retention times) but rather depend on the particular ECC scheme used. In this work, we develop Error-correction INference (EIN), a new statistical inference methodology that overcomes the difficulty of understanding the error characteristics of DRAM devices with on-die ECC. EIN uses maximum a posteriori (MAP) estimation over statistical models that we develop to represent ECC operation to: i) reverse-engineer the ECC scheme and ii) infer the pre-correction error rates given only the post-correction errors. We design and publicly release EINSim, a flexible open-source simulator that can apply EIN to a wide variety of DRAM devices and standards. We evaluate EIN through the first experimental error-characterization study of DRAM devices with on-die ECC in open literature. Using the data-retention error rates of 232 (82) LPDDR4 devices with (without) on-die ECC across a wide range of temperatures, refresh rates, and test patterns, we show that EIN enables: i) reverse-engineering the on-die ECC scheme, which we find to be a single-error correction Hamming code with (n = 136, k = 128, d = 3), ii) inferring pre-correction error rates given only post-correction errors, and iii) recovering the well-studied pre-correction error distributions that on-die ECC obfuscates.
【Keywords】: Error correction codes; Random access memory; Reliability; Estimation; Standards; Bit error rate
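EIN infers the on-die code's parameters, but the parity-check matrix itself is undocumented. As a hedged illustration only, the sketch below builds a textbook single-error-correcting Hamming code with the standard power-of-two parity positions. For k = 128 data bits it needs 8 parity bits, which yields the same (n = 136, k = 128) parameters, but it is not the actual device code. Function names are hypothetical.

```python
# Hypothetical textbook SEC Hamming code; NOT the undocumented on-die ECC layout.

def parity_bits_needed(k):
    """Smallest r with 2**r >= k + r + 1 (enough syndromes to locate any single error)."""
    r = 0
    while 2 ** r < k + r + 1:
        r += 1
    return r

def hamming_encode(data):
    """Place data bits at non-power-of-two positions 1..n, parity bits at powers of two."""
    n = len(data) + parity_bits_needed(len(data))
    word = [0] * (n + 1)                 # index 0 unused; positions are 1..n
    data_iter = iter(data)
    syndrome = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two -> data position
            word[pos] = next(data_iter)
            if word[pos]:
                syndrome ^= pos
    # Parity bit at position 2**j cancels bit j of the data syndrome,
    # so the XOR of all positions holding a 1 becomes zero.
    j = 0
    while (1 << j) <= n:
        word[1 << j] = (syndrome >> j) & 1
        j += 1
    return word[1:]

def hamming_decode(code):
    """Correct at most one flipped bit and return the data bits."""
    n = len(code)
    word = [0] + list(code)
    syndrome = 0
    for pos in range(1, n + 1):
        if word[pos]:
            syndrome ^= pos
    if 0 < syndrome <= n:
        word[syndrome] ^= 1              # syndrome names the flipped position
    # A syndrome > n (possible in shortened codes) is detected but left uncorrected.
    return [word[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
```

A double-bit error at positions p and q produces the nonzero syndrome p XOR q, so this decoder flips a third, previously correct bit. This is one mechanism by which on-die ECC can reshape the error patterns visible after correction.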
【Paper Link】 【Pages】:26-38
【Authors】: Athanasios Chatzidimitriou ; Pablo Bodmann ; George Papadimitriou ; Dimitris Gizopoulos ; Paolo Rech
【Abstract】: Fault injection in early microarchitecture-level simulation CPU models and beam experiments on the final physical CPU chip are two established methodologies to assess the soft error reliability of a microprocessor at different stages of its design flow. Beam experiments, on the one hand, estimate the device's expected soft error rate in realistic physical conditions by exposing it to accelerated particle fluxes. Fault injection in microarchitectural models of the processor, on the other hand, provides deep insights into fault propagation through the entire system stack, including the operating system. Combining beam experiment and fault injection data can deliver deep insights into the device's expected reliability when deployed in the field. However, it remains largely unclear whether fault injection error rates can be compared to those reported by beam experiments, and how this comparison can lead to informed soft error protection decisions in early stages of the system design. In this paper, we present and analyze data gathered with extensive beam experiments (on physical CPU hardware) and microarchitectural fault injections (on an equivalent CPU model in gem5) performed with 13 different benchmarks executed on top of Linux on an ARM Cortex-A9 microprocessor. We combine experimental data that cover more than 2.9 million years of natural exposure with the results of more than 80,000 injections. We then compare the soft error rate estimations that are based on neutron beam and fault injection experiments. We show that, for most benchmarks, fault injection can very accurately predict the Silent Data Corruption (SDC) rate and the Application Crash rate. The System Crash rate measured with beam experiments, however, is much larger than the one estimated by fault injection, due to unknown proprietary parts of the physical hardware platform that cannot be modeled in the simulator.
Overall, our analysis shows that the relative difference between the total error rates of the beam experiments and the fault injection experiments is limited within a narrow range of values and is always smaller than one order of magnitude. This narrow range of the expected failure rate of the CPU provides invaluable assistance to the designers in making effective soft error protection decisions in early design stages.
【Keywords】: CPU-reliability; soft-errors; failures-in-time; neutron-beam; fault-injection; microarchitecture-simulation
【Paper Link】 【Pages】:39-51
【Authors】: Qiang Zeng ; Jianhai Su ; Chenglong Fu ; Golam Kayas ; Lannan Luo ; Xiaojiang Du ; Chiu Chiang Tan ; Jie Wu
【Abstract】: Adversarial examples (AEs) are crafted by adding human-imperceptible perturbations to inputs such that a machine-learning based classifier incorrectly labels them. They have become a severe threat to the trustworthiness of machine learning. While AEs in the image domain have been well studied, audio AEs are less investigated. Recently, multiple techniques have been proposed to generate audio AEs, which makes countermeasures against them urgent. Our experiments show that, given an audio AE, the transcription results of different Automatic Speech Recognition (ASR) systems differ significantly (that is, audio AEs have poor transferability), as different ASR systems use different architectures, parameters, and training datasets. Based on this fact and inspired by Multiversion Programming, we propose a novel audio AE detection approach, MVP-Ears, which utilizes diverse off-the-shelf ASRs to determine whether an audio clip is an AE. We build the largest audio AE dataset to our knowledge, and the evaluation shows that the detection accuracy reaches 99.88%. While transferable audio AEs are difficult to generate at this moment, they may become a reality in the future. We further adapt the idea above to proactively train the detection system for coping with transferable audio AEs. Thus, the proactive detection system is one giant step ahead of attackers working on transferable AEs.
【Keywords】: Adversarial Example; transferability; Automatic Speech Recognition; DNN
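The multiversion idea can be sketched as follows, under loud assumptions: the transcripts from the target ASR and the auxiliary ASRs are already available as strings, similarity is measured with Python's `difflib.SequenceMatcher` ratio, and a fixed threshold (hypothetically 0.5) replaces the trained binary classifier that MVP-Ears feeds its similarity scores into. Function names are hypothetical.

```python
from difflib import SequenceMatcher

def transcript_similarity(a, b):
    """Character-level similarity in [0, 1] between two transcripts (1.0 = identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def looks_adversarial(target_text, auxiliary_texts, threshold=0.5):
    """Hypothetical rule: flag the audio if the target ASR's transcript disagrees
    with every auxiliary ASR's transcript (max similarity below `threshold`)."""
    scores = [transcript_similarity(target_text, t) for t in auxiliary_texts]
    return max(scores) < threshold, scores
```

The rule exploits poor transferability: an AE crafted against the target ASR tends to change only that system's transcript, while benign audio yields transcripts that largely agree across systems.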
【Paper Link】 【Pages】:52-63
【Authors】: Jiaqi Yan ; Guanhua Yan ; Dong Jin
【Abstract】: Malware has been one of the biggest cyber threats in the digital world for a long time. Existing machine learning based malware classification methods rely on handcrafted features extracted from raw binary files or disassembled code. The diversity of such features has made it hard to build generic malware classification systems that work effectively across different operational environments. To strike a balance between generality and performance, we explore new machine learning techniques to classify malware programs represented as their control flow graphs (CFGs). To overcome the drawbacks of existing malware analysis methods that use inefficient and nonadaptive graph matching techniques, in this work we build a new system that uses a deep graph convolutional neural network to embed the structural information inherent in CFGs for effective yet efficient malware classification. We use two large independent datasets that contain more than 20K malware samples to evaluate our proposed system, and the experimental results show that it can classify CFG-represented malware programs with performance comparable to that of state-of-the-art methods applied to handcrafted malware features.
【Keywords】: malware classification; control flow graph; deep learning; graph convolution
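The abstract does not give the layer equations, so the sketch below assumes a generic graph-convolution step, tanh(D^-1 (A + I) H W), where A is the CFG adjacency (basic block i to its successors), H holds per-block features, and W is a weight matrix that training would learn; averaging over blocks then gives a fixed-size graph embedding for a classifier. The propagation rule and mean readout are assumptions (DGCNN-style models, for instance, use a similar rule with a different pooling layer), and all names are hypothetical.

```python
import math

def graph_conv(adj, features, weights):
    """One hypothetical propagation step: tanh(D^-1 (A + I) H W).

    adj[i] lists the successors of basic block i (excluding i itself);
    features is an n x f list of lists; weights is an f x c list of lists.
    """
    n = len(features)
    f = len(weights)
    c = len(weights[0])
    # H W: project each block's features into c output channels.
    hw = [[sum(features[i][k] * weights[k][j] for k in range(f)) for j in range(c)]
          for i in range(n)]
    out = []
    for i in range(n):
        nbrs = [i] + list(adj[i])        # self-loop plus successors
        # Row-normalised aggregation: average over the block and its successors.
        out.append([math.tanh(sum(hw[v][j] for v in nbrs) / len(nbrs))
                    for j in range(c)])
    return out

def mean_readout(node_embeddings):
    """Average node embeddings into one fixed-size vector for the whole CFG."""
    n = len(node_embeddings)
    return [sum(row[j] for row in node_embeddings) / n
            for j in range(len(node_embeddings[0]))]
```

Stacking several such layers lets each block's embedding absorb information from blocks several edges away, which is how structural information in the CFG reaches the classifier without handcrafted features.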
【Paper Link】 【Pages】:64-75
【Authors】: Guanxiong Liu ; Issa Khalil ; Abdallah Khreishah
【Abstract】: Neural Network classifiers have been used successfully in a wide range of applications. However, their underlying assumption of an attack-free environment has been undermined by adversarial examples. Researchers have proposed defenses, but existing approaches are still far from providing effective solutions to this evolving problem. In this paper, we design a generative adversarial net (GAN) based zero knowledge adversarial training defense, dubbed ZK-GanDef, which does not consume adversarial examples during training. Therefore, ZK-GanDef is not only efficient in training but also adaptive to new adversarial examples. This advantage comes at the cost of a small degradation in test accuracy compared to full knowledge approaches. Our experiments show that ZK-GanDef enhances test accuracy on adversarial examples by up to 49.17% compared to zero knowledge approaches. More importantly, its test accuracy is close to that of the state-of-the-art full knowledge approaches (maximum degradation of 8.46%), while taking much less training time.
【Keywords】: Adversarial Training Defense; Generative Adversarial Nets; full knowledge training; zero knowledge training
【Paper Link】 【Pages】:76-87
【Authors】: Hananeh Aliee ; Faramarz Khosravi ; Jürgen Teich
【Abstract】: The reliability of today's electronic products suffers from a growing variability of failure and ageing effects. In this paper, we investigate a technique for the efficient derivation of uncertainty distributions of system reliability. We assume that a system is composed of unreliable components whose reliabilities are modeled as probability distributions. Existing Monte Carlo (MC) simulation-based techniques, which iteratively select a sample from the probability distributions of the components, often suffer from high execution time and/or poor coverage of the sample space. To avoid the costly re-evaluation of the system reliability during MC simulation, we propose to employ the Taylor expansion of the system reliability function. Moreover, we propose a stratified sampling technique based on the fact that components may not contribute equally (i.e., may not have equal importance) to the uncertainty of their system. This technique finely/coarsely stratifies the probability distributions of the components with high/low contribution. The experimental results show that the proposed technique is more efficient and provides more accurate results than previously proposed techniques.
【Keywords】: Reliability, Uncertainty Analysis, Sampling, Importance Measure, System Design, Stratified Sampling
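To make the stratification idea concrete, here is a hedged sketch that estimates the mean system reliability when each component's reliability is uncertain. It assumes uniform component distributions (so the inverse CDF is trivial), a hand-picked number of strata per component, and one random sample per equal-probability cell of the resulting grid; it does not reproduce the paper's Taylor-expansion step or its importance measure. The function name is hypothetical.

```python
import random

def stratified_mean(system_reliability, ranges, strata, seed=0):
    """Estimate E[R_sys] when component i's reliability ~ Uniform(ranges[i]).

    strata[i] sets how finely component i is stratified: components that
    contribute more uncertainty get more strata. Every cell of the grid has
    equal probability and receives exactly one random sample.
    """
    rng = random.Random(seed)
    cells = [[]]
    for m in strata:                              # enumerate all grid cells
        cells = [c + [s] for c in cells for s in range(m)]
    total = 0.0
    for cell in cells:
        sample = []
        for (lo, hi), m, s in zip(ranges, strata, cell):
            u = (s + rng.random()) / m            # uniform draw inside stratum s
            sample.append(lo + (hi - lo) * u)     # inverse CDF of Uniform(lo, hi)
        total += system_reliability(sample)
    return total / len(cells)

if __name__ == "__main__":
    # Series system of two components; the second has the wider (more uncertain)
    # range, so it gets 50 strata versus 5 for the first.
    series = lambda r: r[0] * r[1]
    print(stratified_mean(series, [(0.9, 0.99), (0.8, 0.95)], [5, 50]))
```

Because each cell is sampled exactly once, the estimator is unbiased, and its variance comes only from variation within cells, which finer strata on the influential component shrink the most.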
【Paper Link】 【Pages】:88-99
【Authors】: Hoang Hai Nguyen ; Kartik Palani ; David M. Nicol
【Abstract】: Network reliability studies properties of networks subjected to random failures of their components. It has been widely adopted for modeling and analyzing real-world problems across different domains, such as circuit design, genomics, databases, information propagation, network security, and many others. Two practical situations that usually arise from such problems are (i) the correlation between component failures and (ii) the uncertainty in failure probabilities. Previous work captured correlations by modeling component reliability using general Boolean expressions of Bernoulli random variables. This paper extends such a model to address the second problem, where we investigate the use of Beta distributions to capture the variance of uncertainty. We call this new formalism the Beta uncertain graph. We study the reliability polynomials of Beta uncertain graphs as multivariate polynomials of Beta random variables and demonstrate the use of the model on two realistic examples. We also observe that the reliability distribution of a monotone Beta uncertain graph can be approximated by a Beta distribution, usually with high accuracy. Numerical results from Monte Carlo simulation of an approximation scheme and from two case studies strongly support this observation.
【Keywords】: network reliability; correlated failures; uncertainty; Beta distribution
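The Beta-approximation observation can be illustrated with a small Monte Carlo sketch. It assumes a hypothetical monotone network with four independent edges forming two disjoint two-edge s-t paths, so the reliability polynomial is 1 - (1 - p1*p2)(1 - p3*p4); each edge probability is drawn from its own Beta distribution, and a Beta is fitted to the sampled reliabilities by matching mean and variance. The network, parameters, and moment-matching fit are assumptions that may differ from the paper's approximation scheme, and the function names are hypothetical.

```python
import random
import statistics

def two_path_reliability(p):
    """Hypothetical network: s and t are connected if edges 0,1 or edges 2,3 both work."""
    return 1.0 - (1.0 - p[0] * p[1]) * (1.0 - p[2] * p[3])

def sample_reliability(beta_params, n_samples=20000, seed=0):
    """Draw each edge probability from Beta(a, b) and evaluate the reliability polynomial."""
    rng = random.Random(seed)
    return [two_path_reliability([rng.betavariate(a, b) for a, b in beta_params])
            for _ in range(n_samples)]

def fit_beta(samples):
    """Method of moments: choose Beta(alpha, beta) with the sample mean and variance."""
    m = statistics.fmean(samples)
    v = statistics.pvariance(samples, mu=m)
    common = m * (1.0 - m) / v - 1.0          # equals alpha + beta
    return m * common, (1.0 - m) * common, m, v

if __name__ == "__main__":
    samples = sample_reliability([(9, 1), (8, 2), (9, 1), (7, 3)])
    print(fit_beta(samples))
```

Comparing a histogram of the samples with the fitted Beta density is a quick way to check how good the approximation is for a given network.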
【Paper Link】 【Pages】:100-111
【Authors】: Peter Buchholz ; Iryna Dohndorf ; Jan Kriege
【Abstract】: The traditional expectation-maximization (EM) algorithm is a general-purpose algorithm for maximum likelihood estimation in problems with incomplete data. Several variants of the algorithm exist to estimate the parameters of phase-type distributions (PHDs), a widely used class of distributions in performance and dependability modeling. EM algorithms are typically offline algorithms because they improve the likelihood function by iteratively running through a fixed sample. Since most systems now generate data online, offline algorithms seem outdated in this environment. This paper proposes an online EM algorithm for parameter estimation of PHDs. In contrast to the offline version, the online variant incorporates data as soon as it becomes available and requires no iteration. Different variants of the algorithm are proposed that exploit the specific structure of subclasses of PHDs such as hyperexponential, hyper-Erlang, or acyclic PHDs. The algorithm furthermore incorporates current methods to detect drifts or change points in a data stream and estimates a new PHD whenever such behavior has been identified. Thus, the resulting distributions can be applied for online model prediction and for the generation of inhomogeneous PHDs as an extension of inhomogeneous Poisson processes. Numerical experiments with artificial and measured data streams show the applicability of the approach.
【Keywords】: stochastic modeling; phase-type distributions; expectation-maximization algorithm; online algorithm; parameter estimation
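For the hyperexponential subclass, a generic stochastic-approximation form of online EM can be sketched as follows: each new observation triggers a one-sample E-step (posterior phase probabilities), the running sufficient statistics are blended toward it with a decreasing step size, and the M-step is solved in closed form. This is an assumed textbook variant, not necessarily the authors' algorithm; the step-size exponent, initial parameters, and class name are hypothetical, and drift detection is omitted.

```python
import math
import random

class OnlineHyperExpEM:
    """Hypothetical online EM for a K-phase hyperexponential distribution."""

    def __init__(self, probs, rates, decay=0.6):
        self.probs = list(probs)
        self.rates = list(rates)
        self.decay = decay                                  # step size (t + 1) ** -decay
        self.t = 0
        # Running sufficient statistics per phase.
        self.s0 = list(probs)                               # expected phase membership
        self.s1 = [p / r for p, r in zip(probs, rates)]     # expected membership * x

    def update(self, x):
        self.t += 1
        step = (self.t + 1) ** (-self.decay)
        # E-step for one observation: posterior probability of each phase.
        dens = [p * r * math.exp(-r * x) for p, r in zip(self.probs, self.rates)]
        total = sum(dens)
        if total == 0.0:
            return      # density underflowed for every phase; skip this observation
        resp = [d / total for d in dens]
        # Stochastic-approximation update of the sufficient statistics.
        for k, rk in enumerate(resp):
            self.s0[k] = (1 - step) * self.s0[k] + step * rk
            self.s1[k] = (1 - step) * self.s1[k] + step * rk * x
        # Closed-form M-step from the running statistics (no pass over old data).
        norm = sum(self.s0)
        self.probs = [s / norm for s in self.s0]
        self.rates = [a / b for a, b in zip(self.s0, self.s1)]

    def mean(self):
        return sum(p / r for p, r in zip(self.probs, self.rates))

if __name__ == "__main__":
    rng = random.Random(7)
    em = OnlineHyperExpEM(probs=[0.5, 0.5], rates=[0.5, 5.0])
    for _ in range(20000):
        em.update(rng.expovariate(1.0 if rng.random() < 0.5 else 10.0))
    print(em.probs, em.rates, em.mean())
```

Each update costs O(K) regardless of how many observations have been seen, which is what makes the online variant suitable for data streams.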
【Paper Link】 【Pages】:112-124
【Authors】: Saurabh Jha ; Subho S. Banerjee ; Timothy Tsai ; Siva Kumar Sastry Hari ; Michael B. Sullivan ; Zbigniew T. Kalbarczyk ; Stephen W. Keckler ; Ravishankar K. Iyer
【Abstract】: The safety and resilience of fully autonomous vehicles (AVs) are of significant concern, as exemplified by several headline-making accidents. While AV development today involves verification, validation, and testing, end-to-end assessment of AV systems under accidental faults in realistic driving scenarios has been largely unexplored. This paper presents DriveFI, a machine learning-based fault injection engine, which can mine situations and faults that maximally impact AV safety, as demonstrated on two industry-grade AV technology stacks (from NVIDIA and Baidu). For example, DriveFI found 561 safety-critical faults in less than 4 hours. In comparison, random injection experiments executed over several weeks could not find any safety-critical faults.
【Keywords】: Autonomous Vehicles; Fault Injection; Machine Learning
【Paper Link】 【Pages】:125-137
【Authors】: Weibin Wu ; Hui Xu ; Sanqiang Zhong ; Michael R. Lyu ; Irwin King
【Abstract】: The exceptional performance of deep neural networks (DNNs) encourages their deployment in safety- and dependability-critical systems. However, DNNs often demonstrate erroneous behaviors in real-world corner cases. Existing countermeasures center on improving the testing and bug-fixing practice. Unfortunately, building a bug-free DNN-based system is currently almost impossible due to its black-box nature, so anomaly detection is imperative in practice. Motivated by the idea of data validation in a traditional program, we propose and implement Deep Validation, a novel framework for detecting real-world error-inducing corner cases in a DNN-based system during runtime. We model the specifications of DNNs by resorting to their training data and cast checking the input validity of DNNs as the problem of discrepancy estimation. Deep Validation achieves excellent detection results against various corner case scenarios across three popular datasets. Consequently, Deep Validation greatly complements existing efforts and is a crucial step toward building safe and dependable DNN-based systems.
【Keywords】: neural networks; classification; safety; anomaly detection
【Paper Link】 【Pages】:138-150
【Authors】: Ankush Desai ; Shromona Ghosh ; Sanjit A. Seshia ; Natarajan Shankar ; Ashish Tiwari
【Abstract】: The recent drive towards achieving greater autonomy and intelligence in robotics has led to high levels of complexity. Autonomous robots increasingly depend on third-party off-the-shelf components and complex machine-learning techniques. This trend makes it challenging to provide strong design-time certification of correct operation. To address these challenges, we present SOTER, a robotics programming framework with two key components: (1) a programming language for implementing and testing high-level reactive robotics software, and (2) an integrated runtime assurance (RTA) system that helps enable the use of uncertified components, while still providing safety guarantees. SOTER provides language primitives to declaratively construct an RTA module consisting of an advanced, high-performance controller (uncertified), a safe, lower-performance controller (certified), and the desired safety specification. The framework provides a formal guarantee that a well-formed RTA module always satisfies the safety specification, without completely sacrificing performance, by using higher-performance uncertified components whenever safe. SOTER allows the complex robotics software stack to be constructed as a composition of RTA modules, where each uncertified component is protected using an RTA module. To demonstrate the efficacy of our framework, we consider a real-world case study of building a safe drone surveillance system. Our experiments both in simulation and on actual drones show that the SOTER-enabled RTA ensures the safety of the system, including when untrusted third-party components have bugs or deviate from the desired behavior.
【Keywords】: Runtime Assurance; Runtime Verification; Safe Robotics; Programming Languages; Autonomous Robots; Assured Autonomy; Formal Methods; ROS
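The switching idea behind an RTA module can be sketched in one dimension. This is a hypothetical illustration, not SOTER's language primitives or its exact switching condition: a vehicle's position must stay within plus or minus `limit`, speed is bounded by `vmax`, and the advanced (uncertified) command is accepted only if the next state leaves one control period of margin for the safe (certified) controller to recover. All names and numbers are assumptions.

```python
def rta_step(x, advanced, safe, limit=10.0, vmax=1.0, dt=1.0):
    """Pick which controller's command to apply for one control period.

    The advanced command is used only if the resulting state stays inside a
    shrunken 'recoverable' region (one period of margin); otherwise the safe
    controller takes over. Returns the next state and the controller used.
    """
    v = max(-vmax, min(vmax, advanced(x)))       # actuator saturation
    if abs(x + v * dt) <= limit - vmax * dt:
        return x + v * dt, "advanced"
    v = max(-vmax, min(vmax, safe(x)))
    return x + v * dt, "safe"

def safe_controller(x, vmax=1.0):
    """Hypothetical certified fallback: move back toward the centre of the safe region."""
    return -vmax if x > 0 else (vmax if x < 0 else 0.0)

def simulate(advanced, x0=0.0, steps=30):
    xs, modes = [x0], []
    for _ in range(steps):
        x, mode = rta_step(xs[-1], advanced, safe_controller)
        xs.append(x)
        modes.append(mode)
    return xs, modes

if __name__ == "__main__":
    # A buggy advanced controller that always pushes outward at full speed.
    xs, modes = simulate(lambda x: 5.0)
    print(max(abs(x) for x in xs), modes.count("safe"))
```

Even with an advanced controller that always pushes outward, the position never leaves the safe region, while a well-behaved advanced controller is never overridden.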
【Paper Link】 【Pages】:151-163
【Authors】: Pedro Ramalhete ; Andreia Correia ; Pascal Felber ; Nachshon Cohen
【Abstract】: A persistent transactional memory (PTM) library provides an easy-to-use interface to programmers for using byte-addressable non-volatile memory (NVM). Previously proposed PTMs have, so far, been blocking. We present OneFile, the first wait-free PTM with integrated wait-free memory reclamation. We have designed and implemented two variants of OneFile, one with lock-free progress and the other with bounded wait-free progress. We additionally present software transactional memory (STM) implementations of the lock-free and wait-free algorithms targeting volatile memory. Each of our PTMs and STMs is implemented as a single C++ file with ~1,000 lines of code, making them versatile to use. Equipped with these PTMs and STMs, non-expert developers can design and implement their own lock-free and wait-free data structures on NVM, thus making lock-free programming accessible to common software developers.
【Keywords】: persistent memory; transactions; crash resilience; concurrency; wait freedom
【Paper Link】 【Pages】:164-175
【Authors】: Zhongmiao Li ; Paolo Romano ; Peter Van Roy
【Abstract】: Modern transactional platforms strive to jointly ensure ACID consistency and high scalability. In order to pursue these antagonistic goals, several recent systems have revisited the classical State Machine Replication (SMR) approach in order to support sharding of application state across multiple data partitions and partial replication. By promoting and exploiting locality principles, these systems, which we call Partially Replicated State Machines (PRSMs), can achieve scalability levels unparalleled by classic SMR. Yet, existing PRSM systems suffer from two major limitations: 1) they rely on a single thread to execute or serialize transactions within a partition, thus failing to fully exploit the parallelism of multi-core architectures, and/or 2) they rely on the ability to accurately predict the data items to be accessed by transactions, which is non-trivial for complex applications. This paper proposes Sparkle, an innovative deterministic concurrency control that enhances the throughput of state-of-the-art PRSM systems by more than one order of magnitude under high contention, through the joint use of speculative transaction processing and scheduling techniques. On the one hand, speculation allows Sparkle to take full advantage of modern multi-core microprocessors, while avoiding any assumption of a-priori knowledge of the transactions' access patterns, which increases its generality and widens the scope of its scalability. Transaction scheduling techniques, on the other hand, aim to maximize the efficiency of speculative processing.
【Keywords】: distributed transaction; data replication; state machine replication
【Paper Link】 【Pages】:176-187
【Authors】: Alexey Gotsman ; Anatole Lefort ; Gregory V. Chockler
【Abstract】: Atomic multicast is a communication primitive that delivers messages to multiple groups of processes according to some total order, with each group receiving the projection of the total order onto messages addressed to it. To be scalable, atomic multicast needs to be genuine, meaning that only the destination processes of a message should participate in ordering it. In this paper we propose a novel genuine atomic multicast protocol that, in the absence of failures, takes as few as 3 message delays to deliver a message when no other messages are multicast concurrently to its destination groups, and 5 message delays in the presence of concurrency. This improves the latencies of both the fault-tolerant version of Skeen's classical multicast protocol (6 or 12 message delays, depending on concurrency) and its recent improvement by Coelho et al. (4 or 8 message delays). To achieve such low latencies, we depart from the typical way of guaranteeing fault tolerance by replicating each group with Paxos. Instead, we weave Paxos and Skeen's protocol together into a single coherent protocol, exploiting opportunities for white-box optimisations. We experimentally demonstrate that the superior theoretical characteristics of our protocol are reflected in practical performance pay-offs.
【Keywords】: Atomic multicast; fault tolerance; replication
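For context on the baseline, here is a minimal single-process sketch of Skeen's classical failure-free protocol: each destination group proposes a local timestamp, the message's final timestamp is the maximum proposal, and each group delivers messages in (final timestamp, id) order once no pending message could still precede them. Groups are modeled as single non-replicated entities, so the Paxos weaving that is this paper's contribution is not shown, and message delays are not modeled. Class and function names are hypothetical.

```python
class Group:
    """One destination group in Skeen's failure-free atomic multicast (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.clock = 0
        self.pending = {}        # msg id -> [timestamp, is_final]
        self.delivered = []

    def propose(self, msg):
        # Step 1: assign a local timestamp and hold the message as pending.
        self.clock += 1
        self.pending[msg] = [self.clock, False]
        return self.clock

    def finalize(self, msg, final_ts):
        # Step 3: adopt the global timestamp; later proposals must exceed it.
        self.pending[msg] = [final_ts, True]
        self.clock = max(self.clock, final_ts)
        self._try_deliver()

    def _try_deliver(self):
        # Deliver while the smallest (timestamp, id) pending message is final:
        # no other pending message can end up with a smaller final timestamp.
        while self.pending:
            msg = min(self.pending, key=lambda m: (self.pending[m][0], m))
            if not self.pending[msg][1]:
                break
            del self.pending[msg]
            self.delivered.append(msg)

def multicast(msg, groups):
    # Step 2: the final timestamp is the maximum of the groups' proposals.
    final_ts = max(g.propose(msg) for g in groups)
    for g in groups:
        g.finalize(msg, final_ts)
```

`multicast` runs the phases back to back; to see how concurrent messages are ordered, call `propose` and `finalize` on interleaved messages directly, as a real execution would.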
【Paper Link】 【Pages】:188-200
【Authors】: Yanyan Shen ; Gernot Heiser ; Kevin Elphinstone
【Abstract】: High availability and integrity are paramount in systems deployed in life- and mission-critical scenarios. Such fault tolerance can be achieved through redundant co-execution (RCoE) on replicated hardware, now cheaply available with multicore processors. RCoE replicates almost all software, including the OS kernel, drivers, and applications, achieving a sphere of replication that covers everything except the minimal interfaces to non-replicated peripherals. We complement our original, loosely-coupled RCoE with a closely-coupled version that improves the transparency of replication to application code, and investigate the functionality, performance, and vulnerability trade-offs.
【Keywords】: seL4; microkernel; SEU; replication; fault tolerance
【Paper Link】 【Pages】:201-213
【Authors】: Sam Ainsworth ; Timothy M. Jones
【Abstract】: Processor error detection can be reduced in cost significantly by exploiting the parallelism that exists in a repeated copy of an execution, which may not exist in the original code, to split up the redundant work on a large number of small, highly efficient cores. However, such schemes do not provide a method for automatic error recovery. We develop ParaMedic, an architecture that allows efficient automatic correction of errors detected in a system by using parallel heterogeneous cores, providing a fully fail-safe system that does not propagate errors to other systems and can recover without manual intervention. It uses logging to roll back any computation that occurred after a detected error, along with a set of techniques that provide error-checking parallelism while still preventing the escape of incorrect processor values in multicore environments, where ordering the individual processors' logs is not enough to roll back execution. Across a set of single- and multi-threaded benchmarks, we achieve 3.1% and 1.5% overhead respectively, compared with 1.9% and 1% for error detection alone.
【Keywords】: fault tolerance, microarchitecture, error detection
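ParaMedic implements its logging in hardware. As a hedged software model only, the sketch below shows the generic undo-log idea behind rollback: record each overwritten value, discard the entries for a segment once it has been verified, and replay the log backwards when an error is detected. It ignores the multicore log-ordering problem the abstract highlights, and the class name is hypothetical.

```python
_MISSING = object()   # marks addresses that did not exist before a store

class UndoLogMemory:
    """Hypothetical memory model with an undo log for rolling back unverified stores."""

    def __init__(self, memory=None):
        self.memory = dict(memory or {})
        self.log = []     # (address, old value) pairs since the last verified point

    def store(self, addr, value):
        # Save the old value before overwriting so this store can be undone.
        self.log.append((addr, self.memory.get(addr, _MISSING)))
        self.memory[addr] = value

    def commit(self):
        # The segment was checked and found error-free: its undo entries are no longer needed.
        self.log.clear()

    def rollback(self):
        # Error detected: undo stores newest-first to restore the last verified state.
        for addr, old in reversed(self.log):
            if old is _MISSING:
                self.memory.pop(addr, None)
            else:
                self.memory[addr] = old
        self.log.clear()
```

Undoing newest-first matters when the same address is written several times in one segment: the oldest logged value, which is restored last, is the one that was live at the verified point.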
【Paper Link】 【Pages】:214-221
【Authors】: Radha Venkatagiri ; Khalique Ahmed ; Abdulrahman Mahmoud ; Sasa Misailovic ; Darko Marinov ; Christopher W. Fletcher ; Sarita V. Adve
【Abstract】: Modern systems are increasingly susceptible to soft errors in the field and traditional redundancy-based mitigation techniques are too expensive to protect against all errors. Recent techniques, such as approximate computing and various low-cost resilience mechanisms, intelligently trade off inaccuracy in program output for better energy, performance, and resiliency overhead. A fundamental requirement for realizing the full potential of these techniques is a thorough understanding of how applications react to errors. Approxilyzer is a state-of-the-art tool that enables an accurate, efficient, and comprehensive analysis of how errors in almost all dynamic instructions in a program's execution affect the quality of the final program output. While useful, its adoption is limited by its implementation using the proprietary Simics infrastructure and the SPARC ISA. We present gem5-Approxilyzer, a re-implementation of Approxilyzer using the open-source gem5 simulator. gem5-Approxilyzer can be extended to different ISAs, starting with x86 in this work. We show that gem5-Approxilyzer is both efficient (up to two orders of magnitude reduction in error injections over a naive campaign) and accurate (average 92% for our experiments) in predicting the program's output quality in the presence of errors. We also compare the error profiles of five workloads under x86 and SPARC to further motivate the need for a tool like gem5-Approxilyzer.
【Keywords】: Reliability; Fault Tolerance; Soft Errors; Approximate Computing
【Paper Link】 【Pages】:222-233
【Authors】: Jiongyi Chen ; Chaoshun Zuo ; Wenrui Diao ; Shuaike Dong ; Qingchuan Zhao ; Menghan Sun ; Zhiqiang Lin ; Yinqian Zhang ; Kehuan Zhang
【Abstract】: Nowadays, IoT clouds are increasingly deployed to help users manage and control their IoT devices. Unlike traditional cloud services with communication between a client and a server, IoT cloud architectures involve three parties: the IoT device, the user, and the cloud. Before a user can remotely access her IoT device, remote communication between them is bootstrapped through the cloud. However, the security implications of such a unique process in IoT are less understood today. In this paper, we report the first step towards systematic analyses of IoT remote binding. To better understand the problem, we describe the life cycle of remote binding with a state-machine model, which helps us demystify the complexity in various designs and systematically explore the attack surfaces. With the evaluation of 10 real-world remote binding solutions, our study brings to light questionable practices in the designs of authentication and authorization, including inappropriate use of device IDs, weak device authentication, and weak cloud-side access control, as well as the impact of the discovered problems, which could cause sensitive user data leaks, persistent denial-of-service, connection disruption, and even stealthy device control.
【Keywords】: Internet of Things; Security Analysis; System Security
【Paper Link】 【Pages】:234-246
【Authors】: Naif Saleh Almakhdhub ; Abraham A. Clements ; Mathias Payer ; Saurabh Bagchi
【Abstract】: Attacks against IoT systems are increasing at an alarming pace. Many IoT systems are and will be built using low-cost micro-controllers (IoT-uCs). Different security mechanisms have been proposed for IoT-uCs with different trade-offs. To guarantee a realistic and practical evaluation, the constrained resources of IoT-uCs require that defenses be evaluated with respect to not only security, but also performance, memory, and energy. Evaluating security mechanisms for IoT-uCs is limited by the lack of realistic benchmarks and evaluation frameworks. This burdens researchers with the task of developing not only the proposed defenses but also the applications on which to evaluate them. As a result, security evaluation for IoT-uCs is limited and ad hoc. A sound benchmarking suite is essential to enable robust and comparable evaluations of security techniques on IoT-uCs. This paper introduces BenchIoT, a benchmark suite and evaluation framework to address pressing challenges and limitations for evaluating IoT-uCs security. The evaluation framework enables automatic evaluation of 14 metrics covering security, performance, memory usage, and energy consumption. The BenchIoT benchmarks provide a curated set of five real-world IoT applications that cover both IoT-uCs with and without an OS. We demonstrate BenchIoT's ability by evaluating three defense mechanisms. All benchmarks and the evaluation framework are open source and available to the research community.
【Keywords】: Security; IoT; Benchmarks; Internet of Things; Cybersecurity
【Paper Link】 【Pages】:247-255
【Authors】: K. Virgil English ; Islam Obaidat ; Meera Sridhar
【Abstract】: In the recent past, there has been a rapid increase in attacks on consumer Internet-of-Things (IoT) devices. Several attacks currently focus on easy targets for exploitation, such as weak configurations (weak default passwords). However, with governments, industries, and organizations proposing new laws and regulations to reduce and prevent such easy targets in the IoT space, attackers will move to more subtle exploits in these devices. Memory corruption vulnerabilities are a significant class of vulnerabilities in software security through which attackers can gain control of the entire system. Numerous memory corruption vulnerabilities have already been found in IoT firmware deployed in the consumer market. This paper presents an approach for exploiting stack-based buffer-overflow attacks in IoT firmware to hijack the device remotely. To show the feasibility of this approach, we demonstrate exploiting a common network software application, Connman, used widely in IoT firmware such as Samsung smart TVs. We report a series of experiments, including: crashing and executing arbitrary code in the targeted software application in a controlled environment, adopting the attacks in uncontrolled environments (with standard software defenses such as W⊕X and ASLR enabled), and installing publicly available IoT firmware that uses this software application on a Raspberry Pi. The presented exploits demonstrate the ease with which an adversary can control IoT devices.
【Keywords】: Software; Servers; Standards; Wireless fidelity; Tools; Password
【Paper Link】 【Pages】:256-263
【Authors】: Ajay Mahimkar ; Zihui Ge ; Sanjeev Ahuja ; Shomik Pathak ; Nauman Shafi
【Abstract】: Cellular service providers continuously deploy changes in their network in the form of new software releases, service feature introductions, configuration changes, equipment re-homes, firmware upgrades, and topology modifications. It is important to carefully assess the impact of these changes on service performance to validate expected behaviors and take mitigation actions in a timely fashion in case of any unexpected degradation. The diverse nature of the network changes, complex interactions across different layers of the cellular network, and the rapid evolution of the network make it challenging to accurately conduct the assessment. In this paper, we present the design and implementation of our system that enables rigorous, effortless and timely assessment of performance around network changes. We share our lessons learned from the deployment in an operational cellular network over the last eight years.
【Keywords】: Network changes; performance impact; rigorous analytics; effortless specification; timely assessment
【Paper Link】 【Pages】:264-275
【Authors】: Karla Vargas ; Sergio Rajsbaum
【Abstract】: We present an implementation of an eventually perfect failure detector in an arbitrarily connected, partitionable network. We assume ADD channels: for each one there exist constants K, D, not known to the processes, such that for every K consecutive messages sent in one direction, at least one is delivered within time D. The best previous implementation used messages of bounded size, but exponential in n, the number of nodes. The main contribution of this paper is a novel use of time-to-live values in the design of failure detectors, obtaining a flexible implementation that uses messages of size O(n log n).
【Keywords】: Failure detector; ADD channel; time-to-live value; arbitrarily connected network
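To make the ADD channel assumption above concrete, the following is a minimal sketch that checks whether a finite message trace on one channel direction satisfies the stated property: among every K consecutive messages sent, at least one is delivered within time D. The trace format and the function name `satisfies_add` are hypothetical illustrations, not part of the paper; the check covers only a finite prefix and says nothing about the failure detector itself.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical trace format: one entry per message sent on a single channel
# direction, in send order, as (send_time, delivery_time); delivery_time is
# None if the message was lost.
Message = Tuple[float, Optional[float]]


def satisfies_add(trace: Sequence[Message], k: int, d: float) -> bool:
    """Return True if every window of k consecutive sent messages contains
    at least one message delivered within d time units of being sent."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Mark each message as "timely" if it arrived no later than d after sending.
    timely = [dt is not None and dt - st <= d for st, dt in trace]
    # Slide a window of length k over the send order; a trace shorter than k
    # has no complete window and therefore cannot violate the property.
    for start in range(len(timely) - k + 1):
        if not any(timely[start:start + k]):
            return False
    return True
```

For example, a trace in which only the third of five messages arrives (0.5 time units after sending) satisfies the property for K = 3 and D = 1, but not for K = 2, because the first two messages form a window with no timely delivery.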
【Paper Link】 【Pages】:276-288
【Authors】: Klaus-Tycho Foerster ; Andrzej Kamisinski ; Yvonne Anne Pignolet ; Stefan Schmid ; Gilles Trédan
【Abstract】: To provide high availability despite link failures, many modern communication networks feature fast failover mechanisms in the data plane, which operates orders of magnitude faster than the control plane. While the configuration of highly resilient data planes is known to be a difficult combinatorial problem, much progress has been made over the last few years in the design of algorithms that provably guarantee connectivity even under many concurrent link failures. However, while these algorithms provide connectivity, the resulting routes after failures can be very long, which in turn can harm performance. In this paper, we propose, analyze, and evaluate methods for fast failover algorithms that account for the quality of the routes after failures, in addition to connectivity. In particular, we revisit the existing approach of covering the to-be-protected network with arc-disjoint spanning arborescences to define alternative routes to the destination, aiming to keep the stretch imposed by these trees low (hence the name of our method: Bonsai). We show that the underlying problem is NP-hard on general topologies and present lower bound results that are tight for various topologies, for any class of fast failover algorithms. We also present heuristics for general networks and demonstrate their performance benefits in extensive simulations. Finally, we show that failover algorithms using low-stretch arborescences, as a side effect, can provide connectivity under more general failure models than usually considered in the literature.
【Keywords】: Networks; Failover; Local Fast Failover; Resilience
【Paper Link】 【Pages】:289-301
【Authors】: Shengye Wan ; Jianhua Sun ; Kun Sun ; Ning Zhang ; Qi Li
【Abstract】: On ARM processors with the TrustZone security extension, asynchronous introspection mechanisms have been developed in the secure world to detect security policy violations in the normal world. These mechanisms provide security protection by passively checking a snapshot of the normal world. However, since previous secure-world checking solutions require suspending the entire rich OS, asynchronous introspection has not been widely adopted in the real world. On a multi-core ARM system that can execute the two worlds simultaneously on different cores, secure-world introspection can check the rich OS without suspending it. However, we identify a new normal-world evasion attack that can defeat asynchronous introspection by removing the attack traces in parallel from one core while the security check is running on another core. We perform a systematic study of this attack and demonstrate its effectiveness against existing asynchronous introspection mechanisms. As a countermeasure, we propose a secure and trustworthy asynchronous introspection mechanism called SATIN, which can efficiently detect evasion attacks by increasing the attackers' evasion time cost and keeping the defender's execution time under a safe limit. We implement a prototype on an ARM development board, and the experimental results show that SATIN can effectively prevent evasion attacks on multi-core systems with minor system overhead.
【Keywords】: Asynchronous Introspection; Evasion Attack; Trusted Execution Environment
【Paper Link】 【Pages】:302-314
【Authors】: Kuniyasu Suzaki ; Yohei Hori ; Kazukuni Kobara ; Mohammad Mannan
【Abstract】: The Universal Serial Bus (USB) supports a diverse and wide-ranging set of device types. To enable ease of use, USB devices are automatically detected and classified by common operating systems, without any authentication. This trust-by-default design principle can be easily exploited, and has led to numerous attacks in the past (e.g., Stuxnet, BadUSB, BadAndroid), specifically targeting high-value organizations. Administrators' efforts to prevent these attacks may also be threatened by unscrupulous users who may insert any USB device, or by malicious users (inside attackers) who may try to circumvent OS/kernel-enforced protection mechanisms (e.g., via OS replacement). The root causes of USB attacks appear to be the lack of robust authentication of individual USB devices and inadequate tamper-proofing of the solution mechanism itself. We propose DeviceVeil to address these limitations. To authenticate individual USB devices, we utilize the tamper-proof feature of Physical Unclonable Functions (PUFs); PUFs extract unique features from physical characteristics of an integrated circuit (IC) at a reasonable cost (less than 1 USD). To make our authentication mechanism robust, we implement it as a small hypervisor and protect it by a novel combination of security technologies available in commodity PCs, e.g., Trusted Platform Module (TPM), customized secure boot, and virtualization support. The OS disk image with all user data is encrypted by a key sealed in the TPM and can be decrypted only by the hypervisor. Customized secure boot allows only the legitimate hypervisor and OS kernel to be loaded. The hypervisor enables pre-OS authentication to protect the trust-by-default OS from USB attacks. The chain of trust continues from power-on to the insertion of a USB device and disallows all illegitimate USB devices. DeviceVeil's PUF authentication takes about 1.7 seconds during device insertion.
【Keywords】: Individual Device Authentication, Hypervisor, Physical Unclonable Function, Secure Boot, Trusted Computing
【Paper Link】 【Pages】:315-327
【Authors】: Mateus Tymburibá ; Hugo Sousa ; Fernando Magno Quintão Pereira
【Abstract】: This paper presents a multilayer protection approach to guard programs against Return-Oriented Programming (ROP) attacks. Upper layers validate most of a program's control flow at low computational cost, so runtime performance is not compromised. Lower layers provide strong enforcement guarantees to handle more suspicious flows, thus enhancing security. Our multilayer system combines techniques already described in the literature with verifications that we introduce in this paper. We argue that modern x86 processors already provide the microarchitectural units necessary to implement our technique. We demonstrate the effectiveness of our multilayer protection on an extensive suite of benchmarks, which includes: SPEC CPU2006; the three most popular web browsers; 209 benchmarks distributed with LLVM; and four well-known systems shown to be vulnerable to ROP exploits. Our experiments indicate that we can protect programs with almost no overhead in practice, combining the good performance of lightweight security techniques with the high dependability of heavyweight approaches.
【Keywords】: ROP; Architecture; Layers; RAS; LBR; CFI
【Paper Link】 【Pages】:328-335
【Authors】: Amy Babay ; John L. Schultz ; Thomas Tantillo ; Samuel Beckley ; Eamon Jordan ; Kevin Ruddell ; Kevin Jordan ; Yair Amir
【Abstract】: While there has been considerable research on making power grid Supervisory Control and Data Acquisition (SCADA) systems resilient to attacks, the problem of transitioning these technologies into deployed SCADA systems remains largely unaddressed. We describe our experience and lessons learned in deploying an intrusion-tolerant SCADA system in two realistic environments: a red team experiment in 2017 and a power plant test deployment in 2018. These experiences resulted in technical lessons related to developing an intrusion-tolerant system with a real deployable application, preparing a system for deployment in a hostile environment, and supporting protocol assumptions in that hostile environment. We also discuss some meta-lessons regarding the cultural aspects of transitioning academic research into practice in the power industry.
【Keywords】: SCADA; power grid; intrusion tolerance; deployment; red team
【Paper Link】 【Pages】:336-348
【Authors】: Zhongshu Gu ; Hani Jamjoom ; Dong Su ; Heqing Huang ; Jialong Zhang ; Tengfei Ma ; Dimitrios Pendarakis ; Ian Molloy
【Abstract】: Distributed collaborative learning (DCL) paradigms enable building joint machine learning models from mutually distrusting participants. Data confidentiality is guaranteed by retaining private training data on each participant's local infrastructure. However, this approach makes today's DCL designs fundamentally vulnerable to data poisoning and backdoor attacks. It also limits DCL's model accountability, which is key to backtracking problematic training data instances and their responsible contributors. In this paper, we introduce CALTRAIN, a centralized collaborative learning system that simultaneously achieves data confidentiality and model accountability. CALTRAIN enforces isolated computation via secure enclaves on centrally aggregated training data to guarantee data confidentiality. To support building accountable learning models, we securely maintain the links between training instances and their contributors. Our evaluation shows that the models generated by CALTRAIN achieve the same prediction accuracy as models trained in non-protected environments. We also demonstrate that when malicious training participants attempt to implant backdoors during model training, CALTRAIN can accurately and precisely discover the poisoned or mislabeled training data that lead to runtime mispredictions.
【Keywords】: Data Privacy; Learning Systems; Systems Security
【Paper Link】 【Pages】:349-361
【Authors】: Pengfei Sun ; Luis Garcia ; Saman A. Zonouz
【Abstract】: The safety of critical cyber-physical IoT devices hinges on the security of their embedded software that implements control algorithms for monitoring and controlling the associated physical processes, e.g., robotics and drones. Reverse engineering of the corresponding embedded controller software binaries enables their security analysis by extracting high-level, domain-specific, and cyber-physical execution semantic information from executables. We present MISMO, a domain-specific reverse engineering framework for embedded binary code in emerging cyber-physical IoT control application domains. The reverse engineering outcomes can be used for firmware vulnerability assessment, memory forensics analysis, targeted memory data attacks, or binary patching for dynamic selective memory protection (e.g., of important control algorithm parameters). MISMO performs semantic matching at an algorithmic level that can help with understanding possible cyber-physical security flaws. MISMO compares low-level binary symbolic values and high-level algorithmic expressions to extract domain-specific semantic information for the binary's code and data. MISMO enables a finer-grained understanding of the controller by identifying the specific control and state estimation algorithms used. We evaluated MISMO on 2,263 popular firmware binaries by 30 commercial vendors from 6 application domains, including drones, self-driving cars, smart homes, robotics, 3D printers, and Linux kernel controllers. The results show that MISMO can accurately extract the algorithm-level semantics of the embedded binary code and data regions. We discovered a zero-day vulnerability in the Linux kernel controllers of versions 3.13 and above.
【Keywords】: Reverse Engineering; Control Algorithm; Execution Semantic; Cyber physical system; IoT; Symbolic Expression; Symbolic Comparison
【Paper Link】 【Pages】:362-374
【Authors】: Haonan Wang ; Adwait Jog
【Abstract】: Memory (DRAM) energy consumption is one of the major scalability bottlenecks for almost all computing systems, including throughput machines such as Graphics Processing Units (GPUs). A large fraction of DRAM dynamic energy is spent on fetching the data bits from a DRAM page (row) into a small hardware structure called the row buffer. Accessing data in this row buffer is much less expensive in terms of energy and latency. Hence, it is preferable to reuse the buffered data as much as possible before activating another row and bringing its data into the row buffer. Our thorough characterization of several GPGPU applications shows that these row buffers are poorly utilized, leading to sub-optimal energy consumption. To address this, we propose a novel memory scheduler for GPUs that exploits the latency and error tolerance properties of GPGPU applications to reduce row energy by 44% on average.
【Keywords】: GPUs; Scheduling; Approximate Computing
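The row-buffer argument in the abstract above can be illustrated with a toy open-page model: a row activation occurs whenever a request targets a bank whose currently open row differs from the requested one. The sketch below is a generic illustration under that simplifying assumption, not the paper's scheduler; the names `count_activations` and `group_by_row` are hypothetical, and the model ignores timing, fairness, and the latency and error tolerance the paper exploits.

```python
from typing import Dict, List, Tuple

# Hypothetical request format: (bank, row).
Request = Tuple[int, int]


def count_activations(requests: List[Request]) -> int:
    """Count row activations under an open-page policy: each bank keeps its
    last row open, and a request to a different row forces an activation."""
    open_row: Dict[int, int] = {}
    activations = 0
    for bank, row in requests:
        if open_row.get(bank) != row:
            activations += 1
            open_row[bank] = row
    return activations


def group_by_row(requests: List[Request]) -> List[Request]:
    """Reorder requests so that all requests to the same (bank, row) are
    served together, keeping groups in order of first appearance."""
    first_seen: Dict[Request, int] = {}
    for req in requests:
        first_seen.setdefault(req, len(first_seen))
    # sorted() is stable, so requests within a group keep their relative order.
    return sorted(requests, key=lambda r: first_seen[r])
```

For an interleaved stream such as `[(0, 1), (0, 2), (0, 1), (0, 2)]`, serving requests in arrival order costs four activations, while grouping them by row costs two; the trade-off a real scheduler must manage is that grouping delays some requests.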
【Paper Link】 【Pages】:375-387
【Authors】: Sébastien Ollivier ; Donald Kline Jr. ; Kawsher A. Roxy ; Rami G. Melhem ; Sanjukta Bhanja ; Alex K. Jones
【Abstract】: Spintronic domain wall memories (DWMs) are prone to alignment faults, which cannot be protected by traditional error correction techniques. To solve this problem, we propose a new technique called derived error correction coding (DECC). We construct metadata from the data and shift state of the DWM, on demand, using a novel transverse read (TR). TR reads in a direction orthogonal to the DWM access point and can determine the number of ones in a DWM. Errors in the metadata correspond to shift faults in the DWM. Rather than being stored, the metadata is created on demand and protected by stored parity bits. Repairing the metadata with ECC allows restoration of DWM alignment and ensures correct operation. Through these techniques, our shift-aware error correction approaches provide a lifetime of over 15 years with similar performance, while reducing area and energy by 370% and 52%, versus the state-of-the-art, for a 32-bit nanowire.
【Keywords】: Domain Wall Memory; Memory Reliability; Derived error correction codes; novel encoding; Racetrack memory
【Paper Link】 【Pages】:388-400
【Authors】: Prashant J. Nair ; Bahar Asgari ; Moinuddin K. Qureshi
【Abstract】: Conventionally, systems have relied on technology scaling to provide smaller cells, which helps increase the capacity of on-chip and off-chip structures. Unfortunately, scaling technology to smaller nodes causes increased susceptibility to faults. We study the problem of efficiently tolerating transient failures using scalable Spin-Transfer Torque RAM (STTRAM) as an example. At smaller feature sizes, the energy required to flip an STTRAM cell decreases, which makes these cells more susceptible to random failures caused by thermal noise. Such failures can be tolerated by periodic scrubbing and by provisioning each line with an Error Correction Code (ECC). However, to tolerate the desired bit error rate, the cache needs ECC-6 (six-bit error correction) per line, incurring impractical storage overheads. Ideally, we want to tolerate these faults without relying on multi-bit ECC. We propose SuDoku, a design that provisions each line with ECC-1 and a strong error detection code, and relies on region-based RAID-4 to correct multi-bit errors. Unfortunately, simply having such a RAID-4-based architecture is ineffective at tolerating a high rate of transient faults and provides an MTTF on the order of only a few seconds. We describe a novel data resurrection scheme that can repair multiple faulty lines in a RAID-4 region to increase the MTTF to several hours. We propose an extension of SuDoku that hashes a given line into two RAID-4 regions to significantly enhance reliability and increase the MTTF to trillions of hours. Our evaluations show that SuDoku provides 874x higher reliability than ECC-6, incurs 30% less storage than ECC-6, and performs within 0.1% of an ideal fault-free baseline.
【Keywords】: STTRAM; Reliability; Transient-Failure
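The region-based RAID-4 correction mentioned above rests on standard XOR parity: a region stores one parity line equal to the XOR of its data lines, so a single line flagged as uncorrectable (e.g., by the detection code) can be rebuilt from the parity and the surviving lines. The sketch below shows only this textbook single-line recovery under that assumption; SuDoku's data resurrection for multiple faulty lines and its two-region hashing are not modeled, and the helper names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length lines."""
    if len(a) != len(b):
        raise ValueError("lines must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(lines: List[bytes]) -> bytes:
    """RAID-4-style parity line: XOR of all data lines in the region."""
    if not lines:
        raise ValueError("region must contain at least one line")
    return reduce(xor_bytes, lines)


def reconstruct(lines: List[bytes], parity: bytes, faulty_index: int) -> bytes:
    """Rebuild the line at faulty_index (whose contents are ignored) by
    XOR-ing the parity line with every surviving line in the region."""
    survivors = [line for i, line in enumerate(lines) if i != faulty_index]
    return reduce(xor_bytes, survivors, parity)
```

Because XOR is its own inverse, XOR-ing the parity with all surviving lines cancels them out and leaves exactly the missing line; with a single parity line per region, however, only one faulty line per region can be recovered this way.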
【Paper Link】 【Pages】:401-413
【Authors】: Judicael Briand Djoko ; Jack Lange ; Adam J. Lee
【Abstract】: With the rising popularity of file-sharing services such as Google Drive and Dropbox in the workflows of individuals and corporations alike, the protection of client-outsourced data from unauthorized access or tampering remains a major security concern. Existing cryptographic solutions to this problem typically require server-side support, involve non-trivial key management on the part of users, and suffer from severe re-encryption penalties upon access revocation. This combination of performance overheads and management burdens makes this class of solutions undesirable in situations where performant, platform-agnostic, dynamic sharing of user content is required. We present NEXUS, a stackable filesystem that leverages trusted hardware to provide confidentiality and integrity for user files stored on untrusted platforms. NEXUS is explicitly designed to balance security, portability, and performance: it supports dynamic sharing of protected volumes on any platform exposing a file access API without requiring server-side support, enables the use of fine-grained access control policies to allow for selective sharing, and avoids the key revocation and file re-encryption overheads associated with other cryptographic approaches to access control. This combination of features is made possible by a client-side Intel SGX enclave that is used to protect and share NEXUS volumes, ensuring that cryptographic keys never leave enclave memory and obviating the need to re-encrypt files upon revocation of access rights. We implemented a NEXUS prototype that runs on top of the AFS filesystem and show that it incurs a 2× overhead for a variety of common file and database operations.
【Keywords】: sgx; storage; rootkey; client side; openafs; cryptography; TEE
【Paper Link】 【Pages】:414-421
【Authors】: Maurice Bailleu ; Donald Dragoti ; Pramod Bhatotia ; Christof Fetzer
【Abstract】: We introduce TEE-PERF, an architecture- and platform-independent performance measurement tool for trusted execution environments (TEEs). More specifically, TEE-PERF supports method-level profiling for unmodified multithreaded applications without relying on architecture-specific hardware features (e.g., Intel VTune Amplifier) or on platform-dependent kernel features (e.g., Linux perf). Moreover, TEE-PERF provides accurate profiling measurements, since it traces the entire process execution instead of employing instruction pointer sampling. Thus, TEE-PERF does not suffer from sampling frequency bias, which can occur when threads are scheduled in alignment with the sampling frequency. We have implemented TEE-PERF with an easy-to-use interface and integrated it with Flame Graphs to visualize performance bottlenecks. We have evaluated TEE-PERF on the Phoenix multithreaded benchmark suite and real-world applications (RocksDB, SPDK, etc.), and compared it with Linux perf. Our experimental evaluation shows that TEE-PERF incurs low profiling overheads while providing accurate profile measurements to identify and optimize application bottlenecks in the context of TEEs. TEE-PERF is publicly available.
【Keywords】: TEE; profiler; SGX; tool
【Paper Link】 【Pages】:422-434
【Authors】: Brian Delgado ; Tejaswini Vibhute ; John Fastabend ; Karen L. Karavanic
【Abstract】: Detecting unexpected changes in a system's runtime environment is critical to resilience. A repurposing of System Management Mode (SMM) for runtime security inspections has been proposed, due to SMM's high privilege and protected memory. However, key challenges prevent SMM's adoption for this purpose in production-level environments: the possibility of severe performance impacts, semantic gaps between SMM and host software, high overheads, overly broad access permissions, and lack of flexibility. We introduce a Runtime Integrity Measurement framework, EPA-RIMM, for both native Linux and Xen platforms, that includes several novel features to solve these challenges. EPA-RIMM decomposes large measurements to control perturbation and leverages the SMI Transfer Monitor (STM) to bridge the semantic gap between hypervisors and SMM, as well as restrict the measurement agent's accesses. We present a design and implementation for a concurrent approach that allows EPA-RIMM to utilize all cores in SMM, dramatically increasing measurement throughput and reducing application perturbation. Our Linux and Xen prototype results show that EPA-RIMM meets performance goals while continuously monitoring code and data for signs of attack, and that it is effective at detecting a number of recent exploits.
【Keywords】: UEFI, firmware, Coreboot, SMM, SMI Transfer Monitor, STM, Runtime integrity measurement, rootkits, kernel, hypervisor
【Paper Link】 【Pages】:435-446
【Authors】: Yihe Zhang ; Hao Zhang ; Xu Yuan ; Nian-Feng Tzeng
【Abstract】: Honeypot-based spammer gathering solutions commonly suffer from limited attribute variability, deployment flexibility, and network scalability. This paper explores pseudo-honeypot, a novel honeypot-like system that overcomes these drawbacks for efficient and scalable spammer sniffing. The pseudo-honeypot takes advantage of user diversity and selects normal accounts whose attributes have a higher potential of attracting spammers as the parasitic bodies. By harnessing this category of users, pseudo-honeypot can monitor their streaming posts and behavioral patterns transparently. Compared with its traditional honeypot counterpart, the proposed solution offers substantial advantages in attribute variability, deployment flexibility, network scalability, and system portability. Meanwhile, it offers a novel method to collect social network datasets that have a higher probability of including spam and spammers, without being noticed by advanced spammers. We take the Twitter social network as an example to present its system design, including pseudo-honeypot node selection, monitoring, feature extraction, ground truth labeling, and learning-based classification. Through experiments, we demonstrate the efficiency of pseudo-honeypot in gathering spam and spammers. In particular, we confirm that our solution can garner spammers at least 19 times faster than the state-of-the-art honeypot-based counterpart.
【Keywords】: honeypot; pseudo honeypot; spam detection; social networks; machine learning
【Paper Link】 【Pages】:447-459
【Authors】: Steven R. Gomez ; Samuel Jero ; Richard Skowyra ; Jason Martin ; Patrick Sullivan ; David Bigelow ; Zachary Ellenbogen ; Bryan C. Ward ; Hamed Okhravi ; James W. Landry
【Abstract】: Conventional network access control approaches are static (e.g., user roles in Active Directory), coarse-grained (e.g., 802.1x), or both (e.g., VLANs). Such systems are unable to meaningfully stop or hinder motivated attackers seeking to spread throughout an enterprise network. To address this threat, we present Dynamic Flow Isolation (DFI), a novel architecture for supporting dynamic, fine-grained access control policies enforced in a Software-Defined Network (SDN). These policies can emit and revoke specific access control rules automatically in response to network events like users logging off, letting the network adaptively reduce unnecessary reachability that could be potentially leveraged by attackers. DFI is oblivious to the SDN controller implementation and processes new packets prior to the controller, making DFI's access control resilient to a malicious or faulty controller or its applications. We implemented DFI for OpenFlow networks and demonstrated it on an enterprise SDN testbed with around 100 end hosts and servers. Finally, we evaluated the performance of DFI and how it enables a novel policy, which is otherwise difficult to enforce, that protects against a surrogate of the recent NotPetya malware in an infection scenario. We found that the threat was most limited in its ability to spread using our policy, which automatically restricted network flows over the course of the attack, compared to no access control or a static role-based policy.
【Keywords】: network security; software defined networking; SDN; access control
【Paper Link】 【Pages】:460-472
【Authors】: Onur Zungur ; Guillermo Suarez-Tangil ; Gianluca Stringhini ; Manuel Egele
【Abstract】: Companies adopt Bring Your Own Device (BYOD) policies extensively, for both convenience and cost management. The convenience of installing private and business-related applications (apps) on the same device leads to the widespread use of employee-owned devices to access sensitive company data and services. Such practices create a security risk, as a legitimate app may send business-sensitive data to third-party servers through detrimental app functions or packaged libraries. In this paper, we propose BorderPatrol, a system for extracting contextual data that businesses can leverage to enforce access control in BYOD-enabled corporate networks through fine-grained policies. BorderPatrol extracts contextual information, namely the stack trace of the app function that generated the network traffic, on provisioned user devices and transfers this data in IP headers to enforce desired policies at network routers. BorderPatrol provides a way to selectively prevent undesired functionality, such as analytics activities or advertisements, and helps enforce the company's information dissemination policies while leaving other functions of the app intact. Using 2,000 apps, we demonstrate that BorderPatrol is effective in preventing packets that originate from previously identified analytics and advertisement libraries from leaving the network premises. In addition, we show BorderPatrol's capability to selectively prevent undesirable app functions using case studies.
【Keywords】: mobile platforms; bring your own device; network security; android security; enterprise security; Android
【Paper Link】 【Pages】:473-484
【Authors】: Sheng Di ; Hanqi Guo ; Eric Pershey ; Marc Snir ; Franck Cappello
【Abstract】: An in-depth understanding of the failure features of HPC jobs in a supercomputer is critical to large-scale system maintenance and to improving the quality of service for users. In this paper, we investigate the features of hundreds of thousands of jobs in one of the most powerful supercomputers, the IBM Blue Gene/Q Mira, based on 2001 days of observations with a total of over 32.44 billion core-hours. We study the impact of the system's events on the jobs' execution in order to understand the system's reliability from the perspective of jobs and users. The characterization involves a joint analysis based on multiple data sources, including the reliability, availability, and serviceability (RAS) log; the job scheduling log; the log of each job's physical execution tasks; and the I/O behavior log. We present 22 valuable takeaways based on our in-depth analysis. For instance, 99,245 job failures are reported in the job-scheduling log, a large majority (99.4%) of which are due to user behavior (such as bugs in code, wrong configuration, or misoperations). The job failures are correlated with multiple metrics and attributes, such as users/projects and job execution structure (number of tasks, scale, and core-hours). The best-fitting distributions of a failed job's execution length (or interruption interval) include Weibull, Pareto, inverse Gaussian, and Erlang/exponential, depending on the types of errors (i.e., exit codes). The RAS events affecting job executions exhibit a high correlation with users and core-hours and have a strong locality feature. In terms of the failed jobs, our similarity-based event-filtering analysis indicates that the mean time to interruption is about 3.5 days.
【Keywords】: IBM BlueGene/Q System; Fault Tolerance; Failure Analysis; Supercomputer
【Paper Link】 【Pages】:485-492
【Authors】: Xinda Wang ; Kun Sun ; Archer L. Batcheller ; Sushil Jajodia
【Abstract】: Security patches in open source software (OSS) not only provide security fixes to identified vulnerabilities, but also make the vulnerable code public to attackers. Therefore, armored attackers may misuse this information to launch N-day attacks on unpatched OSS versions. The best practice for preventing this type of N-day attack is to upgrade the software to the latest version promptly. However, due to concerns about reputation and simpler software development management, software vendors may choose to secretly patch their vulnerabilities in a new version without reporting them to CVE or even providing any explicit description in their change logs. When these secretly patched vulnerabilities are identified by armored attackers, they can be turned into powerful "0-day" attacks, which can be exploited to compromise not only unpatched versions of the same software, but also similar types of OSS (e.g., SSL libraries) that may contain the same vulnerability due to code clones or similar design/implementation logic. Therefore, it is critical to identify secret security patches and downgrade the risk of those "0-day" attacks to at least "n-day" attacks. In this paper, we develop a defense system and implement a toolset to automatically identify secret security patches in open source software. To distinguish security patches from other patches, we first build a security patch database that contains more than 4700 security patches mapped to records in the CVE list. Next, we identify a set of features to help distinguish security patches from non-security ones using machine learning approaches. Finally, we use code clone identification mechanisms to discover similar patches or vulnerabilities in similar types of OSS. The experimental results show that our approach can achieve good detection performance. A case study on OpenSSL, LibreSSL, and BoringSSL discovers 12 secret security patches.
【Keywords】: security patch; vulnerability detection; open source software
【Paper Link】 【Pages】:493-504
【Authors】: Jeman Park ; Aminollah Khormali ; Manar Mohaisen ; Aziz Mohaisen
【Abstract】: Open DNS resolvers are resolvers that perform recursive resolution on behalf of any user. They can be exploited by adversaries because they are open to the public and require no authorization to use. Therefore, it is important to understand the state of open resolvers to gauge their potentially negative impact on the security and stability of the Internet. In this study, we conducted comprehensive probing over the entire IPv4 address space and found that more than 3 million open resolvers still exist in the wild. Moreover, we found that many of them work in a way that deviates from the standard. More importantly, we found that many open resolvers answer queries with incorrect, even malicious, responses. Compared with results obtained in 2013, we found that while the number of open resolvers has decreased significantly, the number of resolvers providing incorrect responses is almost the same, and the number of open resolvers providing malicious responses has increased, highlighting the prevalence of their threat.
【Keywords】: open resolver; DNS; measurement; behavioral analysis
【Paper Link】 【Pages】:505-516
【Authors】: Jonghwan Kim ; DaeHee Jang ; Yunjong Jeong ; Brent ByungHoon Kang
【Abstract】: Object Layout Randomization (OLR) is a memory randomization approach that makes the in-object memory layout unpredictable by shuffling and relocating the member fields of each object. This defense has a significant security effect in mitigating various types of memory error attacks. However, the current state of the art enforces OLR at compile time: it produces a diversified object layout for each binary, but the layout remains the same across executions. This approach can be effective when the program binary is hidden from attackers. However, it has several limitations: (i) its security efficacy rests on the premise that the binary remains undisclosed to adversaries, (ii) the randomized object layout is identical across multiple executions, and (iii) the programmer must manually specify which objects OLR should affect. In this paper, we introduce Per-allocation Object Layout Randomization (POLaR), the first dynamic OLR approach suited for public binaries. POLaR applies randomization at runtime, producing a unique object layout even for instances of the same type. As a result, POLaR achieves two previously unmet security primitives: (i) the randomization does not break when the binary is exposed, and (ii) repeating the same attack does not produce deterministic behavior. In addition, we implemented the TaintClass framework, based on the DFSan project, to optimize and automate the target object selection process. To show the efficacy of POLaR, we use several public open-source programs and the SPEC2006 benchmark suite.
【Keywords】: Randomization; Memory error; LLVM
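To make the per-allocation idea concrete, the sketch below draws a fresh field order every time an object is allocated and routes every field access through that object's own offset table. It is a hypothetical illustration, not POLaR's LLVM-level implementation: the names `Field`, `Allocation`, and `randomize_layout` are invented here, and alignment and padding are ignored.

```python
import random
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    size: int  # bytes; alignment is ignored in this sketch


def randomize_layout(fields, rng):
    """Shuffle the fields and pack them back to back; return name -> byte offset."""
    order = list(fields)
    rng.shuffle(order)
    offsets, cursor = {}, 0
    for field in order:
        offsets[field.name] = cursor
        cursor += field.size
    return offsets


class Allocation:
    """One heap object whose field layout is drawn when it is allocated (hypothetical)."""

    def __init__(self, fields, rng):
        self.sizes = {f.name: f.size for f in fields}
        self.layout = randomize_layout(fields, rng)  # per-allocation metadata
        self.memory = bytearray(sum(self.sizes.values()))

    def write(self, name, value):
        # Every field access goes through this object's own offset table.
        off, size = self.layout[name], self.sizes[name]
        self.memory[off:off + size] = value.to_bytes(size, "little")

    def read(self, name):
        off, size = self.layout[name], self.sizes[name]
        return int.from_bytes(self.memory[off:off + size], "little")


if __name__ == "__main__":
    rng = random.Random(2019)
    fields = [Field("vptr", 8), Field("len", 4), Field("buf", 16)]
    a, b = Allocation(fields, rng), Allocation(fields, rng)
    a.write("len", 42)
    print(a.read("len"))       # 42
    print(a.layout, b.layout)  # same type, independently drawn offsets
```

Because the layout lives in per-object metadata rather than in the binary, reading the binary alone does not reveal where a field sits in any particular instance, which is the property the abstract highlights.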
【Paper Link】 【Pages】:517-529
【Authors】: David Pouliot ; Scott Griffy ; Charles V. Wright
【Abstract】: Efficiently searchable and easily deployable encryption schemes enable an untrusted, legacy service such as a relational database engine to perform searches over encrypted data. The ease with which such schemes can be deployed on top of existing services makes them especially appealing in operational environments where encryption is needed but it is not feasible to replace large infrastructure components such as databases or document management systems. Unfortunately, all previously known approaches for efficiently searchable and easily deployable encryption are vulnerable to inference attacks, in which an adversary uses knowledge of the data distribution to recover the plaintext with high probability. We present a new efficiently searchable, easily deployable database encryption scheme that is provably secure against inference attacks even when used with real, low-entropy data. We implemented our constructions in Haskell and tested them on databases of up to 10 million records, showing that our construction balances security, deployability, and performance.
【Keywords】: Cloud computing; Computer security; Data privacy; Outsourcing
【Paper Link】 【Pages】:530-542
【Authors】: Qiang Zeng ; Golam Kayas ; Emil Mohammed ; Lannan Luo ; Xiaojiang Du ; Junghwan Rhee
【Abstract】: Exploitation of heap vulnerabilities has been on the rise, leading to many devastating attacks. Conventional heap patch generation is a lengthy procedure requiring intensive manual effort. Worse, fresh patches tend to harm system dependability, deterring users from deploying them. We propose HEAPTHERAPY+, a heap patching system that offers the following advantages simultaneously: (1) it generates patches without manual effort; (2) it installs patches without altering the code (so-called code-less patching); (3) it handles various heap vulnerability types; (4) it imposes very low overhead; and (5) it does not depend on specific heap allocators. As a separate contribution, we propose targeted calling context encoding, a suite of algorithms for optimizing calling context encoding, an important technique with applications in many areas. The system combines heavyweight offline attack analysis with lightweight online defense generation and provides a new countermeasure against heap attacks. The evaluation shows that the system is effective and efficient.
【Keywords】: Heap memory safety; automatic patch generation; dynamic analysis; calling context encoding
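The sketch below illustrates how calling context encoding can support code-less patching: each call site updates a running integer, so an allocation reached through a specific call chain gets a compact ID, and a "patch" is just a table entry keyed on that ID. It uses the generic probabilistic-calling-context update `V = 3*V + cs`, not the paper's targeted encoding algorithms; `ContextEncoder`, `PATCHES`, `malloc_hook`, and the 64-byte padding defense are hypothetical.

```python
MASK = (1 << 64) - 1  # keep the encoding in a 64-bit word


class ContextEncoder:
    """Generic probabilistic calling-context encoding: V = 3*V + callsite_id."""

    def __init__(self):
        self.value = 0
        self._saved = []

    def call(self, callsite_id):
        # Executed at an instrumented call site before entering the callee.
        self._saved.append(self.value)
        self.value = (3 * self.value + callsite_id) & MASK

    def ret(self):
        # Restore the caller's encoding on return.
        self.value = self._saved.pop()


# Hypothetical patch table: context IDs of vulnerable allocations -> defense.
# Installing a patch means adding an entry here; program code is not modified.
PATCHES = {}


def malloc_hook(enc, size):
    """Return the size to allocate, padded if the current context is patched."""
    if PATCHES.get(enc.value) == "pad":
        return size + 64  # hypothetical padding amount
    return size


if __name__ == "__main__":
    enc = ContextEncoder()
    enc.call(1)
    enc.call(7)                    # main -> parse (site 1) -> malloc (site 7)
    PATCHES[enc.value] = "pad"     # offline analysis flagged this context
    print(malloc_hook(enc, 32))    # 96
    enc.ret()
    enc.call(9)                    # same allocator reached via another call site
    print(malloc_hook(enc, 32))    # 32
```

Only allocations reached along the flagged call chain pay for the defense; the same allocator called from elsewhere is untouched.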
【Paper Link】 【Pages】:543-555
【Authors】: Hui Kang ; Ting Dai ; Nerla Jean-Louis ; Shu Tao ; Xiaohui Gu
【Abstract】: On a Blockchain network, transaction data are exposed to all participants. To preserve privacy and confidentiality in transactions, while still maintaining data immutability, we design and implement FabZK. FabZK conceals transaction details on a shared ledger by storing only encrypted data from each transaction (e.g., payment amount), and by anonymizing the transactional relationship (e.g., payer and payee) between members in a Blockchain network. It achieves both privacy and auditability by supporting verifiable Pedersen commitments and constructing zero-knowledge proofs. FabZK is implemented as an extension to the open source Hyperledger Fabric. It provides APIs to easily enable data privacy in both client code and chaincode. It also supports on-demand, automated auditing based on encrypted data. Our evaluation shows that FabZK offers strong privacy-preserving capabilities, while delivering reasonable performance for the applications developed based on its framework.
【Keywords】: Blockchain; privacy; auditability; zero-knowledge proofs
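A Pedersen commitment hides a value `m` as `C = g^m * h^r mod p` with a random blinding factor `r`, and commitments multiply to a commitment of the sum. The toy sketch below shows how that homomorphism lets an auditor check that a transfer balances without seeing the amount. The tiny parameters are deliberately insecure, and the assumption that FabZK's audit uses exactly this "product equals 1" check is ours, not the abstract's.

```python
import secrets

# Toy group: p = 2q + 1 with q prime; g and h generate the order-q subgroup of Z_p*.
# Far too small to be secure. A real deployment uses a large (e.g. elliptic-curve)
# group and an h whose discrete log with respect to g is unknown.
P, Q, G, H = 23, 11, 4, 9


def commit(m, r):
    """Pedersen commitment C = g^m * h^r mod p (exponents reduced mod q)."""
    return (pow(G, m % Q, P) * pow(H, r % Q, P)) % P


def verify(c, m, r):
    """Check an opening (m, r) against commitment c."""
    return c == commit(m, r)


if __name__ == "__main__":
    r1 = secrets.randbelow(Q)
    r2 = (-r1) % Q              # blinding factors chosen to cancel
    c_payer = commit(-5, r1)    # payer's balance decreases by 5
    c_payee = commit(5, r2)     # payee's balance increases by 5
    # Amounts sum to 0 and blinding factors sum to 0, so the product is g^0 * h^0 = 1.
    print((c_payer * c_payee) % P == 1)  # True
```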
【Paper Link】 【Pages】:556-567
【Authors】: Zhirong Shen ; Xiaolu Li ; Patrick P. C. Lee
【Abstract】: Erasure coding offers a storage-efficient redundancy mechanism for maintaining data availability guarantees in large-scale storage clusters, yet it also incurs high performance overhead in failure repair. Recent developments in accurate disk failure prediction allow soon-to-fail (STF) nodes to be repaired in advance, thereby opening new opportunities for accelerating failure repair in erasure-coded storage. To this end, we present a fast predictive repair solution called FastPR, which carefully couples two repair methods, namely migration (i.e., relocating the chunks of an STF node) and reconstruction (i.e., decoding the chunks of an STF node through erasure coding), so as to fully parallelize the repair operation across the storage cluster. FastPR solves a bipartite maximum matching problem and schedules both migration and reconstruction in a parallel fashion. We show that FastPR significantly reduces the repair time over the baseline repair approaches via mathematical analysis, large-scale simulation, and Amazon EC2 experiments.
【Keywords】: Erasure coding
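The abstract says FastPR solves a bipartite maximum matching problem to schedule repairs in parallel. As a rough illustration, assume each chunk of the soon-to-fail node must go to a distinct healthy node in one repair round, and that a chunk may only go to nodes that hold no other chunk of its stripe. The sketch computes a maximum matching with Kuhn's augmenting-path algorithm; the graph construction is an assumption and may differ from the paper's formulation.

```python
def max_matching(candidates):
    """Maximum bipartite matching via Kuhn's augmenting-path algorithm.

    candidates maps each left vertex (a chunk to repair) to the right vertices
    (healthy nodes) it may be assigned to. Returns {chunk: node}.
    """
    owner = {}  # node -> chunk currently matched to it

    def try_assign(chunk, seen):
        for node in candidates[chunk]:
            if node in seen:
                continue
            seen.add(node)
            # Take a free node, or evict its chunk if that chunk can move elsewhere.
            if node not in owner or try_assign(owner[node], seen):
                owner[node] = chunk
                return True
        return False

    for chunk in candidates:
        try_assign(chunk, set())
    return {chunk: node for node, chunk in owner.items()}


if __name__ == "__main__":
    # Hypothetical placement constraints for three chunks of a soon-to-fail node.
    candidates = {
        "c1": ["n1", "n2"],
        "c2": ["n1"],
        "c3": ["n2", "n3"],
    }
    plan = max_matching(candidates)
    print(len(plan))  # 3: every chunk is placed in a single parallel round
```

A greedy assignment could give `n1` to `c1` and leave `c2` stranded; the augmenting path moves `c1` to `n2` instead, which is why a matching formulation helps maximize parallelism per round.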
【Paper Link】 【Pages】:568-580
【Authors】: Guy Golan-Gueta ; Ittai Abraham ; Shelly Grossman ; Dahlia Malkhi ; Benny Pinkas ; Michael K. Reiter ; Dragos-Adrian Seredinschi ; Orr Tamir ; Alin Tomescu
【Abstract】: SBFT is a state-of-the-art Byzantine fault-tolerant state machine replication system that addresses the challenges of scalability, decentralization, and global geo-replication. SBFT is optimized for decentralization and is experimentally evaluated on a deployment of more than 200 active replicas withstanding a malicious adversary that controls f=64 replicas. Our experiments show how the different algorithmic ingredients of SBFT contribute to its performance and scalability. The results show that SBFT simultaneously provides almost 2x better throughput and about 1.5x better latency relative to a highly optimized system that implements the PBFT protocol. To achieve this improvement, SBFT combines four ingredients: using collectors and threshold signatures to reduce communication to linear, using an optimistic fast path, reducing client communication, and utilizing redundant servers for the fast path. SBFT is the first system to implement a correct dual-mode view change protocol that can efficiently run either an optimistic fast path or a fallback slow path without incurring a view change to switch between modes.
【Keywords】: Consensus; Blockchain
【Paper Link】 【Pages】:581-592
【Authors】: Jin Huang ; Yu Li ; Junjie Zhang ; Rui Dai
【Abstract】: Unrestricted file upload vulnerabilities enable attackers to upload and execute malicious scripts on web servers. We have built a system, UChecker, that effectively and automatically detects such vulnerabilities in PHP server-side web applications. To this end, UChecker first interprets the abstract syntax trees (ASTs) of program source code to perform symbolic execution. It then models vulnerabilities using SMT constraints and leverages an SMT solver to verify the satisfiability of these constraints. UChecker features a novel vulnerability-oriented locality analysis algorithm that reduces the workload of symbolic execution, an AST-driven symbolic execution engine with compact data structures, and rules that translate PHP-based constraints into SMT-based constraints while mitigating their semantic gaps. Experiments on real-world examples demonstrate that UChecker achieves high detection accuracy. In addition, it detected three previously unknown vulnerable PHP scripts.
【Keywords】: web security; vulnerability; detection; symbolic execution; program analysis
【Paper Link】 【Pages】:593-604
【Authors】: John Criswell ; Jie Zhou ; Spyridoula Gravani ; Xiaoyu Hu
【Abstract】: Operating systems such as Linux break the power of the root user into separate privileges (which Linux calls capabilities) and give processes the ability to enable privileges only when needed and to discard them permanently when the program no longer needs them. However, there is no method of measuring how well the use of such facilities reduces the risk of privilege escalation attacks if the program has a vulnerability. This paper presents PrivAnalyzer, an automated tool that measures how effectively programs use Linux privileges. PrivAnalyzer consists of three components: 1) AutoPriv, an existing LLVM-based C/C++ compiler which uses static analysis to transform a program that uses Linux privileges into a program that safely removes them when no longer needed, 2) ChronoPriv, a new LLVM C/C++ compiler pass that performs dynamic analysis to determine for how long a program retains various privileges, and 3) ROSA, a new bounded model checker that can model the damage a program can do at each program point if an attacker can exploit the program and abuse its privileges. We use PrivAnalyzer to determine how long five privileged open source programs retain the ability to cause serious damage to a system and find that merely transforming a program to drop privileges does not significantly improve security. However, we find that simple refactoring can considerably increase the efficacy of Linux privileges. In two programs that we refactored, we reduced the percentage of execution in which a device file can be read and written from 97% and 88% to 4% and 1%, respectively.
【Keywords】: security; Linux privileges; term rewriting; verification
【Paper Link】 【Pages】:605-616
【Authors】: Jiaqi Peng ; Feng Li ; Bingchang Liu ; Lili Xu ; Binghong Liu ; Kai Chen ; Wei Huo
【Abstract】: Discovering 1-day vulnerabilities in binary patches is worthwhile but challenging. One of the key difficulties lies in generating inputs that reach the patched code snippet while making the unpatched program crash. We call this the target-oriented input generation (ToIG) problem. Existing solutions for the ToIG problem either suffer from path explosion or get stuck at complex checks. In this paper, we present a new solution that improves the efficiency of ToIG by combining a distance-based directed fuzzing mechanism with a dominator-based directed symbolic execution mechanism. To demonstrate its efficiency, we design and implement 1dVul, a tool for discovering 1-day vulnerabilities at the binary level based on this solution. Experiments show that 1dVul successfully generated inputs for 130 of the 209 patch targets identified in applications from the DARPA Cyber Grand Challenge, while the state-of-the-art solutions AFLGo and Driller reach only 99 and 107 targets, respectively, within the same time budget. Furthermore, 1dVul runs 2.2X and 3.6X faster than AFLGo and Driller, respectively, and has confirmed 96 vulnerabilities in the unpatched programs.
【Keywords】: binary patch analysis; vulnerability discovery; target-oriented input generation
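Distance-based directed fuzzing prefers seeds whose executed basic blocks lie close to the patched code in the control-flow graph. The sketch below computes, for a toy CFG, each block's shortest distance to the target blocks, combines multiple targets harmonically (loosely following AFLGo's basic-block distance), and averages over a seed's trace. It is an assumption-laden illustration: 1dVul's actual distance metric, and its handling of call edges, may differ, and the CFG and block names are hypothetical.

```python
from collections import deque


def distances_to(cfg, target):
    """Shortest edge count from every block that can reach `target` (BFS on reversed CFG)."""
    preds = {}
    for src, dsts in cfg.items():
        for dst in dsts:
            preds.setdefault(dst, []).append(src)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for p in preds.get(node, []):
            if p not in dist:
                dist[p] = dist[node] + 1
                queue.append(p)
    return dist


def block_distance(cfg, block, targets):
    """Harmonic combination over reachable targets; None if no target is reachable."""
    reachable = [d for d in (distances_to(cfg, t).get(block) for t in targets) if d is not None]
    if not reachable:
        return None
    if 0 in reachable:
        return 0.0  # the block is itself a target
    return 1.0 / sum(1.0 / d for d in reachable)


def seed_distance(cfg, trace, targets):
    """Mean distance over the executed blocks; lower means closer to the patch."""
    ds = [d for d in (block_distance(cfg, b, targets) for b in trace) if d is not None]
    return sum(ds) / len(ds) if ds else float("inf")


if __name__ == "__main__":
    cfg = {"entry": ["a", "b"], "a": ["patch"], "b": ["c"], "c": ["patch"], "patch": []}
    print(seed_distance(cfg, ["entry", "a"], ["patch"]))       # 1.5
    print(seed_distance(cfg, ["entry", "b", "c"], ["patch"]))  # larger, so lower priority
```

A fuzzer built on this score would spend more mutation energy on the first seed; per the abstract, 1dVul hands the hard cases, such as complex checks, to dominator-based directed symbolic execution.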
【Paper Link】 【Pages】:617-629
【Authors】: Mohammad A. Noureddine ; Ahmed M. Fawaz ; Amanda Hsu ; Cody Guldner ; Sameer Vijay ; Tamer Basar ; William H. Sanders
【Abstract】: In this paper, we address the challenges facing the adoption of client puzzles as a means to protect the TCP connection establishment channel from state exhaustion DDoS attacks. We model the problem of selecting the puzzle difficulties as a Stackelberg game with the server as the leader and the clients as the followers and obtain the equilibrium solution for the puzzle difficulty. We then present an implementation of client puzzles inside the TCP stack of the Linux 4.13.0 kernel. We evaluate the performance of our implementation and the obtained solution against a range of attacks through reproducible experiments on the DETER testbed. Our results show that client puzzles are effective at boosting the tolerance of the TCP handshake channel to state exhaustion DDoS attacks by rate limiting malicious attackers while allocating resources for legitimate clients.
【Keywords】: Denial of Service Attacks; Proof-of-Work; Stackelberg Games; TCP
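A client puzzle makes each connection attempt cost the client CPU time while staying cheap for the server to check. The sketch uses a generic hash-based puzzle: the client must find an `x` such that SHA-256(challenge || x) starts with `difficulty` zero bits, which takes about 2^difficulty hashes on average, while verification takes one hash. The abstract specifies neither the puzzle construction used inside the Linux TCP stack nor the equilibrium difficulty, so the construction and the `difficulty` values here are assumptions.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def make_challenge() -> bytes:
    """Server side: issue a fresh random nonce."""
    return os.urandom(16)


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force the smallest x meeting the difficulty (~2**difficulty hashes)."""
    x = 0
    while True:
        digest = hashlib.sha256(challenge + x.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return x
        x += 1


def verify(challenge: bytes, difficulty: int, x: int) -> bool:
    """Server side: a single hash checks the client's solution."""
    digest = hashlib.sha256(challenge + x.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = make_challenge()
    x = solve(challenge, 12)
    print(verify(challenge, 12, x))  # True
```

The asymmetry between solving and verifying is what rate-limits attackers. Choosing `difficulty` is the trade-off the abstract frames as a Stackelberg game: set too high, it burdens legitimate clients; set too low, it barely slows attackers.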
【Paper Link】 【Pages】:630-637
【Authors】: Zilong Zhao ; Sophie Cerf ; Robert Birke ; Bogdan Robu ; Sara Bouchenak ; Sonia Ben Mokhtar ; Lydia Y. Chen
【Abstract】: Classification algorithms have been widely adopted to detect anomalies in various systems, e.g., IoT and cloud, under the common assumption that the data source is clean, i.e., features and labels are correctly set. However, data collected from the field can be unreliable due to careless annotation or malicious data transformation aimed at causing incorrect anomaly detection. In this paper, we present a two-layer learning framework for robust anomaly detection (RAD) in the presence of unreliable anomaly labels. The first layer, a quality model, filters out suspicious data, while the second layer, a classification model, detects the anomaly types. We focus on two use cases: (i) detecting 10 classes of IoT attacks and (ii) predicting 4 classes of task failures in big data jobs. Our evaluation results show that RAD robustly improves anomaly detection accuracy, reaching up to 98% for IoT device attacks (+11%) and up to 83% for cloud task failures (+20%), even when a significant percentage of anomaly labels are altered.
【Keywords】: Unreliable Data; Anomaly Detection; Failures; Attacks; Machine Learning
【Paper Link】 【Pages】:638-644
【Authors】: Ricardo Couceiro ; Gonçalo Duarte ; João Durães ; João Castelhano ; Catarina Duarte ; César A. D. Teixeira ; Miguel Castelo-Branco ; Paulo Carvalho ; Henrique Madeira
【Abstract】: Our research explores a recent paradigm called Biofeedback Augmented Software Engineering (BASE), which introduces a strong new element into the software development process: the programmers' biofeedback. In this Practical Experience Report, we present the results of an experiment evaluating the possibility of using pupillography to gather biofeedback from programmers. The idea is to use pupillography to obtain meta-information about programmers' cognitive and emotional states (stress, attention, mental effort level, cognitive overload, etc.) during code development, in order to identify conditions that may lead programmers to introduce bugs or let bugs escape their attention, and to tag the corresponding code locations in the software under development, either to provide online warnings to the programmer or to identify code snippets that need more intensive testing. The experiments evaluate the use of pupillography as a cognitive load predictor, compare the results with the mental effort perceived by programmers as measured with NASA-TLX, and discuss different possibilities for using pupillography as a biofeedback sensor in real software development scenarios.
【Keywords】: software faults; pupillography; programmers' biofeedback; mental effort; cognitive overload; human error