Abstracts and Presentations 2013

Fifth Annual SRL/ISSDM Research Symposium

UCSC Systems Oktoberfest

October 22, 2013

Baskin School of Engineering, University of California, Santa Cruz

ABSTRACTS

SIDR: Structure-Aware Intelligent Data Routing in Hadoop - Joe Buck (PhD Student, UCSC)

The MapReduce framework is being extended to domains quite different from the web applications for which it was designed, including the processing of big structured data, e.g., scientific and financial data. Previous work using MapReduce to process scientific data ignores existing structure when assigning intermediate data and scheduling tasks. In this paper, we present a method for incorporating knowledge of the structure of scientific data and of the executing query into the MapReduce communication model. Built within SciHadoop, a version of the Hadoop MapReduce framework for scientific data, SIDR intelligently partitions and routes intermediate data, allowing it to: remove Hadoop’s global barrier and execute Reduce tasks prior to all Map tasks completing; minimize intermediate key skew; and produce early, correct results. SIDR executes queries up to 2.5 times faster than Hadoop and 37% faster than SciHadoop; produces initial results with only 6% of the query completed; and produces dense, contiguous output.
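The Python sketch below illustrates the general idea of structure-aware routing under simple assumptions; the one-dimensional block decomposition, function names, and map extents are hypothetical and are not taken from SIDR's implementation. The point is that when the data's structure determines which map tasks contribute to each reduce partition, a reducer can start as soon as those maps finish instead of waiting on a global barrier.

# Hypothetical block decomposition along one axis of a scientific array.
def partition_for(cell, grid_len, partitions):
    """Assign an intermediate key (a cell index) to a reduce partition."""
    block = grid_len // partitions
    return min(cell // block, partitions - 1)

def contributing_maps(partition, map_extents, grid_len, partitions):
    """Return the map tasks whose input extents overlap a reduce partition."""
    block = grid_len // partitions
    lo, hi = partition * block, (partition + 1) * block
    return [m for m, (start, end) in map_extents.items() if start < hi and end > lo]

if __name__ == "__main__":
    grid_len, partitions = 100, 4
    # Hypothetical map tasks and the array ranges they read.
    extents = {0: (0, 25), 1: (25, 50), 2: (50, 75), 3: (75, 100)}
    print(partition_for(42, grid_len, partitions))           # cell 42 -> partition 1
    for p in range(partitions):
        # Reduce partition p may start as soon as these map tasks complete.
        print(p, contributing_maps(p, extents, grid_len, partitions))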

Odyssey: A System for Evolutionary Analytics Over Big Data - Jeff LeFevre (PhD Student, UCSC)

Evolutionary analytics refers to the fluid process of interactive data exploration, which involves ingesting and exploring new datasets, applying complex functions to unstructured data, and frequently revising queries based on observations of prior query answers. This talk will review Odyssey, a system for evolutionary analytics over big data that is being developed as a collaboration between UCSC and NEC data management labs. Using Odyssey, an analyst can point to data sources of interest, pose complex queries, and obtain answers fast, without having to do any a priori tuning or data preprocessing. The key idea is that Odyssey self-adapts its configuration to optimize query evaluation, leveraging the characteristics of interactive data exploration and large-scale query processing. The talk will review the design of Odyssey’s main components and present an empirical study of its performance against other state-of-the-art systems for big data analytics.
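As a purely illustrative sketch of one way work can be reused across revised queries during exploration (not Odyssey's actual mechanism), the Python fragment below caches answers of earlier queries so that an equivalent reformulation avoids a full re-evaluation; the run_query name and the naive text normalization are assumptions.

# Cache of previously computed query answers, keyed by normalized query text.
cache = {}

def run_query(sql, execute):
    key = " ".join(sql.lower().split())     # naive normalization of the query text
    if key in cache:
        return cache[key]                   # reuse a previously materialized answer
    result = execute(sql)                   # otherwise fall back to full evaluation
    cache[key] = result
    return result

if __name__ == "__main__":
    fake_engine = lambda q: f"rows for: {q}"
    print(run_query("SELECT * FROM logs WHERE day = 1", fake_engine))
    print(run_query("select * from  logs where day = 1", fake_engine))  # cache hit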

A Framework for Exascale Analysis Shipping - Ivo Jimenez (PhD Student, UCSC)

The road to exascale computing is well underway. Exascale systems slated for the end of this decade will include up to a million compute nodes running about a billion threads of execution. At that scale, traditional methods for ameliorating I/O bottlenecks no longer work, prompting the exploration of new storage system designs and techniques to cope with the resulting I/O load.

Among the architectures being explored, I/O staging designates a portion of the high-end nodes to manage I/O. These "I/O nodes" handle requests forwarded by the scientific applications, integrate a tier of solid-state devices to absorb bursts of random operations, and organize/re-format data so that transfers from the staging area to the traditional parallel file system can be done more efficiently. Staging areas can also execute analysis on fresh data that has just been produced by simulation applications running on the compute nodes. However, the analysis code has to be loaded on the staging nodes ahead of time in order for them to execute it. This mechanism is hard to operate: every time new analysis functionality is added, the entire cluster has to be shut down, binaries recompiled and distributed, and the system rebooted. This is prohibitive in HPC scenarios where time-sensitive applications can't be stopped.

In this work, we present a framework for the execution of analysis code that allows the user to explore the data, ship and execute jobs, and obtain results, all done dynamically without obstructing the operation of the high-end machine. Concretely, we implement a master/worker cluster on the staging nodes that is in charge of receiving, planning, and executing analytical tasks on behalf of the user.
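A minimal sketch of the master/worker pattern described above, using Python threads in place of staging nodes; the job queue, shutdown sentinel, and function names are illustrative assumptions rather than the framework's actual API. New analysis code is submitted and executed while the workers keep running, which is the property the framework aims for.

import queue
import threading

jobs = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        task = jobs.get()
        if task is None:            # shutdown sentinel
            break
        func, args = task
        results.put(func(*args))    # execute the shipped analysis code
        jobs.task_done()

workers = [threading.Thread(target=worker, daemon=True) for _ in range(4)]
for w in workers:
    w.start()

# "Ship" a new analysis function while the workers keep running.
jobs.put((lambda data: sum(data) / len(data), ([1.0, 2.0, 3.0],)))
jobs.join()
print(results.get())                # 2.0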

In-Vivo Storage System Development - Noah Watkins (PhD Student, UCSC)

The emergence of high-performance open-source storage systems is allowing application and middleware developers to consider nonstandard storage system interfaces. In contrast to the practice of virtually always designing for file-like byte-stream interfaces, co-designed domain-specific storage system interfaces are becoming increasingly common.

However, in order for developers to evolve interfaces in high-availability storage systems, services are needed for in-vivo interface evolution that allow interfaces to be developed in the context of a live system. Current clustered storage systems that provide interface customizability expose only primitive services for managing ad-hoc interfaces. For maximum utility, the ability to create, evolve, and deploy dynamic storage interfaces is needed. However, in large-scale clusters, dynamic interface instantiation will require system-level support that ensures interface version consistency among storage nodes and client applications. We propose that storage systems should provide services that fully manage the life-cycle of dynamic interfaces, aligned with the common branch-and-merge form of software maintenance, including isolated development workspaces that can be combined into existing production views of the system.
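As a minimal sketch of the version-consistency requirement discussed above (not the proposed system itself), the Python fragment below models a registry that rejects a call when the client's interface version does not match the version deployed on the storage nodes; all class and method names are hypothetical.

class InterfaceRegistry:
    def __init__(self):
        self.deployed = {}                      # interface name -> (version, handler)

    def deploy(self, name, version, handler):
        self.deployed[name] = (version, handler)

    def call(self, name, client_version, *args):
        version, handler = self.deployed[name]
        if version != client_version:
            raise RuntimeError(f"version mismatch: node has {version}, client has {client_version}")
        return handler(*args)

registry = InterfaceRegistry()
registry.deploy("word_count", 2, lambda text: len(text.split()))
print(registry.call("word_count", 2, "in vivo interface evolution"))   # versions agree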

Understanding Image-based Data using Human Computation - James Davis (Associate Professor, Computer Science Dept., UCSC)

Most of the data collected and stored in the world is in the form of images or video, and understanding images is critical to many computational systems. Computational systems are very good at managing large amounts of data and finding statistical patterns in huge labeled datasets. Unfortunately, machine computation is not yet very good at robust small-scale contextual understanding, such as determining whether a circular object in an image is a face or a wheel. In contrast, humans are exceptionally good at this kind of task. The lack of robustness at the small scale limits the robustness of the intended large-scale understanding. Our work seeks to enhance machine understanding of image-based data by including human computational units as elements inside larger computational systems. By using human input as sub-routines inside a larger computational system, small-scale annotation and labeling can be achieved robustly. These labels will in turn allow robust computer understanding of large datasets.
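The Python sketch below illustrates the idea of a human computational unit as a sub-routine: when a machine classifier's confidence falls below a threshold, the item is routed to a human labeler. The classifier, the crowdsourcing call, and the threshold are placeholders, not the authors' system.

def classify(image):
    """Placeholder machine classifier returning (label, confidence)."""
    return ("wheel", 0.55)

def ask_human(image):
    """Placeholder for a human labeling call (e.g., a crowdsourced task)."""
    return "face"

def label(image, threshold=0.9):
    guess, confidence = classify(image)
    if confidence >= threshold:
        return guess                 # machine answer is trusted
    return ask_human(image)          # fall back to human computation

print(label("circular_object.png"))  # routed to a human in this sketch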

HEC: Improving Endurance of High Performance Flash-based Cache Devices - Jingpei Yang (PhD Student, UCSC)

Flash memory is widely used for its fast random I/O access performance in a gamut of enterprise storage applications. However, due to the limited endurance and asymmetric write performance of flash memory, minimizing writes to a flash device is critical for both performance and endurance. Previous studies have focused on flash memory as a candidate for primary storage devices; little is known about its behavior as a Solid State Cache (SSC) device. In this paper, we propose HEC, a High Endurance Cache that aims to improve overall device endurance via reduced media writes and erases while maximizing cache hit rate performance. We analyze the added write pressures that cache workloads place on flash devices and propose optimizations at both the cache and flash management layers to improve endurance while maintaining or increasing cache hit rate. We demonstrate the individual and cumulative contributions of cache admission policy, cache eviction policy, flash garbage collection policy, and flash device configuration on a) hit rate, b) overall writes, and c) erases as seen by the SSC device. Through our improved cache and flash optimizations, 83% of the analyzed workload ensembles achieved increased or maintained hit rate with write reductions up to 20x, and erase count reductions up to 6x.
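To make the role of an admission policy concrete, the sketch below implements one simple example of the class of policies discussed above: blocks are admitted into the SSD cache only on their second access, so one-time reads never cost a flash write. This illustrates the idea, not HEC's actual policy.

from collections import OrderedDict

class SecondAccessCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()   # cached blocks in LRU order
        self.seen_once = set()
        self.flash_writes = 0

    def access(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)
            return True              # hit, no flash write
        if block in self.seen_once:  # second access: admit into the cache
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)   # evict the LRU block
            self.cache[block] = True
            self.flash_writes += 1
        else:
            self.seen_once.add(block)
        return False

c = SecondAccessCache(capacity=2)
for b in ["a", "b", "a", "c", "a"]:
    c.access(b)
print(c.flash_writes)   # only "a" was ever written to flash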

A Framework for an In-depth Comparison of Scale-up and Scale-out - Michael Sevilla (PhD Student, UCSC)

When data grows too large, we scale to larger systems, either by scaling out or scaling up. It is understood that scale-up and scale-out have different complexities and bottlenecks, but a thorough comparison of the two architectures is challenging because of the diversity of their programming interfaces, their significantly different system environments, and their sensitivity to workload specifics. In this talk, we propose a novel framework that accounts for the application, its requirements, and its input size to help choose a scaling architecture. We then use this framework to demonstrate the complex trade-offs, interactions, and dependencies of different properties provided by the system so that a designer can select the best architecture for the job.

This work also outlines some of the interesting challenges inherent in selecting a scale-up architecture as the computation framework. Moving data from a distributed file system onto a single node is bottlenecked by the network bandwidth and the number of ports. Loading data into memory is bottlenecked by the disk bandwidth. We present optimizations in our API/runtime that minimize some of these bottlenecks for MapReduce-style applications on scale-up.
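The sketch below shows the kind of back-of-the-envelope model such a framework can encode: the time to move the input into a single scale-up node, bounded by its network and disk bandwidth, versus a scale-out cluster whose nodes load their shares in parallel. All bandwidth and node-count figures are hypothetical.

def scale_up_load_seconds(data_gb, net_gbps=10, disk_gbps=4):
    # A single node must pull all data over its NIC and spill it to disk/memory.
    net = data_gb * 8 / net_gbps
    disk = data_gb * 8 / disk_gbps
    return max(net, disk)          # bound by the slower of the two paths

def scale_out_load_seconds(data_gb, nodes=16, net_gbps=10):
    # Each node only loads its share of the data.
    return (data_gb / nodes) * 8 / net_gbps

for gb in (50, 500, 5000):
    print(gb, round(scale_up_load_seconds(gb), 1), round(scale_out_load_seconds(gb), 1))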

Transport Layer QoS For Storage Networks - Andrew Shewmaker (PhD Student, UCSC)

Storage networks lack easy-to-configure, fine-grained, scalable performance guarantees. Solutions like DiffServ and Priority Flow Control, with their limited number of classes, do not meet the diverse and dynamic needs of today's cloud environments. IntServ preceded DiffServ and provides fine-grained flow reservations, but its reservations must be propagated to and supported by routers throughout the network, which limits its scalability.

Furthermore, most existing transport layer protocols depend on packet loss as their primary signal of congestion, which results in large variations in packet delay and creates counter-productive standing bottleneck queues. FQ_Codel improves on this behavior by reducing standing queues while requiring little tuning and giving timely service to new flows. However, it pursues fairness rather than QoS, and it doesn't seek to reduce the maximum bottleneck queue depth. In particular, it doesn't address the Incast problem: a simultaneous many-to-one burst that can collapse throughput on storage networks.

Radon is a new variant of TCP optimized for storage networks that provides per-flow performance guarantees without requiring reservations to be propagated and actively supported by routers throughout the network. After admission control, client reservations are shared with the destination server and consist of (utilization, period) tuples that can specify any range of flow requirements. Rather than requiring an additional layer of Active Queue Management, a Radon flow detects congestion by modeling the bottleneck queue. It then responds probabilistically based on the amount of work remaining in its period and the modeled depth of the bottleneck queue. In this way, Radon flows can adapt to both standing queues and Incast.
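A simplified, illustrative sketch of the probabilistic response described above (not the Radon protocol itself): a flow backs off with a probability that grows with the modeled bottleneck queue depth and shrinks as the deadline for its remaining work approaches. The function name and the exact form of the probability are assumptions.

import random

def should_defer(queue_depth, max_queue, work_remaining, time_remaining):
    """Return True if this flow should hold back its next packet burst."""
    if time_remaining <= 0 or work_remaining >= time_remaining:
        return False                      # behind schedule: keep sending
    pressure = min(queue_depth / max_queue, 1.0)       # modeled congestion
    slack = 1.0 - work_remaining / time_remaining      # spare time left in the period
    return random.random() < pressure * slack

# A flow with plenty of slack defers often when the modeled queue looks deep...
print(sum(should_defer(80, 100, work_remaining=2, time_remaining=10) for _ in range(1000)))
# ...while a flow near its deadline rarely defers.
print(sum(should_defer(80, 100, work_remaining=9, time_remaining=10) for _ in range(1000)))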

Minimal Latency in Solid-State Drives Through Redundancy - Dimitris Skourtis (PhD Student, UCSC)

Solid-state drives are becoming increasingly popular in enterprise storage systems, playing the role of large caches and permanent storage. Although SSDs provide faster random access than hard drives, their performance under read/write workloads is highly variable, with latencies that can exceed those of hard drives (e.g., 100ms for a single read). Many systems with mixed workloads have low latency requirements, or require predictable performance and guarantees. In such cases, the performance variance of SSDs becomes a problem for both predictability and raw performance.

In this work, we propose a design based on redundancy, which provides high performance and low latency for reads under read/write workloads by physically separating reads from writes. More specifically, reads achieve read-only performance while writes perform at least as well as before. We evaluate our design using micro-benchmarks and real traces, illustrating the performance benefits of read/write separation in solid-state drives. Finally, we examine the challenges of applying our technique to large-scale distributed storage systems.
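The sketch below illustrates read/write separation over a mirrored pair in a highly simplified form: one copy serves only reads while the other absorbs writes, and pending writes are replayed when the roles flip. The in-memory dictionaries and the role-switching policy are stand-ins, not the actual design.

class MirroredPair:
    def __init__(self):
        self.drives = [{}, {}]       # two in-memory stand-ins for SSDs
        self.reader, self.writer = 0, 1
        self.pending = []            # writes not yet applied to the read copy

    def write(self, key, value):
        self.drives[self.writer][key] = value
        self.pending.append((key, value))

    def read(self, key):
        return self.drives[self.reader].get(key)   # never disturbed by writes

    def switch_roles(self):
        for key, value in self.pending:            # catch the old reader up
            self.drives[self.reader][key] = value
        self.pending.clear()
        self.reader, self.writer = self.writer, self.reader

pair = MirroredPair()
pair.write("block0", "v1")
print(pair.read("block0"))   # None: the read-only copy has not synced yet
pair.switch_roles()
print(pair.read("block0"))   # "v1" after the roles flip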

Predicting Hard Disk Drive Access Time with Fourier Analysis and Neural Nets - Adam Crume (PhD Student, UCSC)

Predicting hard disk drive positioning times is a crucial part of predicting hard disk drive performance in general. Existing approaches use white-box modeling and require intimate knowledge of the internal layout of the drive. Automatically learning this behavior is a much more desirable approach, requiring less expert knowledge and fewer assumptions. A barrier to machine learning of hard drive performance has been the existence of periodic behavior with high, unknown frequencies. We show how hard disk drive positioning times can be accurately predicted using a neural net after important frequencies are found using Fourier analysis.
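The sketch below shows the overall two-step idea on synthetic data: an FFT locates the dominant periodic frequency, and a model is then fit on sine/cosine features at that frequency. A linear least-squares fit stands in for the paper's neural net, and the data and parameters are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)
distance = np.arange(0, 2000.0)                       # seek distance (arbitrary units)
true_freq = 1.0 / 85.0                                # hidden periodic component
times = 3.0 + 0.002 * distance + 0.5 * np.sin(2 * np.pi * true_freq * distance)
times += rng.normal(0, 0.05, distance.size)           # measurement noise

# Step 1: find the strongest frequency with an FFT of the detrended signal.
detrended = times - np.polyval(np.polyfit(distance, times, 1), distance)
spectrum = np.abs(np.fft.rfft(detrended))
freqs = np.fft.rfftfreq(distance.size, d=1.0)
best = freqs[np.argmax(spectrum[1:]) + 1]             # skip the DC bin

# Step 2: regress on [1, distance, sin(2*pi*f*d), cos(2*pi*f*d)].
features = np.column_stack([np.ones_like(distance), distance,
                            np.sin(2 * np.pi * best * distance),
                            np.cos(2 * np.pi * best * distance)])
coef, *_ = np.linalg.lstsq(features, times, rcond=None)
print(round(best, 5), round(true_freq, 5))            # recovered vs true frequency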