File System Metadata Management




IBM Almaden


Project 1: QMDS

Project 2: Dynamic Metadata Management

Project 3: JabberWocky


Project 1

QMDS: Queryable File System Metadata Services

Members: Alexander "Sasha" Ames, Maya Gokhale, Carlos Maltzahn

File system metadata management has become a bottleneck for many data-intensive applications that rely on high-performance file systems. Part of the bottleneck stems from the limitations of an almost 50-year-old interface standard, whose metadata abstractions were designed at a time when high-end file systems managed less than 100 MB. Today's high-performance file systems, however, store 7 to 9 orders of magnitude more data and correspondingly more data items, a scale for which these metadata abstractions are inadequate. Users have tried to work around these inadequacies by moving application-specific metadata management into relational databases so that the metadata becomes searchable. Splitting file system metadata management across two separate systems, however, introduces inefficiencies and systems-management problems. To address this problem, we explore searchable metadata management services that integrate all file system metadata and use a graph data model with attributes on both nodes and links. Our research focuses on the following areas: (1) query language design, (2) data structures for metadata, (3) query planning, processing, and optimization, (4) workload selection and query experiment preparation, and (5) trace analysis.
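To make the graph data model concrete, here is a minimal Python sketch, assuming a simple in-memory store: nodes carry attribute dictionaries, typed links carry their own attributes, and a query first selects nodes by an attribute predicate and then follows links of a given type. The class and method names (`MetadataGraph`, `find`, `follow`) and the example paths are hypothetical illustrations; this is not the QMDS query language or its actual data structures.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """A file (or other object) with arbitrary attribute key/value pairs."""
    node_id: str
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Link:
    """A directed, typed relationship between two nodes, with its own attributes."""
    src: str
    dst: str
    kind: str
    attrs: Dict[str, Any] = field(default_factory=dict)


class MetadataGraph:
    """Hypothetical in-memory metadata graph (illustrative only)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.out_links: Dict[str, List[Link]] = {}

    def add_node(self, node_id: str, **attrs: Any) -> None:
        self.nodes[node_id] = Node(node_id, dict(attrs))

    def add_link(self, src: str, dst: str, kind: str, **attrs: Any) -> None:
        self.out_links.setdefault(src, []).append(Link(src, dst, kind, dict(attrs)))

    def find(self, pred: Callable[[Node], bool]) -> List[str]:
        """Ids of nodes whose attributes satisfy `pred`, sorted."""
        return sorted(n.node_id for n in self.nodes.values() if pred(n))

    def follow(self, start: List[str], kind: str) -> List[str]:
        """Ids of nodes one `kind` link away from any node in `start`, sorted."""
        return sorted({link.dst for s in start
                       for link in self.out_links.get(s, []) if link.kind == kind})


GiB = 2**30
g = MetadataGraph()
g.add_node("/sim/run42/input.dat", kind="input", size=1 * GiB)
g.add_node("/sim/run42/out.h5", kind="output", size=5 * GiB)
g.add_link("/sim/run42/out.h5", "/sim/run42/input.dat", "derived_from", tool="solver-v2")

# "Which files were outputs larger than 4 GiB derived from?"
big = g.find(lambda n: n.attrs.get("kind") == "output" and n.attrs.get("size", 0) > 4 * GiB)
print(g.follow(big, "derived_from"))  # ['/sim/run42/input.dat']
```

Keeping attributes on links as well as on nodes lets a relationship such as provenance carry its own metadata (here, the hypothetical `tool` that produced the output), which a flat per-file attribute table cannot express directly.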


Project 2

Dynamic Metadata Management

Members: Scott Brandt, Carlos Maltzahn

Alumni: Sage Weil

In petabyte-scale distributed file systems that decouple data reads and writes from metadata operations, the behavior of the metadata service is critical to overall system performance and scalability. In this project we design and implement a dynamic subtree partitioning and adaptive metadata management system that efficiently manages hierarchical metadata workloads as they evolve over time. We examine the relative merits of our approach against traditional workload partitioning strategies and demonstrate its advantages in performance, scalability, and adaptability.


Project 3

JabberWocky: Crowd-Sourcing Metadata for Files

Members: Varun Bhagwan, Carlos Maltzahn

Finding relevant files in a personal file system continues to be a challenge. Despite the Web’s exponential growth, it is still easier to find things on the Web than in one’s personal file system. Yet the exponential growth of personal data renders the current services of personal file systems increasingly inadequate. One reason for this failure is the “cold-start” problem: algorithms that dramatically improve a user’s ability to find documents on the Web become ineffective in personal file systems because there is not enough information about these documents. We propose JabberWocky, a service that allows users to manage the content of their personal file system by leveraging semantic relationships available on the Web. More specifically, JabberWocky uses keyword/resource associations from social bookmarking web sites as a basis for recommending keywords for files. We chose social bookmarking web sites because of their popularity and because assigning keywords (a process also referred to as “tagging”) is an established and popular way to manage photos, music, movies, and audio resources on the Web, which are exactly the kinds of resources that personal file systems need to manage. The goal of JabberWocky is to overcome the cold-start problem of personal file systems and to provide recommendations in a scalable way while maintaining the user’s privacy. In this work-in-progress report we describe the motivation and challenges of designing a system like JabberWocky, present the initial design of an ongoing user study, and briefly discuss what we have learned so far.
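A minimal Python sketch of keyword recommendation from social bookmarking data, assuming the file has already been matched to one or more web resources (how that matching is done is not specified here) and that per-resource tag counts are available locally. The function name, its parameters, and the data are hypothetical; scoring tags by their summed counts is just one simple choice, not necessarily JabberWocky’s ranking method.

```python
from collections import Counter
from typing import Dict, Iterable, List


def recommend_tags(
    matched_resources: Iterable[str],
    tag_counts: Dict[str, Dict[str, int]],
    k: int = 3,
) -> List[str]:
    """Recommend up to k keywords for a file (hypothetical helper).

    matched_resources: web resources (e.g. URLs) the file has been associated with.
    tag_counts: resource -> {tag: number of users who applied that tag}.
    Tags are scored by their summed counts across all matched resources;
    ties are broken alphabetically so the result is deterministic.
    """
    scores: Counter = Counter()
    for res in matched_resources:
        scores.update(tag_counts.get(res, {}))  # adds counts; unknown resources contribute nothing
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:k]]


bookmarks = {
    "http://example.org/paper.pdf": {"filesystems": 12, "metadata": 9, "toread": 4},
    "http://example.org/slides.pdf": {"metadata": 5, "search": 3},
}
print(recommend_tags(["http://example.org/paper.pdf", "http://example.org/slides.pdf"], bookmarks))
# ['metadata', 'filesystems', 'toread']
```

In this sketch the scoring runs entirely on locally held data, so no information about the user’s files is sent anywhere; a real service would still have to decide how the bookmarking data is obtained without revealing which resources the user holds.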