Secure Computations in Distributed Programming Frameworks

Distributed programming frameworks use parallelism in computation and storage to process massive amounts of data. A popular example is the MapReduce framework, which splits an input file into multiple chunks. In the first phase of MapReduce, a Mapper for each chunk reads the data, performs some computation, and outputs a list of key/value pairs. In the next phase, a Reducer combines the values belonging to each distinct key and outputs the result. Tasks that involve highly parallel computations over large data sets are particularly well suited to MapReduce frameworks such as Hadoop. However, mappers may be untrusted, and they can compromise a computation in two ways. First, an untrusted mapper could return wrong results, which in turn produce incorrect aggregate results. Second, a mapper may leak data, intentionally or unintentionally; for example, it may emit a unique value derived from a private record, undermining users’ privacy. Therefore, there are two major attack prevention measures: securing the mappers and securing the data in the presence of an untrusted mapper.
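
The two phases described above, and the effect of a single untrusted mapper on the aggregate, can be illustrated with a minimal single-process sketch in Python. It is not Hadoop code: the word-count task and the names split_into_chunks, honest_mapper, faulty_mapper, reducer, and run_job are hypothetical and chosen only for illustration. The faulty mapper simply over-reports each count, one simple way a mapper could return wrong results.

```python
from collections import defaultdict


def split_into_chunks(lines, chunk_size):
    """Split the input into fixed-size chunks, one per mapper invocation."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def honest_mapper(chunk):
    """Emit one (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def faulty_mapper(chunk):
    """Hypothetical untrusted mapper: reports every word as seen twice."""
    return [(word.lower(), 2) for line in chunk for word in line.split()]


def reducer(key, values):
    """Combine all values that belong to one distinct key."""
    return key, sum(values)


def run_job(lines, mappers, chunk_size=1):
    """Run map, group-by-key, and reduce in a single process.

    Chunks are assigned to the given mappers in round-robin order.
    """
    grouped = defaultdict(list)
    for index, chunk in enumerate(split_into_chunks(lines, chunk_size)):
        mapper = mappers[index % len(mappers)]
        for key, value in mapper(chunk):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


lines = ["big data big", "data security"]

# All chunks handled by honest mappers.
honest = run_job(lines, [honest_mapper])

# The second chunk is handled by the faulty mapper.
tampered = run_job(lines, [honest_mapper, faulty_mapper])

print(honest)
print(tampered)
```

In this sketch the reducer cannot tell which values came from the faulty mapper, so the error flows directly into the final counts for the affected keys. That is the motivation for establishing trust in the mappers themselves rather than relying on the reduce phase to catch bad output.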