The 45th GRACE Seminar on Advanced Software Science and Engineering

Time: 15:00-16:00, December 13th, 2010
Place: Meeting Room (2009/2010), 20F, National Institute of Informatics

Fee: Free
Makoto Onizuka (NTT Cyber Space Laboratories)

Efficient analytical query processing based on MapReduce

MapReduce is a distributed computation framework that is widely used for analytic applications built on statistical or machine-learning techniques.
However, MapReduce is well known to suffer from performance bottlenecks caused by the materialization of intermediate data and by the communication overhead of shuffling that data from map tasks to reduce tasks.
In this presentation, I briefly overview research directions related to MapReduce and then present two techniques I have been working on for efficient analytical processing. Both effectively decrease the amount of intermediate data and the communication overhead imposed by the shuffle:
1) Map Multi-Reduce, which extends the original MapReduce framework by pushing the reduce functions down toward the input source data and applying them iteratively, and
2) PJoin, which materializes semi-join views and uses those views for efficient join processing.
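Both techniques shrink what crosses the network during the shuffle. The toy Python sketch below illustrates only the two generic ideas they build on, not the speaker's actual algorithms: (a) applying a reduce function locally on each map partition before the shuffle (a single-level, combiner-style pre-aggregation, not the iterative push-down of Map Multi-Reduce), and (b) filtering one join input by a semi-join on the other side's keys before shuffling it (computed on the fly here, whereas PJoin materializes semi-join views). The partitions, tables, and function names are hypothetical; "shuffle volume" is simply the number of records emitted.

```python
from collections import defaultdict

# Hypothetical input: each inner list is one map task's local partition.
PARTITIONS = [["a", "b", "a", "a"], ["b", "b", "c"]]


def shuffle_naive(partitions):
    """Emit one (word, 1) pair per word; every pair is shuffled."""
    return [(w, 1) for part in partitions for w in part]


def shuffle_with_local_reduce(partitions):
    """Apply the (associative) sum reduce on each partition before shuffling."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for w in part:
            local[w] += 1
        shuffled.extend(local.items())  # one pair per distinct key per partition
    return shuffled


def reduce_side(shuffled):
    """Final reduce: sum partial counts per key."""
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals)


# Hypothetical join inputs: (join_key, payload) rows.
ORDERS = [(1, "x"), (2, "y"), (3, "z"), (2, "w")]
CUSTOMERS = [(2, "Bob")]


def semi_join_filter(left, right):
    """Keep only left rows whose key also appears in the right input."""
    right_keys = {key for key, _ in right}
    return [row for row in left if row[0] in right_keys]


def reduce_side_join(left, right):
    """Hash join performed after the shuffle."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]


if __name__ == "__main__":
    naive = shuffle_naive(PARTITIONS)
    local = shuffle_with_local_reduce(PARTITIONS)
    print(len(naive), len(local), reduce_side(local))
    filtered = semi_join_filter(ORDERS, CUSTOMERS)
    print(len(ORDERS), len(filtered), reduce_side_join(filtered, CUSTOMERS))
```

In both cases the final result is unchanged while fewer records are shuffled (7 versus 4 pairs for the count, 4 versus 2 order rows for the join). Local pre-aggregation is only valid when the reduce function is associative and commutative, as summation is here.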

Makoto Onizuka
is a distinguished technical member at NTT Cyber Space Laboratories.
His research focuses on cloud-scale data management and analytical processing. He visited the University of Washington from 2000 to 2001, where he worked with Professor Dan Suciu on the XML Toolkit project.
He received his Ph.D. from the Tokyo Institute of Technology in 2007.
He received the Kambayashi Award in 2008.

