Mon 26 Oct 2015 08:45 - 09:15 at Ellwood 1 - Session 1 - Real-world Data Chair(s): Eli Tilevich

Both industry and academia are confronting the challenge of big data, i.e., data processing that involves data so voluminous or arriving at such high velocity that no single commodity machine is capable of storing or processing them all. A common approach to handling big data is to divide and distribute the processing job to a cluster of machines. Ideally, a course that teaches students how to work with big data would provide students access to a cluster for hands-on practice. However, a cluster of physical machines may be prohibitively expensive, particularly at smaller institutions with smaller budgets.

In this report, we summarize our experiences developing and using a virtual cluster in a big data mining and analytics course at a small private liberal arts college. A single moderately-sized server hosts a cluster of virtual machines, which run the popular Apache Hadoop system. The virtual cluster gives students hands-on experience and costs less than an equal number of physical machines. It is also easily constructed and reconfigured. We describe our implementation, analyze its performance characteristics, and compare costs with physical clusters. We summarize our use of the virtual cluster in the classroom and show student feedback. For departments wishing to take a similar approach, we offer our software and curriculum under an open source license.

Mon 26 Oct
Times are displayed in time zone: Eastern Time (US & Canada)

08:30 - 10:00: Session 1 - Real-world Data (SPLASH-E) at Ellwood 1
Chair(s): Eli Tilevich, Virginia Tech
08:30 - 08:45
Day opening
SPLASH-E Introduction
Eli Tilevich, Virginia Tech
08:45 - 09:15
Teaching Big Data with a Virtual Cluster
Joshua Eckroth, Stetson University
09:15 - 09:45
A Generic Framework for Engaging Online Data Sources in Introductory Programming Courses
Nadeem Hamid, Berry College
09:45 - 10:00
Session 1 Discussion