Teaching Big Data with a Virtual Cluster (SPLASH 2015 - SPLASH-E)

Fri 23 - Fri 30 October 2015 Pittsburgh, Pennsylvania, United States

Track

SPLASH 2015 SPLASH-E

When

Mon 26 Oct 2015 08:45 - 09:15 at Ellwood 1 - Session 1 - Real-world Data Chair(s): Eli Tilevich

Abstract

Both industry and academia are confronting the challenge of big data, i.e., data processing that involves data so voluminous or arriving at such high velocity that no single commodity machine is capable of storing or processing them all. A common approach to handling big data is to divide and distribute the processing job to a cluster of machines. Ideally, a course that teaches students how to work with big data would provide students access to a cluster for hands-on practice. However, a cluster of physical machines may be prohibitively expensive, particularly at smaller institutions with smaller budgets.

In this report, we summarize our experiences developing and using a virtual cluster in a big data mining and analytics course at a small private liberal arts college. A single moderately sized server hosts a cluster of virtual machines, which run the popular Apache Hadoop system. The virtual cluster gives students hands-on experience and costs less than an equal number of physical machines. It is also easily constructed and reconfigured. We describe our implementation, analyze its performance characteristics, and compare costs with physical clusters. We summarize our use of the virtual cluster in the classroom and show student feedback. For departments wishing to take a similar approach, we offer our software and curriculum under an open source license.

File attachments

(jeckroth-teaching big data.pdf)	158KiB

Session Program

Mon 26 Oct
Displayed time zone: Eastern Time (US & Canada)

08:30 - 10:00	Session 1 - Real-world Data (SPLASH-E) at Ellwood 1. Chair(s): Eli Tilevich (Virginia Tech)

08:30	15m	Day opening	SPLASH-E Introduction. Eli Tilevich (Virginia Tech)
08:45	30m	Talk	Teaching Big Data with a Virtual Cluster. Joshua Eckroth (Stetson University). File attached
09:15	30m	Talk	A Generic Framework for Engaging Online Data Sources in Introductory Programming Courses. Nadeem Hamid (Berry College). File attached
09:45	15m	Break	Session 1 Discussion

Joshua Eckroth

Stetson University
