Students engaged in intense collaboration during DataFest 2016, a stats and data analysis competition held from April 1-3 at Duke. Image courtesy of Rita Lo.

On Saturday night, while most students were fast asleep or out partying, Duke junior Callie Mao stayed up until the early hours of the morning pushing and pulling a real-world data set to see what she could make of it — for fun. Callie and her team had planned for months in advance to take part in DataFest 2016, a statistical analysis competition that occurred from April 1 to April 3.

A total of 277 students, hailing from schools as disparate as Duke, UNC Chapel Hill, NCSU, Meredith College, and even one high school, the North Carolina School of Science and Mathematics, gathered in the Edge to extract insight from a mystery data set. The camaraderie was palpable, as students animatedly sketched out their ideas on whiteboard walls and chatted while devouring mountains of free food.

Duke junior Callie Mao ponders which aspects of the data to include in her analysis.

Callie observed that the challenges students faced at DataFest were unlike those in the classroom: “The most difficult part of DataFest is coming up with an idea. In class, we get specific problems, but at DataFest, we are thrown a massive data set and must figure out what to do with it. We originally came up with a lot of ideas, but the data set just didn’t have enough information to fully visualize though.”

Instead of answering questions posed in class, Callie and her team had to come up with insightful questions of their own. With virtually no guidance, the team chose which aspects of the data to include and which to exclude.

Another principal consideration across all categories was which tools to use to represent the data quickly and clearly. Callie and her team used R to parse the relevant data, converted the data they wanted into JSON files, and used D3, a JavaScript library, to code graphics that visualized it. Other groups, however, used Tableau, a drag-and-drop interface that offered a faster way to create polished graphics.

Mentors assisted participants with formulating insights and presenting their results.
Image courtesy of Rita Lo.

On Sunday afternoon, students presented their findings to their peers and to a panel of judges composed of industry professionals, statistics professors from various universities, and representatives from Data and Visualization Services at Duke Libraries. Judges commended projects for their incorporation of other data sources, such as Google AdWords; the comprehensibility of their data presentation; and the applicability of their findings in a real industry setting.

Students competed in four categories: best use of outside data, best data insight, best visualization, and best recommendation. The Baeesians, pictured below, took first place in best use of outside data, the SuperANOVA team won best data insight, the Standard Normal team won best visualization, and the Sample Solution team won best recommendation. The winning presentations will be available to view by May 2 at http://www2.stat.duke.edu/datafest/.

The Baeasians, winner of the Best Outside Data category at DataFest 2016: Rahul Harikrishnan, Peter Shi, Qian Wang, Abhishek Upadhyaya. (Not pictured: Justin Wang.) Image courtesy of Rita Lo.

Originally published on the Duke Research Blog