Duke Research Blog

Following the people and events that make up the research community at Duke.

Category: Data (Page 1 of 6)

Pinpointing Where Durham’s Nicotine Addicts Get Their Fix

DURHAM, N.C. — It’s been five years since Durham expanded its smoking ban beyond bars and restaurants to include public parks, bus stops, even sidewalks.

While smoking in the state overall may be down, 19 percent of North Carolinians still light up, particularly the poor and those without a high school or college diploma.

Among North Carolina teens, consumption of electronic cigarettes in particular more than doubled between 2013 and 2015.

Now, new maps created by students in the Data+ summer research program show where nicotine addicts can get their fix.

Studies suggest that tobacco retailers are disproportionately located in low-income neighborhoods.

Living in a neighborhood with easy access to stores that sell tobacco makes it easier to start young and harder to quit.

The end result is that smoking, secondhand smoke exposure and smoking-related diseases such as lung cancer are concentrated among the most socially disadvantaged communities.

If you’re poor and lack a high school or college diploma, you’re more likely to live near a store that sells tobacco.

Photo from Pixabay.

Where stores that sell tobacco are located matters for health, but for many states such data are hard to come by, said Duke statistics major James Wang.

Tobacco products bring in more than a third of in-store sales revenue at U.S. convenience stores — more than food, beverages, candy, snacks or beer. Despite big profits, more than a dozen states don’t require businesses to get a special license or permit to sell tobacco. North Carolina is one of them.

For these states, there is no convenient spreadsheet from the local licensing agency identifying all the businesses that sell tobacco, said Duke undergraduate Nikhil Pulimood. Previous attempts to collect such data in Virginia involved searching for tobacco retail stores by car.

“They had people physically drive across every single road in the state to collect the data. It took three years,” said team member and Duke undergraduate Felicia Chen.

Led by Mike Dolan Fliss, a PhD student in epidemiology at UNC, the Duke team tried to come up with an easier way.

Instead of collecting data on the ground, they wrote an automated web-crawler program to extract the data from Yellow Pages websites, using a technique called web scraping.

By telling the software the type of business and location, they were able to create a database that included the names, addresses, phone numbers and other information for 266 potential tobacco retailers in Durham County and more than 15,500 statewide, including chains such as Family Fare, Circle K and others.
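In spirit, the scraping step works like the rough Python sketch below, which pulls business names out of a snippet of listing-style HTML. The markup and the `business-name` class are invented for illustration; the team's actual crawler and the real Yellow Pages pages will differ.

```python
# Minimal scraping sketch using only the standard library.
# The HTML structure here is hypothetical.
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Collects the text of elements tagged with class="business-name"."""
    def __init__(self):
        super().__init__()
        self.names = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        if ("class", "business-name") in attrs:
            self._capture = True

    def handle_data(self, data):
        if self._capture:
            self.names.append(data.strip())
            self._capture = False

sample_page = """
<div class="result"><span class="business-name">Circle K</span></div>
<div class="result"><span class="business-name">Family Fare</span></div>
"""

parser = ListingParser()
parser.feed(sample_page)
print(parser.names)  # ['Circle K', 'Family Fare']
```

A real crawler would fetch many such pages per county and also capture addresses and phone numbers, but the parse-and-collect core is the same.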

Map showing the locations of tobacco retail stores in Durham County, North Carolina.


When they compared their web-scraped data with a pre-existing dataset for Durham County, compiled by a nonprofit called Counter Tools, hundreds of previously hidden retailers emerged on the map.

To determine which stores actually sold tobacco, they fed a computer algorithm data from more than 19,000 businesses outside North Carolina so it could learn to distinguish, say, convenience stores from grocery stores. When the algorithm received store names from North Carolina, it predicted tobacco retailers correctly 85 percent of the time.

“For example, we could predict that if a store has the word ‘7-Eleven’ in it, it probably sells tobacco,” Chen said.
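A toy version of that idea, with a handful of invented training names, might score a store name by how often its words appeared among known tobacco sellers. This is a deliberately simplified stand-in, not the team's actual model:

```python
# Keyword-based store classification sketch; training data invented.
from collections import Counter

training = [
    ("7-Eleven Food Store", True),       # True = sells tobacco
    ("Quick Stop Convenience", True),
    ("Circle K", True),
    ("Fresh Market Grocery", False),
    ("Organic Produce Co-op", False),
]

sells = Counter()   # how often each word appears in tobacco-selling names
total = Counter()   # how often each word appears overall
for name, label in training:
    for token in name.lower().split():
        total[token] += 1
        if label:
            sells[token] += 1

def predict(name):
    """Guess True (likely sells tobacco) if the name's known words
    appeared mostly in tobacco-selling training names."""
    tokens = [t for t in name.lower().split() if t in total]
    if not tokens:
        return False
    score = sum(sells[t] / total[t] for t in tokens) / len(tokens)
    return score > 0.5

print(predict("7-Eleven #204"))  # True: "7-eleven" was seen among sellers
print(predict("Fresh Market"))   # False: those words came from grocers
```

The real classifier was trained on far more examples and richer features, but the principle is the same: chain names carry a strong signal.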

As a final step, they also crosschecked their results using a crowdsourcing service called Amazon Mechanical Turk, paying people a small fee to search for the stores online to verify that they exist, and to call and ask whether they actually sell tobacco.

Ultimately, the team hopes their methods will help map the more than 336,000 tobacco retailers nationwide.

“With a complete dataset for tobacco retailers around the nation, public health experts will be able to see where tobacco retailers are located relative to parks and schools, and how store density changes from one neighborhood to another,” Wang said.

The team presented their work at the Data+ Final Symposium on July 28 in Gross Hall.

Data+ is sponsored by Bass Connections, the Information Initiative at Duke, the Social Science Research Institute, the departments of mathematics and statistical science and MEDx. This project team was also supported by Counter Tools, a non-profit based in Carrboro, NC.

Writing by Robin Smith; video by Lauren Mueller and Summer Dunsmore

Sizing Up Hollywood’s Gender Gap

DURHAM, N.C. — A mere seven-plus decades after she first appeared in comic books in the early 1940s, Wonder Woman finally has her own movie.

In the two months since it premiered, the film has brought in more than $785 million worldwide, making it the highest-grossing movie of the summer.

But if Hollywood has seen a number of recent hits with strong female leads, from “Wonder Woman” and “Atomic Blonde” to “Hidden Figures,” it doesn’t signal a change in how women are depicted on screen — at least not yet.

Those are the conclusions of three students who spent ten weeks this summer compiling and analyzing data on women’s roles in American film, through the Data+ summer research program.

The team relied on a measure called the Bechdel test, first depicted by the cartoonist Alison Bechdel in 1985.


The “Bechdel test” asks whether a movie features at least two women who talk to each other about anything besides a man. Surprisingly, a lot of films fail. Art by Srravya [CC0], via Wikimedia Commons.

To pass the Bechdel test, a movie must satisfy three basic requirements: it must have at least two named women in it, they must talk to each other, and their conversation must be about something other than a man.

It’s a low bar. The female characters don’t have to have power, or purpose, or buck gender stereotypes.

Even a movie in which two women speak to each other only briefly in one scene, about nail polish — as was the case with “American Hustle” — gets a passing grade.

And yet more than 40 percent of all U.S. films fail.

The team used data from the bechdeltest.com website, a user-compiled database of over 7,000 movies where volunteers rate films based on the Bechdel criteria. The number of criteria a film passes adds up to its Bechdel score.

“Spider-Man,” “The Jungle Book,” “Star Trek Beyond” and “The Hobbit” all fail at least one of the criteria.
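The scoring itself is simple to express. Assuming the criteria are applied cumulatively, as bechdeltest.com's ratings suggest, a film's score is just how many requirements it meets in order:

```python
def bechdel_score(has_two_named_women, women_talk_to_each_other,
                  about_something_besides_a_man):
    """Count how many of the three Bechdel criteria a film meets.
    Each criterion only counts if the previous ones hold."""
    criteria = [has_two_named_women, women_talk_to_each_other,
                about_something_besides_a_man]
    score = 0
    for met in criteria:
        if not met:
            break
        score += 1
    return score

print(bechdel_score(True, True, False))  # 2: the women only discuss a man
print(bechdel_score(True, True, True))   # 3: a passing film
```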

Films are more likely to pass today than they were in the 1970s, according to a 2014 study by FiveThirtyEight, the data journalism site created by Nate Silver.

The authors of that study analyzed 1,794 movies released between 1970 and 2013. They found that the number of passing films rose steadily from 1970 to 1995 but then began to stall.

In the past two decades, the proportion of passing films hasn’t budged.

Since the mid-1990s, the proportion of films that pass the Bechdel test has flatlined at about 50 percent.


The Duke team was also able to obtain data from a 2016 study of the gender breakdown of movie dialogue in roughly 2,000 screenplays.

Men played two out of three top speaking roles in more than 80 percent of films, according to that study.

Using data from the screenplay study, the students plotted the relationship between a movie’s Bechdel score and the number of words spoken by female characters. Perhaps not surprisingly, films with higher Bechdel scores were also more likely to achieve gender parity in terms of speaking roles.
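That relationship can be illustrated with a quick correlation check. The numbers below are invented stand-ins for the merged Bechdel and screenplay data, just to show the kind of comparison the students plotted:

```python
# Illustrative only: invented per-film numbers, not the real data sets.
scores = [0, 1, 1, 2, 2, 3, 3, 3]        # Bechdel score per film
female_share = [0.10, 0.15, 0.20, 0.30,  # fraction of dialogue words
                0.35, 0.45, 0.50, 0.55]  # spoken by female characters

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(scores, female_share)
print(round(r, 2))  # a strong positive association in this toy data
```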

“The Bechdel test doesn’t really tell you if a film is feminist,” but it’s a good indicator of how much women speak, said team member Sammy Garland, a Duke sophomore majoring in statistics and Chinese.

Previous studies suggest that men do twice as much talking in most films — a proportion that has remained largely unchanged since 1995. The reason, researchers say, is not because male characters are more talkative individually, but because there are simply more male roles.

“To close the gap of speaking time, we just need more female characters,” said team member Selen Berkman, a sophomore majoring in math and computer science.

Achieving that, they say, ultimately comes down to who writes the script and chooses the cast.

The team did a network analysis of patterns of collaboration among 10,000 directors, writers and producers. Two people are joined whenever they worked together on the same movie. The 13 most influential and well-connected people in the American film industry were all men, whose films had average Bechdel scores ranging from 1.5 to 2.6 — meaning no top producer is regularly making films that pass the Bechdel test.
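The network construction can be sketched in a few lines of Python: link any two people who share a film credit, then rank everyone by their number of distinct collaborators. The films and names here are invented:

```python
# Collaboration-network sketch; film credits are made up.
from collections import defaultdict
from itertools import combinations

films = {
    "Film A": ["Producer X", "Writer Y", "Director Z"],
    "Film B": ["Producer X", "Writer W"],
    "Film C": ["Producer X", "Director Z"],
}

neighbors = defaultdict(set)
for crew in films.values():
    for a, b in combinations(crew, 2):
        neighbors[a].add(b)
        neighbors[b].add(a)

# Degree (number of distinct collaborators) as a simple
# stand-in for how well-connected each person is.
degree = {person: len(links) for person, links in neighbors.items()}
best = max(degree, key=degree.get)
print(best, degree[best])  # Producer X 3
```

The team's analysis of 10,000 real directors, writers and producers used the same basic structure at much larger scale, combined with Bechdel scores for each person's films.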

“What this tells us is there is no one big influential producer who is moving the needle. We have no champion,” Garland said.

Men and women were equally represented in fewer than 10 percent of production crews.

But assembling a more gender-balanced production team in the early stages of a film can make a difference, research shows. Films with more women in top production roles have female characters who speak more too.

“To better represent women on screen you need more women behind the scenes,” Garland said.

Dollar for dollar, making an effort to close the Hollywood gender gap can mean better returns at the box office too. Films that pass the Bechdel test earn $2.68 for every dollar spent, compared with $2.45 for films that fail — a 23-cent better return on investment, according to FiveThirtyEight.

Other versions of the Bechdel test have been proposed to measure race and gender in film more broadly. The advantage of analyzing the Bechdel data is that thousands of films have already been scored, said English major and Data+ team member Aaron VanSteinberg.

“We tried to watch a movie a week, but we just didn’t have time to watch thousands of movies,” VanSteinberg said.

A new report on diversity in Hollywood from the University of Southern California suggests the same lack of progress is true for other groups as well. In nearly 900 top-grossing films from 2007 to 2016, disabled, Latino and LGBTQ characters were consistently underrepresented relative to their makeup in the U.S. population.

Berkman, Garland and VanSteinberg were among more than 70 students selected for the 2017 Data+ program, which included data-driven projects on photojournalism, art restoration, public policy and more.

They presented their work at the Data+ Final Symposium on July 28 in Gross Hall.

Data+ is sponsored by Bass Connections, the Information Initiative at Duke, the Social Science Research Institute, the departments of mathematics and statistical science and MEDx. 

Writing by Robin Smith; video by Lauren Mueller and Summer Dunsmore

Mapping Electricity Access for a Sixth of the World’s People

DURHAM, N.C. — Most Americans can charge their cell phones, raid the fridge or boot up their laptops at any time without a second thought.

Not so for the 1.2 billion people — roughly 16 percent of the world’s population — with no access to electricity.

Despite improvements over the past two decades, an estimated 780 million people will still be without power by 2030, especially in rural parts of sub-Saharan Africa, Asia and the Pacific.

To get power to these people, first officials need to locate them. But for much of the developing world, reliable, up-to-date data on electricity access is hard to come by.

Researchers say remote sensing can help.

For ten weeks from May through July, a team of Duke students in the Data+ summer research program worked on developing ways to assess electricity access automatically, using satellite imagery.

“Ground surveys take a lot of time, money and manpower,” said Data+ team member Ben Brigman. “As it is now, the only way to figure out if a village has electricity is to send someone out there to check. You can’t call them up or put out an online poll, because they won’t be able to answer.”


Satellite image of India at night. Large parts of the Indian countryside still aren’t connected to the grid, but remote sensing and machine learning could help pinpoint people living without power. Credit: NASA Earth Observatory images by Joshua Stevens, using Suomi NPP VIIRS data from Miguel Román, NASA’s Goddard Space Flight Center.

Led by researchers in the Energy Data Analytics Lab and the Sustainable Energy Transitions Initiative, “the initial goal was to create a map of India, showing every village or town that does or does not have access to electricity,” said team member Trishul Nagenalli.

Electricity makes it possible to pump groundwater for crops, refrigerate food and medicines, and study or work after dark. But in parts of rural India, where Nagenalli’s parents grew up, many households use kerosene lamps to light homes at night, and wood or animal dung as cooking fuel.

Fires from overturned kerosene lamps are not uncommon, and indoor air pollution from cooking with solid fuels contributes to low birth weight, pneumonia and other health problems.

In 2005, the Indian government set out to provide electricity to all households within five years. Yet a quarter of India’s population still lives without power.

Ultimately, the goal is to create a machine learning algorithm — basically a set of instructions for a computer to follow — that can recognize power plants, irrigated fields and other indicators of electricity in satellite images, much like the algorithms that recognize your face on Facebook.

Rather than being programmed with specific instructions, machine learning algorithms “learn” from large amounts of data.

This summer the researchers focused on the unsung first step in the process: preparing the training data.


Satellite image of a power plant in Phoenix, Arizona

Fellow Duke students Gouttham Chandrasekar, Shamikh Hossain and Boning Li were also part of the effort. First they compiled publicly available satellite images of U.S. power plants. Rather than painstakingly framing and labeling the plants in each photo themselves, they outsourced the task, hiring workers to annotate the images through a crowdsourcing service called Amazon Mechanical Turk.

So far, they have collected more than 8,500 image annotations of different kinds of power plants, including oil, natural gas, hydroelectric and solar.

The team also compiled firsthand observations of the electrification rate for more than 36,000 villages in the Indian state of Bihar, which has one of the lowest electrification rates in the country. For each village, they also gathered satellite images showing light intensity at night, along with density of green land and other indicators of irrigated farms, as proxies for electricity consumption.

Using these data sets, the goal is to develop a computer algorithm that, through machine learning, teaches itself to detect similar features in unlabeled images and distinguish towns and villages that are connected to the grid from those that aren’t.
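As a toy illustration of the idea (not the team's actual model), a village's proxy measurements could be folded into a single score and thresholded. The feature values and weights below are invented:

```python
# Proxy-based village classification sketch; numbers are made up.

def electrified_score(night_light, greenness, w_light=0.8, w_green=0.2):
    """Weighted combination of the two proxies; inputs scaled to 0..1."""
    return w_light * night_light + w_green * greenness

villages = {
    "village_a": (0.9, 0.7),  # bright at night, lots of irrigated land
    "village_b": (0.1, 0.2),  # dark at night, mostly dry land
}

status = {
    name: ("on grid" if electrified_score(light, green) > 0.5 else "off grid")
    for name, (light, green) in villages.items()
}
print(status)  # {'village_a': 'on grid', 'village_b': 'off grid'}
```

A trained machine learning model would learn the weights, and far richer features, from the labeled Bihar villages rather than having them set by hand.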

“We would like to develop our final algorithm to essentially go into a developing country and analyze whether or not a community there has access to electricity, and if so what kind,” Chandrasekar said.


The proportion of households connected to the grid in more than 36,000 villages in Bihar, India

The project is far from finished. During the 2017-2018 school year, a Bass Connections team will continue to build on their work.

The summer team presented their research at the Data+ Final Symposium on July 28 in Gross Hall.

Data+ is sponsored by Bass Connections, the Information Initiative at Duke, the Social Science Research Institute, the departments of mathematics and statistical science and MEDx. This project team was also supported by the Duke University Energy Initiative.

Writing by Robin Smith; video by Lauren Mueller and Summer Dunsmore

Energy Program on Chopping Block, But New Data Suggest It Works

Duke research yields new data about energy efficiency program slated for elimination

Do energy efficiency “audits” really benefit companies over time? An interdisciplinary team of Duke researchers (economist Gale Boyd, statistician Jerome “Jerry” Reiter and doctoral student Nicole Dalzell) has been tackling this question as it applies to a long-running Department of Energy (DOE) effort that is slated for elimination under President Trump’s proposed budget.

Evaluating a long-running energy efficiency effort

Since 1976, the DOE’s Industrial Assessment Centers (IAC) program has aimed to help small- and medium-sized manufacturers become more energy-efficient by providing free energy “audits” through universities across the country. (Currently, 28 universities take part, including North Carolina State University.)


Gale Boyd is a Duke economist.

The Duke researchers’ project, supported by an Energy Research Seed Fund grant, has yielded a statistically sound new technique for matching publicly available IAC data with confidential plant information collected in the U.S. Census of Manufacturing (CMF).

The team has created a groundbreaking linked database that will be available in the Federal Statistical Research Data Center network for use by other researchers. Currently the database links IAC data from 2007 and confidential plant data from the 2012 CMF, but it can be expanded to include additional years.

The team’s analysis of this linked data indicates that companies participating in the DOE’s IAC program do become more efficient and improve in efficiency ranking over time when compared to peer companies in the same industry. Additional analysis could reveal the characteristics of companies that benefit most and the interventions that are most effective.

Applications for government, industry, utilities, researchers

These data could be used to inform the DOE’s IAC program, if the program is not eliminated.

But the data have other potential applications, too, says Boyd.

Individual companies that took part in the DOE program could discover the relative yields of their own energy efficiency measures: savings over time, as well as how their efficiency ranking among peers has shifted.

Researchers, states, and utilities could use the data to identify manufacturing sectors and types of businesses that benefit most from information about energy efficiency measures, the specific measures connected with savings, and non-energy benefits of energy efficiency, e.g. on productivity.

Meanwhile, the probabilistic matching techniques developed as part of the project could help researchers in a range of fields—from public health to education—to build a better understanding of populations by linking data sets in statistically sound ways.
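To give a heavily simplified flavor of record linkage, the sketch below matches plant records across two lists by string similarity, keeping only pairs above a threshold. The plant names are invented, and real probabilistic linkage of the kind Reiter studies models match uncertainty far more carefully than this:

```python
# Naive fuzzy record-linkage sketch with invented plant names.
from difflib import SequenceMatcher

iac_plants = ["Acme Fabrication Inc", "Bolt & Sons Machining"]
census_plants = [
    "ACME Fabrication, Incorporated",
    "Bolt and Sons Machine Shop",
    "Unrelated Textiles LLC",
]

def similarity(a, b):
    """Case-insensitive string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

matches = {}
for name in iac_plants:
    best = max(census_plants, key=lambda c: similarity(name, c))
    if similarity(name, best) > 0.6:  # discard weak matches
        matches[name] = best

print(matches)
```

Statistically sound linkage would also propagate the uncertainty of each match into downstream estimates, which is exactly the "cavalier" step Boyd says many researchers skip.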

An interdisciplinary team leveraging Duke talent and resources

Boyd—a Duke economist who previously spent two decades doing applied policy evaluation at Argonne National Laboratory—has been using Census data to study energy efficiency and productivity for more than fifteen years. Boyd has co-appointments in Duke’s Social Science Research Institute and Department of Economics. He now directs the Triangle Research Data Center (TRDC), a partnership between the U.S. Census Bureau and Duke University in cooperation with the University of North Carolina and Research Triangle Institute.

The TRDC (located in Gross Hall for Interdisciplinary Innovation) is one of more than 30 locations in the country where researchers can access the confidential micro-data collected by the Federal Statistical System.


Jerry Reiter is a professor in Duke’s Department of Statistical Science, associate director of the Information Initiative at Duke (iiD), and a Duke alumnus (B.S. ’92). Reiter was dissertation supervisor for Nicole Dalzell, who completed her Ph.D. at Duke this spring and will be an assistant teaching professor in the Department of Mathematics and Statistics at Wake Forest University in the fall.

Boyd reports, “The opportunity to work in an interdisciplinary team with Jerry (one of the nation’s leading researchers on imputation and synthetic data) and Nicole (one of Duke’s bright new minds in this field) has opened my eyes a bit about how cavalier some researchers are with respect to uncertainty when we link datasets. Statisticians’ expertise in these areas can help the rest of us do better research, making it as sound and defensible as possible.”

What’s next for the project

The collaboration was made possible by the Duke University Energy Initiative’s Energy Research Seed Fund, which supports new interdisciplinary research teams in generating preliminary results that can help secure external funding. The grant was co-funded by the Pratt School of Engineering and the Information Initiative at Duke (iiD).

Given the potential uses of the team’s results by the private sector (particularly by electric utilities), other funding possibilities are likely to emerge.

Boyd, Reiter and Dalzell have submitted an article to the journal Energy Policy and are discussing future research applications of these data with colleagues in the field of energy efficiency and policy. Their working paper is available as part of the Environmental and Energy Economics Working Paper Series organized by the Energy Initiative and the Nicholas Institute for Environmental Policy Solutions.


For more information, contact Gale Boyd: gale.boyd@duke.edu.

Guest Post from Braden Welborn, Duke University Energy Initiative

Students Share Research Journeys at Bass Connections Showcase

From the highlands of north central Peru to high schools in North Carolina, student researchers in Duke’s Bass Connections program are gathering data in all sorts of unique places.

As the school year winds down, they packed into Duke’s Scharf Hall last week to hear one another’s stories.

Students and faculty gathered in Scharf Hall to learn about each other’s research at this year’s Bass Connections showcase. Photo by Jared Lazarus/Duke Photography.

The Bass Connections program brings together interdisciplinary teams of undergraduates, graduate students and professors to tackle big questions in research. This year’s showcase, which featured poster presentations and five “lightning talks,” was the first to include teams spanning all five of the program’s diverse themes: Brain and Society; Information, Society and Culture; Global Health; Education and Human Development; and Energy.

“The students wanted an opportunity to learn from one another about what they had been working on across all the different themes over the course of the year,” said Lori Bennear, associate professor of environmental economics and policy at the Nicholas School, during the opening remarks.

Students seized the chance, eagerly perusing peers’ posters and gathering for standing-room-only viewings of other teams’ talks.

The different investigations took students from rural areas of Peru, where teams interviewed local residents to better understand the transmission of deadly diseases like malaria and leishmaniasis, to the North Carolina Museum of Art, where mathematicians and engineers worked side-by-side with artists to restore paintings.

Machine learning algorithms created by the Energy Data Analytics Lab can pick out buildings from a satellite image and estimate their energy consumption. Image courtesy Hoël Wiesner.

Students in the Energy Data Analytics Lab didn’t have to look much farther than their smart phones for the data they needed to better understand energy use.

“Here you can see a satellite image, very similar to one you can find on Google maps,” said Eric Peshkin, a junior mathematics major, as he showed an aerial photo of an urban area featuring buildings and a highway. “The question is how can this be useful to us as researchers?”

With the help of new machine-learning algorithms, images like these could soon give researchers oodles of valuable information about energy consumption, Peshkin said.

“For example, what if we could pick out buildings and estimate their energy usage on a per-building level?” said Hoël Wiesner, a second-year master’s student at the Nicholas School. “There is not really a good data set for this out there because utilities that do have this information tend to keep it private for commercial reasons.”

The lab has had success developing algorithms that can estimate the size and location of solar panels from aerial photos. Peshkin and Wiesner described how they are now creating new algorithms that can first identify the size and locations of buildings in satellite imagery, and then estimate their energy usage. These tools could provide a quick and easy way to evaluate the total energy needs in any neighborhood, town or city in the U.S. or around the world.

“It’s not just that we can take one city, say Norfolk, Virginia, and estimate the buildings there. If you give us Reno, Tuscaloosa, Las Vegas, Phoenix — my hometown — you can absolutely get the per-building energy estimations,” Peshkin said. “And what that means is that policy makers will be more informed, NGOs will have the ability to best service their community, and more efficient, more accurate energy policy can be implemented.”

Some students’ research took them to the sidelines of local sports fields. Joost Op’t Eynde, a master’s student in biomedical engineering, described how he and his colleagues on a Brain and Society team are working with high school and youth football leagues to sort out what exactly happens to the brain during a high-impact sports game.

While a particularly nasty hit to the head might cause clear symptoms that can be diagnosed as a concussion, the accumulation of lesser impacts over the course of a game or season may also affect the brain. Eynde and his team are developing a set of tools to monitor both these impacts and their effects.

A standing-room only crowd listened to a team present on their work “Tackling Concussions.” Photo by Jared Lazarus/Duke Photography.

“We talk about inputs and outputs — what happens, and what are the results,” Eynde said. “For the inputs, we want to actually see when somebody gets hit, how they get hit, what kinds of things they experience, and what is going on in the head. And the output is we want to look at a way to assess objectively.”

The tools include surveys to estimate how often a player is impacted, an in-ear accelerometer called the DASHR that measures the intensity of jostles to the head, and tests of players’ performance on eye-tracking tasks.

“Right now we are looking on the scale of a season, maybe two seasons,” Eynde said. “What we would like to do in the future is actually follow some of these students throughout their career and get the full data for four years or however long they are involved in the program, and find out more of the long-term effects of what they experience.”


Post by Kara Manke

Data Geeks Go Head to Head

For North Carolina college students, “big data” is becoming a big deal. The proof: signups for DataFest, a 48-hour number-crunching competition held at Duke last weekend, set a record for the third time in a row this year.

DataFest 2017

More than 350 data geeks swarmed Bostock Library this weekend for a 48-hour number-crunching competition called DataFest. Photo by Loreanne Oh, Duke University.

Expected turnout was so high that event organizer and Duke statistics professor Mine Cetinkaya-Rundel was even required by state fire code to sign up for “crowd manager” safety training — her certificate of completion is still proudly displayed on her Twitter feed.

Nearly 350 students from 10 schools across North Carolina, California and elsewhere flocked to Duke’s West Campus from Friday, March 31 to Sunday, April 2 to compete in the annual event.

Teams of two to five students worked around the clock over the weekend to make sense of a single real-world data set. “It’s an incredible opportunity to apply the modeling and computing skills we learn in class to actual business problems,” said Duke junior Angie Shen, who participated in DataFest for the second time this year.

The surprise dataset was revealed Friday night. Just taming it into a form that could be analyzed was a challenge. Containing millions of data points from an online booking site, it was too large to open in Excel. “It was bigger than anything I’ve worked with before,” said NC State statistics major Michael Burton.

DataFest 2017

The mystery data set was revealed Friday night in Gross Hall. Photo by Loreanne Oh.

Because of its size, even simple procedures took a long time to run. “The dataset was so large that we actually spent the first half of the competition fixing our crushed software and did not arrive at any concrete finding until late afternoon on Saturday,” said Duke junior Tianlin Duan.
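One standard tactic for data too big to open at once (a general sketch, not what any particular team did) is to stream the file row by row and keep only running aggregates, rather than loading everything into memory:

```python
# Streaming aggregation sketch; the "bookings" file here is a tiny
# in-memory stand-in for a dataset too large for Excel.
import csv
import io

raw = io.StringIO(
    "city,bookings\n"
    "Durham,2\n"
    "Raleigh,5\n"
    "Durham,3\n"
)

totals = {}
for row in csv.DictReader(raw):  # reads one row at a time
    totals[row["city"]] = totals.get(row["city"], 0) + int(row["bookings"])

print(totals)  # {'Durham': 5, 'Raleigh': 5}
```

With a real multi-gigabyte file, `raw` would be an open file handle, but memory use stays flat either way because only the totals are kept.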

The organizers of DataFest don’t specify research questions in advance. Participants are given free rein to analyze the data however they choose.

“We were overwhelmed with the possibilities. There was so much data and so little time,” said NCSU psychology major Chandani Kumar.

“While for the most part data analysis was decided by our teachers before now, this time we had to make all of the decisions ourselves,” said Kumar’s teammate Aleksey Fayuk, a statistics major at NCSU.

As a result, these budding data scientists don’t just write code. They form theories, find patterns, test hunches. Before the weekend is over they also visualize their findings, make recommendations and communicate them to stakeholders.

This year’s participants came from more than 10 schools, including Duke, UNC, NC State and North Carolina A&T. Students from UC Davis and UC Berkeley also made the trek. Photo by Loreanne Oh.

“The most memorable moment was when we finally got our model to start generating predictions,” said Duke neuroscience and computer science double major Luke Farrell. “It was really exciting to see all of our work come together a few hours before the presentations were due.”

Consultants were available throughout the weekend to help with any questions participants might have. Recruiters from both start-ups and well-established companies were also on site for participants looking to network or share their resumes.

“Even as late as 11 p.m. on Saturday we were still able to find a professor from the Duke statistics department at the Edge to help us,” said Duke junior Yuqi Yun, whose team presented their results in a winning interactive visualization. “The organizers treat the event not merely as a contest but more of a learning experience for everyone.”

Caffeine was critical. “By 3 a.m. on Sunday morning, we ended initial analysis with what we had, hoped for the best, and went for a five-hour sleep in the library,” said NCSU’s Fayuk, whose team DataWolves went on to win best use of outside data.

By Sunday afternoon, every surface of The Edge in Bostock Library was littered with coffee cups, laptops, nacho crumbs, pizza boxes and candy wrappers. White boards were covered in scribbles from late-night brainstorming sessions.

“My team encouraged everyone to contribute ideas. I loved how everyone was treated as a valuable team member,” said Duke computer science and political science major Pim Chuaylua. She decided to sign up when a friend asked if she wanted to join their team. “I was hesitant at first because I’m the only non-stats major in the team, but I encouraged myself to get out of my comfort zone,” Chuaylua said.

“I learned so much from everyone since we all have different expertise and skills that we contributed to the discussion,” said Shen, whose teammates were majors in statistics, computer science and engineering. Students majoring in math, economics and biology were also well represented.

At the end, each team was allowed four minutes and at most three slides to present their findings to a panel of judges. Prizes were awarded in several categories, including “best insight,” “best visualization” and “best use of outside data.”

Duke is among more than 30 schools hosting similar events this year, coordinated by the American Statistical Association (ASA). The winning presentations and mystery data source will be posted on the DataFest website in May after all events are over.

The registration deadline for the next Duke DataFest will be in March 2018.

DataFest 2017

Bleary-eyed contestants pose for a group photo at Duke DataFest 2017. Photo by Loreanne Oh.


Post by Robin Smith

