Duke Research Blog

Following the people and events that make up the research community at Duke.

Category: Data (Page 1 of 5)

Students Share Research Journeys at Bass Connections Showcase

From the highlands of north central Peru to high schools in North Carolina, student researchers in Duke’s Bass Connections program are gathering data in all sorts of unique places.

With the school year winding down, they packed into Duke’s Scharf Hall last week to hear one another’s stories.

Students and faculty gathered in Scharf Hall to learn about each other’s research at this year’s Bass Connections showcase. Photo by Jared Lazarus/Duke Photography.

The Bass Connections program brings together interdisciplinary teams of undergraduates, graduate students and professors to tackle big questions in research. This year’s showcase, which featured poster presentations and five “lightning talks,” was the first to include teams spanning all five of the program’s diverse themes: Brain and Society; Information, Society and Culture; Global Health; Education and Human Development; and Energy.

“The students wanted an opportunity to learn from one another about what they had been working on across all the different themes over the course of the year,” said Lori Bennear, associate professor of environmental economics and policy at the Nicholas School, during the opening remarks.

Students seized the chance, eagerly perusing peers’ posters and gathering for standing-room-only viewings of other teams’ talks.

The different investigations took students from rural areas of Peru, where teams interviewed local residents to better understand the transmission of deadly diseases like malaria and leishmaniasis, to the North Carolina Museum of Art, where mathematicians and engineers worked side-by-side with artists to restore paintings.

Machine learning algorithms created by the Energy Data Analytics Lab can pick out buildings from a satellite image and estimate their energy consumption. Image courtesy Hoël Wiesner.

Students in the Energy Data Analytics Lab didn’t have to look much farther than their smartphones for the data they needed to better understand energy use.

“Here you can see a satellite image, very similar to one you can find on Google Maps,” said Eric Peshkin, a junior mathematics major, as he showed an aerial photo of an urban area featuring buildings and a highway. “The question is how can this be useful to us as researchers?”

With the help of new machine-learning algorithms, images like these could soon give researchers oodles of valuable information about energy consumption, Peshkin said.

“For example, what if we could pick out buildings and estimate their energy usage on a per-building level?” said Hoël Wiesner, a second-year master’s student at the Nicholas School. “There is not really a good data set for this out there because utilities that do have this information tend to keep it private for commercial reasons.”

The lab has had success developing algorithms that can estimate the size and location of solar panels from aerial photos. Peshkin and Wiesner described how they are now creating new algorithms that can first identify the size and locations of buildings in satellite imagery, and then estimate their energy usage. These tools could provide a quick and easy way to evaluate the total energy needs in any neighborhood, town or city in the U.S. or around the world.
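The post does not describe the lab’s models. As a hedged sketch of the two-stage idea only, assume a trained segmentation model has already turned a satellite image into a binary mask (1 = building pixel). The first stage below groups building pixels into individual buildings with a flood fill; the second multiplies each footprint by an energy intensity per square meter. `find_buildings`, `estimate_energy`, the pixel size and the intensity value are all hypothetical.

```python
from collections import deque


def find_buildings(mask):
    """Stage 1 (hypothetical): label 4-connected regions of 1s in a binary mask.

    Returns a list of (pixel_count, (min_row, min_col, max_row, max_col)),
    one entry per building, in the order they are first encountered.
    """
    rows, cols = len(mask), len(mask[0]) if mask else 0
    seen = [[False] * cols for _ in range(rows)]
    buildings = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited building pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            count, r0, c0, r1, c1 = 0, r, c, r, c
            while queue:
                y, x = queue.popleft()
                count += 1
                r0, c0, r1, c1 = min(r0, y), min(c0, x), max(r1, y), max(c1, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            buildings.append((count, (r0, c0, r1, c1)))
    return buildings


def estimate_energy(buildings, m2_per_pixel, kwh_per_m2_year):
    """Stage 2 (hypothetical): footprint area times an assumed energy intensity."""
    return [count * m2_per_pixel * kwh_per_m2_year for count, _ in buildings]


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
buildings = find_buildings(mask)
energy = estimate_energy(buildings, m2_per_pixel=0.25, kwh_per_m2_year=100)
```

A real per-building estimate would likely draw on much more than footprint area (building height, type and local climate, for example); the point here is only the split between finding buildings and estimating their consumption.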

“It’s not just that we can take one city, say Norfolk, Virginia, and estimate the buildings there. If you give us Reno, Tuscaloosa, Las Vegas, Phoenix, my hometown, you can absolutely get the per-building energy estimations,” Peshkin said. “And what that means is that policy makers will be more informed, NGOs will have the ability to best service their community, and more efficient, more accurate energy policy can be implemented.”

Some students’ research took them to the sidelines of local sports fields. Joost Op’t Eynde, a master’s student in biomedical engineering, described how he and his colleagues on a Brain and Society team are working with high school and youth football leagues to sort out what exactly happens to the brain during a high-impact sports game.

While a particularly nasty hit to the head might cause clear symptoms that can be diagnosed as a concussion, the accumulation of lesser impacts over the course of a game or season may also affect the brain. Op’t Eynde and his team are developing a set of tools to monitor both these impacts and their effects.

A standing-room-only crowd listened to a team present on their work “Tackling Concussions.” Photo by Jared Lazarus/Duke Photography.

“We talk about inputs and outputs: what happens, and what are the results,” Op’t Eynde said. “For the inputs, we want to actually see when somebody gets hit, how they get hit, what kinds of things they experience, and what is going on in the head. And the output is we want to look at a way to assess objectively.”

The tools include surveys to estimate how often a player is impacted, an in-ear accelerometer called the DASHR that measures the intensity of jostles to the head, and tests of players’ performance on eye-tracking tasks.
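The post does not explain how the DASHR data are processed. As a minimal sketch, assuming the device reports three-axis acceleration in units of g, one simple way to turn a stream of readings into a list of hits is to compute the resultant magnitude of each sample and treat each run of samples above a threshold as one impact. `detect_impacts` and its 10 g default are hypothetical placeholders, not the team’s values.

```python
import math


def detect_impacts(samples, threshold_g=10.0):
    """Return the peak resultant acceleration (in g) of each detected impact.

    samples: sequence of (ax, ay, az) readings in g.
    An impact is a run of consecutive samples whose resultant magnitude
    exceeds threshold_g (a hypothetical default).
    """
    peaks = []
    current_peak = None  # peak of the impact in progress, if any
    for ax, ay, az in samples:
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if magnitude > threshold_g:
            current_peak = magnitude if current_peak is None else max(current_peak, magnitude)
        elif current_peak is not None:
            # The run above threshold just ended: record it as one impact.
            peaks.append(current_peak)
            current_peak = None
    if current_peak is not None:
        peaks.append(current_peak)
    return peaks


readings = [(0, 0, 1), (3, 4, 12), (0, 0, 1), (0, 0, 20), (0, 0, 15)]
impacts = detect_impacts(readings)
```

Counting `len(impacts)` over a game gives an impact tally, and the peaks give a rough sense of intensity, the kind of "input" measure the team describes.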

“Right now we are looking on the scale of a season, maybe two seasons,” Op’t Eynde said. “What we would like to do in the future is actually follow some of these students throughout their career and get the full data for four years or however long they are involved in the program, and find out more of the long-term effects of what they experience.”

Kara J. Manke, PhD

Post by Kara Manke

Data Geeks Go Head to Head

For North Carolina college students, “big data” is becoming a big deal. The proof: signups for DataFest, a 48-hour number-crunching competition held at Duke last weekend, set a record for the third time in a row this year.

More than 350 data geeks swarmed Bostock Library this weekend for a 48-hour number-crunching competition called DataFest. Photo by Loreanne Oh, Duke University.

Expected turnout was so high that event organizer and Duke statistics professor Mine Cetinkaya-Rundel was even required by state fire code to sign up for “crowd manager” safety training — her certificate of completion is still proudly displayed on her Twitter feed.

Nearly 350 students from 10 schools across North Carolina, California and elsewhere flocked to Duke’s West Campus from Friday, March 31 to Sunday, April 2 to compete in the annual event.

Teams of two to five students worked around the clock over the weekend to make sense of a single real-world data set. “It’s an incredible opportunity to apply the modeling and computing skills we learn in class to actual business problems,” said Duke junior Angie Shen, who participated in DataFest for the second time this year.

The surprise dataset was revealed Friday night. Just taming it into a form that could be analyzed was a challenge. Containing millions of data points from an online booking site, it was too large to open in Excel. “It was bigger than anything I’ve worked with before,” said NC State statistics major Michael Burton.

The mystery data set was revealed Friday night in Gross Hall. Photo by Loreanne Oh.

Because of its size, even simple procedures took a long time to run. “The dataset was so large that we actually spent the first half of the competition fixing our crashed software and did not arrive at any concrete finding until late afternoon on Saturday,” said Duke junior Tianlin Duan.
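The post does not say what tools the teams used. One general way to work with a file too large for Excel is to stream it row by row instead of loading it all at once, so memory use depends on the summary being built rather than on the size of the file. The sketch below uses Python’s standard `csv` module; the function name and the `destination` column are hypothetical, since the post does not describe the dataset’s columns. (pandas offers a similar approach through the `chunksize` argument of `read_csv`.)

```python
import csv
from collections import Counter


def count_by_column(lines, column):
    """Stream CSV rows one at a time and tally the values in one column.

    lines: any iterable of CSV text lines, such as an open file object.
    Memory grows with the number of distinct values, not the number of rows.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts
```

For a file on disk, a call might look like `with open("bookings.csv", newline="") as f: top = count_by_column(f, "destination").most_common(5)`, where the file name is also hypothetical.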

The organizers of DataFest don’t specify research questions in advance. Participants are given free rein to analyze the data however they choose.

“We were overwhelmed with the possibilities. There was so much data and so little time,” said NCSU psychology major Chandani Kumar.

“While for the most part data analysis was decided by our teachers before now, this time we had to make all of the decisions ourselves,” said Kumar’s teammate Aleksey Fayuk, a statistics major at NCSU.

As a result, these budding data scientists don’t just write code. They form theories, find patterns, test hunches. Before the weekend is over they also visualize their findings, make recommendations and communicate them to stakeholders.

This year’s participants came from more than 10 schools, including Duke, UNC, NC State and North Carolina A&T. Students from UC Davis and UC Berkeley also made the trek. Photo by Loreanne Oh.

“The most memorable moment was when we finally got our model to start generating predictions,” said Duke neuroscience and computer science double major Luke Farrell. “It was really exciting to see all of our work come together a few hours before the presentations were due.”

Consultants were available throughout the weekend to help with any questions participants might have. Recruiters from both start-ups and well-established companies were also on site for participants looking to network or share their resumes.

“Even as late as 11 p.m. on Saturday we were still able to find a professor from the Duke statistics department at the Edge to help us,” said Duke junior Yuqi Yun, whose team presented their results in a winning interactive visualization. “The organizers treat the event not merely as a contest but more of a learning experience for everyone.”

Caffeine was critical. “By 3 a.m. on Sunday morning, we ended initial analysis with what we had, hoped for the best, and went for a five-hour sleep in the library,” said NCSU’s Fayuk, whose team DataWolves went on to win best use of outside data.

By Sunday afternoon, every surface of The Edge in Bostock Library was littered with coffee cups, laptops, nacho crumbs, pizza boxes and candy wrappers. White boards were covered in scribbles from late-night brainstorming sessions.

“My team encouraged everyone to contribute ideas. I loved how everyone was treated as a valuable team member,” said Duke computer science and political science major Pim Chuaylua. She decided to sign up when a friend asked if she wanted to join their team. “I was hesitant at first because I’m the only non-stats major in the team, but I encouraged myself to get out of my comfort zone,” Chuaylua said.

“I learned so much from everyone since we all have different expertise and skills that we contributed to the discussion,” said Shen, whose teammates were majors in statistics, computer science and engineering. Students majoring in math, economics and biology were also well represented.

At the end, each team was allowed four minutes and at most three slides to present their findings to a panel of judges. Prizes were awarded in several categories, including “best insight,” “best visualization” and “best use of outside data.”

Duke is among more than 30 schools hosting similar events this year, coordinated by the American Statistical Association (ASA). The winning presentations and mystery data source will be posted on the DataFest website in May after all events are over.

The registration deadline for the next Duke DataFest will be March 2018.

Bleary-eyed contestants pose for a group photo at Duke DataFest 2017. Photo by Loreanne Oh.

Post by Robin Smith

Creating Technology That Understands Human Emotions

“If you – as a human – want to know how somebody feels, for what might you look?” Professor Shaundra Daily asked the audience during an ECE seminar last week.

“Facial expressions.”
“Body Language.”
“Tone of voice.”
“They could tell you!”

Over 50 students and faculty gathered over cookies and fruit for Dr. Daily’s talk on designing applications to support personal growth. Dr. Daily is an Associate Professor in the Department of Computer and Information Science and Engineering at the University of Florida whose interests include affective computing and STEM education.

Dr. Daily explaining the various types of devices used to analyze people’s feelings and emotions. For example, pressure sensors on a computer mouse helped measure the frustration of participants as they filled out an online form.

Affective Computing

The visual and auditory cues proposed above give a human clues about the emotions of another human. Can we use technology to better understand our mental state? Is it possible to develop software applications that can play a role in supporting emotional self-awareness and empathy development?

Until recently, technologists largely ignored emotion in understanding human learning and communication processes, partly because it has been misunderstood and hard to measure. Asking the questions above, affective computing researchers use pattern analysis, signal processing, and machine learning to extract affective information from signals that human beings express. This work is integral to restoring a proper balance between emotion and cognition in designing technologies to address human needs.

Dr. Daily and her group of researchers used skin conductance as a measure of engagement and memory stimulation. Skin conductance, a measure of sweat secretion from the sweat glands, changes in response to arousal. For example, a nervous person produces more sweat than a sleeping or calm individual, resulting in an increase in skin conductance.

Galvactivators, devices that sense and communicate skin conductivity, are often placed on the palms, which have a high density of eccrine sweat glands.

Applying this knowledge to the field of education, can we give a teacher physiologically-based information on student engagement during class lectures? Dr. Daily initiated Project EngageMe by placing galvactivators like the one in the picture above on the palms of students in a college classroom. Professors were able to use the results chart to reflect on different parts and types of lectures based on the responses from the class as a whole, as well as analyze specific students to better understand the effects of their teaching methods.
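The post does not detail how EngageMe built its class-wide chart. A minimal sketch, assuming each student’s galvactivator produces readings sampled at the same times: standardize each trace (z-scores) so that students with naturally higher or lower conductance become comparable, then average across the class at each time step. The function names and the z-score choice are assumptions.

```python
from statistics import mean, pstdev


def zscore(signal):
    """Standardize one student's skin-conductance trace to mean 0, sd 1."""
    mu, sd = mean(signal), pstdev(signal)
    if sd == 0:
        # A flat trace carries no information about change over time.
        return [0.0 for _ in signal]
    return [(x - mu) / sd for x in signal]


def class_arousal(traces):
    """Average the standardized traces across students at each time step.

    traces: dict mapping a student id to a list of readings (equal lengths).
    """
    standardized = [zscore(t) for t in traces.values()]
    return [mean(step) for step in zip(*standardized)]


traces = {"student_a": [1, 2, 3], "student_b": [10, 20, 30]}
curve = class_arousal(traces)
```

A peak in this averaged curve shows that many students were aroused at the same moment, but not why.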

Project EngageMe: Screenshot of digital prototype of the reading from the galvactivator of an individual student.

The project ended up causing quite a bit of controversy, however, due to privacy issues as well as the limits of our understanding of skin conductance. Skin conductance can increase for a variety of reasons: a student watching a funny video on Facebook might display similar levels of conductance as an attentive student. Thus, the results on the graph do not necessarily correspond to events in the classroom.

Educational Research

Daily’s research blends computational learning with social and emotional learning. Her projects encourage students to develop computational thinking by reflecting on their communities through digital storytelling in MIT’s Scratch, learning to use 3D printers and laser cutters, and expressing ideas using robotics and sensors attached to their bodies.

VENVI, Dr. Daily’s latest research project, uses dance to teach basic computational concepts. By allowing users to program a 3D virtual character that follows dance movements, VENVI reinforces important programming concepts such as step sequences, ‘for’ and ‘while’ loops of repeated moves, and functions with conditions that control when the character performs the steps the user has created!
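The post does not show VENVI’s own programming interface, so the sketch below uses plain Python to illustrate the same four concepts with a hypothetical dance routine: a sequence of steps, a `for` loop, a `while` loop, and a function with a condition. The move names, the `tempo` and `energy` parameters, and the list standing in for the animated character are all invented for illustration.

```python
def perform(routine, step):
    """Record one dance move (stands in for animating the 3D character)."""
    routine.append(step)


def chorus(routine, tempo):
    """A function with a condition: fast tempos get a spin, slow ones a sway."""
    if tempo > 120:
        perform(routine, "spin")
    else:
        perform(routine, "sway")


def build_routine(tempo, energy):
    routine = []
    # Sequence: steps run in the order they are written.
    perform(routine, "step-left")
    perform(routine, "step-right")
    # For loop: repeat a move a fixed number of times.
    for _ in range(3):
        perform(routine, "clap")
    # While loop: keep jumping until the character runs out of energy.
    while energy > 0:
        perform(routine, "jump")
        energy -= 1
    chorus(routine, tempo)
    return routine


fast = build_routine(tempo=130, energy=2)
```

Here `fast` is the list of eight moves the character would dance, ending with a spin because the tempo is above 120.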

Dr. Daily and her research group observed increased interest from students in pursuing STEM fields as well as a shift in their opinion of computer science. Drawings of computer scientists made on the first day of Dr. Daily’s Women in STEM camp primarily showed frazzled men coding in a small office, while those drawn after learning with VENVI included more women and showed people engaged in collaborative activities.

VENVI is a programming software that allows users to program a virtual character to perform a sequence of steps in a 3D virtual environment!

In human-to-human interactions, we are able to draw on our experiences to connect and empathize with each other. As robots and virtual machines take on increasing roles in our daily lives, it’s time to start designing emotionally intelligent devices that can learn to empathize with us as well.

Post by Anika Radiya-Dixit

Using the Statistics of Disorder to Unravel Real-World Chaos

What do election polls, hospital records, and the Syrian conflict have in common? How can a hospital use a patient’s vital signs to calculate their risk of cardiac arrest in real time?

Duke statistical science professor Rebecca Steorts

Statistician Rebecca Steorts is developing advanced data analysis methods to answer these questions and other pressing real-world problems. Her research has taken her from computer science to biostatistics and hospital care to human rights.

One major focus of Steorts’ research has been estimating death counts in the Syrian civil war. She is working with her research group at Duke and the Human Rights Data Analysis Group (https://hrdag.org/) on combining databases of death records into a single master list of deaths in the conflict, a task known as record linkage.

“The key problem of record linkage is this: you have this duplicated information, how do you remove it?” explained Steorts. For example, journalists from different organizations might independently record the same death in their databases. Those duplicates have to be removed before an accurate death toll can be determined.

At first glance, this might seem like an easy task. But typographic errors, missing information, and inconsistent record-keeping make hunting for duplicates a complex and time-consuming problem; a simple algorithm would require days to sort through all the records. So Steorts and her collaborators designed software to sift through the different databases using powerful machine learning techniques. In 2015, she was named one of MIT Technology Review’s 35 Innovators Under 35 for her work on the Syrian conflict. She credits a number of colleagues and students for their contributions to the project, including Anshumali Shrivastava (Rice University), Megan Price (HRDAG), Brenda Betancourt and Abbas Zaid (Duke University), Jeff Miller (Harvard Biostatistics, formerly Duke University), Hanna Wallach (Microsoft Research), and Giacomo Zanella (Bocconi University, and a visitor at Duke University in 2016).
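The post does not describe the team’s software beyond its use of machine learning. The sketch below is a much simpler, swapped-in technique that illustrates two ideas behind record linkage: fuzzy string comparison to tolerate typos, and "blocking," which only compares records that share a field (here, date of death) instead of checking all n(n-1)/2 pairs. The field names, the 0.85 threshold and the example names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lowercase and collapse whitespace so trivial formatting differences vanish."""
    return " ".join(name.lower().split())


def similar(a, b, threshold):
    """True if two records' names are close enough to count as a match."""
    ratio = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return ratio >= threshold


def link_records(records, threshold=0.85):
    """Cluster records that likely describe the same death.

    Records are compared only within blocks that share a date of death.
    Returns a sorted list of clusters, each a list of record indices.
    """
    parent = list(range(len(records)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    blocks = {}
    for i, rec in enumerate(records):
        blocks.setdefault(rec["date"], []).append(i)

    for members in blocks.values():
        for i, j in combinations(members, 2):
            if similar(records[i], records[j], threshold):
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


records = [
    {"name": "Ahmad Hassan", "date": "2013-05-01"},
    {"name": "Ahmed  Hassan", "date": "2013-05-01"},
    {"name": "Omar Khalil", "date": "2013-05-01"},
    {"name": "Ahmad Hassan", "date": "2014-01-09"},
]
clusters = link_records(records)
```

The first two records merge despite the spelling and spacing differences, while the identical name recorded on a different date stays separate, since blocking never compares them. Counting clusters rather than rows gives the deduplicated total.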

Steorts’ work towards estimating death counts in the Syrian conflict is still ongoing, but human rights isn’t the only field that she plans to study. “I think of my work as very interdisciplinary,” she said. “For me, it’s all about the applications.”

Recently, Steorts, colleague Ben Goldstein, and students Reuben McCreanor and Angie Shen have been applying statistical methods to medical data from the Duke healthcare system. Her ultimate goal is to find techniques that can be used for many different applications and data sets.

Guest post by Angela Deng, North Carolina School of Science and Math, Class of 2017

Mapping the Brain With Stories

Dr. Alex Huth. Image courtesy of The Gallant Lab.

On October 15, I attended a presentation on “Using Stories to Understand How The Brain Represents Words,” sponsored by the Franklin Humanities Institute and Neurohumanities Research Group and presented by Dr. Alex Huth. Dr. Huth is a neuroscience postdoc who works in the Gallant Lab at UC Berkeley and was here on behalf of Dr. Jack Gallant.

Dr. Huth started off the lecture by discussing how semantic tasks activate huge swaths of the cortex. The semantic system places importance on stories. The central question was understanding “how the brain represents words.”

To investigate this, the Gallant Lab designed a natural language experiment. Subjects lay in an fMRI scanner and listened to 72 hours’ worth of ten naturally spoken narratives, or stories. They heard many different words and concepts. Using an imaging technique called GE-EPI fMRI, the researchers were able to record BOLD responses from the whole brain.

Dr. Huth explaining the process of obtaining the new colored models that revealed semantic “maps are consistent across subjects.”

Dr. Huth showed a scan and said, “So looking…at this volume of 3D space, which is what you get from an fMRI scan…is actually not that useful to understanding how things are related across the surface of the cortex.” This limitation led the researchers to improve upon their methods by reconstructing the cortical surface and flattening it into a 2D image that reveals what is going on throughout the brain. This approach let them see where in the cortex brain activity was related to what the subject was hearing.

A model was then created that would require voxel interpretation, which “is hard and lots of work,” said Dr. Huth. “There’s a lot of subjectivity that goes into this.” To simplify voxel interpretation, the researchers used principal components analysis to reduce the model’s semantic space to a few dimensions and find classes of voxels. In other words, they took the data, found the important factors that were shared across subjects, and interpreted the meaning of those components. To visualize these components, the researchers sorted words into twelve different categories.
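As a rough sketch of the principal components step, assume the fitted model gives each voxel a vector of weights over semantic features, and that rows from all subjects are stacked into one matrix. PCA then finds the few directions in feature space that capture most of the variation across voxels. `semantic_pcs` and this matrix layout are assumptions; the post does not describe the lab’s code.

```python
import numpy as np


def semantic_pcs(weights, n_components=4):
    """Principal components of voxel weights via the SVD.

    weights: array of shape (n_voxels, n_features); each row is one voxel's
    model weights over the semantic features (rows pooled across subjects).
    Returns (components, scores, explained_variance_ratio).
    """
    centered = weights - weights.mean(axis=0)
    # Rows of vt are principal directions in feature space, ordered by variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T  # each voxel's position in PC space
    explained = (s ** 2) / np.sum(s ** 2)
    return components, scores, explained[:n_components]


# Synthetic check: voxels that vary mostly along the first feature.
rng = np.random.default_rng(0)
direction = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
weights = rng.normal(size=(200, 1)) * 10 * direction + rng.normal(size=(200, 5)) * 0.1
components, scores, explained = semantic_pcs(weights, n_components=2)
```

Interpreting a component then means asking which words load most strongly on it, which is where the word categories come in.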

The Four Categories of Words Sorted on X,Y-like Axes

These categories were then further simplified into four “areas” on what might resemble an x, y plane. Violent words were located in the top right, social perceptual words in the top left, words relating to “social” in the lower left, and emotional words in the lower right. Instead of x and y labels, the axes were labeled with principal components (PCs). The words from the study were then colored based on where they appeared in the PC space.
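A word’s place in this picture can be sketched as a projection: take a vector representing the word in the same feature space, project it onto the first two PCs, and read off the quadrant. The mapping of PC signs to the four labels below is an assumption (the post gives only their positions on the plot), and `quadrant_of` is a hypothetical helper.

```python
import numpy as np

# Quadrant labels as described in the talk; which PC sign means "top"
# or "right" is an assumption made for this sketch.
QUADRANTS = {
    (True, True): "violent",             # top right
    (False, True): "social perceptual",  # top left
    (False, False): "social",            # lower left
    (True, False): "emotional",          # lower right
}


def quadrant_of(word_vector, components):
    """Project a word vector onto PC1 (x) and PC2 (y) and name its quadrant."""
    x, y = components[:2] @ word_vector
    return QUADRANTS[(bool(x >= 0), bool(y >= 0))]


pcs = np.eye(3)  # toy components: PC1 and PC2 are the first two axes
label = quadrant_of(np.array([1.0, 2.0, 0.0]), pcs)
```

With the toy components, a word with positive PC1 and PC2 scores lands in the top-right quadrant; a plotting library could then color it accordingly.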

Using this model, the Gallant Lab could identify which patches of the brain were doing different things. Small patches of color showed which “things” the brain was “doing” or “relating.” The researchers found that the complex cortical maps of semantic information were consistent across subjects.

These responses were then used to create models that could predict BOLD responses from the semantic content in stories. The result of the study was that the parietal cortex, temporal cortex, and prefrontal cortex represent the semantics of narratives.
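The post does not say how these predictive models were fit. A common choice for voxelwise encoding models, assumed here, is ridge regression from the semantic features of the story at each time point to the BOLD response in each voxel, evaluated by correlating predictions with held-out data. This sketch omits details a real analysis would likely need, such as modeling the delay of the hemodynamic response; `fit_ridge` and `prediction_accuracy` are hypothetical names.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y.

    X: (n_timepoints, n_features) semantic features of the story at each scan.
    Y: (n_timepoints, n_voxels) measured BOLD responses.
    Returns W with shape (n_features, n_voxels).
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def prediction_accuracy(X_test, Y_test, W):
    """Per-voxel Pearson correlation between predicted and measured BOLD."""
    pred = X_test @ W
    pred_c = pred - pred.mean(axis=0)
    true_c = Y_test - Y_test.mean(axis=0)
    num = (pred_c * true_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0))
    return num / den


# Synthetic data: 300 time points, 10 features, 4 voxels.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
W_true = rng.normal(size=(10, 4))
Y = X @ W_true + 0.1 * rng.normal(size=(300, 4))
W = fit_ridge(X[:200], Y[:200], alpha=1.0)
r = prediction_accuracy(X[200:], Y[200:], W)
```

Voxels whose held-out correlation is high are the ones the semantic features explain well, which is how such a model can point to regions that represent meaning.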

Post by Meg Shieh

Does Digital Healthcare Work?

Wearable technologies like Fitbit have been shown to provide a short-term increase in physical activity, but long-term benefits are still unclear, even though recent studies on corporate wellness programs highlight the potential healthcare savings.

Luca Foschini, PhD is a co-founder and head of data science at Evidation Health, and a visiting research scientist at UCSB. Source: Network Science IGERT at UCSB.

To figure out the effects of these technologies on our health, we need ways to efficiently mine through the vast amounts of data and feedback that wearable devices constantly record.

As someone who has recently jumped on the Fitbit “band” wagon, I have often wondered about what happens with all the data collected from my wrist day after day, week after week.

Luca Foschini, a co-founder and head of data science at Evidation Health, recently spoke at Duke’s Genomic and Precision Medicine Forum where he explained how his company uses these massive datasets to analyze and predict how digital health interventions — Fitbits and beyond — can result in better health outcomes.

California-based Evidation Health uses real-life data collected, with users’ authorization, from 500,000-plus users of mobile health applications and devices. This mobile health or “mHealth” data is quickly becoming a focus of intense research interest because of its ability to provide such a wealth of information about an individual’s behavior.

Foschini and Evidation Health have taken the initiative to design and run clinical studies to show the healthcare field that digital technologies can be used for assessing patient health, behavioral habits, and medication adherence, just to name a few.

Foschini said that the benefits of mobile technologies could go far beyond answering questions about daily behavior and lifestyle to predicting health outcomes. This opens the door for “wearables and apps” to be used in the realm of behavior change intervention and preventative care.

Foschini speaks at Duke’s Genomic and Precision Medicine Forum

Foschini explains how data collected from thousands of individuals wearing digital health trackers was used to find associations between activity tracking patterns and weight loss.
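The post does not describe Evidation’s methods. As a hedged illustration of what "finding an association" can mean, the sketch below computes a Pearson correlation between a per-user tracking metric and weight change. Both `tracking_consistency` (the fraction of days with recorded steps) and the idea of pairing it with weight change are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def tracking_consistency(daily_steps):
    """Hypothetical metric: fraction of days with any recorded activity."""
    return sum(1 for s in daily_steps if s > 0) / len(daily_steps)
```

One would compute `tracking_consistency` for each user, pair it with that user’s weight change, and pass the two lists to `pearson`. A correlation like this shows association, not causation.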

Evidation Health is not only exploring data based on wearable technologies, but data within all aspects of digital health. For example, an interesting concept to consider is whether devices create an opportunity for faster clinical trials. So-called “virtual recruiting” of participants for clinical studies might use social media, email campaigns and online advertising, rather than traditional ads and fliers. Foschini said a study by his firm found this type of recruitment is up to twelve times faster than normal recruitment methods for clinical trials (Kumar et al. 2016).

While Foschini and others in his field are excited about the possibilities that mHealth provides for the betterment of healthcare, he acknowledges the hurdles standing in the way of this new approach. There is no standardization in how this type of data is gathered, and greater scrutiny is needed to ensure the reliability and accuracy of some of the apps and devices that supply the data.

Post by Amanda Cox
