Faculty Can Submit Proposals for Data+ 2019 Student Research Projects

Data+ RFP

Deadline: November 5, 2018

Data+ is a ten-week summer research experience for undergraduates interested in exploring data-driven approaches to interdisciplinary challenges.

Students join small teams and work alongside other teams in a communal environment. They learn how to marshal, analyze, and visualize data, while gaining broad exposure to the field of data science. In Summer 2018 there were 24 Data+ teams working together in Gross Hall.

Data+ is offered through the Information Initiative at Duke (iiD) and is part of the Bass Connections Information, Society & Culture theme. The program runs from mid-May until the end of July. During this time, students are required to contribute to the team full-time and may not take classes or have other employment.

Request for Proposals

We invite proposals for faculty-sponsored Data+ projects in Summer 2019. We are especially interested in proposals that involve a partner from outside the academy, or a faculty member from a different discipline. We also encourage proposals that involve previously untested ideas or unanalyzed datasets, and we hope that the Data+ team can make a contribution with important proof-of-principle work that may lead to more substantial faculty work and/or connections in the future. We also welcome proposals that will lead to the undergraduates creating tools that might be used in the classroom or that might facilitate community engagement with data and data-driven questions.

Opportunity to submit a joint proposal for a year-long Bass Connections project and a Summer 2019 Data+ project: Interested faculty may propose a Data+ project connected to a year-long Bass Connections project by completing the Bass Connections RFP (to be released on Sept. 4 and due Nov. 5). Please be prepared to articulate how you will connect the Data+ project with the year-long project. Please note that funding decisions will continue to be made by each program individually, so it is possible that your proposal may be accepted for only Data+ or only Bass Connections. Please contact Laura Howes if you have questions or want to discuss how other faculty have connected these experiences in the past.

Data+ Application Format

To apply, please prepare a document (three pages maximum) that responds to the following prompts, ideally in this order.

Summary: Please write a project summary, including the basic ideas behind the proposal.

Faculty leads: Data+ is especially interested in projects that connect faculty from different disciplines, as well as projects that enable faculty to branch out in new directions. Please describe the intended faculty leads and the expected benefits from their participation.

Mentoring: Day-to-day faculty involvement in Data+ is not expected. Instead, each Data+ project has a mentor, usually a graduate student or postdoc, who is on hand to give the student team more focused guidance. The time commitment tends to be five to seven hours per week, and funding is generally available to cover the mentor’s time.

If you have a mentor in mind, please indicate who this is and why s/he is well suited. If you do not, please describe the skills you would like this person to have (we are generally able to find faculty-mentor matches).

Goals: Describe the intended goals and products of the project, in the following manner:

  • Describe entirely reachable goals that you fully expect the students to achieve: these could be answers to a question, explorations of a hypothesis, or other things of that nature.
  • Describe a tangible product the students will create in the course of their research, which ideally will be of use both to further researchers at the university and to the students as something they can show off to future employers or graduate schools. This could be, for example, a good piece of well-commented software, or a visualization device, or a detailed curation of previously raw data.
  • Describe a more outrageous goal that you would be quite (pleasantly!) surprised to see the students achieve, along with a plan for them to build a potential roadmap toward that goal. For example, this goal might only be reachable if you had data that you currently do not have, and the students might build a speculative roadmap toward acquiring that data.

Data: Most Data+ projects involve analysis of datasets. Some of these are publicly available, and some are not. As it is essential that students be able to analyze the needed data for the project, we are very interested in plans to ensure that this will happen. Please address this in the following manner:

  • For each dataset that will be analyzed by the student team, please give a high-level description of the dataset (what’s in it, how was it collected and for which purpose, how large is it, etc.).
  • For each dataset, indicate whether you anticipate IRB approval will be needed for student access, and if not, why not. If IRB approval will be needed, indicate whether a protocol already exists and describe your plan for incorporating the student involvement. If it does not already exist, please describe your plan (including a timeline) for obtaining one.
  • For each dataset, indicate whether it is owned and/or is being provided by an outside party. If so, please describe the intended path toward ensuring that students will be granted the ability to access the dataset (we are often able to assist in crafting Data Use Agreements with outside parties, for example).

Outside partners: Some of the best Data+ projects have had a partner from outside the university. This might be someone who is invested in the data or the questions, and to whom the students will in essence deliver analysis and insight. Ideally, this partner will be able to come to Gross Hall two or three times during the summer to hear updates from the students and provide feedback.

For each such partner, please describe their expected interest in the project, how much they would interact with the team, whether or not they’d be able to contribute funds towards student stipends, and also identify a point of contact for this partner.

Deadline and Contact

The deadline for submitting this application is November 5, 2018, 5:00 p.m. Please email your completed application to Ariel Dawn. If you would like help in developing your proposal, please contact Paul Bendich.

Project on Color Vision of Shrimp Helps Biology Students See Data Science in New Light

Patrick Green and Eleanor Caves

We are all data scientists these days, to one degree or another. The ability to explore and analyze data helps us make sense of our world.

Duke’s Data Expeditions program aims to introduce more undergraduates to data science early in their college education. The Information Initiative at Duke (iiD), in partnership with the Social Science Research Institute (SSRI), supports pairs of graduate students to prepare a dataset for use in an existing undergraduate course.

Patrick Green teaching Data Expeditions

In one Data Expedition project, Exploring Cleaner Shrimp Color Vision Capabilities Using R, Biology doctoral students Eleanor Caves and Patrick Green teamed up with Professor Sönke Johnsen to pilot their approach in an introductory summer course called Sensory Systems. Green and his advisor Sheila Patek then adapted it for use in an upper-level lab course, Principles of Animal Physiology.

“Especially if classes have a lab component, getting students some experience with importing, analyzing, and plotting data can be invaluable,” said Caves. “I remember struggling with Excel to write my own lab reports in college, and if someone had just given me the tools to code, and then inspired me to use those tools for a couple of reports, I would have been so much more comfortable with different aspects of data analysis.”

“This is a critical tool for students to learn,” Green added, “whether they use data in their future careers or whether they’re just trying to understand the world around them as they, for example, vote and raise families.”

Cleaner shrimp working on a fish

Cleaner shrimp are crustaceans that provide handy cleaning services to reef fish by removing ectoparasites. The project’s aim was to investigate how cleaner shrimp perceive the color patterns of other cleaner shrimp and fish. Caves collected the data as part of her doctoral dissertation.

In the class, she and Green introduced the ecology of cleaner shrimp, asked the students to make predictions about color vision capability and taught coding sessions in R.

Along the way, both the undergraduates and the instructors faced challenges.

“What makes coding frustrating on an individual level translates into the classroom,” said Caves. “Typos and minor errors that can send coding errors back at you occur on the students’ computers too, and you have to be ready to troubleshoot on your feet.”

Patrick Green working with Data Expeditions students

“Similar to Eleanor, I learned that these activities move more slowly than we might expect,” noted Green. “It was incredibly useful to have ‘teachable moments’ when students hit error messages. Even if these errors were caused by simple misspellings, it allowed us to show students that this is normal and fixable – not an impassible roadblock. Because we coded in real-time along with the students, we were also able to showcase our own mistakes and humanize the process, something I think is useful for students to see.”

The students soon learned how to subset, index, plot, change the color and shape of data points, add best fit lines, change line width and type, and create smooth spectral sensitivity curves (which show how sensitive photoreceptors are across the visible spectrum of light).

Figure from Data Expeditions

At the end, they created a figure of spectral sensitivity for several individuals of the same species. They compared their results to their predictions and discussed how they might use their new skills to analyze data they’ll collect in future lab-based courses.

And they seemed to enjoy the process. Caves noted, “I’ve been pleasantly surprised at how attentive the students remain and how engaged they seem the whole time.”

“It never occurred to me that I would need to learn how to code,” wrote one student in an end-of-class reflection, “but I am glad that I get to learn this.” Another student wrote, “It was actually easier than I expected, since coding seems so out of reach when you don’t know what is happening or what the terms mean. I could definitely use R in the future for projects where I am required to use data.”

At the end of the day, coding gives students a deeper understanding of data to solve real-world problems. “It gives students, even those who won’t go on to do research of their own, a respect for the scientific process, how we analyze our data, and where results come from, so that hopefully they can be more informed citizens and interpreters of the overwhelming number of facts they’re exposed to every day,” said Caves.

Eleanor Caves and Patrick Green with their advisors

Both Caves and Green received the Dean’s Award for Excellence in Mentoring from The Graduate School. They graduated this spring and are now postdoctoral researchers in Duke’s Biology Department with the Nowicki Lab.

“I have been surprised to learn during my Ph.D. that I can code, and that I am somewhat good at it,” Green reflected. “This has taken lots of trial and error, but I am motivated to continue learning and developing these skills in my research. Being able to use the same skills in my teaching is something that expands my teaching abilities and, I hope, will improve my ability to reach new generations of students.”

See other Data Expeditions projects and learn about a new program at Duke called Archival Expeditions. Photos at top and bottom courtesy of The Graduate School; other photos courtesy of Eleanor Caves and Patrick Green.

Meet the Energy Data Analytics Ph.D. Student Fellows for 2018-19

Energy Data Analytics Ph.D. Student Fellows

The growth of energy-related data in the last decade has created new opportunities for data-driven exploration of solutions to energy problems. Capitalizing on the opportunities presented by this wealth of data will require scholars with training in both data science and energy application domains. Yet traditional graduate education is limited in its ability to provide such dual expertise.

That’s why the Duke University Energy Initiative established the Energy Data Analytics Ph.D. Student Fellows program, preparing cohorts of next-generation scholars to deftly wield data in pursuit of accessible, affordable, reliable, and clean energy systems. 

Each fellow will conduct a related research project for nine months, working with faculty from multiple disciplines. In addition to funding equivalent to one half of a full fellowship for an academic year, fellows will receive conference travel support and data acquisition support up to $2,000, as well as priority access to virtual machines, storage, and other computational resources. The scholarship of the first two cohorts of fellows will be highlighted at a symposium at Duke University in Spring 2020.

The program, which will support a cohort of four fellows in 2018-2019 and a second cohort of four in 2019-2020, is affiliated with the Energy Data Analytics Lab, a collaborative effort of the Duke University Energy Initiative (which houses it), the Information Initiative at Duke (iiD), and the Social Science Research Institute (SSRI). The fellows program is funded by a grant from the Alfred P. Sloan Foundation. (Note: Conclusions reached or positions taken by researchers or other grantees represent the views of the grantees themselves and not those of the Alfred P. Sloan Foundation or its trustees, officers, or staff.)

Bohao Huang

Bohao Huang is a Ph.D. student in Electrical & Computer Engineering at Duke’s Pratt School of Engineering. He is part of the Applied Machine Learning Lab at Duke. He focuses on the translation of advanced machine learning techniques into practical solutions for challenging real-world problems.

Energy security is vital to the prosperity and sustainability of modern societies. Ensuring energy security relies upon effective decision-making and energy systems modeling, a crucial component of which is access to high-quality energy systems information. Unfortunately, such information is often of limited availability, incomplete, or difficult to access because it is proprietary. Aerial imagery (e.g., color satellite imagery) is increasingly cheap and abundant, and may provide a rich source of energy systems information, but extracting useful information from such imagery is costly. I propose to leverage recent advances in deep learning to develop algorithms that can automatically extract useful energy systems information from large volumes of aerial imagery, potentially yielding a powerful and scalable new source of such information.

Qingran Li

Qingran Li is a Ph.D. student in the University Program in Environmental Policy (economics track) offered jointly by Duke’s Nicholas School of the Environment and Sanford School of Public Policy. Her research includes using analytical tools to understand behavioral responses to policies and developing interdisciplinary solutions to energy and environmental issues.

Residential electricity consumption is an important indicator of household characteristics, but it is often held confidential by utilities and seldom reported by publicly available energy surveys. Missing such information significantly constrains our ability to answer important policy questions. My project targets a big question: How can we estimate residential electricity demand more precisely? Using the smart meter data set from an Irish CER trial project and the national time use survey, this project aims at correcting the estimation bias from behavioral and policy-related factors which are often overlooked in the conventional engineering and statistical models. A new algorithm will be developed to identify residential usage patterns with additional information provided by behavioral surveys so that information lost from inadequate load samplings can be compensated.

Edgar Virguez

Edgar Virguez is a student in the doctoral program in Environment at Duke’s Nicholas School of the Environment. He is interested in contributing to the understanding of market mechanisms that facilitate the integration of variable energy resources. He holds a BS in environmental engineering, a BS in chemical engineering, and an MS in environmental engineering. During the last decade, he has worked with several institutions (e.g., Universidad de los Andes, World Bank, Inter-American Development Bank) promoting the adoption of cleaner fuels in transport and industry throughout Latin America.

My project aims to design quantitative tools supporting the process of assessing policy and market approaches, promoting an increased penetration of variable energy resources in the energy matrix. This assessment will be performed based on the economic, reliability, and environmental dimensions of the electric power system, accounting for the benefits of reduced fuel use and emissions, and for the increased capital costs of renewables and the necessary re-dispatching of conventional generators.

Tianyu Wang

Tianyu Wang is a Ph.D. student in Computer Science at Duke. Before coming to Duke, he obtained a B.S. in mathematics and computer science from the Hong Kong University of Science and Technology. His general research interests are in machine learning and applications of machine learning algorithms.

With the accumulation of energy data, energy domain problems are in need of more sophisticated machine learning models such as neural networks. However, building neural networks of high quality requires heavy human effort, since different hyperparameter configurations lead to significantly different performances. My project will use a multi-armed bandit approach to efficiently design the architecture of neural networks for energy domain problems such as energy demand prediction.
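To give a rough sense of what a multi-armed bandit approach to this kind of search can look like, here is a minimal sketch using UCB1, one common bandit strategy. It is an illustration only and is not taken from the fellow's project: the candidate configurations, the `simulated_validation_score` function (a made-up stand-in for training a network and scoring it on held-out data), and the evaluation budget are all assumptions.

```python
import math
import random

# Hypothetical candidate configurations; each one is an "arm" of the bandit.
CONFIGS = [
    {"hidden_units": 16},
    {"hidden_units": 64},
    {"hidden_units": 256},
]


def simulated_validation_score(config, rng):
    """Made-up stand-in for a noisy train-and-validate run, scored in [0, 1]."""
    true_quality = {16: 0.3, 64: 0.8, 256: 0.5}[config["hidden_units"]]
    noisy = true_quality + rng.gauss(0.0, 0.1)
    return min(1.0, max(0.0, noisy))


def ucb1_select(configs, evaluate, budget, seed=0):
    """Spend `budget` noisy evaluations across `configs` using UCB1.

    Returns the most frequently evaluated configuration and the
    number of evaluations each configuration received.
    """
    rng = random.Random(seed)
    counts = [0] * len(configs)
    totals = [0.0] * len(configs)
    for t in range(1, budget + 1):
        if t <= len(configs):
            arm = t - 1  # evaluate every configuration once first
        else:
            # Pick the arm with the highest mean score plus exploration bonus.
            arm = max(
                range(len(configs)),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        counts[arm] += 1
        totals[arm] += evaluate(configs[arm], rng)
    most_pulled = max(range(len(configs)), key=lambda i: counts[i])
    return configs[most_pulled], counts


best, counts = ucb1_select(CONFIGS, simulated_validation_score, budget=300)
print(best, counts)
```

The point of the sketch is that the evaluation budget is not split evenly: configurations that keep scoring well are evaluated more often, while weaker ones are tried just often enough to rule them out.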

Excerpted from Energy Initiative Announces Its Inaugural Cohort of Energy Data Analytics PhD Student Fellows on the Duke University Energy Initiative website

Energy Research Seed Fund Awards Eight Grants to Duke Faculty for 2018-2019

Energy Research Seed Fund

Research projects that explore advances in energy materials, novel perspectives on resilience and sustainability, and energy storage solutions will receive funding in 2018 from the Duke University Energy Initiative’s Energy Research Seed Fund.

The program will award eight grants to projects involving 21 faculty members from four Duke schools, investing a total of $336,956 in promising new energy research.

The Energy Initiative—Duke’s interdisciplinary hub for energy education, research, and engagement—expanded its program this year in response to faculty feedback, offering three distinct grant categories of research funding:

  • Seed grants of up to $45,000, intended to provide a financial head start for new multi-disciplinary, collaborative research teams, enabling them to produce preliminary results that may help them obtain future external funding
  • Stage-two grants of up to $35,000 to carry projects currently supported by Energy Initiative seed funding into their next research phase
  • Proposal development grants of up to $25,000 for past seed fund recipients to develop proposals for external funding

In this—the fifth annual round of funding—the Energy Initiative awarded six seed fund grants and one grant in each of the two new categories. The Initiative also increased the maximum requested amount for seed fund grants by $5,000.

The first three rounds of funding from the Energy Research Seed Fund totaled $667,000. As of fall 2017, those rounds had generated more than five times their value in follow-on awards for Duke research.

“Five years in, this program continues to deliver a remarkable return on investment for Duke University,” notes Energy Initiative director Brian Murray. “And faculty tell us that it’s sparking them to tap into colleagues’ expertise across disciplines. This year we wanted to continue to catalyze these collaborations organically, but we also wanted to invest in next-stage efforts more specifically focused on enhancing external funding potential. This new approach seemed to strike a chord, as the number of proposals were roughly double last year’s number. Narrowing that pool was truly a challenge for our reviewers, but this is a good challenge to have.”

The 2018 round of awards is co-funded by the Energy Initiative, the Office of the Provost, Trinity College of Arts & Sciences, Pratt School of Engineering, and the Information Initiative at Duke (iiD).

Funded Projects in 2018-2019

Seed Grants

So the Dam Doesn’t Break: Understanding Mini-Grid Infrastructure Sustainability in Nepal. Building on previous research on mini-grids, energy transitions, and public infrastructure, this project by Robyn Meeks (Sanford School of Public Policy), Dalia Patiño-Echeverri (Nicholas School of the Environment), Subhrendu Pattanayak (Sanford School of Public Policy), and Erik Wibbels (Political Science, Trinity College of Arts & Sciences) will examine how propositions from engineering, new institutional economics, and public finance could help explain variations in mini-grid success in Nepal. In the process, the team will produce new insights into solving the “infrastructure quality trap” in mini-grids, which experts see as essential to providing universal energy access by 2030.

Investigating the Stability of Promising Earth Abundant-Based Photoelectrochemical Energy Materials. Labs led by Jeffrey Glass (Electrical & Computer Engineering, Pratt School of Engineering), David Mitzi (Mechanical Engineering & Materials Science, Pratt School of Engineering), and Edgard Ngaboyamahina (Electrical & Computer Engineering, Pratt School of Engineering) will explore how a new earth-abundant material developed at Duke can improve the efficiency, cost-effectiveness, and stability of cathodes used to turn water into hydrogen gas that can be stored and used as fuel.

Enabling Better Energy Decisions through Better Interpretable Causal Inference Methods for Personalized Treatment Effects. Causal inference methodology has become an essential tool for determining energy policy and understanding energy usage dependencies. This project by Cynthia Rudin (Computer Science, Trinity College of Arts & Sciences), Sudeepa Roy (Computer Science, Trinity College of Arts & Sciences), and Alexander Volfovsky (Statistical Science, Trinity College of Arts & Sciences) will improve our ability to make policy decisions and understand energy use by making causal inference methods inspired by machine learning more flexible, scalable, and accurate.

Determining the Second-Life Potential of Used Data-Center Batteries. Backup batteries used in U.S. data centers may be a massive untapped resource for energy storage. They are lightly used, well maintained, and generally recycled long before their service life has ended. This wide-ranging collaboration from Lincoln Pratson (Nicholas School of the Environment), Josiah Knight (Mechanical Engineering & Materials Science, Pratt School of Engineering), Jim Gaston (Pratt School of Engineering), David Schaad (Civil & Environmental Engineering, Pratt School of Engineering), John Robinson (Nicholas School of the Environment), Dalia Patiño-Echeverri (Nicholas School of the Environment), Martin Brooke (Electrical & Computer Engineering, Pratt School of Engineering), and Casey Collins (Facilities) will test the storage capabilities of these used batteries to determine whether they may be suitable for applications such as storing intermittent renewable energy.

Increasing the Efficiency and Power Density of Redox Flow Batteries with Metal Nanowire Flow-Through Electrodes. Labs led by Benjamin Wiley (Chemistry, Trinity College of Arts & Sciences) and Jeffrey Glass (Electrical & Computer Engineering, Pratt School of Engineering) will improve the power density and reduce the cost of redox flow batteries through a new flow-through electrode made from copper nanowires. The electrode will have the same permeability as graphite felt, but will be 2,000 times more conductive and less expensive.

High-Temperature Photocatalytic Reactions on Plasmonic Materials. Teams led by Jie Liu (Chemistry, Trinity College of Arts & Sciences) and Nico Hotz (Mechanical Engineering & Materials Science, Pratt School of Engineering) will develop and study a simultaneously plasmonic and catalytic material that can be used for high-temperature photocatalysis, a critical process in chemical conversion, fuel and electrical energy production, and pollution mitigation.

Stage-Two Grants

Compositional Engineering for High-Performance Perovskite Photovoltaics with Simplified Device Structure. David Mitzi (Mechanical Engineering & Materials Science, Pratt School of Engineering) and Jie Liu (Chemistry, Trinity College of Arts & Sciences) will build on prior research on perovskite solar cells.

Proposal Development Grants

Proposal to develop a 2018 Energy Frontier Research Center at Duke University. Michael Therien (Chemistry, Trinity College of Arts & Sciences) and David Beratan (Chemistry, Trinity College of Arts & Sciences) received funding to support the writing of three Energy Frontier Research Center (Department of Energy) grant proposals.


Have questions about the Energy Research Seed Fund? Contact Jonathon Free at the Energy Initiative.

Want to give in support of innovative energy research at Duke? Give online or contact Sarah Weissberg at Duke Development (sarah.weissberg@duke.edu, (919) 684-3838).

Originally posted on the Duke University Energy Initiative website

On the Way to a Doctorate in Ecology, He Found a Passion for Telling Stories with Data

Matt Ross and Data+ team

As an undergraduate majoring in environmental sciences, Tess Harper didn’t have a strong background in computer science when she began her ten-week Data+ summer project. Neither did her teammate, Molly Rosenstein. Their mission was to develop interactive data applications for use in Environmental Science 101, a Duke course taught by Rebecca Vidra.

Through the mentorship of doctoral students Matt Ross and Aaron Berdanier, Harper and Rosenstein were able to create six web-based apps exploring climate change, sea level rise, biodiversity, solar power, watershed hydrology, and mountaintop mining.

“Getting our first app up and running was really exciting,” Harper said. Even without a coding background, “we’ve been able to hold our own.”

Ross and Rosenstein transferred their knowledge to a new cohort of Data+ students. “After hearing Matt Ross present about his project last year creating interactive Shiny apps,” said undergraduate Kelsey Sumner, “we started an application under the mentorship of Matt and Molly Rosenstein. With their help, my team was able to create an interactive Shiny app within a week.”

“Matt Ross has been enormously influential within Data+, teaching a whole generation of students about R Shiny apps,” said Robert Calderbank, Director of the Information Initiative at Duke (iiD), which runs Data+ as part of the Bass Connections Information, Society & Culture theme. “When I think of graduate students as undergraduate mentors, I think of Matt.”

Ross has accepted a position as Assistant Professor of Ecosystem Science and Sustainability at Colorado State University. Below, he shares the role Data+ played in his journey as a scholar and mentor.


Matt Ross, Ph.D. in Ecology ’17

I started as a graduate student at Duke in the Fall of 2011, graduating in the summer of 2017 with a degree in Ecology advised by Drs. Martin Doyle and Emily Bernhardt. My dissertation was titled “Linking Topographic, Hydrologic, and Biogeochemical Change in Human Dominated Landscapes.”

I came to Duke to work with Emily and Martin because they are both experts in developing science to better understand how people reshape, restore, and alter ecosystems, specifically rivers and streams. As an undergraduate at CU Boulder, I was always interested in better understanding how these novel ecosystems–that mankind intentionally and unintentionally creates–work, and Emily and Martin provided an excellent environment to work on these ideas.

I eventually ended up studying the impacts of mountaintop mining for coal, working in some of the most altered ecosystems in the world, where hundreds of meters of bedrock are removed to access shallow coal seams, before rebuilding landscapes from the rubble piles that remain once mining operations cease. This work was done with Emily, Dr. Brian McGlynn (another one of my advisers) and Fabian Nippgen (a post-doc mentor).

Sea level trends

The approach we used to better understand these man-made ecosystems was extremely data rich, drawing on geospatial, sensor, hydrologic, and biogeochemical data. However, when I started graduate school I had zero experience with data, programming, or statistics, and really no practical knowledge of analyzing data, especially such complex streams of data.

As I worked through graduate school, I developed a passion for programming fostered by friendships with incredibly talented friends and coding mentors (specifically Kris Voss, Matt Kwit, and Aaron Berdanier). Aaron Berdanier, a previous Ph.D. student in Jim Clark’s lab, told me about a Data+ project he had received funding for and asked if I wanted to join. At the time, I was grappling with how to visualize and share the complex datasets from my Ph.D. and I knew Aaron wanted to teach undergraduates how to use Shiny R (a platform for visualizing and sharing interactive data), so I enthusiastically joined.

Winter loses its cool

Aaron and I ended up spending a significant portion of our summer working with Molly Rosenstein and Tess Harper on a variety of visualizations for an introductory environmental science course taught by Dr. Rebecca Vidra (see the final website).

Molly and Tess are very talented and picked up programming super fast, and it was really inspiring to be around such smart and dedicated people. Aaron was really the lead on the project, but we both learned a lot that summer, and afterwards we both became strong advocates for data visualization using Shiny R. We ended up teaching seminars to the Data Viz lab, Nicholas School, Data+ and others based on our experience teaching Molly and Tess. The central lesson from this experience was that programming and data visualization can be approachable and rapidly incorporated into any research approach.

Ecosystem for data visualization

These positive experiences led me to be a part of another Data+ project in the summer of 2017 with Richard Marinos as the lead. As with the first project, we worked with an incredibly talented group of undergraduates (Annie Lott, Camilla Vargas, and Devri Adams), and produced a website visualizing long-term ecosystem datasets.

All in all, my experience with Data+ made me realize how rapidly students can learn intimidating skills (programming, statistics, data analysis) with significant motivation (summer job) and consistent mentoring. All of our undergraduate students, for both years, had little experience with programming before our projects and within 10 weeks had made complex, interesting, and beautiful data visualizations that tell important environmental stories.

Snake River watershed

I think any environmental scientist–undergraduate, graduate, faculty, or otherwise–can use these powerful tools to tell compelling stories about our work. I took this passion for data and data visualization to my first faculty interview at Colorado State University for a position as a professor of water quality. During my interview I used a Shiny R app to teach students about water quality in Colorado. It was a gratifying experience to see students’ enthusiasm not just for the subject material, but also for how one could produce these kinds of visualizations.

Without Data+ I likely would not have had the confidence to use Shiny R in my teaching interview. Furthermore, I know it was important to the interviewing committee that I had 20 weeks of dedicated mentorship experience with undergraduates. Together, these influences of Data+ helped me land the position at CSU, where I will start next year. There, I will continue to work on issues around water quality, but also share my passion for data science and visualization.

Learn More

  • Watch a video about Ross’s second Data+ team.

Photo at top: Matt Ross (bottom right) with team members from the Data+ project Interactive Environmental Data Applications

Graduate Students Can Submit Proposals for Data Expeditions in Undergraduate Courses

Data Expeditions

Deadline: June 1, 2018

The purpose of this call is to introduce more undergraduate students to exploratory data analysis early in their Duke experience, and to involve graduate students in thinking about the way classes can interact with data. Our hope is that expeditions will encourage students to be more adventurous in exploring the Duke curriculum, and that students with deeper skills will be capable of deeper insights.

What Are Expeditions?

The Information Initiative at Duke (iiD), in partnership with the Social Science Research Institute (SSRI), will support pairs of graduate students to prepare a data set for use in an undergraduate class and then assist the faculty instructor by supervising the data expedition within the class. Another useful approach is to prepare several data sets for use in illustrating the ideas behind a particular data analysis technique.

Graduate students who participate receive a tax-free grant of $1,500 for academic-related travel (such as conferences or workshops) or computers/technical equipment for research. The funds are not available until the course is complete and all materials have been submitted to Ariel Dawn (ariel.dawn@duke.edu) in the Information Initiative. These materials add to our undergraduate curriculum through expeditions, and we reciprocate by investing in intellectual development.

How Are Expeditions Organized and Funded?

iiD provides resources, SSRI stores the data sets for later use, and representatives from affiliated departments provide direction. Departments interested in participating are encouraged to contact the Data Expeditions Director Paul Bendich (bendich@math.duke.edu).

Application Process

Applications will be reviewed by a faculty committee; those received by June 1, 2018, will receive full consideration, and funding decisions will be made by July 1, 2018.

Graduate students are encouraged to contact Paul Bendich (bendich@math.duke.edu) for help in developing ideas.

We particularly encourage exploration of data sets that bring different intellectual communities together. We place a special emphasis on expeditions that can be used in the introductory undergraduate classroom, as well as those that can be easily adapted for use in multiple pedagogical scenarios.

Application Details

Email Ariel Dawn (ariel.dawn@duke.edu) a PDF, at most 2 pages, with the following information:

  • Sponsoring faculty member and target undergraduate class
  • Title of data set(s)
  • Description: A brief data description that includes (at least) the following information:
    • One two-sentence description of data file
    • Source(s) where the data come from (we greatly prefer data that can be made public without restrictions!)
    • Why the data were collected in the first place
    • How the data set was put together
    • Dimensions of the data set
  • Potential classroom exercises: List of potential questions that can be explored using this data set, and description of pathways toward answers the students can take
  • Techniques: List of computational techniques – this is an opportunity to ask for access to a virtual machine that comes pre-loaded with different software packages
  • Source(s): Properly formatted citation of data source(s)

Expeditions recommended for an award will be asked to provide a Markdown or HTML document that contains, in addition to the information listed above:

  • List of variables: A list of the variable names and brief description for each (with hyperlinks to Codebook below)
  • Codebook: Description of each variable and its values

Learn More

See the iiD website.

Information Initiative at Duke Invites Students to Apply for Data+ Summer Program

Data+

Deadline: February 24, 2018

Student applications are now open for this summer’s Data+ research program, which runs from May 29 through August 3. Applications are reviewed on a rolling basis through February 24.

Undergraduates and master’s students interested in exploring new data-driven approaches to interdisciplinary challenges can apply to join small teams and learn how to marshal, analyze and visualize data, while gaining broad exposure to the field of data science.

Each student receives a $5,000 stipend for this full-time research experience. Participants are not allowed to enroll in classes or accept employment during Data+. The program is open to students at all levels and from all majors.

Each team has two to three undergraduates (and occasionally one master’s student) and one to two doctoral student (or postdoc) mentors, in addition to a client or sponsor. Teams work alongside each other in a communal environment, learning from each other.

Data+ is offered through the Information Initiative at Duke and is part of the Bass Connections Information, Society & Culture theme.

Data+ Projects for Summer 2018

Duke Wireless Data

Data-driven Improvement of Datacenter Performance

Smartphones and the Sixth Vital Sign

Data and Technology for Fact-checking

Gerrymandering and the Extent of Democracy in America

Energy Infrastructure Map of the World

Social Determinants of Health

Vaccine Hesitancy and Uptake

Improving the Machine Learning Pipeline at Duke

Rare Metabolic Diseases

Complex Decisions, Real Numbers: Medical Decision-making

Women’s Spaces

Visualizing the Lives of Orphaned and Separated Children

How Do We Build and Grow a PTA?

Pirating Texts

Big Data for Reproductive Health

Deep Learning for Single Cell Analysis

Poverty in Writing and Images

Analytical Exploration for Duke Development

Mental Health Interventions by the Durham Police

Data+ Project Fair

All are welcome to attend the Data+ Project Fair on Tuesday, January 16, 3:00 to 5:00 p.m., in the Ahmadieh Atrium on the third floor of Gross Hall.

Faculty Can Propose Interdisciplinary Data+ Projects for Summer 2018

Data+

Deadline: November 1, 2017

Data+ is offered through the Information Initiative at Duke and is part of the Bass Connections Information, Society & Culture theme.

Overview

Data+ is a ten-week summer research experience that welcomes Duke undergraduates interested in exploring new data-driven approaches to interdisciplinary challenges.

Students join small project teams, working alongside other teams in a communal environment. They learn how to marshal, analyze and visualize data, while gaining broad exposure to the modern world of data science. In Summer 2017 there were 25 such teams, and they all worked together in Gross Hall, sitting in dedicated workspace provided by the Information Initiative at Duke (iiD), the Social Science Research Institute (SSRI), the Energy Initiative and the Foundry. Each undergraduate participant receives a $5,000 stipend. Each summer Data+ runs from mid-May until the end of July. As the communal atmosphere is essential for student success, please note that Data+ projects only run during these ten weeks, and that all student participants are required to contribute full-time efforts (no employment, no other classes).

This is a call for proposals for faculty-sponsored Data+ projects in the Summer 2018 edition of Data+. We are especially interested in proposals that involve a partner from outside the academy, or a faculty member from a different discipline. We also encourage proposals that involve previously untested ideas or unanalyzed datasets, and we hope that the Data+ team can make a contribution with important proof-of-principle work that may lead to more substantial faculty work and/or connections in the future. We also welcome proposals that will lead to the undergraduates creating tools that might be used in the classroom or that might facilitate community engagement with data and data-driven questions.

Proposals should be emailed to Kathy Peterson by November 1, 2017.

If you would like help in developing your proposal, please contact Paul Bendich.

Proposal Specifics

Please limit your proposal to three pages. Every proposal will be different, but here are some issues that would be good to address in separate sections. It’s also probably best to begin with a short background description for general context.

Project Goals

Give a description of the goals for the summer project. If possible, this should come in three parts:

  • An entirely reachable goal that you fully expect the students to achieve: this could be an answer to a question, an exploration of a hypothesis, things of that nature. It would be best to give time-based specifics here: for example, by week 3 they should have learned X and produced Y, by week 6 they should extend this to Z, and so forth.
  • A tangible product the students will create in the course of their research, which ideally will be of use both to further researchers at the university and to the students as something they can show off to future employers or graduate schools. This could be, for example, a good piece of well-commented software, a visualization device or a detailed curation of previously raw data.
  • A more outrageous goal that you would be quite (pleasantly!) surprised to see the students achieve, along with a plan for them to build a potential roadmap toward that goal. For example, this goal might only be reachable if you had data that you currently do not have, and the students might build a speculative roadmap toward acquiring that data (write a mock grant, a mock IRB application and so forth).

Dataset Description

Please give a brief description of the dataset(s) you wish to have the undergraduate students work with. You should make sure to cover the following points:

  • What the data are about: A high-level description
  • What’s actually in the dataset: A lower-level description
  • Access issues: What is the plan to ensure that the students will be able to access the data before the beginning of Data+?
  • Privacy/ownership issues: Are there data sensitivity issues at play? Is an IRB protocol needed, and if so, what is the plan to obtain one? If the dataset is owned by an outside party, how will a data use agreement (DUA) be negotiated?

Partner Description

Some of the best Data+ projects have a partner from outside of the university, or at least from outside the traditionally academic parts of the university. This might be someone who is invested in the data or the questions, and to whom the students will in essence “deliver a product.” Ideally, this partner will be able to come to campus two or three times during the summer to hear updates from your students and provide feedback. If this paradigm makes sense in your project, please give:

  • A short description of the partner
  • A short description of their interest in the problem
  • An estimate about how much funding the partner might be able to contribute
  • A plan for student-partner engagement
    • Name and title of main point-of-contact
    • How often and in what context they might meet with the students

Day-to-day Mentoring

Day-to-day faculty involvement in Data+ is not expected. Instead, each Data+ project has a mentor, usually a graduate student or postdoc, who is on hand to give the student team more focused guidance. The time commitment tends to be 5-7 hours per week, and funding is generally available to cover this person’s time.

  • If you would like to involve a student or postdoc from your own group, please give this person’s name and contact information.
  • If you would like us to provide a mentor, please list skills you would like this person to have.

Software Needs

Please describe what types of software the students will need to complete the summer project. Bear in mind that much of their work will take place on their laptops, but that there are of course many remote login options.

Skills Needed

In order to help our recruiting efforts, please list skills that students will need to make reasonable progress on this project. You may want to divide these up into essential and desirable. Bear in mind that you have 2-3 students working together in your group, and your group will itself be in a working environment with around 15 other groups, and so they will be motivated and able to learn skills from each other.