
Upcoming +DS opportunities

+Data Science (+DS) is a Duke-wide program, operating in partnership with departments, schools, and institutes to enable faculty, students, and staff to employ data science at a level tailored to their needs, level of expertise, and interests. For more information, please visit our website at

bespokeDS: Effective Data Visualization

Monday, September 28 | 2:00-3:00 PM

Matthew Hirschey and Cédric Scherer

Attendance for this September 28 session will fulfill the RCR-200 (Responsible Conduct of Research) requirement for Duke faculty and staff.

Data visualization is part art and part science. A data visualization has to convey the data accurately, but it should also be aesthetically pleasing. Great visual presentations of data enhance the message and lead to deeper understanding of the underlying data. In this session, Matthew Hirschey will speak with data visualization expert Cédric Scherer about his journey into data visualization, discuss important principles for making figures, and review a recent example of turning a poor visual into a great one. We will use R, ggplot2, and principles of graphic design to dive into beautiful and truthful visualizations of data. This session is co-hosted by bespokeDS and Duke+DataScience. Register at

COVID-19 and the Telehealth Transformation: Insights into MyChart using Natural Language Processing

Tuesday, September 29 | 12:00-1:00 PM

Jedrek Wosik and Shijing Si

This session is open to all, and we especially encourage clinicians and members of the health system to join us.

COVID-19 has led to the rapid adoption of telehealth strategies to maintain continuity of care. Compared with in-person visits, telephone and video visits showed important changes in patient characteristics as well as in clinician ordering patterns. In addition, MyChart patient portal usage increased dramatically. We present select initial Duke clinic utilization data from before and during COVID-19. To better understand the growing volume of unstructured MyChart messages, we apply both unsupervised and supervised machine learning tools to patient-generated messages. Specifically, 1) we use dynamic topic modeling to gain insight into message meaning and monthly trends for patients with positive and negative COVID-19 and flu results; and 2) we leverage a state-of-the-art language model, BERT (Bidirectional Encoder Representations from Transformers), to build an automatic message-triaging classifier that outperforms other baseline methods. Register at:

Analysis of CT Scan Imaging Data with Machine Learning: Classification, Detection, and Segmentation of Abnormalities

Tuesday, October 6 | 4:30-6:30 PM

Rachel Draelos

Medical image analysis with machine learning holds immense promise for accelerating the radiology workflow and benefiting patient care. Computed tomography (CT) is a medical imaging technique that produces a high-resolution volumetric image of the internal organs. CT scans can be used to diagnose and monitor a wide range of conditions including cancer, fractures, and infections. However, interpreting a chest CT scan requires over 12 years of postsecondary education and painstaking manual inspection of hundreds of 2D slices. There is thus significant interest in developing machine learning models that can automatically interpret chest CT images. In this session, a variety of machine learning models for automated chest CT interpretation are introduced, including slice and volume-based convolutional neural networks and approaches for classifying, detecting, and segmenting abnormal findings. Register at
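The convolutional layers at the heart of such models slide small learned filters across each 2D slice. A minimal NumPy sketch of that core operation (not the session's actual models, and with an invented toy "slice") might look like:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution (really cross-correlation, as in most deep
    learning frameworks) -- the building block of a CNN layer."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge filter applied to a toy 6x6 "slice" with a bright region
slice_ = np.zeros((6, 6))
slice_[:, 3:] = 1.0
edge_kernel = np.array([[-1.0, 1.0]] * 3)  # 3x2 vertical-edge detector
response = conv2d(slice_, edge_kernel)      # peaks where intensity jumps
```

In a real CNN many such filters are learned from data and stacked in layers; volume-based variants use 3D filters that slide through the stack of slices as well.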

Natural Language Processing with LSTM Recurrent Neural Networks

Wednesday, October 7 | 4:30-6:30 PM

Lawrence Carin

Natural language processing (NLP) is a field focused on developing automated methods for analyzing text, and also for computer-driven text generation (synthesis, for example in translation and text summarization). Recurrent neural networks have recently become a state-of-the-art method for NLP, with the long short-term memory (LSTM) network representing the primary methodology of this type. In this session, LSTM NLP models will be introduced, with as little math as possible and with an emphasis on intuition. The concept of word embeddings will be introduced in the context of implementing LSTMs, and it will be explained how such models are used in practice for the analysis and generation of natural language (e.g., language translation). Register at
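For the curious, one forward step of an LSTM cell over a toy word-embedding sequence can be sketched in a few lines of NumPy. The dimensions, embedding table, and parameters below are invented for illustration, not taken from the session:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. x: input (word-embedding) vector; h_prev, c_prev:
    previous hidden and cell states; W, b: stacked gate parameters."""
    z = W @ np.concatenate([x, h_prev]) + b
    n = h_prev.size
    i, f, o = sigmoid(z[:n]), sigmoid(z[n:2*n]), sigmoid(z[2*n:3*n])
    g = np.tanh(z[3*n:])              # candidate update
    c = f * c_prev + i * g            # cell state: the "long-term memory"
    h = o * np.tanh(c)                # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
emb_dim, hid = 4, 3
embedding = rng.normal(size=(10, emb_dim))            # toy table, vocab of 10
W = rng.normal(scale=0.1, size=(4 * hid, emb_dim + hid))
b = np.zeros(4 * hid)

h = c = np.zeros(hid)
for word_id in [2, 5, 7]:                             # a toy 3-word "sentence"
    h, c = lstm_step(embedding[word_id], h, c, W, b)  # read one word at a time
```

The intuition: the forget and input gates decide what the cell state keeps and absorbs at each word, which is what lets LSTMs carry information across long stretches of text.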

Deep Learning and Smart Microscopy

Wednesday, October 14 | 4:30-6:30 PM

Roarke Horstmeyer

Deep learning algorithms offer a powerful means to automatically analyze the content of biomedical images. However, many biological samples of interest are difficult to resolve with a standard optical microscope: they may be too large to fit within the microscope’s field of view, too thick, or moving too quickly. This session will summarize how deep learning algorithms are useful in microscopy, and discuss strategies for using deep learning to design new ways to capture microscopic images. This leads to a new class of "smart microscopy" that can adapt to capture the best data possible for automatic image analysis. Register at

Deep Learning from the Perspective of the Experimental Biologist

A four-part series: October 15, 22, 29, and November 5 | all sessions 4:30-6:00 PM

Akshay Bareja

Deep learning has emerged as a powerful approach to address complex problems in various fields, including biology. In this four-part series of vLEs, we will describe the theory and application of two deep learning models – the multilayer perceptron and the convolutional neural network. As an example of their use in biology, we will demonstrate how to use these models to perform automated classification of blood cell images. This series is targeted at experimental biologists who are unfamiliar with deep learning but interested in applying it to their own research. This is a "zero-entry" series that assumes no prior familiarity with deep learning or programming, and is open to everyone.
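As a taste of what a multilayer perceptron does, the sketch below trains a tiny one on the classic XOR problem in plain NumPy. This toy example only illustrates the idea of layers trained by gradient descent; it is not the series' blood-cell classifier.

```python
import numpy as np

# Tiny multilayer perceptron learning XOR -- an illustrative sketch only.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 8 tanh units, then a sigmoid output
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(4000):                       # plain gradient descent
    h = np.tanh(X @ W1 + b1)                # hidden layer
    p = sigmoid(h @ W2 + b2)                # predicted probability
    grad_z = p - y                          # cross-entropy gradient at output
    grad_h = (grad_z @ W2.T) * (1 - h**2)   # backprop through tanh
    W2 -= 0.2 * h.T @ grad_z; b2 -= 0.2 * grad_z.sum(0)
    W1 -= 0.2 * X.T @ grad_h; b1 -= 0.2 * grad_h.sum(0)

preds = (p > 0.5).astype(int)
```

A convolutional neural network follows the same train-by-gradient-descent recipe but replaces the fully connected layers with convolutional filters suited to images, such as blood cell micrographs.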

Squash, Gardener and the Future of AI in Political Fact-checking

Tuesday, October 20 | 4:30-6:00 PM

Bill Adair and Jun Yang

Professors Jun Yang (computer science) and Bill Adair (journalism and public policy) will discuss their work in automated fact-checking. For the past five years, Jun and Bill have worked together in a variety of Bass Connections, NSF and foundation-supported projects that have broken new ground in instant fact-checking of political speeches and debates. They’ll discuss their progress and the challenges they’ve had. Register at:

Recent related news:

· We still need humans:

· Humans and the bot:

· MediaReview, a new tagging system for fake videos and images, being developed with Facebook and Google:

Generative Adversarial Networks

Tuesday, October 27 | 4:30-6:00 PM

Ricardo Henao

Generative adversarial networks (GANs) are a relatively new tool in machine learning that leverages advances in deep neural networks. Using GANs, one can develop a computer model capable of synthesizing highly realistic images, such as human faces and interesting art. There are many applications of GANs, both good and bad (e.g., “fake news”). GANs are one of the most important recent developments in machine learning and artificial intelligence (AI), and in this session participants will be introduced to this modeling framework. It will be explained why and how these models work, and why this framework is fundamentally different from prior methods of learning AI models. Applications of GANs will also be discussed. Register at
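The adversarial idea can be summarized by the two loss functions a GAN optimizes: a discriminator D is trained to score real data near 1 and generated data near 0, while the generator G is trained to push D's score on its samples toward 1. The sketch below computes those losses for hypothetical discriminator scores; it is not a full training loop.

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator loss: binary cross-entropy with real=1, fake=0."""
    return -np.mean(np.log(d_real)) - np.mean(np.log(1 - d_fake))

def g_loss(d_fake):
    """Non-saturating generator loss: make fakes look real to D."""
    return -np.mean(np.log(d_fake))

# Early in training, D separates real from fake easily (low D loss)...
early = d_loss(d_real=np.array([0.9, 0.95]), d_fake=np.array([0.1, 0.05]))
# ...at the theoretical equilibrium, G's samples are indistinguishable
# from real data and D can do no better than output 0.5 everywhere.
equilibrium = d_loss(d_real=np.array([0.5, 0.5]), d_fake=np.array([0.5, 0.5]))
```

Training alternates gradient steps on these two losses, with each network's improvement forcing the other to improve — which is what distinguishes GANs from models trained against a fixed objective.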

How to Build Successful Machine Learning Products (That Solve Real Problems and Make Money!)

Wednesday, November 4 | 4:30-6:00 PM

Jon Reifschneider

Once considered niche technologies limited to academic research and a few large global tech companies, machine learning and AI are today finding innovative applications in almost every industry, by companies of every size. While there have been remarkable success stories of machine learning products that have disrupted entire industries, there have also been many attempts to build machine-learning-based solutions that failed miserably. Analytics products are unique relative to traditional hardware or software products in the high level of technical uncertainty and risk often involved. What makes an analytics product successful goes beyond the quality of the algorithm; it also includes understanding the customer’s needs, the availability of data, and the team’s discipline in following the data science process. Join session host Jon Reifschneider as he shares lessons learned from building predictive analytics products now used by most major US electric utilities and by airlines around the world, on the importance of data, process, talent, and customer engagement alongside technical approach. Register at