+Data Science (+DS) is a Duke-wide program that works in partnership with departments, schools, and institutes to help faculty, students, and staff use data science in ways tailored to their needs, expertise, and interests. For more information, please visit our website at https://plus.datascience.duke.edu.
Upcoming Virtual Learning Experiences
Web Scraping
Thursday, March 4 | 12:00-1:00 PM
Chris Bail
In this seminar, we will learn how to collect data using Application Programming Interfaces. This lecture will introduce key concepts such as credentialing and rate limiting, and provide an example of how to collect data from Twitter. The seminar will discuss the strengths and weaknesses of using APIs for data scraping, and briefly discuss alternative web-scraping techniques as well. This class will assume basic working knowledge of R. Register at https://training.oit.duke.edu/enroll/common/show/21/175404
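For readers who want a feel for rate limiting before the session, here is a minimal sketch in Python (the seminar itself assumes R). It is not taken from the seminar materials. The class name `RateLimiter` and its parameters are hypothetical, and it does not call Twitter or any real API; it only shows how a client can pace its own requests to stay under a provider's limit.

```python
import time


class RateLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = []       # timestamps of calls still inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        # Forget calls that have already left the window.
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest remembered call expires.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls = [t for t in self.calls if now - t < self.period]
        self.calls.append(now)
```

In practice, a script would call `wait()` immediately before each API request. The limit values (for example, 15 calls per 900 seconds) would come from the provider's documentation and are not specified here.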
An Introduction to Text Analysis
Wednesday, March 10 | 12:00-1:00 PM
Chris Bail
This seminar will provide an introduction to text analysis. Text-based data abounds on social media platforms, in digital archives, and elsewhere, but it poses numerous challenges for modeling because it is highly unstructured. We will discuss basic concepts in text analysis (e.g., tokenization, n-grams, and creating a document-term matrix). The class will briefly introduce dictionary-based methods of text analysis and conclude by preparing students for more advanced topics in text analysis such as Latent Dirichlet Allocation and Word2Vec. Register at https://training.oit.duke.edu/enroll/common/show/21/175405
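As a rough preview of the three basic concepts named above, the following Python sketch tokenizes text, builds n-grams, and counts terms into a document-term matrix. It uses only the standard library. The function names and the two toy documents are illustrative assumptions, not seminar material, and the tokenizer is deliberately naive (lowercase, then keep runs of letters, digits, and apostrophes).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[d][t] counts term t in document d."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


docs = ["The cat sat.", "The cat saw the dog."]   # toy documents
vocab, dtm = document_term_matrix(docs)
bigrams = ngrams(tokenize(docs[0]), 2)
```

Each row of `dtm` describes one document and each column one vocabulary term, which is the structured form that later modeling steps typically start from.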
Recommendation Systems and the Surveillance Economy
Tuesday, March 23 | 12:00-1:00 PM
Sarah Rispin Sedlak and Akshay Bareja
Recommendation systems, such as the algorithms that power Netflix, suggest jobs to apply for, and curate Facebook feeds, are powerful tools that can help users navigate an overwhelming array of choices. However, these systems can have negative side effects if left unchecked. In this vLE, we will introduce a few popular recommendation-system algorithms, explain how they work, and discuss how their use may promote homogenization and polarization among target audiences. These effects, while lucrative to service providers, can have negative social consequences. Register at https://training.oit.duke.edu/enroll/common/show/21/175400
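To make the idea of a recommendation algorithm concrete, here is a small Python sketch of user-based collaborative filtering with cosine similarity. This is one common family of approaches and is not necessarily among the algorithms covered in the session. The users, items, and ratings are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(ratings, user, k=1):
    """Suggest up to k unseen items, weighting other users' ratings by similarity."""
    items = sorted({item for r in ratings.values() for item in r})

    def vector(name):
        # Items a user has not rated are treated as 0.
        return [ratings[name].get(item, 0) for item in items]

    target = vector(user)
    scores = {}
    for other in ratings:
        if other == user:
            continue
        sim = cosine(target, vector(other))
        for item, rating in ratings[other].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


ratings = {                       # hypothetical users and items
    "ana": {"A": 5, "B": 4},
    "ben": {"A": 5, "B": 5, "C": 4},
    "cal": {"B": 1, "D": 5},
}
suggestion = recommend(ratings, "ana")
```

Because each suggestion is driven by what similar users already liked, a scheme like this tends to steer people toward items popular within their own cluster, which is one route to the homogenization the session description mentions.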
Generating Computational Three-Dimensional (3D) Geographies
Wednesday, March 24 | 4:30-5:30 PM
Augustus Wendell
This workshop presents an ongoing body of work in computational 3D modeling. The work centers on simulating and rendering spatial geographic forms, in particular landmasses, landforms, and large-scale geologic features such as mountain ranges. It exists both as computational data sets and as large-scale visual artworks. The workshop will demonstrate and discuss the affordances of computational methods for generating specific 3D formal language and for processing ultra-high-resolution imagery. Register at https://training.oit.duke.edu/enroll/common/show/21/175401
Applying Deep Learning to Biological Sequence Data (A two-part basic sciences session)
Wednesday, April 7 & Thursday, April 8 | 4:30-5:30 PM
Akshay Bareja
Recurrent neural networks (RNNs) are a class of neural networks that can process sequential data, such as text. RNNs have been successfully applied to many natural language processing tasks, including text generation, classification, and translation. In this two-part vLE, we will first introduce you to RNNs and their specific application to biological sequence data. In the second part, we will demonstrate how to build an RNN using PyTorch that can predict protein function based on amino acid sequence data.
Part 1: What is a recurrent neural network? Register at https://training.oit.duke.edu/enroll/common/show/21/175402
Part 2: Implementing an RNN to predict protein function from sequence. Register at https://training.oit.duke.edu/enroll/common/show/21/175403
Note: If you would like a basic introduction to neural networks, please go through the Week 1 material of the Coursera course "Introduction to Machine Learning": https://www.coursera.org/learn/machine-learning-duke
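As a preview of how an RNN processes a sequence one element at a time, the sketch below runs the forward pass of a plain (Elman-style) recurrent cell over an amino acid string in pure Python. It is not the PyTorch model from Part 2. The weights are random and untrained, the hidden size is arbitrary, and the function names are hypothetical. The update at each step is h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), where x_t is a one-hot encoding of the residue.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def init_rnn(hidden_size, seed=0):
    """Create small random (untrained) weights for a single recurrent cell."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_xh": rand(hidden_size, len(AMINO_ACIDS)),  # input-to-hidden
        "W_hh": rand(hidden_size, hidden_size),       # hidden-to-hidden
        "b_h": [0.0] * hidden_size,
    }


def rnn_forward(params, sequence):
    """Return the final hidden state after reading the sequence left to right."""
    hidden_size = len(params["b_h"])
    h = [0.0] * hidden_size
    for residue in sequence:
        # A one-hot input simply selects one column of W_xh.
        x_index = AMINO_ACIDS.index(residue)
        h = [
            math.tanh(
                params["W_xh"][i][x_index]
                + sum(params["W_hh"][i][j] * h[j] for j in range(hidden_size))
                + params["b_h"][i]
            )
            for i in range(hidden_size)
        ]
    return h


params = init_rnn(hidden_size=4)
final_state = rnn_forward(params, "MKT")   # "MKT" is an arbitrary toy sequence
```

In a trained model, a final hidden state like `final_state` would typically be passed to a classification layer to predict a label such as protein function; that layer and the training loop are omitted here.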