Pros and Cons of Big Data

By: Rachel Yang

One thing that I’ve really appreciated about my research experience is the large amount of independence I have in shaping my project. Being allowed to formulate my own research question has taught me how to handle an overload of data in machine learning problems and which questions are the most important to ask when working with pattern recognition.

Almost all the problems I face are “big data” problems: problems that arise simply from having to manipulate so much data. I work with 40 electrocardiogram records, each spanning up to 24 hours. Dividing the signals based on annotations from cardiologists and processing heart rate variability for each individual set can take hours, so a single error in my MATLAB code (such as dividing by the incorrect sampling frequency) pretty much makes all the analysis garbage. This also means that if I want to play around with different parameters and see how they affect the accuracy of the machine learning algorithm, I have to spend days just compiling the different measurements before I can even begin comparing the models.
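To make the sampling-frequency pitfall concrete, here is a minimal sketch in Python (not the MATLAB code used in this project). It assumes R-peak positions are stored as sample indices and uses RMSSD as a stand-in heart rate variability measure; the function names, the peak positions, and the 250 Hz and 128 Hz rates are all hypothetical values chosen for illustration.

```python
import math


def rr_intervals_ms(r_peak_samples, fs_hz):
    """Convert R-peak positions (sample indices) to RR intervals in milliseconds."""
    if fs_hz <= 0:
        raise ValueError("sampling frequency must be positive")
    pairs = zip(r_peak_samples, r_peak_samples[1:])
    return [(b - a) * 1000.0 / fs_hz for a, b in pairs]


def rmssd_ms(rr_ms):
    """Root mean square of successive RR differences, a common HRV measure."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical R-peak sample indices from a record assumed to be sampled at 250 Hz.
peaks = [0, 200, 410, 600, 805]

rr_right = rr_intervals_ms(peaks, fs_hz=250)  # the assumed true rate
rr_wrong = rr_intervals_ms(peaks, fs_hz=128)  # a plausible but incorrect rate

print("RR (correct fs):", rr_right)
print("RR (wrong fs):  ", rr_wrong)
print("RMSSD ratio wrong/right:", rmssd_ms(rr_wrong) / rmssd_ms(rr_right))
```

Every interval, and every measure derived from it, is scaled by the ratio of the two rates (here 250/128, roughly 1.95). Nothing crashes, so the mistake is easy to miss until hours of processing are already done.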

The best part is obviously just seeing the algorithm work. I’ve been getting around 90% accuracy in detection (which was a pretty big surprise to me considering I’m using pretty elementary machine learning algorithms). Although 90% definitely isn’t high enough for practical use, I think it does show that there is a detectable difference between the normal rhythm of people with atrial fibrillation and the normal rhythm of people without it, and that it’s just a matter of finding the right parameters to use and implementing the most robust machine learning algorithm to distinguish this difference. Though I’m complaining about the problems of big data, one of the next steps for my project would be getting thousands of times more data from actual patients at Duke Hospital to test the reliability of the algorithm with “messier” electrocardiograms.
