Abstract
This project designs and implements an intelligent “robot goalkeeper” system capable of accurately intercepting high-speed ping-pong balls. The system employs a vertically installed 3D-printed mechanical structure to play the role of a goalkeeper, with a camera positioned directly behind the “goal” to capture and track the center trajectory of the ping-pong ball in real time using computer vision. An Arduino control module drives the goalkeeper mechanism, precisely adjusting its position based on visual feedback to complete the interception. By integrating computer vision and mechanical control, the system simulates the response mechanism of a human goalkeeper and demonstrates a robot’s perception and real-time motion capabilities in dynamic environments.
Introduction
Object detection and tracking are core research areas in the fields of computer vision and artificial intelligence, widely applied in autonomous driving, drone navigation, intelligent surveillance, and robotic interaction. However, performing real-time and accurate detection and tracking of high-speed moving objects in dynamic environments remains a significant challenge. For example, in autonomous driving, vehicles need to promptly detect and avoid suddenly appearing pedestrians, vehicles, or other obstacles while traveling at high speeds to ensure the safety of passengers and pedestrians. In air defense systems, intercepting high-speed missiles or drones requires extremely high detection precision and response speed.
Factors such as complex lighting conditions, background interference, target occlusion, and nonlinear motion can all affect system performance. In drone navigation and intelligent surveillance, changes in illumination and complex backgrounds may lead to target detection failures, impacting the effectiveness of navigation and monitoring. The nonlinear motion of targets increases the difficulty of trajectory prediction, especially when the target moves relatively fast and changes direction frequently. Additionally, achieving efficient computation and response in resource-constrained embedded systems is a daunting task. For instance, embedded security systems need to perform real-time detection and response to high-speed targets under limited computational resources and power consumption.
In response to these real-world scientific problems, our project simulates and studies the challenges of dynamic target detection and interception by developing an “intelligent robot goalkeeper” system. Addressing these issues in a controlled setting not only verifies the effectiveness of the relevant algorithms and system design but also provides a practical foundation for tackling similar challenges in real-world scenarios.
Technical Requirements
Development and Optimization of Real-Time Object Detection Algorithms: How to improve the detection accuracy and speed of high-speed targets under complex backgrounds and lighting conditions.
Prediction Models for Nonlinear Motion Trajectories: How to establish accurate trajectory prediction models for targets that move rapidly and have unpredictable motion paths.
Real-Time Coordination of Vision and Mechanical Systems: How to effectively combine visual perception with mechanical control to achieve rapid system response and precise actions.
Computational Optimization in Embedded Systems: How to achieve efficient algorithm execution on resource-constrained hardware like Arduino to meet real-time requirements.
Design Concept and Workflow
The core idea of this project is to develop a robotic system capable of simulating the reactions of a human goalkeeper by intercepting moving ping-pong balls. This design is based on a modified 3D-printed drawing device, which is installed vertically to serve as a mechanical interception mechanism. This mechanical structure acts as the robot’s physical execution component, responsible for the actual interception actions.
To achieve precise real-time tracking of the ping-pong balls, cameras are installed within the goal area to ensure comprehensive capture of the balls’ trajectories. The computer vision module utilizes well-trained algorithms to detect and track the center coordinates of the ping-pong balls in real time. After processing, this visual data is used to predict the balls’ movement paths and the time they will reach specific positions.
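The sketch below illustrates this detection step. It assumes the ultralytics Python package and a hypothetical weights file, ball.pt, trained on the ping-pong dataset; the report does not pin down the exact YOLO variant used, so treat this as a minimal illustration rather than the project’s exact pipeline.

```python
# Minimal sketch: real-time ball-center detection with a trained YOLO model.
# Assumes the ultralytics package; "ball.pt" is a hypothetical weights file.
import cv2
from ultralytics import YOLO

model = YOLO("ball.pt")        # hypothetical weights trained on ping-pong images
cap = cv2.VideoCapture(0)      # camera installed at the goal

while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]
    if len(result.boxes) > 0:
        # Take the highest-confidence box and compute its center coordinates.
        x1, y1, x2, y2 = result.boxes.xyxy[0].tolist()
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        print(f"ball center: ({cx:.1f}, {cy:.1f}) px")
```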
The system’s control architecture centers around an Arduino microcontroller, serving as a bridge between the computer vision and mechanical execution components. The Arduino receives real-time data from the vision system, calculates the required movement distance and direction, and then drives the motors to quickly adjust the mechanical device to the predetermined interception position. This design ensures that the system can respond to high-speed moving targets in an extremely short time.
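As an illustration of this bridge, the host-side sketch below converts a detected pixel coordinate into a motor step target and sends it to the Arduino. It assumes a simple one-integer-per-line serial protocol; the port name and calibration constants are hypothetical, not measured values from the project.

```python
# Host-side sketch: map the ball's pixel x-coordinate to a step target
# and send it to the Arduino over serial (pyserial).
import serial

STEPS_PER_PIXEL = 0.8    # hypothetical calibration constant
X_CENTER_PX = 320        # pixel column aligned with the blocker's home position

ser = serial.Serial("/dev/ttyACM0", 115200, timeout=0.01)  # illustrative port

def send_target(cx: float) -> None:
    """Convert a detected ball center to a signed step count and send it."""
    steps = int((cx - X_CENTER_PX) * STEPS_PER_PIXEL)
    ser.write(f"{steps}\n".encode())  # Arduino parses one integer per line
```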
In terms of mechanical design, the structure and material selection of the device are based on 3D printing, requiring sufficient strength and flexibility to withstand the repeated movement and impact of ping-pong balls. The motion mechanism employs high-precision motors and transmission devices to achieve fast and smooth movements.
The entire system embodies a high degree of integration of visual perception, information processing, and mechanical control. Through real-time feedback and control loops, the robot can simulate the rapid reactions and decision-making processes of a human goalkeeper. This design not only addresses the challenge of achieving efficient computation on resource-constrained hardware but also explores technical approaches for robotic interaction with high-speed targets in dynamic environments.
The design concept of this project emphasizes the integration of multidisciplinary technologies, including computer vision, embedded systems, mechanical design, and control theory. By building such a comprehensive experimental platform, we can deeply study the detection, prediction, and interception of high-speed targets, providing valuable references for the application of robotic technology in complex dynamic environments.

Circuit Connection

Design Alternatives
(a) Set up a movable panel with a camera fixed on it. The camera tracks the ball’s movement, keeping the ball and the camera on the same horizontal line; as the ball moves, the system adjusts the panel to block it.
(b) Mount the camera on the goalpost crossbar. A single camera, placed a certain distance from the goal, rapidly captures photos of the incoming ball. Using the captured trajectory data points, the coordinates of the ball on the goal line are predicted through regression methods such as a Spiking Neural Network (SNN) or other machine-learning algorithms, and the panel is adjusted accordingly to block the ball (a minimal sketch of this prediction step follows).
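As a minimal stand-in for the SNN or other regression models mentioned above, the sketch below fits a polynomial to the tracked (x, y) points and extrapolates the crossing coordinate at the goal line; the track points and goal-line position are made up for illustration.

```python
# Fit the observed trajectory and extrapolate the ball's position
# at the goal line (design alternative (b), simplified).
import numpy as np

def predict_crossing(xs, ys, x_goal, degree=2):
    """Least-squares fit of y(x) to tracked points, evaluated at the goal line."""
    coeffs = np.polyfit(xs, ys, degree)
    return float(np.polyval(coeffs, x_goal))

# Illustrative track points in pixels:
xs = [40, 80, 120, 160, 200]
ys = [300, 280, 265, 255, 250]
print(predict_crossing(xs, ys, x_goal=480))  # predicted y at the goal line
```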
Educational Level
Beginner
- Focus: Basic hardware setup and simple visual recognition.
- Main Tasks:
- Use Arduino to control simple motor movements.
- Learn to acquire camera image data and perform basic image processing, such as binarization and edge detection (see the sketch after this list).
- Understand the concept of Gaussian blur and apply it to a small dataset.
- Goal: Build a basic hardware system to achieve simple detection of the ping-pong ball (e.g., position recognition) without requiring real-time high-precision tracking.
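A minimal OpenCV sketch of these beginner operations, assuming a single camera on index 0:

```python
# Beginner sketch: grab one frame and apply Gaussian blur, binarization,
# and edge detection.
import cv2

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)   # smooth out sensor noise
    _, binary = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarize
    edges = cv2.Canny(blurred, 50, 150)           # edge detection
    cv2.imwrite("binary.png", binary)
    cv2.imwrite("edges.png", edges)
```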
Intermediate
- Focus: Integration of hardware control and real-time visual tracking.
- Main Tasks:
- Learn and implement image-processing operations, such as Gaussian blur and motion blur, for data preparation and augmentation.
- Use Python and OpenCV to perform real-time object detection and trajectory prediction.
- Control motors via Arduino to enable mechanical systems to move in response to visual data.
- Optimize camera data transmission and frame rate to ensure low-latency performance (see the sketch after this list).
- Goal: Achieve real-time detection of the ping-pong ball with basic trajectory prediction and enable the hardware system to intercept based on visual input.
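A sketch of the frame-rate and latency tuning mentioned in the list above; camera property support varies by device and OpenCV backend, so the requested values are not guaranteed to take effect:

```python
# Configure the capture for low latency and measure the achieved frame rate.
import time
import cv2

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
cap.set(cv2.CAP_PROP_FPS, 60)        # request 60 fps; the driver may cap it
cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)  # short queue so frames stay fresh

n, t0 = 0, time.perf_counter()
while n < 120:                        # time 120 frames
    ok, _ = cap.read()
    if not ok:
        break
    n += 1
print(f"measured: {n / (time.perf_counter() - t0):.1f} fps")
```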
Expert
- Focus: Deep multidisciplinary integration and system optimization.
- Main Tasks:
- Develop and train a high-performance object detection model (e.g., using YOLO or CNN) to improve recognition accuracy.
- Implement advanced trajectory prediction algorithms capable of handling nonlinear, high-speed motion (see the sketch after this list).
- Optimize the system’s real-time performance, including image processing speed, hardware response time, and computational efficiency.
- Integrate embedded systems (e.g., Jetson Nano or other high-performance single-board computers) to deploy deep learning models for local, on-device inference in real-time applications.
- Address issues such as motion blur, frame drops, and system robustness at high frame rates.
- Goal: Build a high-accuracy, real-time “robot goalkeeper” system capable of efficiently identifying and intercepting high-speed ping-pong balls in dynamic scenarios, achieving near-industrial-level performance.
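As one possible starting point for such a predictor, the sketch below runs a constant-velocity Kalman filter over the ball’s pixel position. Genuinely nonlinear motion would call for an extended or unscented variant, or a learned model; the noise covariances and frame rate here are illustrative assumptions.

```python
# Constant-velocity Kalman filter over the ball's pixel position.
import numpy as np
import cv2

kf = cv2.KalmanFilter(4, 2)   # state: [x, y, vx, vy]; measurement: [x, y]
dt = 1 / 60.0                 # assumed frame interval at 60 fps
kf.transitionMatrix = np.array([[1, 0, dt, 0],
                                [0, 1, 0, dt],
                                [0, 0, 1,  0],
                                [0, 0, 0,  1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2       # illustrative
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1   # illustrative

def track(cx, cy):
    """Feed one detection; return the filter's predicted next position."""
    kf.correct(np.array([[cx], [cy]], np.float32))
    pred = kf.predict()
    return float(pred[0, 0]), float(pred[1, 0])
```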
Figures and Videos
Mechanical System


Computer Vision System

Result Demonstration
About Our Team
The following parts were done by Zhe Feng.
Part 1: Project Beginning
At the start of the project, I made significant contributions to shaping the direction and design. First, after we determined the goal of building a goalkeeper robot, I proposed using a Cartesian robot, an idea that was ultimately adopted due to its suitability for the precise, planar movements our project needed. Next, I focused on selecting the computer vision algorithm for object detection. Following a thorough review of relevant literature and a consideration of the project’s demand for fast recognition, I recommended YOLO (You Only Look Once) for its superior speed and real-time detection capabilities. This proposal was accepted and became the foundational algorithm for our vision system. Additionally, I contributed to the robot’s specific design by proposing the idea of mounting the camera on the blocking mechanism, similar to the goalkeeper’s position. This design not only provided a stable and effective perspective for tracking the ball but also reduced computational cost, making it a more efficient solution.
Part 2: Progress Leading to Midterm
In collaboration with Ruoyao Wang, I worked on running the pre-trained YOLO model and analyzing its initial performance. We identified critical challenges, such as the model’s difficulty detecting fast-moving objects due to motion blur and defocus, problems arising because the dataset primarily focused on static objects. To address this, I proposed augmenting the static dataset by artificially adding motion blur and defocus effects through coding, enhancing the dataset’s ability to reflect realistic scenarios.
I also assumed full responsibility for the robot’s control and motion system, with support from Ruoyao Wang as needed. This involved testing multiple stepper motor drivers available in the lab, including DM556T, A4988, and L298N. Unfortunately, none of these were compatible with our robot. I ultimately identified and integrated the Pololu Tic T500 Stepper Motor Controller, using the I2C master-slave communication mode. Working with Ruoyao Wang, we confirmed this driver’s compatibility with our system through extensive testing.
Part 3: Final Stages and Current Progress
To further improve performance, I focused on mastering the I2C communication protocol and developing the necessary control code for the robot’s transmission structure. I conducted extensive testing and refined the code to ensure the robot functioned as intended, while optimizing PID controller parameters to improve movement accuracy and responsiveness. This testing phase allowed the robot to operate more smoothly and efficiently in response to fast-moving targets, culminating in a refined control system that was ready for real-time integration with the computer vision model and overall robotic system.
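A minimal sketch of the positional PID law referred to above; the gains shown are placeholders, not the values actually tuned on the robot.

```python
# Positional PID controller (sketch); gains are illustrative placeholders.
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, target, measured, dt):
        error = target - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# e.g. nudging the blocker toward a predicted intercept point at 60 Hz:
pid = PID(kp=1.2, ki=0.05, kd=0.1)  # placeholder gains
command = pid.update(target=250.0, measured=230.0, dt=1 / 60.0)
```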

Ruoyao Wang
Email: rw332@duke.edu
Master of Science Student in Mechanical Engineering and Materials Science
LinkedIn: www.linkedin.com/in/ruoyao-wang-601239328
The computer vision part was done by Ruoyao Wang.
Part 1: Project Beginning
At the onset of the project, the initial focus was on dataset preparation and model selection. The project required accurate detection of a fast-moving ping-pong ball, so gathering high-quality images was essential. Feng Zhe studied models suited to fast-moving object detection and proposed applying the YOLO algorithm to the project’s computer vision component.
After discussion, YOLO was selected for its real-time object detection capabilities, making it a suitable choice for tracking fast-moving objects. The selection process considered alternatives such as SSD and RetinaNet, but YOLO’s efficiency and speed made it ideal for the project’s demands. The collected datasets included various angles and lighting conditions to ensure robust training. All images were labeled with bounding boxes around the ball, formatted according to YOLO’s requirements (illustrated below), and organized in a structured dataset path to maintain consistency.
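For reference, YOLO’s label format stores one line per object as class x_center y_center width height, with all coordinates normalized by the image dimensions. A single-ball image with class ID 0 might therefore be annotated as follows (values illustrative):

```
0 0.512 0.430 0.061 0.083
```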
Part 2: Progress Leading to Midterm
With a structured dataset and YOLO as the model of choice, the project progressed to training and testing. The model was fed the dataset, and adjustments on training parameters were made to optimize detection accuracy while avoiding overfitting. Performance metrics, such as the loss function, were closely monitored, and parameters were fine-tuned to ensure the model’s ability to accurately detect the ping pong ball.
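A sketch of this training step, assuming the ultralytics package and a hypothetical dataset config ball.yaml; the epoch count, image size, and batch size are illustrative rather than the tuned values used in the project.

```python
# Train a YOLO detector on the labeled ping-pong dataset (sketch).
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # start from pretrained weights (assumed variant)
model.train(
    data="ball.yaml",        # hypothetical config: dataset paths + class names
    epochs=100,              # illustrative hyperparameters
    imgsz=640,
    batch=16,
)
metrics = model.val()        # evaluate on the held-out validation split
```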
Several tests followed, in which the model’s detection performance was assessed on unseen images. The process was iterative, with rounds of training, testing, and refinement leading to steady improvement in the model’s accuracy. By the midterm presentation, the YOLO model was effectively trained to detect the ball in most scenarios with a high degree of precision. Although the algorithm made significant progress at this stage, it still lost track of the ball when it moved very fast. The video below demonstrates the performance of the first-generation model, which could only detect a larger ball at slower speeds and sometimes lost the target during movement.
After filtering and augmenting the dataset, the model gained the ability to detect fast-moving ping pong balls, though it still occasionally loses the target. The video below demonstrates the performance of the improved model.

Part 3: Final Stages
To minimize the possibility of lost tracking, we addressed the blur and defocus caused by the camera’s limited frame rate. All hardware-related tests were conducted with the assistance of Feng Zhe. The dataset was processed with a defocus function and Gaussian blur, with the help of Jinze Lyu, to simulate the unclear images captured by real cameras. After augmenting the dataset and successfully training and validating the model, the next stage involved deploying it in the goalkeeper robot’s real-time environment.
The model was integrated with the robot’s control system, where a live camera feed enabled YOLO to detect the ball in real time. Extensive tests were conducted to minimize detection latency, ensuring the robot could respond quickly. Additional evaluations confirmed the system’s robustness, including its performance under challenging lighting conditions and extreme ball speeds. To further enhance accuracy, transfer learning was applied, and a confidence threshold was set for tracking the fast-moving ball. Additionally, the mathematical relationship between the vision system’s coordinates and the motor control steps was calculated, refining the system setup for accurate and consistent real-time ball detection in dynamic conditions. The video below demonstrates the performance of the final model; as can be seen, it is capable of tracking the fast-moving ping-pong ball.


YuFan Hu
Email: yh439@duke.edu
Master of Science Student in Mechanical Engineering and Materials Science
The following parts were mainly done by YuFan Hu.
Part 1: Project Beginning
At the beginning, we discussed how to realize the goalkeeper robot. We considered an anthropomorphic goalkeeper but finally decided on a Cartesian plane mobile robot, similar in movement principle to a drawing robot. I found a similar DIY drawing robot on Amazon to serve as the basic framework of our project.
After discussing the general mechanical structure, we decided that a PID-based control model could be used. With PID as the basis of the control model, the system gains real-time responsiveness, robustness, and adaptive adjustment, allowing the end of the actuator to follow the ball’s trajectory closely enough to intercept a fast-moving ball.
Part 2: Progress Leading to Midterm
After we determined the direction of the project and the purchased materials arrived, we started to study the drawing robot we had bought. We found that it was controlled with G-code, which we were not familiar with, so we decided to use an Arduino control board plus a driver to control the motor directly. However, after many attempts, the motor still would not rotate. With the help of Feng Zhe and Wang Ruoyao, we checked various possible problems, such as the breadboard, wires, and control boards, but none of these checks solved the issue. As time went on, we tried changing the driver. The connection method for each driver was worked out by Feng Zhe with the help of GPT and by searching YouTube and Google, while Wang Ruoyao and I were mainly responsible for testing and feedback. Even by the end, Zhe Feng, Ruoyao Wang, and I never found out what the problem with the original motor setup was. Eventually, we discovered by accident that the Tic T500 driver could successfully drive the motor, and that its acceleration and maximum speed could be adjusted manually.
After completing the most basic drive, I started to design the bracket and the racket (the blocking element) that we needed. After making relatively accurate measurements of our overall mechanical structure, I designed the goal, which also serves as a bracket. After discussing with my groupmates, I set the approximate area of the racket at 15 cm × 15 cm. I also designed two versions of the racket. The first version fixed the camera and the racket together, but this plan was rejected after my discussion with the teaching assistant, so I switched to a fixed-camera approach and designed a second version of the racket, smaller in both size and weight. The overall size of the bracket is roughly 60 cm × 25 cm × 55 cm. Since the bracket is quite large, ordinary 3D printers cannot print it in one piece, so I learned how to use Bambu Studio and OrcaSlicer to split the model and print it in sections. At present, some of the fixing devices have been tested and confirmed to stabilize the entire mechanical structure well. The overall frame is still being printed in stages and will be completed soon.
Part 3: Work in the Final Stage
After completing the overall framework, I will participate in the final process of combining software and hardware. We have already derived the mathematical formula for calculating the number of motor rotations, the movement direction, and the distance (a sketch of this calculation follows). We still need to use computer vision to obtain the ball’s real-time position and perform the coordinate transformation relative to the camera. Once the coordinate transformation is complete, we will consider how to control the two motors to block or eject the ball at minimum control cost, and how to implement the model.
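A sketch of that calculation with illustrative (not measured) constants for a belt-driven axis:

```python
# Convert a desired blocker displacement into a signed motor step count.
STEPS_PER_REV = 200          # 1.8 deg/step stepper (assumed)
MICROSTEPPING = 8            # driver microstep setting (assumed)
PULLEY_CIRC_MM = 40.0        # belt travel per pulley revolution (assumed)

MM_PER_STEP = PULLEY_CIRC_MM / (STEPS_PER_REV * MICROSTEPPING)

def steps_for(distance_mm):
    """Signed step count: the sign encodes direction, the magnitude distance."""
    return round(distance_mm / MM_PER_STEP)

print(steps_for(-37.5))      # negative result -> move in the opposite direction
```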
Framework:
When I started the overall design, I first designed two pillars to hold the robot, connected to a single base plate. Once the design was complete, I was ready to 3D print it, but I found the structure was too large for the printer to complete in one run. So I searched online and found that I could design mortise-and-tenon joints on the separated parts so they could be printed in stages and assembled. However, because even the divided parts were very tall and there were requirements on material density, 3D printing still could not achieve the goal. I therefore switched to 80/20 aluminum extrusion as the main body of the two brackets. The 80/20 stock in the laboratory could not be used directly; after cutting it to size with the cutting machine, it met our requirements. I then 3D printed the connector between the bracket and the robot, cut the wooden base board, and used right-angle steel plates to firmly fix the brackets to the base plate, which solved the overall frame problem. I also tried a snap-fit structure, but during the design process I found that, due to the structure of the robot itself, the snap-fit would interfere with its normal operation, so I abandoned that solution.
Camera Connection Adapter:
For the camera bracket, I first considered a rectangular pillar support, which is a very stable structure. However, as testing progressed, I found that I needed to be able to reposition the camera in space at any time to find the position giving the highest recognition accuracy. A fixed rectangular bracket was therefore not enough, so I designed a bracket with adjustable height, along with a connector with adjustable front-to-back length. After connecting the camera to the overall frame, all the mechanical designs were complete. Some shortcomings remain: due to the structure of my design, the camera bracket shifts slightly because of the small vibrations caused by the robot’s own movement, which slightly affects accuracy, though this is acceptable. I will continue to improve this in subsequent work.
In addition, I designed and updated the table-tennis racket and the Arduino case, finally completing all my work and achieving the expected results.

Jinze Lyu
Email: jinze.lyu@duke.edu
Master of Science Student in Mechanical Engineering and Materials Science
LinkedIn: https://www.linkedin.com/in/jinze-lyu-21b611324/
Interested in Robotics and Connected and Automated Vehicles (CAV)
In this project, I mainly contributed to the following parts:
Part 1: Sensor Selection and Testing
In this project, camera performance is essential for real-time object detection and tracking. To accurately and quickly capture the ping-pong ball’s trajectory, we tested several cameras.
Initially, we selected the Insta360 GO 3 due to its compact design and motion-optimized features. While it offers excellent stabilization and high-speed capture, we found it lacked real-time video streaming capabilities, as it is primarily designed for consumer recording. Since our project requires real-time data processing for detection and prediction, this limitation made it unsuitable.
We then evaluated the Logitech C920, which supports up to 60 frames per second at 720p resolution, ideal for real-time capture of fast-moving objects. Moreover, its stable real-time streaming seamlessly integrates with our computer vision system. In testing, the Logitech C920 performed well, capturing clear movement even under complex lighting, meeting our requirements for real-time accuracy.
Ultimately, we selected the Logitech C920 for its reliable performance, compatibility, and stability, providing a solid foundation for further algorithm development and system integration.
Part 2: Evaluation and Final Decision on the Jetson Nano
At the initial stage of the project, we planned to integrate the Jetson Nano into our system, primarily due to its advantages in artificial intelligence and deep learning applications. Equipped with a powerful GPU, the Jetson Nano can efficiently run deep learning models, which has significant potential to enhance the performance of our real-time computer vision tasks, especially in object detection and trajectory prediction. We hoped to use the Jetson Nano to improve the system’s detection accuracy and response speed by directly acquiring data from the camera and processing it locally, thereby achieving an efficient, standalone embedded vision processing unit and reducing reliance on external computing resources.
However, during actual development, we encountered challenges with the Jetson Nano in terms of configuration and performance. First, the flashing and environment setup process was relatively complex, consuming significant time and effort. The core issue was that the JetPack version supported by the Jetson Nano was incompatible with the Python 3.10.11 we were using, which limited our use of the latest deep learning frameworks and libraries.
Moreover, the Jetson Nano performed poorly in terms of real-time capability. Its processing power was insufficient for our high-frame-rate video streams, causing the system’s response speed to fall short of expectations. When handling high-speed video data, the Jetson Nano could drop frames, which would degrade the accuracy of object detection and trajectory prediction and hurt overall system performance.
Upon discovering these issues during our attempts, I realized that this might not be the most straightforward and efficient approach. Therefore, we decided to abandon the use of Jetson Nano and focus on the development of core algorithms and the implementation of system functions.
Part 3: Gaussian Blur and Motion Blur Enhance Dataset Training Effectiveness
In this study, I enhanced the training dataset for the computer vision model to improve its accuracy under various conditions. Specifically, I applied Gaussian blur and motion blur to the dataset of ping-pong balls provided by Ruoyao Wang.
First, I applied Gaussian blur, which smooths the image by reducing noise and fine details, allowing the model to focus more on the overall features of the target. After introducing Gaussian blur, the model’s recognition accuracy improved, as it became less sensitive to noise in the data. However, when I increased the ball’s speed, the model’s performance in recognizing fast dynamic trajectories was suboptimal. This indicated that Gaussian blur alone was insufficient to adapt the model to high-speed moving objects.
To further improve the model’s performance, I introduced motion blur into the dataset. Motion blur simulates the blurring effect caused by fast-moving objects or camera shake. By incorporating motion blur into the training data, the model could learn and adapt to the blurred features presented by high-speed moving objects. After retraining the model, I observed a significant improvement in its recognition accuracy under high-speed motion conditions.
The significance of this operation lies in the enhanced diversity of the training data achieved through the introduction of Gaussian blur and motion blur. This enabled the model to better adapt to various scenarios in real-world applications. For my “robot goalkeeper” project, where the ping-pong ball moves at high speeds with complex and variable trajectories, often resulting in motion blur, this data augmentation ensured that the model maintained a high level of recognition accuracy even in challenging conditions. This enhancement improved the system’s robustness and generalization capabilities, ultimately boosting the robot’s performance in dynamic environments and ensuring accurate and timely interceptions.
Code Example
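A sketch of the three operations described above; kernel sizes and file names are illustrative rather than the exact parameters used in the project.

```python
# Dataset augmentation: Gaussian blur, linear motion blur, and disk defocus.
import numpy as np
import cv2

def gaussian_blur(img, ksize=9):
    return cv2.GaussianBlur(img, (ksize, ksize), 0)

def motion_blur(img, ksize=15):
    """Horizontal motion blur; rotate the kernel for other directions."""
    kernel = np.zeros((ksize, ksize), np.float32)
    kernel[ksize // 2, :] = 1.0 / ksize
    return cv2.filter2D(img, -1, kernel)

def defocus_blur(img, radius=5):
    """Approximate defocus with a normalized disk kernel."""
    d = 2 * radius + 1
    kernel = np.zeros((d, d), np.float32)
    cv2.circle(kernel, (radius, radius), radius, 1.0, -1)
    kernel /= kernel.sum()
    return cv2.filter2D(img, -1, kernel)

img = cv2.imread("ball.jpg")  # hypothetical dataset image
cv2.imwrite("ball_aug.jpg", motion_blur(defocus_blur(img)))
```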
Example of an Original Image and the Processed Image


Part 4: Motion Decomposition and Measurements in Coordinate System Transformation

Together with my teammates, I completed the measurements for the coordinate system transformation.