Preparing and Training YOLO on Jetson

Prerequisites

Darknet must be installed, and the dataset must be labelled in the YOLO format.

Step by Step: Preparation

1. Copy all the images, their text annotation files and the classes.txt file into a single folder (LabelledData).

2. Navigate to this folder, download the Python scripts from the GitHub folder (also on the profile) and run the getting-full-path.py script. It prints the path from the root directory to your current directory.
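The contents of getting-full-path.py are not reproduced here. Assuming it simply prints the absolute path of the directory it is run from, a minimal equivalent might look like this (full_path_of_current_dir is a hypothetical name):

```python
import os


def full_path_of_current_dir() -> str:
    """Return the path from the root directory to the current working directory."""
    # os.getcwd() is already absolute; abspath() normalises it (e.g. removes "..").
    return os.path.abspath(os.getcwd())


if __name__ == "__main__":
    # Run from inside the LabelledData folder and copy the printed path.
    print(full_path_of_current_dir())
```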

3. Copy this path.

4. Download the creating-train-and-test-txt-files.py and creating-files-data-and-name.py scripts from GitHub. Set the image path variable in both scripts to the path copied earlier, then run them.

5. Running these scripts creates four files: classes.names, labelled_data.data, test.txt and train.txt. train.txt contains the paths of the images used to train the model, and test.txt contains the paths of the images used for validation (15% of the images by default; this value can be changed).
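The two GitHub scripts are not reproduced here. As a rough sketch of what steps 4 and 5 produce, the hypothetical prepare_darknet_files function below writes the same four files. It assumes the images are .jpg/.jpeg/.png files in one folder alongside classes.txt; the fixed-seed shuffle and the backup path are also assumptions. The classes, train, valid, names and backup keys are the ones Darknet's detector commands read from a .data file.

```python
import os
import random

# Assumption: adjust to the image formats in your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def prepare_darknet_files(image_dir: str, test_fraction: float = 0.15, seed: int = 0) -> None:
    """Hypothetical stand-in for the two GitHub scripts (the originals may differ).

    Writes train.txt, test.txt, classes.names and labelled_data.data into image_dir,
    which must also contain classes.txt (one class name per line).
    """
    image_dir = os.path.abspath(image_dir)
    images = sorted(
        os.path.join(image_dir, name)
        for name in os.listdir(image_dir)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    )

    # Shuffle reproducibly, then hold out test_fraction of the images for validation.
    random.Random(seed).shuffle(images)
    n_test = round(len(images) * test_fraction)
    splits = {"test.txt": images[:n_test], "train.txt": images[n_test:]}
    for filename, paths in splits.items():
        with open(os.path.join(image_dir, filename), "w") as f:
            f.writelines(path + "\n" for path in paths)

    # classes.names: the class names from classes.txt, one per line.
    with open(os.path.join(image_dir, "classes.txt")) as f:
        classes = [line.strip() for line in f if line.strip()]
    names_path = os.path.join(image_dir, "classes.names")
    with open(names_path, "w") as f:
        f.writelines(name + "\n" for name in classes)

    # labelled_data.data: the key = value pairs Darknet's detector commands read.
    with open(os.path.join(image_dir, "labelled_data.data"), "w") as f:
        f.write(f"classes = {len(classes)}\n")
        f.write(f"train = {os.path.join(image_dir, 'train.txt')}\n")
        f.write(f"valid = {os.path.join(image_dir, 'test.txt')}\n")
        f.write(f"names = {names_path}\n")
        f.write("backup = backup/\n")  # assumption: relative to the darknet directory
```

For example, calling prepare_darknet_files with the path copied in step 3 would write all four files into the LabelledData folder. Annotation .txt files and classes.txt are skipped because only image extensions are collected.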

6. Navigate to the cfg folder (the configuration folder in the darknet directory) and create two custom configuration files, ‘yolov3_traincustom.cfg’ and ‘yolov3_testcustom.cfg’.

7. Copy the contents of the default yolov3.cfg file into both of these files.

8. In the train file, delete the lines under the Testing comment and set batch and subdivisions to 16 and 8 respectively. Other pairs, such as 64/32 or 8/4, can be used depending on the available compute, the data and the accuracy required.

Since we are training 3 classes, namely SpongeBall, Golfball and RubberBall, the following values must be changed. If they are not changed appropriately, the program will crash.

a. classes = 3 (in each of the three [yolo] sections)

b. max_batches = 6000

max_batches is calculated as the number of classes multiplied by 2000 (3 × 2000 = 6000).

c. steps = 4800, 5400

These values are 80% and 90% of max_batches.

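d. filters = 24 (only in the [convolutional] layers with activation=linear, i.e. the layer immediately before each [yolo] section)

filters is calculated as (classes + coordinates + 1) × masks = (3 + 4 + 1) × 3 = 24, where the extra 1 is the objectness score.

(The number of masks is the number of indices on the mask line of each [yolo] section, which is 3; the 4 coordinates are the box x, y, width and height.)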
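The arithmetic in a to d can be checked for any number of classes with a small helper. yolo_hyperparameters is a hypothetical name; the defaults of 3 masks per [yolo] layer and 4 box coordinates follow yolov3.cfg, and integer arithmetic is used so the 80%/90% steps come out as whole numbers.

```python
def yolo_hyperparameters(num_classes: int, num_masks: int = 3, num_coords: int = 4) -> dict:
    """Compute the class-dependent yolov3 .cfg values described in step 8 (a to d)."""
    max_batches = num_classes * 2000                      # b. classes x 2000
    steps = (max_batches * 8 // 10,                       # c. 80% of max_batches
             max_batches * 9 // 10)                       #    90% of max_batches
    filters = (num_classes + num_coords + 1) * num_masks  # d. (classes + coords + 1) x masks
    return {"classes": num_classes, "max_batches": max_batches,
            "steps": steps, "filters": filters}


print(yolo_hyperparameters(3))
# {'classes': 3, 'max_batches': 6000, 'steps': (4800, 5400), 'filters': 24}
```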

9. In the test file, set both batch and subdivisions to 1 and apply the remaining changes (a to d) as above.

10. Run make in the darknet directory.

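11. Run the training command from the darknet directory: ./darknet detector train <path copied in step 3>/labelled_data.data cfg/yolov3_traincustom.cfg, optionally followed by a weights file to start training from.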

12. After training, the weights are saved in the backup folder.

Errors and Troubleshooting

If the process is repeatedly killed or freezes (sometimes for hours), the hardware is running out of GPU memory.

One way to work around this is to create a swap file (virtual memory). Swap extends the memory available to the system, although it is much slower than RAM.
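To confirm that a swap file is active, and to watch how much memory is left while training runs, one option on Linux (which the Jetson runs) is to read /proc/meminfo. The functions below are hypothetical helpers; MemTotal, MemAvailable, SwapTotal and SwapFree are standard fields of that file, reported in kB.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])  # first token is the number (kB for most fields)
    return info


def describe_memory(path: str = "/proc/meminfo") -> str:
    """Summarise RAM and swap in MiB from a /proc/meminfo-style file."""
    with open(path) as f:
        info = parse_meminfo(f.read())

    def mib(field: str) -> int:
        # Missing fields count as 0; convert kB to MiB.
        return info.get(field, 0) // 1024

    return (
        f"RAM: {mib('MemTotal')} MiB total, {mib('MemAvailable')} MiB available; "
        f"swap: {mib('SwapTotal')} MiB total, {mib('SwapFree')} MiB free"
    )


# Example: print(describe_memory()) before creating the swap file and again afterwards;
# a swap total of 0 MiB means no swap is active.
```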

If the behavior is still the same, the data cannot be processed on this device, and a GPU with more memory is needed.

Another way to deal with this issue is to upload these files to Google Colab and train on a remote GPU.