Home Python projects

Part of my role as a Run Director of Edinburgh parkrun is processing the event results and uploading them to the parkrun results system. The raw data consists of multiple files containing athlete positions, plus a paper list of athletes missing from those files. There are three areas in the results processing where a script could be usefully applied.

Paper list of athletes

Athletes can be added through the main online results system. However, this happens at a late stage in the processing, and the data will be lost if there is a reason to reprocess the results. Additionally, this requires an online connection that is often not available immediately after the event.

The application I created using the wxPython library allows quick entry of athletes and their finish positions, and writes them to a file in the same format as the files generated by the athlete barcode scanners.

Correction of errors in athlete position files

Athlete position files are generated by a barcode scanner which is used to alternately scan athlete barcodes and position barcodes. Occasionally, the operators scan the same barcode twice if they are uncertain that a scan was successful. At other times, the operators may fail to scan an athlete barcode or a position barcode. Upload to the online system will fail if any of these errors are present.

I used the Boa Constructor GUI builder to create the wxPython interface. The script removes all duplicate scans and deletes incomplete scans. A list of the incomplete scans is presented in the GUI. The script also runs as a command-line application and has a small automated test suite.
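
The clean-up step can be sketched roughly as follows. This is a minimal illustration rather than the actual script: the flat list input, the "A" and "P" prefixes used to tell athlete barcodes from position barcodes, and the function name clean_scans are all assumptions made for the example.

```python
def clean_scans(scans):
    """Return (pairs, incomplete) from a flat list of barcode scans.

    Assumption for this sketch: athlete barcodes start with "A" and
    position barcodes start with "P".
    """
    # Drop immediate repeats, i.e. the same barcode scanned twice in a row.
    deduped = []
    for code in scans:
        if not deduped or deduped[-1] != code:
            deduped.append(code)

    pairs, incomplete = [], []
    i = 0
    while i < len(deduped):
        code = deduped[i]
        nxt = deduped[i + 1] if i + 1 < len(deduped) else None
        if code.startswith("A") and nxt is not None and nxt.startswith("P"):
            # A complete scan: athlete barcode followed by position barcode.
            pairs.append((code, nxt))
            i += 2
        else:
            # An athlete with no position, or a position with no athlete.
            incomplete.append(code)
            i += 1
    return pairs, incomplete


scans = ["A1", "P0001", "A2", "A2", "P0002", "A3", "A4", "P0004", "P0005"]
pairs, incomplete = clean_scans(scans)
print(pairs)       # [('A1', 'P0001'), ('A2', 'P0002'), ('A4', 'P0004')]
print(incomplete)  # ['A3', 'P0005']
```

Returning the incomplete scans separately, instead of silently dropping them, matches the idea of listing them for the operator to review.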

Primary and backup timer files

Edinburgh parkrun typically has 500-600 runners. The timekeeper operating the event timer can make occasional mistakes, mostly missing a runner crossing the finish line, or occasionally pressing the button an extra time. On rare occasions larger failures of the results system can occur. Simply using the results from a backup timer is the only option for a serious problem. However, for isolated timing errors, some kind of merge of the two sets of results is useful. The parkrun event has other mechanisms in place to narrow down the timing error to a subset of the runners.

Comparing the two results files by hand is difficult, requiring the creation of a spreadsheet to calculate run times and then manually aligning the times in two columns.

A first attempt compared the two files automatically in a single pass, working through the two input files and looking at nearest neighbours. On its first use, it identified 2 errors in a set of results more accurately than manual inspection had. However, the script also gave a few false indications of errors, especially where multiple runners crossed the finish line close together. In those cases the 2 sets of results are misaligned for a short period and then become re-aligned.
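
One way a single-pass comparison of this kind might look is sketched below. It is not the original script: it assumes each timer has already been reduced to a sorted list of run times in seconds, and the function name single_pass_compare and the tolerance value are hypothetical.

```python
def single_pass_compare(primary, backup, tolerance=2.0):
    """Walk two sorted lists of times once, pairing near-equal times.

    Returns (matched, only_primary, only_backup). The tolerance in
    seconds is an assumed value for this sketch.
    """
    matched, only_primary, only_backup = [], [], []
    i = j = 0
    while i < len(primary) and j < len(backup):
        p, b = primary[i], backup[j]
        if abs(p - b) <= tolerance:
            matched.append((p, b))
            i += 1
            j += 1
        elif p < b:
            # The primary timer has a time the backup appears to lack.
            only_primary.append(p)
            i += 1
        else:
            # The backup timer has a time the primary appears to lack.
            only_backup.append(b)
            j += 1
    only_primary.extend(primary[i:])
    only_backup.extend(backup[j:])
    return matched, only_primary, only_backup


matched, only_primary, only_backup = single_pass_compare(
    [1000, 1010, 1025, 1040], [1001, 1024, 1041]
)
print(only_primary)  # [1010]
print(only_backup)   # []
```

Because each time is only compared with the current head of the other list, a tight group of finishers can easily be paired off by one, which is consistent with the false indications described above.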

A second attempt used a two-pass approach, breaking the runners into clusters of similar times and then aligning those clusters. This approach ran into problems where the clusters in the 2 input files were of different sizes. Work on this approach stopped in favour of the third attempt.
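
The clustering pass could be sketched as below, splitting a list of times wherever the gap to the previous time exceeds a threshold. The function name cluster_times and the max_gap value are assumptions for illustration, not details of the original script.

```python
def cluster_times(times, max_gap=3.0):
    """Group times into clusters separated by gaps larger than max_gap seconds."""
    clusters = []
    for t in sorted(times):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)   # close enough: same cluster
        else:
            clusters.append([t])     # large gap: start a new cluster
    return clusters


primary_clusters = cluster_times([1000, 1001, 1002, 1010, 1011, 1030])
backup_clusters = cluster_times([1000, 1002, 1010, 1011, 1030])
print([len(c) for c in primary_clusters])  # [3, 2, 1]
print([len(c) for c in backup_clusters])   # [2, 2, 1]
```

In this example the first cluster has a different size in each file, which is exactly the situation the second pass then has to resolve.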

The third attempt uses a multiple-pass approach that is the inverse of the second. It looks for the most isolated times first and aligns those with a high degree of confidence, before looking at times that are closer together. This narrows down the error to a small number of times. Initial results with this approach are promising, although some bugs remain to be resolved.
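
The isolated-first idea might be sketched along these lines. This is only an illustration under assumptions: isolation is measured as the gap to the nearest neighbour in the same file, each pass uses a smaller isolation threshold, and the names align_isolated_first, thresholds and tolerance are hypothetical.

```python
import bisect


def isolation(times, i):
    """Gap from times[i] to its nearest neighbour in the same sorted list."""
    gaps = []
    if i > 0:
        gaps.append(times[i] - times[i - 1])
    if i + 1 < len(times):
        gaps.append(times[i + 1] - times[i])
    return min(gaps) if gaps else float("inf")


def align_isolated_first(primary, backup, tolerance=2.0,
                         thresholds=(20.0, 10.0, 5.0)):
    """Match the most isolated primary times first, then less isolated ones."""
    primary, backup = sorted(primary), sorted(backup)
    matches = {}   # index in primary -> index in backup
    used = set()   # backup indices already matched
    for threshold in thresholds:          # one pass per isolation level
        for i, p in enumerate(primary):
            if i in matches or isolation(primary, i) < threshold:
                continue
            # Look at the unused backup times either side of p.
            j = bisect.bisect_left(backup, p)
            candidates = [k for k in (j - 1, j)
                          if 0 <= k < len(backup) and k not in used]
            if not candidates:
                continue
            best = min(candidates, key=lambda c: abs(backup[c] - p))
            if abs(backup[best] - p) <= tolerance:
                matches[i] = best
                used.add(best)
    pairs = [(primary[i], backup[k]) for i, k in sorted(matches.items())]
    left_primary = [p for i, p in enumerate(primary) if i not in matches]
    left_backup = [b for k, b in enumerate(backup) if k not in used]
    return pairs, left_primary, left_backup


pairs, left_primary, left_backup = align_isolated_first(
    [1000, 1030, 1031, 1032, 1060], [1001, 1030, 1032, 1061]
)
print(pairs)         # [(1000, 1001), (1060, 1061)]
print(left_primary)  # [1030, 1031, 1032]
print(left_backup)   # [1030, 1032]
```

Here the two isolated finishers are aligned confidently, and the discrepancy is narrowed down to the tight group around 1030 seconds, where one timer has three times and the other has two.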

Automated tests allow the same data to be passed through all 3 approaches. The code uses a class hierarchy with a class for a single timer, a generic merged timer built from two single timers, and specialised merged timers for each approach. This allows refinements to result generation and to the tests to be applied to the two earlier approaches, even though development is focused on the third approach.
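
A hierarchy of this shape could look something like the sketch below. All class and method names are hypothetical, and the three align methods share one naive stand-in matcher purely so the example runs; in the real code each subclass would carry its own algorithm.

```python
def naive_pairs(primary, backup, tolerance=2.0):
    """Stand-in matcher used by every subclass in this sketch."""
    pairs, i, j = [], 0, 0
    while i < len(primary) and j < len(backup):
        if abs(primary[i] - backup[j]) <= tolerance:
            pairs.append((primary[i], backup[j]))
            i += 1
            j += 1
        elif primary[i] < backup[j]:
            i += 1
        else:
            j += 1
    return pairs


class SingleTimer:
    """Times recorded by one timer, kept sorted."""
    def __init__(self, times):
        self.times = sorted(times)


class MergedTimer:
    """Generic merge of two single timers; subclasses supply align()."""
    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup

    def align(self):
        raise NotImplementedError

    def report(self):
        # Shared result generation, so improvements reach every approach.
        pairs = self.align()
        return {
            "matched": len(pairs),
            "unmatched_primary": len(self.primary.times) - len(pairs),
            "unmatched_backup": len(self.backup.times) - len(pairs),
        }


class SinglePassMerge(MergedTimer):
    def align(self):  # stand-in for the first approach
        return naive_pairs(self.primary.times, self.backup.times)


class ClusterMerge(MergedTimer):
    def align(self):  # stand-in for the second approach
        return naive_pairs(self.primary.times, self.backup.times)


class IsolatedFirstMerge(MergedTimer):
    def align(self):  # stand-in for the third approach
        return naive_pairs(self.primary.times, self.backup.times)


# The same data passed through all three approaches.
primary = SingleTimer([1000, 1010, 1025, 1040])
backup = SingleTimer([1001, 1024, 1041])
reports = [cls(primary, backup).report()
           for cls in (SinglePassMerge, ClusterMerge, IsolatedFirstMerge)]
```

Keeping report in the base class is what lets a change to result generation apply to all three approaches at once.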

Actual results files from real events are being used as test data, with some modifications expected in order to achieve specific test scenarios. These results files vary from files which were believed to match without errors, to files where one timekeeper struggled to operate the timer.