NTrack: A Multiple-Object Tracker and Dataset for Infield Cotton Boll Counting

Md Ahmed Al Muzaddid and William J. Beksi
The University of Texas at Arlington

Abstract

In agriculture, automating the accurate tracking of fruits, vegetables, and fiber is a difficult problem, and it becomes especially challenging in dynamic field environments. Yet, this information is critical for making day-to-day agricultural decisions, assisting breeding programs, and much more. To address this problem, we introduce NTrack, a novel multiple object tracking framework based on the linear relationship between the locations of neighboring tracks. NTrack computes dense optical flow and utilizes particle filtering to guide each tracker. Correspondences between detections and tracks are found through data association via direct observations and indirect cues, which are then combined to obtain an updated observation. Our modular multiple object tracking system is independent of the underlying detection method, thus allowing for the interchangeable use of any off-the-shelf object detector. We show the efficacy of our approach on the task of tracking and counting infield cotton bolls. Experimental results show that our system outperforms contemporary tracking and cotton boll-based counting methods by a wide margin. Furthermore, we publicly release the first annotated cotton boll video dataset to the research community.
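To illustrate the flow-guided particle filtering mentioned above, here is a minimal, self-contained sketch of one track's predict/update cycle. This is not NTrack's actual implementation: the function names, the Gaussian motion noise, and the Gaussian observation likelihood are all our own simplifying assumptions.

```python
import numpy as np

def propagate(particles, flow_displacement, noise_std=2.0, rng=None):
    # Predict step: move every particle by the local dense optical-flow
    # displacement, plus Gaussian noise to model motion uncertainty.
    # (Assumed motion model, not the one from the paper.)
    rng = np.random.default_rng(0) if rng is None else rng
    return particles + flow_displacement + rng.normal(0.0, noise_std, particles.shape)

def reweight(particles, observation, obs_std=5.0):
    # Update step: weight each particle by a Gaussian likelihood centered
    # on the (direct or indirect) observed position.
    d2 = np.sum((particles - observation) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * obs_std ** 2))
    return w / w.sum()

def estimate(particles, weights):
    # Posterior mean of the track's position.
    return weights @ particles

# One cycle: particles around the previous position, flow says the
# object moved by (3, 1), and a detector confirms a box near there.
rng = np.random.default_rng(42)
particles = np.zeros((200, 2))            # previous track position at the origin
moved = propagate(particles, np.array([3.0, 1.0]), rng=rng)
weights = reweight(moved, observation=np.array([3.0, 1.0]))
position = estimate(moved, weights)        # close to (3, 1)
```

In NTrack, the observation fed to the update step comes from data association: a matched detection when one exists, or an indirect cue inferred from neighboring tracks when the boll is occluded or missed.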


Dataset

The TexCot22 dataset is a set of cotton crop video sequences for training and testing multi-object tracking methods. Each tracking sequence is 10 to 20 seconds in length. The dataset contains a total of 30 sequences, of which 17 are for training and the remaining 13 are for testing. Two of the training sequences, comprising roughly 5,000 annotated frames, can also be used to train a cotton boll detection model. The video sequences were captured at 4K resolution and at distinct frame rates (e.g., 10, 15, and 30 fps). There are typically 2 to 10 cotton bolls per cluster, and the average annotated bounding box is approximately 230 x 210 pixels. To make the dataset robust to environmental conditions, we recorded the field videos at separate times of day to account for varying lighting conditions. In total, there are roughly 9,000 frames (30 sequences x ~300 frames each) with 150,000 labeled instances. On average, there are 70 unique cotton bolls in each sequence.
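The headline numbers above imply some useful derived figures, such as the total frame count and the average number of labeled bolls per frame. A quick back-of-the-envelope check (the dictionary below is our own summary of the statistics reported in this section, not a file shipped with the dataset):

```python
# Reported TexCot22 statistics, summarized from the description above.
stats = {
    "sequences": 30,
    "train_sequences": 17,
    "test_sequences": 13,
    "approx_frames_per_sequence": 300,
    "labeled_instances": 150_000,
}

# Derived figures.
total_frames = stats["sequences"] * stats["approx_frames_per_sequence"]   # ~9,000
instances_per_frame = stats["labeled_instances"] / total_frames            # ~16.7

assert stats["train_sequences"] + stats["test_sequences"] == stats["sequences"]
```

So on average there are roughly 17 labeled cotton bolls per frame, consistent with the reported 2 to 10 bolls per cluster with multiple clusters in view.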


Citation

If you find this project useful, then please consider citing both our paper and dataset.

@article{muzaddid2023ntrack,
  title={NTrack: A Multiple-Object Tracker and Dataset for Infield Cotton Boll Counting},
  author={Muzaddid, Md Ahmed Al and Beksi, William J},
  journal={IEEE Transactions on Automation Science and Engineering},
  pages={1--13},
  doi={10.1109/TASE.2023.3342791},
  year={2023}
}
@data{T8/5M9NCI_2024,
  title={TexCot22},
  author={Muzaddid, Md Ahmed Al and Beksi, William J},
  publisher={Texas Data Repository},
  version={V2},
  url={https://doi.org/10.18738/T8/5M9NCI},
  doi={10.18738/T8/5M9NCI},
  year={2024}
}

License

NTrack is licensed under the Apache License, Version 2.0. The TexCot22 dataset is available for non-commercial use under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License ("CC BY-NC-SA 4.0").