
According to Schrder (2014), the world’s agricultural workforce is expected to decline around 30% between 20. During times of highly intensive agricultural activities (eg., harvest), there are very pronounced peaks in workload which can only be predicted on a short-term basis due to the weather conditions and seasonality. This is not only due to the dependency on the weather conditions, but as well on the labor market. Using new convolutional deep learning techniques based on single-shot detectors to detect and count fruits (apple and pear) within the tree canopy.Īgriculture is a sector with very specific working conditions and constraints. Processing speed is higher than 20 FPS which is fast enough for any grasping/harvesting robotic arm or other real-time applications. Based on correlation between number of visible fruits, detected fruits on one frame and the real number of fruits on one tree, a model was created to accommodate this error rate. This architecture showed an accuracy of more than 90% fruit detection. Labeling images for training consisted on manually specifying the bounding boxes for fruits, where (x, y) are the center coordinates of the box and (w, h) are width and height. Each tree image with approximately 50 fruits, that at the end resulted on more than 5000 images of apple and pear fruits each. More than 100 images of apple and pear trees were taken. We want this confidence score to be high if a fruit exists in a cell, otherwise to be zero, if no fruit is in the cell. Each of those cells is responsible to predict bounding boxes and confidence score for fruit (apple and pear in the case of this study) detected in that cell. To each grid cell an image detection and localization algorithm is applied. This architecture takes the input image and divides into AxA grid, where A is a configurable hyper-parameter that defines the fineness of the grid. Using deep-learning techniques eliminates the need for hard-code specific features for specific fruit shapes, color and/or other attributes. This paper details the use of deep convolution neural networks architecture based on single-stage detectors.

While accurate, these approaches even with high-end hardware are still computationally intensive and too slow for real-time systems.

Image/video processing for fruit detection in the tree using hard-coded feature extraction algorithms has shown high accuracy on fruit detection during recent years.

Kushtrim Bresilla * Giulio Demetrio Perulli Alexandra Boini Brunella Morandi Luca Corelli Grappadelli * Luigi Manfrini *
