Vineyard yield forecasting is a key issue for vintage scheduling and the optimization of winemaking operations. Forecasting errors in the wine industry are high, mainly because of the high spatial variability in vineyards, a strong dependency on historical yield data, insufficient use of agroclimatic data and inadequate sampling methods. Errors currently reach 20-30% per block. Improved methodologies for early and accurate vineyard yield forecasting are therefore needed. We propose a new system for vineyard yield forecasting that integrates: systematic cluster counting, sampling and weight measurement; key agroclimatic parameters; vineyard spatial variability; and forecasting models based on machine learning (ML). We carried out a trial in high-yield Cabernet Sauvignon (CS) vineyards located in Maule Valley (Chile) during the 2020 season. We covered four blocks (66 ha) and two trellis systems (pergola and free-cordon). We characterized the spatial variability of the blocks using Sentinel-2 images and NDVI analysis. We defined sampling units based on NDVI levels, and we counted and sampled grape clusters and measured their weights before veraison was completed. Key agroclimatic data were taken from public databases, and we collected historical yield data from 2016 onwards. We trained and applied machine-learning models based on the MARS, Random Forest, SVR and xgbTree algorithms to generate yield forecasts at veraison. Compared with actual harvest yields, we obtained a MAPE of 7.6% per block against 10.1% for the traditional method (TM). In addition, the standard deviation of the errors was 2.0% against 7.9% for TM. The result is a cost-efficient, early and accurate new system for vineyard yield forecasting.

Cuevas-Valenzuela, José¹; Caris-Maldonado, Carlos¹; Reyes-Suárez, José Antonio²; González-Rojas, Álvaro¹
¹ Center for Research and Innovation (CRI) Viña Concha y Toro, Ruta k-650 km 10, Pencahue, Maule, Chile.
² Bioinformatics Department, Faculty of Engineering, Universidad de Talca, Campus Lircay, Talca, Maule, Chile.

Email: jose.cuevas@conchaytoro.cl

Article extracted from the presentation held during Enoforum Web Conference (23-25 February 2021)

1. Introduction

Vineyard yield forecasting aims to predict as accurately as possible the quantity of grapes for winemaking that will be harvested in a specific season (Dami & Sabbatini, 2011; Sabbatini et al., 2012). Precise and timely information about yield is crucial to improve the efficiency of vineyard and winery operations, to decide on new investments in winery equipment, and to make grape and wine purchasing decisions. It is also essential for successfully regulating vineyard yield, preventing under- and over-cropping and maintaining healthy and balanced vines each season (Bocco et al., 2015; Carrillo et al., 2016; Cunha et al., 2010; Dami & Sabbatini, 2011; De la Fuente et al., 2015; Diago et al., 2012; Dunn, 2010; Sabbatini et al., 2012).

Currently, there is no standardized methodology for predicting vineyard yield before veraison with an error of 10% or less (Carrillo et al., 2016; Diago et al., 2012; Dunn and Martin, 2004; Sabbatini et al., 2012). Current forecasting methods (which we call traditional methods) show average errors of around 20-30% per vineyard block, and they are destructive, laborious, time-consuming and/or expensive (Blom and Tarara, 2009; Diago et al., 2012; Dunn and Martin, 2004). These error levels are also supported by an internal study carried out at Viña Concha y Toro (VCyT, Chile), using historical data for different grapevine varieties and quality levels. The current methods are based on manual cluster counting and weighing of randomly selected vines, and they do not consider the spatial variability present in the vineyard (Carrillo et al., 2016; Diago et al., 2012; Dunn and Martin, 2004; Dunn, 2010).

Improved methodologies for early and accurate vineyard yield forecasting are therefore very important. New methods, mainly based on digital technologies, have emerged in recent years as interesting alternatives for industrial applications. Most of them rely on remote sensing technologies, such as satellites and RGB and multispectral cameras mounted on UAVs (Unmanned Aerial Vehicles) or mobile robots. These technologies have been optimized and can be effectively implemented in vineyard operations to capture key data and characterize spatial variability (Matese et al., 2015; Santesteban, 2019). In addition, computer vision and machine learning technologies are growing rapidly and have proven useful for analysing data and determining key vine and yield parameters (Di Gennaro et al., 2019; Kurtser et al., 2020; Liu et al., 2017, 2018, 2020; Nuske et al., 2014; Pérez-Zavala et al., 2018; Silver & Monga, 2019).

As an alternative to the traditional methods, we proposed to develop and validate a new machine-learning-based (ML-based) system for early and accurate vineyard yield forecasting, i.e. with forecasts issued before veraison is completed and forecasting errors below 10% per vineyard block. We did this following the precision agriculture approach defined by ISPA (International Society of Precision Agriculture, 2019): “management strategy that gathers, processes and analyses temporal, spatial and individual data and combines it with other information to support management decisions according to estimated variability for improved resource use efficiency, productivity, quality, profitability and sustainability of agricultural production”. Thus, we first focused on characterizing vineyard spatial variability, then worked on grape and vine data acquisition, and finally processed the information and developed the ML-based models to produce yield forecasts.

2. Methodology

We developed a new system for vineyard yield forecasting that encompasses: (i) construction of a historical database; (ii) characterization of vineyard spatial variability using satellite data; (iii) systematic cluster counting, sampling and weight measurement, based on the spatial variability (for the trial season); (iv) acquisition and integration of key agroclimatic data; and (v) the construction and application of yield forecasting models based on machine learning (ML) algorithms and the collected data. All data processing and model development follow the framework described in Figure 1.

We tested and validated the new yield forecasting system in commercial vineyards belonging to Viña Concha y Toro (VCyT) in Chile. Specifically, we carried out trials in high-yield Cabernet Sauvignon (CS) vineyards located on the Lourdes estate (Pencahue, Maule Valley, Chile) during the 2020 season. We covered four blocks (66 ha) and two trellis systems (pergola and free-cordon), which were selected because they present major forecasting challenges at VCyT.

Further details on the development of the new yield forecasting system are described in the following sections.

Figure 1. Framework for the development of machine-learning (ML)-based yield forecasting models. It includes three main parts: data collection, cleaning and pre-processing; feature selection; and model benchmarking.

2.1 Construction of the historical database

Agricultural data for the four CS trial blocks were collected from Viña Concha y Toro databases for the years 2016 to 2020. These data comprise the type of trellis system, plantation density, grape quality classification and actual yield for each season. Agroclimatic data were collected from the nearest Chilean public meteorological station available for the same seasons and trial location (Agromet, www.agromet.cl). This database contains minimum and maximum temperatures, accumulated degree-days, wind speed and direction, cold hours, relative humidity and rainfall. These data were curated, structured and added to the dataset using the framework described in Figure 1. Finally, Sentinel-2 images were obtained from the Copernicus Open Access Hub (https://scihub.copernicus.eu/) to complete the historical database (satellite data). The images retrieved for each season covered the period between November and the end of January of the following year (before veraison).
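As an illustration of one of the agroclimatic variables listed above, accumulated degree-days can be derived from daily minimum and maximum temperatures. The sketch below uses the common averaging method with a base temperature of 10 °C, which is a frequent choice for grapevine; both the method and the base temperature are assumptions here, since the source takes this variable directly from the Agromet database and does not describe how it is calculated. The function name and the temperatures are hypothetical.

```python
from typing import Iterable, Tuple

def accumulated_degree_days(
    daily_temps: Iterable[Tuple[float, float]],
    base_temp_c: float = 10.0,  # assumed base temperature for grapevine
) -> float:
    """Sum of daily degree-days using the simple averaging method.

    Each item is (t_max, t_min) in degrees Celsius. Days whose mean
    temperature is below the base contribute zero.
    """
    total = 0.0
    for t_max, t_min in daily_temps:
        daily_mean = (t_max + t_min) / 2.0
        total += max(0.0, daily_mean - base_temp_c)
    return total

# Hypothetical three-day record, not station data.
example_days = [(28.0, 12.0), (30.0, 14.0), (18.0, 0.0)]
example_gdd = accumulated_degree_days(example_days)
```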

2.2 Characterization of vineyard spatial variability – Season 2020

Sentinel-2 images for each trial block were analysed using an open-source Geographic Information System tool (QGIS, https://qgis.org/es/site/) to obtain NDVI (Normalized Difference Vegetation Index) data. NDVI data for the 2020 season were used to define five vigour zones for each block, as an approach to characterizing their spatial variability. These zones were: Low (minimum NDVI value), Low to Medium, Medium (mean NDVI value), Medium to High, and High (maximum NDVI value). Sampling units (SU) were then defined for each zone, following the protocol given in Figure 2. Finally, each SU was named and marked on-site for the subsequent cluster counting, sampling and weighing.
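The zoning step can be sketched in a few lines of Python. NDVI is computed as (NIR - Red) / (NIR + Red), which for Sentinel-2 corresponds to bands B8 and B4. The source does not state how the zone boundaries were drawn, so the sketch assumes a simple rule: five anchor values (minimum, midpoint between minimum and mean, mean, midpoint between mean and maximum, maximum), with each pixel assigned to the nearest anchor. The function names and reflectance values are hypothetical, and the authors performed this analysis in QGIS rather than in code.

```python
from statistics import mean
from typing import List, Sequence

ZONES = ["Low", "Low to Medium", "Medium", "Medium to High", "High"]

def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red); 0.0 if both reflectances are zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0

def zone_anchors(values: Sequence[float]) -> List[float]:
    """Five anchor NDVI values spanning minimum, mean and maximum (assumed rule)."""
    lo, mid, hi = min(values), mean(values), max(values)
    return [lo, (lo + mid) / 2, mid, (mid + hi) / 2, hi]

def classify(values: Sequence[float]) -> List[str]:
    """Assign each NDVI value to the vigour zone with the nearest anchor."""
    anchors = zone_anchors(values)
    return [
        ZONES[min(range(len(anchors)), key=lambda i: abs(v - anchors[i]))]
        for v in values
    ]

# Hypothetical (NIR, Red) reflectances for five pixels of one block.
pixels = [(0.30, 0.10), (0.40, 0.08), (0.45, 0.05), (0.50, 0.04), (0.20, 0.12)]
ndvi_values = [ndvi(nir, red) for nir, red in pixels]
zones = classify(ndvi_values)
```

Under this rule the pixel with the lowest NDVI always falls in the Low zone and the one with the highest in the High zone, which mirrors the zone definitions given in the text.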

Figure 2. Definition of sampling units (SU) as a function of the vineyard trellis system. Left: In the case of free-cordon trellis systems (applicable also for double cordon), the SU is formed by the leaf area space between two trunks. Right: in the case of pergola systems, the leaf cover between four trunks forms the SU.

2.3 Yield parameters and agroclimatic data collection – Season 2020

For each sampling unit (SU), cluster counting, sampling and weight measurement were carried out during veraison (end of January 2020). Grape clusters were counted manually by the agricultural team, and a standard digital scale was used on-site to weigh the clusters. All these steps were carried out for three size categories: small, medium and large grape clusters (as defined by the vineyard manager). The total number of clusters and the weight values (in g) were therefore obtained for each size category and SU. Note that these steps were first carried out at the fruit-set stage (end of November 2019) and then repeated at veraison (late January 2020).
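One possible way to organize the per-SU measurements is shown below: a record per size category holding the cluster count and the mean sampled cluster weight, plus two derived quantities. This is only an illustrative sketch; the source does not specify how counts and weights were combined before being passed to the models, and the class name, function names and numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping

@dataclass
class SizeClassSample:
    count: int            # clusters counted in this size category
    mean_weight_g: float  # mean weight of the sampled clusters, in grams

def su_total_clusters(samples: Mapping[str, SizeClassSample]) -> int:
    """Total number of clusters counted in the sampling unit."""
    return sum(s.count for s in samples.values())

def su_fruit_mass_kg(samples: Mapping[str, SizeClassSample]) -> float:
    """Estimated fruit mass of the sampling unit: sum of count x mean weight."""
    return sum(s.count * s.mean_weight_g for s in samples.values()) / 1000.0

# Hypothetical measurements for one sampling unit.
su_example = {
    "small": SizeClassSample(count=10, mean_weight_g=60.0),
    "medium": SizeClassSample(count=20, mean_weight_g=120.0),
    "large": SizeClassSample(count=5, mean_weight_g=200.0),
}
total_clusters = su_total_clusters(su_example)
fruit_mass_kg = su_fruit_mass_kg(su_example)
```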

In addition, agroclimatic data for the period between November 2019 and the end of January 2020 were collected and integrated for the construction of the machine-learning-based yield forecasting models. The same public database (Agromet) and the key variables described above were used.

2.4 Construction and application of yield forecasting models

Once all the data were structured and curated, several machine learning (ML) algorithms were evaluated for yield forecasting: Multivariate Adaptive Regression Splines (MARS), Random Forest (RF), Support Vector Regression (SVR) and extreme gradient boosting (xgbTree). As part of the data pre-processing step (Figure 1), categorical variables were one-hot encoded and numerical variables were standardized for each algorithm. In addition, as part of the feature selection step, recursive feature elimination (RFE) was used to select the best predictors. For model benchmarking, repeated k-fold cross-validation was applied to test each algorithm. In this step, a randomized grid search was used to tune the hyperparameters of each algorithm. Mean absolute percentage error (MAPE) was used as the performance measure to select the models for further application (using a threshold of 10%). Most of this approach is based on the work of Maya Gopal & Bhargavi (2019).
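The benchmarking workflow described above (one-hot encoding, standardization, RFE, repeated k-fold cross-validation, randomized hyperparameter search, MAPE scoring) can be sketched as follows. This is a minimal sketch under several assumptions: the source does not state which software was used, so scikit-learn is swapped in for illustration; only Random Forest is shown; and the data are synthetic, with hypothetical column names, fold counts and hyperparameter ranges that are not taken from the trial.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import RandomizedSearchCV, RepeatedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 60
# Synthetic stand-in data; these features are hypothetical, not the authors'.
X = pd.DataFrame({
    "trellis": rng.choice(["pergola", "free_cordon"], size=n),
    "ndvi_mean": rng.uniform(0.3, 0.8, size=n),
    "clusters_per_su": rng.integers(20, 80, size=n).astype(float),
    "degree_days": rng.uniform(600, 900, size=n),
})
y = 10 + 20 * X["ndvi_mean"] + 0.1 * X["clusters_per_su"] + rng.normal(0, 1, size=n)

# Pre-processing: one-hot encode categoricals, standardize numericals.
preprocess = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["trellis"]),
        ("num", StandardScaler(), ["ndvi_mean", "clusters_per_su", "degree_days"]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Feature selection (RFE) followed by the regressor being benchmarked.
model = Pipeline([
    ("prep", preprocess),
    ("rfe", RFE(RandomForestRegressor(n_estimators=25, random_state=0),
                n_features_to_select=3)),
    ("reg", RandomForestRegressor(random_state=0)),
])

# Randomized hyperparameter search scored by MAPE under repeated k-fold CV.
search = RandomizedSearchCV(
    model,
    param_distributions={
        "reg__n_estimators": [25, 50, 100],
        "reg__max_depth": [2, 4, None],
        "reg__min_samples_leaf": [1, 2, 4],
    },
    n_iter=5,
    scoring="neg_mean_absolute_percentage_error",
    cv=RepeatedKFold(n_splits=5, n_repeats=2, random_state=0),
    random_state=0,
)
search.fit(X, y)

# scikit-learn reports MAPE as a negated fraction; convert to a percentage.
cv_mape_percent = -100.0 * search.best_score_
```

Placing the pre-processing and RFE steps inside the pipeline means they are refitted on each training fold, so the cross-validated MAPE is not influenced by the held-out samples. A model would then pass the selection rule in the text if `cv_mape_percent` were below 10.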

Afterwards, the best ML-based models were applied to generate yield forecasts (in ton/ha) at the veraison stage (late January 2020). As input, we used all the data previously collected for the 2020 season, including NDVI data, fruit-set and veraison yield parameters and the corresponding agroclimatic data. The results given by the ML-based models were compared with those obtained by the traditional method used at Viña Concha y Toro. This method, which is commonly used in the wine industry, is characterized by: non-systematic, random grape cluster sampling; linear models based on historical data for yield estimation; and a lack of agroclimatic data integration. Finally, both the traditional and the ML-based methods were compared against the actual yield for the 2020 season (given by the corresponding industrial harvest operation). The relative error, given by the following equation,

\[ \text{Relative error (\%)} = \frac{\lvert \text{forecast yield} - \text{actual yield} \rvert}{\text{actual yield}} \times 100 \]

was used as a measure of forecasting accuracy. The standard deviation of the errors was also used to describe the precision of each method.
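The two summary statistics can be computed as in the sketch below, which takes the relative error of a block as the absolute difference between forecast and actual yield, divided by the actual yield and expressed as a percentage. MAPE is the mean of these per-block errors. The source does not say whether the sample or population standard deviation was used; the sample form is assumed here. The yields are hypothetical values in ton/ha, not the trial's data.

```python
from statistics import mean, stdev
from typing import List

def relative_error_pct(forecast: float, actual: float) -> float:
    """Absolute relative error of one forecast, as a percentage of actual yield."""
    return abs(forecast - actual) / actual * 100.0

def block_errors(forecasts: List[float], actuals: List[float]) -> List[float]:
    """Per-block relative errors (%) for paired forecast and actual yields."""
    return [relative_error_pct(f, a) for f, a in zip(forecasts, actuals)]

# Hypothetical yields (ton/ha) for four blocks; not the trial's data.
actual_yields = [20.0, 25.0, 16.0, 40.0]
forecast_yields = [18.0, 27.0, 17.0, 38.0]

errors = block_errors(forecast_yields, actual_yields)
mape = mean(errors)        # accuracy: mean absolute percentage error
error_sd = stdev(errors)   # precision: sample standard deviation (assumed)
```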

3. Results

Of the five ML algorithms tested, we selected those with a MAPE below 10% (Table 1) in the training stage for veraison (model benchmarking): Random Forest, support vector machine (kernel A) and xgbTree. These algorithms were used to generate the yield forecasts (in ton/ha) for each vineyard block just before veraison was completed, at the end of January 2020. In parallel, the traditional method (TM) was also applied to obtain yield forecasts. As observed in Figure 3, the TM forecasts show low variability, giving, for example, the same value for all the free-cordon blocks (6, 10 and 13). In contrast, the forecasts obtained with the ML-based method follow a trend similar to that of the actual yields (obtained during harvest, between late March and April 2020). Note that the ML-based yield forecasts shown in Figure 3 correspond to the best-performing of the three selected algorithms.

The ML-based forecasts were then compared against the actual yield obtained during harvest, giving a relative error for each block (expressed as a percentage, Table 2). The same process was applied to the traditional method (TM). The results show that the ML-based method performed best, with a MAPE of 7.6% compared to 10.1% for TM (Table 2). Regarding precision, the standard deviation (SD) of the errors was 2.0% for the ML-based method. In contrast, the SD was 7.9% for TM, indicating less precise yield forecasting.

Figure 3. Actual grape yields (grey bars) compared to forecasts given by the traditional method (blue bars) and the ML-based method developed in this work (yellow bars). Actual yields were obtained during harvest, between late March and April. Forecasts were generated at the end of January, before veraison was completed.
Table 2. Relative errors (% error) in grape yield forecasting for each method, traditional (TM) and ML-based, after comparison against actual yield (in ton/ha). The standard deviation (SD) of the errors is also included. Forecasts were generated before veraison was completed (late January 2020) and actual yields were obtained during harvest (March-April 2020). For the ML-based method, errors correspond to the best-performing of the three selected algorithms.

4. Conclusions and future work

Grape yield forecasting is a complex challenge that requires taking into account several factors that are difficult to measure objectively and in a timely manner: the spatial variability of the blocks; the differing agroclimatic conditions between seasons; and the arduous sampling tasks. The use of remote sensing technologies, such as satellite imagery, and of weather stations is key to sampling in a systematic way that represents the variability of the different blocks and trellis systems. Combined with machine learning methods, this has proven to be an effective tool for generating early and accurate yield forecasts.

We have obtained a cost-efficient, early and accurate new system for vineyard yield forecasting, based on machine learning models. This new system:

  • Significantly reduces the yield forecasting error, to below the threshold of 10% per block, giving an overall MAPE (mean absolute percentage error) of 7.6% for the Cabernet Sauvignon blocks considered in the trial (before veraison is completed).
  • Performs better than the traditional yield forecasting method already used at commercial scale (MAPE of 7.6% against 10.1%; SD of 2.0% against 7.9%).
  • Captures and exploits the vineyard spatial variability in a practical way.
  • Integrates different types of data that can be obtained/measured at low cost (satellite images and NDVI data; yield parameters for representative sampling units) or are readily available (wine company historical data, agroclimatic data).
  • Is flexible, scalable and adaptable to any grape variety/quality, trellis system and vineyard extension.

As part of the technology transfer and implementation of the new vineyard yield forecasting system, we are working on a new trial for the 2021 season at Viña Concha y Toro, covering more than 300 ha, new trellis systems and different CS quality levels and origins. We are also working with third parties to develop a digital platform that comprises different tools for data integration and analysis, together with our ML-based forecasting models. The digital platform will be a useful precision viticulture tool for a lean yield forecasting process and optimal harvest management.

Acknowledgements


We are grateful for the support of technicians and professionals from the Agricultural Management of Viña Concha y Toro, especially the administration of the Lourdes estate.

References


Blom, P. E., & Tarara, J. M. (2009). Trellis tension monitoring improves yield estimation in vineyards. HortScience, 44(3), 678–685. https://doi.org/10.21273/hortsci.44.3.678

Bocco, M., Sayago, S., & Violini, S. (2015). Modelos simples para estimar rendimiento de cultivos agrícolas a partir de imágenes satelitales: una herramienta para la planificación. 2o Simposio Argentino Sobre Tecnología y Sociedad. Modelos, 26–35.

Carrillo, E., Matese, A., Rousseau, J., & Tisseyre, B. (2016). Use of multi-spectral airborne imagery to improve yield sampling in viticulture. Precision Agriculture, 17(1), 74–92. https://doi.org/10.1007/s11119-015-9407-8

Cunha, M., Marçal, A. R. S., & Silva, L. (2010). Very early prediction of wine yield based on satellite data from vegetation. International Journal of Remote Sensing, 31(12), 3125–3142. https://doi.org/10.1080/01431160903154382

Dami, I., & Sabbatini, P. (2011). Crop Estimation of Grapes. Fact Sheet – Agriculture and Natural Resources, Stage I, 1–2.

Di Gennaro, S. F., Toscano, P., Cinat, P., Berton, A., & Matese, A. (2019). A Low-Cost and Unsupervised Image Recognition Methodology for Yield Estimation in a Vineyard. Frontiers in Plant Science, 10, 559. https://www.frontiersin.org/article/10.3389/fpls.2019.00559

Diago, M.-P., Correa, C., Millán, B., Barreiro, P., Valero, C., & Tardaguila, J. (2012). Grapevine yield and leaf area estimation using supervised classification methodology on RGB images taken under field conditions. Sensors (Basel, Switzerland), 12(12), 16988–17006. https://doi.org/10.3390/s121216988

Dunn, G. (2010). Yield Forecasting. GWRDC Innovators Network, June 2010, 6.

Dunn, G. M., & Martin, S. R. (2004). Yield prediction from digital image analysis: A technique with potential for vineyard assessments prior to harvest. Australian Journal of Grape and Wine Research, 10(3), 196–198. https://doi.org/10.1111/j.1755-0238.2004.tb00022.x

ISPA (2019). Precision Ag Definition. International Society of Precision Agriculture. Available at: https://www.ispag.org/about/definition.

Kurtser, P., Ringdahl, O., Rotstein, N., Berenstein, R., & Edan, Y. (2020). In-Field Grape Cluster Size Assessment for Vine Yield Estimation Using a Mobile Robot and a Consumer Level RGB-D Camera. IEEE Robotics and Automation Letters, 5(2), 2031–2038. https://doi.org/10.1109/LRA.2020.2970654
