Satellite image quality and availability are rapidly improving, but interpreting the images is still hard. Experts may be able to identify drought conditions from satellite images, but this process is slow and expensive, as is repeatedly collecting and labeling data from the ground. Existing systems can’t predict droughts reliably—for example, they do not distinguish between edible plants and inedible cacti, which makes a huge difference for foraging animals. Higher accuracy on this task directly improves financial support for families in Northern Kenya through an existing insurance program. We hope that this project inspires researchers to collaborate on improving measures of drought and other agricultural factors from satellite—and on applying deep learning to meaningful challenges in environmental sustainability.
Weights & Biases Benchmarks are a centralized, interactive, and visual platform for open and efficient collaboration on deep learning projects. Drought Watch is one of these projects: a benchmark for drought detection from satellite imagery. The latest iteration of this work, developed with Andrew Hobbs of UC Davis and support from many others, has been accepted to the International Conference on Learning Representations (ICLR) 2020 as part of the Computer Vision for Agriculture workshop. You can read the paper on arXiv or check out the livestream and join the discussion on Sunday, April 26th at 11am PST.
This work demonstrates that computer vision can improve our predictions of agricultural factors like farming conditions and crop yields from satellite data alone—accuracy and efficiency in such predictions is increasingly important as the climate crisis worsens. It also offers early evidence that community benchmarks encourage distributed collaboration on meaningful problems with a concrete positive impact. We hope to steward many more collaborations like this, focusing on environmental sustainability, physical health, and social good, as these are the critical current challenges for our planet.
In August of 2019, we launched the benchmark with a baseline of 75.8% validation accuracy. Our starting model is an intentionally simple convolutional neural network in which the layer sizes, learning rate, and other hyperparameters are easily configurable, to encourage participants to explore different model variants. We performed only brief hyperparameter exploration before releasing this baseline. Excluding the author team, 9 participants have joined the Drought Watch Benchmark on W&B, and over 2,500 experiment runs have been logged in their projects. Folks have tried unsupervised clustering and ResNets, and the highest validation accuracy reached so far is 77.8%, by user ubamba98 with an EfficientNet architecture (Tan & Le, 2019). All the submissions outperform the model currently used for index insurance in Kenya, which achieves only 67% on the benchmark validation set of satellite images.
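As an illustration of the kind of intentionally simple, configurable CNN described above, here is a minimal PyTorch sketch. The band count, layer widths, patch size, and four output classes are assumptions for the example, not the benchmark baseline's exact settings:

```python
import torch
import torch.nn as nn

class SimpleDroughtCNN(nn.Module):
    """Toy configurable CNN for multi-band satellite patches.

    The 10 input bands, (32, 64) channel widths, and 4 output
    classes are illustrative assumptions, chosen to show how the
    hyperparameters can be exposed as constructor arguments.
    """
    def __init__(self, in_bands=10, channels=(32, 64), n_classes=4):
        super().__init__()
        layers, prev = [], in_bands
        for c in channels:  # stack of conv -> ReLU -> pool blocks
            layers += [nn.Conv2d(prev, c, kernel_size=3, padding=1),
                       nn.ReLU(),
                       nn.MaxPool2d(2)]
            prev = c
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.head = nn.Linear(prev, n_classes)

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        return self.head(x)

model = SimpleDroughtCNN()
logits = model(torch.randn(2, 10, 65, 65))  # batch of 2 patches
print(logits.shape)  # torch.Size([2, 4])
```

Exposing `channels` and `n_classes` as arguments is what makes sweeping over model variants (e.g. with a W&B hyperparameter sweep) a one-line change.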
This level of engagement on an open project—and the more than 10-point crowdsourced improvement in validation accuracy over the deployed model—is really exciting, and we're just getting started. Since the performance difference between our simple baseline and the more sophisticated models is only a few percentage points, there are clear opportunities for further improvement. The authors plan to continue exploring different network architectures. On the data processing side, we intend to run a finer-grained analysis of the spectral bands (as some currently add more noise than signal), filter out images with obscuring clouds, and explore augmentation strategies, especially to leverage the currently unlabeled off-center pixels, perhaps by clustering. We also want to try various techniques to compensate for the class imbalance, as roughly 60% of the data is of class 0, indicating drought. We hope the ongoing benchmark will support and cross-pollinate these improvements.
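One simple way to compensate for the imbalance mentioned above is inverse-frequency class weighting in the loss. A sketch with invented class counts (class 0 holding roughly 60% of the data mirrors the text; the other counts are assumptions):

```python
import numpy as np

# Hypothetical label counts for the 4 forage classes (0 = drought).
# Class 0 with ~60% of the data matches the imbalance described above.
counts = np.array([60_000, 20_000, 12_000, 8_000], dtype=float)
freqs = counts / counts.sum()

# Inverse-frequency weights, normalized so they average to 1;
# rarer classes get proportionally larger weight in the loss.
weights = 1.0 / freqs
weights /= weights.mean()

print(np.round(freqs, 2))    # [0.6  0.2  0.12 0.08]
print(np.round(weights, 3))
```

These weights can be passed directly to a weighted cross-entropy loss; other options mentioned in the literature include oversampling the minority classes or augmentation targeted at them.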
Concretely, the Drought Watch dataset pairs over 100,000 expert-labeled ground-level georeferenced photos with satellite images centered at the same geolocations. This approach yields a supervised labeling of drought conditions on satellite images, and it is highly generalizable. Wherever ground-level data on an agricultural outcome is available (e.g. crop yields, disease status), this method can be used to develop a satellite-based index. This further enables model ensembling or transfer learning across countries or indices (e.g. yields of related crops), which could improve model robustness and reduce the need for new supervised labels. Such models can tangibly help in agricultural/farming applications by flexibly mapping satellite data to timely information on yields, forage quality, or other measures collected from the ground.
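The pairing itself amounts to a nearest-location join. Here is a toy sketch of matching each ground photo to the satellite tile whose center is closest; the coordinates are invented, and the flat-Earth Euclidean distance is an illustrative simplification (a production join would use haversine distance or a spatial index):

```python
import numpy as np

# Hypothetical (lat, lon) centers of satellite tiles and ground photos.
tile_centers = np.array([[2.10, 37.50], [2.30, 37.80], [1.90, 37.20]])
photo_locs   = np.array([[2.11, 37.49], [1.88, 37.23]])
photo_labels = np.array([0, 2])  # expert forage labels from the ground

# For each photo, the index of the nearest tile center (Euclidean in
# degrees; acceptable at this scale near the equator).
d = np.linalg.norm(photo_locs[:, None, :] - tile_centers[None, :, :],
                   axis=-1)
nearest = d.argmin(axis=1)

# Transfer each ground label to its matched satellite tile.
tile_labels = dict(zip(nearest.tolist(), photo_labels.tolist()))
print(nearest)      # [0 2]
print(tile_labels)  # {0: 0, 2: 2}
```

The same join generalizes to any georeferenced ground-truth signal (yields, disease status), which is what makes the approach reusable across indices.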
More accurate indices based on machine learning can improve the effectiveness of index insurance programs. Drought Watch shows that computer vision methods can outperform existing remote sensing techniques. Our end goal with Drought Watch is to improve index insurance in the real world. We're currently working on an insurance product based on our model and hope to test it in the field later this year. Once trained and deployed, satellite-based computer vision models can assess conditions on demand, making it possible to monitor more frequently and compensate the insured more quickly. By increasing both the accuracy and the speed of insurance payments, this could substantially increase the benefit of insurance for farmers. Positive results from this program may encourage adoption of similar programs in other countries and in other agricultural or environmental decision-making contexts. More generally, the strategy of using deep learning to improve insurance models could be helpful in many other critical applications like managing wildfire risk.
Based on the engagement and improved models in Drought Watch, CodeSearchNet, and other early Benchmarks, we have initial evidence that benchmarks help facilitate distributed collaboration. The true value of Benchmarks will be measured by their long-term impact in meaningful products, after the crowdsourced deep learning solution is deployed. We need more folks to work together faster on such solutions, especially in times of pandemic and climate crisis. As the model-building process becomes more inclusive and more collaborative, we can leverage more diverse skills and perspectives to build better—more accurate, general, explainable, fair, efficient, and safe—models, which lead to better outcomes for the planet. We can’t do it without you.
If any of this resonates with you, we would absolutely love to collaborate:
With all my active hope on the 50th anniversary of Earth Day,
— Citations & Notes —
The data used in this research was collected through a research collaboration between the International Livestock Research Institute, Cornell University, and UC San Diego. It was supported by the Atkinson Center for a Sustainable Future's Academic Venture Fund, Australian Aid through the AusAID Development Research Awards Scheme Agreement No. 66138, the National Science Foundation (0832782, 1059284, 1522054), and ARO grant W911-NF-14-1-0498. We'd also like to thank our excellent collaborators: Nathan Jensen, Jin Baek, and the entire Weights & Biases team.
How does index insurance work? Agricultural index insurance uses a common metric like rainfall, average yield per area, or vegetation growth to statistically model crop or livestock production with varying accuracy and computational cost. Indices that can be measured remotely, like rainfall or drought conditions from satellite, have lower computational costs—we can increase prediction accuracy through various post-processing techniques without needing to make additional local measurements. By relying on a shared, external, and maximally objective metric, index insurance avoids two main issues with regular insurance (aside from high cost): adverse selection (where only the folks more likely to suffer losses pay for the insurance) and moral hazard (counterproductive action with the intent of collecting a payout). This leaves the main cost: the basis risk, or the error between the index's estimate and an individual farmer's real losses. As agricultural indices of conditions like drought become more accurate with technological advances (in remote sensing, satellite imaging, and now computer vision and deep learning), the basis risk declines, reducing the cost of the program and making it accessible to more farmers. To measure the effectiveness of index insurance programs, economists consider the Minimum Quality Standard, comparing the program's value over time—in terms of potential for income stabilization—to no insurance and to an equivalent direct cash transfer. They also study "ex-ante" effects: increased investments in future productivity that pastoralists are able to make with the knowledge of coverage. You can learn more here.
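The basis-risk idea in this note can be made concrete with a toy computation: compare a threshold-triggered index payout against each farmer's actual loss. All numbers, the trigger threshold, and the payout rule below are invented for illustration:

```python
import numpy as np

# Hypothetical per-season data: the index's estimated loss fraction
# for a region, and one farmer's actual loss fraction in that region.
index_loss  = np.array([0.05, 0.40, 0.10, 0.70])
actual_loss = np.array([0.00, 0.55, 0.20, 0.65])

THRESHOLD = 0.30    # payouts trigger when the index exceeds this
COVERAGE  = 1000.0  # insured value per season

# Index insurance pays based on the index, not the individual's loss.
payout = np.where(index_loss > THRESHOLD, COVERAGE * index_loss, 0.0)

# Basis risk: the gap between what the index pays and what was lost.
ideal = COVERAGE * actual_loss
basis_gap = np.abs(payout - ideal).mean()
print(payout)     # [  0. 400.   0. 700.]
print(basis_gap)  # 100.0
```

A more accurate index shrinks `basis_gap`, which is exactly the cost reduction the note describes: the farmer is paid closer to what they actually lost.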
More precisely, these are labels of forage quality: the expert Kenyan pastoralists on the ground answer the question "how many cows can this location feed?" with 0, 1, 2, or 3+, generating the training data labels.