Weather from Microsoft Start’s new AI capabilities are improving 30-day weather forecasts

In a newly published preprint on arXiv, the research team at Weather from Microsoft Start shows how AI weather models compare with the state-of-the-art European Centre for Medium-Range Weather Forecasts (ECMWF) extended-range ensemble. Instead of using a single type of AI model, we combine five trained models, built on three different deep learning architectures, to produce promising forecasts one month in advance.

In 1972, Edward Norton Lorenz, one of the pioneers of numerical weather prediction (NWP), famously asked whether the flap of a butterfly’s wings in Brazil could set off a tornado in Texas. This vivid metaphor was intended to demonstrate the chaotic nature of the atmosphere, where even the tiniest influence can lead to a wildly different outcome. Scientific research has suggested that even with perfect weather models and nearly perfect data, phenomena such as thunderstorms are very difficult to predict even one or two days ahead.

So how, then, can we hope to make useful weather forecasts all the way out to 30 days? Unsurprisingly, a single simulation by an NWP model would be wildly inaccurate most of the time at that range. However, decades of scientific research in ensemble forecasting have shown that it is possible to tease out information in long-range forecasts by relying on probabilistic forecasts: running dozens or even thousands of different but equally likely simulations of the weather and extracting meaningful information from them.
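The idea behind a probabilistic forecast can be illustrated with a toy chaotic system. The sketch below is only an analogy: it uses the logistic map rather than any weather model, and the function names, the perturbation size, and the threshold are all assumptions chosen for illustration. Each ensemble member starts from a slightly different initial condition; the individual outcomes diverge, but the fraction of members above a threshold is still a usable probability.

```python
import random

def logistic_step(x, r=3.9):
    """One step of the logistic map, a standard toy chaotic system (not a weather model)."""
    return r * x * (1.0 - x)

def run_member(x0, n_steps):
    """Advance a single simulation n_steps from the initial value x0."""
    x = x0
    for _ in range(n_steps):
        x = logistic_step(x)
    return x

def ensemble_forecast(x0, n_members=100, n_steps=50, spread=1e-6, seed=0):
    """Run n_members simulations from tiny random perturbations of x0."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        # Keep the perturbed start inside [0, 1], where the map is defined.
        perturbed = min(max(x0 + rng.gauss(0.0, spread), 0.0), 1.0)
        members.append(run_member(perturbed, n_steps))
    return members

def exceedance_probability(members, threshold):
    """Fraction of ensemble members whose final value exceeds the threshold."""
    return sum(1 for m in members if m > threshold) / len(members)

members = ensemble_forecast(x0=0.4)
prob_high = exceedance_probability(members, threshold=0.5)
print(f"spread of outcomes: {min(members):.3f} to {max(members):.3f}")
print(f"P(x > 0.5) = {prob_high:.2f}")
```

Any single member here is close to useless as a point forecast, which is why the ensemble is summarized as a probability rather than a single value.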

NWP ensembles, such as the state-of-the-art system run by ECMWF, require large amounts of supercomputing resources and produce petabytes of data. However, recent advances in AI research have shown that deep learning methods can predict the weather much faster and even more accurately than traditional NWP models.

Traditional models compute the evolution of weather around the globe using the physics of fluid dynamics, plus approximations of other physical processes such as thunderstorms and wind turbulence. AI-powered weather prediction models instead learn from decades of observed weather to recognize patterns and predict their future evolution. They operate in much the same way as an NWP model, though: given the current state of the atmosphere on a 3-D globe (latitude, longitude, and height), they predict the state of the atmosphere at some future time, say one hour later. They then feed this prediction back into the model to predict two hours later, and so on. Because the models can operate at much coarser spatial resolution and take much larger time steps than an equivalent physics-based model could, simulations take only minutes on a single graphics processing unit (GPU). Hence these models can be run more often, producing more simulations for better probabilistic forecasts.
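The feed-the-prediction-back-in loop described above can be sketched in a few lines. This is a minimal illustration, not the models from the preprint: `rollout` and `toy_step` are hypothetical names, and `toy_step` is a trivial smoothing rule on a ring of four values standing in for a trained network that maps one atmospheric state to the next.

```python
from typing import Callable, List

State = List[float]

def rollout(step: Callable[[State], State], initial_state: State, n_steps: int) -> List[State]:
    """Apply a one-step model repeatedly, feeding each prediction back in as input."""
    states = [initial_state]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states

def toy_step(state: State) -> State:
    """Placeholder one-step 'model': blend each value with its two neighbors on a ring."""
    n = len(state)
    return [
        0.5 * state[i] + 0.25 * state[(i - 1) % n] + 0.25 * state[(i + 1) % n]
        for i in range(n)
    ]

# Three autoregressive steps from a single initial state.
trajectory = rollout(toy_step, [0.0, 1.0, 0.0, 0.0], n_steps=3)
for lead, state in enumerate(trajectory):
    print(lead, state)
```

Because each step consumes the previous output, any error made at one step is carried into all later steps, which is one reason long rollouts are evaluated as ensembles rather than single runs.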

In our preprint, we compare our AI weather models to the state-of-the-art ECMWF extended-range ensemble, which makes forecasts at 0.4° spatial resolution every six hours up to 46 days ahead. The ECMWF model was last updated in June 2023 with an increase in ensemble size from 50 members to 100. Each of our five AI models was run 20 times to create an ensemble of 100 forecasts at 1° resolution in latitude and longitude every six hours into the future.

The results are quite encouraging: when measuring temperature errors using the Continuous Ranked Probability Score (CRPS) metric, our out-of-the-box AI ensemble outperforms the ECMWF model by 17% for one-week forecasts and 4% for four-week forecasts (Figure 1). The CRPS is best when the distribution of the ensemble matches the expected distribution of the observations, so the model must correctly represent the uncertainty in a forecast. It can be thought of as a probabilistic counterpart to mean absolute error: lower is better, and for a single deterministic forecast it reduces to the absolute error.
 
Figure 1: Temperature forecast error (CRPS; lower is better) for Microsoft’s AI ensemble and the ECMWF ensemble, for each week of forecast lead time.
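For readers unfamiliar with the CRPS, a common sample-based estimate for an ensemble is the mean absolute error of the members against the observation, minus half the mean absolute difference between all pairs of members. The sketch below implements that standard estimator for a single location and time; it is not necessarily the exact variant used in the preprint, and the function name and example numbers are illustrative assumptions.

```python
def crps_ensemble(members, observation):
    """Sample-based CRPS for one observation: E|X - y| - 0.5 * E|X - X'|."""
    n = len(members)
    # Accuracy term: average distance of the members from the observation.
    mean_abs_error = sum(abs(x - observation) for x in members) / n
    # Spread term: average distance between all ordered pairs of members.
    mean_pairwise = sum(abs(a - b) for a in members for b in members) / (n * n)
    return mean_abs_error - 0.5 * mean_pairwise

observation = 2.0
sharp = [1.8, 2.0, 2.2]   # confident ensemble centered on the observation
wide = [0.0, 2.0, 4.0]    # same center, far more uncertain

print("sharp ensemble CRPS:", crps_ensemble(sharp, observation))
print("wide ensemble CRPS: ", crps_ensemble(wide, observation))
```

With a single member the spread term vanishes and the score is just the absolute error, which is why the CRPS reads like a mean absolute error for probabilistic forecasts.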

The longer a model runs into the future, the more it tends to accumulate systematic errors from model drift. When running an operational model, it’s important to correct these errors by learning how the model tends to drift from simulated forecasts of the past, or hindcasts. When such a correction is applied, our AI ensemble’s scores fall behind the ECMWF ensemble’s by about 3% at week four.
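One simple way to learn drift from hindcasts is to average the forecast-minus-observation error at each lead time over many past cases, then subtract that average from new forecasts. The sketch below assumes this plain mean-bias removal; the correction actually used in the preprint may differ, and the function names and numbers are hypothetical.

```python
def drift_by_lead(hindcasts, observations):
    """Mean error (hindcast minus observation) at each lead time.

    hindcasts[k][lead] and observations[k][lead] hold the value for past case k.
    """
    n_cases = len(hindcasts)
    n_leads = len(hindcasts[0])
    return [
        sum(hindcasts[k][lead] - observations[k][lead] for k in range(n_cases)) / n_cases
        for lead in range(n_leads)
    ]

def correct_forecast(forecast, drift):
    """Subtract the learned lead-dependent drift from a new forecast."""
    return [value - bias for value, bias in zip(forecast, drift)]

# Two past cases, three lead times; the toy model warms as lead time grows.
hindcasts = [[10.0, 11.0, 12.0], [20.0, 21.5, 22.5]]
observations = [[10.0, 10.5, 11.0], [20.0, 20.5, 21.0]]

drift = drift_by_lead(hindcasts, observations)
corrected = correct_forecast([15.0, 16.0, 17.0], drift)
print("drift per lead:", drift)
print("corrected forecast:", corrected)
```

In practice the drift would typically be estimated separately for each grid point, variable, and time of year, using many more hindcast cases than this toy example.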

We also consider what happens when the two ensembles are combined into a 200-member probabilistic forecast. The result is better than either individual ensemble, albeit by a very small margin that is not statistically significant. This suggests that the AI ensemble adds new variability to the forecasts that can help capture more weather phenomena such as extreme temperatures or precipitation, while traditional forecasting methods remain useful. As Figure 2 shows, the spatial distribution of forecast errors is very similar for our AI ensemble and the ECMWF ensemble: the predictability of each location’s weather, rather than the specific model used for the forecast, remains the dominant factor in determining forecast accuracy.
   
Figure 2: Spatial distribution of temperature forecast errors at week 4 (CRPS, lower is better).
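Combining two ensembles into one larger probabilistic forecast amounts to pooling their members and scoring the pooled set. The sketch below uses a contrived example in which two small ensembles err in opposite directions, so pooling helps a great deal; it illustrates the mechanism only and says nothing about the size of the effect in the preprint. The function name and all numbers are assumptions, and the CRPS is computed with a sorted-members form of the usual sample estimator.

```python
def crps_sorted(members, observation):
    """Sample-based CRPS, E|X - y| - 0.5 * E|X - X'|, using sorted members.

    For sorted values x_0 <= ... <= x_{n-1}, the mean absolute pairwise
    difference equals (2 / n**2) * sum((2*i - n + 1) * x_i).
    """
    xs = sorted(members)
    n = len(xs)
    mean_abs_error = sum(abs(x - observation) for x in xs) / n
    mean_pairwise = 2.0 * sum((2 * i - n + 1) * x for i, x in enumerate(xs)) / (n * n)
    return mean_abs_error - 0.5 * mean_pairwise

observation = 2.5
ensemble_a = [1.0, 2.0]            # toy ensemble that runs too cold
ensemble_b = [3.0, 4.0]            # toy ensemble that runs too warm
pooled = ensemble_a + ensemble_b   # pooling is plain concatenation of members

score_a = crps_sorted(ensemble_a, observation)
score_b = crps_sorted(ensemble_b, observation)
score_pooled = crps_sorted(pooled, observation)
print(score_a, score_b, score_pooled)
```

Pooling treats every member as equally likely; weighting one ensemble more heavily than the other would be a separate design choice that this sketch does not explore.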

As shown by our results, AI weather models have the potential to bring the next big improvements to weather forecasting beyond ten days. These 30-day forecasts will be the latest addition to Microsoft’s growing inventory of world-leading weather modeling. According to an independent study commissioned by Microsoft,* Weather from Microsoft Start was recognized for its leading forecast accuracy. You can find weather information from Weather from Microsoft Start through its integration into Windows 10, Windows 11, Microsoft Edge, Bing, and in the Bing and Microsoft Start mobile apps.

*ForecastWatch, Analysis of One- to Five-Day-Out Global Temperature, Wind Speed, Precipitation and Opacity Forecasts, Jan-Jun 2022 (msn.com).