112 changes: 112 additions & 0 deletions docs/zh/examples/pangu_weather.md
@@ -0,0 +1,112 @@
# Pangu-Weather

=== "Model training command"

    Not available yet

=== "Model evaluation command"

    Not available yet

=== "Model export command"

    Not available yet

=== "Model inference command"

    ``` sh
    # Download sample input data
    wget -nc https://paddle-org.bj.bcebos.com/paddlescience/models/Pangu/input_surface.npy -P ./data
    wget -nc https://paddle-org.bj.bcebos.com/paddlescience/models/Pangu/input_upper.npy -P ./data

    # Download pretrained model weights
    wget -nc https://paddle-org.bj.bcebos.com/paddlescience/models/Pangu/pangu_weather_1.onnx -P ./inference
    wget -nc https://paddle-org.bj.bcebos.com/paddlescience/models/Pangu/pangu_weather_3.onnx -P ./inference
    wget -nc https://paddle-org.bj.bcebos.com/paddlescience/models/Pangu/pangu_weather_6.onnx -P ./inference
    wget -nc https://paddle-org.bj.bcebos.com/paddlescience/models/Pangu/pangu_weather_24.onnx -P ./inference

    # 1h interval-time model inference
    python predict.py INFER.export_path=inference/pangu_weather_1
    # 3h interval-time model inference
    python predict.py INFER.export_path=inference/pangu_weather_3
    # 6h interval-time model inference
    python predict.py INFER.export_path=inference/pangu_weather_6
    # 24h interval-time model inference
    python predict.py INFER.export_path=inference/pangu_weather_24
    ```

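## 1. Background

Pangu-Weather is the first AI-based method whose forecast accuracy exceeds that of traditional numerical weather prediction. It provides pretrained models with 1-hour, 3-hour, 6-hour, and 24-hour forecast intervals. Its input data consist of five meteorological variables (temperature, specific humidity, geopotential, and the u and v wind components) on each of 13 pressure levels, plus four surface variables (2 m temperature, the u and v components of 10 m wind, and mean sea level pressure). For lead times from 1 hour to 7 days, its forecasts are more accurate than those of the traditional numerical method (the operational IFS of the European Centre for Medium-Range Weather Forecasts).

In addition, Pangu-Weather needs only 1.4 seconds on a single V100 GPU to produce a 24-hour global weather forecast, more than 10,000 times faster than traditional numerical weather prediction.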
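The data layout described above can be checked before running inference. The following is a minimal sketch, assuming the array layout implied by `convert_data.py` in this example (variables first, then 13 levels, on a 721 × 1440 latitude-longitude grid); `check_inputs` and the constant names are hypothetical and not part of the example code.

```python
import numpy as np

# Grid and variable layout assumed from convert_data.py in this example.
N_LAT, N_LON, N_LEVELS = 721, 1440, 13
UPPER_VARS = ("geopotential", "specific_humidity", "temperature", "u_wind", "v_wind")
SURFACE_VARS = ("mean_sea_level_pressure", "u_wind_10m", "v_wind_10m", "temperature_2m")


def check_inputs(upper: np.ndarray, surface: np.ndarray) -> None:
    """Raise ValueError if the arrays do not match the expected layout."""
    expected_upper = (len(UPPER_VARS), N_LEVELS, N_LAT, N_LON)
    expected_surface = (len(SURFACE_VARS), N_LAT, N_LON)
    if upper.shape != expected_upper:
        raise ValueError(f"upper-air shape {upper.shape} != {expected_upper}")
    if surface.shape != expected_surface:
        raise ValueError(f"surface shape {surface.shape} != {expected_surface}")
```

With the sample data downloaded, a call such as `check_inputs(np.load("./data/input_upper.npy"), np.load("./data/input_surface.npy"))` should return silently if the files follow this layout.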

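## 2. Model principles

This section gives only a brief introduction to how Pangu-Weather works; for the detailed derivation, see [Pangu-Weather: A 3D High-Resolution System for Fast and Accurate Global Weather Forecast](https://arxiv.org/pdf/2211.02556).

The overall architecture of the model is shown below:

<figure markdown>
![result](https://paddle-org.bj.bcebos.com/paddlescience/docs/pangu-weather/model_architecture.png){ loading=lazy style="margin:0 auto;"}
<figcaption>Model architecture</figcaption>
</figure>

The main idea is to use a 3D variant of the vision transformer to handle complex, non-uniform meteorological variables. Because the resolution of meteorological data is very high, the researchers, in contrast to common vision transformer approaches, reduced the encoder and decoder to 2 levels (8 blocks) and adopted the shifted-window attention mechanism of Swin transformer to reduce the network's computational cost.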
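To illustrate why windowed attention is cheaper, here is a rough NumPy sketch of partitioning a 3D token grid into non-overlapping windows and of the half-window cyclic shift used between successive layers. The grid and window sizes are toy values chosen for illustration, the function names are hypothetical, and the attention computation itself (and any masking) is omitted; this is not the model's actual implementation.

```python
import numpy as np


def window_partition_3d(x: np.ndarray, window: tuple) -> np.ndarray:
    """Split (Z, H, W, C) tokens into non-overlapping 3D windows.

    Returns an array of shape (num_windows, wz * wh * ww, C).
    """
    z, h, w, c = x.shape
    wz, wh, ww = window
    if z % wz or h % wh or w % ww:
        raise ValueError("each grid dimension must be divisible by the window size")
    x = x.reshape(z // wz, wz, h // wh, wh, w // ww, ww, c)
    # Move the three window-index axes to the front, then the in-window axes.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, wz * wh * ww, c)


def shift_windows(x: np.ndarray, window: tuple) -> np.ndarray:
    """Cyclically shift the grid by half a window along each spatial axis."""
    wz, wh, ww = window
    return np.roll(x, shift=(-(wz // 2), -(wh // 2), -(ww // 2)), axis=(0, 1, 2))


# Toy example: 4 x 6 x 12 tokens with 2 channels, window of 2 x 3 x 4 tokens.
tokens = np.arange(4 * 6 * 12 * 2, dtype=np.float32).reshape(4, 6, 12, 2)
windows = window_partition_3d(tokens, (2, 3, 4))
shifted_windows = window_partition_3d(shift_windows(tokens, (2, 3, 4)), (2, 3, 4))
```

On this toy grid each token attends to the 24 tokens in its window rather than all 288, so the number of pairwise attention scores drops from 288² = 82,944 to 12 × 24² = 6,912.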

The model performs inference with pretrained weights; the inference procedure is described next.

## 3. Model construction

This example implements `PanguWeatherPredictor` for inference with the ONNX models:

``` py linenums="67" title="examples/pangu_weather/predict.py"
--8<--
examples/pangu_weather/predict.py:67:97
--8<--
```

``` yaml linenums="29" title="examples/pangu_weather/conf/pangu_weather.yaml"
--8<--
examples/pangu_weather/conf/pangu_weather.yaml:29:44
--8<--
```

Here, `input_file` and `input_surface_file` are the upper-air and surface meteorological data, respectively, that are fed to the network.
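For orientation, the following is a minimal sketch of what a single ONNX Runtime inference step could look like. It is not the code of `PanguWeatherPredictor`: the helper names `load_session` and `run_forecast` are hypothetical, and the graph input names `input` and `input_surface`, as well as the order of the two outputs, are assumptions about the exported models that should be verified (for example with `session.get_inputs()`).

```python
import numpy as np


def load_session(export_path: str, use_gpu: bool = True):
    """Create an ONNX Runtime session for `<export_path>.onnx`."""
    import onnxruntime as ort  # imported lazily so run_forecast works without it

    providers = ["CPUExecutionProvider"]
    if use_gpu:
        providers.insert(0, "CUDAExecutionProvider")
    # Mirrors `onnx_path: ${INFER.export_path}.onnx` in the config.
    return ort.InferenceSession(f"{export_path}.onnx", providers=providers)


def run_forecast(session, upper: np.ndarray, surface: np.ndarray):
    """Run one forecast step and return (upper_output, surface_output)."""
    feeds = {
        "input": upper.astype(np.float32),  # assumed input name
        "input_surface": surface.astype(np.float32),  # assumed input name
    }
    # Passing None requests every output of the graph, in graph order.
    output_upper, output_surface = session.run(None, feeds)
    return output_upper, output_surface
```

Under these assumptions, a 24-hour forecast would be `run_forecast(load_session("inference/pangu_weather_24"), np.load("./data/input_upper.npy"), np.load("./data/input_surface.npy"))`.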

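## 4. Result visualization

First convert the data from npy to NetCDF format, then visualize it with ncvue.

1. Install the dependencies

    ``` sh
    pip install cdsapi netCDF4 ncvue
    ```

2. Convert the data with the script

    ``` sh
    python convert_data.py
    ```

3. Open the converted NetCDF file with ncvue; see the [ncvue documentation](https://github.com/mcuntz/ncvue) for details.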
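Before opening the file in ncvue, a quick numerical check of the raw output can catch obvious problems such as a wrong channel order. The sketch below assumes the surface channel order used by `convert_surface_data_to_nc` in `convert_data.py` and that temperature is stored in kelvin; `surface_field` and `summarize_t2m_celsius` are hypothetical helper names.

```python
import numpy as np

# Channel order assumed from convert_surface_data_to_nc in convert_data.py.
SURFACE_CHANNELS = {
    "mean_sea_level_pressure": 0,
    "u_component_of_wind_10m": 1,
    "v_component_of_wind_10m": 2,
    "temperature_2m": 3,
}


def surface_field(surface: np.ndarray, name: str) -> np.ndarray:
    """Return the 2D (latitude, longitude) field for a named surface variable."""
    return surface[SURFACE_CHANNELS[name]]


def summarize_t2m_celsius(surface: np.ndarray) -> dict:
    """Return min, mean, and max of 2 m temperature converted from K to degrees C."""
    t2m_c = surface_field(surface, "temperature_2m") - 273.15
    return {
        "min": float(t2m_c.min()),
        "mean": float(t2m_c.mean()),
        "max": float(t2m_c.max()),
    }
```

With the default configuration the output would be loaded with `np.load("./outputs_pangu_weather/output_surface.npy")`; values far outside a plausible range for near-surface air temperature suggest that the channel order or units differ from what is assumed here.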

## 5. Complete code

``` py linenums="1" title="examples/pangu_weather/predict.py"
--8<--
examples/pangu_weather/predict.py
--8<--
```

## 6. Results

The figure below shows the model's temperature prediction; other fields can be inspected with ncvue.

<figure markdown>
![result](https://paddle-org.bj.bcebos.com/paddlescience/docs/pangu-weather/temperature.png){ loading=lazy style="margin:0 auto;"}
<figcaption>Temperature prediction</figcaption>
</figure>
</figure>

## 7. References

- [Pangu-Weather: A 3D High-Resolution System for Fast and Accurate Global Weather Forecast](https://arxiv.org/pdf/2211.02556)
44 changes: 44 additions & 0 deletions examples/pangu_weather/conf/pangu_weather.yaml
@@ -0,0 +1,44 @@
defaults:
  - ppsci_default
  - INFER: infer_default
  - hydra/job/config/override_dirname/exclude_keys: exclude_keys_default
  - _self_

hydra:
  run:
    # dynamic output directory according to running time and override name
    dir: ./outputs_pangu_weather
  job:
    name: ${mode} # name of logfile
    chdir: false # keep current working directory unchanged
  callbacks:
    init_callback:
      _target_: ppsci.utils.callbacks.InitCallback
  sweep:
    # output directory for multirun
    dir: ${hydra.run.dir}
    subdir: ./

# general settings
mode: infer # running mode: infer
seed: 2023
output_dir: ${hydra:run.dir}
log_freq: 20

# inference settings
INFER:
  pretrained_model_path: null
  export_path: inference/pangu_weather_24
  onnx_path: ${INFER.export_path}.onnx
  device: gpu
  engine: onnx
  precision: fp32
  ir_optim: false
  min_subgraph_size: 30
  gpu_mem: 100
  gpu_id: 0
  max_batch_size: 1
  num_cpu_threads: 10
  batch_size: 1
  input_file: './data/input_upper.npy'
  input_surface_file: './data/input_surface.npy'
159 changes: 159 additions & 0 deletions examples/pangu_weather/convert_data.py
@@ -0,0 +1,159 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

# http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ref: https://github.com/HaxyMoly/Pangu-Weather-ReadyToGo/blob/main/forecast_decode_functions.py

import os
from os import path as osp
from typing import Dict

import hydra
import netCDF4 as nc
import numpy as np

from ppsci.utils import logger


def convert_surface_data_to_nc(
    surface_file: str, file_name: str, output_dir: str
) -> None:
    surface_data = np.load(surface_file)
    mean_sea_level_pressure = surface_data[0]
    u_component_of_wind_10m = surface_data[1]
    v_component_of_wind_10m = surface_data[2]
    temperature_2m = surface_data[3]

    with nc.Dataset(
        os.path.join(output_dir, file_name), "w", format="NETCDF4_CLASSIC"
    ) as nc_file:
        # Create dimensions
        nc_file.createDimension("longitude", 1440)
        nc_file.createDimension("latitude", 721)

        # Create variables
        nc_lon = nc_file.createVariable("longitude", np.float32, ("longitude",))
        nc_lat = nc_file.createVariable("latitude", np.float32, ("latitude",))
        nc_msl = nc_file.createVariable(
            "mean_sea_level_pressure", np.float32, ("latitude", "longitude")
        )
        nc_u10 = nc_file.createVariable(
            "u_component_of_wind_10m", np.float32, ("latitude", "longitude")
        )
        nc_v10 = nc_file.createVariable(
            "v_component_of_wind_10m", np.float32, ("latitude", "longitude")
        )
        nc_t2m = nc_file.createVariable(
            "temperature_2m", np.float32, ("latitude", "longitude")
        )

        # Set variable attributes
        nc_lon.units = "degrees_east"
        nc_lat.units = "degrees_north"
        nc_msl.units = "Pa"
        nc_u10.units = "m/s"
        nc_v10.units = "m/s"
        nc_t2m.units = "K"

        # Write data to variables
        nc_lon[:] = np.linspace(0.125, 359.875, 1440)
        nc_lat[:] = np.linspace(90, -90, 721)
        nc_msl[:] = mean_sea_level_pressure
        nc_u10[:] = u_component_of_wind_10m
        nc_v10[:] = v_component_of_wind_10m
        nc_t2m[:] = temperature_2m

    logger.info(
        f"Convert output surface data file {surface_file} as nc format and save to {output_dir}/{file_name}."
    )


def convert_upper_data_to_nc(upper_file: str, file_name: str, output_dir: str) -> None:
    # Load the saved numpy arrays
    upper_data = np.load(upper_file)
    geopotential = upper_data[0]
    specific_humidity = upper_data[1]
    temperature = upper_data[2]
    u_component_of_wind = upper_data[3]
    v_component_of_wind = upper_data[4]

    with nc.Dataset(
        os.path.join(output_dir, file_name), "w", format="NETCDF4_CLASSIC"
    ) as nc_file:
        # Create dimensions
        nc_file.createDimension("longitude", 1440)
        nc_file.createDimension("latitude", 721)
        nc_file.createDimension("level", 13)

        # Create variables
        nc_lon = nc_file.createVariable("longitude", np.float32, ("longitude",))
        nc_lat = nc_file.createVariable("latitude", np.float32, ("latitude",))
        nc_geopotential = nc_file.createVariable(
            "geopotential", np.float32, ("level", "latitude", "longitude")
        )
        nc_specific_humidity = nc_file.createVariable(
            "specific_humidity", np.float32, ("level", "latitude", "longitude")
        )
        nc_temperature = nc_file.createVariable(
            "temperature", np.float32, ("level", "latitude", "longitude")
        )
        nc_u_component_of_wind = nc_file.createVariable(
            "u_component_of_wind", np.float32, ("level", "latitude", "longitude")
        )
        nc_v_component_of_wind = nc_file.createVariable(
            "v_component_of_wind", np.float32, ("level", "latitude", "longitude")
        )

        # Set variable attributes
        nc_lon.units = "degrees_east"
        nc_lat.units = "degrees_north"
        # Geopotential (not geopotential height) is expressed in m^2/s^2
        nc_geopotential.units = "m**2 s**-2"
        nc_specific_humidity.units = "kg/kg"
        nc_temperature.units = "K"
        nc_u_component_of_wind.units = "m/s"
        nc_v_component_of_wind.units = "m/s"

        # Write data to variables
        nc_lon[:] = np.linspace(0.125, 359.875, 1440)
        nc_lat[:] = np.linspace(90, -90, 721)
        nc_geopotential[:] = geopotential
        nc_specific_humidity[:] = specific_humidity
        nc_temperature[:] = temperature
        nc_u_component_of_wind[:] = u_component_of_wind
        nc_v_component_of_wind[:] = v_component_of_wind

    logger.info(
        f"Convert output upper data file {upper_file} as nc format and save to {output_dir}/{file_name}."
    )


def convert(cfg: Dict):
    output_dir = cfg.output_dir

    # Convert the inference outputs saved in output_dir to NetCDF files
    convert_surface_data_to_nc(
        osp.join(output_dir, "output_surface.npy"), "output_surface.nc", output_dir
    )
    convert_upper_data_to_nc(
        osp.join(output_dir, "output_upper.npy"), "output_upper.nc", output_dir
    )


@hydra.main(version_base=None, config_path="./conf", config_name="pangu_weather.yaml")
def main(cfg: Dict):
    if cfg.mode == "infer":
        convert(cfg)
    else:
        raise ValueError(f"cfg.mode should be in ['infer'], but got '{cfg.mode}'")


if __name__ == "__main__":
    main()