Description
Should be fairly simple to implement by just modifying the for loop at the bottom of Perceiver.forward().  Note that x is the latents.
Here's a quick diagram. My additions are in black.
(I've removed the "weight sharing" from the diagram, but weight sharing would absolutely still be part of this)
The paper talks about using different timestep inputs. But I don't think the paper talks about using different outputs for each timestep. Maybe that's a bad idea :)
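To make that concrete, here's a very rough sketch (the `perceiver_block` argument, the output head, and all shapes are placeholders, not the real perceiver_pytorch API) of what a "Perceiver RNN" step could look like: the existing layer loop becomes a loop over timesteps, with the latents `x` carried from one timestep to the next:

```python
import torch
from torch import nn

class PerceiverRNN(nn.Module):
    """Sketch: run a weight-shared Perceiver block once per timestep, carrying
    the latent array `x` forward as the recurrent state. `perceiver_block` is a
    placeholder for the body of the existing layer loop (cross-attention from
    the latents to that timestep's data, followed by latent self-attention)."""

    def __init__(self, perceiver_block, num_latents, latent_dim, out_dim):
        super().__init__()
        self.block = perceiver_block                       # weights shared across all timesteps
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        self.to_output = nn.Linear(latent_dim, out_dim)    # optional per-timestep output head

    def forward(self, timesteps):
        # timesteps: list of per-timestep input arrays, each (batch, seq_len, input_dim)
        b = timesteps[0].shape[0]
        x = self.latents.unsqueeze(0).expand(b, -1, -1)    # initial latent array
        outputs = []
        for data_t in timesteps:                           # iterate over time, not over layers
            x = self.block(x, context=data_t)              # latents persist across timesteps
            outputs.append(self.to_output(x.mean(dim=1)))  # one output per timestep
        return torch.stack(outputs, dim=1), x              # (batch, time, out_dim), final latents
```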
Related to: openclimatefix-archives/predict_pv_yield#35
Background
The ultimate aim is to predict solar electricity generation for a single solar system, over the next few hours, every five minutes. The inputs to the model will include 5-minutely satellite data, real-time power from thousands of solar PV systems, etc.
Whilst satellite imagery is probably great for telling us that "there's a big dark cloud coming in 25 minutes", it probably doesn't tell us exactly how much sunlight will get through that big dark cloud, so we need to blend satellite data with measurements from the ground. IMHO, this is where ML can really shine: combining multiple data sources, many of which will be quite low quality.
The input to the "Perceiver RNN" would include:
- Recent history (for the last hour or so):
- Satellite data (1 image every 5 minutes). Probably just a 64x64 crop of satellite imagery, centred on the solar system that we're making predictions for. 12 channels.
- Ground measurements within the geospatial extent of the satellite imagery:
- Solar electricity generation from all solar systems with the region of interest
- Air-quality measurements
- Rainfall radar
- Weather measurements (temperature, irradiance, wind speed, etc.)
 
 
- Predictions of the next few hours:
- Predicted satellite imagery for the next few hours (using SatFlow)
- Numerical weather predictions
 
(I'm really excited about The Perceiver because our data inputs are "multi-modal", and The Perceiver works really well for multi-modal perception!)
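As a hedged illustration (all shapes and the `pack_timestep` helper are made up, and a real version would add Fourier position encodings and per-modality embeddings rather than zero-padding), one timestep's multi-modal inputs could be flattened into a single "byte array" for the Perceiver's cross-attention something like this:

```python
import torch
import torch.nn.functional as F

def pack_timestep(sat_crop, ground_obs, common_dim=32):
    """Sketch: pack one timestep's multi-modal inputs into a single
    (elements, channels) array for the Perceiver's cross-attention.
    sat_crop:   (64, 64, 12)   satellite crop, channels last.
    ground_obs: (n_sensors, k) ground measurements (PV power, weather, ...),
                each row already concatenated with its geospatial encoding.
    All names and shapes here are assumptions, not the real data loader."""
    sat = sat_crop.reshape(-1, sat_crop.shape[-1])                 # (4096, 12) satellite "pixels"
    sat = F.pad(sat, (0, common_dim - sat.shape[-1]))              # pad channels to a common size
    obs = F.pad(ground_obs, (0, common_dim - ground_obs.shape[-1]))
    return torch.cat([sat, obs], dim=0)                            # (4096 + n_sensors, common_dim)
```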
So, maybe we'd actually have two "Perceiver RNNs" (i.e. weights would be shared within the encoder and within the decoder, but the encoder and decoder would have different weights):
- An "encoder" which gets each timestep of the recent history. The per-timestep outputs would be ignored. The final output would form the latent array for the "decoder". The job of the "encoder" is to "perceive" the recent context, e.g. to use ground-based measurements to calibrate how "dark" each cloud is in the satellite imagery.
- A "decoder" which gets each timestep of our predictions, and outputs each timestep of the predicted solar electricity generation.
One problem with this is that information can only flow in one direction: forwards. So it might be interesting to add a self-attention block which functions over the time dimension:
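Something like this, perhaps (a minimal sketch, assuming the per-timestep decoder outputs have first been stacked into a (batch, time, dim) tensor):

```python
import torch
from torch import nn

class TemporalSelfAttention(nn.Module):
    """Sketch: let the per-timestep outputs attend to each other over the time
    dimension, so information isn't restricted to flowing forwards.
    The residual/norm structure here is an assumption."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, per_timestep):
        # per_timestep: (batch, time, dim) -- e.g. pooled latents from each decoder step
        h = self.norm(per_timestep)
        out, _ = self.attn(h, h, h)      # full (bidirectional) attention over time
        return per_timestep + out        # residual connection
```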
Maybe the initial latent is the embedding for the solar PV system of interest.
At each input timestep, the PV input would be a concatenation of the power, the embedding of the PV system ID, and the geospatial location.
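For example (a sketch; the dimensions, names, and the 2-D location encoding are assumptions):

```python
import torch
from torch import nn

class PVInputEncoder(nn.Module):
    """Sketch: build the per-timestep PV input by concatenating each system's
    power reading, a learned embedding of its system ID, and its geospatial
    location."""

    def __init__(self, num_pv_systems, id_embed_dim=16):
        super().__init__()
        self.id_embedding = nn.Embedding(num_pv_systems, id_embed_dim)

    def forward(self, power, system_ids, locations):
        # power:      (batch, n_systems, 1)   PV power at this timestep
        # system_ids: (batch, n_systems)      integer system IDs
        # locations:  (batch, n_systems, 2)   e.g. OSGB x/y or lat/lon
        ids = self.id_embedding(system_ids)                 # (batch, n_systems, id_embed_dim)
        return torch.cat([power, ids, locations], dim=-1)   # one row per PV system
```

The same embedding table could also supply the initial latent for the PV system we're forecasting, per the idea above.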

