I am looking to build a dataset of generation by plant over time. So far I have compiled the data contained in the ‘Generation_MD’ files here, using Gen_Code to identify individual plants. As I understand it, this data maps injections into the grid at individual nodes (POC_Code) to the plants known to generate at those nodes, and does not include embedded generation.
In sorting through the data, I have noticed some data which appears to be missing:
- The plant ‘Kumara’ only appears in the data up to 2009, injecting at KUM0111. The plant ‘kumara_dillmans’ seems to inject at KUM0661 for a brief period in 2009 but also disappears after 2009.
- The plant ‘Cobb’ appears to be missing between 2015 and 2020. Up to 2015 Cobb appears to inject at COB0661; it then disappears until 2020, when it begins to inject at KUM0661 (but with a much lower GWh output).
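For reference, gaps like these can be found with a check along the following lines. This is only a minimal sketch: it assumes the data has already been reduced to (Gen_Code, year) pairs, the function name `plant_coverage` is my own, and the sample rows are made up to mirror the pattern described above rather than taken from the files.

```python
from collections import defaultdict


def plant_coverage(rows):
    """Summarise coverage per plant.

    rows: iterable of (gen_code, year) pairs (assumed pre-extracted).
    Returns {gen_code: (first_year, last_year, [years with no data in between])}.
    """
    years_seen = defaultdict(set)
    for gen_code, year in rows:
        years_seen[gen_code].add(year)

    coverage = {}
    for gen_code, years in years_seen.items():
        first, last = min(years), max(years)
        # Years inside the plant's observed span that have no rows at all
        gaps = [y for y in range(first, last + 1) if y not in years]
        coverage[gen_code] = (first, last, gaps)
    return coverage


# Illustrative rows only (not real data)
sample = [
    ("cobb", 2014), ("cobb", 2015), ("cobb", 2020),
    ("kumara", 2008), ("kumara", 2009),
]
result = plant_coverage(sample)
```

A plant that simply stops (like Kumara here) shows up through its last year rather than through the gap list, so both values are worth looking at.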
I understand that an alternative source of node-level generation is available in the generation trends data here, and I have compared the daily totals for each POC_Code in ‘Generation_MD’ to the equivalent node in that data. In the generation trends data, the node STK0661 shows generation from the day after COB0661 stops (continuing at a similar GWh output), and the node KUM0661 has generation throughout the period.
I was hoping you might be able to help with a few questions relating to this issue:
- Is there any reason why this data is missing from ‘Generation_MD’?
- Is the generation trends data a suitable alternative source of plant-level generation for just these nodes?
- In general, the generation by POC_Code in ‘Generation_MD’ is sometimes (but not always) different to the generation by node in the generation trends data where the nodes exist in both datasets. Why could this be the case? I understand that the generation trends data can be revised for as much as 14 months following first publication, but some differences exist for data older than this.
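In case it helps to see how the comparison was made, the sketch below shows the general approach. It assumes both datasets have already been aggregated to daily totals keyed by (POC_Code, date); the function name `compare_daily_totals`, the tolerance, and the sample values are all hypothetical and only there for illustration.

```python
def compare_daily_totals(gen_md, trends, tol=0.05):
    """Compare daily totals from two sources.

    gen_md, trends: dicts of {(poc_code, date_string): daily_total}.
    Returns a list of (key, gen_md_value, trends_value) for every key where
    the values differ by more than tol, or where one source has no value
    (reported as None).
    """
    diffs = []
    for key in sorted(set(gen_md) | set(trends)):
        a = gen_md.get(key)
        b = trends.get(key)
        if a is None or b is None or abs(a - b) > tol:
            diffs.append((key, a, b))
    return diffs


# Made-up values for illustration only (not real data)
gen_md = {
    ("COB0661", "2015-01-01"): 1.0,
    ("KUM0661", "2015-01-01"): 0.5,
}
trends = {
    ("COB0661", "2015-01-01"): 1.0,
    ("KUM0661", "2015-01-01"): 0.4,
    ("STK0661", "2015-01-02"): 0.9,
}
diffs = compare_daily_totals(gen_md, trends)
```

Keys present in only one source are reported alongside genuine value differences, which is how the STK0661 and KUM0661 cases above were spotted.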