New access arrangements to EMI datasets (retirement of anonymous FTP)

  • This discussion is locked, please start a new discussion.
  • Last post 26 April 2022
Matthew Keir posted this 16 October 2018

The old anonymous FTP access to the EMI datasets has been replaced by REST-based access to Microsoft Azure storage.  The existing FTP access and testing sites will be disestablished on 8 November 2018. Please ensure you have adjusted any scripts you rely on before then. 

These options are being provided following the testing and feedback received here and via email. Thank you to those users who participated in testing and provided feedback.

Four methods to access EMI datasets are supported:

1. Web-based access via www.emi.ea.govt.nz

You can still use the EMI website to browse content as you always have. The main menu categories have a 'datasets' option in the drop-down that lets users browse folder content and download files one at a time. Each folder will usually be supported by a short paragraph describing its content.

This method is best for ad-hoc downloads of a few files every now and again and provides the most descriptive information to support the data.

2. Access via a storage client (Azure Storage Explorer) 

If you’d like to browse through the datasets as you’ve previously done with an FTP client, you can now use Azure Storage Explorer and connect to EMI datasets using this Shared Access Signature URI: 

https://emidatasets.blob.core.windows.net/publicdata?sv=2020-08-04&si=exp2023-03-31&sr=c&sig=TmXW68yI9Z2PUnheGD6PAy3c8cTdoug1tY7UrDMWuVE%3D
https://emidatasets.blob.core.windows.net/publicdata?sv=2021-10-04&si=exp2024-03-31&sr=c&sig=fH6rIJPLMPtmt37cC6CZk44UDLs0E0C9Sy695KQCxlo%3D

Edit April 2024: Updated SAS URI for 2024 onwards

https://emidatasets.blob.core.windows.net/publicdata?sv=2021-10-04&si=publicdata&sr=c&sig=f034UWz1xmMbk89jd76zY0M%2BwycFDhhumejUrjqlfIw%3D

This method is best if you are used to using a client or want to manually download whole folders or a large set of files.

Figure 1: Add the URI above in the connect dialogue box

The access token included in the URI above does expire and will be updated in the future. The expiry date is denoted as part of the URI (for example, “si=exp2019-12-31”). We will update users by posting on the forum prior to this expiry date.

3. Shell/Command line access (AzCopy)

Ideal for both one-offs and scripting. Instructions and download links here. Use the second half of the URI above (the query string after the '?') to connect, i.e.:

Example 1: Download a file

azcopy.exe copy "https://emidatasets.blob.core.windows.net/publicdata/Datasets/Wholesale/MappingsAndGeospatial/NetworkSupplyPointsTable/20230404_NetworkSupplyPointsTable.csv?sv=2021-10-04&si=publicdata&sr=c&sig=f034UWz1xmMbk89jd76zY0M%2BwycFDhhumejUrjqlfIw%3D" "C:\Temp"

Example 2: List files

azcopy.exe list "https://emidatasets.blob.core.windows.net/publicdata/Datasets/Wholesale/MappingsAndGeospatial/NetworkSupplyPointsTable/?sv=2021-10-04&si=publicdata&sr=c&sig=f034UWz1xmMbk89jd76zY0M%2BwycFDhhumejUrjqlfIw%3D"
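
If you generate these commands from a script, the step most likely to go wrong is joining the file path to the access token. The Python sketch below takes the query string of a SAS URI and attaches it to a path inside the container with a single '?'. The function name with_sas and the placeholder signature are assumptions made for this example; they are not part of AzCopy or any EMI tooling.

```python
from urllib.parse import urlsplit, urlunsplit

CONTAINER_URL = "https://emidatasets.blob.core.windows.net/publicdata"


def with_sas(blob_path: str, sas_uri: str) -> str:
    """Return the URL of blob_path in the container, carrying the token from sas_uri."""
    token = urlsplit(sas_uri).query  # everything after the '?', left percent-encoded
    base = urlsplit(CONTAINER_URL)
    path = base.path.rstrip("/") + "/" + blob_path.lstrip("/")
    return urlunsplit((base.scheme, base.netloc, path, token, ""))


# "sig=PLACEHOLDER" stands in for the real signature from the SAS URI.
example = with_sas(
    "Datasets/Wholesale/MappingsAndGeospatial/NetworkSupplyPointsTable/",
    CONTAINER_URL + "?sv=2021-10-04&si=publicdata&sr=c&sig=PLACEHOLDER",
)
print(example)
```

The resulting string can be passed, in quotes, as the source argument to azcopy.exe copy or azcopy.exe list.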

4. Full programmatic access

If you want to automate access to the datasets, or you already have scripts that do this, they will need adjusting. The BLOB storage endpoint to use is the same as above or you can just directly access the storage container via https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list.

This method is best if you want to set up scripts to automatically download files into your own system.

Each file's last-modified date can be used to find the latest files. This approach will normally be fine, although some intermediate files may be refreshed multiple times. In addition, we may infrequently regenerate entire sets of files as we improve the data quality and align formats. We’ll aim to notify you on this forum of any such changes ahead of time.
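
As a rough illustration of using the modified date, the Python sketch below parses a list response and returns the blobs changed after a given time. It assumes the XML layout documented for the Azure 'List Blobs' operation (a Blob element containing Name and Properties/Last-Modified). The sample XML and the function name modified_since are made up for the example and are not real EMI data.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hand-written sample in the shape of a List Blobs response (not real EMI data).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ContainerName="publicdata">
  <Blobs>
    <Blob><Name>Datasets/Example/20181014_Bids.csv</Name>
      <Properties><Last-Modified>Mon, 15 Oct 2018 02:00:00 GMT</Last-Modified></Properties></Blob>
    <Blob><Name>Datasets/Example/20181015_Bids.csv</Name>
      <Properties><Last-Modified>Tue, 16 Oct 2018 02:00:00 GMT</Last-Modified></Properties></Blob>
  </Blobs>
  <NextMarker />
</EnumerationResults>"""


def modified_since(xml_bytes: bytes, cutoff: datetime) -> list:
    """Names of blobs whose Last-Modified is later than cutoff, newest first."""
    hits = []
    for blob in ET.fromstring(xml_bytes).iter("Blob"):
        # Last-Modified uses the HTTP date format, e.g. "Tue, 16 Oct 2018 02:00:00 GMT"
        stamp = parsedate_to_datetime(blob.findtext("Properties/Last-Modified"))
        if stamp > cutoff:
            hits.append((stamp, blob.findtext("Name")))
    return [name for _, name in sorted(hits, reverse=True)]


print(modified_since(SAMPLE, datetime(2018, 10, 15, 12, 0, tzinfo=timezone.utc)))
```

Because files can be refreshed more than once, a script built this way should expect to see the same file name again on a later run and overwrite its local copy.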

Example 1: Access more list information

Each list response contains a maximum of 5000 blobs. If there are more than 5000 blobs, a marker value is included in the ‘NextMarker’ element at the end of the XML response. To return the next set of results, pass the value returned in the NextMarker element as the 'marker' parameter in the next request URI.

https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list&marker=2!144!MDAwMDY0IURhdGFzZXRzL1dob2xlc2FsZS9CaWRzQW5kT2ZmZXJzL09mZmVycy8yMDE2LzIwMTYxMjEwX09mZmVycy5jc3YhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh
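
The marker handling described above can be sketched in Python as a loop that keeps requesting pages until NextMarker comes back empty. This is a minimal sketch using only the standard library; the function names and the fetch parameter (which lets you swap in your own HTTP client) are assumptions for the example.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

LIST_URL = "https://emidatasets.blob.core.windows.net/publicdata"


def http_get(url: str) -> bytes:
    """Default fetcher: plain anonymous GET returning the raw response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def list_all_blobs(fetch=http_get) -> list:
    """Collect blob names across all pages by following NextMarker."""
    names, marker = [], ""
    while True:
        params = {"restype": "container", "comp": "list"}
        if marker:
            params["marker"] = marker  # urlencode percent-encodes the marker value
        root = ET.fromstring(fetch(LIST_URL + "?" + urlencode(params)))
        names.extend(blob.findtext("Name") for blob in root.iter("Blob"))
        marker = (root.findtext("NextMarker") or "").strip()
        if not marker:  # empty or missing NextMarker: this was the last page
            return names
```

Calling list_all_blobs() with no arguments walks the whole container, which means many requests; for routine use it is usually cheaper to list a single folder.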

Example 2: Access to a folder

Alternatively, you can narrow your search to a specific folder by using the 'prefix' parameter in the request URI.

https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list&prefix=Datasets/Wholesale/BidsAndOffers/Bids/2018

Example 3: Access to a file

https://emidatasets.blob.core.windows.net/publicdata/Datasets/Wholesale/BidsAndOffers/Bids/2018/20181015_Bids.csv
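
For scripted downloads, a file URL like the one above can be built from the blob name returned in a list response. The Python sketch below shows one way to do that with the standard library; blob_url and download are names invented for this example, and the download helper reads the whole file into memory, which is an assumption that suits ordinary daily files better than very large ones.

```python
import urllib.request
from pathlib import Path
from urllib.parse import quote

CONTAINER_URL = "https://emidatasets.blob.core.windows.net/publicdata"


def blob_url(name: str) -> str:
    """Direct URL for a blob name as it appears in a list response."""
    return CONTAINER_URL + "/" + quote(name)  # quote() keeps '/' and escapes spaces etc.


def download(name: str, target_dir: str = ".") -> Path:
    """Save the blob into target_dir under its own file name and return the local path."""
    target = Path(target_dir) / Path(name).name
    with urllib.request.urlopen(blob_url(name)) as response:
        target.write_bytes(response.read())
    return target


print(blob_url("Datasets/Wholesale/BidsAndOffers/Bids/2018/20181015_Bids.csv"))
```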

Further assistance is available via the links below:

REST reference to access AZURE BLOB storage: https://docs.microsoft.com/en-us/rest/api/storageservices/Blob-Service-REST-API 

The following articles offer specific guidance in your language of choice:

 

Josh Smith posted this 22 October 2018

Hi Matthew,

Just wondering if or when the SPD raw case files will be moved over to the Azure Storage platform?

I can't see any case files when I go to this XML page "https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list"

Thanks in advance!

Josh

Matthew Keir posted this 24 October 2018

Hi Josh,

Did you check the right location? I see you had a discussion with Phil earlier about the change for case files (https://www.emi.ea.govt.nz/Forum/thread/final-pricing-spd-case-files-and-vspd-gdx-files-location-about-to-change/)

In the Azure storage:

Yesterday's file is: https://emidatasets.blob.core.windows.net/publicdata/Datasets/Wholesale/FinalPricing/CaseFiles/2018/MSS_211112018101100014_0X.ZIP

The whole folder is: https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list&prefix=Datasets/Wholesale/FinalPricing/CaseFiles/2018

We're still experiencing occasional issues with the latest modified date being overwritten. We should have this sorted soon.

Cheers,

Matthew

 

davidw posted this 05 November 2018

Hi Matthew,

I'm downloading the full XML index (https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list), and then parsing that to get the addresses of the files I want, but it doesn't seem to include all files. The ones it does include are mostly of the form "HydrologicalModellingDataset", "BidsAndOffers" or "AncillaryServices".

Any idea why that might be? Is there, perhaps, a limit on how many results get returned when looking at the whole structure? Or am I doing something stupid?

Cheers,

Dave

Edit - I've done a workaround by getting the XML file for each specific folder, and that seems to come under the record limit. But I'll leave this comment here in case anyone else had the same question.

Matthew Keir posted this 05 November 2018

Hi Dave,

That was news to me too. Yes, you are correct: each list request returns at most 5000 blobs. To get the rest, take the 'NextMarker' value at the end of the XML from the first list response and pass it in your next list request URI. I've edited the original post to include a brief description so all the info for users is in one place.

More info is in the REST reference to access AZURE BLOB storage available via the link in the post.

Hope that helps.

Matthew

davidw posted this 09 November 2018

Thanks. I'll also keep an eye out for the NextMarker element.

geoff_ey posted this 25 September 2019

Hi Matt,

I'm using the REST API detailed above to access EMI through Python. I'm trying to retrieve URLs to data files in EMI. When retrieving URLs for "Final Prices", according to the API, the last file available is for July 2018. This is the last available month for Reserve Prices as well. I also tested against generation data. In this case, the last URL provided is 201812_Generation_MD.csv. However, 201809_Generation_MD.csv, 201810_Generation_MD.csv and 201811_Generation_MD.csv are missing! I've tried to access the missing files through Azure Storage Explorer and I can see that they exist, but they have a different icon next to them - maybe a clue as to why they aren't appearing through the API call. I've attached a screenshot below.

Do you know what this means, Matt? Am I meant to be providing some other parameters to my call? Please advise.

Thanks,

Geoff


Oliver Butt posted this 25 September 2019

Hi Geoff, The earlier files were uploaded to Azure storage as part of the migration to Azure and got given the content type "text/csv". Files uploaded after October 2018 were created by scheduled jobs and got the content type "application/octet-stream". This is the cause of the differing icons.

If I run the code below for Generation_MD I get files for 2019, can you confirm that you do too?


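# BlockBlobService is part of the legacy azure-storage-blob SDK (v2.x and earlier)
from azure.storage.blob import BlockBlobService

# Anonymous access: only the account name is given, no key
service = BlockBlobService(account_name="emidatasets")

# List every blob under the 2019 Generation_MD folder
blobs = service.list_blobs("publicdata", prefix="Datasets/Wholesale/Generation/Generation_MD/2019")
for b in blobs:
    print(b.name)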


 

Thanks,

Oliver

 

 

geoff_ey posted this 26 September 2019

Hi Oliver, that was the issue: I was filtering for text/csv! I get all the files now. Thanks so much for your help.

 

Geoff.
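
For anyone else filtering list results, the exchange above suggests selecting files by name rather than by content type, since the same kind of file can carry either "text/csv" or "application/octet-stream". The Python sketch below illustrates the idea on a hand-written sample in the shape of a list response; the sample XML and the function name csv_blobs are assumptions for the example, not real EMI data.

```python
import xml.etree.ElementTree as ET

# Hand-written sample: the same kind of file with two different content types.
SAMPLE = b"""<EnumerationResults>
  <Blobs>
    <Blob><Name>Datasets/Example/201808_Generation_MD.csv</Name>
      <Properties><Content-Type>text/csv</Content-Type></Properties></Blob>
    <Blob><Name>Datasets/Example/201809_Generation_MD.csv</Name>
      <Properties><Content-Type>application/octet-stream</Content-Type></Properties></Blob>
  </Blobs>
</EnumerationResults>"""


def csv_blobs(xml_bytes: bytes) -> list:
    """Select CSV files by file extension, ignoring the Content-Type property."""
    root = ET.fromstring(xml_bytes)
    return [
        name
        for name in (blob.findtext("Name") for blob in root.iter("Blob"))
        if name.lower().endswith(".csv")
    ]


print(csv_blobs(SAMPLE))
```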

Chets posted this 19 February 2020

Hi Matthew

What is the latest 'Shared Access Signature URI' to access the EMI datasets?

I was following your instructions on accessing the datasets, with a view to working my way to accessing them programmatically, when I came across the fact that a new URI string is needed.

Could you please tell me where this can be found? Thank you for your help in this matter.

 

Kind Rgds

Chets

Matthew Keir posted this 20 February 2020

Hi Chets et al

We've added a new URI. All users should migrate to it, as the old one that was extended (exp2020-12-31) will stop working soon.

https://emidatasets.blob.core.windows.net/publicdata?sv=2018-03-28&si=exp2021-03-31&sr=c&sig=RgEr3fnUCRgCg%2FGc%2BYus0OJHXpWQBZvUPpIDxsOtJQE%3D

Please let us know any issues.

Regards,

Matthew

Chets posted this 20 February 2020

Hi Matthew

Thanks for the reply. 

I tried the new link but it didn't work for me. I have attached a screenshot of the result. Perhaps you could let me know what I am doing wrong or missing.

Thanks

Chets

 

Blair posted this 05 March 2020

Hi Chets,

Please use this URI https://emidatasets.blob.core.windows.net/publicdata?si=exp2019-12-31&sv=2019-02-02&sr=c&sig=T28UUskNCAi%2BUOmKiFrQzWcekINwF9PKxxjQyfwSemk%3D 

Let me know if you run into any problems.

 

geoff_ey posted this 02 April 2020

Hi Matt,

Your new link with the updated URL and the link Blair has posted above aren't working. I get the same error as Chets above.

Could you please update the link and it'd be great if you could update the actual body of the post with the new URL as well.

Thanks,

Geoff.

Matthew Keir posted this 02 April 2020

Hi Geoff,

Have updated the main discussion above - hope that sorts it!

Cheers,

Matthew

geoff_ey posted this 02 April 2020

I can confirm the new link works. Thanks Matt.
