The old anonymous FTP access to the EMI datasets has been replaced by REST-based access to Microsoft Azure storage. The existing FTP access and testing sites will be disestablished on 8 November 2018. Please ensure you have adjusted any scripts you rely on before then.
These options are being provided following the testing and feedback received here and via email. Thank you to those users who participated in testing and provided feedback.
Three methods to access EMI datasets are supported:
1. Web-based access via www.emi.ea.govt.nz
You can still use the EMI website to browse content as you always have. The main menu categories have a 'datasets' option in the drop-down that lets users browse folder content and download files one at a time. Each folder will usually be supported by a short paragraph describing its content.
This method is best for ad-hoc downloads of a few files every now and again and provides the most descriptive information to support the data.
2. Access via a storage client (Azure Storage Explorer)
If you’d like to browse through the datasets as you’ve previously done with an FTP client, you can now use Azure Storage Explorer and connect to EMI datasets using this Shared Access Signature URI:
Edit: Updated SAS URI: https://emidatasets.blob.core.windows.net/publicdata?sv=2020-08-04&si=exp2023-03-31&sr=c&sig=TmXW68yI9Z2PUnheGD6PAy3c8cTdoug1tY7UrDMWuVE%3D
This method is best if you are used to using a client or want to manually download whole folders or a large set of files.
Figure 1: Add the URI above in the connect dialogue box
The access token included in the URI above expires and will be updated in the future. The expiry date is shown in the URI's “si” parameter (for example, “si=exp2023-03-31”). We will update users by posting on the forum before this expiry date.
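If you store the SAS URI in a script, a small check like the one below can warn you before the token lapses. This is only a sketch: it assumes the “si” value keeps the “exp” plus YYYY-MM-DD form shown in this post, which is a naming convention of this URI rather than anything Azure guarantees, and the function names are made up for illustration.

```python
from datetime import date, datetime
from urllib.parse import parse_qs, urlsplit


def sas_expiry(sas_uri):
    """Return the date encoded in the 'si' parameter, or None if absent.

    Assumes the 'exp' + YYYY-MM-DD naming used in the forum post.
    """
    query = parse_qs(urlsplit(sas_uri).query)
    policy = query.get("si", [""])[0]
    if not policy.startswith("exp"):
        return None
    try:
        return datetime.strptime(policy[3:], "%Y-%m-%d").date()
    except ValueError:
        return None


def days_until_expiry(sas_uri, today=None):
    """Return the number of days left before the encoded date, or None."""
    expiry = sas_expiry(sas_uri)
    if expiry is None:
        return None
    return (expiry - (today or date.today())).days
```

A scheduled job could call days_until_expiry() with the URI it uses and log a warning when the result drops below some threshold of your choosing.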
3. Programmatic access
If you want to automate access to the datasets, or you already have scripts that do this, those scripts will need to be adjusted. Use the same Blob storage endpoint as above, or access the storage container directly via https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list.
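As a starting point, the container listing can be fetched and parsed with the Python standard library alone. The sketch below assumes the response follows the standard Azure List Blobs XML layout (an EnumerationResults element containing Blob entries with Name and Properties children); parse_listing and fetch_listing are illustrative names, not part of any EMI tooling.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Container listing URL given in the post.
LIST_URL = (
    "https://emidatasets.blob.core.windows.net/publicdata"
    "?restype=container&comp=list"
)


def parse_listing(xml_text):
    """Extract blob names and sizes from a List Blobs XML response."""
    root = ET.fromstring(xml_text)
    entries = []
    for blob in root.iter("Blob"):
        size = blob.findtext("Properties/Content-Length")
        entries.append(
            {"name": blob.findtext("Name"), "size": int(size) if size else None}
        )
    return entries


def fetch_listing(url=LIST_URL, timeout=60):
    """Download one page of the listing and return it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # 'utf-8-sig' drops a leading byte-order mark if the service sends one.
        return response.read().decode("utf-8-sig")
```

Calling parse_listing(fetch_listing()) returns the first page of results as a list of dictionaries.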
This method is best if you want to set up scripts to automatically download files into your own system.
The last-modified date of each file can be used to find the latest files. This approach will normally work well, although some intermediate files may be refreshed multiple times. In addition, we may infrequently regenerate entire sets of files as we improve the data quality and align formats. We’ll aim to notify you on this forum of any such changes ahead of time.
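One way to pick out recently changed files is to compare each blob's Last-Modified value against the time of your previous run. The sketch below assumes the standard List Blobs XML layout, where Last-Modified sits under each blob's Properties element in RFC 1123 format; blobs_modified_since is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def blobs_modified_since(listing_xml, since):
    """Return (name, modified) pairs newer than 'since', oldest first.

    'since' must be a timezone-aware datetime, because the parsed
    Last-Modified values are timezone-aware (GMT).
    """
    root = ET.fromstring(listing_xml)
    recent = []
    for blob in root.iter("Blob"):
        stamp = blob.findtext("Properties/Last-Modified")
        if stamp is None:
            continue  # skip entries without a timestamp
        modified = parsedate_to_datetime(stamp)
        if modified > since:
            recent.append((blob.findtext("Name"), modified))
    return sorted(recent, key=lambda item: item[1])
```

Because files may be refreshed more than once, it is probably safer to re-download anything newer than your last successful run than to assume each file changes only once.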
Example 1: Access more list information
Each list response will contain a maximum of 5000 blobs. If there are more than 5000 blobs, a marker value is included in the ‘NextMarker’ element at the end of the XML response. To return the next set of results, pass the value returned in the ‘NextMarker’ element as the ‘marker’ parameter in the request URI.
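The paging described here can be sketched as a loop that keeps requesting pages until NextMarker comes back empty. This assumes the container URL from this post and the standard List Blobs XML layout; list_url, fetch_text and iter_blob_names are illustrative names only.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

CONTAINER_URL = "https://emidatasets.blob.core.windows.net/publicdata"


def list_url(marker=None):
    """Build the list request URI, adding 'marker' for follow-up pages."""
    params = {"restype": "container", "comp": "list"}
    if marker:
        params["marker"] = marker
    return CONTAINER_URL + "?" + urlencode(params)


def fetch_text(url, timeout=60):
    """Download one page of the listing as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8-sig")


def iter_blob_names(fetch=fetch_text):
    """Yield every blob name, following NextMarker from page to page."""
    marker = None
    while True:
        root = ET.fromstring(fetch(list_url(marker)))
        for blob in root.iter("Blob"):
            yield blob.findtext("Name")
        marker = root.findtext("NextMarker")
        if not marker:  # missing or empty element means no more pages
            break
```

The fetch argument is there so the loop can be exercised with canned XML instead of live requests; in normal use, iterate over iter_blob_names() directly.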
Example 2: Access to a folder
Alternatively, you can narrow your search to a specific folder by using the 'prefix' parameter in the request URI.
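A folder-scoped request URI could be built along these lines. The folder name used in the usage note is purely hypothetical, and folder_list_url is an illustrative helper, not an official one. Note that the prefix is matched against the start of the full blob name, so a trailing slash keeps similarly named folders out of the results.

```python
from urllib.parse import urlencode

CONTAINER_URL = "https://emidatasets.blob.core.windows.net/publicdata"


def folder_list_url(folder, marker=None):
    """Build a list request URI restricted to blobs under 'folder'."""
    # Normalise to exactly one trailing slash so 'Foo' does not match 'Foobar/'.
    prefix = folder.strip("/") + "/"
    params = {"restype": "container", "comp": "list", "prefix": prefix}
    if marker:
        params["marker"] = marker
    # urlencode percent-encodes the slashes in the prefix (as %2F).
    return CONTAINER_URL + "?" + urlencode(params)
```

For example, folder_list_url("Datasets/Example") produces a URI ending in prefix=Datasets%2FExample%2F, where "Datasets/Example" stands in for whichever folder you actually need.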
Example 3: Access to a file
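As a rough sketch of fetching a single file, the blob name returned by a list request can be appended to the container URL. This assumes the container permits anonymous reads of individual blobs in the same way it permits listing; if it does not, the query string from the SAS URI would need to be appended. The helper names and the example blob name in the usage note are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import quote

CONTAINER_URL = "https://emidatasets.blob.core.windows.net/publicdata"


def blob_url(blob_name):
    """Build the download URL for one blob, keeping '/' as a separator."""
    return CONTAINER_URL + "/" + quote(blob_name, safe="/")


def download_blob(blob_name, dest_dir=".", timeout=60):
    """Stream one blob into dest_dir and return the local path."""
    # Blob names always use '/', whatever the local operating system.
    target = Path(dest_dir) / PurePosixPath(blob_name).name
    with urllib.request.urlopen(blob_url(blob_name), timeout=timeout) as response:
        with open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    return target
```

For instance, download_blob("Datasets/Example/file.csv", "downloads") would save file.csv into an existing downloads directory, with the blob name replaced by one taken from a real listing.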
Further assistance is available via the links below:
REST reference for accessing Azure Blob storage: https://docs.microsoft.com/en-us/rest/api/storageservices/Blob-Service-REST-API
The following articles offer specific guidance in your language of choice:
- Python: https://azure.microsoft.com/en-us/resources/samples/storage-python-getting-started/
- PowerShell: https://azure.microsoft.com/en-us/resources/samples/storage-powershell-getting-started/
- Node.js: https://azure.microsoft.com/en-us/resources/samples/storage-blob-node-getting-started
Please note that the testing sites discussed in the previous post will be removed on 8 November 2018 along with the FTP access.