Frost v1 (work in progress)
DISCLAIMER: This version of Frost is still work in progress. MET Norway does not guarantee that the service will always behave 100% according to documentation or expectations.


Downloading Large Datasets

To download a dataset that is too large to fit in a single response, we can use a concept called pagination. The idea is to issue multiple requests and assemble the resulting responses into a total dataset.

Overview

In many cases a request to Frost results in a single response that contains the complete dataset matching the request.



For practical reasons Frost imposes a limit on the size of the dataset that can be returned in a single response. A request that would result in a larger dataset will fail with a 403 Forbidden response.



The client can download a large dataset by sending multiple requests and assembling the subresponses into a total dataset.



Using a standard pagination protocol can make it easier to download a large dataset and will also minimize the number of iterations (thus minimizing the total communication overhead).



Datasets and pages

A Frost dataset is defined as one or more time series, each containing zero or more observations. Within a time series, observations are always ordered by increasing observation time.



The size of a dataset is defined in terms of the number of time series headers plus all individual observations. If a given request would cause the total number of such items to exceed the predefined limit, the dataset will need to be partitioned and downloaded by sending multiple requests. The resulting subdatasets are called pages.

The page sequence is not necessarily aligned with time series boundaries. That is, a dataset may be split (possibly more than once) in the middle of a time series, in which case the time series header is included in every part. For example, with a size limit of five items, a dataset could end up being downloaded in three pages, with one or more of the splits falling inside a time series.



Depending on the situation, the user may need to take such splitting of time series into account when assembling the pages into a total dataset.
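As an illustration, the sketch below merges downloaded pages into a single dataset, concatenating the observations of any time series that was split across page boundaries. The page layout assumed here (a 'tseries' list whose entries have a 'header' with an 'id' field and an 'observations' list) is chosen for the example only and is not the actual Frost response schema.

def merge_pages(pages):
    """Merge downloaded pages into one dataset.

    Each page is assumed to look like
        {'tseries': [{'header': {'id': ..., ...}, 'observations': [...]}, ...]}
    which is an illustrative layout, not the actual Frost response schema.
    """
    merged = {}   # time series id -> merged time series
    order = []    # first-seen order of the time series
    for page in pages:
        for ts in page['tseries']:
            key = ts['header']['id']   # assumed identifier field in the header
            if key not in merged:
                merged[key] = {'header': ts['header'], 'observations': []}
                order.append(key)
            # Observations are ordered by increasing time within each page, and
            # pages arrive in order, so appending preserves the overall ordering.
            merged[key]['observations'].extend(ts['observations'])
    return {'tseries': [merged[key] for key in order]}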

Technique 1: Modify query parameters

The easiest approach is sometimes to partition the dataset by splitting along one or more query parameter dimensions.

Let's say you ask Frost for wind and temperature observations from three weather stations for the entire year 2021. The resulting dataset would consist of 2 × 3 = 6 time series and would typically be too large to fit in a single response (e.g. if the limit is 10⁵ items and there is one observation every 10 minutes, each time series contains roughly 52,560 observations, giving about 315,000 items in total).

The first thing to try could be to send one request for the wind observations and a second one for the temperature observations. If some of the resulting datasets are still too large, we could split further by sending three requests for the wind observations, one per station, and so on.

This technique requires the query parameters of successive requests to be modified in a systematic way so that all combinations are eventually covered.
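As a sketch of this technique, the loop below issues one request per combination of element and station. The endpoint URL, client ID handling, parameter names and station IDs are placeholders chosen for illustration and must be replaced with those of an actual Frost request.

import requests

FROST_URL = 'https://<frost host>/<observations endpoint>'   # placeholder
CLIENT_ID = '<your client ID>'                               # placeholder

elements = ['wind_speed', 'air_temperature']                 # example element names
stations = ['<station 1>', '<station 2>', '<station 3>']     # example station IDs
time_range = '2021-01-01T00:00:00Z/2022-01-01T00:00:00Z'     # the entire year 2021

datasets = {}
for element in elements:
    for station in stations:
        # One request per (element, station) combination, so that each
        # response stays within the size limit.
        resp = requests.get(
            FROST_URL,
            params={'elements': element, 'sources': station, 'referencetime': time_range},
            auth=(CLIENT_ID, ''),                            # assumed authentication scheme
        )
        resp.raise_for_status()
        datasets[(element, station)] = resp.json()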

Technique 2: Use a standard pagination protocol

In some cases it can be easier to download a large dataset by implementing a standard pagination protocol. The idea here is to keep the URL (including the query parameters) constant, and instead set certain HTTP request headers according to a few simple rules. The headers represent state information that allows Frost to keep track of how far the client has progressed in the page sequence used for downloading the dataset for this request. Also, since the client doesn't know in advance how many pages will be required to download the total dataset, it must assume that any response (including the first!) may be the last one.

First request

To initiate the protocol, the client simply ensures that the first request has the HTTP header X-Frost-Ptsheader set to an empty string.

Last response

To indicate the final page, Frost sets the HTTP header X-Frost-Nextptsheader in the response to an empty string.

General/intermediate request-response

Upon receiving a response R with a non-empty X-Frost-Nextptsheader, the headers of the next request must be set from the headers of R as follows:
  • X-Frost-Ptsheader set to the value of X-Frost-Nextptsheader
  • X-Frost-Ptsbaseid set to the value of X-Frost-Nextptsbaseid
  • X-Frost-Ptime set to the value of X-Frost-Nextptime
That's all there is to it for the protocol as such. Obviously, each subdataset must be aggregated into what will eventually constitute the total dataset after the final response has been received.

Example

This Python example demonstrates how to use the pagination protocol. Note: the primary purpose of this example is to show the operation of the protocol as such. The subdatasets in the responses will not be written to disk or otherwise processed.
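The code below is a sketch of such a client, using the Python requests library. The endpoint URL, the query parameters and the client-ID-based authentication are placeholders chosen for illustration and must be adapted to an actual Frost request.

#!/usr/bin/env python3
# Sketch of a client for the pagination protocol described above.
import requests

FROST_URL = 'https://<frost host>/<observations endpoint>'   # placeholder
CLIENT_ID = '<your client ID>'                               # placeholder

# The URL and the query parameters are kept constant for every request.
params = {
    'elements': 'air_temperature',                           # example parameters
    'sources': '<station>',
    'referencetime': '2021-01-01T00:00:00Z/2022-01-01T00:00:00Z',
}

# First request: X-Frost-Ptsheader is set to an empty string.
headers = {'X-Frost-Ptsheader': ''}

page_count = 0
while True:
    resp = requests.get(FROST_URL, params=params, headers=headers,
                        auth=(CLIENT_ID, ''))                # assumed authentication scheme
    resp.raise_for_status()

    page_count += 1
    # The subdataset in resp.json() would normally be aggregated here; in this
    # sketch the pages are neither written to disk nor otherwise processed.
    print('received page', page_count)

    # Last response: X-Frost-Nextptsheader is empty, so the dataset is complete.
    if not resp.headers.get('X-Frost-Nextptsheader', ''):
        break

    # Intermediate response: copy the Next* response headers into the
    # corresponding headers of the next request.
    headers = {
        'X-Frost-Ptsheader': resp.headers['X-Frost-Nextptsheader'],
        'X-Frost-Ptsbaseid': resp.headers['X-Frost-Nextptsbaseid'],
        'X-Frost-Ptime': resp.headers['X-Frost-Nextptime'],
    }

print('downloaded the dataset in', page_count, 'page(s)')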