DISCLAIMER: This version of Frost is still work in progress.
MET Norway does not guarantee that the service will always behave
100% according to documentation or expectations.
Downloading Large Datasets
To download a dataset that is too large to fit in a single response, we can use a technique
called pagination. The idea is to issue multiple requests and assemble the resulting
responses into the complete dataset.
Overview
In many cases a request to Frost results in a single response that contains the complete dataset
matching the request.
For practical reasons Frost imposes a limit on the size of the dataset that can be returned in
a single response. A request that would result in a larger dataset will fail with a
403 Forbidden response.
The client can download a large dataset by sending multiple requests and assembling the
subresponses into a total dataset.
Using a standard pagination protocol can make it easier to download a large dataset and will
also minimize the number of iterations (thus minimizing the total communication overhead).
Datasets and pages
A Frost dataset is defined as one or more time series, each containing zero or more
observations. Within a time series, observations are always ordered by increasing observation
time.
The size of a dataset is defined as the number of time series headers plus the number of
individual observations. If a given request would cause the total number of such
items to exceed the predefined limit, the dataset needs to be partitioned and downloaded by
sending multiple requests. The resulting subdatasets are called pages.
The page sequence is not necessarily aligned on time series boundaries. That is, a time series
may be split (possibly more than once) across pages. The time series header is included in
every part.
The following example shows a situation where the size limit is five and the total dataset is
downloaded in three pages.
Depending on the situation, the user may need to take such splitting of time series into
account when assembling the pages into a total dataset.
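One way to account for such splits is to key each time series on its header and append the observations in page order. The sketch below assumes a hypothetical in-memory representation in which each page is a list of (header, observations) pairs; this is not Frost's actual response format. The sample data mimics the situation described above: a size limit of five items, a dataset downloaded in three pages, and headers counted along with observations.

```python
def page_size(page):
    """Items in a page: one per time series header plus one per observation."""
    return sum(1 + len(observations) for _, observations in page)


def merge_pages(pages):
    """Merge pages into {header: [observations...]}, rejoining split series.

    Pages are assumed to arrive in order, so appending keeps each series
    sorted by observation time.
    """
    merged = {}
    for page in pages:
        for header, observations in page:
            merged.setdefault(header, []).extend(observations)
    return merged


# Hypothetical dataset: two time series ("A" and "B") split over three pages.
pages = [
    [("A", [1, 2, 3, 4])],         # header A + 4 observations = 5 items
    [("A", [5, 6]), ("B", [10])],  # (1 + 2) + (1 + 1) = 5 items
    [("B", [11, 12, 13])],         # 1 + 3 = 4 items
]

sizes = [page_size(p) for p in pages]
total = merge_pages(pages)
```

Here series "A" is split between the first and second page and series "B" between the second and third; after merging, each header appears once with all its observations.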
Technique 1: Modify query parameters
The easiest approach is sometimes to partition the dataset by splitting along one or more query
parameter dimensions.
Let's say that you ask Frost for wind and temperature observations from three weather
stations throughout the entire year 2021. The resulting dataset would consist of 2 x 3 = 6
time series and would typically be too large to fit in a single response (e.g. if the limit
is 10^5 items and there is one observation every 10 minutes).
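As a rough check of that estimate, the item count can be worked out directly. The sketch assumes exactly one observation every 10 minutes for all of 2021 (365 days) and counts one header per time series, following the size definition given earlier.

```python
SERIES = 2 * 3                    # 2 elements x 3 stations
OBS_PER_SERIES = 365 * 24 * 6     # one observation every 10 minutes in 2021
LIMIT = 10**5                     # example limit from the text

items_per_series = 1 + OBS_PER_SERIES          # header + observations
total_items = SERIES * items_per_series

print(total_items, total_items > LIMIT)        # 315366 True
print(items_per_series, items_per_series > LIMIT)  # 52561 False
```

Under these assumptions the full dataset is about three times the limit, while a single time series fits comfortably within it.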
The first thing to try could be to send one request for the wind observations and then a second one
for the temperature observations. If there are still problems with some of the pages being too
large, we could split further by sending three requests for the wind observations, one per station.
And so on.
This technique requires the query parameters of successive requests to be modified in a
systematic way so that all combinations are eventually covered.
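A minimal sketch of such a systematic split, assuming a hypothetical fetch(elements, stations) callable that returns a list of items or raises TooLarge when the request is rejected for exceeding the size limit (for example on the 403 Forbidden response described in the overview). The names, the station identifiers, and the splitting order (elements first, then stations) are illustrative and not part of the Frost API.

```python
class TooLarge(Exception):
    """Raised by the hypothetical fetch() when the dataset exceeds the limit."""


def download(fetch, elements, stations):
    """Fetch a dataset, splitting along elements, then stations, as needed."""
    try:
        return fetch(elements, stations)
    except TooLarge:
        if len(elements) > 1:
            parts = [([e], stations) for e in elements]
        elif len(stations) > 1:
            parts = [(elements, [s]) for s in stations]
        else:
            raise  # cannot split further along these two dimensions
    result = []
    for sub_elements, sub_stations in parts:
        result.extend(download(fetch, sub_elements, sub_stations))
    return result


def fake_fetch(elements, stations):
    """Stand-in for a real request: pretend only one time series fits."""
    if len(elements) * len(stations) > 1:
        raise TooLarge()
    return [(elements[0], stations[0])]


combos = download(fake_fetch, ["wind", "temperature"], ["S1", "S2", "S3"])
```

With the stand-in above, the download ends up issuing one successful request per element and station, so all six combinations are covered exactly once. A further split along the time dimension would follow the same pattern.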
Technique 2: Use a standard pagination protocol
In some cases it can be easier to download a large dataset by implementing a standard
pagination protocol. The idea here is to keep the URL (including the query parameters) constant
and instead set certain HTTP request headers according to a few simple rules. The
headers represent state information that allows Frost to keep track of how far the client has
progressed through the page sequence used for downloading the dataset for this request. Also,
since the client doesn't know in advance how many pages will be required to download the total
dataset, it must assume that any response (including the first!) may be the last one.
First request
To initiate the protocol, the client simply ensures that the first request has the HTTP header
X-Frost-Ptsheader set to an empty string.
Last response
To indicate the final page, Frost sets the HTTP header X-Frost-Ptsnextheader in the
response to an empty string.
General/intermediate request-response
Upon receiving a response R with a non-empty X-Frost-Ptsnextheader, the headers of
the next request must be set from the headers of R as follows:
X-Frost-Ptsheader set to the value of X-Frost-Nextptsheader
X-Frost-Ptsbaseid set to the value of X-Frost-Nextptsbaseid
X-Frost-Ptime set to the value of X-Frost-Nextptime
That's all there is to it for the protocol as such. Obviously, each subdataset must be aggregated
into what will eventually constitute the total dataset after the final response has been received.
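The header rules above can be sketched as a small helper that takes the headers of a response R and returns the headers for the next request, or None when R was the last page. The function and constant names are hypothetical. Note that this page spells the header that signals the last page both as X-Frost-Ptsnextheader and as X-Frost-Nextptsheader; the sketch assumes the latter, and it also assumes that a missing header can be treated like an empty one.

```python
# Request header -> response header whose value it takes (per the rules above).
NEXT_HEADER_MAP = {
    "X-Frost-Ptsheader": "X-Frost-Nextptsheader",
    "X-Frost-Ptsbaseid": "X-Frost-Nextptsbaseid",
    "X-Frost-Ptime": "X-Frost-Nextptime",
}

# Headers for the very first request: X-Frost-Ptsheader set to an empty string.
FIRST_REQUEST_HEADERS = {"X-Frost-Ptsheader": ""}


def next_request_headers(response_headers):
    """Return headers for the next request, or None if this was the last page."""
    # Assumption: an absent terminating header is treated as empty.
    if response_headers.get("X-Frost-Nextptsheader", "") == "":
        return None
    return {
        request_name: response_headers[response_name]
        for request_name, response_name in NEXT_HEADER_MAP.items()
    }


# Hypothetical header values, for illustration only.
intermediate = {
    "X-Frost-Nextptsheader": "abc",
    "X-Frost-Nextptsbaseid": "17",
    "X-Frost-Nextptime": "2021-03-01T00:00:00Z",
}
final = {"X-Frost-Nextptsheader": ""}

following = next_request_headers(intermediate)
done = next_request_headers(final)
```

The lookups are exact-case because plain dicts are used here; an HTTP client library may expose response headers through a case-insensitive mapping instead.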
Example
This Python example demonstrates how to use the pagination protocol.
Note: the primary purpose of this example is to show
the operation of the protocol as such. The subdatasets in the responses will not be
written to disk or otherwise processed.
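The example referred to above is not reproduced on this page. As a rough illustration only, the loop below sketches the protocol around a hypothetical send(url, headers) callable that returns a (response_headers, body) pair; in practice this could be a thin wrapper around an HTTP client. The URL, the callable, the stand-in server, and the choice of X-Frost-Nextptsheader as the header that signals the last page are all assumptions. For convenience the sketch collects each page body in a list.

```python
def download_all_pages(send, url):
    """Repeat the same request, changing only the pagination headers."""
    headers = {"X-Frost-Ptsheader": ""}  # first request: empty string
    pages = []
    while True:
        response_headers, body = send(url, headers)
        pages.append(body)
        # Any response, including the first, may be the last one.
        if response_headers.get("X-Frost-Nextptsheader", "") == "":
            return pages
        headers = {
            "X-Frost-Ptsheader": response_headers["X-Frost-Nextptsheader"],
            "X-Frost-Ptsbaseid": response_headers["X-Frost-Nextptsbaseid"],
            "X-Frost-Ptime": response_headers["X-Frost-Nextptime"],
        }


def fake_send(url, headers):
    """Stand-in server with three pages; state travels only in the headers.

    Real header values are opaque; a page index is used here for simplicity.
    """
    index = int(headers["X-Frost-Ptsheader"] or 0)
    is_last = index == 2
    response_headers = {
        "X-Frost-Nextptsheader": "" if is_last else str(index + 1),
        "X-Frost-Nextptsbaseid": "0",
        "X-Frost-Nextptime": "0",
    }
    return response_headers, f"page {index}"


pages = download_all_pages(fake_send, "https://example.invalid/observations")
```

Because the URL stays constant, all pagination state lives in the three request headers, which is what lets the loop stay this short.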