Paginated Search
Pagination is a crucial feature in any API that deals with large datasets, and our FHIR API is no exception. When querying resources, it's often impractical or unnecessary to return all matching results in a single response. Pagination allows clients to retrieve results in manageable chunks, improving performance and reducing network load.
Our FHIR API implements two distinct pagination methods:
- Offset-based pagination
- Cursor-based pagination
Each method has its own use cases, advantages, and limitations, which we'll explore in detail in this documentation.
Pagination Methods
Offset-based Pagination
Offset-based pagination is implemented using the _offset parameter. This method is straightforward and allows clients to skip a specified number of results.
Usage:
- The _offset parameter accepts an integer value.
- _offset=0 returns the first page of results.
- Increasing the offset value skips that many rows in the result set.
Example:
GET /Patient?_offset=20
This request would return results starting from the 21st matching Patient resource.
Limitations:
- Our server supports _offset values up to 10,000.
- Offset-based pagination can lead to performance issues with very large datasets.
Implementation Details:
Behind the scenes, the _offset parameter translates to a SQL OFFSET clause. While this is efficient for smaller offsets, the database must still read and discard every skipped row, which can cause performance and stability issues for larger offsets, hence the 10,000 limit.
When to Use:
Offset-based pagination is best suited for:
- Smaller datasets
- Use cases where you need to jump to a specific page number
- Scenarios where the total number of results is important
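Jumping to a page number means converting it into an _offset value. The sketch below is one way a client might do that while respecting the 10,000 limit described above. The helper name offsetForPage and its error messages are illustrative assumptions, not part of any SDK.

```typescript
// Largest _offset the server accepts, per the limit described above.
const MAX_OFFSET = 10000;

/**
 * Converts a 1-based page number into an _offset value.
 * `offsetForPage` is a hypothetical helper, not part of any SDK.
 */
function offsetForPage(page: number, pageSize: number): number {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be an integer >= 1');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be an integer >= 1');
  }
  const offset = (page - 1) * pageSize;
  if (offset > MAX_OFFSET) {
    throw new RangeError(`_offset ${offset} exceeds ${MAX_OFFSET}; consider _cursor pagination`);
  }
  return offset;
}

// Page 2 at 20 results per page starts at the 21st match.
const query = `Patient?_count=20&_offset=${offsetForPage(2, 20)}`;
```

Failing early on the client avoids sending a request that the server would not be able to serve.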
Cursor-based Pagination
Cursor-based pagination is implemented using the _cursor parameter. This method uses an opaque string value that represents a pointer to a specific item in the result set.
Usage:
- The _cursor parameter accepts a string value provided by the server in previous responses.
- The initial request doesn't include a _cursor parameter.
- Subsequent requests use the _cursor value from the Bundle.link element with relation="next" in the previous response.
Example:
GET /Patient?_cursor=abc123xyz
Advantages:
- Supports pagination through very large datasets (millions of resources)
- More performant than offset-based pagination for large offsets
- Provides consistent results even if the underlying data changes between requests
Limitations:
- Currently only supported for searches that are sorted on _lastUpdated in ascending order.
- The cursor values are opaque and should be treated as black boxes by clients.
Implementation Details:
Cursor-based pagination uses database indexes, making it much faster than offset-based pagination, especially for large datasets. The _cursor values are generated by the server and encode information about the last returned item's position in the result set.
When to Use:
Cursor-based pagination is ideal for:
- Large datasets
- Use cases like analytics or data export where you need to iterate through all resources
- Scenarios where performance is critical
Note: While cursor-based pagination requires sorting by _lastUpdated ascending, it still works with other search filters. For example:
GET /Observation?code=xyz&_sort=_lastUpdated&_cursor=abc123xyz
Always sort when paginating
When paginating through search results, it is essential to sort the results to ensure consistent output across pages. Without an explicit sort, the order of results is not guaranteed to be stable between requests, so the same resource may appear on more than one page or be skipped entirely.
See "Sorting the Results" for more info.
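As a minimal sketch of that advice, a client could add a default sort whenever the caller has not chosen one. The helper name withDefaultSort is a hypothetical, and the choice of _lastUpdated as the default is an assumption made here because ascending _lastUpdated is also the order that cursor-based pagination requires.

```typescript
/**
 * Returns a copy of the search parameters with `_sort=_lastUpdated` added
 * when the caller has not chosen a sort order.
 * `withDefaultSort` is a hypothetical helper, not part of any SDK.
 */
function withDefaultSort(params: Record<string, string>): Record<string, string> {
  return '_sort' in params ? { ...params } : { ...params, _sort: '_lastUpdated' };
}

// Build a query string for a filtered, sorted, paginated search.
const search = new URLSearchParams(withDefaultSort({ code: 'xyz', _count: '50' }));
const path = `Observation?${search.toString()}`;
// path is 'Observation?code=xyz&_count=50&_sort=_lastUpdated'
```

A caller-supplied _sort value is left untouched, so the helper only fills the gap and never overrides an explicit choice.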
Setting the page size with the _count parameter
To set the number of items returned per page, use the _count query parameter. In the Medplum API, the default page size is 20, and the maximum allowed page size is 1000.
Here's an example query that sets the page size to 50:
TypeScript:
await medplum.searchResources('Patient', { _count: '50' });
cURL:
curl 'https://api.medplum.com/fhir/R4/Patient?_count=50'
In this example, the search will return up to 50 Patient resources per page.
Pagination can be difficult when you are including linked resources, as you will not know how many of each resource will be returned. It may make sense to use chained searches instead so that only resources of one type are returned.
Getting the total number of results with _total
To include the total count of matching resources in the search response, use the _total parameter in your search query. This information is particularly useful for pagination and for understanding the scope of the data you are dealing with.
The _total parameter can have three values: accurate, estimate, and none.
Value | Description |
none (default) | No total is returned. |
estimate | Tells the Medplum server that you are willing to accept an approximate count. This is usually faster than the accurate option as it may use database statistics or other methods for estimating the total number without scanning the entire dataset. This option is particularly useful when you need a rough idea of the number of resources without requiring precision. |
accurate | The Medplum server will perform additional processing to calculate the exact number of resources that match the search criteria. This can be more time-consuming, especially for large datasets, but you will receive a precise count. Use this option when an exact number is crucial for your use case. |
Because computing counts is an expensive operation, Medplum only produces estimated counts above a certain threshold.
- Medplum first computes an estimated count.
- If this count is below the threshold, an accurate count is computed.
- Otherwise, the estimated count is returned even if _total=accurate is specified.
For customers on the Medplum hosted service, this threshold is set to 1 million entries.
For self-hosted customers, this threshold is a server-level configuration setting called accurateCountThreshold.
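The decision described above can be sketched as a small function. This is an illustration of the documented behavior only, not Medplum's server code: the names resolveTotal, estimateCount, and exactCount are hypothetical, and returning the estimate when the count equals the threshold is an assumption based on the word "below".

```typescript
type TotalMode = 'none' | 'estimate' | 'accurate';

/**
 * Decides which total to report for a search.
 * `estimateCount` and `exactCount` are hypothetical callbacks standing in
 * for the cheap and the expensive database counts.
 */
function resolveTotal(
  mode: TotalMode,
  threshold: number,
  estimateCount: () => number,
  exactCount: () => number
): number | undefined {
  if (mode === 'none') {
    return undefined; // No total is returned by default.
  }
  const estimate = estimateCount();
  if (mode === 'accurate' && estimate < threshold) {
    return exactCount(); // Small enough to count exactly.
  }
  // Either an estimate was requested, or the result set is too large to count.
  return estimate;
}
```

Under this logic, a client that asks for _total=accurate on a very large result set should still be prepared to receive an approximate number.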
By default, the search responses do not include totals. Choosing between accurate and estimate depends on your specific requirements. For large datasets, using estimate can significantly improve response times, but at the cost of precision.
Example Query
Here is an example of how to use the _total parameter in a search query:
TypeScript:
await medplum.search('Patient', { name: 'Smith', _total: 'accurate' });
cURL:
curl 'https://api.medplum.com/fhir/R4/Patient?name=smith&_total=accurate'
This query will search for patients with the name "smith" and will return a Bundle with the accurate total number of matching resources included.
const response: Bundle = {
  resourceType: 'Bundle',
  id: 'bundle-id',
  type: 'searchset',
  total: 15,
  entry: [
    {
      fullUrl: 'http://example.com/base/Patient/1',
      resource: {
        resourceType: 'Patient',
        // ...
      },
    },
    {
      fullUrl: 'http://example.com/base/Patient/2',
      resource: {
        resourceType: 'Patient',
        // ...
      },
    },
    // ...
  ],
  // ...
};
The Medplum SDK provides the searchResources helper function. This function unwraps the response bundle of your search results and returns an array of the resources that match your parameters. To get the count when using this function, read the .bundle property that is added to the returned array: the total is available as response.bundle.total.
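Once a total is available, the number of pages follows from the page size. The sketch below shows that arithmetic; pageCount is a hypothetical helper, and it returns undefined when no total was reported, which is the default behavior when _total is not set.

```typescript
/**
 * Returns how many pages are needed for `total` results, or undefined when
 * the server did not report a total (the default, `_total=none`).
 * `pageCount` is a hypothetical helper, not part of any SDK.
 */
function pageCount(total: number | undefined, count: number): number | undefined {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError('count must be an integer >= 1');
  }
  return total === undefined ? undefined : Math.ceil(total / count);
}

// With the example Bundle above (total: 15) and a page size of 10:
const pages = pageCount(15, 10); // 2
```

With searchResources, the first argument would be response.bundle.total. When the total is an estimate, the page count is approximate as well.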
Navigating pages with the Bundle.link element
When you perform a paginated search, the response will be a Bundle resource containing a list of resources matching the query. The Bundle resource will also have a link field containing navigation links to help you traverse through the search results.
The Bundle.link field will include the following relations:
- self: The URL of the current search results page.
- first: The URL of the first page of search results.
- previous: The URL of the previous page of search results (if applicable).
- next: The URL of the next page of search results (if applicable).
Here's an example of how the Bundle.link field may look:
'link': [
  {
    relation: 'self',
    url: 'https://example.com/Patient?_count=50&_offset=60',
  },
  {
    relation: 'first',
    url: 'https://example.com/Patient?_count=50&_offset=0',
  },
  {
    relation: 'previous',
    url: 'https://example.com/Patient?_count=50&_offset=10',
  },
  {
    relation: 'next',
    url: 'https://example.com/Patient?_count=50&_offset=110',
  }
];
To navigate between pages, simply follow the URLs provided in the Bundle.link field.
The URLs in the Bundle.link field will opportunistically use _cursor pagination if compatible with the search query (see Cursor-based pagination limitations). If _cursor is not compatible, the URLs will use _offset pagination.
It is strongly recommended to use the Bundle.link field to navigate between pages, as it ensures that you are following the correct pagination method for the search query.
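For clients that do not use the SDK, following the next relation can be sketched as a loop. The interfaces below are simplified stand-ins for the FHIR Bundle types, and iterateAll and fetchBundle are hypothetical names: fetchBundle is assumed to be a caller-supplied function that performs an authenticated GET and parses the JSON response.

```typescript
// Simplified stand-ins for the FHIR Bundle types (assumption for this sketch).
interface BundleLink {
  relation: string;
  url: string;
}
interface Bundle<T> {
  link?: BundleLink[];
  entry?: { resource?: T }[];
}

/**
 * Yields every resource across all pages by following `next` links.
 * `fetchBundle` is a hypothetical caller-supplied function.
 */
async function* iterateAll<T>(
  firstUrl: string,
  fetchBundle: (url: string) => Promise<Bundle<T>>
): AsyncGenerator<T> {
  let url: string | undefined = firstUrl;
  while (url) {
    const bundle: Bundle<T> = await fetchBundle(url);
    for (const entry of bundle.entry ?? []) {
      if (entry.resource) {
        yield entry.resource;
      }
    }
    // The server decides whether the next URL uses _cursor or _offset.
    url = bundle.link?.find((l) => l.relation === 'next')?.url;
  }
}
```

The loop never builds a pagination URL itself, so it keeps working whichever pagination method the server selects for the query.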
Navigating pages with searchResourcePages()
The searchResourcePages() method of the MedplumClient provides an async generator to simplify the iteration over resource pages.
for await (const patientPage of medplum.searchResourcePages('Patient', { _count: 10 })) {
  for (const patient of patientPage) {
    console.log(`Processing Patient resource with ID: ${patient.id}`);
  }
}
The array returned by searchResourcePages also includes a bundle property that contains the original Bundle resource. You can use this to access bundle metadata such as Bundle.total and Bundle.link.
The searchResourcePages method uses Bundle.link to navigate between pages, ensuring that you are following the correct pagination method for the search query. The URLs in the Bundle.link field will opportunistically use _cursor pagination if compatible with the search query (see Cursor-based pagination limitations). If _cursor is not compatible, the URLs will use _offset pagination.
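Because pages arrive through an async generator, a consumer can stop early once it has enough results. The sketch below collects a bounded number of items from any async iterable of arrays; takeFromPages is a hypothetical helper, not an SDK function.

```typescript
/**
 * Collects at most `limit` items from an async iterable of pages, then stops.
 * `takeFromPages` is a hypothetical helper; any async generator of arrays works.
 */
async function takeFromPages<T>(pages: AsyncIterable<T[]>, limit: number): Promise<T[]> {
  const items: T[] = [];
  if (limit <= 0) {
    return items;
  }
  for await (const page of pages) {
    for (const item of page) {
      items.push(item);
      if (items.length >= limit) {
        // Returning here closes the generator, so it is not asked for more pages.
        return items;
      }
    }
  }
  return items;
}
```

Under these assumptions, a call such as takeFromPages(medplum.searchResourcePages('Patient', { _count: 100 }), 250) would stop iterating once 250 resources have been collected.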
Setting the page offset with the _offset parameter
To set the page offset, or the starting point of the search results, use the _offset query parameter. This allows you to skip a certain number of items before starting to return results.
Here's an example query that sets the page offset to 30:
TypeScript:
await medplum.searchResources('Patient', { _count: '50', _offset: '30' });
cURL:
curl 'https://api.medplum.com/fhir/R4/Patient?_count=50&_offset=30'
In this example, the search will return up to 50 Patient resources per page, starting from the 31st item in the result set.
Using _offset pagination is discouraged for large datasets, as it can lead to performance issues. For large datasets, consider using _cursor pagination instead.