Traveling time with Postgres Range Columns
In Challenges of Time-Based Systems Without Proper Database Structures, we looked into everything that went wrong when we tried to build a temporal system without a compatible foundation. In this article, we will describe how we added that foundation to support temporal use cases. We will start by discussing how we built the foundation using Postgres ranges that could be a potential denominator for any time-based system. The solution might not be general enough but it can provide some good insights for building a foundation for the temporal system.
Migrating First Table
We started by migrating the
state_taxes as it contained fewer rows and had fewer dependencies than other tables. The reason for starting with a relatively simple table was to vet the solution with minimum dependencies and then expand to other tables. The first version of the table structure we came up with was as follows.
CREATE TABLE IF NOT EXISTS public.state_taxes ( id bigint NOT NULL DEFAULT nextval('state_taxes_id_seq'::regclass), state_id integer NOT NULL, tax_type character varying COLLATE pg_catalog."default" NOT NULL, rate numeric NOT NULL, effective_range daterange NOT NULL, system_range tsrange NOT NULL, CONSTRAINT state_taxes_pkey PRIMARY KEY (id), CONSTRAINT prevent_overlapping_state_taxes EXCLUDE USING gist ( system_range WITH &&, state_id WITH =, effective_range WITH &&, tax_type WITH = ) )
Understanding State Taxes Structure
The key and important difference from the previous version is two columns
system_range with the addition of the constraint
prevent_overlapping_state_taxes. Let's go through each of them and see what value they add
Effective Range Column
This column unlocks the ability to create timelines by having a rate for a specific start and end date, eliminating the need for
year the column. The clients will add rates only by providing a start date and the backend system will automatically detect the end date for the rate. The benefit of using range columns is that querying becomes easier using powerful Postgres range functions. For example, if a client asks for a rate on a specific effective date we can easily find it by searching a row whose effective range overlaps with the provided effective date.
System Range Column
system_range helps us solve the shoe store problem discussed in the last article. This column stores the validity of data in terms of system time, also in the form of a range with specific start and end dates. When a rate is added, the system will set the current time at the time of change as the start of the validity range. Later if the rate is invalidated, the system will set the end time as the end of the system range when the change was made. This eliminates any need for maintaining
deleted_at columns. The system range actually removes the concept of soft deletes and replaces it with versioning the data with system validity.
You can think of this constraint as a unique constraint but since ranges are involved and we want to check for overlapping ranges, the exclude constraint was used. Exclude constraint basically doesn't allow two rows to exist that return true for the provided gist condition. This helps us ensure we only get one valid row for one effective date.
Adding Timeline Logic to State Taxes
With a solid underlying table structure to support temporal operations next step was to add logic to
StateTaxes model which will ensure the timeline logic of changes as they are added. We defined the following expectations for handling changes
If a rate is added for state tax for the first time for the effective date let's say
2023-01-01 we expected the following record in the table
This row tells us that the rate
0.15 is effective from
2023-01-01 till the end of time and it is valid from
2023-10-16 (the time it was added) to the end of time, for
tax_type=income_tax (identified unique tax rate). This statement can be understood by a few queries, let's ask the system for a rate effective on
SELECT rate FROM state_taxes WHERE state_id = 1 AND tax_type = 'income_tax' AND effective_range @> '2023-05-01'::date #=> 0.15
This seems correct since the rate is effective from
2023-01-01 to end of time, let's ask for the rate before this date
SELECT rate FROM state_taxes WHERE state_id = 1 AND tax_type = 'income_tax' AND effective_range @> '2022-12-31'::date #=> null
As expected since the date is before the date the first rate is effective, the query returned
nil. Now let's query for any rates valid in the system time before the date
SELECT rate FROM state_taxes WHERE state_id = 1 AND tax_type = 'income_tax' AND system_range @> '2022-10-16'::timestamp #=> nil
nil because as far as the system is concerned no rate existed in the system time for
2023-10-16, this is how it helps in the example of a shoe store by finding rates when transactions occurred in the system.
After First Change
If the first change is already added the rest of the changes will fall in one or a combination of the following scenarios
- The new change has the same effective date as the effective date (Correction)
- The new change effective date is before the existing change effective date (Past Change)
- The new change effective date is after the existing change effecting date (Future Change)
Adding a correction
When a new change has the same effective date as an existing change, we need to invalidate the existing change and replace it with a new one. It is called a correction because the new change replaced the old one. If we correct our first change rate from
0.19 the result will look like something below
It shows that we invalidated our first change by adding an end to
system_range of the first change and then added the correction with the new rate. Now if only query valid rates effective on or after
2023-01-01 we get
SELECT rate FROM state_taxes WHERE state_id = 1 AND tax_type = 'income_tax' AND lower(effective_range) >= '2023-01-01' AND upper(system_range) is null # only valid rates have system_range null #=> 0.19
Adding a Past Change
When a new change is added whose effective date is before the already existing change, then the new change should automatically assume an end date as well. This makes sure that end result is a consistent timeline where effective ranges don't overlap. For example, continuing from before, if we add a change for the effective date
2022-12-01 with rate
0.14 then execute the query
SELECT * FROM state_taxes WHERE state_id = 1 AND tax_type = 'income_tax' AND upper(system_range) is null ORDER BY lower(effective_range)
It will return the following result
Adding a Future Change
When a change is added whose effective date is after the existing change, the existing change needs to have a new end date. So in order to apply the change, we correct the existing change by replacing it with a new end date. Now in our example if we add a rate
0.25 with effective date
2023-02-01 the query in the previous example will return the following result
For reference fetching changes including the invalidated ones results in the below
Scaling beyond State Tax Table
After completing the implementation for the state tax table, the next task was to assess how this implementation would work when joining tables and how the same implementation could be applied to other tables. We immediately saw that we needed to modify our approach or rethink our table relations.
Problem with Relations
Initially before adding effectivity to
state_tax table, the id was an explicit primary key to identifying a unique tax rate, whereas the composite key
(state_id, tax_type) served as the implicit primary key. However, with the new structure, the id was no longer the key to identify a tax rate hence won't work as a foreign key meant to identify a unique tax, and reason why we had to resort to using the composite key to identify taxes.
The nature of the issue can be traced to the fact that before the change each row
state_tax was one "tax rate" but after, a row was one "tax rate change". In other words, after changing the structure the table should also have been renamed to
state_tax_changes. To fix the relations we thought about just having a running id in the table to be used as the foreign key in the related tables. Still, the insight that we have fundamentally changed the table prevented us from continuing with the running id hack.
Splitting the Tables
To resolve the relations as they were defined currently we decided to not replace tables but rather split tables into the main model and its effective attributes. So effective attributes of
state_taxes were moved to another table
state_tax_changes. The resulting table structures looked something like the ones below
CREATE TABLE IF NOT EXISTS public.state_taxes ( id bigint NOT NULL DEFAULT nextval('state_taxes_id_seq'::regclass), state_id integer NOT NULL, tax_type character varying COLLATE pg_catalog."default" NOT NULL, )
CREATE TABLE IF NOT EXISTS public.state_tax_changes ( id bigint NOT NULL DEFAULT nextval('state_taxes_id_seq'::regclass), state_tax_id integer NOT NUL rate numeric NOT NULL, effective_range daterange NOT NULL, system_range tsrange NOT NULL, CONSTRAINT state_tax_changes_pkey PRIMARY KEY (id), CONSTRAINT prevent_overlapping_state_taxes EXCLUDE USING gist ( state_tax_id WITH &&, effective_range WITH &&, tax_type WITH = ) )
Although from the implementation perspective splitting tables added more complexity due to breaking up existing tables. However, this complexity was only temporary and was expected to subside with the migration of old tables. The benefit of this approach was that it reflected the true nature of our data tables. Previously one state tax had one rate and now one tax had many which was nicely reflected in
This project was not easy or smooth by easy means as we had to deal with some issues that were not directly related to not have temporality but as we moved ahead with the system the choice of undertaking a large refactor proved to be correct. It was a great reminder that no matter how good you are design is, if it isn't compatible with business it can't get you very far.