Data warehouses like Redshift and Snowflake have a super useful DATEDIFF
function – given two timestamps and a date part (hour, year, week, etc) it'll return how far apart they are. For example,
DATEDIFF('week', '06-01-2021', '06-28-2021')
returns 4
This function can be used to bucket times together, like when doing a cohort analysis. Unfortunately Postgres simply doesn't have it. For operational data it's probably not often used, but if you do analytical queries it can be pretty helpful.
If you want to use Postgres as a data warehouse you'll probably want it. A great blog by sqlines suggests an implementation, but I found it didn't quite match the functionality of most data warehouses.
Why is this even a function? Why not just use date_part? Because time rolls over: the last month of the year is 12 and the first is 1, so naively using date part would give us -11.
I'll jump straight to the code for those who like to see the answer first, and further down explain how it works
-- DateDiff function that returns the difference between two timestamps in the given date_part (weeks, months, etc) as an integer
-- This behaves like the DateDiff function in warehouses like Redshift and Snowflake, which count the boundaries between date_parts
CREATE OR REPLACE FUNCTION datediff (date_part VARCHAR(30), start_t TIMESTAMP, end_t TIMESTAMP)
RETURNS INT AS $diff$
DECLARE
years INT = 0;
days INT = 0;
hours INT = 0;
minutes INT = 0;
BEGIN
-- year is straightforward. Convert to an integer representing the year and subtract
years = DATE_PART('year', end_t) - DATE_PART('year', start_t);
IF date_part IN ('y', 'yr', 'yrs', 'year', 'years') THEN
RETURN years;
END IF;
-- quarter and month use integer math: count years, multiply to convert to quarters or months
-- as an integer and then subtract
IF date_part IN ('quarter', 'quarters', 'qtr', 'qtrs') THEN
RETURN years * 4 + (DATE_PART('quarter', end_t) - DATE_PART('quarter', start_t));
END IF;
IF date_part IN ('month', 'months', 'mon', 'mons') THEN
RETURN years * 12 + (DATE_PART('month', end_t) - DATE_PART('month', start_t));
END IF;
-- Weeks only fit evenly in days.
-- Truncate by week (which returns the start of the first day of the week)
-- then subtract by days.
IF date_part IN ('week', 'weeks', 'w') THEN
RETURN DATE_PART('day', (DATE_TRUNC('week', end_t) - DATE_TRUNC('week', start_t)) / 7);
END IF;
-- Day is similar to week. Truncate to beginning of day so that we can diff whole days
days = DATE_PART('day', DATE_TRUNC('day', end_t) - DATE_TRUNC('day', start_t));
IF date_part IN ('day', 'days', 'd') THEN
RETURN days;
END IF;
-- hours, minutes, and seconds all just build up from each other
hours = days * 24 + (DATE_PART('hour', end_t) - DATE_PART('hour', start_t));
IF date_part IN ('hour', 'hours', 'h', 'hr', 'hrs') THEN
RETURN hours;
END IF;
minutes = hours * 60 + (DATE_PART('minute', end_t) - DATE_PART('minute', start_t));
IF date_part IN ('minute', 'minutes', 'm', 'min', 'mins') THEN
RETURN minutes;
END IF;
IF date_part IN ('second', 'seconds', 's', 'sec', 'secs') THEN
RETURN minutes * 60 + (DATE_PART('second', end_t) - DATE_PART('second', start_t));
END IF;
RETURN 0;
END;
$diff$ LANGUAGE plpgsql;
This gist creates a function in Postgres that implements the DATEDIFF
function found in Snowflake, BigQuery, and Redshift.
This function take a time unit and two dates, and counts the number of date boundaries crossed between them. It will always return an integer, so it's very useful for grouping date differences together.
So what does counting date boundaries mean? It's best illustrated with an example
DATEDIFF('year', '12-31-2020', '01-01-2021')
returns 1
because even though the two dates are a day apart, they've crossed the year boundary.
Similarly DateDiff('week', '05-02-2021', '05-03-2021')
is 1
, because 5/02/21 is a Sunday and 5/03/21 is a Monday
Note that Postgres uses ISO 8601 week numbering, so weeks will always start on Mondays. This isn't consistent across databases – Redshift uses Sunday, while in Snowflake it's configurable.
The code basically works in two parts. It computes year, quarter, and month boundaries by subtracting integers. It computes weeks and below by truncating.
Years is easy: 2021 - 2020 = 1
Months – just use DATE_PART
to get the month as an integer (1-12). Subtract the two and add in the year expressed in months to handle rollover.
2021/2 - 2020/11 would be (2 - 11) + 1 * 12 = 3
Quarters is effectively the same as months (since DATE_PART
gives us quarters as an integer from 1-4).
Weeks and days use a slightly different approach. Weeks are special because they don't fit evenly into months, years or anything bigger. They only fit evenly into days.
The way to count weeks is to truncate the start and end timestamps to the first day of the week, then subtract days. That will give us an integer that's a multiple of 7. Note that the 'first day of the week' is not uniform across databases. Postgres uses Monday.
RETURN DATE_PART('day', (DATE_TRUNC('week', end_t) - DATE_TRUNC('week', start_t)) / 7);
Subtracting the days returns an interval
, so we use DATE_PART
to get an integer number of days. Since this is always divisible by 7, we now have our number of weeks.
Days is computed the same way, without dividing by 7. Instead of the start of the week we truncate to the beginning of the day to ensure the interval is an exact number of days.
Hours, minutes, and seconds are built out just like months and quarters.
hours = days * 24 + (DATE_PART('hour', end_t) - DATE_PART('hour', start_t));
We extract the hour as an integer (0-23) for each timestamp, subtract from each other, and multiply by the days expressed as hours.
Minutes is the same calculation, but it uses hours * 60
since we already have it.
Seconds is also the same, using minutes * 60
.
If you need to handle milliseconds or below it's easy to extend the code. I stopped at seconds.