Postgresql error generation expression is not immutable - Исправление ошибок и поиск оптимальных решений проблем

November 29, 2021

On this blog, we’ve covered a variety of different column types. So far, we’ve performed various operations on said columns such as
INSERT and
UPDATE to modify the data directly. In this article, we cover generated columns that work differently.

We call the above columns generated because PostgreSQL automatically computes their data based on predefined expressions. However, the crucial thing is that we can’t insert any data into them directly.

Generated columns became available in PostgreSQL 12 in 2019

The SQL standard consists of two types of generated columns:

virtual,
stored.

Virtual generated columns

The idea behind virtual generated columns is that it occupies no disk storage. Therefore, it is computed on the fly when requested. A good example is returning the value with the
SELECT statement. Unfortunately, PostgreSQL currently doesn’t implement virtual generated columns.

Stored generated columns

The stored generated columns occupy storage in the same way as a regular column. However, instead of computing the value every time it is requested, PostgreSQL does so only when the row is modified.

To define a stored generated column, we need to use the
GENERATED ALWAYS AS followed by expression and the keyword
STORED:

CREATE TABLE users (

id serial PRIMARY KEY,

email text UNIQUE,

first_name text,

last_name text,

full_name text GENERATED ALWAYS AS (

first_name || ‘ ‘ || last_name

) STORED

)

Above, we’ve defined the
full_name column to be a combination of the
first_name and
last_name. It is common to refer to other columns while defining a generated column. That being said, a generated column can’t refer to another generated column.

Let’s test the above table by inserting a row:

INSERT INTO users(

email, first_name, last_name

)

VALUES (

‘marcin@wanago.io’, ‘Marcin’, ‘Wanago’

);

As we can see in the above table, PostgreSQL automatically figured out the value for the
full_name column.

Types of expressions we can use

There are a few different types of expressions we can use when defining a generated column. For example, besides operating on text as in the previous example, we can also do so with numbers.

CREATE TABLE routes (

id serial PRIMARY KEY,

distance_in_kilometers numeric,

distance_in_miles numeric GENERATED ALWAYS AS (

distance_in_kilometers / 1.609344

) STORED

)

Besides simple operations, we can also use functions. The crucial thing is that they have to be immutable. An immutable function can’t modify the database and must return the same result given the same arguments.

CREATE TABLE posts (

id serial PRIMARY KEY,

title text,

paragraphs text[],

paragraphs_number numeric GENERATED ALWAYS AS (

array_length(paragraphs, 1)

) STORED

)

An interesting example of a function that is not immutable is
concat. Let’s try to use it instead of the
|| operator:

CREATE TABLE users (

id serial PRIMARY KEY,

email text UNIQUE,

first_name text,

last_name text,

full_name text GENERATED ALWAYS AS (

concat(first_name, ‘ ‘, last_name)

) STORED

)

ERROR: generation expression is not immutable

Surprisingly,
concat can yield different results based on the database configuration. Unfortunately, it makes it not immutable.

SET TIME ZONE ‘UTC’;

— Returns «Current time: 2021-11-25 22:08:00.041641»

SELECT CONCAT(‘Current time: ‘, NOW()::TIMESTAMP);

SET TIME ZONE ‘UTC+1’;

— Returns «Current time: 2021-11-25 21:08:00.041641»

SELECT CONCAT(‘Current time: ‘, NOW()::TIMESTAMP);

If you want to know more about timezones, check out Managing date and time with PostgreSQL and TypeORM

Generated columns with TypeORM

Fortunately, TypeORM started supporting generated columns for PostgreSQL a few days ago.

import { Column, Entity, PrimaryGeneratedColumn } from ‘typeorm’;

@Entity()

class User {

@PrimaryGeneratedColumn()

public id: number;

@Column()

public firstName: string;

@Column()

public lastName: string;

@Column({

generatedType: ‘STORED’,

asExpression: `‘firstName’ || ‘ ‘ || ‘lastName’`

})

fullName: string;

}

Please remember that we need to use the
‘ sign with the column names
‘firstName’ and
‘lastName’ above. Otherwise, PostgreSQL would transform them to
firstname and
lastname.

An issue with updating entities

There is an important catch when using generated columns. Consider the following example:

import {

Column,

Entity,

PrimaryGeneratedColumn,

} from ‘typeorm’;

@Entity()

export class Post {

@PrimaryGeneratedColumn()

public id: number;

@Column()

public title: string;

@Column(‘text’, { array: true })

public paragraphs: string[];

@Column({

generatedType: ‘STORED’,

asExpression: ‘array_length(paragraphs, 1)’

})

public paragraphsNumber: number;

}

There is a good chance that our application allows us to update posts with the
PUT method. With it, we expect the users of our API to send all of the properties of the entity. Unfortunately, appending the
paragraphsNumber in the body of the request will cause an error.

We can quickly solve the above issue with the
class—transformer library and the
@Exclude() decorator:

import { IsString, IsNotEmpty, IsNumber, IsOptional } from ‘class-validator’;

import { Exclude } from ‘class-transformer’;

export class UpdatePostDto {

@IsNumber()

@IsOptional()

id: number;

@IsString({ each: true })

@IsNotEmpty()

@IsOptional()

paragraphs: string[];

@IsString()

@IsNotEmpty()

@IsOptional()

title: string;

@Exclude()

public paragraphsNumber: number;

}

With it, the
class—transformer excludes the
paragraphsNumber field from the body of the request.

If you want to know more about the
class—transformer library, check out API with NestJS #5. Serializing the response with interceptors

Simulating virtual generated columns with getters

While PostgreSQL does not support virtual generated columns, there is a simple way of simulating them when using TypeORM and classes. To do that, we can use a getter.

There is a good chance you are using the class-transformer library along with TypeORM. If you want to return the value along with the rest of the data, you need to use the
@Expose() decorator:

import { Column, Entity, PrimaryGeneratedColumn } from ‘typeorm’;

import { Expose } from ‘class-transformer’;

@Entity()

export class User {

@PrimaryGeneratedColumn()

public id: number;

@Column()

public firstName: string;

@Column()

public lastName: string;

@Expose()

public get fullName() {

return `${this.firstName} ${this.lastName}`;

}

When doing the above, the database does not store the
fullName value in a column. Instead, it is computed every time on the fly when accessed. The crucial thing to keep in mind is that PostgreSQL won’t know anything about the
fullName value, so we can’t use it in any SQL queries.

Summary

The generated columns can come in handy when we often do a set of operations with our data. Instead of retrieving the data and performing the calculations, we can optimize this process by operating on the data beforehand. We need to keep in mind that we do the above optimization at the cost of our
INSERT and
UPDATE operations.

Another considerable use case for generated columns is when we want to refactor our database. For example, we can define generated columns if we’re going to change some of our columns but keep the backward compatibility.

All of the above make the generated columns a feature that is worth knowing. Especially since TypeORM just recently started supporting it for PostgreSQL.

Источник

PostgreSQL 12 introduces the generated columns, columns that are automatically computed depending on a generation expression.

The usage of generated columns is quite simple and can be summarized as follows:

the column must be annotated with the GENERATED ALWAYS AS (...) STORED instruction;
the expression in parentheses must use only IMMUTABLE functions and cannot use subqueries.

For more specific constraints, see the official documentation.

Please note I’ve indicated the STORED clause because at the moment PostgreSQL supports only that kind of columns: a STORED generated column is saved on disk storage as a normal column would be, the only difference is that you cannot modify it autonomously, the database will compute it for you.

You can think of a stored generated column as a trade-off between a table with a trigger and a materialized view. When the VIRTUAL (as opposed to STORED) will be implemented, the column will take no space at all and will be computed on each column access, something similar as a view.

An example of not-generated column

Let’s see generated columns in action: consider an ordinary table with a dependency between the age column and the birthday one, since the former can be computed from the values in the latter column:

testdb=> CREATE TABLE people( 
                  name text, 
                  birthday date, 
                  age int );

testdb=> WITH year AS ( 
   SELECT ( random() * 100 )::int % 70 AS y 
)
INSERT INTO people( name, age, birthday )
SELECT 'Person ' || v, y, current_date - ( y * 365 )
FROM generate_series(1, 1000000 ) v, year;

Let’s see how much space does it occupy to have such table filled with one million of rows:

testdb=> SELECT pg_relation_size( 'people' );
 pg_relation_size 
------------------
         52183040

An example with generated columns

In order to create a similar table where age is automatically computed.

Since the column must use an IMMUTABLE function, the first step is to abstract the computation into a function:

testdb=> CREATE OR REPLACE FUNCTION 
f_person_age( birthday date )
RETURNS int
AS $CODE$
BEGIN
    RETURN extract( year FROM CURRENT_DATE )
           - extract( year FROM birthday )
           + 1;
END
$CODE$
LANGUAGE plpgsql IMMUTABLE;

Then it is possible to create the table using the function as generation method:

testdb=> CREATE TABLE people_gc_stored ( 
      name text, 
      birthday date, 
      age int GENERATED ALWAYS AS ( f_person_age( birthday ) ) STORED
  );

If the table is filled in a similar way, the space occupied is the same:

testdb=> INSERT INTO people_gc_stored( name, birthday )
         SELECT 'Person ' || v, current_date - v 
         FROM generate_series(1, 1000000 ) v;

testdb=> SELECT pg_relation_size( 'people_gc_stored' );
 pg_relation_size 
------------------
         52183040

Why using a function in the generated column? Because if we place the real expression we got an error at creation time:

testdb=> CREATE TABLE people_gc_stored ( 
      name text, 
      birthday date, 
      age int GENERATED ALWAYS AS ( 
              extract( year FROM CURRENT_DATE ) 
              - extract( year FROM birthday ) 
              + 1 ) STORED
  );
  
ERROR:  generation expression is not immutable

Writing the generated column

As already written, the generated column is not writable once it has been computed:

testdb=> UPDATE people_gc_stored SET age = 40;
ERROR:  column "age" can only be updated to DEFAULT
DETAIL:  Column "age" is a generated column.

Querying the generated column

The generated column works and behaves as a normal column, that is access can be restricted or granted on such column:

testdb=# REVOKE ALL ON people_gc_stored FROM public;
testdb=# GRANT SELECT( name, age ) ON people_gc_stored TO harry;

Since user harry has access only on columns name and age, the user cannot see the dependency column:

testdb=> SELECT * FROM luca.people_gc_stored LIMIT 5;
ERROR:  permission denied for table people_gc_stored

testdb=> SELECT min( age ), max( age ) FROM luca.people_gc_stored;
 min | max  
-----|------
   1 | 2740
(1 row)

testdb=> SELECT min( birthday ), max( birthday ) FROM luca.people_gc_stored;
ERROR:  permission denied for table people_gc_stored

On the other hand, giving access only on birthday column does not automatically provide access on age:

testdb=# REVOKE SELECT ON people_gc_stored FROM harry;
testdb=# GRANT SELECT( name, birthday ) ON people_gc_stored TO harry;

testdb=> SELECT min( birthday ), max( birthday ) FROM luca.people_gc_stored;
      min      |    max     
---------------|------------
 0720-12-07 BC | 2019-11-03
(1 row)

testdb=> SELECT min( age ), max( age ) FROM luca.people_gc_stored;
ERROR:  permission denied for table people_gc_stored

The article PostgreSQL 12 Generated Columns
has been posted by
Luca Ferrari

on November 4, 2019

Источник

The data warehousing community will be happy to know that PostgreSQL now has a feature to generate columns based on data in other columns.

This feature is known in various other DBMS as “calculated columns”, “virtual columns”, or “generated columns”. PostgreSQL picked the moniker “generated”. We’ll go with that for now, and discuss later why that may not have been the most future-proof naming.

This feature allows PostgreSQL to calculate columns based on given input columns.

In the current implementation, the values are generated at INSERT/UPDATE time, and stored physically with the rest of the table data. A keyword is used to indicate that the values are computed and stored.

Let’s look at a simple example:

CREATE OR REPLACE FUNCTION my_concat(text, VARIADIC text[])
RETURNS TEXT AS 'text_concat_ws' LANGUAGE internal immutable;

CREATE TABLE addresses (
    id bigserial primary key,
    address1 text,
    address2 text,
    address3 text,
    city text,
    state text,
    zip text,
    delivery_address text generated always as 
        (my_concat(E'n',address1,address2,address3,my_concat(' ',my_concat(',', city, state),zip)) ) stored
);

INSERT INTO addresses (address1,address3,city,state,zip) 
    VALUES ('105 Live Oak Street','c/o Somebody, Somewhere','Live Oak Village','TX','78039');

SELECT delivery_address FROM addresses;

And the result is:

delivery_address
105 Live Oak Street
c/o Somebody, Somewhere
Live Oak Village,TX 78039
(1 row)

In this case, we are simply using the generated column feature to provide a mailing address that we won’t have to repeatedly compute at some later date. The more often the value would otherwise be required to be re-computed, the more valuable the generated column becomes.

If you examine the output carefully, you’ll notice that address line 2 is missing, and yet the format of the delivery address is intact.

This feature is particularly helpful with JSON/XML extraction, GIS data, full-text search and other highly repetitive operations.

Well, Ok, that’s a bit helpful with formatted text, but what’s the meaning of this function “my_concat”? It turns out that the text functions provided by PostgreSQL are considered volatile (produce different outputs, even with the same inputs) when they are locale specific. Anything that depends on the code page or collation sequence could potentially change in flight if the locale settings change.

This is very annoying, since nearly every text function is locale dependent. So, in order to use this feature, we have to redeclare the concat_ws function to my_concat function, just so that we can pretend that it is immutable. Otherwise, we get the error:

ERROR: generation expression is not immutable

Let’s try a bit more complex example. A common use for generated columns in data warehouse applications is to compute dates and date ranges for reporting. This example is fairly long, so I’ll post it in it’s entirety at the end of the article. If you want to see the items I’m calling out in context, skip to the end.

CREATE TABLE public.media_calendar (
    gregorian date NOT NULL PRIMARY KEY,
    month_int integer GENERATED ALWAYS AS 
        (date_part('month'::text, gregorian)) STORED,
    day_int integer GENERATED ALWAYS AS 
        (date_part('day'::text, gregorian)) STORED,
    year_int integer GENERATED ALWAYS AS 
        (date_part('year'::text, gregorian)) STORED,
    quarter_int integer GENERATED ALWAYS AS 
        (date_part('quarter'::text, gregorian)) STORED,
    dow_int integer GENERATED ALWAYS AS 
        (date_part('dow'::text, gregorian)) STORED,

This snippet shows some of the basic extractions that are possible without having to resort to heroics. It is very common to dissect a date in this way, and then use the values for partitioning, reporting, sorting, and financial calculations.

That being said, the generated columns are specifically not allowed for partitioning. You would have to build an update statement that does a lookup against the media calendar, set all of the values explicitly, and use those for the partition. This greatly reduces the value of the media calendar for partitioning relations.

Many more extractions are possible, some of which are shown later in the example.

The syntax of the SQL always follows this same pattern. At the time of this writing, there are no other options than “GENERATED ALWAYS” and “STORED”. Presumably, “VIRTUAL” or some such keyword will be introduced in the future for columns that are calculated at SELECT time. This makes the the moniker “generated” a bit dubious. I think for English grammar reasons I’d rather have named this feature “calculated”.

It is not possible to build generated columns on the basis of other generated columns. So, in our earlier example, it would be nice to build a “year-month” column based on the derived year concatenated with the derived month. Unfortunately, both extractions would have to be performed again in the year_month column.

Generated columns cannot be directly updated, either.

UPDATE media_calendar SET boq = '2019-01-01' WHERE year = 2019 and month BETWEEN 1 AND 3;

psql ERROR: column boq can only be updated to DEFAULT
DETAIL: Column "boq" is a generated column.

INDEXES however, are perfectly fine on generated columns. In fact, they are possibly one of the biggest advantages.

CREATE INDEX media_year ON media_calendar (year_int);
CREATE INDEX

Here follows some hopefully useful code for your enjoyment. It is released to you under the PostgreSQL license.

--
-- Name: boq(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.boq(thedate date) RETURNS date
LANGUAGE sql IMMUTABLE
AS $$
    SELECT (date_part('year', thedate) || CASE date_part('quarter',thedate)
    WHEN 1 THEN '-01-01'
    WHEN 2 THEN '-04-01'
    WHEN 3 THEN '-07-01'
    WHEN 4 THEN '-10-01' END)::date;
$$;

COMMENT ON FUNCTION public.boq(date) IS 'Beginning of the quarter using month boundaries.';

--
-- Name: month_english(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.month_english(thedate date) RETURNS text
LANGUAGE sql IMMUTABLE
AS $$
    SELECT CASE to_char(thedate, 'MM')
        WHEN '01' THEN 'January'
        WHEN '02' THEN 'February'
        WHEN '03' THEN 'March'
        WHEN '04' THEN 'April'
        WHEN '05' THEN 'May'
        WHEN '06' THEN 'June'
        WHEN '07' THEN 'July'
        WHEN '08' THEN 'August'
        WHEN '09' THEN 'September'
        WHEN '10' THEN 'October'
        WHEN '11' THEN 'November'
        WHEN '12' THEN 'December'
        ELSE to_char(thedate, 'MM') 
        END;
$$;

COMMENT ON FUNCTION public.month_english(date) IS 'Explicit conversion to English, not using the to_char() with locale.';

--
-- Name: nth_name(integer); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.nth_name(ordinal integer) RETURNS text
LANGUAGE sql IMMUTABLE
AS $$
    SELECT ordinal::text || CASE right(ordinal::text,1)::integer
        WHEN 1 THEN 'st'
        WHEN 2 THEN 'nd'
        WHEN 3 THEN 'rd'
        ELSE 'th' 
        END;
$$;

COMMENT ON FUNCTION public.nth_name(integer) IS 'Append the appropriate rank name abbreviation to the given ordinal. 1st, 2nd, 3rd...';

--
-- Name: boy_date(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.boy_date(thedate date) RETURNS date
LANGUAGE sql IMMUTABLE
AS $$
    SELECT (date_part('year', thedate) || '-01-01')::date;
$$;

COMMENT ON FUNCTION public.boy_date(date) IS 'Beginning of calendar year date.';

--
-- Name: dow_english(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.dow_english(thedate date) RETURNS text
LANGUAGE sql IMMUTABLE
AS $$
    SELECT CASE date_part('dow',thedate)
        WHEN 0 THEN 'Sunday'
        WHEN 1 THEN 'Monday'
        WHEN 2 THEN 'Tuesday'
        WHEN 3 THEN 'Wednesday'
        WHEN 4 THEN 'Thursday'
        WHEN 5 THEN 'Friday'
        WHEN 6 THEN 'Saturday'
        ELSE 'Unknown' 
        END;
$$;

COMMENT ON FUNCTION public.dow_english(date) IS 'Day of week in the English language. Specifically does not use to_char(), as that is locale specific.';

--
-- Name: dow_locale(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.dow_locale(thedate date) RETURNS text
LANGUAGE sql IMMUTABLE
AS $$
    SELECT to_char(thedate,'Day');
$$;

COMMENT ON FUNCTION public.dow_locale(date) IS 'Day of week in the installation specific language, taken from the local encoding.';

--
-- Name: end_of_month(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.end_of_month(thedate date) RETURNS date
LANGUAGE sql IMMUTABLE
AS $$
-- subtract to get the first day of the month. Add a month, then subtract a day.
    SELECT ((thedate - (date_part('day', thedate) - 1)::integer)::date + interval '1 month' - interval '1 day')::date;
$$;

COMMENT ON FUNCTION public.end_of_month(date) IS 'The date of the last day of the month.';

--
-- Name: end_of_quarter(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.end_of_quarter(thedate date) RETURNS date
LANGUAGE sql IMMUTABLE
AS $$
-- get the beginning of quarter, add 3 months, and subtract a day.
    SELECT (boq(thedate) + interval '3 months' - interval '1 day')::date;
$$;

COMMENT ON FUNCTION public.end_of_quarter(date) IS 'The date of the last day of the calendar quarter.';

--
-- Name: end_of_week(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.end_of_week(thedate date) RETURNS date
LANGUAGE sql IMMUTABLE
AS $$
-- Days are numbered 0-6 Sun-Sat.
-- Start with Saturday, subtract today's Day ordinal,
-- add that to the original date to get to the end of the week
    SELECT thedate + (6-date_part('dow', thedate)::integer);
$$;

COMMENT ON FUNCTION public.end_of_week(date) IS 'The date of the last day of the week.';

--
-- Name: end_of_year(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.end_of_year(thedate date) RETURNS date
LANGUAGE sql IMMUTABLE
AS $$
-- truncate the month and day, add a year, then subtract a day
    SELECT (((date_part('year',thedate)+1) || '-01-01')::date -1)::date;
$$;

COMMENT ON FUNCTION public.end_of_year (date) IS 'The date of the last calendar day of the year.';

--
-- Name: full_date_english(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.full_date_english(thedate date) RETURNS text
LANGUAGE sql IMMUTABLE
AS $$
-- This should probably be done with a language parsing engine, but this is quick and easy.
    SELECT dow_english(thedate) || ', the ' || nth_name(date_part('day', thedate)::integer) || ' day of ' || month_english(thedate) || ', ' || date_part('year', thedate);
$$;

COMMENT ON FUNCTION public.full_date_english (date) IS 'The formally written datum of the given date in English.';

--
-- Name: is_leap_year(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.is_leap_year(thedate date) RETURNS boolean
LANGUAGE sql IMMUTABLE
AS $$
-- start with Mar 1, subtract a day, and see if we got a leap day.
    SELECT date_part('day', (date_part('year', thedate) || '-03-01')::date -1) = 29;
$$;

COMMENT ON FUNCTION public.is_leap_year(date) IS 'Delegate the calculation of leap year to the PostgreSQL engine by moving to March 1 and subtracting a day.';

--
-- Name: julian_day(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.julian_day(thedate date) RETURNS integer
LANGUAGE sql IMMUTABLE
AS $$
-- Don't have any idea who would ever use this, but it was easy with to_char().
    SELECT to_char(thedate, 'J')::integer;
$$;

COMMENT ON FUNCTION public.julian_day(date) IS 'Ordinal day on the Julian calendar since November 24, 4714 BC at midnight UTC';

--
-- Name: month_local(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.month_local(thedate date) RETURNS text
LANGUAGE sql IMMUTABLE
AS $$
-- This only exists because to_char() is locale specific, so VOLATILE. Generated columns can't come from VOLATILE functions.
    SELECT to_char(thedate, 'Month');
$$;

COMMENT ON FUNCTION public.month_local(date) IS 'Conversion of the month ordinal to plain language by locale settings';

--
-- Name: next_business_day(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.next_business_day(thedate date) RETURNS date
LANGUAGE sql IMMUTABLE
AS $$
-- Find a date in the near future that is a weekday. It won't be more than 3 days away...
    SELECT thedate + x
    FROM generate_series(1,4) x
    WHERE CASE WHEN date_part('dow', thedate +x)
        BETWEEN 1 AND 5
    THEN 
        TRUE
    ELSE 
        FALSE 
    END
    ORDER BY x
    LIMIT 1;
$$;

COMMENT ON FUNCTION public.next_business_day(date) IS 'Next weekday date.';

--
-- Name: previous_business_day(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.previous_business_day(thedate date) RETURNS date
LANGUAGE sql IMMUTABLE
AS $$
-- Select a date in the recent past that was a weekday
    SELECT thedate - x
    FROM generate_series(1,4) x
    WHERE CASE WHEN date_part('dow', thedate -x)
        BETWEEN 1 AND 5
    THEN 
        TRUE
    ELSE 
        FALSE
    END
    ORDER BY x
    LIMIT 1;

$$;

COMMENT ON FUNCTION public.previous_business_day(date) IS 'Previous weekday date.';

--
-- Name: week_of_month(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.week_of_month(thedate date) RETURNS integer
LANGUAGE sql IMMUTABLE
AS $$
-- Count of weeks starting with the first in week 1.
    SELECT to_char(thedate, 'W')::integer;
$$;

COMMENT ON FUNCTION public.week_of_month(date) IS 'Week of month starting with the first day in the first week.';

--
-- Name: week_of_year(date); Type: FUNCTION; Schema: public
--

CREATE FUNCTION public.week_of_year(thedate date) RETURNS integer
LANGUAGE sql IMMUTABLE
AS $$
-- Exists because to_char() is VOLATILE
    SELECT to_char(thedate, 'WW')::integer;
$$;

COMMENT ON FUNCTION public.week_of_year(date) IS 'Week of year calculated with the first of the year in the first week.';

CREATE OR REPLACE FUNCTION public.is_weekday(thedate date) RETURNS boolean
AS $$

-- Weekdays are numbered 0-6 Sun-Sat.
    SELECT CASE
    WHEN date_part('dow'::text, thedate) 
        BETWEEN 1 AND 5 
    THEN 
        true
    ELSE 
        false
    END

$$ LANGUAGE SQL
IMMUTABLE;

--
-- Name: media_calendar; Type: TABLE; Schema: public
--

CREATE TABLE public.media_calendar (
    gregorian date NOT NULL PRIMARY KEY,
    month_int integer GENERATED ALWAYS AS (date_part('month'::text, gregorian)) STORED,
    day_int integer GENERATED ALWAYS AS (date_part('day'::text, gregorian)) STORED,
    year_int integer GENERATED ALWAYS AS (date_part('year'::text, gregorian)) STORED,
    quarter_int integer GENERATED ALWAYS AS (date_part('quarter'::text, gregorian)) STORED,
    dow_int integer GENERATED ALWAYS AS (date_part('dow'::text, gregorian)) STORED,
    day_local character(20) GENERATED ALWAYS AS (public.dow_locale(gregorian)) STORED,
    doy_int integer GENERATED ALWAYS AS (date_part('doy'::text, gregorian)) STORED,
    boq_int date GENERATED ALWAYS AS (public.boq(gregorian)) STORED,
    doq_int integer GENERATED ALWAYS AS (((gregorian - public.boq(gregorian)) + 1)) STORED,
    day_english character(20) GENERATED ALWAYS AS (public.dow_english(gregorian)) STORED,
    month_local character(20) GENERATED ALWAYS AS (public.month_local(gregorian)) STORED,
    month_english character(20) GENERATED ALWAYS AS (public.month_english(gregorian)) STORED,
    week_of_month integer GENERATED ALWAYS AS (public.week_of_month(gregorian)) STORED,
    week_of_year integer GENERATED ALWAYS AS (public.week_of_year(gregorian)) STORED,
    julian_day integer GENERATED ALWAYS AS (public.julian_day(gregorian)) STORED,
    is_weekday boolean GENERATED ALWAYS AS (public.is_weekday(gregorian)) STORED,
    nth_day character(10) GENERATED ALWAYS AS (public.nth_name((date_part('day'::text, gregorian))::integer)) STORED,
    full_english text GENERATED ALWAYS AS (public.full_date_english(gregorian)) STORED,
    first_day_of_week date GENERATED ALWAYS AS ((gregorian - (date_part('dow'::text, gregorian))::integer)) STORED,
    first_day_of_month date GENERATED ALWAYS AS ((gregorian - ((date_part('day'::text, gregorian))::integer - 1))) STORED,
    first_day_of_year date GENERATED ALWAYS AS (public.boy_date(gregorian)) STORED,
    previous_business_day date GENERATED ALWAYS AS (public.previous_business_day(gregorian)) STORED,
    next_business_day date GENERATED ALWAYS AS (public.next_business_day(gregorian)) STORED,
    is_leap_year boolean GENERATED ALWAYS AS (public.is_leap_year(gregorian)) STORED,
    end_of_month date GENERATED ALWAYS AS (public.end_of_month(gregorian)) STORED,
    end_of_week date GENERATED ALWAYS AS (public.end_of_week(gregorian)) STORED,
    end_of_year date GENERATED ALWAYS AS (public.end_of_year(gregorian)) STORED,
    end_of_quarter date GENERATED ALWAYS AS (public.end_of_quarter(gregorian)) STORED,
    dow_in_month integer GENERATED ALWAYS AS (((gregorian - (gregorian - (date_part('day'::text, gregorian)::integer )) + 6) / 7)::integer) STORED,
    dow_in_year integer GENERATED ALWAYS AS (((gregorian - (boy_date(gregorian) -1) + 6) / 7)::integer) STORED,
    dow_in_quarter integer GENERATED ALWAYS AS (((((gregorian - public.boq(gregorian)) + 1) + 6) / 7)) STORED
);

COMMENT ON COLUMN public.media_calendar.gregorian IS 'The only physical column in the table, and the primary key. All others columns are generated.';
COMMENT ON COLUMN public.media_calendar.month_int IS 'The month of the year counting from January as 1.';
COMMENT ON COLUMN public.media_calendar.day_int IS 'Ordinal day of month.';
COMMENT ON COLUMN public.media_calendar.year_int IS 'Year as an integer.';
COMMENT ON COLUMN public.media_calendar.quarter_int IS 'Quarter of year by calendar months.';
COMMENT ON COLUMN public.media_calendar.dow_int IS 'Day of week as an integer from 0-6 Sunday-Saturday.';
COMMENT ON COLUMN public.media_calendar.day_local IS 'Day of week writeen out by locale.';
COMMENT ON COLUMN public.media_calendar.doy_int IS 'Day of year counting from Jan 1.';
COMMENT ON COLUMN public.media_calendar.boq_int IS 'Date of beginning of calendar quarter.';
COMMENT ON COLUMN public.media_calendar.doq_int IS 'Count of days since beginning of calendar quarter.';
COMMENT ON COLUMN public.media_calendar.day_english IS 'Day of the week written out in English. Does not use to_char().';
COMMENT ON COLUMN public.media_calendar.month_local IS 'Month of year written out using the locale settings.';
COMMENT ON COLUMN public.media_calendar.month_english IS 'Month of year written out explicitly in the English language.';
COMMENT ON COLUMN public.media_calendar.week_of_month IS 'Week of the month starting with the first in the first calendar week.';
COMMENT ON COLUMN public.media_calendar.week_of_year IS 'Week of year starting with the first in the first calendar week.';
COMMENT ON COLUMN public.media_calendar.julian_day IS 'Ordinal day on the Julian calendar since November 24, 4714 BC at midnight UTC';
COMMENT ON COLUMN public.media_calendar.is_weekday IS 'Is Monday-Friday.';
COMMENT ON COLUMN public.media_calendar.nth_day IS 'Day of month written using rank abbreviations. 1st, 2nd, 3rd...31st.';
COMMENT ON COLUMN public.media_calendar.full_english IS 'The formally written datum of the given date in English.';
COMMENT ON COLUMN public.media_calendar.first_day_of_week IS 'Date of the preceeding Sunday.';
COMMENT ON COLUMN public.media_calendar.first_day_of_month IS 'Date of the first of the month.';
COMMENT ON COLUMN public.media_calendar.first_day_of_year IS 'Date of the new year.';
COMMENT ON COLUMN public.media_calendar.previous_business_day IS 'Date of the previous weekday (Mon-Fri).';
COMMENT ON COLUMN public.media_calendar.next_business_day IS 'Date of the next weekday (Mon-Fri).';
COMMENT ON COLUMN public.media_calendar.is_leap_year IS 'Delegates the existence of leap year to the PostgreSQL engine.';
COMMENT ON COLUMN public.media_calendar.end_of_month IS 'Date of the last day of the month.';
COMMENT ON COLUMN public.media_calendar.end_of_week IS 'Date of the next Saturday.';
COMMENT ON COLUMN public.media_calendar.end_of_quarter IS 'Date of the last day in the quarter';
COMMENT ON COLUMN public.media_calendar.dow_in_month IS 'The ranked occurrence of this day in the month (2nd Saturday).';
COMMENT ON COLUMN public.media_calendar.dow_in_year IS 'How many times this day (Sun-Sat) has occurred so far this calendar year.';
COMMENT ON COLUMN public.media_calendar.dow_in_quarter IS 'How many times this day (Sun-Sat) has occurred in this quarter.';

COMMENT ON TABLE public.media_calendar IS 'Gregorian calendar with date parts and intervals broken out for convenience.';

-- This table fits completely in the 8k page. No need for TOAST.
ALTER TABLE public.media_calendar ALTER COLUMN day_local SET STORAGE PLAIN;
ALTER TABLE public.media_calendar ALTER COLUMN day_english SET STORAGE PLAIN;
ALTER TABLE public.media_calendar ALTER COLUMN month_local SET STORAGE PLAIN;
ALTER TABLE public.media_calendar ALTER COLUMN month_english SET STORAGE PLAIN;
ALTER TABLE public.media_calendar ALTER COLUMN nth_day SET STORAGE PLAIN;
ALTER TABLE public.media_calendar ALTER COLUMN full_english SET STORAGE PLAIN;

--
-- Data for Name: media_calendar; Type: TABLE DATA; Schema: public
--

INSERT INTO public.media_calendar (gregorian)
SELECT '1900-01-01'::date + x
-- Starting with 1900-01-01, fill the table with 200 years of data.
FROM generate_series(0,365*200) x;

COMMENT ON COLUMN public.media_calendar.end_of_year IS $$Date of new year's eve of this year. $$;

Источник

The PostgreSQL team has been jamming out updates on a regular basis, adding some amazing features that I hope to go into over time but one of these features made me extremely excited! Generated columns:

A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables.

Yay!

What this means is that you can have a managed «meta» column that will be created and updated whenever data changes in the other columns.

Too bad Dee didn’t know about this when she was working with the Cassini data! Setting up those search columns would have been much easier!

An Example: A Fuzzy Search for a Document Table

Let’s say you have a table where you store JSONB documents. For this example, I’ll store conference talks in a table I’ll call «NDC», since I was just there and did just this:

create table ndc(
  id serial primary key,
  body jsonb not null,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);

Here’s an example of a talk — a real one I scraped from the NDC site, which happens to be Heather Downing’s amazing keynote:

{
  "title": "Keynote: The Care and Feeding of Software Engineers",
  "name": "Heather Downing",
  "location": "Room 1",
  "link": "https://ndcsydney.com/talk/keynote-the-care-and-feeding-of-software-engineers/",
  "tags": ["agile", "people", "soft-skills", "ethics"],
  "startTime": {
    "hour": 9,
    "minutes": 0
  },
  "endTime": {
    "hour": 10,
    "minutes": 0
  },
  "day": "wednesday"
}

This wad of JSON will get stored happily in our new table’s body field but querying it might be a pain. For instance — I might remember that Heather’s talk is the Keynote, but it’s a long title so remembering the whole thing is a bummer. I could query like this:

select * from ndc where body ->> 'title' ilike 'Key%';

Aside from being a bit of an eyesore (the body ->> 'title' stuff is a bit ugly), the ilike 'Key%' has to run a full table scan, loading up the entire JSON blob just to make the comparison. Not a huge deal for smaller tables, but as a table grows this query will start sucking resources.

We can fix this easily using the new GENERATED syntax when creating our table:

alter table ndc
add column title text 
generated always as (body ->> 'text');

Run this and the generated column is created and then populated as well! Check it:

title is now a relational column

But wait, there’s more. If we tried to run our search query with the fuzzy match on title we’d still have to do a full table scan. Generated columns actually store the data as opposed to computing it at query time, which means we can…

create index idx_title on ndc(title);

BAM! What used to require a few triggers and an occassionally pissed off DBA is now handled by PostgreSQL.

Also — just to be sure this is clear — we could also have declared this in the orginal definition if we wanted:

create table ndc(
  id serial primary key,
  body jsonb not null,
  title text generated always as (body ->> 'title') stored,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);
create index idx_title on ndc(title);

Into the Weeds: The Search Field

Adding a full text search index would seem to be the obvious use of GENERATED don’t you think? I decided to wait on that because, for now, it’s not exactly straightforward.

If all I wanted to do was to search on the title of a talk then we’re in business… sort of:

alter table ndc
add search tsvector
generated always as (to_tsvector('english', body ->> 'title')) stored;

This works really well, as you can see:

But it took me about 2 hours (seriously) to figure this out as I kept getting a really annoying error, which I’ll go into in a minute:

ERROR:  generation expression is not immutable

Long story short, if you don’t add the english language definition to the ts_vector function than things will fail. The expressions that you use to define a generated column must be immutable, as the error says, but understanding which functions are and are not can be a bit of a slog.

Deeper Into the Weeds: Using Concat

Let’s keep going and break things shall we? We’ve got a lot of lovely textual information in our JSON dump, including tags and name. This is where we earn our keep as solid PostgreSQL brats because we know, ohhh do we know that a blanket full text indexing that tokenizes everything evenly is pure crap :).

We’ll want to be sure to weight the tags and maybe suppress the tokenization of names — I’ll get to that in a later post — right now I just want to take the next step, which is to add other fields to our search column. All we have at the moment is the title — let’s add name:

alter table ndc drop search;
alter table ndc
add search tsvector
generated always as (
  to_tsvector('english', 
    concat((body ->> 'name'), ' ', (body ->> 'title'))
  )
) stored;

I formatted this so it reads better — hopefully it’s clear what I’m trying to do? I’m using the concat function to, well, concatenate the name with a blank space and then a title. I need that blank space in there otherwise the name and title will be rammed together making it useless.

ERROR:  generation expression is not immutable

Crap! What? This is a concatenation!?!?! How is this not immutable? Turns out it’s the concat function that’s causing the problem, and I’m not sure why (if you know please leave me a comment). This, however, does work:

alter table ndc drop search;
alter table ndc
add search tsvector
generated always as (
  to_tsvector('english', 
    (body ->> 'name') || ' ' || (body ->> 'title')
  )
) stored;

That, my friends, is super hideous — but it gets the job done. I’ll get more into full text indexes in a later post as I’ve had some really good fun with them recently.

Summary

I’ve had a lot of fun goofing around with the generated bit. If you’re wondering, the actual update goes off right after the before trigger would normally go off — so if you do have a before trigger on your table, you can use whatever values are generated there.

You also might be wondering about the stored keyword you see here? Right now it’s the only option: the generated bits are stored on disk next to your data. In future releases you’ll be able to specify virtual for just in time computed bits… but not now.

Источник