31👍
The error you get:
ON CONFLICT DO UPDATE command cannot affect row a second time
… indicates you are trying to upsert the same row more than once in a single command. In other words: you have dupes on (name, url, email)
in your VALUES
list. Fold duplicates (if that’s an option) and the error goes away. This chooses an arbitrary row from each set of dupes:
INSERT INTO feeds_person (created, modified, name, url, email)
SELECT DISTINCT ON (name, url, email) *
FROM (
VALUES
('blah', 'blah', 'blah', 'blah', 'blah')
-- ... more rows
) AS v(created, modified, name, url, email) -- match column list
ON CONFLICT (name, url, email) DO UPDATE
SET url = feeds_person.url
RETURNING id;
Since we use a free-standing VALUES
expression now, you have to add explicit type casts for non-default types. Like:
VALUES
(timestamptz '2016-03-12 02:47:56+01'
, timestamptz '2016-03-12 02:47:56+01'
, 'n3', 'u3', 'e3')
...
Your timestamptz
columns need an explicit type cast, while the string types can operate with default text
. (You could still cast to varchar(n)
right away.)
If you want to have a say in which row to pick from each set of dupes, there are ways to do that:
You are right, there is (currently) no way to use excluded
columns in the RETURNING
clause. I quote the Postgres Wiki:
Note that
RETURNING
does not make visible the "EXCLUDED.*
" alias
from theUPDATE
(just the generic "TARGET.*
" alias is visible
there). Doing so is thought to create annoying ambiguity for the
simple, common cases [30] for little to no benefit. At some
point in the future, we may pursue a way of exposing if
RETURNING
-projected tuples were inserted and updated, but this
probably doesn’t need to make it into the first committed iteration of
the feature [31].
However, you shouldn’t be updating rows that are not supposed to be updated. Empty updates are almost as expensive as regular updates – and might have unintended side effects. You don’t strictly need UPSERT to begin with, your case looks more like "SELECT or INSERT". Related:
One cleaner way to insert a set of rows would be with data-modifying CTEs:
WITH val AS (
SELECT DISTINCT ON (name, url, email) *
FROM (
VALUES
(timestamptz '2016-1-1 0:0+1', timestamptz '2016-1-1 0:0+1', 'n', 'u', 'e')
, ('2016-03-12 02:47:56+01', '2016-03-12 02:47:56+01', 'n1', 'u3', 'e3')
-- more (type cast only needed in 1st row)
) v(created, modified, name, url, email)
)
, ins AS (
INSERT INTO feeds_person (created, modified, name, url, email)
SELECT created, modified, name, url, email FROM val
ON CONFLICT (name, url, email) DO NOTHING
RETURNING id, name, url, email
)
SELECT 'inserted' AS how, id FROM ins -- inserted
UNION ALL
SELECT 'selected' AS how, f.id -- not inserted
FROM val v
JOIN feeds_person f USING (name, url, email);
The added complexity should pay for big tables where INSERT
is the rule and SELECT
the exception.
Originally, I had added a NOT EXISTS
predicate on the last SELECT
to prevent duplicates in the result. But that was redundant. All CTEs of a single query see the same snapshots of tables. The set returned with ON CONFLICT (name, url, email) DO NOTHING
is mutually exclusive to the set returned after the INNER JOIN
on the same columns.
Unfortunately this also opens a tiny window for a race condition. If …
- a concurrent transaction inserts conflicting rows
- has not committed yet
- but commits eventually
… some rows may be lost.
You might just INSERT .. ON CONFLICT DO NOTHING
, followed by a separate SELECT
query for all rows – within the same transaction to overcome this. Which in turn opens another tiny window for a race condition if concurrent transactions can commit writes to the table between INSERT
and SELECT
(in default READ COMMITTED
isolation level). Can be avoided with REPEATABLE READ
transaction isolation (or stricter). Or with a (possibly expensive or even unacceptable) write lock on the whole table. You can get any behavior you need, but there may be a price to pay.
Related: