3👍
With group by
and having
, we can do Tag 1
AND Tag 2
:
with cte_tags as (
select book_id
from tags
where tag in ('Tag 1', 'Tag 2')
group by book_id
having count(*)=2)
select b.id as book_id,
b.name
from books b
join cte_tags t
on b.id = t.book_id;
EDIT:
- If
tag
intags
for abook_id
can be duplicated, use the following:
with cte_tags as (
select book_id
from tags
where tag in ('Tag 1', 'Tag 2')
group by book_id
having count(distinct tag)=2)
select b.id as book_id,
b.name
from books b
join cte_tags t
on b.id = t.book_id;
- If looking for
tag1
ORtag2
, ortag1
ANDtag2
, use the following:
with cte_tags as (
select book_id
from tags
where tag in ('Tag 1', 'Tag 2')
group by book_id
having count(distinct tag) between 1 and 2)
select b.id as book_id,
b.name
from books b
join cte_tags t
on b.id = t.book_id;
1👍
If I understand it correctly you want books that only have tags ‘Tag1’ and ‘Tag2’. I.e. no other tags. I’m not aware of an official name for this problem, maybe exclusive contains.
It amounts to finding books meeting two conditions:
- have all tags in ("Tag1", "Tag2")
- have two unique tags (or don’t have other or zero tags)
You’re looking for a solution that "works with any ORM". Of course, that’s impossible. However, there is a LINQ solution that will probably work with any LINQ-based ORM. It works with EF anyway.
This query can be used when the tag names are not unique per book:
var tags = new[] { "Tag1", "Tag2" };
var books = context.Books
.Where(b => b.Tags.All(t => tags.Contains(t.Tag))
&& b.Tags.Select(t => t.Tag).Distinct().Count() == tags.Count());
This, if the tag names are unique per book (which I assume):
var books = context.Books
.Where(b => b.Tags.All(t => tags.Contains(t.Tag))
&& b.Tags.Count() > 0);
The second condition is necessary, otherwise books without any tags would also be selected (that’s the semantics of All
).
This generates SQL queries that you’ll have to settle with (as with any LINQ-based ORM). For the second case, using EF-core 6:
SELECT [b].[Id]
FROM [Books] AS [b]
WHERE NOT EXISTS (
SELECT 1
FROM [Tags] AS [t]
WHERE ([b].[Id] = [t].[BookId]) AND [t].[Tag] NOT IN (N'Tag1', N'Tag2'))
AND ((
SELECT COUNT(*)
FROM [Tags] AS [t0]
WHERE [b].[Id] = [t0].[BookId]) > 0)
Of course it’s possible to write SQL queries manually that do a (far) better job, performance-wise, in a specific constellation of indexes and statistics. Also, most (if not all) ORMs will offer a way to execute raw SQL. In that sense, a SQL query can be used with "any" ORM. I don’t consider that an "ORM solution" though. It doesn’t use the ORM’s core machinery, merely its database connection.
- [Django]-Django send_mail returns SMTPConnectError
- [Django]-Django Caching – How do I set up my code to avoid duplicating cache logic?
- [Django]-Calling a view from a management command
- [Django]-How do I check if a many-to-many relationship exists in a Django template?
1👍
If performance is important, you should try various queries on your server with actual data and measure their performance.
I have a general note.
A query like this:
select book_id
from tags
where tag in ('Tag1', 'Tag2')
or like this:
select book_id
from tags
where tag = 'Tag 1' OR tag = 'Tag2'
usually would result in the scan of the whole table tags
even if it had index on the tag
column.
On the other hand, a query like this:
select book_id
from tags
where tag = 'Tag1'
would usually use an index.
So, we can expand a query with OR
into two separate queries and then combine their results.
WITH
CTE_BookIDs
AS
(
select book_id
from tags
where tag = 'Tag1'
INTERSECT
select book_id
from tags
where tag = 'Tag2'
)
SELECT
books.*
FROM
books
INNER JOIN CTE_BookIDs ON CTE_BookIDs.book_id = books.id
;
Here is a query over a sample data set:
CREATE TABLE #Tags
(ID int IDENTITY NOT NULL PRIMARY KEY
,BookID int NOT NULL
,Tag varchar(50) NOT NULL);
INSERT INTO #Tags VALUES
(1, 'Tag1'),
(1, 'Tag2'),
(1, 'Tag3'),
(1, 'Tag4'),
(2, 'Tag1'),
(3, 'Tag2'),
(4, 'Tag1'),
(4, 'Tag2'),
(4, 'Tag3'),
(5, 'Tag3'),
(5, 'Tag4'),
(5, 'Tag5'),
(6, 'Tag1'),
(6, 'Tag3'),
(6, 'Tag5'),
(7, 'Tag2'),
(7, 'Tag3'),
(8, 'Tag1'),
(8, 'Tag2');
CREATE INDEX IX_Tag ON #Tags
(
Tag, BookID
);
WITH
CTE_BookIDs
AS
(
select BookID
from #Tags
where tag = 'Tag1'
INTERSECT
select BookID
from #Tags
where tag = 'Tag2'
)
SELECT *
FROM CTE_BookIDs
;
DROP TABLE #Tags;
Result
+--------+
| BookID |
+--------+
| 1 |
| 4 |
| 8 |
+--------+
Execution plan
- [Django]-What exactly is meant when mr.developer says "The package 'django-quoteme' is dirty."
- [Django]-ModuleNotFoundError : no module named : crispy_forms
- [Django]-Extending a template thats already extended in Django
- [Django]-How to solve UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
- [Django]-Django: "auto_now_add=True" giving incorrect time
0👍
Try next solution:
First create index for speed up this query
create index on tags (tag, book_id);
Second, check next query
SELECT * FROM books
JOIN tags t1 on t1.tag = 'Tag1' AND t2.book_id = books.id
JOIN tags t2 on t2.tag = 'Tag2' AND t2.book_id = books.id;
If you want to deal with arbitrary list of any number of tags:
SELECT
books.id,
count(distinct tags.id) as tags_count
FROM books
JOIN tags on tags.tag = ANY(['Tag1', 'Tag2', ...]) AND tags.book_id = books.id
GROUP BY books.id
HAVING
count(distinct tags.id) = <number of tags>
For speed up second query check if next index wil help
create index on tags (book_id, tag);