sql - Postgres: SELECT where a list of JOINed subtables contains specific data
In our project we have a table events, each row of which can have multiple dates. This means each row in dates has an event_id and a position; position (starting at 0) is unique within the same event_id. dates (obviously) also has a date column.

Now I want to select all events that have a specific list of date child values (ordered by position). Is this possible within a single SQL query?
Example tables:

    events: (id, name, event_type, description, comment, user_id)
    dates: (id, event_id, position, date_at, latitude, longitude, location)

The id, position, and lat/long columns are integer, date_at is a date, and the rest are strings (character varying) that don't matter here. Example:
    INSERT INTO events (id, name) VALUES (1, 'birthday party');
    INSERT INTO dates (id, event_id, position, date_at) VALUES ( 1, 1, 0, '2014-06-30');
    INSERT INTO dates (id, event_id, position, date_at) VALUES ( 2, 1, 1, '2015-06-30');
    INSERT INTO dates (id, event_id, position, date_at) VALUES ( 3, 1, 2, '2016-06-30');
    INSERT INTO dates (id, event_id, position, date_at) VALUES ( 4, 1, 3, '2017-06-30');
    INSERT INTO dates (id, event_id, position, date_at) VALUES ( 5, 1, 4, '2018-06-30');

    INSERT INTO events (id, name) VALUES (2, 's.o. birthday party');
    INSERT INTO dates (id, event_id, position, date_at) VALUES ( 6, 2, 0, '2014-02-11');
    INSERT INTO dates (id, event_id, position, date_at) VALUES ( 7, 2, 1, '2015-02-11');
    INSERT INTO dates (id, event_id, position, date_at) VALUES ( 8, 2, 2, '2016-02-11');
    INSERT INTO dates (id, event_id, position, date_at) VALUES ( 9, 2, 3, '2017-02-11');
    INSERT INTO dates (id, event_id, position, date_at) VALUES (10, 2, 4, '2018-02-11');
I need a query that returns event #1 when I feed it this array of dates: [2014-06-30, 2015-06-30, 2016-06-30, 2017-06-30, 2018-06-30]. The whole query string and the array are automatically generated, so they can have any format and can be hardcoded in the example (we use Ruby on Rails). The list of possible events will have other restrictions (user_id, event_type, etc.) and won't be a long list (perhaps 10..100 events), so it need not be an indexable query.

It should not return event #1 when fed these dates in a different order, or other dates, or a subset of these dates. It should accept any number of dates as input, including the empty set.
I'm going to assume for the moment that you have created an additional table in which to store your list of specific dates.
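As a rough sketch of that assumption, the helper table might be created and filled like this. The names list_of_dates and date come from the query in this answer; making it a temporary table with a primary key is my own assumption, not something the question specifies.

```sql
-- Hypothetical helper table holding the dates to search for.
-- TEMPORARY keeps it private to the current session.
CREATE TEMPORARY TABLE list_of_dates (
    date date PRIMARY KEY
);

-- The five dates from the question's example.
INSERT INTO list_of_dates (date) VALUES
    ('2014-06-30'),
    ('2015-06-30'),
    ('2016-06-30'),
    ('2017-06-30'),
    ('2018-06-30');
```

A temporary table avoids collisions between concurrent requests, since each session sees only its own copy.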
    SELECT events.*
    FROM events
    INNER JOIN (
        SELECT d.event_id
        FROM dates d
        INNER JOIN list_of_dates l ON l.date = d.date_at
        GROUP BY d.event_id
        HAVING count(*) = (SELECT count(*) FROM list_of_dates)
           AND max(d.position) = min(d.position) + count(*) - 1
    ) list ON list.event_id = events.id
This joins the dates table to the list of dates you have.

It then groups the results by event_id.

The first condition in the HAVING clause only allows through an event_id if the returned number of dates matches the number in your list. (If you join on 5 dates, you only want the event_ids that include all 5 of those dates.)

The second condition in the HAVING clause ensures that the matched dates are sequential. If you have a list of 5 dates, max(position) should be 4 more than min(position).

I'm not sure if that last criterion is what you wanted, but it should give you a structure to work with for whatever criteria you need.

Also, note that this will require a full table scan. This type of multi-row search is slow in SQL. (Academically, it's the same as searching an entity-attribute-value table.)
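If what you need is an exact, ordered match (no extra dates, no different order), one alternative is to aggregate each event's dates into an array ordered by position and compare that array with the input. This sketches a different technique from the HAVING approach. It assumes the date_at and position columns from the question's schema, and a PostgreSQL version that supports ORDER BY inside an aggregate call (9.0 or later). The dates are hard-coded purely for illustration.

```sql
SELECT e.*
FROM events e
LEFT JOIN (
    -- One row per event: its dates collected in position order.
    SELECT dt.event_id,
           array_agg(dt.date_at ORDER BY dt.position) AS date_list
    FROM dates dt
    GROUP BY dt.event_id
) d ON d.event_id = e.id
-- Events without any dates produce NULL here; COALESCE turns that into
-- an empty array so an empty input list ('{}'::date[]) can match them.
WHERE COALESCE(d.date_list, '{}'::date[])
      = ARRAY['2014-06-30', '2015-06-30', '2016-06-30',
              '2017-06-30', '2018-06-30']::date[];
```

With the sample data from the question this should match only event #1, because array equality compares both the elements and their order. Like the HAVING version, it still has to read every dates row of the candidate events.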
Edit: in response to the question edit.

If you are hard-coding the dates into the query, you can use the following, but you must be crucially aware of the possibility of SQL injection attacks if any part of the query can be controlled by the user.
    SELECT events.*
    FROM events
    INNER JOIN (
        SELECT d.event_id
        FROM dates d
        WHERE d.date_at IN ('2014-06-30', '2015-06-30', '2016-06-30', '2017-06-30', '2018-06-30')
        GROUP BY d.event_id
        HAVING count(*) = 5
           AND max(d.position) = min(d.position) + 5 - 1
    ) list ON list.event_id = events.id
You need to hard-code both the list of dates and the 2 occurrences of 5.

With regards to your requirement of "so it need not be an indexable query", you will find that an indexable query is in fact impossible, to a degree. I'll try to explain why...

There is no mechanism to interrogate multiple rows together and at the same time. Many rows can be processed at once, but each one is examined individually. By that I mean you can check "is the date on this row in my list of dates?", but you can't ask "are all of these dates in the list?".

The query above checks each row one at a time (is this date in the list?), then calculates in the HAVING clause how many matched and whether they were in order. Even if an event has only 1 of the dates, you still need to process that date, process the HAVING clause, and only then realise that not all of the dates are present.

Each individual "is this date in the list" check can be indexable, but the final HAVING clause is a full scan of all the matches. The result does not scale particularly well, and there is little that can be done about that.
Depending on the statistical structure of your data, you may be able to make some minor, and quite esoteric, optimisations. For example...
    INSERT INTO dates (id, event_id, first_position, date0, date1, date2, date3, date4, date5)
    VALUES
        ( 1, 1, 0, '2014-06-30', '2015-06-30', '2016-06-30', '2017-06-30', '2018-06-30', NULL ),
        ( 2, 1, 1, '2015-06-30', '2016-06-30', '2017-06-30', '2018-06-30', NULL, NULL ),
        ( 3, 1, 2, '2016-06-30', '2017-06-30', '2018-06-30', NULL, NULL, NULL ),
        ( 4, 1, 3, '2017-06-30', '2018-06-30', NULL, NULL, NULL, NULL ),
        ( 5, 1, 4, '2018-06-30', NULL, NULL, NULL, NULL, NULL )
    ;

    SELECT events.*
    FROM events
    INNER JOIN dates ON dates.event_id = events.id
    WHERE dates.date0 = '2015-06-30'
      AND dates.date1 = '2016-06-30'
      AND dates.date2 = '2017-06-30'
    ;
Now, if you have an index on (date0, event_id), you can find all events that have '2015-06-30' as one of their dates. That gives you 1 row from the dates table, on which you can then check whether all of the other dates are present.
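For completeness, the index could be declared roughly as follows. The index name is hypothetical, the column list (date0, event_id) is the one described in this answer, and the table is the denormalised dates layout with date0..date5 columns. The EXPLAIN is only there so you can see whether the planner actually uses the index.

```sql
-- Hypothetical index name; columns as described in the text.
CREATE INDEX dates_date0_event_id_idx ON dates (date0, event_id);

-- Show the plan for the lookup: the date0 predicate can use the index,
-- while the date1 and date2 predicates are checked on each row found.
EXPLAIN
SELECT events.*
FROM events
INNER JOIN dates ON dates.event_id = events.id
WHERE dates.date0 = '2015-06-30'
  AND dates.date1 = '2016-06-30'
  AND dates.date2 = '2017-06-30';
```

On a table as small as the example the planner may well prefer a sequential scan, so the plan is only informative with realistic data volumes.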
It's dirty though. If an event can have 100 associated dates, you'd need 100 date fields. That makes it inflexible (you might provision 128 dates, and in future find that you need 129). It also makes writing queries against it a bit messy (you're not just passing parameters, you're writing the WHERE clause itself).

The index seek on date0 is still going to return rows that are later discarded by the remainder of the WHERE clause, but it's likely that you'll have fewer of these cases than when using the HAVING clause method, so it should be faster. That is, unless you have so many columns that the I/O of reading many redundant columns counters the savings. Overall I'd still expect it to be measurably faster, and yet I'd still be apprehensive of trying it.

Unfortunately, the concept of searching for multiple rows within grouped rows is always going to be messy and/or costly to execute.