Are sub-queries truly isolated?

Sub-queries are just standard queries which are nested inside of other queries. Sub queries are generally isolated and can’t access the parent queries, but certain circumstances allow for us to access parent query data.
I came across a need for such a circumstance when I was trying find missing monthly data for which relate to objects.
When trying to find, this missing data, it sounds easy in your head: “Find the first one which is newer than you”. Once you started the planning you realise that your normal “JOIN” just isn’t going to cut it, especially if you have multiple data rows missing one after the other.

Below you will find a quick write up of some example tables:

Table – Monthly Income
Name Properties
MonthlyIncomeId Int, Not Null, Auto Increment
StoreId Int, Not Null
Income_date DateTime2, Not Null
Table – Store
Name Properties
StoreId Int, Not Null, Auto Increment
Name NVarchar(200)
Address NVarchar(512)
PhoneNumber nVarchar(15), Not Null

If you’ve looked over the tables, you’ll have noticed that they are very simple in design. The table [store] isn’t needed, but for some reason I think it helps a mind to work around a problem when you can connect data to the object, which in this case are the stores to which the monthly incomes are related to.
If you look at the monthly income table you’ll notice that “Income_date” isn’t a DateTime, but rather a Int. I’ve done this so that it’s easier to connect the data later on.

The style and data we will be using can be found in the table below:

“yyyymm”


201701 — January
201702 — February
201703 — March
201704 — April
201707 — July
201712 — December

We could try to join on to “monthly_income” tables like so:

SELECT *

FROM monthly_income AS first_date

JOIN monthly_income AS next_date ON next_date.income_date = first_date.income_date

AND next_date.storeid = first_date.storeid

WHERE first_date.storeid = 1

This would run perfectly well, but we’d be missing “April”, “July” because they wouldn’t be able to join on anything.

We could also do a “left join”, but that would only show that the month following the date found was missing and we wouldn’t know about the item which is missing after the first:

First_date


Next_date


Monthly_incomeid


storeid


Income_date


Income


Monthly_incomeid


storeid


Income_date


Income


1 1 201701 901755 5 1 201702 875871
5 1 201702 875871 10 1 201703 895439
10 1 201703 895439 15 1 201704 679211
15 1 201704 679211 Null Null Null Null ← We can assume May is missing
30 1 201707 106212 Null Null Null Null ← We can assume August is missing

Obviously with a table this small you could simply use your eyes, but imagine if you had thousands of lines… you simply couldn’t look through it, and so you may look for next_date rows which contain nulls:

SELECT first_date.*, ISNULL(next_date.income_date, first_date.Income_date + 1)
FROM monthly_income AS first_date
JOIN monthly_income AS next_date ON next_date.income_date = first_date.income_date

AND next_date.storeid = first_date.storeid

WHERE first_date.storeid = 1
AND next_date.monthly_incomeid IS NULL

Ignoring the fact that if [first_date].[income_date] was December and I incremented by 1, that I’d end up with a date that didn’t make sense, we can see that it just doesn’t cut it and this is exactly where our “special” circumstance can come with handy.

When your joining a table, you can use a sub-query in the “ON” segment of the code, which before I’d used it couldn’t wrap my head around the use. Another great thing about those special sub-queries is that you can also order them! Which is as such unique to “ON JOIN” but rather the fact that we will be using “TOP” in our select statement.

SELECT *
FROM monthly_income AS first_date
JOIN monthly_income AS next_date ON next_date.storeid = first_date.storeid

AND next_date.income_date = (SELECT TOP 1 income_date
FROM monthly_income AS specialquery

WHERE specialquery.income_date > first_date.income_date
AND first_date.storeid = 1
ORDER BY specialquery.income_date)

Now we should get the following results:

First_date


Next_date


Monthly_incomeid


storeid


Income_date


Income


Monthly_incomeid


storeid


Income_date


Income


1 1 201701 901755 5 1 201702 875871
5 1 201702 875871 10 1 201703 895439
10 1 201703 895439 15 1 201704 679211
15 1 201704 679211 30 1 201707 106212 ← We can assume May is missing
30 1 201707 106212 55 1 201712 156321 ← We can assume August is missing

From this we can quickly change the query to do a check and filter out the ones we don’t want, and change the SELECT segment to something a little more useful:

SELECT first_date.*, next_date.income_date AS [Next Available Date], next_date.income_date – first_date.income_date [Missing Months]

FROM monthly_income AS first_date
JOIN monthly_income AS next_date ON next_date.storeid = first_date.storeid

AND next_date.income_date = (SELECT TOP 1 income_date
FROM monthly_income AS specialquery
WHERE specialquery.income_date > first_date.income_date
ORDER BY specialquery.income_date)

WHERE first_date.storeid = 1
AND next_date.income_date > (first_date.income_date + 1) – – This will filter out any next dates which are only 1 month ahead of the first income date.

Now we can see the next available date, and how many months’ worth of data are missing!

Now we have seen the results, let’ look over the query to see what just happened line by line (starting from the join):

JOIN monthly_income AS next_date ON next_date.storeid = first_date.storeid

This part is quite obvious, we’ve just linked both of the table version together on the storied

AND next_date.income_date = (SELECT TOP 1 income_date

This is where the fun starts. As we’ve used an equals sign we only need 1 result, and make sure we only get the 1 result we use “TOP 1”, which just returns the top most data row. If you never put “TOP 1”.

FROM monthly_income AS specialquery

This part isn’t anything mind blowing we are simply calling the monthly income table and giving it reference name of “specialquery”

WHERE specialquery.income_date > first_date.income_date

This part is what makes the “TOP 1” part so import, we tell the query to pull anything from the table with column income_date which has a greater value than first_date’s income_date… if this isn’t the last month your going to potentially get a whole wack of dates linking to this table. Another important part of this query, this sub-query is that’s referencing its parent query!

ORDER BY specialquery.income_date)

The order may not be required but because I want to make sure that I only return the next available date after first_date’s income_date I order it by specialqueries’ income_date ascendingly.