The first 236 pages don’t get into IUDs or Insert, Updates and Deletes. No data modification in the first 7 chapters which is how DBAs should start out. I am about 1/2 way through the book so I decided I would pause and type up some of my notes and thoughts.
T-SQL Fundamentals is one of the first technical books I have decided to actually read. All of the other technical books I have read I used as a reference manual and maybe focused on a chapter that relates to something that had me stumped.
Taking a pen to paper really helps me stay focused when reading. I use blank sheets of paper without lines so if I feel the urge to draw a picture or change a font size I can do so without limitation. The author stresses several times to do the exercises to really understand what is being covered. I agree, and that is part of my Learn One, Do One, Teach One path to mastery.
The first notable item that I have come across in the book is a definition of the resource database. This appears to be a behind the scenes database that stores system objects. Short explanaions of all the systems databases are as follows:
1. master – stores meta information and configs
2. model – new db template
3. tempdb – work tables, sort space, row versions (rebuilt on sql restart)
4. msdb – sql agent data
5. Resource – system objects
Unique constraints – first thing that comes to mind is a primary key, however these can be things like SSN or phone number that are not the primary key.
A foreign key is a link to another table.
Alter table Orders add constraint 'FK_Blah' foreign key (empid) references dbo.employees(empid)
The order of which queries are processed is important when trying to understand the results.
1. FROM clause – code should always schema qualify tables in the from clause “dbo.table” instead of “table”
2. WHERE clause – the filter
3. GROUP BY clause – SUM() AVG() agg functions are required for selected columns when grouping so only one value is returned
4. HAVING clause – another filter
5. SELECT – column or fields to return
6. ORDER BY clause – It is stressed several times that order is not guaranteed unless this is added
Agg functions ignore NULLs except for count(*)
TOP – an option to limit results. processed as part of the select which is similar the time DISTINCT
OVER clause – exposes a window of rows to an aggregate function
select orderid, custid, val, 100. * val/sum(val) OVER() as pct_all 100. * val / sum(val) OVER(PARTITION BY custid) as pct_cust from sales.ordervalues;
note 100. causes an implicit conversion to decimal where as 100 would be implicit to int
This sample OVER query returns two percentages for each orderid so you can tell how big the order was compared to that customers orders(the partitioned one) and all orders.
OVER supports 4 ranking functions
DENSE_RANK – 5,6,6,8
RANK – 5,6,6,7
ROW_NUMBER – 5,6,7,8
NTILE – 1,1,1,1 (I assume this is like a percentile. if you pass in 100 you get 1 for top 1 percentile)
*(multiply) / (divide) % (modulo)
= > < !=
select prod, prodname, cat_id CASE cat_id WHEN 1 then 'Bevs' WHEN 2 then 'produce' END as category name from products
An important thing to note about nulls:
select * from blah where col 'MI'
Although NULL does not = MI this query will not return records where col is NULL
col = NULL <—-returns no rows
col IS null <—- returns proper results
NULLS are sorted before values in col such as 'MI' or 'AL'
Also, sql doesn't process left to right in where clauses. EX:
Where col1 0 AND col2/col1 > 2
Character Data (just the highlights)
STUFF -> remove substring and insert new string
UPPER -> convert to uppercase
SELECT RIGHT(‘12345’, 3) -> 345
wildcards -> %, _, [^A-E]
SQL Server recognizes string literals such as ‘20070212’ and will convert. Its suggested to always use YYYYMMDD so language options don’t get confused.
This query is bad for indexes
select blah from orders WHERE YEAR(orderdate) = 2007
Change the where clause to orderdate = ‘20070101’ so the function doesn’t have to complete for every row.
CAST(GETDATE() as TIME)
select DATEADD(YEAR, 1, ‘20091010’) -> 2010-10-10 00:00:00.000
DATEDIFF(day, ‘20080101’, ‘20090101’) -> 365
DATEPART(month, ‘20080210’) –> int of 2
ISDATE(‘20090101’) 1=TRUE 0=FALSE
The inner workings of these escaped me for quite a while. As soon as I thought I knew what was going on I would be challenged to write a multi-table query and struggle.
The different types of joins are Cross, Inner(default), Outer
Apply, Piviot and Unpiviot are other types of joins
A cross join creates a Cartesian product where you simply get a result = (tableA # of rows) * (tableB # of rows)
SELECT c.id, e.id from sales.customers as C cross join hr.employees as E
The book recommends using ANSI-92 format instead of ANSI-89.
ANSI-89 – DO NOT USE
select a.blah, b.blah from table1 as a, table2 as b where a.blah = b.blah
note: without the where clause it would be a cross join
ANSI-92 – GOOD
select c.custid, count(*) as numorders from sales.customers as c left outer join sales.orders as o on c.custid = o.custid group by c.custid;
This will get a table with rows for every order and then all matching customer id’s will be grouped together and the second column will be a number of rows that were grouped.
Sub Queries (1, p133)
These can solve a lot of the same problems as joins. There are two flavors, self contained and correlated. Self contained are much easier, especially when debugging because the inner query can be ran separately from the outer query.
Most self contained inner queries return a scalar, or a single value.
select orderid, orderdate from sales.orders where orderid = (select max(o.orderid) from sales.orders as o)
Some self contained inner queries return multiple values by using the IN operator instead of =.
select orderid, orderdate from sales.orders where orderid IN (select o.orderid from sales.orders as o where o.lastname like 'D%')
Another example of a subquery with exists in a where clause. Most of the time we should avoid where clause function type processing but the EXISTS clause is performent.
where blah = blah and exists (select blah)
Table expressions and derived tables
select orderyear, count(distinct custid) as numcusts from (select year(orderdate) as order year, custid from sales.orders) as D group by orderyear
Common table expressions (CTEs) are better than derived tables because they have recursive capabilities
WITH USACusts as ( select custid, companyname from sales.customers where country='usa' )
Operations performed on two query results. ORDER BY returns a cursor so the sets cannot use this.
UNION (distinct is implicit)
INTERSECT – return rows that are in both query and query
EXCEPT – exclude results that are in query B
1. Ben-Gan, I. (2009). Microsoft SQL Server 2008 T-SQL Fundamentals. Redmond, Washington: Microsoft Press.