I’m writing this post to force myself to spend some quality time with the materials for this exam. I have been at it for almost two months now and am nearing my exam date. I accelerated my plan so I could get into the 2nd shot window offered by Microsoft and also so I could finish my MCSA within 1 year. It has been a battle at times and is not an easy certification to get. Microsoft has really increased the difficulty since the MCITP for SQL 2008 which only required 2 exams.
My employer is assisting with the costs in a few ways. They will reimburse me for the cost of a passed exam. They are giving me a $500 bonus
if when I pass all three exams and prove my MCSA. And they have loaned me the Training Kit book along with the other tests books that I have already returned.
My plan has been going fairly well. I’ve been able to put at least 10-15 minutes in about 6 days a week. Some of those have lasted and hour or more but that is pretty rare. Data warehousing is interesting to me because we have a lot of things starting up at work that may take off and require these skills. Before I started studying I had deployed only a few packages for my own small data collection and reporting tasks as an administrator. I also do not get too involved with database design since we rely on a lot of 3rd party applications. That world is changing for me and that is why I have been able to be a fairly good student for this last test.
So lets get to my plan.
The percentages are the first thing to note on this page: https://www.microsoft.com/learning/en-us/exam-70-463.aspx
11% – Design and implement
23% – Extract and Transform
27% – Load
24% – Configure and deploy SSIS
15% – DQS
I like to sit down with the book and read as much as I can while taking notes. I write down a lot. When I look at it later I think, “duh I knew that why did I write it down?” But it actually helps me stay focused. Even if I just write down the title of the section, it keeps me on track. At this point, I am ready to go back and review a lot of those notes and type them up so here they are.
The book is split out into those same 5 “Parts” as listed on the exam website.
Part 1: Design and Implement
Use snowflake in a POC since it will be easier to design from the complex OLTP environment.
Star schema for everything else.
Star is just a simplified, denormalized, merged, cleansed, historical schema with fewer joins
Star schema works well for SSAS cubes, SSAS won’t be on the test (phew).
A fact is: “Cust A purchased product B on date C in quantity D for amount e”
Dimension table: Customer, Product, Date
One star per business area
The Granularity level is the number of dimensions or depth you can slice by (thinks sales by quarter or sales by day)
Auditing: Who, What, When
Lineage: Where is the data coming from?
Dimensions: The goal is to make it look good in a pivot chart
-descretizing: putting values into bins and not keeping too much granularity because it doesn’t graph well
-Member Properties: columns not used for pivoting
Slowly changing: type 1- no history, overwrite; type 2 – keep history with current flag or validto-validfrom cols; type3 – limited history with additional cols like prevAddr
Keep business keys intact, create additional DW specific keys (surrogate keys) for linking fact to dimensions, probably INDENTITY
Use a SEQUENCE if you need to know the number before inserting, request multiple at once, or need a multi-table key
FACT TABLES: made up of FKs, Measures, Lineage cols, Business keys
consider the additivity of measures. EG: can’t sum an AvgDiscCol
Fact tables should be on the Many side of the 1->many relationship
Dimensions contain the lineage data
Age is a common computed column
design dimensions first, then fact tables
use partitioning on your fact table
Fact tables contain measures
Every table should have a clustered index
Do not index FKs of fact table because HASH joins dont need it?
If you are doing merge joins and nested loop joins indexes on FKs help
indexed views are useful in some cases
Row/page compression automatically applies unicode compression
batch mode is faster and will show in the query plan
column store indexes: one per table, not filtered, not on indexed views
Partitioning function maps rows to a partition
partitioning scheme maps partition to filegroups
aligned index: table with same schema which allows for partition switching
optimizer can eliminate partitions
inferred member: row added in dimension during fact table load
PART II: Developing SSIS Packages
To be continued…