More Preparation for 70-463 Implementing a Data Warehouse with Microsoft SQL Server 2012

08 Apr

This is a continuation of my previous post. These are just some very quick notes that I am posting for my benefit and so that readers may get an idea of the preparation necessary for this test. They are my notes from this book: https://www.microsoftpressstore.com/store/training-kit-exam-70-463-implementing-a-data-warehouse-9780735666092

PART II: Developing SSIS Packages

simple data movement – can use import export wizard
complex data movement – SSDT
SSDT is a Visual Studio shell used to develop IS, AS, and RS projects
Control Flow connection managers can be package or project scoped
Connection manager types:
ADO – backwards compatibility
ADO.net – compatible with sql server
AS – analysis services
File – SSIS data type
Flat file – delimited file
ftp – security option is only basic auth
http – web services or file, no windows auth
OLE DB – sql server, will be removed in favor of ODBC
ODBC – open database connectivity
SMTP – basic email auth only

package scoped connection managers will override the higher level project scoped connmgrs

control flow tasks and containers
containers help control execution of tasks
transformations include
cleansing – remove invalid data or unwanted data
normalization – XML value to varchar
conversion – byte[] to varbinary(max)
translation – “F” to “Female”
validation
data calculation and data aggregation
data pivoting and data unpivoting

ssis task categories: data prep, workflow, data movement, SQL admin, SQL maintenance

containers: for loop, foreach loop, sequence

Precedence Constraints (the arrows that come off of tasks)

success, failure, completion
dotted lines mean OR and solid lines mean AND logic, used when multiple precedence constraints lead into one task

Designing and Implementing Data Flow

Data Flow is a level deeper than the control flow
Control flow triggers data flow
data flow task builds execution plan from data flow definition
data flow engine executes the plan
*Validate external metadata – checks for existence of tables and objects; should be turned off if they are dynamically created
bulk OLE DB = fast load
ODBC = batch
fast parse is available at the column level on some data types ( date, time, int )
Working with data flow transformations
-Blocking (ex: sort, aggregate) transformations that read all data in before passing any rows down the pipeline
-Non-Blocking -> lookup, multicast, conditional split or other row-by-row transformations
-partial-blocking -> merge, merge join, union all, data flows in chunks
cache transformations – good for multiple transformations on same data
import/export col – good for blobs
character map – upper case, lower, linguistic bit operations
advanced data prep: dqs cleansing, oledb command, slowly changing dimension, fuzzy grouping, fuzzy lookup, script component(custom.net)
#NEW# Resolve references editor helps resolve mapping problems
Lesson 3: strategy and tools
lookup transformation caching
how to handle rows w/ no matches
sort is expensive, optionally perform sorts at source and use advanced editor to mark data as sorted
avoid update and delete on fact tables
do large table joins on database layer
do updates on loading or temp tables in set based sql operations
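The last point can be sketched in T-SQL; the table names here (stg.Customer, dbo.DimCustomer) are hypothetical:

```sql
-- Set-based Type 1 update from a staging table, instead of a
-- row-by-row OLE DB Command transformation in the data flow.
UPDATE d
SET    d.FullName = s.FullName
FROM   dbo.DimCustomer AS d
JOIN   stg.Customer    AS s ON s.CustomerKey = d.CustomerKey
WHERE  d.FullName <> s.FullName;
```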
Chapter 6: Enhancing Control Flow
ssis variables and parameters
avoid retrieving external source variables more than once
parameters are exposed to the caller but variables are not
parameters are read-only and can only be set by the caller
variables are helpful for reusability
variables are user defined or system
variables can store rowsets for Foreach Loop enumerators
-avoid storing large rowsets in memory/variables
variable data types
-object: last resort
Int16: -32,768 thru 32,767
UInt16: 0 thru 65,535
UInt32: 0 thru 4,294,967,295
DateTime
String
Char: 65,535 (Unicode)
Decimal: 28 or 29 significant digits
Single
Double
Variable Scope
-Package Scope
—-Container Scoped
——–task scoped
property parameterization
explicit assignment
expressions
lesson 2: connmgr, tasks, and precedence constraint expressions
expression: combination of constants, variables, parameters, column refs, functions, and expression operators
-special ssis syntax close to C++
math functions: ABS, EXP, CEILING, etc…
String Functions: FINDSTRING, HEX, LEN, LEFT, REPLACE
precedence constraints can use AND/OR logic expressions
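A sample expression in that C-like SSIS syntax, building a file name from the current year (the pattern itself is just an illustration):

```
"Export_" + (DT_WSTR, 4)YEAR(GETDATE()) + ".csv"
```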
Lesson 3: Master Package
just a normal package that uses the execute package task
use variables to expose results to parent
use project deployment model to make parameters available to child packages
use project scoped parameters
CHAP7: Enhancing Data Flow
Lesson 1: Slowly Changing Dimensions
-late arriving dims or early arriving facts
–1. insert row into dim, mark inferred… requires bit col
–2. use newly created surrogate key
–3. when loading dim overwrite inferred members
TYPE 1 SCD: overwrite
TYPE 2 SCD: keep all history
can use conditional split to see what columns changed
ex: source.fullname dest.fullname
using t-sql hashbytes can compare for changes
–then two cols for hash val Type1 & type2
use set based updates instead of wizard
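A sketch of the HASHBYTES comparison; the tables and the choice of columns hashed are assumptions for illustration:

```sql
-- Detect changed rows by comparing hashes of the tracked attributes,
-- instead of comparing every column pair in a conditional split.
SELECT s.CustomerKey
FROM   stg.Customer    AS s
JOIN   dbo.DimCustomer AS d ON d.CustomerKey = s.CustomerKey
WHERE  HASHBYTES('SHA2_256', CONCAT(s.FullName, '|', s.City))
    <> HASHBYTES('SHA2_256', CONCAT(d.FullName, '|', d.City));
```

In practice the two stored hash columns (one for Type 1 attributes, one for Type 2) let one comparison decide between an in-place update and a new history row.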
Lesson 2: preparing a package for incremental load
dynamic sql
change data capture
Dynamic SQL in OLEDB source
1. select data access mode of SQL command and use ? to pass a parameter
2. pass variable to sql command and use expressions to modify the sql string
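For option 1, the SQL command text might look like this; the ? placeholder is mapped to an SSIS variable (e.g. User::LastLoadDate, a name assumed here) on the source's Parameters page:

```sql
SELECT CustomerKey, FullName, ModifiedDate
FROM   dbo.Customer
WHERE  ModifiedDate > ?;
```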
cdc functionality – cdc source and cdc splitter
-ALL, ALL w/old, net, netw/update mask, net w/merge
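Outside SSIS, the same change data is reachable with the CDC table-valued functions; this sketch assumes CDC is already enabled on dbo.Customer with capture instance dbo_Customer:

```sql
DECLARE @from binary(10) = sys.fn_cdc_get_min_lsn('dbo_Customer'),
        @to   binary(10) = sys.fn_cdc_get_max_lsn();

-- N'all' corresponds to the plain ALL processing mode of the CDC source
SELECT *
FROM   cdc.fn_cdc_get_all_changes_dbo_Customer(@from, @to, N'all');
```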
lesson3: error flows
route bad rows – fail, ignore (copies null), redirect rows
chapter 8: creating robust and restartable packages
can set transactions at package control flow or task level
transactions use msdtc
transaction options are: required, supported, not supported
transactions work on control flow not data flow
can nest a not supported execsql that won’t rollback inside a transaction (ex: still want to audit on fail)
lesson2: checkpoints
save checkpoints must be turned on at the package level
creates a file and restarts from it if it exists
starts from the beginning if it does not exist
lesson3: event handlers
onerror
onwarning
onvariablechanged
ontaskfailed
can turn event handlers off for task
chapter 9: implementing dynamic packages
project level and package level connection mgrs and parameters
must be deployed to ssis catalog
parameter design values are stored in the project file
cannot change parameter value while package is running
property expressions are evaluated on access
lesson2: package configs
enable package deployment model
can get parent package configs
chapter10: auditing and logging
logging: package configuration
auditing: data flow transformation component
lesson1: logging packages
providers are: txt file, sql profiler, sql server, event log, xml
boundary, progress, exception
use parent setting is default
ssis control flows can be configured for logging
lesson2: auditing and lineage
elementary auditing – captures changes
complete – adds usage or read activity
audit transformation editor
lesson3: preparing package templates
keep packages in source control

Part IV: managing and maintaining ssis packages
ssis service is required in production
ssisdb is new
package install utility is legacy
can use ssdt or ssms to deploy packages
project model or package model
dtexecui is legacy
can use TSQL, powershell, manual dtexec cli to execute packages
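The T-SQL route uses the SSISDB catalog procedures; folder, project, and package names below are placeholders:

```sql
DECLARE @exec_id bigint;

EXEC catalog.create_execution
     @folder_name  = N'ETL',
     @project_name = N'DW',
     @package_name = N'Master.dtsx',
     @execution_id = @exec_id OUTPUT;

EXEC catalog.start_execution @execution_id = @exec_id;
```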
agent to schedule packages
introduced master package concept
securing packages: uses sql security concepts of principals and securables
ssis_admin role
ssis_user by default allowed to deploy, and deployer is allowed to read, modify, execute
Chapter 13: troubleshooting and perf tuning
breakpoints work only in control flow
breakpoints can fire on a hit count
data viewers on path will show grid view of data
use error outputs to catch bad rows
test with a subset of data
basic logging is default
switch to verbose when there are problems
data taps are like data viewers for production
must be predefined using catalog.add_data_tap for specific data flow
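A sketch of adding a data tap between create_execution and start_execution; the task path and path ID string are examples and must match the actual package:

```sql
DECLARE @exec_id bigint, @tap_id bigint;

EXEC catalog.create_execution
     @folder_name  = N'ETL',
     @project_name = N'DW',
     @package_name = N'LoadDim.dtsx',
     @execution_id = @exec_id OUTPUT;

-- Tap the output of the OLE DB source in the "Load DimCustomer" data flow
EXEC catalog.add_data_tap
     @execution_id            = @exec_id,
     @task_package_path       = N'\Package\Load DimCustomer',
     @dataflow_path_id_string = N'Paths[OLE DB Source.OLE DB Source Output]',
     @data_filename           = N'DimCustomer.csv',
     @data_tap_id             = @tap_id OUTPUT;

EXEC catalog.start_execution @execution_id = @exec_id;
```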
lesson2: perf tuning
buffers are a group of data in data flow
determined automatically
Transformation Types
-non-blocking: row based synchronous
-partial blocking: asynchronous transformation
-blocking: asynchronous
backpressure controls flow for best memory control
max buffer rows – 10,000 default
max buffer size – 10MB by default
fast load on destination
full-cache lookups
avoid oledb transformations
BLOBs get swapped to disk
data flow engine threads
max concurrent executables: -1 (default) = # of logical processors + 2
perfmon counter: buffers spooled

PART V: Building Data Quality Solutions

chapter14: installing and maintaining DQS
Soft dimensions: timeliness, ease of use, intention, trust, presentation quality
hard dimensions: accuracy, consistency
Schema dimensions: completeness, correctness, documentation, compliance w/theoretical models, minimization
activites: understand sources and destinations
lifecycle
security and backups managed through ssms
Chapter15: implementing MDS
metadata, transactional, hierarchical, semi-structured, unstructured, master
MDM goals: unifying or harmonizing, maximize ROI through reuse, support compliance, improving quality
MDM: coordinated set of tools and policies to maintain accurate master data
map master data dimensions to DW
Installing MDS: DB, Service(Needs IIS), Manager, Excel Add-IN
Creating MDS model
1.Model
2.Entities(like tables)
3.Attributes(like columns)
4.Hierarchies
5.collections
Derived hierarchies: Recursive with TOP = NULL (ex: Org Chart)
Explicit Hierarchies – Organization can go any way
Collection: flat list of members
MDS service performs business logic
Chapter16: managing master data
MDS Packages
-Model deployment package to move data to another server
-wizard only includes meta data
-permissions are not included
-MDSModelDeploy command prompt if you want to move data
exporting – tsql on subscription views, webservice
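Subscription views are plain views in the MDS database, so the T-SQL export path is just a SELECT; the view name here is hypothetical:

```sql
SELECT Code, Name
FROM   mdm.[vCustomerMaster];   -- MDS-generated subscription view
```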
Security, system admin (one user, tsql to change), model admin (complete model access)
entity permissions apply to all attributes
mds add-in for excel (connect to http://server:8080)
when model and member permissions overlap, read-only wins over update and deny wins over everything
excel add-in can use DQS KB matching
Chapter17: creating a data quality project to clean data
knowledge discovery
domain management
reference data services
matching policy
domain: semantic representation of column
properties: data type, leading values, normalize, format, spellchecking
Term-based relation: Inc. -> Incorporated

I skipped chapters 18, 19, and 20 (advanced SSIS and data quality topics) because only 5 parts are listed on the exam prep and I ran low on time.

Posted by on April 8, 2015 in SQL Admin, SQL Dev