When loading hub s and satellites the best option you have in SQL Server is to use merge for loading data vault raw. This is because it’s fail gracefully when it loads an existing hub id leaving out those in staging already loaded. It’ll easily insert new satellite rows for not matched records when you’re hash diff is not matching.

Here is a little code snippet for a hub and a satellite. The example is to load a set of records which are associated with the Monash Model for health localities. You can also use a simpler insert for hub loads such is this

CREATE PROCEDURE [dbo].[Load_Hub_Monash_Model]
	@executionId uniqueidentifier = ''

AS
	DECLARE @RecordSource        nvarchar(100)
	DECLARE @DefaultValidFrom    datetime2(0)     --use datetime2(0) to remove milliseconds
	DECLARE @DefaultValidTo      datetime2(0)  
	DECLARE @LoadDateTime        datetime2(0)  
 
	SET @RecordSource            = N'CDR_Artius_Staging MDS Stg_Monash_Model'
	SET @DefaultValidFrom        = '1900-01-01'
	SET @DefaultValidTo          = '9999-12-31'
	SET @LoadDateTime            = GETDATE()
	
	-- Find existing Hub Hashkeys
	--; WITH GetHubkeys
	--AS
	--(SELECT H_Monash_Model_key, H_MM_Locality_Key FROM H_MonashModel

	--)
	-- Insert new Hub Records
	INSERT INTO dbo.H_MonashModel 
	(H_Monash_Model_key, H_MM_Locality_Key, H_MM_Source, H_MM_Load_Date) 
	SELECT 
	 	dbo.Create_DV_Hash( CONCAT(  SMM.Town_Locality, SMM.Postcode)), CONCAT(SMM.Town_Locality, SMM.Postcode),  @RecordSource, @LoadDateTime
	FROM [$(DatavaultDB1)].Staging.Stg_Monash_Model SMM
	WHERE  NOT EXISTS ( SELECT H_Monash_Model_Key FROM H_MonashModel HMM WHERE CONCAT(  SMM.Town_Locality, SMM. Postcode) = HMM.H_MM_Locality_Key  );



RETURN 0

The table for this would like a but like this

CREATE TABLE [dbo].[H_MonashModel]
(
	[H_Monash_Model_key] VARBINARY(128) NOT NULL, 
    [H_MM_Locality_Key] VARCHAR(85) NOT NULL, 
    [H_MM_Source] VARCHAR(150) NULL, 
    [H_MM_Load_Date] DATETIME2 NULL, 
    CONSTRAINT [PK_H_MonashModel] PRIMARY KEY ([H_MM_Locality_Key]) 

The code to load the satellite would look a bit like this. This code is using DV version 1 style with an end date for a record. Don’t do that, it has performance issues with updating records when you have many millions of rows

CREATE PROCEDURE [dbo].[Load_Sat_Monash_Model]
	@executionId uniqueidentifier = ''
AS
	DECLARE @RecordSource        nvarchar(100)
	DECLARE @DefaultValidFrom    datetime2(0)     --use datetime2(0) to remove milliseconds
	DECLARE @DefaultValidTo      datetime2(0)  
	DECLARE @LoadDateTime        datetime2(0)  
 
	SET @RecordSource            = N'CDR_Artius_Staging MDS Stg_Monash_Model'
	SET @DefaultValidFrom        = '1900-01-01'
	SET @DefaultValidTo          = '9999-12-31'
	SET @LoadDateTime            = GETDATE()

	SET NOCOUNT ON
    SET XACT_ABORT ON

	BEGIN TRY
	BEGIN TRANSACTION
	--Insert new current records for changed records

	INSERT INTO S_MonashModel
	( S_Monash_Model_Key, S_MM_Source, S_MM_Load_Date, S_MM_Town_Locality, S_MM_State_Territory,S_MM_Postcode, S_MM_Monash_Code, S_Hash_Diff)
	SELECT 
		H_Monash_Model_key,
		RecSource,
		LoadDate,
		Town_Locality,
		State_Territory,
		Postcode,
		MM_Code,
		Sat_Hash_Diff
	FROM -- Main
	(
		MERGE S_MonashModel AS Target 
		USING
		(SELECT 
		-- query distinct values from source (staging)
		hub.H_Monash_Model_key,               -- Hash Key
		@RecordSource AS RecSource,            -- Records Source
		@LoadDateTime AS LoadDate,            -- Load Date time Stamp
		stage.Town_Locality,
		stage.State_Territory,
		stage.Postcode,
		stage.MM_Code,
		dbo.Create_DV_Hash(CONCAT(Town_Locality, State_Territory, Postcode, MM_Code)) Sat_Hash_Diff
		FROM [$(DatavaultDB1)].[Staging].[Stg_Monash_Model] stage
		left outer join dbo.H_MonashModel hub on hub.H_Monash_Model_key = dbo.Create_DV_Hash(CONCAT(stage.Postcode, stage.Town_Locality))
		--where hub.H_Monash_Model_Key is not null
		) AS Source 
		ON target.S_Monash_Model_Key = dbo.Create_DV_Hash(CONCAT(Source.Postcode, Source.Town_Locality))
		AND target.S_End_Rec_Date IS NULL 
		
		-- End Date existing Record
		WHEN MATCHED AND
		(target.S_MM_Town_Locality = Source.Town_Locality
		 AND target.S_MM_Postcode = Source.Postcode
		)
		THEN UPDATE SET
			S_End_Rec_Date = @LoadDateTime
		WHEN NOT MATCHED BY TARGET
		THEN INSERT
		(
			S_Monash_Model_Key, S_MM_Source, S_MM_Load_Date, S_MM_Town_Locality, S_MM_State_Territory,S_MM_Postcode, S_MM_Monash_Code, S_End_Rec_Date
		)
		VALUES
		(
			dbo.Create_DV_Hash(CONCAT(Source.Postcode, Source.Town_Locality)),
			@RecordSource,
			@LoadDateTime,
			Town_Locality,
			State_Territory,
			Postcode,
			MM_Code,
			NULL  -- We add a date when we expire the record
		)

		-- Output Changed Records
		OUTPUT

          $action AS Action
		  
          ,Source.*
	) as MergeOutput
	WHERE MergeOutput.ACTION = 'UPDATE'

	COMMIT TRAN
		SELECT
			'Success' AS ExecutionResult
		RETURN;
-- End Main
	END TRY 

	BEGIN CATCH
 
     IF @@TRANCOUNT > 0
     ROLLBACK
 
     SELECT
          'Failure' as ExecutionResult,
          ERROR_MESSAGE() AS ErrorMessage;
     RETURN;
END CATCH

GO

Some points, this is a slightly different way of using merge, the insert merge

First we set up what looks like a fairly normal insert and then drive that from a subquery in the from which is a merge statement. Don’t for get you can use CTE and subqueries in the merge to help with creating the data set you want to load

This would be loading to a table similar to this

CREATE TABLE [dbo].[S_MonashModel]
(
    [S_Monash_Model_Key] VARBINARY(128) NOT NULL , 
    [S_MM_Source] VARCHAR(150) NULL, 
    [S_MM_Load_Date] DATETIME2 NULL, 
    [S_MM_Town_Locality] VARCHAR(75) NULL, 
    [S_MM_State_Territory] VARCHAR(40) NULL, 
    [S_MM_Postcode] CHAR(5) NULL, 
    [S_MM_Monash_Code] TINYINT NULL, 
 
    [S_Hash_Diff] VARBINARY(128) NULL, 
    CONSTRAINT [PK_S_MonashModel] PRIMARY KEY ([S_Monash_Model_Key])
)

Leave a Reply

Your email address will not be published. Required fields are marked *

Share via
Copy link
Powered by Social Snap