Showing posts with label SSIS. Show all posts
Showing posts with label SSIS. Show all posts

Saturday, January 14, 2012

By Pass "Excel Named Range" in SSIS When Importing Data From Excel File

Sometimes when you want to import data from excel file using SSIS, and when you want to edit the Excel Source SSIS data item to choose the sheet that contains the data, you will be shocked with a big list of sheets names that they are not even in the excel file you want to import data from, something like:

DATA1, DATA2, DATA3 ... AND TEST1, TEST2, TEST3 ... and other sheet names


if the name of the sheet that contains the data is always the same, you can find the name of that sheet in the list and that's all, but if the name of the sheet that contains the data changes from time to time and from file to file , for example the name of the sheet is the date when the data was inserted in the sheet, you have to use a script task to get the sheet with data ( known as the TABLE_NAME) and by pass the Excel sheet names , the code is:
        Try            
            Dim strConnectionString As String
            Dim oledbExcelConnection As OleDbConnection
            Dim ExcelDataTable As DataTable
            Dim SheetCount As Integer = 0
            Dim ExcelSheet As DataRow
            Dim CurrentSheet As String
            Dim ListOfExcelSheets As String()
            Dim LoopForNumberOfRealTables As Integer = 0

            Dim flagSourceMultiSheet As Boolean
            flagSourceMultiSheet = Dts.Variables("uVar_SourceMultiSheet").Value

            If flagSourceMultiSheet = True Then
                strConnectionString = Dts.Variables("uVar_SourceConnectionString").Value.ToString
                oledbExcelConnection = New OleDbConnection(strConnectionString)
                oledbExcelConnection.Open()

                ExcelDataTable = oledbExcelConnection.GetSchema("Tables")
                SheetCount = ExcelDataTable.Rows.Count

                For Each ExcelSheet In ExcelDataTable.Rows
                    CurrentSheet = ExcelSheet.Item("TABLE_NAME").ToString
                    CurrentSheet = CurrentSheet.Replace("'", "")

                    If Right(CurrentSheet, 1) = "$" Then
                        LoopForNumberOfRealTables += 1
                        ReDim Preserve ListOfExcelSheets(LoopForNumberOfRealTables - 1)
                        ListOfExcelSheets(LoopForNumberOfRealTables - 1) = CurrentSheet
                    End If
                Next
                oledbExcelConnection.Close()
            Else
                ReDim Preserve ListOfExcelSheets(0)
                ListOfExcelSheets(0) = Dts.Variables("uVar_SourceActiveSheetName").Value.ToString
            End If

            Dts.Variables("uVar_ArrayOfExcelSheetNames").Value = ListOfExcelSheets
            Dts.TaskResult = ScriptResults.Success

        Catch ex As Exception
            Dim msgException As String
            msgException = ex.Message.ToString
        End Try


by the way, this list is called "Excel Named Ranged".
Also you can check the article titled Looping Through Excel Files And Sheets in SSIS, it contains a good detailed case and step by step example.

Sunday, November 20, 2011

Loading big and slowly changing dimensions and fact tables using SSIS


In the case of small and not slowly changed dimensions we can truncate the table entirely and refill it again each time we want to update it, but this way is not a good practice with big and slowly changing dimensions or fact tables, as it will consume a lot of resources and time to be accomplished; so we have to load just those records that have been inserted or modified, and to do so we have to create two more columns: the first one tracks when the record was created or last time the record was modified, the second column contains when the last time the record was fetched, but this approach will add overhead on the system that creates the transaction as it should record the creation or modification datetime for each record, which in turn requires modifying the front-end systems that create the records or modify them.
If you source database is SQL server 2008, a new feature called "change data capture" uses the SQL transaction log to capture data modifications then write those modifications to a data capture tables. A change table is created for each table that is being tracked by change data capture and this table is named as cdc.schemaname_tablename_CT.
To implement this you have to do the following:
  1. Enable Change Data Capture for the whole databbase, by executing the following command:
    USE theDatabase EXEC sys.sp_cdc_enable_db

    where "theDatabase" is the name of the database that contains the tables you want to track.
  2. Enable Change Data Capture for each table you want to track; so run the following script for each table:
    EXECUTE sys.sp_cdc_enable_table@source_schema = N'dbo', @source_name = N'tableX', @role_name = N'cdc_Admin', @supports_net_changes = 1, @captured_column_list = N'column1, column2, column4'

    where the:
    -@source_name is the name of the tracked table.
    -@role_name is the database role that will be used to establish database rights for the change data capture process, this database role will be created if it does not already exist in the database.
    -@supports_net_changes when set to 1 the change data capture for this particular table will support the net changes feature, in other words the record maybe modified more than once after that last load setting this parameter to 1 will provides a function that returns the last status (net changes) for that record over a given time period.
    -@captured_column_list is the list of column you want to capture changes for. you can remove this parameter if you want to capture modifications for all the columns. note that for each tracked columns there is a mirror column in change table.
  3. In order to be able to query the change table for a given period, we have to create wrappers for the all changes query functions. the script that are required to create this function is the following:
    CREATE TABLE #WrapperSource (FunctionName sysname, SourceCode nvarchar(max)) 

       INSERT INTO #WrapperSource
       EXEC sys.sp_cdc_generate_wrapper_function
       DECLARE @CurrFunctionName sysname
       DECLARE @SourceCode nvarchar(max)
       SELECT @CurrFunctionName = MIN(FunctionName)
       FROM #WrapperSource
       WHILE @CurrFunctionName IS NOT NULL
       BEGIN
         SELECT @SourceCode = SourceCode
         FROM #WrapperSource
         WHERE FunctionName = @CurrFunctionName
         EXEC sp_executesql @SourceCode
         SELECT @CurrFunctionName = MIN(FunctionName)
         FROM #WrapperSource
         WHERE FunctionName > @CurrFunctionName
      END
      DROP TABLE #WrapperSource

    and the following is example of how to use it:
     SELECT * FROM dbo.fn_all_changes_dbo_tableName('11/11/2011', '12/11/2011', 'all')
  4. Create a table to save the last time the data was loaded.. Name it "lastTimeDataLoaded".
  5. Ceate new SSIS and create the following variables:
                    -startDate as datetime
                    -endDate as datetime
                    -dataLoadQry as sting, set the EvaluateAsExpression property of this variable to  true and set the Expression property to:
    "SELECT column1, column2, column4, __CDC_OPERATION AS Operation
    FROM dbo.fn_all_changes_dbo_Orders('" + @[User::startDate]+ "', '" + @[User::endDate] + "', 'all')
    ORDER BY __CDC_STARTLSN, __CDC_SEQVAL"
  6. Add Execute SQL task to the package then insert a sql script that queries the "lastTimeDataLoaded" table and set startDate and endDate variable. the query is something like:
        SELECT MAx(lastLoadDate)  AS startDate, ATEADD(d,1,lastLoadDate) AS endDate From lastTimeDataLoaded
        assuming the package runs once every day at the same time.
    In the result set tab of the Execute SQL Task Editor map the result columns to the startDate and endDate variables.
  7. Add a data flow task. connect the Execute SQL task to the newly add data flow task. Inside the data flow task there is the following:- data flow source that gets data by executing dataLoadQry, which is connected to:                         - conditional split transformation that splits the return data to three data flow destinations: one for inserting new records, another for updating existing record and the last one for deleting records. The splitting done based on operation column return by the query, so if it's "I" this mean insert, if it's "U" this is update, if it's "D" this is for deletion.
  8. Finally go back to the Control flow tab and add another Execute SQL task. This task should contain a SQL update statement to update the lastTimeDataLoaded.
Reference: Delivering Business Intelligence With Microsoft SQL Server 2008 by Brain Larson.

Sunday, November 13, 2011

Continuously watch files with WMI Event Watcher Task by setting “Watch for the event again” as value for AfterEvent property.


This post is to show how to develop SSIS package that keeps watching a folder and sends email if somebody pasted a file in that specific folder... and this is to illustrate how to use “Watch for the event again” value for AfterEvent property of  WMI Event Watcher Task.

1.  Create a folder to be watched. let’s say “C:\ImWatchingYou”.
2.  Open your SQL Server Business Intelligence Development Studio and create a new Integration Services Project.
3.  Right click the Connection Manager and choose “New Connection…”, a popup window will appear titled “Add SSIS Connection Manager”, scroll to and choose the item called “WMI” connection manager for the WMI tasks … probably the last item in the list. Another popup window will appear, check the “Use Windows Authentication” check box and leave the rest as it is (unless you want to watch another computer).

4.  Again right click the Connection Manager, but this time choose “New Flat File Connection…”. This will open the Flat File connection Manager Editor. In the Connection Manager Name write “Connection to The Pasted File”. In the File name insert any dummy file path for example “C:\ImWatchingYou\file.txt” .  in the same editor go to the Advanced tab and click new to add a dummy column to enable the OK button  (as you can see it’s now disabled) … click OK.
5.  Create a new variable, name it pastedFile, of type string, and the value is “C:\ImWatchingYou\*.*” 

6.  Right click the “Connection to The Pasted File” (in Connection Managers tab) and select properties. Open the Expression, choose ConnectionString Property, then insert @[User::PastedFile] as Expression.
7.  While in the Control Flow tab, drag the “WMI Event Watcher Task” item from the Control Flow Items in the Toolbox to the designer surface.

    Double click the task to open the WMI Event Watcher Task Editor. Go to the WMI Options tab and set the following:
a.       WmiConnection : WMI Connection Manager.
b.      WqlQuerySource :
SELECT * FROM __InstanceCreationEvent WITHIN 10 WHERE TargetInstance ISA "CIM_DirectoryContainsFile" and TargetInstance.GroupComponent= "Win32_Directory.Name=\"c:\\\\ ImWatchingYou \""
c.       AfterEvent : Watchfor the event again
d.      AfterTimeout : Return with success.

befor you hit the OK button, go to the Expressions tab and add a property expression for the Timeout property. Insert the following in the Expression field after selecting the Timeout property:
DATEDIFF( "ss", GETDATE(), DATEADD("Mi",55,DATEADD("Hh",23,(DT_DBDATE)GETDATE())) )
this will compute the number of remaining seconds till 23:55 which is the time we want the package to stop watching.
8.  Create another connection manager and choose the connection manager type to be SMTP. In the SMTP Connection manager Editor set the information needed to access the SMTP server.

9.  Drag and drop “Send Mail Task” to designer surface, then go the Mail tab and fill the :
a.       SmtpConnection : choose the SMTP Connection Manager.
b.      From: any email .. for example me@work.com
c.       To: your email.
d.      Subject: “Task finished notification”.
e.      MessageSource: “Task finished”.
10.   Connect the WMI Event Watcher Task to Send Mail Task.

11.   In the Control Flow tab select WMI Event Watcher Task, then open the Event Handlers tab, after that choose WMIEventWatcherEventOccurred from the Even handler combo box.

12.   Again, do the same as you did in step 9, except in the Subject and MessageSource insert “Someone pasted a file in the folder”
Now you are done. you can test it, bu note that when running the package outside the BIDS you will not be able to stop it until the timeout occurred or you will be forced to open the task manager and end all the DTCexec.exe or DTCexecUI.exe there... that's the only way I know.
By the way, there is another way to continuously watch a folder, which is by using infinite for loop. For more details check the following link:
http://microsoft-ssis.blogspot.com/2010/12/continuously-watching-files-with-wmi_25.html

Good luck :)