Problem: I have two tables, my source table is in Access and my destination table is in SQL Server. I need to add only the records from Access that don't already exists in my SQL server destination table.
Complaining: (You can skip this part if you want, its my blog so I figured I can grumble a little about the effort it took to figure this out). Initially I tried building a temp dataset and seeing which transformations allowed two input sources. The only ones were the merge tools. So I backed up a minute and did some reading. I found one method that described doing it with a script. I thought that surely there was a strictly SSIS "way" to do it without getting scripting involved so I kept digging. I found another method that used a Merge Join transformation. I figured that this was perfect! I mean, we all know that doing this in straight TSQL would be a very simple thing. So I gave it a shot. Alas *frown*, the Merge Join only allows Left Outer, Full Outer and inner joins. So my data source rows were getting added EVERY time instead of only when they didn't exists. Sooooo, back to research. Finally I found the method I used here, and believe me, the whole time I was constructing the work flow I was questioning how the heck it made any sense. Well, it works, and yes, its freaking odd. *shrug*
Now the part youre waiting for (are you excited? You should be because this is going to work!).
Step 1.
Create (drag) a new OLE DB Source onto the design pane and get it wired to the data source that you want to insert FROM. In my case its the Access data source.
You have the option to use a table, view or a SQL Command. In my case I chose to use the SQL Command option and get only the columns that I'm concerned with in order to maximize performance.
Step 2.
In my case, I wanted to ensure that the Lookup Transformation later had a value of the correct type and length to use for comparison with my source table. So I added this Data Conversion Transformation to handle 'fixing' the incoming value.
Step 3.
The Lookup Transformation requires its data source to be sorted. This was easily enough implemented.
So here is my design pane so far:
Step 4.
Here is the part that had me scratching my head and re-reading the posts of those who have suggested using it. The Lookup Data Flow Transformation is described as "Looks up values in a data set by using exact matching". Hmm... OK Well that is certainly what I want to do, look up values in a data set. But how to we get from "look up values in a data set" to "look up values in a data set and then add only ones that aren't there"? It turns out that the Lookup Transformation does "look up values in a data set" and send the matching values down the green leg and the non matches down the red leg. (I know! We are using an 'error' condition to do some desired work for us *shakes head.)
I added the Lookup Transformation to my design pane and linked it to my Sort Transformation:
And here is how it is configured - the Reference Table is the 'destination' table.
The Lookup Transformation is 'aware' of the incoming fields from the source, so we link the important one to the important one in our destination.
Then here is the trick, we make it redirect our error rows (the rows that DONT match) to the error output.
So what is happening? The Lookup Transformation attempts to match each row in the source table with each row in the destination, each one that doesn't match gets redirected out to the error leg. Which is exactly what we want, it just makes no sense that we get it this way.
Step 5.
Plug the error leg of the Lookup Transformation to our data destination and were done!
There you go!