SSIS merge join can produce inconsistent results when joined on varchar columns.
This is primarily because of different collation settings. Because we have set the IsSorted as “TRUE” but it rows are not sorted as per SSIS, so it gives inconsistent results.
SSIS uses windows collation (Latin1_General_CS_AS_WS) where as SQL (Case-Insensitive) can use either of windows or SQL specific collation. In other words, the sort method using ORDER BY clause is different from using SORT transform with SSIS because SSIS uses case-sensitive where as SQL uses case-insensitive. To compare apples-to-apples, SSIS Sort and SQL sort be COLLATED to same.
So when a join is done on the columns with different collations the results can be quite unpredictable.
It mostly works and quite quirky when dealing with varchar columns,
Troubleshooting steps :
1) check issorted and sortkeyposition in the connection manager
2) check the datatypes, length and case sensitivity of the columns
3) RTRIM & LTRIM the source columns to avoid any additional space issues
4) if the sort is done in SQL, use collate statement in the query (isSorted=true need not be set in this case, however sorykeyposition need to be set for the column)
[cc lang=”sql”]
SELECT name, value from names
ORDER BY name
COLLATE Latin1_General_CS_AS_WS
[/cc]
5) if the sort is done in SSIS, (SET issorted = TRUE, sortkeyposition = 1),
set comparison flags = ignorecase, in the sort transformation editor, set the Comparison Flags property to Ignore case.
Historically when I had these problems, the best solution was avoid merge join, use “UNION ALL” transform, then use a conditional split to look for the NULL rows and then continue with the needed transforms.
Observations
1) Setting the IsSorted and SortKeyPosition properties is not sufficient as these properties dont set the sort, so be mindful of the data itself.
2) Cannot collate GUID, so cast them to nvarchar (SSIS with uniqueidentifiers is known to have problems)
Link : The link has connect bug from MS and isnt fixed yet.