Greedy Test Case Algorithm in a SQL Stored Proc

Here’s a straightforward problem: I have a table with a lot of fields in it (in this case, several tables — new Fact and Dimension tables in a star schema data warehouse, but, you know, any wide table will do).

I want to extract a few real world test records that exercise the entire table… a “covering set” of test cases… so if I have 100 columns, and record A has non-zero, non-null values in columns 1-50 and record B has good values in columns 51-100, then I only need to test those two records. How great is that?!

Ok, I should probably BUILD test cases, but I like using real data since there are always unseen business rules lurking about. Anyway, this is a pretty basic math problem: Select the minimal number of objects from the set of rows where the union of the viable (non-null, non-zero) columns across the subset covers all possible columns.

There’s some code below.  Note that it is very bad code.  I use the wrong scope on global temporary tables, I don’t do lots of checking of things, I generate SQL and execute it, I debug with print statements.  It is also formatted poorly, but that’s actually more of a wordpress/plugin issue than anything else.

But it’s mine, and I love it…

To use, do something like:


exec etl.DQFindCompleteTestCases 'Facts', 'MyFactTable', 'MyFactTablePrimaryKey'

Which will output records with the MyFactTablePrimaryKey ID and the number of new columns handled by that corresponding record.

Note that the procedure will not go into an infinite loop if there is a column that has all NULLs or all zeros or something.  Found out I needed that the hard way.  Oops!

ALTER PROCEDURE [ETL].[DQFindCompleteTestCases](@SchemaName [nvarchar](128),
@TableName [nvarchar](128),
@IDFieldName [nvarchar](128))
/*
Created 2017/02/22 by Chip Lynch

Inputs: @DatabaseName.@SchemaName.@TableName identifies a table for which
all numeric type columns will be searched for non-zero non-null values.
A short (not provably minimal) list of records, identified by @IDFieldName,
will be returned so that the collection of those records contains non-zero or non-null
values in ALL of the available columns.

ToDo: Add more robust schedule to rules when runtimes become onerous.
*/
AS
BEGIN

SET NOCOUNT ON

/* Validate inputs */

IF (@SchemaName IS NULL OR
@TableName IS NULL OR
@IDFieldName IS NULL)
BEGIN
RAISERROR ('Incorrect parameters.', 15, 1)
END
/* Create tables and indexes as needed */

create table #TempGoodIDs (
CandidateID varchar(4000)
);

select
sac.name as column_name, sac.system_type_id, sac.user_type_id
into #TempMyColumns
from sys.all_columns sac
join sys.all_objects sao on
sac.object_id = sao.object_id
join sys.schemas ss on
sao.schema_id = ss.schema_id
join sys.types st on
sac.system_type_id = st.system_type_id
where
ss.name = @SchemaName and
sao.name = @TableName and
st.name in ('tinyint', 'smallint', 'int', 'real', 'money', 'float', 'decimal', 'numeric', 'smallmoney', 'bigint')
and st.name != @IDFieldName -- Just in case
;

DECLARE @ColumnsRemaining int;
select @ColumnsRemaining = count(1) from #TempMyColumns;

DECLARE @MySQL nvarchar(max);
DECLARE @MyNotNullList varchar(max);
DECLARE @MyNEZeroList varchar(max);
DECLARE @MySumList varchar(max);
DECLARE @MyGoodColumnList varchar(max);
DECLARE @MyCandidateColumn varchar(128);
DECLARE @MyGoodCount int;
DECLARE @MyRecordCount int;

WHILE(@ColumnsRemaining > 0)
BEGIN
-- Build SQL to select the record with the most <> 0 fields from the list:

--select @MyNotNullList = stuff( (
-- select ' IS NOT NULL OR ' + tmc.column_name
-- from #TempMyColumns tmc
-- for XML PATH('')
-- ), 1, 17, '') + ' IS NOT NULL'

select @MyNEZeroList = stuff( (
select ' <> 0 OR ' + tmc.column_name
from #TempMyColumns tmc
for XML PATH(''), type
).value('.','nvarchar(max)'), 1, 9, '') + ' <> 0'

print @MyNEZeroList

select @MySumList = stuff( (
select '+ CASE WHEN ' + tmc.column_name + ' <> 0 THEN 1 ELSE 0 END '
from #TempMyColumns tmc
for XML PATH(''), type
).value('.','nvarchar(max)'), 1, 1, '')

if object_id('tempdb..##TempBestID') IS NOT NULL
DROP TABLE ##TempBestID

set @MySQL = 'select top 1 ' + @IDFieldName + ' as MyId, ' + @MySumList + ' as MyGoodColCount ' +
' into ##TempBestID ' +
' from ' + @SchemaName + '.' + @TableName +
' where (' + @MyNEZeroList + ') ' +
--' AND ' + @MyNotNullList +
' order by 2 desc;'

print len(@MySQL)
print 'select top 1 ' + @IDFieldName + ' as MyId, ' + @MySumList + ' as MyGoodColCount '
print ' into ##TempBestID '
print ' from ' + @SchemaName + '.' + @TableName
print ' where (' + @MyNEZeroList + ') '
--' AND ' + @MyNotNullList +
print ' order by 2 desc;'

exec sp_executesql @MySQL

select * from ##TempBestID

SELECT @MyRecordCount = @@ROWCOUNT

if @MyRecordCount = 0
BEGIN
print 'No new useful records found.'
set @ColumnsRemaining = 0
END

-- insert into #TempGoodIDs select cast(MyId as varchar) from ##TempBestID

DECLARE MyColumnCursor CURSOR FOR
select tmc.column_name from #TempMyColumns tmc
OPEN MyColumnCursor

FETCH NEXT FROM MyColumnCursor into @MyCandidateColumn
WHILE (@@FETCH_STATUS = 0) AND (@ColumnsRemaining > 0)
BEGIN
-- print @MyCandidateColumn

if object_id('tempdb..##TempColumnFound') IS NOT NULL
DROP TABLE ##TempColumnFound

-- select * from tempdb.sys.tables

select @MySQL = 'select count(1) as MyCount into ##TempColumnFound from ' + @SchemaName + '.' + @TableName +
' where ' + @IDFieldName + ' = ' + cast(MyID as varchar) +
' and ' + @MyCandidateColumn + ' <> 0 ' +
' and ' + @MyCandidateColumn + ' IS NOT NULL ;'
from ##TempBestID; -- This will exist at runtime

exec sp_executesql @MySQL

select @MyGoodCount = MyCount from ##TempColumnFound;

if @MyGoodCount > 0
BEGIN
select @ColumnsRemaining = @ColumnsRemaining - @MyGoodCount; -- Which should be an if statement...
delete from #TempMyColumns where #TempMyColumns.column_name = @MyCandidateColumn
-- print 'Deleting ' + @MyCandidateColumn + ' - Columns Remaining = ' + cast(@ColumnsRemaining as varchar);
END

FETCH NEXT FROM MyColumnCursor into @MyCandidateColumn

END

CLOSE MyColumnCursor
DEALLOCATE MyColumnCursor
END

END

Leave a Reply

Your email address will not be published. Required fields are marked *