{"id":935,"date":"2017-03-13T11:36:43","date_gmt":"2017-03-13T15:36:43","guid":{"rendered":"http:\/\/www.chiplynch.com\/wordpress\/?p=935"},"modified":"2017-03-13T11:41:35","modified_gmt":"2017-03-13T15:41:35","slug":"greedy-test-case-algorithm-in-a-sql-stored-proc","status":"publish","type":"post","link":"https:\/\/www.chiplynch.com\/wordpress\/?p=935","title":{"rendered":"Greedy Test Case Algorithm in a SQL Stored Proc"},"content":{"rendered":"<p>Here&#8217;s a straightforward problem: I have a table with a lot of fields in it (in this case, several tables &#8212; new Fact and Dimension tables in a star schema data warehouse, but, you know, any wide table will do).<\/p>\n<p>I want to extract a few real world test records that exercise the entire table&#8230; a &#8220;covering set&#8221; of test cases&#8230; so if I have 100 columns, and record A has non-zero, non-null values in columns 1-50 and record B has good values in columns 51-100, then I only need to test those two records. How great is that?!<\/p>\n<p>Ok, I should probably BUILD test cases, but I like using real data since there are always unseen business rules lurking about. Anyway, this is a pretty basic math problem: Select the minimal number of objects from the set of rows where the union of the viable (non-null, non-zero) columns across the subset covers all possible columns.<\/p>\n<p>There&#8217;s some code below. \u00c2\u00a0Note that it is very bad code. \u00c2\u00a0I use the wrong scope on global temporary tables, I don&#8217;t do lots of checking of things, I generate SQL and execute it, I debug with print statements. \u00c2\u00a0It is also formatted poorly, but that&#8217;s actually more of a wordpress\/plugin issue than anything else.<\/p>\n<p>But it&#8217;s mine, and I love it&#8230;<\/p>\n<p><!--more--><\/p>\n<p>To use, do something like:<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\n\r\nexec etl.DQFindCompleteTestCases 'Facts', 'MyFactTable', 'MyFactTablePrimaryKey'\r\n\r\n<\/pre>\n<p>Which will output records with the MyFactTablePrimaryKey ID and the number of new columns handled by that corresponding record.<\/p>\n<p>Note that the procedure will not go into an infinite loop if there is a column that has all NULLs or all zeros or something. \u00c2\u00a0Found out I needed that the hard way. \u00c2\u00a0Oops!<\/p>\n<pre class=\"brush: sql; title: ; notranslate\" title=\"\">\r\nALTER PROCEDURE &#x5B;ETL].&#x5B;DQFindCompleteTestCases](@SchemaName &#x5B;nvarchar](128),\r\n@TableName &#x5B;nvarchar](128),\r\n@IDFieldName &#x5B;nvarchar](128))\r\n\/*\r\nCreated 2017\/02\/22 by Chip Lynch\r\n\r\nInputs: @DatabaseName.@SchemaName.@TableName identifies a table for which\r\nall numeric type columns will be searched for non-zero non-null values.\r\nA short (not provably minimal) list of records, identified by @IDFieldName,\r\nwill be returned so that the collection of those records contains non-zero or non-null\r\nvalues in ALL of the available columns.\r\n\r\nToDo: Add more robust schedule to rules when runtimes become onerous.\r\n*\/\r\nAS\r\nBEGIN\r\n\r\nSET NOCOUNT ON\r\n\r\n\/* Validate inputs *\/\r\n\r\nIF (@SchemaName IS NULL OR\r\n@TableName IS NULL OR\r\n@IDFieldName IS NULL)\r\nBEGIN\r\nRAISERROR ('Incorrect parameters.', 15, 1)\r\nEND\r\n\/* Create tables and indexes as needed *\/\r\n\r\ncreate table #TempGoodIDs (\r\nCandidateID varchar(4000)\r\n);\r\n\r\nselect\r\nsac.name as column_name, sac.system_type_id, sac.user_type_id\r\ninto #TempMyColumns\r\nfrom sys.all_columns sac\r\njoin sys.all_objects sao on\r\nsac.object_id = sao.object_id\r\njoin sys.schemas ss on\r\nsao.schema_id = ss.schema_id\r\njoin sys.types st on\r\nsac.system_type_id = st.system_type_id\r\nwhere\r\nss.name = @SchemaName and\r\nsao.name = @TableName and\r\nst.name in ('tinyint', 'smallint', 'int', 'real', 'money', 'float', 'decimal', 'numeric', 'smallmoney', 'bigint')\r\nand st.name != @IDFieldName -- Just in case\r\n;\r\n\r\nDECLARE @ColumnsRemaining int;\r\nselect @ColumnsRemaining = count(1) from #TempMyColumns;\r\n\r\nDECLARE @MySQL nvarchar(max);\r\nDECLARE @MyNotNullList varchar(max);\r\nDECLARE @MyNEZeroList varchar(max);\r\nDECLARE @MySumList varchar(max);\r\nDECLARE @MyGoodColumnList varchar(max);\r\nDECLARE @MyCandidateColumn varchar(128);\r\nDECLARE @MyGoodCount int;\r\nDECLARE @MyRecordCount int;\r\n\r\nWHILE(@ColumnsRemaining &gt; 0)\r\nBEGIN\r\n-- Build SQL to select the record with the most &lt;&gt; 0 fields from the list:\r\n\r\n--select @MyNotNullList = stuff( (\r\n-- select ' IS NOT NULL OR ' + tmc.column_name\r\n-- from #TempMyColumns tmc\r\n-- for XML PATH('')\r\n-- ), 1, 17, '') + ' IS NOT NULL'\r\n\r\nselect @MyNEZeroList = stuff( (\r\nselect ' &lt;&gt; 0 OR ' + tmc.column_name\r\nfrom #TempMyColumns tmc\r\nfor XML PATH(''), type\r\n).value('.','nvarchar(max)'), 1, 9, '') + ' &lt;&gt; 0'\r\n\r\nprint @MyNEZeroList\r\n\r\nselect @MySumList = stuff( (\r\nselect '+ CASE WHEN ' + tmc.column_name + ' &lt;&gt; 0 THEN 1 ELSE 0 END '\r\nfrom #TempMyColumns tmc\r\nfor XML PATH(''), type\r\n).value('.','nvarchar(max)'), 1, 1, '')\r\n\r\nif object_id('tempdb..##TempBestID') IS NOT NULL\r\nDROP TABLE ##TempBestID\r\n\r\nset @MySQL = 'select top 1 ' + @IDFieldName + ' as MyId, ' + @MySumList + ' as MyGoodColCount ' +\r\n' into ##TempBestID ' +\r\n' from ' + @SchemaName + '.' + @TableName +\r\n' where (' + @MyNEZeroList + ') ' +\r\n--' AND ' + @MyNotNullList +\r\n' order by 2 desc;'\r\n\r\nprint len(@MySQL)\r\nprint 'select top 1 ' + @IDFieldName + ' as MyId, ' + @MySumList + ' as MyGoodColCount '\r\nprint ' into ##TempBestID '\r\nprint ' from ' + @SchemaName + '.' + @TableName\r\nprint ' where (' + @MyNEZeroList + ') '\r\n--' AND ' + @MyNotNullList +\r\nprint ' order by 2 desc;'\r\n\r\nexec sp_executesql @MySQL\r\n\r\nselect * from ##TempBestID\r\n\r\nSELECT @MyRecordCount = @@ROWCOUNT\r\n\r\nif @MyRecordCount = 0\r\nBEGIN\r\nprint 'No new useful records found.'\r\nset @ColumnsRemaining = 0\r\nEND\r\n\r\n-- insert into #TempGoodIDs select cast(MyId as varchar) from ##TempBestID\r\n\r\nDECLARE MyColumnCursor CURSOR FOR\r\nselect tmc.column_name from #TempMyColumns tmc\r\nOPEN MyColumnCursor\r\n\r\nFETCH NEXT FROM MyColumnCursor into @MyCandidateColumn\r\nWHILE (@@FETCH_STATUS = 0) AND (@ColumnsRemaining &gt; 0)\r\nBEGIN\r\n-- print @MyCandidateColumn\r\n\r\nif object_id('tempdb..##TempColumnFound') IS NOT NULL\r\nDROP TABLE ##TempColumnFound\r\n\r\n-- select * from tempdb.sys.tables\r\n\r\nselect @MySQL = 'select count(1) as MyCount into ##TempColumnFound from ' + @SchemaName + '.' + @TableName +\r\n' where ' + @IDFieldName + ' = ' + cast(MyID as varchar) +\r\n' and ' + @MyCandidateColumn + ' &lt;&gt; 0 ' +\r\n' and ' + @MyCandidateColumn + ' IS NOT NULL ;'\r\nfrom ##TempBestID; -- This will exist at runtime\r\n\r\nexec sp_executesql @MySQL\r\n\r\nselect @MyGoodCount = MyCount from ##TempColumnFound;\r\n\r\nif @MyGoodCount &gt; 0\r\nBEGIN\r\nselect @ColumnsRemaining = @ColumnsRemaining - @MyGoodCount; -- Which should be an if statement...\r\ndelete from #TempMyColumns where #TempMyColumns.column_name = @MyCandidateColumn\r\n-- print 'Deleting ' + @MyCandidateColumn + ' - Columns Remaining = ' + cast(@ColumnsRemaining as varchar);\r\nEND\r\n\r\nFETCH NEXT FROM MyColumnCursor into @MyCandidateColumn\r\n\r\nEND\r\n\r\nCLOSE MyColumnCursor\r\nDEALLOCATE MyColumnCursor\r\nEND\r\n\r\nEND\r\n\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Here&#8217;s a straightforward problem: I have a table with a lot of fields in it (in this case, several tables &#8212; new Fact and Dimension tables in a star schema data warehouse, but, you know, any wide table&#8230; <a class=\"read-more\" href=\"https:\/\/www.chiplynch.com\/wordpress\/?p=935\">Read More<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[29,24,40],"tags":[],"class_list":["post-935","post","type-post","status-publish","format-standard","hentry","category-data","category-happytechnologist","category-sql"],"_links":{"self":[{"href":"https:\/\/www.chiplynch.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/935","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.chiplynch.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.chiplynch.com\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.chiplynch.com\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.chiplynch.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=935"}],"version-history":[{"count":6,"href":"https:\/\/www.chiplynch.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/935\/revisions"}],"predecessor-version":[{"id":942,"href":"https:\/\/www.chiplynch.com\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/935\/revisions\/942"}],"wp:attachment":[{"href":"https:\/\/www.chiplynch.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=935"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.chiplynch.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=935"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.chiplynch.com\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=935"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}