Pages

Wednesday, November 25, 2009

Text Parsing in SQL Server

There are lots scenarios in database procedure we may required to parse the test. Most common scenario is to parse the comma separated text.

For example if we want to get the details of few employees, we will send those employee ids separated by a delimiter as a parameter to the stored procedure.

Till SQL Server 2000 we will be parsing the text character by character and then we will split according to the delimiter.

Following is the general split function which we can use in any version of SQL Server:

Function Code:

set ANSI_NULLS ON
set QUOTED_IDENTIFIER ON
go

ALTER FUNCTION [dbo].[FNSplitString]
(
-- Add the parameters for the function here
@string nvarchar(4000),
@delimiter nvarchar(10)
)
RETURNS
@StringValueTable TABLE
(
-- Add the column definitions for the TABLE variable here
Pos int IDENTITY(1,1),
StringValue nvarchar(4000)
)
AS
BEGIN
-- Fill the table variable with the rows for your result set
DECLARE @delimiterPos int
DECLARE @StringValue nvarchar(4000)

SET @string = @string + @delimiter

-- Loop thru all the characters
WHILE(LEN(@string)>0)
BEGIN

SET @delimiterPos = CHARINDEX(@delimiter,@string)

SET @StringValue = SUBSTRING(@string,0,@delimiterPos)

-- Insert the record into table variable
INSERT INTO @StringValueTable VALUES (@StringValue)

SET @string = SUBSTRING(@string,@delimiterPos+1,LEN(@string))

-- End the loop if there no more elements
IF @delimiterPos = 0
BEGIN
Break
END

END

RETURN
END

Test Sample:

SELECT * FROM [dbo].[FNSplitString]('a,bc,def,ghij,klmno,pqrstu,vw,x,yz',',')

With the new feature "CTE" in SQL Server 2005 and above we can implement the same very easily without any explicit looping mentioned in the above code.

Following is the split function using CTE:

Function Code:

set ANSI_NULLS ON
set QUOTED_IDENTIFIER ON
go

CREATE FUNCTION [dbo].[FNSplitString_New]
(
-- Add the parameters for the function here
@string nvarchar(4000),
@delimiter nvarchar(1)
)
RETURNS
@StringValueTable TABLE
(
-- Add the column definitions for the TABLE variable here
Pos int IDENTITY(1,1),
StringValue nvarchar(4000)
)
AS
BEGIN

Declare @SourceString nvarchar(MAX)

SET @SourceString = ',' + @string + ',';

-- Query using CTE
WITH CTE (Indx, Position) AS
(
SELECT CHARINDEX(@delimiter, @SourceString) Indx, 1 AS Position
UNION ALL
SELECT CHARINDEX(@delimiter, @SourceString, Indx+1) , Position + 1
FROM CTE WHERE CHARINDEX(@delimiter, @SourceString , Indx+1) <> 0
)
INSERT INTO @StringValueTable(StringValue)
SELECT SUBSTRING(@SourceString, B.Indx+1, A.Indx - B.Indx - 1) Val
FROM CTE A INNER JOIN CTE B ON A.Position = B.Position + 1
OPTION (MAXRECURSION 5000)

-- In the above query recursive option is limited to 5000
-- By default it is 100, which means we cannot loop recursively more than 100 times

RETURN
END

Test Sample:

SELECT * FROM [dbo].[FNSplitString_New]('a,bc,def,ghij,klmno,pqrstu,vw,x,yz',',')

In most of our applications we are using first approach as we are reusing from long time. Now we started taking the power of new features like CTE in our regular design and development.

We can also think about passing the 'n' parameter values to a procedure as an xml. And by using XML query capability we can get the individual values as part of the query itself.

No comments: