I was at a client’s earlier today and the question came up of how to convert all NTEXT
columns to NVARCHAR(MAX)
in their SQL Server databases, and it turns out they have rather a lot of them.
There are a couple of obvious advantages to this conversion:
- Online index rebuilds with SQL Server Enterprise Edition become a possibility,
- Values are stored in row by default, potentially yielding performance gains.
My response to this was, “Yeah, sure: I can write a script to do that.” Two seconds after I said this I thought, “Hmm, I bet 30 seconds of Googling will provide a script because this must have come up a zillion times before.”
Sure enough, there are some pretty reasonable hits. For example, http://stackoverflow.com/questions/18789810/how-can-i-easily-convert-all-ntext-fields-to-nvarcharmax-in-sql-query.
Buuuuuuuuuuuut as ever, you’d be naive indeed to think that you can just copy and paste code from StackOverflow and have it work first time. Moreover, even with modification, you need to go over it with a fine-toothed comb to make sure you’ve squashed every last bug.
For example, this boned me earlier because I wasn’t paying proper attention:
CASE WHEN is_nullable = 1 THEN 'NOT' ELSE '' END
You can see the logic is reversed from what it should be.
So, anyway, I ended up concocting my own, of which you can find the latest version at https://github.com/bartread/sqlscripts/blob/master/scripts/AlterAllNtextColumnsInDbToNvarcharMax.sql.
There are essentially three phases:
- Change the datatype of the
NTEXT
columns toNVARCHAR(MAX)
usingALTER TABLE
statements - Pull any values small enough to fit in row out of LOB storage and back into rows using
UPDATE
statements. Thanks to John Conwell for pointing out the necessity of doing that to realise any performance increase with existing data. - Refresh the metadata for any views using
sp_refreshview
– this makes sure that, for example, columns listed for them insys.columns
have the correct data type.
Phases 1 and 2 are actually done together in a loop for each NTEXT
column in turn, whilst phase 3 is done in a separate loop at the end. I just refresh the metadata for all views because, although I could figure out only the views that depend on the tables, it’s simpler to just do them all and doesn’t take that long. Of course, if you have thousands of views and a relatively small number of NTEXT
columns you might want to rethink this. My situation is numbers of tables, views, and NTEXT
columns are all of the same order of magnitude so a simple script is fine.
For those of you who don’t have git installed, or aren’t comfortable with DVCS, here’s the full script:
USE _YOUR_DATABASE_NAME_
GO
SET NOCOUNT ON;
-- Set this to 0 to actually run commands, 1 to only print them.
DECLARE @printCommandsOnly BIT = 1;
-- Migrate columns NTEXT -> NVARCHAR(MAX)
DECLARE @object_id INT,
@columnName SYSNAME,
@isNullable BIT;
DECLARE @command NVARCHAR(MAX);
DECLARE @ntextColumnInfo TABLE (
object_id INT,
ColumnName SYSNAME,
IsNullable BIT
);
INSERT INTO @ntextColumnInfo ( object_id, ColumnName, IsNullable )
SELECT c.object_id, c.name, c.is_nullable
FROM sys.columns AS c
INNER JOIN sys.objects AS o
ON c.object_id = o.object_id
WHERE o.type = 'U' AND c.system_type_id = 99;
DECLARE col_cursor CURSOR FAST_FORWARD FOR
SELECT object_id, ColumnName, IsNullable FROM @ntextColumnInfo;
OPEN col_cursor;
FETCH NEXT FROM col_cursor INTO @object_id, @columnName, @isNullable;
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT @command =
'ALTER TABLE '
+ QUOTENAME(OBJECT_SCHEMA_NAME(@object_id))
+ '.' + QUOTENAME(OBJECT_NAME(@object_id))
+ ' ALTER COLUMN '
+ QUOTENAME(@columnName)
+' NVARCHAR(MAX) '
+ CASE
WHEN @isNullable = 1 THEN ''
ELSE 'NOT'
END
+ ' NULL;';
PRINT @command;
IF @printCommandsOnly = 0
BEGIN
EXECUTE sp_executesql @command;
END
SELECT @command =
'UPDATE '
+ QUOTENAME(OBJECT_SCHEMA_NAME(@object_id))
+ '.' + QUOTENAME(OBJECT_NAME(@object_id))
+ ' SET '
+ QUOTENAME(@columnName)
+ ' = '
+ QUOTENAME(@columnName)
+ ';'
PRINT @command;
IF @printCommandsOnly = 0
BEGIN
EXECUTE sp_executesql @command;
END
FETCH NEXT FROM col_cursor INTO @object_id, @columnName, @isNullable;
END
CLOSE col_cursor;
DEALLOCATE col_cursor;
-- Now refresh the view metadata for all the views in the database
-- (We may not need to do them all but it won't hurt.)
DECLARE @viewObjectIds TABLE (
object_id INT
);
INSERT INTO @viewObjectIds
SELECT o.object_id
FROM sys.objects AS o
WHERE o.type = 'V';
DECLARE view_cursor CURSOR FAST_FORWARD FOR
SELECT object_id FROM @viewObjectIds;
OPEN view_cursor;
FETCH NEXT FROM view_cursor INTO @object_id;
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT @command =
'EXECUTE sp_refreshview '''
+ QUOTENAME(OBJECT_SCHEMA_NAME(@object_id)) + '.' + QUOTENAME(OBJECT_NAME(@object_id))
+ ''';';
PRINT @command;
IF @printCommandsOnly = 0
BEGIN
EXECUTE sp_executesql @command;
END
FETCH NEXT FROM view_cursor INTO @object_id;
END
CLOSE view_cursor;
DEALLOCATE view_cursor;
GO
NOTE: this won’t work where views are created WITH SCHEMABINDING
. It will fail at ALTER TABLE
for any table upon which schemabound views depend. Instead, to make it work, you have to DROP
the views, then do the ALTER
s and UPDATE
s, then re-CREATE
the views. Bit of a PITA but there’s no way around it unfortunately. I didn’t need to worry about this because my client doesn’t use schemabound views.
Of course it goes without saying that you should back up your database before you run any script like this!
To use it you just need to substitute the name of your database where it says _YOUR_DATABASE_NAME_
at the top of the script.
Also, As with automating many tasks in SQL Server, dynamic SQL is a necessity. It’s a bit of a pain in the backside so a @printCommandsOnly
mode is advised for debugging purposes, and I’ve switched this on by default. You can copy and paste the commands into a query window, parse them, or even execute them to ensure they work as expected.
Once you’re happy this script does what you want set the value of @printCommandsOnly
to 0 and rerun it to actually execute the commands it generates.
You might wonder why I’ve written this imperatively rather than in set-based fashion. Well, it’s not just because I’m a programmer rather than a DBA. In fact the original version, which you can still see if you look at the file’s history, was set-based. It looked pretty much like this:
USE _YOUR_DATABASE_NAME_
GO
-- Migrate columns NTEXT -> NVARCHAR(MAX)
DECLARE @alterColumns NVARCHAR(MAX) = '';
SELECT @alterColumns = @alterColumns
+'ALTER TABLE '
+ QUOTENAME(OBJECT_SCHEMA_NAME(c.object_id)) + '.' + QUOTENAME(OBJECT_NAME(c.object_id))
+ ' ALTER COLUMN '
+ QUOTENAME(c.Name)
+' NVARCHAR(MAX) '
+ CASE WHEN c.is_nullable = 1 THEN '' ELSE 'NOT' END + ' NULL;'
+ CHAR(13)
+ 'UPDATE '
+ QUOTENAME(OBJECT_SCHEMA_NAME(c.object_id)) + '.' + QUOTENAME(OBJECT_NAME(c.object_id))
+ ' SET '
+ QUOTENAME(c.Name)
+ ' = '
+ QUOTENAME(c.Name)
+ ';' + CHAR(13) + CHAR(13)
FROM sys.columns AS c
INNER JOIN sys.objects AS o
ON c.object_id = o.object_id
WHERE o.type = 'U' AND c.system_type_id = 99; --NTEXT
PRINT @alterColumns;
EXECUTE sp_executesql @alterColumns;
GO
-- Update VIEW metadata
DECLARE @updateViews NVARCHAR(MAX) = '';
SELECT @updateViews = @updateViews
+ 'EXECUTE sp_refreshview '''
+ QUOTENAME(OBJECT_SCHEMA_NAME(o.object_id)) + '.' + QUOTENAME(OBJECT_NAME(o.object_id))
+ ''';' + CHAR(13)
FROM sys.objects AS o
WHERE o.type = 'V'
PRINT @updateViews;
EXECUTE sp_executesql @updateViews;
GO
It’s certainly a lot less code, which is nice. And it doesn’t use CURSOR
s, which is also nice.
However, it does have problems:
- The
PRINT
statement in T-SQL truncates output if it goes beyond a certain length. I don’t know exactly what this length is off the top of my head, but my generated scripts were more than long enough to reach it. - The result of this is you can’t copy and paste the complete generated script into another query window, so it might make debugging a bit trickier in some instances.
- The really problematic thing is that, when something goes wrong, you can’t necessarily relate it back to the exact command that failed, whereas the imperative version makes this easy since each generated command is executed individually.
So I threw away the, on the face of it, "cleverer" and more "elegant" set-based version in favour of the longer, clunkier (but easier to debug) imperative version.
I hope you find it useful and please feel free to submit patches, pull requests, bug reports, feature requests via the main project GitHub page at https://github.com/bartread/sqlscripts. (Turns out I’m building up a small library of handy scripts so I’ll be pushing a few more items into this repo in due course.)