
How to quickly convert all NTEXT columns to NVARCHAR(MAX) in a SQL Server database

I was at a client’s earlier today and the question came up of how to convert all NTEXT columns to NVARCHAR(MAX) in their SQL Server databases, and it turns out they have rather a lot of them.

There are a couple of obvious advantages to this conversion:

  1. Online index rebuilds with SQL Server Enterprise Edition become a possibility,
  2. Values are stored in row by default, potentially yielding performance gains.
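On the first point, for example, once the conversion is done a rebuild like the following becomes possible on Enterprise Edition. This is a minimal sketch, assuming a hypothetical dbo.Orders table:

ALTER INDEX ALL ON dbo.Orders
REBUILD WITH (ONLINE = ON); -- requires Enterprise Edition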

My response to this was, “Yeah, sure: I can write a script to do that.” Two seconds after I said this I thought, “Hmm, I bet 30 seconds of Googling will provide a script because this must have come up a zillion times before.”

Sure enough, there are some pretty reasonable hits. For example, http://stackoverflow.com/questions/18789810/how-can-i-easily-convert-all-ntext-fields-to-nvarcharmax-in-sql-query.

Buuuuuuuuuuuut as ever, you’d be naive indeed to think that you can just copy and paste code from StackOverflow and have it work first time. Moreover, even with modification, you need to go over it with a fine-toothed comb to make sure you’ve squashed every last bug.

For example, this boned me earlier because I wasn’t paying proper attention:

CASE WHEN is_nullable = 1 THEN 'NOT' ELSE '' END

You can see the logic is reversed from what it should be: the NOT should be emitted when the column is not nullable, i.e. when is_nullable = 0.
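The corrected test, which is what my script below uses, reads:

CASE WHEN is_nullable = 1 THEN '' ELSE 'NOT' END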

So, anyway, I ended up concocting my own, of which you can find the latest version at https://github.com/bartread/sqlscripts/blob/master/scripts/AlterAllNtextColumnsInDbToNvarcharMax.sql.

There are essentially three phases:

  1. Change the data type of the NTEXT columns to NVARCHAR(MAX) using ALTER TABLE statements.
  2. Pull any values small enough to fit in row out of LOB storage and back into rows using UPDATE statements. Thanks to John Conwell for pointing out the necessity of doing that to realise any performance increase with existing data.
  3. Refresh the metadata for any views using sp_refreshview – this makes sure that, for example, columns listed for them in sys.columns have the correct data type.

Phases 1 and 2 are actually done together in a loop, for each NTEXT column in turn, whilst phase 3 is done in a separate loop at the end. I just refresh the metadata for all views because, although I could figure out only the views that depend on the affected tables, it’s simpler to do them all and doesn’t take that long. Of course, if you have thousands of views and a relatively small number of NTEXT columns you might want to rethink this. In my situation the numbers of tables, views, and NTEXT columns are all of the same order of magnitude, so a simple script is fine.
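To make that concrete, here are the commands phases 1 and 2 generate for a hypothetical NTEXT column Body on a dbo.Documents table, followed by the phase 3 refresh of an equally hypothetical dependent view (all names invented for illustration):

ALTER TABLE [dbo].[Documents] ALTER COLUMN [Body] NVARCHAR(MAX) NULL;
UPDATE [dbo].[Documents] SET [Body] = [Body];
EXECUTE sp_refreshview '[dbo].[DocumentSummaries]';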

For those of you who don’t have git installed, or aren’t comfortable with DVCS, here’s the full script:

USE _YOUR_DATABASE_NAME_
GO

SET NOCOUNT ON;

-- Set this to 0 to actually run commands, 1 to only print them.
DECLARE @printCommandsOnly BIT = 1;

-- Migrate columns NTEXT -> NVARCHAR(MAX)

DECLARE @object_id INT,
        @columnName SYSNAME,
        @isNullable BIT;

DECLARE @command NVARCHAR(MAX);

DECLARE @ntextColumnInfo TABLE (
    object_id INT,
    ColumnName SYSNAME,
    IsNullable BIT
);

INSERT INTO @ntextColumnInfo ( object_id, ColumnName, IsNullable )
SELECT  c.object_id, c.name, c.is_nullable
FROM    sys.columns AS c
INNER JOIN sys.objects AS o
        ON c.object_id = o.object_id
WHERE   o.type = 'U' AND c.system_type_id = 99;

DECLARE col_cursor CURSOR FAST_FORWARD FOR
    SELECT object_id, ColumnName, IsNullable FROM @ntextColumnInfo;

OPEN col_cursor;
FETCH NEXT FROM col_cursor INTO @object_id, @columnName, @isNullable;

WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @command =
        'ALTER TABLE '
        + QUOTENAME(OBJECT_SCHEMA_NAME(@object_id))
        + '.' + QUOTENAME(OBJECT_NAME(@object_id))
        + ' ALTER COLUMN '
        + QUOTENAME(@columnName)
        + ' NVARCHAR(MAX) '
        + CASE
              WHEN @isNullable = 1 THEN ''
              ELSE 'NOT'
          END
        + ' NULL;';

    PRINT @command;

    IF @printCommandsOnly = 0
    BEGIN
        EXECUTE sp_executesql @command;
    END

    SELECT @command =
        'UPDATE '
        + QUOTENAME(OBJECT_SCHEMA_NAME(@object_id))
        + '.' + QUOTENAME(OBJECT_NAME(@object_id))
        + ' SET '
        + QUOTENAME(@columnName)
        + ' = '
        + QUOTENAME(@columnName)
        + ';';

    PRINT @command;

    IF @printCommandsOnly = 0
    BEGIN
        EXECUTE sp_executesql @command;
    END

    FETCH NEXT FROM col_cursor INTO @object_id, @columnName, @isNullable;
END

CLOSE col_cursor;
DEALLOCATE col_cursor;

-- Now refresh the view metadata for all the views in the database
-- (We may not need to do them all but it won't hurt.)

DECLARE @viewObjectIds TABLE (
    object_id INT
);

INSERT INTO @viewObjectIds
SELECT o.object_id
FROM sys.objects AS o
WHERE o.type = 'V';

DECLARE view_cursor CURSOR FAST_FORWARD FOR
    SELECT object_id FROM @viewObjectIds;

OPEN view_cursor;
FETCH NEXT FROM view_cursor INTO @object_id;

WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @command =
        'EXECUTE sp_refreshview '''
        + QUOTENAME(OBJECT_SCHEMA_NAME(@object_id)) + '.' + QUOTENAME(OBJECT_NAME(@object_id))
        + ''';';

    PRINT @command;

    IF @printCommandsOnly = 0
    BEGIN
        EXECUTE sp_executesql @command;
    END

    FETCH NEXT FROM view_cursor INTO @object_id;
END

CLOSE view_cursor;
DEALLOCATE view_cursor;
GO

NOTE: this won’t work where views are created WITH SCHEMABINDING. It will fail at the ALTER TABLE stage for any table upon which schemabound views depend. Instead, to make it work, you have to DROP the views, then do the ALTERs and UPDATEs, then re-CREATE the views. Bit of a PITA, but unfortunately there’s no way around it. I didn’t need to worry about this because my client doesn’t use schemabound views.
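If you do have schemabound views to deal with, the workaround looks roughly like this sketch (view, table, and column names are invented for illustration; it is not part of my script):

-- Drop the schemabound view that blocks the ALTER
DROP VIEW dbo.DocumentBodies;
GO

-- Convert the column, then pull small values back in row
ALTER TABLE dbo.Documents ALTER COLUMN Body NVARCHAR(MAX) NULL;
UPDATE dbo.Documents SET Body = Body;
GO

-- Re-create the view with its original definition
CREATE VIEW dbo.DocumentBodies WITH SCHEMABINDING
AS
    SELECT Id, Body FROM dbo.Documents;
GO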

Of course it goes without saying that you should back up your database before you run any script like this!
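If you want a starting point, a minimal full backup looks something like this (the backup path is a placeholder):

BACKUP DATABASE _YOUR_DATABASE_NAME_
TO DISK = N'C:\Backups\YourDatabase_PreNtextConversion.bak'
WITH INIT, CHECKSUM;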

To use it you just need to substitute the name of your database where it says _YOUR_DATABASE_NAME_ at the top of the script.

Also, as with the automation of many tasks in SQL Server, dynamic SQL is a necessity. It’s a bit of a pain in the backside, so a @printCommandsOnly mode is advisable for debugging purposes, and I’ve switched this on by default. You can copy and paste the generated commands into a query window, parse them, or even execute them to ensure they work as expected.

Once you’re happy the script does what you want, set the value of @printCommandsOnly to 0 and rerun it to actually execute the commands it generates.
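Once it has run for real, a quick sanity check (my own addition, not part of the script) is to rerun the same test the script uses to find NTEXT columns, system_type_id 99, and confirm it returns no rows:

SELECT OBJECT_SCHEMA_NAME(c.object_id) AS SchemaName,
       OBJECT_NAME(c.object_id) AS TableName,
       c.name AS ColumnName
FROM sys.columns AS c
INNER JOIN sys.objects AS o
        ON c.object_id = o.object_id
WHERE o.type = 'U' AND c.system_type_id = 99; -- should now return nothing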

You might wonder why I’ve written this imperatively rather than in set-based fashion. Well, it’s not just because I’m a programmer rather than a DBA. In fact the original version, which you can still see if you look at the file’s history, was set-based. It looked pretty much like this:

USE _YOUR_DATABASE_NAME_
GO

-- Migrate columns NTEXT -> NVARCHAR(MAX)

DECLARE @alterColumns NVARCHAR(MAX) = '';

SELECT @alterColumns = @alterColumns
    + 'ALTER TABLE '
    + QUOTENAME(OBJECT_SCHEMA_NAME(c.object_id)) + '.' + QUOTENAME(OBJECT_NAME(c.object_id))
    + ' ALTER COLUMN '
    + QUOTENAME(c.name)
    + ' NVARCHAR(MAX) '
    + CASE WHEN c.is_nullable = 1 THEN '' ELSE 'NOT' END + ' NULL;'
    + CHAR(13)
    + 'UPDATE '
    + QUOTENAME(OBJECT_SCHEMA_NAME(c.object_id)) + '.' + QUOTENAME(OBJECT_NAME(c.object_id))
    + ' SET '
    + QUOTENAME(c.name)
    + ' = '
    + QUOTENAME(c.name)
    + ';' + CHAR(13) + CHAR(13)
FROM    sys.columns AS c
INNER JOIN sys.objects AS o
        ON c.object_id = o.object_id
WHERE   o.type = 'U' AND c.system_type_id = 99; --NTEXT

PRINT @alterColumns;

EXECUTE sp_executesql @alterColumns;
GO

-- Update VIEW metadata

DECLARE @updateViews NVARCHAR(MAX) = '';

SELECT @updateViews = @updateViews
    + 'EXECUTE sp_refreshview '''
    + QUOTENAME(OBJECT_SCHEMA_NAME(o.object_id)) + '.' + QUOTENAME(OBJECT_NAME(o.object_id))
    + ''';' + CHAR(13)
FROM sys.objects AS o
WHERE o.type = 'V';

PRINT @updateViews;

EXECUTE sp_executesql @updateViews;
GO

It’s certainly a lot less code, which is nice. And it doesn’t use CURSORs, which is also nice.

However, it does have problems:

  • The PRINT statement in T-SQL truncates output beyond a certain length: 8,000 bytes, which means 4,000 characters for Unicode strings. My generated scripts were more than long enough to reach it; there’s a quick demonstration after this list.
  • The result is that you can’t copy and paste the complete generated script into another query window, which can make debugging trickier in some instances.
  • The really problematic thing is that, when something goes wrong, you can’t necessarily relate the error back to the exact command that failed, whereas the imperative version makes this easy because each generated command is executed individually.
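You can see the truncation for yourself with a quick test (a minimal sketch):

DECLARE @s NVARCHAR(MAX) = REPLICATE(CAST(N'x' AS NVARCHAR(MAX)), 5000);
PRINT LEN(@s); -- prints 5000
PRINT @s;      -- output is silently truncated to 4,000 characters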

So I threw away the ostensibly "cleverer" and more "elegant" set-based version in favour of the longer, clunkier, but easier to debug, imperative version.

I hope you find it useful and please feel free to submit patches, pull requests, bug reports, feature requests via the main project GitHub page at https://github.com/bartread/sqlscripts. (Turns out I’m building up a small library of handy scripts so I’ll be pushing a few more items into this repo in due course.)

The Spike Board: A Quick Agile Solution for Managing and Visualising Tech Spikes and Bug Bashes

I’ve recently been fortunate enough to start working with comparethemarket.com as one of my clients, specifically with their home team, who deal with their home insurance site. These guys are fully awesome, and I’m having a really good time.

What I want to do with this post is share the way we reorganised one of our agile boards, in the hope that you might find it helpful too.

CTM make heavy use of agile techniques, to the point where bugs are often not filed in any kind of tracker (we use Mingle, when we use anything at all), but instead appear as cards on the appropriate board. For the past couple of days we’ve been doing a bug bash in the run-up to a release of some new functionality, and we’d written all the areas that needed testing as a big list on one of the boards.

People would pick areas for testing, write their initials next to them, and mark them as completed either by ticking or crossing them off the list. Bugs were written in a separate list at the bottom of the board in the order they were encountered. After some discussion we decided we didn’t care about some items, both tests and bugs, or postponed them until later.

As you can imagine, the board was starting to look pretty messy at this point. Not a problem for those of us who’d been around the whole time, but a couple of the team had been out and, at the standup this morning, it became clear that our board wasn’t really doing a great job of communicating:

  • what we’d done
  • what was left to do
  • what we’d decided not to do
  • what each of the items (test areas and bugs) actually meant

Lightweight is good but we’d probably gone a bit too far in that direction and, in fact, there was quite a bit of confusion.

The net result is we had to go through each item one by one. It didn’t take absolutely ages, but it was somewhat time-consuming.

So… we decided to rework the board to make it clearer to anyone and everyone what was happening and where we were in the process.

Here’s what we came up with for the “work item” board, where a work item is either an area for test, or a bug.

[Image: proposed work item board]

The basic idea is that work items are written on cards and start in the top left under proposed. They then migrate either to rejected or done on the bottom right. Obviously cards can skip over stages – so they can move directly from proposed to accepted, for example.

Note that rejected doesn’t mean rejected for all time: it just means we’ve chosen not to do something in this tech spike.

Bug prioritisation was another issue so we came up with this, although we haven’t yet needed it. In future though, when bugs are found we can write them on cards and stick them on another board that looks like this:

[Image: proposed bug board]

The axes are severity on the left (high or low) and incidence (alternatively, hit probability) along the bottom. Priorities are shown in red – we just pick off the bugs in priority order. It’s rough and ready, but it should make prioritisation easy.

You can obviously choose different axes that are more relevant for you if you like. Likewise, if you have different states for your work items than we use, or you have more or fewer of them, go ahead and use those.

Bugs that we’re fixing then become work items (on different coloured cards) that go back on the work item board, probably going straight into accepted. We’d probably lift them directly from the bugs board and place them on the work item board – thus the bugs board only contains live bugs we’re not actively working on.

Work item cards look like this:

[Image: proposed work item card]

Everything apart from the title and name(s) is optional, to keep the cards as lightweight as possible. We could just use avatars instead of names – we all have little stickies with our chosen avatar on them that we add to any cards we’re working on. For things that are done it might be handier to use names, so that we don’t need to create loads of avatar stickies for everyone.

The cards on the bug board would be similar, but just with a title and description (optional). Potentially we could transfer them from the bug board to the work item board when we start working on them so that (i) we’re not duplicating cards, and (ii) it’s easy to see how many outstanding bugs there are.

Here’s what our work item board now looks like:

[Image: reorganised work item board]

(Note that we decided not to add everything we’d already done to the new board, which comprised around two thirds of the total work items. We took a photo as a backup, so we have a record of the areas we need to test for future releases, and from now on we’ll use the new board layout instead of the vanilla list.)

As you can see, it’s easy to understand:

  • the state of work items
  • how much WIP we have
  • how much is done
  • how much is left to do

Hopefully some of you will find this helpful as well.