LIKE operator and ESCAPE characters (how weird does it get?)

If you tried to post a comment on my previous post, you may have noticed an impressive error message. Sorry about that. It was the bug I was trying to fix. And I did, sort of. It shouldn't get any weirder than this:

I noticed that some of my entries still suffered from the "Empty update, no affected rows"-syndrome, just some. I thought about other offending characters, such as the [ character that caused all the previous pain. But I couldn't get it pinpointed. As it turned out, this was my problem, and you can reproduce it easily. Hold on, we'll work to the Area52 bit.

First take any SQL Server database (say MyFunDB) that you can play around with. Then fire up SQL Query Analyzer (isqlw.exe) and run this bit of SQL.

USE MyFunDB
CREATE TABLE WeirdStuff(Body ntext)
INSERT INTO WeirdStuff(Body) VALUES ('[1]234567890')
SELECT COUNT(*) FROM WeirdStuff WHERE Body LIKE ('[1]234567890')

Run it and it should return zero as the row count. That's because of the [ character. Replacing the last line with:

SELECT COUNT(*) FROM WeirdStuff WHERE Body LIKE ('[[]1]234567890')

will return 1, because we have escaped the escape character. We're approaching the freaky things now. Make both the inserted value and the LIKE value longer and longer, by copy-pasting sets of characters (I used 1234567890, so I had an easy count). Once you pass the magic length of 4000 for the LIKE value, it BREAKS! Zero rows after that. 4000 characters, returns 1, 4001 characters returns 0. Go figure. Remove the [ chars from both values and it works again. So, it's only broken with escape characters and a length above 4000.

That is the reason for 'my' bug. Not really my bug, but my application suffers from it anyway. So don't try for too long to post any comments, until I have this thing worked out. I do not seem to have passed the magic boundary, so comments should work.

Comments

# Alex Thissen said:

Posting comments seems to work. Fire at will.

donderdag 17 februari 2005 10:40
# Remco van Dalen said:

According to Jeroen Mak, an ntext field only stores the first 4000 characters in the appropriate table. Any additional characters are stored somewhere else (another table or a file, he can't remember which.) Maybe this is causing your problem...

donderdag 17 februari 2005 11:25
# Frans Bouma said:

What happened to parameterized queries? ;)

LIKE @param works perfectly.

donderdag 17 februari 2005 12:22
# Alex Thissen said:

Remco and Jeroen: that was my idea as well, since 8086 bytes fit into a single row. The 8000 (ntext uses Unicode, so it uses two bytes per character) bytes is just too close to the mark. I did not look into the way ntext and text are stored internally yet.

donderdag 17 februari 2005 13:18
# Alex Thissen said:

Frans, nice to hear from you again. Normally I would use parameters, but this time round the SQL statements are generated by SQLXML. I have no influence whatsoever on the way this is done. Then again, maybe I have. I'll report this issue in the SQLXML newsgroups. Hopefully someone from MS will pick up on this issue.

donderdag 17 februari 2005 13:21
# Alex Thissen said:

Oh, and Frans: I just checked with an @param approach, but it didn't work for me. Have you got a bit of SQL so I can try it your way?

donderdag 17 februari 2005 14:31
# Frans Bouma said:

WIth a param you have to call it from .NET I think, but you can try:

DECLARE @param varchar(8000)

SET @param = '.......verryyy loooonnggg strinngggg'

SELECT * FROM Table WHERE Foo LIKE @param

?

donderdag 17 februari 2005 15:49