Java / .NET String Gotchas Of The Day

While porting a WebSphere Java (JSP/Struts) application to C# (ASP.NET MVC/Razor), I had quite some fun with strings this morning. Some unexpected behaviour, plus a mistake of my own (taking a shortcut), gave me a few minutes of headache.

First challenge – a space is a space, right?

The Java code was ported using my own conversion routines, so the code below is the C# version, but it's practically identical to the Java original.

private static char BLANK = ' ';
private static char SP_BLANK = ' ';

….
….
string result = text.replace(SP_BLANK, BLANK);

…

What is the point of replacing a space character ' ' with another one of the same kind? Well, what the code doesn't directly show (and there are no comments to help 🙁) is that the SP_BLANK char is not the space character you thought it would be (0x20, code 32) but rather a non-breaking space (0xA0, code 160)!

This could be seen by running the code, or by opening the source file in Visual Studio's Binary Editor.

OK, so the code really does make sense.
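
For illustration, here is a minimal Java sketch of what that replace really does, with both characters written as Unicode escapes so the difference is visible in the source. The class name NbspReplace and the method name normalizeSpaces are made up for this example; they are not from the original application.

```java
final class NbspReplace {
    // Written as escapes so nobody has to guess which "space" is which.
    private static final char BLANK = '\u0020';    // ordinary space, code 32
    private static final char SP_BLANK = '\u00A0'; // non-breaking space, code 160

    // Replaces every non-breaking space with an ordinary space.
    static String normalizeSpaces(String text) {
        return text.replace(SP_BLANK, BLANK);
    }

    public static void main(String[] args) {
        String text = "HELLO\u00A0WORLD";
        String result = normalizeSpaces(text);
        System.out.println((int) text.charAt(5));   // 160
        System.out.println((int) result.charAt(5)); // 32
    }
}
```

Writing the constant as an escape (or at least adding a comment) would have saved me the trip to the Binary Editor.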

.NET Trim() really trims

Elsewhere in the code, a Java .trim() was performed on a string. Fortunately I'd implemented a string extension class with a trim() method rather than just converting every trim() to a Trim(). However, my implementation of trim() simply called the .NET Trim().

What I found out was that the .NET Trim() really trims a string: it removes all Unicode white-space characters, non-breaking spaces included. This is not the case in Java, where trim() only removes characters with a code of 0x20 or below. Lucky me that I used a string extension 🙂
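
To make the Java side concrete, here is a minimal sketch. It shows that String.trim() leaves non-breaking spaces alone, and it spells out the rule by hand in a helper, which is roughly the behaviour a ported trim() has to reproduce. The names TrimDemo and trimLikeJava are made up for this example.

```java
final class TrimDemo {
    // Hand-written version of the Java rule: strip leading and trailing
    // characters whose code is 0x20 (space) or lower, and nothing else.
    static String trimLikeJava(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) <= ' ') {
            start++;
        }
        while (start < end && s.charAt(end - 1) <= ' ') {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        // space, non-breaking space, HELLO, non-breaking space, space
        String padded = " \u00A0HELLO\u00A0 ";
        System.out.println(padded.trim().length());        // 7
        System.out.println(trimLikeJava(padded).length()); // 7
    }
}
```

Only the two ordinary spaces are removed; the two non-breaking spaces (code 160) survive, so the length is 7 rather than 5.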

If taking shortcuts – be sure to know the consequences

Going the TDD way, I started by writing a unit test to prove that my new trim implementation would work. Then I made a classic mistake, which took me a few minutes to figure out when my assert failed.

const char SPACE = (char)0x20;
var text = SPACE + SPACE + "HELLO" + SPACE + SPACE;
Assert.AreEqual(5, text.Trim().Length);

Initially I'd expect text to contain "  HELLO  ", so trimming it would give a length of 5, right? Wrong! The length is 7!

So how is that?

I always preach not to concatenate strings this way, but in a hurry I just did … shame on me.
This led to an unexpected result: instead of text becoming "  HELLO  ", it became "64HELLO  ". How is that?

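Well, that is because the first space (0x20 = 32) is added to the second one as a number: char + char is integer arithmetic, which gives the value 64. That 64 is then concatenated with the string "HELLO", so it is converted to the string "64". From there on the left-hand side is a string, so the two trailing spaces are appended as characters. Trimming "64HELLO  " leaves "64HELLO", which has a length of 7.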
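
Java has exactly the same trap, so this is not a C# peculiarity. A minimal sketch of it (the class and method names are made up for this example):

```java
final class CharConcatDemo {
    static final char SPACE = (char) 0x20;

    // char + char is evaluated first, as int arithmetic (32 + 32 = 64).
    // Only once a String is involved does + mean concatenation.
    static String naiveConcat() {
        return SPACE + SPACE + "HELLO" + SPACE + SPACE;
    }

    // Starting with a String makes every + a concatenation.
    static String safeConcat() {
        return "" + SPACE + SPACE + "HELLO" + SPACE + SPACE;
    }

    public static void main(String[] args) {
        System.out.println("[" + naiveConcat() + "]");    // [64HELLO  ]
        System.out.println(naiveConcat().trim().length()); // 7
        System.out.println("[" + safeConcat() + "]");     // [  HELLO  ]
        System.out.println(safeConcat().trim().length());  // 5
    }
}
```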

Lesson learned – don’t concatenate strings this way!

The correct way (which I knew, I was just being lazy) is:

var text = String.Format("{0}{0}HELLO{0}{0}", SPACE);