As a developer, sometimes you have to go to extremes to find out what is screwing up code that, on the surface, seems to be perfectly fine. I’ve just spent more than a couple of hours debugging a chunk of code that was misbehaving (and shouldn’t have been). For my current client, I’m deeply entrenched in building a section of a business layer that contains a large chunk of reference data that needs to be kept in memory (in a DLL that will be used by both the Smart Client code and the server-side code), and that will be used much like the dictionary in a word processing application. There are about 40,000 entries in the “dictionary”, and instead of loading it from SQL Server every time (it doesn’t get updated often), I wrote a chunk of code to load it from the database into the business objects, and then wrote the binary serialization bits (which isn’t much code).
I knew that the serialization wasn’t going to be extremely fast, but when the process was taking more than a few minutes I knew something was wrong (I actually got impatient and gave up on it after 10 minutes). The problem is that binary serialization is part of the .NET Framework, and thus not the easiest thing to debug. So instead of heading down that path, I decided to convert all the business objects that were being serialized to custom serialization (by implementing the ISerializable interface). Now, I know that the framework’s serialization code is supposed to be much faster than custom serialization (in most circumstances), but I figured that if I couldn’t find the problem, then at least I would have code that worked, as opposed to code that didn’t. I tested the code, and it serialized the objects in a couple of seconds, so I figured, cool, I fixed the problem, let’s get the deserialization code up and working. Well, when testing deserialization I found that I hadn’t implemented the custom serialization correctly, and I was missing a chunk. I had forgotten to add the Overrides modifier (VB.Net, yuck) to the GetObjectData method of a class that inherited from another class, and once I implemented it correctly, my custom serialization code had the same problem as the standard serialization (taking forever to process). So now I knew roughly where the problem was located; I just had to find out exactly what was causing it.
In custom serialization you control what gets serialized and how, so all I did was remove everything from serialization and add back one variable at a time. When I hit the problem, I knew which variable was causing it, and in this case it was a particularly interesting problem. The variable (actually variables) at fault was a string variable that was getting filled in from a SQL Server stored proc (and here is where it gets weird). It seems that the guy who built the DB created a bunch of nullable varchar(1024) columns, but for some reason decided to give the columns a default value of the string “null” (why add a default value to a nullable column, I have no idea). When these columns were returned from the stored proc and assigned to the private field in my business object, binary serialization went haywire and treated them as if they were 1024 bytes long, even though it looked like the field only held the four characters “null”, and that made the whole serialization process take forever. Instead of just updating the columns and setting them to null, I first added a check to the DB-to-business-object mapping code: it looks for the string “null” in the resultset and, instead of setting the private variable to the value from the resultset, sets it to a constant “null” (thus avoiding the suspect string instance and using a new string with the equivalent value). Sure enough, serialization completed in a couple of seconds. So I now know what the problem was, just not why. I didn’t have time to log all this with Microsoft, but I know how to reproduce the problem, so I can do that some other time without affecting my client. Now all I had to do was go back and remove all the custom serialization stuff (source control rules!) and fix the source of the problem, the bad columns in the DB.
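The add-one-variable-at-a-time approach is easy to automate. What follows is only a rough sketch of the idea in Python, not the VB.Net code from this project: `pickle` stands in for the .NET binary serializer, and the `Entry` class, its field names, and the `budget_seconds` threshold are all made up for illustration.

```python
import pickle
import time


class Entry:
    """Hypothetical stand-in for one reference-data business object."""

    def __init__(self, word, notes, rank):
        self.word = word
        self.notes = notes
        self.rank = rank


def first_slow_field(record, budget_seconds,
                     serialize=pickle.dumps, clock=time.perf_counter):
    """Add fields to the serialized payload one at a time.

    Returns (field_name, elapsed_seconds) for the first field whose
    addition pushes a serialization pass over budget, or None if every
    pass stays within budget.
    """
    included = {}
    for name, value in vars(record).items():
        included[name] = value      # grow the payload by one field
        start = clock()
        serialize(included)         # serialize only what is included so far
        elapsed = clock() - start
        if elapsed > budget_seconds:
            return name, elapsed
    return None


if __name__ == "__main__":
    entry = Entry(word="example", notes="null", rank=1)
    print(first_slow_field(entry, budget_seconds=1.0))
```

The serializer and clock are passed in as parameters so the same loop can be pointed at whatever serializer is misbehaving, and so it can be exercised with a fake clock instead of waiting ten minutes for the real thing.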
This type of stuff is pretty typical of the things that most of us programmers run into every day. Some of us try to find out what the problem is and fix that, and others just try to fix the symptoms. The proper development environment (things like NUnit and source control) makes it much easier for a developer to fix the problem, not the symptom.