XML/XSLT

Improving Performance of XML Serializers in .Net

June 23rd, 2008

1. Problem background

.Net infrastructure makes it very easy to define and use strongly typed wrappers (XML serializers) to read from and write to an XML document. The default implementation of this support comes with a performance penalty that your program pays at run-time: the .Net infrastructure generates a serializer class on the fly, compiles it with the C# compiler into a temporary assembly, loads the assembly and only then uses the generated class. The startup cost of doing so is obvious and may not be acceptable in many scenarios. In addition, if the compiler is not available in production (not installed, or prevented from running by policy), the application will simply fail. This article shows how to address the problem by explicitly generating serialization assemblies and shipping them along with your application.

2. Creating sample project

Let’s start by creating a sample project that uses XML serialization and demonstrates the problem. In Visual Studio go to File/New/Project…, select Visual C#/Windows/Console Application. Enter “XmlSerializerLab” as the project name and hit OK.

Now that we have generated the default console template, let’s add some very simple code that creates and uses the XML serialization infrastructure. Below are the modifications to the generated Program.cs, with explanations:

   1:  using System;
   2:  using System.IO;
   3:  using System.Xml.Serialization;
   4:   
   5:   
   6:  namespace XmlSerializerLab
   7:  {
   8:   [XmlRootAttribute("MyRoot")]
   9:   public class MyXmlRoot
  10:   {
  11:   }
  12:   
  13:   class Program
  14:   {
  15:     static void Main(string[] args)
  16:     {
  17:       // create strongly typed content
  18:       MyXmlRoot root = new MyXmlRoot();
  19:   
  20:       // create serializer
  21:       var serializer = new XmlSerializer(
  22:         typeof(MyXmlRoot));
  23:   
  24:       // serialize
  25:       var ms = new MemoryStream();
  26:       serializer.Serialize(ms, root);
  27:   
  28:       // verify serialized XML
  29:       ms.Position = 0;
  30:       Console.WriteLine(
  31:         new StreamReader(ms).ReadToEnd());
  32:      }
  33:    }
  34:  }

 

Lines 2-3 import the namespaces for stream support and the XML serializer.

Lines 8-11 define the simplest possible class to represent our XML document. We only represent the top-level element, which should suffice for our test.

Line 18 creates an instance of the strongly typed class that we will serialize using the framework.

Lines 20-26 create the serializer and serialize our instance into a memory stream.

Lines 28-31 print the content of the memory stream to standard output so we can eyeball the resulting XML content.

If you run your application now you should see something similar to:

[image: console output showing the serialized XML with an empty MyRoot root element]

3. Verifying the default implementation

Now we can add some code to verify that the .Net runtime tries to use pre-generated serializers before generating and compiling them on the fly. By default the .Net runtime looks for the serializer in “AssemblyName.XmlSerializers.dll”, where “AssemblyName” is the name of the assembly that contains the class being serialized (MyXmlRoot in our example). To verify this, we will listen to the application domain’s AssemblyResolve event and print every assembly that the .Net loader tried to locate but failed to find.

   1:  using System;
   2:  using System.IO;
   3:  using System.Xml.Serialization;
   4:   
   5:   
   6:  namespace XmlSerializerLab
   7:  {
   8:   [XmlRootAttribute("MyRoot")]
   9:   public class MyXmlRoot
  10:   {
  11:   }
  12:   
  13:   class Program
  14:   {
  15:     static void Main(string[] args)
  16:     {
  17:       AppDomain.
  18:         CurrentDomain.AssemblyResolve +=
  19:         (sender, e) =>
  20:       {
  21:         Console.WriteLine("Not found: {0}",
  22:           e.Name);
  23:         return null;
  24:       };
  25:       
  26:       // create strongly typed content
  27:       MyXmlRoot root = new MyXmlRoot();
  28:   
  29:       // create serializer
  30:       var serializer = new XmlSerializer(
  31:         typeof(MyXmlRoot));
  32:   
  33:       // serialize
  34:       var ms = new MemoryStream();
  35:       serializer.Serialize(ms, root);
  36:   
  37:       // verify serialized XML
  38:       ms.Position = 0;
  39:       Console.WriteLine(
  40:         new StreamReader(ms).ReadToEnd());
  41:      }
  42:    }
  43:  }

 

Lines 17-24 subscribe to the AssemblyResolve event and output the name of the assembly being located to the console. If you run the application now, you will see that the .Net loader tried to load the serialization assembly for the MyXmlRoot class twice: first by its full name (XmlSerializerLab.XmlSerializers, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null) and then, assuming the assembly is not signed, by its simple name (XmlSerializerLab.XmlSerializers).
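The naming convention described above can also be checked by hand. Here is a minimal sketch that is not part of the original project: SerializerProbe, ExpectedAssemblyName and IsDeployed are hypothetical names, and the code assumes the serializer assembly would be deployed in the application's base directory, which is only one of the places the loader may probe.

```csharp
using System;
using System.IO;

public static class SerializerProbe
{
  // Simple name the runtime looks for: "<AssemblyName>.XmlSerializers".
  public static string ExpectedAssemblyName(Type type)
  {
    return type.Assembly.GetName().Name + ".XmlSerializers";
  }

  // True if a DLL with that name sits in the application's base directory
  // (an assumption of this sketch, not the loader's full probing logic).
  public static bool IsDeployed(Type type)
  {
    string path = Path.Combine(
      AppDomain.CurrentDomain.BaseDirectory,
      ExpectedAssemblyName(type) + ".dll");
    return File.Exists(path);
  }

  public static void Main()
  {
    // SerializerProbe stands in for MyXmlRoot to keep the sketch self-contained.
    Type type = typeof(SerializerProbe);
    Console.WriteLine("{0}.dll deployed: {1}",
      ExpectedAssemblyName(type), IsDeployed(type));
  }
}
```

In the sample project you would pass typeof(MyXmlRoot) instead of the stand-in type.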

4. Solution

As I mentioned at the very beginning of the article, the solution is really simple: we need to pre-generate serialization assemblies at build time and ship them with our application. Please note that Visual Studio has a project setting, “Generate serialization assembly”, in project properties/Build tab/Output section. Contrary to what its name implies, it won’t do you any good unless your project contains Web service proxies. I’m not sure if this is by design or simply a bug, but for everything else adding an explicit post-build script (sgen /a:"$(TargetPath)" /force) will do it. Make sure that the “sgen” utility is in the PATH or that its full path is given explicitly. Alternatively, you can run the command manually from the Visual Studio command prompt. Start the VS command prompt, change your active folder to our project’s bin/Debug and run the following:

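[image: command prompt running sgen against the built assembly, presumably sgen /a:XmlSerializerLab.exe /force]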

If you examine the Debug folder now, you will find that “XmlSerializerLab.XmlSerializers.dll” has been created. Run your application now. You will see that no assembly resolution events are fired by the .Net loader and that startup is considerably faster.
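To put a number on the startup difference, one option is to time the serializer construction and log every assembly the runtime loads. The following is a rough sketch under stated assumptions: StartupCheck and TimeSerializerCreation are hypothetical names, and a single Stopwatch reading is only a coarse indication, not a benchmark.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml.Serialization;

namespace XmlSerializerLab
{
  [XmlRoot("MyRoot")]
  public class MyXmlRoot
  {
  }

  public static class StartupCheck
  {
    // Measures how long it takes to construct the serializer,
    // which is where on-the-fly generation would happen.
    public static TimeSpan TimeSerializerCreation()
    {
      Stopwatch watch = Stopwatch.StartNew();
      var serializer = new XmlSerializer(typeof(MyXmlRoot));
      watch.Stop();

      // Use the serializer once to confirm it is functional;
      // the output is discarded.
      serializer.Serialize(TextWriter.Null, new MyXmlRoot());
      return watch.Elapsed;
    }

    public static void Main()
    {
      // Log each assembly as it is loaded into the application domain.
      AppDomain.CurrentDomain.AssemblyLoad += (sender, e) =>
        Console.WriteLine("Loaded: {0}", e.LoadedAssembly.FullName);

      Console.WriteLine("Serializer created in {0} ms",
        TimeSerializerCreation().TotalMilliseconds);
    }
  }
}
```

Run it once without and once with XmlSerializerLab.XmlSerializers.dll in the output folder. With the pre-generated assembly in place you would expect to see it among the loaded assemblies and a smaller elapsed time, though the exact figures depend on the machine and runtime version.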

Intel XML Software Suite 1.0

December 23rd, 2007

If XML processing is impacting the performance of your software, you may want to give this new product from Intel a try. It is a bit disappointing that they don’t support XPath/XSLT 2.0, however.

http://www.adtmag.com/article.aspx?id=21761

http://www.intel.com/cd/software/products/asmo-na/eng/366637.htm#comp 

Documents, Operating Systems and the PoSH Provider Model

June 19th, 2007

A few years ago in the course of writing the XSLT Cookbook (shameless plug) I was naturally working with XML quite heavily. When you work with a particular technology heavily you begin to look at other problems through the eyes of that technology. Sometimes this can lead to new insights and other times it can lead to the old when-all-you-have-is-a-hammer syndrome. I’ll let you judge which case my particular experience falls into.

XML is, of course, just a syntax (surface structure) for a more general Document Object Model or DOM (deep structure). The W3C DOM specification is not the most elegant of APIs (further evidence of the problems with design-by-committee); however, here I am talking about Document Models in the more general sense of uniform hierarchical models of structured data.


Processing Semistructured XML - XSLT to Rescue

January 10th, 2007

Given the following XML, the goal is to generate an HTML table that has an empty cell wherever one or more subelements of <Measure> are missing.

Good tutorial on LINQ

October 5th, 2006

Eric White has posted a good LINQ tutorial. He is one of the documentation writers on the LINQ to XML team. The code gets complex in places because he uses a functional style instead of the built-in query language, but it’s good at showing what’s going on behind the scenes. His explanation of deferred execution is particularly good.

Using XSLT to Generate Delimited Text

July 27th, 2006

Undoubtedly, many of us have used eXtensible Stylesheet Language Transformations (XSLT) in our daily work to transform one XML document into another or to quickly present XML data in a browser as HTML or XHTML. However, on a recent project we were faced with the task of generating delimited text output to be used as input to a GUI visualization.

VSTF Feature Breakdown

December 14th, 2005

I found this chart on MSDN that breaks down the various features available in each of the three Visual Studio Team Foundation editions.

Note to TDD’ers: the Architects version has no inherent support for unit testing. I found this out the hard way…

Microsoft Re-invents Printf Debugging

December 1st, 2005

If you’re like me, you can’t get enough of good old printf debuggery. Thankfully, those friendly folks at Microsoft kept even bad engineers like us in mind when dreaming up features for VS.2005; they’ve added a variation on breakpoints called “TracePoints”. These don’t actually break when you reach them; instead they allow you to output logging info to the console or run macros. You can set these tracepoints to continue or to break, which makes me wonder if I’ll ever use plain breakpoints at all anymore. I haven’t delved into the macros at all, so if anybody has more info on that, please feel free to chime in.

To output variable values in your console statement, use the following syntax:

The value of foo is {foo}

There are also keywords to output useful trace stuff like $CALLSTACK, $PID, $TID, etc. But there’s a much more alarming thing that you can do in there:

The value of foo is {foo=-1}

Woohoo! Obviously, that little trick should be used sparingly.

In all seriousness, this is very useful stuff if you've ever tried debugging anything involving, say, paint or refresh events, because whenever you break into the debugger you're mucking with your event stream. Not to mention those "heisen-bugs" that either disappear under breakpoints or only appear when you add logging or instrumentation code. With tracepoints you can gather debug info without breaking the flow of the app, or even modify the flow via that {foo=5} stuff without actually touching the code.

Microsoft SQL Server Data Mining

March 28th, 2005