
I have a web application that creates XML feeds on the fly, depending on criteria passed in via the query string.

It is a very simple application: it reads the query string variables, and uses them to format data (cached data from the database) into the appropriate XML format.

The returned XML file is about 25MB... there is a lot of data.

I am using an XmlTextWriter to build the XML, and then returning it to the requester as an "application/octet-stream" attachment they can download.

The problem: building the XML uses 100% of the CPU and causes problems for my other applications.

Does anyone have experience with building such large XML files? Any tips on making the process less CPU intensive?

Code sample is:

    map.WriteRaw("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
    map.WriteStartElement("rss");
    map.WriteAttributeString("version", "2.0");
    map.WriteAttributeString("xmlns:g", "http://base.google.com/ns/1.0");
    map.WriteStartElement("channel");
    map.WriteElementString("link", "http://www.mywebsite.com...");

    ProductCollection items = Product.GetCachedSiteProducts();
    foreach (Product p in items)
    {
        map.WriteStartElement("item");
        // ...
        map.WriteElementString("description", p.Description);
        map.WriteElementString("g:id", p.SiteSku);
        map.WriteElementString("g:condition", "new");
        map.WriteElementString("g:price", p.Price.ToString() + " USD");
        // ...

        map.WriteEndElement(); // item
    }
    map.WriteEndElement(); // channel
    map.WriteEndElement(); // rss
    Response.Write(sw.ToString());

UPDATE: I am answering my own question. Thanks to those who asked me to post more code; this was an easy one when I looked more carefully.

The code was using `Response.Write(sw.ToString())` to output the XML, buffering the entire 25MB document into a string first. Wow, that's inefficient. Will update the code. Thanks all!

Rebecca
  • can you post the code that you are actually using to build the XMLStream perhaps there are some consumption issues there.. can't really tell without seeing some code in my opinion – MethodMan Jan 09 '12 at 17:30
  • Is there ways that you can create a shell that needs values to be entered, i.e. don't rebuild it 100% of the time? Have you considered caching the responses? – Brad Semrad Jan 09 '12 at 17:30
  • We build XML files > 100 MB and that doesn't use 100% CPU... so I suspect to somehow help you need to show some source code... – Yahia Jan 09 '12 at 17:34
  • @DJKRAZE source code added... don't think that's the main issue... I was wondering if it has to do with web.config settings, or maybe it's the actual returning to the browser that is CPU intensive? – Rebecca Jan 09 '12 at 17:36
  • @Yahia source code added. thanks. – Rebecca Jan 09 '12 at 17:37
  • `ProductCollection items = Product.GetCachedSiteProducts();` — does this create a new instance of ProductCollection? If so, are you disposing of the `items` variable anywhere? I would also suggest cleaning up your code: wrap as many things as you can in `using() {}` blocks where possible, add some try/catches most definitely, and post the full method from which you pasted the code snippet. – MethodMan Jan 09 '12 at 17:40
  • Have you looked at profiling the application locally to see where the performance hit is? Have you also looked at creating your RSS feed using the Rss20FeedFormatter and the classes in the System.ServiceModel.Syndication namespace? – Lloyd Jan 09 '12 at 17:41
  • @DJKRAZE Yes, ProductCollection items = Product.GetCachedSiteProducts() disposes of the instance after it is done being used. – Rebecca Jan 09 '12 at 17:42
  • @Brad this script is used internally, so it only gets run when we need a new instance. We don't cache the responses, because we only need each feed once the products have been updated, and will not use again until an updated file is needed. The problem is the CPU usage during the initial build. – Rebecca Jan 09 '12 at 17:44
  • @Rebecca So it sounds like what you're saying is that when you first construct the `ProductCollection` it's costly? From that I assume there is a lot of work involved in that? – RobV Jan 09 '12 at 17:50

1 Answer


One immediate concern that springs to mind is `Response.Write(sw.ToString())`.

It looks like you are writing everything to a StringWriter first and then sending the result to the output stream. Why not write directly to the output stream here?

I wouldn't expect that alone to make much difference to CPU usage or cause 100% CPU usage.

What is ProductCollection? The loop over it looks like the most likely cause of the high CPU usage. If your code is lazily enumerating an IEnumerable&lt;Product&gt; and obtaining products just-in-time, that may use a lot of CPU if the products are being fetched from persistent storage (e.g. each iteration requires independent database calls).

Without seeing further source code it's hard to tell what might be the cause of the problem.
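A rough sketch of that change, assuming an ASP.NET WebForms code-behind where `Response.OutputStream` is available; the element names mirror the question's snippet, and the writer settings shown are illustrative, not prescriptive:

```csharp
// Requires: using System.Text; using System.Xml;
// Stream the feed straight to the client instead of buffering
// the whole ~25MB document in a StringWriter first.
Response.ContentType = "application/octet-stream";
Response.BufferOutput = false; // let ASP.NET flush chunks as they are written

XmlWriterSettings settings = new XmlWriterSettings
{
    Encoding = Encoding.UTF8,
    Indent = false
};

using (XmlWriter map = XmlWriter.Create(Response.OutputStream, settings))
{
    map.WriteStartDocument(); // emits the <?xml ...?> declaration
    map.WriteStartElement("rss");
    map.WriteAttributeString("version", "2.0");
    map.WriteAttributeString("xmlns", "g", null, "http://base.google.com/ns/1.0");
    map.WriteStartElement("channel");

    // ... write each <item> exactly as before ...

    map.WriteEndElement(); // channel
    map.WriteEndElement(); // rss
} // disposing the writer flushes any buffered output
```

With this shape there is no final `Response.Write(sw.ToString())`, and the full document never has to exist in memory as a single string.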

RobV
  • I was updating my question as you wrote this :) Indeed, this is the answer I came up with as well, so marking it as correct. I will also look into your suggestion for the ProductCollection. The collection is actually pulled from the cache and is not making independent database calls for each product, but I'll look into those operations as well. Thanks! – Rebecca Jan 09 '12 at 17:49