0

I am writing an XML serializer that uses reflection to recursively crawl an object's public and private fields, store them as XML, and later reconstruct the object. While testing it on the infamously unserializable DataTable, it somehow serialized and then tried to re-instantiate some kind of pointer and, in doing so, crossed the managed/unmanaged boundary and (thankfully) crashed instead of garbling memory. I need to find a solution to this problem, but I am somewhat lost because I don't have a background in unmanaged code.

I know you can't actually serialize a pointer or a reference as-is, because the value of a pointer or reference is a memory address, and you can't expect the correct object to be at that address when the pointer is re-instantiated from XML. As I understand it, I either need to detect an object that will cause this problem and ignore it, or find and serialize the object being pointed to and then, upon deserialization, deserialize that object and point the pointers at its new location. But I don't know how to do either; my first guess was to filter on Type.IsPointer, but that didn't seem to stop the problem. Can what I am asking be done? Is there a better solution? Could I do this with some strategic unmanaged code?

Context: I'm making a serializer which can serialize the types the normal XmlSerializer can't (IDictionary, types with circular references, etc.). My serializer ignores attributes and any implementation of ISerializable or IXmlSerializable; it blindly applies a handful of rules recursively to serialize all of an object's fields. It works, but it's slamming into the native/managed boundary with some objects. I'm not using binary serialization because my objects are constantly being modified and I don't know how to resolve object version conflicts with binary serialization.

EDIT: Here is the code which is crashing upon trying to re-instantiate the "System.Globalization.TextInfo" class, which I think is part of the culture object buried deep within the DataTable somewhere. These functions recursively call each other (always starting from ReInstantiateValueInstance) until the initial type parameter has been re-instantiated.

The managed/native boundary exception is thrown at "bestCtor.Invoke(parameters.ToArray())" when re-instantiating System.Globalization.TextInfo(CultureInfo).

    protected object ReCreateTypeWithParameters(Type t)
    {
        if (t.ToString() == "System.Type") return typeof(object); //we don't know the Type of Type

        var construct = StoreUtilities.GetConstructors(t); //gets any and all constructors for an object

        if (construct != null && construct.Count > 0)
        {
            var leastParams = (from c in construct
                               select c.GetParameters().Count()).Min();

            var bestCtor = (from c in construct
                            where c.GetParameters().Count() == leastParams
                            select c).FirstOrDefault(); //the best constructor has the least parameters - less can go wrong

            if (bestCtor != null)
            {
                List<object> parameters = new List<object>();

                foreach (var param in bestCtor.GetParameters())
                {
                    parameters.Add(ReInstantiateValueInstance(param.ParameterType));
                }

                return bestCtor.Invoke(parameters.ToArray()); //pointer types go boom here.
            }
        }           

        return null;
    }

    protected virtual object ReInstantiateValueInstance(Type t)
    {
        try
        {       
            if (t.ToString() == "System.Type") //we don't know the Type of Type
            {
                return typeof(object);
            }
            var construct = StoreUtilities.GetConstructors(t, true); //gets an object's parameterless constructors

            if (construct == null && t.IsGenericType) //no constructor, it's generic
            {
                object generic = ReCreateGenericType(t);

                if (generic == null) //if the generic type had no constructor, we use the activator.
                {
                    return Activator.CreateInstance(t);
                }
                else
                {
                    return generic;
                }
            }

            if (construct == null || construct.Count() == 0) //we have no constructor. Try to make a placeholder object anyway.
            {
               return ReCreateTypeWithParameters(t);
            }

            object o = construct.First().Invoke(null);
            return o;
        }
        catch
        {
            return null;
        }
    }

    protected object ReCreateGenericType(Type t)
    {
        try
        {
            if (t.IsGenericType != true) return null; //use the parameter t, not the Type class
            var construct = StoreUtilities.GetConstructors(t, false);

            if (construct != null && construct.Count() > 0)
            {
                construct = construct.OrderBy(i => i.GetParameters().Count()).ToList();
                var tParams = construct[0].GetParameters();
                List<object> paramList = new List<object>();

                foreach (var p in tParams)
                {
                    if (StoreUtilities.CanStoreAsString(p.ParameterType) == true)
                    {
                        object o = Activator.CreateInstance(p.ParameterType);
                        paramList.Add(o);
                    }
                    else
                    {
                        paramList.Add(ReInstantiateValueInstance(p.ParameterType));
                    }
                }

                return construct[0].Invoke(paramList.ToArray());
            }
            else
            {
                return Activator.CreateInstance(t);
            }
        }
        catch
        {
            return null;
        }
    }
Richard
  • Some code samples of what you're trying to do will be helpful. – IAbstract Mar 07 '13 at 01:07
  • @Richard You do realize that you *can* edit your posts (which includes the title), right? Though in this case, others already fixed the title for you. – svick Mar 07 '13 at 01:21
  • I agree with @IAbstract: could you explain which specific type you are having problems with, and also include the code you're using to serialize it (ideally with irrelevant stuff removed)? – svick Mar 07 '13 at 01:24
  • Since you seem to be implementing a generic mechanism, also think about serializing objects that don't have a backing store, e.g. an `IDictionary` of the files in the current folder, a `SqlConnection`, or data/stream readers. It may be that you are trying to make a more generic framework than you need... – Alexei Levenkov Mar 07 '13 at 01:47

2 Answers

0

I'm not sure how managed/unmanaged has any relevance to this, but the fundamental way to do this is to have some sort of abstraction, a label, in the serialized data for internal references. During serialization you would add references to a dictionary, or similar, along with the label so you only serialize the object once.

Deserialization would mirror this process: an instance is created for a given label only once, and every remaining reference looks up that existing instance by its label.
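
The labelling scheme described above could be sketched roughly as follows. The `ReferenceLabeler` and `ReferenceResolver` classes and their members are hypothetical names made up for illustration, not part of the asker's serializer or of any library. Labels are keyed on reference identity, so two distinct objects that merely compare equal still get separate labels.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical helper for the serializing side: one label per distinct instance.
public sealed class ReferenceLabeler
{
    // Compares by reference identity, ignoring Equals/GetHashCode overrides.
    private sealed class IdentityComparer : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y)
        {
            return ReferenceEquals(x, y);
        }

        int IEqualityComparer<object>.GetHashCode(object obj)
        {
            return RuntimeHelpers.GetHashCode(obj);
        }
    }

    private readonly Dictionary<object, int> labels =
        new Dictionary<object, int>(new IdentityComparer());

    // Returns true the first time an instance is seen (write its full body),
    // false on later sightings (write only the label).
    public bool TryRegister(object instance, out int label)
    {
        if (labels.TryGetValue(instance, out label)) return false;
        label = labels.Count + 1;
        labels.Add(instance, label);
        return true;
    }
}

// Hypothetical helper for the deserializing side: label -> rebuilt instance.
public sealed class ReferenceResolver
{
    private readonly Dictionary<int, object> instances = new Dictionary<int, object>();

    // Call as soon as the instance exists, before filling in its fields,
    // so that circular references can find it.
    public void Register(int label, object instance)
    {
        instances[label] = instance;
    }

    public bool TryResolve(int label, out object instance)
    {
        return instances.TryGetValue(label, out instance);
    }
}
```

Registering the rebuilt instance before its fields are populated is what lets a cycle resolve: a child that points back to its parent finds the parent's label already in the table.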

  • I actually do those exact things already; the problem here is when the type I am serializing has a value which is a literal memory address that somehow gets serialized as a number like "56823136" – Richard Mar 07 '13 at 03:50
0

You cannot solve this problem as stated. You just can't know if an object contains a reference to "unmanaged data". It might be storing the pointer to unmanaged memory in an Int64, String, byte[], whatever. Developers use all sorts of tricks.
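
To make the limitation concrete, here is a minimal sketch of the kind of reflective check one might try. `NativeFieldHeuristic`, `ObviousHandle` and `DisguisedHandle` are hypothetical names invented for this example. Note that `Type.IsPointer` is true only for unsafe pointer types such as `int*`; it is false for `IntPtr`, which is why the check also looks for `IntPtr`, `UIntPtr`, `HandleRef` and `SafeHandle`. Even so, it catches only fields that openly advertise a native resource.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical heuristic: flags only the *obvious* native-looking field types.
public static class NativeFieldHeuristic
{
    public static bool LooksNative(Type fieldType)
    {
        return fieldType.IsPointer                      // unsafe pointers such as int*
            || fieldType == typeof(IntPtr)
            || fieldType == typeof(UIntPtr)
            || fieldType == typeof(HandleRef)
            || typeof(SafeHandle).IsAssignableFrom(fieldType);
    }

    // Walks the instance fields declared on t and on each of its base types.
    public static bool HasObviousNativeField(Type t)
    {
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public
            | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

        for (Type current = t; current != null; current = current.BaseType)
        {
            foreach (FieldInfo field in current.GetFields(flags))
            {
                if (LooksNative(field.FieldType)) return true;
            }
        }
        return false;
    }
}

// Illustrative types only: the first is detected, the second is not.
public class ObviousHandle
{
    public IntPtr Handle = IntPtr.Zero;
}

public class DisguisedHandle
{
    public long Address = 0; // could hold a native address; indistinguishable from any other long
}
```

The heuristic reports `ObviousHandle` but not `DisguisedHandle`, which is exactly the gap described above: an address kept in an ordinary `long` looks like plain data to reflection.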

If you could somehow detect this and then "ignore" the objects that have unmanaged references, then you have lost the game because when you go to deserialize your data, you will end up with an incomplete object.

The only way to solve this is with help from the objects you are serializing. Either through optional interfaces they can implement to help serialize/deserialize data that cannot be found through reflection, or via attributes.
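
As a rough illustration of what such cooperation might look like, the sketch below defines an opt-in interface and a marker attribute. `IStateSurrogate`, `SkipFieldAttribute` and `NativeBackedCounter` are hypothetical names made up for this example, and the "native handle" is only a stand-in value; a real type would reacquire its resource through whatever native API it wraps.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical contract: the object decides what is safe to persist.
public interface IStateSurrogate
{
    IDictionary<string, string> SaveState();
    void RestoreState(IDictionary<string, string> state);
}

// Hypothetical marker telling a reflection-based serializer to leave a field alone.
[AttributeUsage(AttributeTargets.Field)]
public sealed class SkipFieldAttribute : Attribute
{
}

public sealed class NativeBackedCounter : IStateSurrogate
{
    [SkipField]
    private IntPtr handle = IntPtr.Zero; // stand-in for a real native handle

    public int Count { get; set; }

    public bool IsOpen
    {
        get { return handle != IntPtr.Zero; }
    }

    // Persist only the managed, meaningful state; never the handle value.
    public IDictionary<string, string> SaveState()
    {
        var state = new Dictionary<string, string>();
        state["count"] = Count.ToString(CultureInfo.InvariantCulture);
        return state;
    }

    // Rebuild the managed state, then reacquire the resource instead of
    // reusing a stale address from the serialized data.
    public void RestoreState(IDictionary<string, string> state)
    {
        Count = int.Parse(state["count"], CultureInfo.InvariantCulture);
        handle = new IntPtr(1); // placeholder for reopening the native resource
    }
}
```

A serializer written this way would first check whether an object implements the interface and, if so, call `SaveState` and `RestoreState` rather than crawling its fields.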

The only serializers that can work universally without this help tend to require the objects they are serializing be POCOs for a reason...

Brandon