TL, DR: we've been serializing some data in some tables of our SQL DB. Unfortunately that serialization technique wastes a lot of space for markup characters. We've found a new, more efficient way: to keep backwards compatibility, is it safe to adopt the following logic? -> When the serialization occurs, it always occurs using the new, more efficient way. When deserialization occurs, we check whether the string uses the new serialization format or the old one --> we then deserialize with the appropriate method. Is this robust? Can this be used in production? Aren't there any subtle problems with this approach?
Greetings. I'm working on an application which interacts with a SQL database. To achieve a specific business requirement, we've been serializing some data in a special column of our DB tables, of type ntext. Basically, in each cell of this column, we serialize an array of "Attributo" object, so typeof(T) is Attributo[]:
The "Attributo" definition is like the following:
public class Attributo
{
public virtual String Nome { get; set; }
public virtual String Valore { get; set; }
public virtual String Tipologia { get; set; }
}
- Deserialization to read the actual values:
XMLUtilities.Deserialize<Attributo[]>(value));
Serialization to store the values in the column (for each row..):
XMLUtilities.Serialize(attributes.ToArray());
And this is the helper class, which makes use of the XmlSerializer object:
public static class XMLUtilities
{
public static T Deserialize<T>(String xmlString)
{
XmlSerializer serializer = new XmlSerializer(typeof(T));
using (TextReader reader = new StringReader(xmlString))
{
return (T) serializer.Deserialize(reader);
}
}
public static String Serialize<T>(T xmlObject)
{
MemoryStream stream = new MemoryStream();
XmlSerializer serializer = new XmlSerializer(typeof(T));
XmlSerializerNamespaces xmlnsEmpty = new XmlSerializerNamespaces();
xmlnsEmpty.Add("", "");
serializer.Serialize(stream, xmlObject, xmlnsEmpty);
stream.Seek(0, SeekOrigin.Begin);
using (StreamReader reader = new StreamReader(stream))
{
return reader.ReadToEnd();
}
}
}
Now, the problem with this technique is that it wastes a lot of space for markup characters. This is an example string which is stored on the db:
<?xml version="1.0"?>
<ArrayOfAttributo>
<Attributo>
<Nome>Leakage_test_Time_prg1_p1</Nome>
<Valore>4</Valore>
<Tipologia>Single</Tipologia>
</Attributo>
<Attributo>
<Nome>Leakage_air_Volume_p1</Nome>
<Valore>14</Valore>
<Tipologia>Single</Tipologia>
</Attributo>
</ArrayOfAttributo>
So, we've found a more concise way of serializing these Attributo[], which produces this kind of output:
<ArrayOfA xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<A>
<N>Leakage_test_Time_prg1_p1</N>
<V>4</V>
<T>Single</T>
</A>
<A>
<N>Leakage_air_Volume_p1</N>
<V>14</V>
<T>Single</T>
</A>
</ArrayOfA>
then, to preserve backwards compatibility, which is the core issue we have implemented the following logic:
- During serialization:
we always serialize in the new, more concise fashion
- During deserialization:
we check whether the string starts with:
<?xml version="1.0"?>
or not. If that is the case, this is an old entry so we deserialize it in the old way. Otherwise, we deserialize using the new format.
We achieved that by decorating "Attributo" this way:
[DataContract(Name = "A", Namespace= "")]
public class Attributo
{
[DataMember(Name = "N")]
public virtual String Nome { get; set; }
[DataMember(Name = "V")]
public virtual String Valore { get; set; }
[DataMember(Name = "T")]
public virtual String Tipologia { get; set; }
}
and by performing the following changes to our serialization/deserialization methods, which now, for the new serialization technique, rely on DataContractSerializer object:
public static T Deserialize<T>(String xmlString)
{
//let's see if it's an old-style entry...
if (xmlString.StartsWith("<?xml version=\"1.0\"?>\r\n<ArrayOfAttributo>"))
{
try
{
XmlSerializer serializer = new XmlSerializer(typeof(T));
using (TextReader reader = new StringReader(xmlString))
{
return (T)serializer.Deserialize(reader);
}
}
catch { }
}
//..then it must be a new-style one
DataContractSerializer ser = new DataContractSerializer(typeof(T));
using (Stream s = _streamFromString(xmlString))
{
return (T) ser.ReadObject(s);
}
}
public static String Serialize<T>(T xmlObject)
{
MemoryStream stream1 = new MemoryStream();
DataContractSerializer ser = new DataContractSerializer(typeof(T));
ser.WriteObject(stream1, xmlObject);
stream1.Position = 0;
StreamReader sr = new StreamReader(stream1);
string xmlString = sr.ReadToEnd();
return xmlString;
}
private static Stream _streamFromString(string s)
{
MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream);
writer.Write(s);
writer.Flush();
stream.Position = 0;
return stream;
}
Everything seems to be working with this approach but we want to assess every possible risk before proceeding any further. Is this safe to use in production?