
We are experiencing some difficulty unpacking COMP-3 fields containing both numeric and date data from a file provided to us by one of our vendors.

The file specification provides the following information:

0066-0070 DATE-OPENED S9(9) COMP-3

The specification indicates that the date will expand to MMDDYYYY format.

When I retrieve this block of data from the file, I can load it into memory and see that I retrieve 5 bytes of data. (In the file, there is one byte per character.) The bytes retrieved are as follows:

0: 10
1:  0
2: 18
3:  0
4:  2

There's no sign overpunched into the least significant digit (where it always appears), so that's not an issue here. The bits expand into the following nibbles:

0 1 0 0 1 2 0 0 0 2

There are several problems here:

  1. It's highly unlikely that 01001200 represents a valid date in MMDDYYYY format, and yet this seems to be how the data was packed into the field.

  2. When a COMP-3 field is unpacked, the template specifies that it should expand to 9 characters, but if a COMP-3 field is expanded, its size will ALWAYS double (producing a string with an even number of characters). As a result, there is a mismatch between the expected size and the unpacked size.

  3. No algorithm that I can find on the web seems to work for unpacking this data. Nothing seems to be able to come up with a recognizable date for any of the (supposedly) BCD values in our source file.

At this point I suspect that we may not be dealing with a true BCD format. However, keeping in mind that I should always doubt myself and not the tool, I am seeking suggestions for what I could be doing wrong in both my understanding of the COMP-3 format and the nature of the data I'm looking at.

My understanding of the format is taken from the following sources:

It's worth noting that I have attempted converting the data from EBCDIC to ASCII and vice versa before attempting to unpack it; neither produced any intelligible results. I've attempted every algorithm I could find on the Internet, and none of them seem to be producing any useful results.

I suppose, in the end, my question is: Am I actually dealing with BCD or COMP-3 data here?

Update

In answer to some questions:

  1. Once I've determined that the value contains a sign nibble, I clear that nibble.
  2. I've included all the code I can so that you can see exactly what I'm doing. The class is designed to provide lots of diagnostic info (like Bytes and Nibbles properties) so that you can see what it came up with after you parse the value.

We have both original and unparsed files on hand to use as reference materials. The date I'm expecting to get back is something along the lines of 06152008 (that's off the top of my head, but you get the gist). The value I'm computing is nothing like that.

Per request, the individual nibbles:

0 1 0 0 1 2 0 0 0 2

And for those who are interested in how I'm doing it, the class that's unpacking:

using System.Collections.Generic;
using System.Linq;
using System.Text;

internal class PackedDecimal
{
    #region Fields
    private bool _isPositive;
    private bool _isNegative;
    private bool _isUnsigned = true;

    #endregion

    #region Constructor
    /// <summary>
    /// Initializes a new instance of the <see cref="PackedDecimal"/> class.
    /// </summary>
    public PackedDecimal()
    {
    }

    /// <summary>
    /// Initializes a new instance of the <see cref="PackedDecimal"/> class.
    /// </summary>
    /// <param name="compressedDecimal">The compressed decimal.</param>
    public PackedDecimal(string compressedDecimal)
    {
        this.ParsedValue = this.Parse(compressedDecimal);
    }
    #endregion

    #region Properties
    /// <summary>
    /// Gets the bytes.
    /// </summary>
    public IEnumerable<byte> Bytes { get; private set; } 

    /// <summary>
    /// Gets the hexadecimal values.
    /// </summary>
    public IEnumerable<string> HexValues { get; private set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is positive.
    /// </summary>
    /// <value>
    /// <c>true</c> if this instance is positive; otherwise, <c>false</c>.
    /// </value>
    public bool IsPositive
    {
        get { return this._isPositive; }
        set
        {
            // Store the new value and keep the negative flag consistent with it.
            this._isPositive = value;
            this._isNegative = !value;
            this._isUnsigned = false;
        }
    }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is negative.
    /// </summary>
    /// <value>
    /// <c>true</c> if this instance is negative; otherwise, <c>false</c>.
    /// </value>
    public bool IsNegative
    {
        get { return this._isNegative; }
        set
        {
            this._isNegative = value;
            this._isPositive = !value;
            this._isUnsigned = false;
        }
    }

    /// <summary>
    /// Gets a value indicating whether this instance is unsigned.
    /// </summary>
    /// <value>
    /// <c>true</c> if this instance is unsigned; otherwise, <c>false</c>.
    /// </value>
    public bool IsUnsigned { get { return this._isUnsigned; } }

    /// <summary>
    /// Gets the nibbles.
    /// </summary>
    public IEnumerable<int> Nibbles { get; private set; }

    /// <summary>
    /// Gets the parsed value.
    /// </summary>
    public string ParsedValue { get; private set; }
    #endregion

    /// <summary>
    /// Parses the specified value.
    /// </summary>
    /// <param name="value">The value.</param>
    /// <returns></returns>
    public string Parse(string value, SourceEncoding sourceEncoding = SourceEncoding.Ascii, int decimalPlaces = 0)
    {
        var localValue = value; // Encoding.Convert(Encoding.ASCII, Encoding.GetEncoding("IBM037"), value.ToByteArray()).FromByteArray();
        var sign = this.GetSign(localValue, out localValue);
        var bytes = localValue.ToByteArray();
        var nibbles = new List<int>();

        var buffer = new StringBuilder();

        foreach (var b in bytes)
        {
            var hi = (int)b.HiNibble();
            var lo = (int)b.LoNibble();
            nibbles.Add(hi);
            nibbles.Add(lo);

            buffer.AppendFormat("{0}{1}", hi, lo);
        }

        this.Bytes = bytes;
        this.Nibbles = nibbles;
        this.HexValues = nibbles.Select(v => v.ToString("X"));

        switch (sign)
        {
            case Sign.Unsigned:
                this.ParsedValue = buffer.ToString();
                break;

            case Sign.Positive:
                this.ParsedValue = "+" + buffer;
                break;

            case Sign.Negative:
                this.ParsedValue = "-" + buffer;
                break;
        }

        this.IsPositive = sign == Sign.Positive;
        this.IsNegative = sign == Sign.Negative;

        return this.ParsedValue;
    }

    #region GetSign Method
    /// <summary>
    /// Gets the sign for the packed decimal represented by this instance.
    /// </summary>
    /// <param name="value">The value to analyze.</param>
    /// <param name="buffer">Receives <paramref name="value"/>, less the sign digit if it is present.</param>
    /// <returns>The sign for the packed decimal represented by this instance.</returns>
    /// <remarks>If the value provided does not include a sign digit, it is assumed to be unsigned.</remarks>
    private Sign GetSign(string value, out string buffer)
    {
        var lastDigit = value.ToByteArray().Last();
        var loNibble = lastDigit.LoNibble();
        var hiNibble = lastDigit.HiNibble();

        var result = Sign.Unsigned;

        var hasSignDigit = true;

        switch (hiNibble)
        {
            case 0xC0: // "c"
                result = Sign.Positive;
                break;

            case 0xD0: // "d"
                result = Sign.Negative;
                break;

            case 0xF0: // "f"
                result = Sign.Unsigned;
                break;

            default:
                hasSignDigit = false;
                break;
        }

        // Remove the sign digit if it's present.
        buffer = hasSignDigit 
            ? value.Substring(0, value.Length - 1) + loNibble 
            : value;

        return result;
    }
    #endregion

    #region Sign Enum
    private enum Sign
    {
        Unsigned,
        Positive,
        Negative
    }
    #endregion
}

And the extension methods that support it:

using System;
using System.Linq;
using System.Text;

public static class Extensions
{
    /// <summary>
    /// Gets the high nibble (the high 4 bits) from a byte.
    /// </summary>
    /// <param name="value">The byte from which the high 4-bit nibble will be retrieved.</param>
    /// <returns>A byte containing the value of this byte, with all bits shifted four bits to the right.</returns>
    public static byte HiNibble(this byte value)
    {
        return (byte)((value & 0xF0) >> 4);
    }

    /// <summary>
    /// Gets the low nibble (the lowest 4 bits) from this byte.
    /// </summary>
    /// <param name="value">The byte from which the low 4-bit nibble will be retrieved.</param>
    /// <returns>A byte containing the value of this byte, with the high four bits discarded.</returns>
    public static byte LoNibble(this byte value)
    {
        return (byte)(value & 0x0F);
    }

    /// <summary>
    /// Gets the individual bytes from a string.
    /// </summary>
    /// <param name="value">The string to convert to a byte array.</param>
    /// <returns>An array of bytes representing the string.</returns>
    public static byte[] ToByteArray(this string value)
    {
        var bytes = new byte[Encoding.ASCII.GetByteCount(value)];
        Buffer.BlockCopy(value.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }
}

public enum SourceEncoding
{
    Ascii,
    Ebcdic
}
Mike Hofer
  • You mean it is X'0100120002'? – Bill Woodger Oct 16 '15 at 18:27
  • No; my understanding is that each nibble stores a *decimal* digit (0-9). The digits are simply packed tighter together because those values can be squeezed into 4 bits each. It's literally "0100120002", which is meaningless in date terms. – Mike Hofer Oct 16 '15 at 18:49
  • I'm guessing that the data is not COMP-3. Reading the page on packed fields (2nd link in your question), the low nybble of the last byte should be C, D, or F (12, 13, or 15). The second clue for me is that the first byte read should not contain a Hex "A" nybble. That does not translate to a 0-9 digit very well. So, if we even reverse the bytes, it still doesn't conform to the packed format that it should. – Martin Soles Oct 16 '15 at 19:11
  • @MartinSoles where do you see an A? – Bill Woodger Oct 16 '15 at 19:26
  • Thanks. The answer was Yes. – Bill Woodger Oct 16 '15 at 19:26
  • Is the data you are showing from the source file, or after you've "converted" it from EBCDIC (or done anything else to it)? – Bill Woodger Oct 16 '15 at 19:30
  • @BillWoodger The first byte is a 10, in Hex that would be 0A. – Martin Soles Oct 16 '15 at 19:35
  • @MartinSoles OK, I guess I still don't know what the actual value is :-) – Bill Woodger Oct 16 '15 at 19:57
  • Mike, can you show the value, nybble by nybble? What I meant by the X'...' notation was the value, with a hexadecimal representation of each byte. Is that what it looks like in hex, binary, whatever you want to call it? Do you know what value it should be (did the vendor list values of their sample data)? – Bill Woodger Oct 16 '15 at 20:01
  • x0A00120002 The last 2 should be ignored (it's supposed to be the sign). If that A was supposed to be a 1, you would have the date [0] 10-01-2000. Now, whether this is January 10th or October 1st is up to debate. :) – Martin Soles Oct 16 '15 at 20:22
  • @BillWoodger I've updated the post to provide the nibbles and the code I'm using to arrive at this. It's more than likely something I'm doing wrong, but I've been staring at this for a week so I will totally not be surprised. – Mike Hofer Oct 16 '15 at 20:55
  • @MartinSoles I remove the sign from the nibbles once I've determined that it's present. That way it doesn't skew the result. See the code, newly attached. – Mike Hofer Oct 16 '15 at 20:56
  • Thanks. Can you show the original data please? – Bill Woodger Oct 16 '15 at 20:59
  • I agree with @MartinSoles, given MMDDYYYY and a start date, that could easily be 1st October, 2000. A packed-decimal (on an IBM Mainframe, it is up to the compiler how data is stored, so can't be sure about all COBOL compilers) does not have an overpunched sign. A zoned-decimal does. For packed-decimal it is the low-order nybble which contains the sign indication. You seem to be testing the high-order, and you also seem to have shifted the value. You *must not* "convert" this to ASCII. It is not EBCDIC, it is just binary values. Convert and you will pickle the data. – Bill Woodger Oct 16 '15 at 23:53
  • The field is defined with nine digits, but only eight are used. The first digit will always be zero, and you can ignore it (or, better, validate that it is zero). The next two digits, spanning a byte, are the MM, then the DD, then the YYYY. For the sign, A through F are "valid". If you look at the tag info for comp-3, you'll see what A, B and E represent (I wrote the tag wiki and excerpt for comp-3). The other thing is, for a signed field (the S in the S9(9)) you should not see F. – Bill Woodger Oct 16 '15 at 23:58
  • For a signed field, the best approach is to validate for C or D, and reject the data for A, B, E and F. You want to start off with good data, data which "conforms to PICture" in COBOL-speak. Even better, it is a date. You should only see "positive" dates, so only expect C and reject anything else. Even better, lots better, would be for the vendor not to give you packed-decimal or binary fields, but to convert them to zoned-decimal with a separate sign, and an actual decimal, or a fixed number of decimals, or a scaling value, whichever is the most convenient for you. Perhaps too late, or not possible. – Bill Woodger Oct 17 '15 at 00:03
  • Did you get a solution? – Bill Woodger Oct 20 '15 at 21:22
  • I presume this is C#, and C# uses Unicode? In which case your whole approach will not work. The string in PackedDecimal(string compressedDecimal) implies an EBCDIC to Unicode conversion, which will screw everything up. – Bruce Martin Oct 21 '15 at 04:56
  • To process comp-3 values, you need to process the raw EBCDIC bytes; no character conversion. – Bruce Martin Oct 21 '15 at 04:58
  • Since you have the copybook, you could try the Cobol2Csv or Cobol2Xml programs in JRecord or viewing the file in the RecordEditor (it can save the file as Csv or Xml) (note: both require java and I am the author of both). – Bruce Martin Oct 21 '15 at 05:02
  • @BillWoodger You pointed me in the right direction. I'm no longer attempting any conversion from EBCDIC or ASCII, but instead working with the raw bytes. I'm also looking at the low order nibble of the least significant byte in the array to get the sign digit. However, some of the data is still coming out wonky. Still working on it. – Mike Hofer Oct 21 '15 at 17:28
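
The approach the comments converge on (work on the raw bytes, take the sign from the low-order nibble of the last byte, expect nine digits with a leading zero followed by MMDDYYYY) might be sketched roughly as below. This is only an illustration, not the vendor's or any library's method: the class name `PackedDateSketch`, both method names, and the sample bytes are hypothetical, and it assumes the field arrives as an untouched `byte[]` and that only a C (positive) sign is acceptable for a date.

```csharp
using System;
using System.Text;

// Hypothetical helper; names and validation rules are assumptions for illustration.
public static class PackedDateSketch
{
    // Expands raw packed-decimal bytes into a digit string and reports the sign nibble.
    // The input must be the raw field bytes, never a string that went through an encoding.
    public static string UnpackDigits(byte[] raw, out int signNibble)
    {
        if (raw == null || raw.Length == 0)
            throw new ArgumentException("Field is empty.", "raw");

        // The low-order nibble of the LAST byte is the sign indicator, not a digit.
        signNibble = raw[raw.Length - 1] & 0x0F;

        var digits = new StringBuilder(raw.Length * 2 - 1);
        for (var i = 0; i < raw.Length; i++)
        {
            AppendDigit(digits, (raw[i] & 0xF0) >> 4);

            // Every low nibble is a digit except the final one, which is the sign.
            if (i < raw.Length - 1)
                AppendDigit(digits, raw[i] & 0x0F);
        }

        return digits.ToString();
    }

    // Reads an S9(9) COMP-3 date (5 bytes) and returns the 8 MMDDYYYY digits.
    public static string UnpackDate(byte[] raw)
    {
        int sign;
        var digits = UnpackDigits(raw, out sign);

        // Assumption: a date should only ever be positive, so only 0xC is accepted.
        if (sign != 0xC)
            throw new FormatException("Unexpected sign nibble 0x" + sign.ToString("X") + ".");

        // Nine digits are defined but only eight are used; the first must be zero.
        if (digits.Length != 9 || digits[0] != '0')
            throw new FormatException("Expected 9 digits with a leading zero.");

        return digits.Substring(1);
    }

    private static void AppendDigit(StringBuilder buffer, int nibble)
    {
        if (nibble > 9)
            throw new FormatException("Nibble 0x" + nibble.ToString("X") + " is not a decimal digit.");

        buffer.Append((char)('0' + nibble));
    }
}
```

With the made-up bytes `{ 0x00, 0x61, 0x52, 0x00, 0x8C }`, `PackedDateSketch.UnpackDate` returns `"06152008"`: five bytes give ten nibbles, of which nine are digits and the last is the sign. The five bytes listed in the question (10, 0, 18, 0, 2) would be rejected by this sketch, because the first byte contains an A nibble and the final low nibble is 2 rather than a sign value.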
