
I am trying to use the Amazon SageMaker linear-learner algorithm, which supports the content type 'application/x-recordio-protobuf'. In the preprocessing phase, I used scikit-learn to one-hot-encode my features. Then I pass the RecordIO-converted input data to the linear-learner estimator.

I used the package below, and the preprocessing conversion was successful.

from sagemaker.amazon.common import write_spmatrix_to_sparse_tensor
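Roughly, the conversion step looks like this (a simplified sketch; X_train and the encoder setup are illustrative, not my exact code):

import io
from sklearn.preprocessing import OneHotEncoder
from sagemaker.amazon.common import write_spmatrix_to_sparse_tensor

encoder = OneHotEncoder()                  # returns a scipy sparse CSR matrix by default
X_sparse = encoder.fit_transform(X_train)  # X_train is an illustrative feature array

buf = io.BytesIO()
write_spmatrix_to_sparse_tensor(buf, X_sparse.astype('float32'))
buf.seek(0)

My output_fn then handles the response encoding for the serial inference pipeline: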

from io import BytesIO

from sagemaker_containers.beta.framework import encoders, worker


def output_fn(prediction, accept):
    """Format prediction output.

    The default accept/content-type between containers for serial inference is JSON.
    We also want to set the ContentType or mimetype to the same value as accept so
    the next container can read the response payload correctly.
    """
    if accept == 'text/csv':
        return worker.Response(encoders.encode(prediction.todense(), accept), mimetype=accept)
    elif accept == 'application/x-recordio-protobuf':
        buf = BytesIO()
        write_spmatrix_to_sparse_tensor(buf, prediction)
        buf.seek(0)
        return worker.Response(buf, accept, mimetype=accept)
    else:
        raise RuntimeError("{} accept type is not supported by this script.".format(accept))

But when linear-learner reads the input records, it fails with the error below:

Caused by: [15:53:30] /opt/brazil-pkg-cache/packages/AIAlgorithmsCppLibs/AIAlgorithmsCppLibs-2.0.774.0/AL2012/generic-flavor/src/src/aialgs/io/iterator_base.cpp:100:

(Input Error) The header of the MXNet RecordIO record at position 810 in the dataset does not start with a valid magic number.

Sarath
    So I had a similar problem, but it had to do with _how_ I was saving my data to S3. Here's the code that worked for me: `bucket = 'my-bucket-name' buffer = io.BytesIO() smac.write_spmatrix_to_sparse_tensor(buffer, testVectors, testLabels) buffer.seek(0) key = 'my-key-name' boto3.client('s3').upload_fileobj(buffer, Bucket=bucket, Key=key, ExtraArgs={'ACL': 'bucket-owner-full-control'}) ` – matt May 01 '19 at 22:32
  • @matt consider updating to an answer. – MikeF Mar 11 '20 at 19:05
  • Good suggestion @MikeF – matt Mar 11 '20 at 19:09

3 Answers


I had a similar problem, but it had to do with how I was saving my data to S3.

Here's code that worked for me:

import io

import boto3
import sagemaker.amazon.common as smac

# testVectors: scipy sparse feature matrix; testLabels: label vector
buffer = io.BytesIO()
smac.write_spmatrix_to_sparse_tensor(buffer, testVectors, testLabels)
buffer.seek(0)

boto3.client('s3').upload_fileobj(buffer,
                                  Bucket='my-bucket-name',
                                  Key='my-key-name',
                                  ExtraArgs={'ACL': 'bucket-owner-full-control'})
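
Once the object is in S3, you can point the estimator at it with the matching content type; roughly like this (a sketch using SageMaker SDK v2 names; the estimator setup is assumed elsewhere):

from sagemaker.inputs import TrainingInput

train_input = TrainingInput('s3://my-bucket-name/my-key-name',
                            content_type='application/x-recordio-protobuf')
# estimator.fit({'train': train_input})  # estimator: your linear-learner Estimator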
matt

I had the same error. The position noted in the error was the last record in the test dataset.

How to fix: remove all other files from the S3 path. The data must live in a prefix with nothing else in it (besides the test and train data). Once I removed the other files, it ran fine.

(Input Error) The header of the MXNet RecordIO record at position XX in the dataset does not start with a valid magic number.
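
To double-check what is sitting under the prefix, a quick listing works (a sketch; the bucket and prefix names are placeholders):

import boto3

resp = boto3.client('s3').list_objects_v2(Bucket='my-bucket-name',
                                          Prefix='linear-learner/train/')
for obj in resp.get('Contents', []):
    print(obj['Key'])  # only the RecordIO data file(s) should be listed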

bstuart

Following matt's suggestion, the upload to S3 also fixed my problem. Interestingly, the error pointed to the last record for me as well. This AWS example and several others use a slightly different upload format, which I verified; matt's might work too.

import os
import boto3

key = 'recordio-pb-data'  # bucket, prefix, and buf are assumed defined as in the answers above
boto3.resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train', key)).upload_fileobj(buf)
MikeF