
I have successfully loaded a dataset into DynamoDB. I then want to read the data back from DynamoDB, write it to a .csv file, and use that file in Weka to build clusters. Unfortunately, only some of the data read from DynamoDB ends up in the .csv file. Below is the snippet that reads from DynamoDB. I have 2201 records in my DynamoDB table, but writing to the file stops abruptly at the 1986th record, in the 3rd column. I have tried every solution I could find online but was not able to fix this. Could someone please help me with this?

//scanning the data from DynamoDB

ScanRequest scanRequest = new ScanRequest().withTableName(tablename[2]);
ScanResult result = client.scan(scanRequest);
for (Map<String, AttributeValue> item : result.getItems()) {
    printItem(item, writer);
}

//appending the data into an empty CSV file

private static void printItem(Map<String, AttributeValue> attributeList, FileWriter writer) {
    System.out.println("Inside printItem");
    try {
        int k = 1;
        for (Map.Entry<String, AttributeValue> item : attributeList.entrySet()) {
            AttributeValue value = item.getValue();
            String valueName = value.getS();
            writer.append(valueName);
            if (k <= 4) {
                writer.append(',');
            }
            ++k;
        }
        writer.append('\n');
        ++count;
    } catch (IOException e) {
        e.printStackTrace();
    }
}
– kirti

1 Answer


Scan is a paginated API: a single call returns at most 1 MB of data, so you have to keep calling it repeatedly, passing the LastEvaluatedKey of each response as the ExclusiveStartKey of the next request, until no LastEvaluatedKey is returned. More details are in the developer guide and API docs.
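A minimal sketch of that pagination loop, reusing the `client`, `tablename`, `writer`, and `printItem` names from your question (assumed to be set up as in your code):

```java
// Keep scanning until DynamoDB stops returning a LastEvaluatedKey,
// which signals that the final page has been read.
Map<String, AttributeValue> lastKey = null;
do {
    ScanRequest scanRequest = new ScanRequest()
            .withTableName(tablename[2])
            .withExclusiveStartKey(lastKey); // null on the first call
    ScanResult result = client.scan(scanRequest);
    for (Map<String, AttributeValue> item : result.getItems()) {
        printItem(item, writer);
    }
    lastKey = result.getLastEvaluatedKey(); // null when there are no more pages
} while (lastKey != null);
writer.flush(); // make sure buffered CSV rows actually reach the file
```

The explicit `flush()` at the end is also worth noting: a `FileWriter` buffers output, so records can appear to stop "abruptly" mid-column if the writer is never flushed or closed.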

The DynamoDBMapper SDK and document SDK (both ship with the aws-java-sdk) provide automatic pagination APIs, so you can treat your table as an Iterable instead of paginating yourself. There's an example of doing pagination with the low-level Java SDK, as you're doing, in this section of the developer guide.
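For illustration, a sketch of the DynamoDBMapper approach; `MyItem` here is a hypothetical class you would annotate to match your table's schema:

```java
// Hypothetical mapped class -- annotate with your real table and key names.
@DynamoDBTable(tableName = "my-table")
public class MyItem {
    private String id;

    @DynamoDBHashKey(attributeName = "id")
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
}

DynamoDBMapper mapper = new DynamoDBMapper(client);

// PaginatedScanList fetches further pages lazily as you iterate,
// so the whole table can be walked without manual ExclusiveStartKey handling.
PaginatedScanList<MyItem> items =
        mapper.scan(MyItem.class, new DynamoDBScanExpression());
for (MyItem item : items) {
    // write each item to the CSV here
}
```

The trade-off is that the mapper requires a typed class per table, whereas the low-level loop works directly with `Map<String, AttributeValue>`.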

– David Yanacek
  • Also, if you're looking to export your table into CSV files, you may be interested in the EMR integration, which can export your tables to CSV files in S3, even on a schedule, using Data Pipeline: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/OtherServices.html – David Yanacek Nov 09 '14 at 18:20