I'm trying to build a model workflow in AWS SageMaker, using Data Wrangler for preprocessing. I load data from several tables in a Redshift instance, then mutate and join them as required to build the model input data.
I'm a contractor working for a company that has provisioned some resources in its AWS environment for me to work in, and I am reading from a production database. If I do not open the Data Wrangler flow early enough in the day (which I suspect is related to load on their system), some of the nodes I have created will not validate, and instead show a red cross and the following error message:
RedshiftQueryExecutionIdValidationError: An error occurred when trying to invoke `describe_statement`: An error occurred (ValidationException) when calling the DescribeStatement operation: Could not retrieve the query result as it has expired after 1655759552.
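The number at the end of that message looks to me like a Unix epoch timestamp in seconds, although that is my own assumption and the message does not say so. Here is a minimal Python sketch of how I decoded it; the helper name `expiry_from_message` is just something I made up for illustration:

```python
import re
from datetime import datetime, timezone

# Tail of the error message shown above.
ERROR_MESSAGE = (
    "Could not retrieve the query result as it has expired after 1655759552."
)


def expiry_from_message(message: str) -> datetime:
    """Pull the trailing number out of the message and read it as
    Unix epoch seconds in UTC (an assumption on my part)."""
    match = re.search(r"expired after (\d+)", message)
    if match is None:
        raise ValueError("no expiry timestamp found in message")
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


if __name__ == "__main__":
    print(expiry_from_message(ERROR_MESSAGE).isoformat())
```

If that reading is right, the query result expired at 21:12:32 UTC on 20 June 2022, which would suggest the flow is trying to fetch the result of a query that ran some time earlier rather than re-running it.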
The remaining un-errored nodes appear to hang in a loading/validating state. Here's a screenshot of part of the flow in this state:
I'm not sure if it's related, but I occasionally see error messages pop up saying something about "too many inflight requests".
My main issue, I think, is a lack of context. I have not worked in this environment before, and am finding it difficult to diagnose the issue. It might be possible to provision more resources, and I could likely trim down some of the data before reading it in, but I'd like to be able to read the error messages and understand what is causing the nodes to error, so that I can decide on the appropriate course of action.
Can somebody please help explain what's going on here?