Questions tagged [aws-databrew]
19 questions
1
vote
0 answers
Why do PyArrow and pandas DataFrame compression create larger files than AWS DataBrew?
I'm writing a DataFrame to a Parquet file using pyarrow or the pandas DataFrame method 'to_parquet'; both take a parameter that specifies the kind of compression to apply. The issue is that when I generate the Parquet files using these…

user20035230
- 11
- 2
1
vote
0 answers
AWS DataBrew DateTime format conversion error
I've imported a CSV file into AWS DataBrew. By default, it has converted every date-time column to a string. I need to check whether a field is in date-time format or not. When I try to convert the "Source" column into "timeStamp" format, it gives…

Mayank Kaushik
- 11
- 1
1
vote
0 answers
AWS Glue Databrew, manipulating data with code
In Glue DataBrew, there is a feature that lets you manipulate data with recipes. It supports conditions, so you can prepare the data without coding. For example, you can say "if the City column's value is 'New York', make it 'NY'" with the key condition words…

o_o
- 11
- 1
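The rule quoted in the question above ("if the City column's value is 'New York', make it 'NY'") can be written out in plain Python to make the intended logic concrete. This is only a sketch of the rule itself, not of DataBrew's recipe syntax; the function name `apply_city_rule` and the sample rows are hypothetical.

```python
def apply_city_rule(rows, column="City", match="New York", replacement="NY"):
    """Return new rows where `column` equal to `match` is set to `replacement`."""
    return [
        # Copy every row; overwrite the column only when the condition holds.
        {**row, column: replacement} if row.get(column) == match else dict(row)
        for row in rows
    ]


# Hypothetical sample data, for illustration only.
rows = [
    {"City": "New York", "Sales": 10},
    {"City": "Boston", "Sales": 7},
]
cleaned = apply_city_rule(rows)
print(cleaned)
```

The input rows are left unmodified, so the same rule can be re-run with a different `match` or `replacement` on the original data.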
1
vote
1 answer
Unable to reach AWS Glue to get connection in DataBrew
I'm trying to get started with AWS DataBrew using a connection to Redshift. I added the connection to AWS Glue and it works when tested. When DataBrew tries to use this connection, it gives the following error. Both DataBrew and Glue are on the same…

vivekpadia70
- 1,039
- 3
- 10
- 30
0
votes
0 answers
Can I have AWS Glue DataBrew update a Redshift table instead of creating a new one every time the job runs?
We are looking to implement AWS Glue Databrew for our analysts to have more control over datasets they want to create and then use. When testing, I have found that I can have the output go to Redshift and create a new table every time the job runs,…

jtl7034
- 1
0
votes
0 answers
Aws Data Brew -
I have a requirement: my data has 3 columns, A, B, and C. I want to put an if..else / switch condition on column C and, if it is true, return the value A*B. That is, the output of any condition should be the result of a math function on columns A and B. Please help me achieve this…
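As a plain-Python sketch of the logic described above (not DataBrew recipe syntax), the conditional product could look like this. The function name `product_when`, the `result` column, the use of `None` for the "else" branch, and the sample condition on C are all assumptions, since the excerpt does not say what the condition is.

```python
def product_when(rows, predicate):
    """Add a `result` column: A*B where predicate(C) is true, otherwise None."""
    out = []
    for row in rows:
        # The "else" value is an assumption; the question does not specify one.
        result = row["A"] * row["B"] if predicate(row["C"]) else None
        out.append({**row, "result": result})
    return out


# Hypothetical sample data and condition, for illustration only.
rows = [
    {"A": 2, "B": 3, "C": "yes"},
    {"A": 4, "B": 5, "C": "no"},
]
flagged = product_when(rows, lambda c: c == "yes")
print(flagged)
```

Passing the condition in as a function keeps the if..else / switch part separate from the arithmetic on A and B.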
0
votes
0 answers
Attribute not found with AWS SAM databrew?
I am trying to create a data quality validation for a set of files in S3. For that I have chosen AWS DataBrew and have created a dataset, data quality rules
and a data profile job via a SAM template.
Here, once a dataset is created, I have to refer the…

Dhivakhar Venkatachalam
- 345
- 2
- 16
0
votes
0 answers
Why does an AWS Glue DataBrew project, when created, try to load all data from the RDS table?
When we create a DataBrew project and refer to a dataset based on a JDBC connection,
we find that RDS (a MySQL database) executes the query `select * from table`. But
our table contains a huge amount of data. Why is the complete data being loaded? Can we…

SB1306
- 1
- 1
0
votes
0 answers
How to flatten a json file which has structs and arrays in the same file with AWS Glue
I used to work with Azure, but I'm very new to AWS. I have a situation similar to this question. I have lots of JSON files in an S3 bucket, and the files contain structs and arrays. I need to flatten both and store the result in S3 again.
End of…

Ensar
- 17
- 4
0
votes
0 answers
How to select all columns within a JSON structure in aws databrew?
DataBrew recipes can be written in JSON for transformations that will be used more than once for multiple datasets.
This is an example that I copied from the DataBrew Developer Guide to do joins between datasets:
`
{
"Action": {
…

tenayta
- 1
- 2
0
votes
1 answer
AWS S3 bucket notification lambda throws exception (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey)
We have an AWS Glue DataBrew job which puts its output into an S3 bucket folder. A Java Lambda is then notified of this Put event. But the following sample code throws an exception:
S3EventNotification.S3EventNotificationRecord record =…

handle_009
- 11
- 2
0
votes
0 answers
Could not find an active AWS Glue VPC interface endpoint. Could not find an active NAT
I'm trying to create an AWS DataBrew job that pulls data from an S3 folder into an AWS RDS SQL Server table and receive the following:
"AWS Glue VPC interface endpoint validation failed for SubnetId: subnet-xxx9574. VPC: vpc-xxxdd2. Reason: Could not…
0
votes
0 answers
Sharing a recipe with a colleague?
AWS Glue Databrew noob here. My colleague and I have a shared s3 bucket with our datasources. We'd like to work on the same Glue Databrew project, or failing that, at least share recipes.
Looks like the only way to share a recipe between two…

Charlie
- 193
- 1
- 2
- 9
0
votes
0 answers
AWS DataBrew does not support selecting a role name containing '.' from the console
AWS DataBrew does not allow selecting a role that has a '.' in its name while creating a project. I tried using the boto3 API to create a project with the same role name and it went through. It looks like it's only an issue in the console. It does not even give an error in…

rishabh srivastava
- 45
- 10
0
votes
1 answer
In AWS Databrew, how can I stop the Databrew job from partitioning the result file?
All DataBrew jobs that save their result in S3 create partitions of the resulting file. Because of this, we need to merge these partitioned files before we can use them in Excel. Is there any way by which I can stop the DataBrew job from partitioning the…

Arijit
- 25
- 5
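The merge step mentioned in the last question above can be sketched with the Python standard library. It assumes the part files are CSV, have already been downloaded locally, and each start with the same header row; the function name `merge_csv_parts` and all paths are hypothetical, and it does not change how DataBrew writes its output.

```python
import csv


def merge_csv_parts(part_paths, merged_path):
    """Concatenate CSV part files that share one header into a single file."""
    header = None
    with open(merged_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in part_paths:
            with open(path, newline="") as part:
                reader = csv.reader(part)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip empty part files
                if header is None:
                    # Write the header once, taken from the first non-empty part.
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"{path} has a different header")
                # Copy the remaining data rows of this part.
                writer.writerows(reader)
```

A call such as `merge_csv_parts(sorted(Path("output").glob("*.csv")), "merged.csv")` (with `from pathlib import Path`, and `output` as a placeholder directory) would then produce one file that Excel can open.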