5

I'm trying to partition data by a column. However, when I run the the query MSCK REPAIR TABLE mytable, it returns error

Partitions not in metastore: city:countrycode=AFG city:countrycode=AGO city:countrycode=AIA city:countrycode=ALB city:countrycode=AND city:countrycode=ANT city:countrycode=ARE

I created the table from Avro by this query:

CREATE external table city (
  ID int,
  Name string,
  District string,
  Population int
)
PARTITIONED by (CountryCode string)
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='
{
  "fields": [
    {
      "name": "ID",
      "type": "int"
    },
    {
      "name": "Name",
      "type": "string"
    },
    {
      "name": "countrycode",
      "type": "string"
    },
    {
      "name": "District",
      "type": "string"
    },
    {
      "name": "Population",
      "type": "int"
    }
  ],
  "name": "value",
  "namespace": "world.city",
  "type": "record"
}
')
STORED AS AVRO
LOCATION "s3://mybucket/city"

My partition look like s3://mybucket/city/countrycode=ABC

Yves M.
  • 29,855
  • 23
  • 108
  • 144
Nhiên Ngô Đình
  • 544
  • 2
  • 6
  • 16

2 Answers2

4

This is an old question, and Athena seems to have added a warning message on this, but in case anybody else misses the first several times they try something similar...

Here is the message Athena gives when you create the table:

Query successful. If your table has partitions, you need to load these partitions to be able to query data. You can either load all partitions or load them individually. If you use the load all partitions (MSCK REPAIR TABLE) command, partitions must be in a format understood by Hive. Learn more.

It seems that the codes you are using to partition don't work with Hive (I was doing something similar, partitioning by a grouping code). So, instead of MSCK REPAIR TABLE, you need to run an ALTER TABLE for each partition (see: https://docs.aws.amazon.com/athena/latest/ug/partitions.html)

ALTER TABLE city ADD PARTITION (CountryCode='ABC') location 's3://mybucket/city/ABC/' ;

...and you'll have to run that each time you add new county code bucket.

Bart Mosley
  • 136
  • 5
0

You definitely need a trailing slash in your location: https://docs.aws.amazon.com/athena/latest/ug/create-table.html

Maybe also try lowercase for the partition column PARTITIONED by (countrycode string).

Did you try to add the partitions manually in Glue Catalog or via Crawler? Did this work?

Philipp Johannis
  • 2,718
  • 1
  • 15
  • 18