Lets say I have a partitioned table with multiple columns as partition keys e.g.
partitioned by (department string,year int, month int,day int)
So does this specific order really matter? All the online resources refer to advantage of scanning only specific sub-directories for search. But ultimately everything is a file in big data, directories seem to be more like logical grouping. And when one specifies a filter on partitioned column, hive just needs to know which files are involved and where they are located, not sure how directory is going to be useful -- it's not like as if directories are loaded in memory -- files are loaded in memory -- and the directory path is more like a label for a given file. If that's the case, no matter which order we specify for partitioning , it shouldn't matter. This is especially more evident in HDInsight where the underlying file system (BLOBs) has no concept of directory.