
For various reasons, we migrated from an old cluster to a new one. The new cluster did not work well at first, so we found some problems and fixed them.

While I was fixing them, some ETL jobs and SQL queries may have produced wrong data. How can I quickly compare the data of the same table on the two clusters?

I have tried using getmerge and a checksum to find the differences, but I'm not sure whether the two clusters split their results in the same way. In my opinion, the two clusters may produce a different number of data blocks, so the split within each block may differ.

How can I compare the two data sets? The data is almost the same, but the results are split into a different number of pieces. Both tables are large, and I have a lot of comparisons to do.

Does anyone have a solution for this?

Thanks a lot.

bulbcat

1 Answer


Yes, it is possible. You can create an external table that points to the data on the other cluster, and then query that cluster's tables from this one. You need to specify the location of the data in your CREATE TABLE statement.

Just make sure the permissions needed to access the other cluster's HDFS exist and are consistent (i.e. the Kerberos realms are trusted), and that the staging directory setting points to the location of the data.

It can look like this:

CREATE EXTERNAL TABLE othertable (a INT, b STRING, c INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 'hdfs://{Name service on second cluster}/<path to table>';
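
If pulling a whole table across clusters is too expensive, one alternative is to compute a fingerprint on each cluster that does not depend on how the rows are split into files, and then compare only the small results. The following is a minimal Python sketch of that idea, not a Hive or HDFS API: the function names `row_digest`, `table_fingerprint`, and `merge_fingerprints` are hypothetical, and it assumes every row can be serialized to the same text line on both clusters.

```python
import hashlib
from typing import Iterable, Tuple

MASK = (1 << 64) - 1  # keep the running sum within 64 bits

def row_digest(row: str) -> int:
    """Hash one serialized row to a 64-bit integer (hypothetical helper)."""
    digest = hashlib.sha256(row.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def table_fingerprint(rows: Iterable[str]) -> Tuple[int, int]:
    """Return (row count, sum of row hashes mod 2**64).

    Addition is commutative, so the result does not depend on row order
    or on how the rows were split across files. A sum is used instead of
    XOR so that duplicate rows do not cancel each other out.
    """
    count = 0
    total = 0
    for row in rows:
        count += 1
        total = (total + row_digest(row)) & MASK
    return count, total

def merge_fingerprints(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """Combine the fingerprints of two part files into one."""
    return a[0] + b[0], (a[1] + b[1]) & MASK

if __name__ == "__main__":
    rows = ["1,a,10", "2,b,20", "3,c,30"]
    # Same rows, different split and order: the fingerprints match.
    one_file = table_fingerprint(rows)
    two_files = merge_fingerprints(
        table_fingerprint(rows[2:]), table_fingerprint(rows[:2])
    )
    print(one_file == two_files)
```

Each cluster would run this over its own part files, so only a count and one integer per table need to cross the network. Matching fingerprints make it very likely, though not certain, that the tables hold the same rows. A mismatch only says that the tables differ, not where, so a differing table still needs a closer comparison, for example per partition.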
Strick
  • Thanks a lot. It seems like a way to deal with it. But actually the two clusters are located in different cities. So could this approach cost a lot of bandwidth? – bulbcat Jan 09 '20 at 11:05
  • Yes, it may, but I don't think there is any other way, except to distcp the data from one cluster to the other, which amounts to the same thing. – Strick Jan 09 '20 at 11:26
  • Thanks. I'd prefer to keep it as a backup way to compare some important data. To limit the bandwidth impact, I will run some tests to see whether the same SQL produces the same number of data splits with the same checksum result. Thanks again, you've helped me a lot. – bulbcat Jan 09 '20 at 11:44