
I created a table using the Regex SerDe in Hive. Hue reports that the table was created successfully. However, when I try to query the table with SELECT * FROM pricefile_edited, or view it in Hue, it doesn't work and I get the error shown below.

Each line of the data is 130 characters long, with no delimiter.

Does anyone know what the problem is and how to fix it? Thanks.

CREATE EXTERNAL TABLE pricefile_edited(
field1 STRING,
field2 STRING,
field3 STRING,
field4 STRING,
field5 STRING,
field6 STRING, 
field7 STRING,
field8 STRING,
field9 STRING,
field10 STRING,
field11 STRING,
field12 STRING,
field13 STRING,
field14 STRING,
field15 STRING,
field16 STRING,
field17 STRING,
field18 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
 WITH SERDEPROPERTIES ("input.regex" = 
"(\\.{12})(\\.{1})(\\.{1})(\\.{24})(\\.{6})(\\.{6})(\\.{13})(\\.{6})(\\.{1})(\\.{4})(\\.{1})(\\.{3})(\\.{17})(\\.{9})(\\.{12})(\\.{1})(\\.{1})(\\.
{12})")
LOCATION '/user/hive/warehouse';

I get this error:

Bad status for request TFetchResultsReq(fetchType=0, operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xc3\xd7\x97\xd3coB\xa1\x90P\x9e\xab\x82\xa4\xf4A', guid='\x80\xa1\x93\xe2\x10\xefJ\xd9\xa3\xa3\xdb\x1f\x95\x85\x88\xb3')), orientation=4, maxRows=100): TFetchResultsResp(status=TStatus(errorCode=0, errorMessage='java.io.IOException: java.io.IOException: Not a file: hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories', sqlState=None, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:java.io.IOException: java.io.IOException: Not a file: hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories:25:24', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:463', 'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:294', 'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:769', 'sun.reflect.GeneratedMethodAccessor20:invoke::-1', 'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43', 'java.lang.reflect.Method:invoke:Method.java:498', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78', 'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36', 'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63', 'java.security.AccessController:doPrivileged:AccessController.java:-2', 'javax.security.auth.Subject:doAs:Subject.java:422', 'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1917', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59', 'com.sun.proxy.$Proxy21:fetchResults::-1', 'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:462', 'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:694', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624', 'java.lang.Thread:run:Thread.java:748', '*java.io.IOException:java.io.IOException: Not a file: hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories:29:4', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:508', 'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:415', 'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:140', 'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:2069', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:458', '*java.io.IOException:Not a file: hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories:32:3', 'org.apache.hadoop.mapred.FileInputFormat:getSplits:FileInputFormat.java:322', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextSplits:FetchOperator.java:363', 
'org.apache.hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java:295', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:446'], statusCode=3), results=None, hasMoreRows=None)

  • This is a sample of the first line of the data file (pricefile_edited.txt): 140219040187A0 A1ZM1M 1202105EDB 9HBO0000108790000000000000000000000000000013XS1116588559 – QBits Apr 22 '19 at 11:06

1 Answer


The table location seems wrong: /user/hive/warehouse looks like the default warehouse directory, which contains other tables' directories. The query fails on /user/hive/warehouse/categories, saying it is not a file; that is most likely the directory of the categories table.
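As a quick diagnostic check (a sketch; hdfs dfs -ls from a shell works the same way), you can list the warehouse directory from the Hive CLI to confirm it contains table subdirectories rather than data files:

hive> dfs -ls /user/hive/warehouse;

If the listing shows entries such as /user/hive/warehouse/categories, those are other tables' directories, which is why the fetch fails.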

Create a folder inside the /user/hive/warehouse directory and put your data file in it, like this:

/user/hive/warehouse/pricefiles/pricefile_edited.txt
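A minimal sketch of those HDFS steps, run from the Hive CLI (the pricefiles directory name is just the example used above, and this assumes pricefile_edited.txt is on the local filesystem of the node where you run the CLI):

hive> dfs -mkdir -p /user/hive/warehouse/pricefiles;
hive> dfs -put pricefile_edited.txt /user/hive/warehouse/pricefiles/;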

Change table location in the DDL:

LOCATION '/user/hive/warehouse/pricefiles';
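Since the table already exists, you can either drop and recreate it with the new LOCATION, or point the existing table at the new directory, for example (the namenode URI is taken from your error message):

hive> ALTER TABLE pricefile_edited SET LOCATION 'hdfs://quickstart.cloudera:8020/user/hive/warehouse/pricefiles';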

The regex is not correct. Each column should have a corresponding capture group (in parentheses) in the regex. Your pattern for the first column, for example, matches 12 literal dots, because \\. means the dot character literally. If you want any 12 characters, it should be (.{12}), without the two backslashes. Also account for the delimiters between fields (spaces, tabs, or whatever your data uses): (.{12})(.{1}) takes the first 12 characters of 140219078921B0 as the first column (140219078921) and B as the second column. Fix your regex accordingly, add the separators between groups where necessary, and remove the extra line break from the regex; write it as a single line.

You can test the regex in a simple way using regexp_extract(string, regexp, group_number):

hive> select regexp_extract('140219078921B0 A1DU1M 1223105DDB','(.{12})',1); --extract group number 1 (group 0 is the whole regexp)
OK
140219078921
Time taken: 1.057 seconds, Fetched: 1 row(s)

hive> select regexp_extract('140219078921B0 A1DU1M 1223105DDB','(.{12})(.{1})',2); --extract group number 2
OK
B
Time taken: 0.441 seconds, Fetched: 1 row(s)

And so on. Add more groups and test carefully.
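Putting the pieces together, a corrected DDL might look like the sketch below. It reuses the field widths from your original pattern (12, 1, 1, 24, 6, 6, 13, 6, 1, 4, 1, 3, 17, 9, 12, 1, 1, 12, which sum to your 130-character line length), switches the groups to (.{N}), writes the regex on a single line, and points LOCATION at the new subdirectory. It assumes any spaces in the data are part of the fixed-width fields; if they are actually separators, add them between the groups as described above. Verify the widths against your file before using it:

-- drop the old table first; for an EXTERNAL table this does not delete the data
DROP TABLE IF EXISTS pricefile_edited;

CREATE EXTERNAL TABLE pricefile_edited(
field1 STRING, field2 STRING, field3 STRING, field4 STRING,
field5 STRING, field6 STRING, field7 STRING, field8 STRING,
field9 STRING, field10 STRING, field11 STRING, field12 STRING,
field13 STRING, field14 STRING, field15 STRING, field16 STRING,
field17 STRING, field18 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" =
-- one capture group per column; widths taken from the original DDL
"(.{12})(.{1})(.{1})(.{24})(.{6})(.{6})(.{13})(.{6})(.{1})(.{4})(.{1})(.{3})(.{17})(.{9})(.{12})(.{1})(.{1})(.{12})")
LOCATION '/user/hive/warehouse/pricefiles';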

  • Thank you very much leftjoin. Your solution resolved the errors. – QBits Apr 22 '19 at 14:18
  • Sure. I just did that. It will not be publicly displayed because I have less than 15 reputation, but it's recorded. – QBits Apr 22 '19 at 14:36