
I'm new to Talend. I'm trying to read data from HBase, apply some transformations to it in the expression builder of a Big Data Batch job, and write the output to a file.

Now I want to get the row key of the table and apply a transformation to it, like this:

(concat('-',cast(cus.key as string))) as id

Here key is the row key of the HBase table I'm reading data from.
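For readers who think in Java rather than SQL, here is a minimal sketch of what that expression is meant to produce. It assumes the row key was written as a UTF-8 string (HBase's `Bytes.toString` also decodes as UTF-8); the class and method names are hypothetical and not part of Talend or HBase.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a Talend or HBase class.
class RowKeyToId {

    // Decodes the raw row key bytes as UTF-8 and prefixes them with '-',
    // which is the plain-Java equivalent of concat('-', cast(key as string)).
    static String toId(byte[] rowKey) {
        String key = new String(rowKey, StandardCharsets.UTF_8);
        return "-" + key;
    }

    public static void main(String[] args) {
        byte[] rowKey = "cust001".getBytes(StandardCharsets.UTF_8);
        System.out.println(toId(rowKey)); // prints "-cust001"
    }
}
```

If the row key is stored in a binary encoding (for example a serialized long), decoding it as a string would not give a readable value, so this only applies to string keys.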

I'm attaching a snapshot of the mapping tab.


So when I run my job, the row key of the HBase table should be picked up, so that the transformation above, cast(cus.key as string), is applied to it and the result is stored as a column id.

Is there an easy way to get the row key from the HBase table?

Thanks in advance.

jack AKA karthik

2 Answers


First of all, you need to create a custom row key (in the tHBaseOutput options) when you load your data into HBase.

You can use an ID field to make it unique, for example "key"+user_id.

Follow this : Here

At the same time, store the same value ("key"+user_id) in a column named, for example, row_key_technical.

Now you can use the row key like a normal column in your table. With a tHBaseInput you can retrieve the row key stored in the technical column and do whatever you want with it.

You need to do it in two steps.

I'm not sure this is the only solution, but it is one. Maybe someone has a better solution ;).

Théo Capdet
  • You want me to create a new HBase table from the old one? That is not possible for me. I just want to retrieve the row key from the HBase table. – jack AKA karthik Jan 27 '17 at 05:01
  • I found a workaround to retrieve the row key from the HBase table by changing the code of the tHBaseInput component in C:\Program Files (x86)\Talend-Studio\studio\plugins\org.talend.designer.components.mrprovider_6.2.1.20160704_1411\components\tHBaseInput – jack AKA karthik Jan 30 '17 at 08:56

You can force the tHBaseInput component to fetch the row key of the HBase table. To do so, go to the location where the tHBaseInput component is stored:

C:\Program Files (x86)\Talend-Studio\studio\plugins\org.talend.designer.components.mrprovider_6.2.1.20160704_1411\components\tHBaseInput

In the tHBaseInput_mrcode_main_only Java JET template there is a method validateResult(), which looks like this:

    public boolean validateResult(org.apache.hadoop.hbase.client.Result result,
                    <%=recordStruct%> value) throws IOException {
                org.apache.hadoop.hbase.io.ImmutableBytesWritable rowKey = new org.apache.hadoop.hbase.io.ImmutableBytesWritable();
                rowKey.set(result.getRow());
                lastSuccessfulRow = rowKey.get();

                byte[] rowResult = null;
                String temp = null;

                <%
                for (int i = 0; i < mapping.size(); i++) {
                    Map<String, String> map = mapping.get(i);
                    String family_column= map.get("FAMILY_COLUMN");
                    IMetadataColumn column = mainColumns.get(i);
                    String columnName = column.getLabel();
                    String defaultValue = column.getDefault();
                    String typeToGenerate = JavaTypesManager.getTypeToGenerate(column.getTalendType(), column.isNullable());
                    JavaType javaType = JavaTypesManager.getJavaTypeFromId(column.getTalendType());
                    String patternValue = column.getPattern() == null || column.getPattern().trim().length() == 0 ? null : column.getPattern();
                    boolean isPrimitiveType = JavaTypesManager.isJavaPrimitiveType(javaType, column.isNullable());
                    String toAssign = "value." + columnName;

                    %>

                    rowResult = result.getValue(
                            org.apache.hadoop.hbase.util.Bytes.toBytes(<%=family_column%>),
                            org.apache.hadoop.hbase.util.Bytes.toBytes("<%=column.getOriginalDbColumnName()%>"));
                    temp = org.apache.hadoop.hbase.util.Bytes.toString(rowResult);

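Modify the method as shown below: assign the row key to value.key before the loop, and add a check inside the loop for the column named key.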

    public boolean validateResult(org.apache.hadoop.hbase.client.Result result,
                <%=recordStruct%> value) throws IOException {
            org.apache.hadoop.hbase.io.ImmutableBytesWritable rowKey = new org.apache.hadoop.hbase.io.ImmutableBytesWritable();
            rowKey.set(result.getRow());
            lastSuccessfulRow = rowKey.get();

            byte[] rowResult = null;
            String temp = null;
            value.key = org.apache.hadoop.hbase.util.Bytes.toString(lastSuccessfulRow);
            <%
            for (int i = 0; i < mapping.size(); i++) {
                Map<String, String> map = mapping.get(i);
                String family_column= map.get("FAMILY_COLUMN");
                IMetadataColumn column = mainColumns.get(i);
                String columnName = column.getLabel();
                String defaultValue = column.getDefault();
                String typeToGenerate = JavaTypesManager.getTypeToGenerate(column.getTalendType(), column.isNullable());
                JavaType javaType = JavaTypesManager.getJavaTypeFromId(column.getTalendType());
                String patternValue = column.getPattern() == null || column.getPattern().trim().length() == 0 ? null : column.getPattern();
                boolean isPrimitiveType = JavaTypesManager.isJavaPrimitiveType(javaType, column.isNullable());
                String toAssign = "value." + columnName;

                %>
                if(!"key".equalsIgnoreCase("<%=column.getOriginalDbColumnName()%>"))

Once done, delete the file "ComponentsCache.javacache" in C:\Program Files (x86)\Talend-Studio\studio\configuration and restart Talend Open Studio. Your tHBaseInput component will now fetch the row key from the HBase table into the schema column named key (the assignment to value.key requires that column to exist in the schema). This may not be advisable in every case, but if you use Talend Open Studio to generate jobs and deploy the jars elsewhere, it might be helpful.
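To make the intent of the template change easier to follow, here is a rough plain-Java sketch of the same logic without the JET markup or the HBase classes. Everything in it is a stand-in I made up for illustration: the `cells` map plays the role of the HBase `Result`, the returned map plays the role of the generated record struct, and values are assumed to be UTF-8 strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the generated validateResult() logic; not Talend code.
class RowKeyMappingSketch {

    static Map<String, String> toRecord(byte[] row,
                                        Map<String, byte[]> cells,
                                        List<String> schemaColumns) {
        Map<String, String> record = new LinkedHashMap<>();

        // Equivalent of: value.key = Bytes.toString(lastSuccessfulRow);
        record.put("key", new String(row, StandardCharsets.UTF_8));

        for (String column : schemaColumns) {
            // Equivalent of the added guard: the "key" column is not a real
            // HBase cell, so it must not be looked up (and overwritten) here.
            if ("key".equalsIgnoreCase(column)) {
                continue;
            }
            byte[] cell = cells.get(column);
            record.put(column, cell == null ? null : new String(cell, StandardCharsets.UTF_8));
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, byte[]> cells = new LinkedHashMap<>();
        cells.put("name", "Ann".getBytes(StandardCharsets.UTF_8));

        Map<String, String> record = toRecord(
                "row-42".getBytes(StandardCharsets.UTF_8),
                cells,
                List.of("key", "name"));

        System.out.println(record); // prints "{key=row-42, name=Ann}"
    }
}
```

Without the guard, the loop would try to read a cell called key from the row, find nothing, and replace the row key that was just assigned with an empty value.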

Thanks to my project manager.

jack AKA karthik