Questions tagged [apache-pig]

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which enables them to handle very large data sets.

Pig runs in two execution modes: local mode and MapReduce mode. Pig scripts can be run in two ways: interactively (through the Grunt shell) or in batch mode (from a script file).

At the present time, Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs for which large-scale parallel implementations already exist (e.g. the Hadoop subproject). Pig's language layer currently consists of a textual language called Pig Latin which has the following key properties:

  • Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. Complex tasks composed of multiple interrelated data transformations are explicitly encoded as data flow sequences, which makes them easy to write and understand.
  • Optimization opportunities. The declarative way in which tasks are encoded permits the system to optimize their execution plan automatically, allowing the user to focus on semantics rather than efficiency.
  • Extensibility. Users can create their own functions to do special-purpose processing.
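
As a rough illustration of the data flow style described above, here is a minimal word-count sketch in Pig Latin. The input path 'input.txt', the output path 'wordcount_out', and the relation names are hypothetical and chosen only for this example; LOAD, FOREACH, FLATTEN, TOKENIZE, GROUP, COUNT and STORE are standard Pig Latin operators and built-in functions.

```pig
-- Assumption: 'input.txt' is a plain-text file with one line of text per record.
lines = LOAD 'input.txt' AS (line:chararray);

-- Split each line into words; FLATTEN turns the bag of tokens into one row per word.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Collect identical words together.
grouped = GROUP words BY word;

-- Count the rows in each group; 'group' holds the grouping key.
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;

-- Assumption: 'wordcount_out' is an output directory that does not exist yet.
STORE counts INTO 'wordcount_out';
```

Saved as a hypothetical wordcount.pig, a script like this could be run in batch with `pig -x local wordcount.pig` for local mode or `pig -x mapreduce wordcount.pig` for MapReduce mode. The same statements can also be entered one at a time in the interactive Grunt shell.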

5199 questions
1
vote
1 answer

matrix multiplication apache pig

I am trying to perform matrix multiplication in pig latin. Here's my attempt so far: matrix1 = LOAD 'mat1' AS (row,col,value); matrix2 = LOAD 'mat2' AS (row,col,value); mult_mat = COGROUP matrix1 BY row, matrix2 BY col; mult_mat = FOREACH mult_mat…
Fortunato
  • 567
  • 6
  • 18
1
vote
2 answers

PIG command execution

I am learning Hadoop by myself so I am not sure if what I am asking is even a problem. When I run the command pig -x local to run it locally, I get the following message: 15/10/05 15:23:28 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL …
Anonymous Person
  • 1,437
  • 8
  • 26
  • 47
1
vote
0 answers

PIG UDF's in Hive

Apologies for no code on this as this is a Generic Question - Can PIG UDF's be consumed from Hive? Specifically can the PIG Apache DataFu (http://datafu.incubator.apache.org/) UDF'S be used in HIVE? I saw a Jira about using HIVE UDF's in PIG -…
myloginid
  • 1,463
  • 2
  • 22
  • 37
1
vote
0 answers

How does Pig integrate with Map Reduce [Research Project]

I am working on a master thesis project which aims at integrating a custom Map Reduce framework with similar MR interface but own implementation and pipeline, with higher level language frameworks as PIG. Currently, the MR master and workers have…
1
vote
2 answers

How to split a tuple in character '\' in PIG

I'm beginning to learn PIG and I want to split a tuple on the character '\'. My original tuple is (192.168.2.227\al0000) and I need to split it on '\' into (192.168.2.227, al0000). I tried to use B = FOREACH original GENERATE FLATTEN (STRSPLIT(tuple,…
1
vote
0 answers

Not able to connect to metastore using Thrift URI - Hive

Can someone please help me with the issue below? I have added the thrift uri value in hive-site.xml. Also, how can I verify the correct uri value? I am running this command grunt> battingdata = LOAD 'default.batting' USING …
Manish H
  • 11
  • 3
1
vote
0 answers

Load JSON array using Pig

I have a file formatted as one JSON array per line. Something like ["6400000000",{"status":"FINE","ok":"false","addresses":"00:00:00:00:00:00"}] ["4900000000",{"status":"FINE","ok":"true","addresses":"00:00:00:00:00:00"}] I'm running the following on…
1
vote
1 answer

PigLatin mismatched input ';' expecting LEFT_PAREN (IBM BIGINSIGHTS)

Sorry for a naive question. I am a newbie. I have a Pig script and getting below error: ERROR [main] org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input ';' expecting LEFT_PAREN This is how…
1
vote
0 answers

Counting sub-directories from a log file containing directory paths in Pig

I have a huge log file which contains directory paths as one of the columns. For instance, / /a /a/b /a/b/e /d /d/f /e There are no duplicate lines in the log. My question is, using Pig, how do I count the number of sub-directories under each…
Shane R
  • 11
  • 2
1
vote
1 answer

error writing to mongodb from pig

I am trying to use the mongo hadoop connector with pig or streaming to load/store data from mongodb. Using pig, I have the following problem: $cat process.pig REGISTER /usr/hdp/2.2.4.2-2/hadoop/lib/mongo-java-driver-3.0.2.jar REGISTER…
onebitaway
  • 103
  • 4
1
vote
0 answers

Apache Pig: LIMIT inside FOREACH referencing toplevel field, Scalar has more than one row in the output

This question is similar to one asked two years ago, however for some reason, it does not work for me. Actually this is a combination of two ideas (answered questions) as given in the header. The example below replicates the accepted solution…
Harlan Nelson
  • 1,394
  • 1
  • 10
  • 22
1
vote
0 answers

Exception when trying to execute a pig operation from java

I am unable to execute a pig command from Java. The command is to create a relation, load an input file in the relation and then store it into a file in hdfs path. Following is the error I am getting: 15/09/23 05:02:50 WARN mapReduceLayer.Launcher:…
Abhisekh
  • 151
  • 1
  • 1
  • 10
1
vote
1 answer

PIG: Filter a string on a basis of a phrase

I was wondering if it is possible to filter a string on the basis of a phrase? For example, I want to count the number of times ps3 (ps 3) appears in the query. I am not sure how not to use exact match with the filter condition for "ps 3" as do not…
madbitloman
  • 816
  • 2
  • 13
  • 21
1
vote
0 answers

FILTER ON column from another relation in PIG

Suppose, I have the following data in PIG. DUMP…
Ravi
  • 55
  • 7
1
vote
1 answer

Convert date with milliseconds using PIG

Really stuck on this! Assume I have a following data set: A | B ------------------ 1/2/12 | 13:3.8 04:4.1 | 12:1.4 15:4.3 | 1/3/13 Observations A and B are in general in the format minutes:seconds.milliseconds like A is a click and B is a…
madbitloman
  • 816
  • 2
  • 13
  • 21