Questions tagged [apache-pig]

Apache Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which enables them to handle very large data sets.

Pig runs in two execution modes: local mode and MapReduce mode. Pig scripts can be run in two ways: interactively (through the Grunt shell) or in batch mode (from a script file).

Currently, Pig's infrastructure layer consists of a compiler that produces sequences of MapReduce programs, for which large-scale parallel implementations already exist (e.g., the Hadoop subproject). Pig's language layer consists of a textual language called Pig Latin, which has the following key properties:

Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, easy to write and understand.
Optimization opportunities. The declarative way in which tasks are encoded permits the system to optimize their execution plan automatically, allowing the user to focus on semantics rather than efficiency.
Extensibility. Users can create their own functions to do special-purpose processing.
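
The data-flow style described above can be illustrated with a minimal Pig Latin sketch. The input file name `visits.tsv`, its two-column layout, and the output directory `visit_counts` are assumptions made for illustration only; they do not come from any real data set.

```pig
-- Hypothetical input: tab-separated lines of (user, url). The path is an assumption.
visits = LOAD 'visits.tsv' USING PigStorage('\t') AS (user:chararray, url:chararray);

-- Drop records with a missing user.
valid = FILTER visits BY user IS NOT NULL;

-- Group the remaining records by user, then count the records in each group.
by_user = GROUP valid BY user;
counts = FOREACH by_user GENERATE group AS user, COUNT(valid) AS n;

-- Write the result as comma-separated text; the output directory must not already exist.
STORE counts INTO 'visit_counts' USING PigStorage(',');
```

Each statement names an intermediate relation, so the script reads as a sequence of transformations, and Pig decides how to turn it into parallel jobs. Assuming the script is saved as `count_visits.pig` (a hypothetical name), it could be run in batch mode with `pig -x local count_visits.pig` for local mode or `pig -x mapreduce count_visits.pig` on a cluster.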

5199 questions

4 answers

Computing median in map reduce

Can someone explain the computation of median/quantiles in map reduce? My understanding of Datafu's median is that the 'n' mappers sort the data and send the data to "1" reducer which is responsible for sorting all the data from n mappers and…

asked Apr 11 '12 at 15:53

learner

3 answers

select count distinct using pig latin

I need help with this pig script. I am just getting a single record. I am selecting 2 columns and doing a count(distinct) on another while also using a where like clause to find a particular description (desc). Here's my sql with pig I am trying to…

hadoop apache-pig

asked Feb 12 '12 at 07:55

jdamae

4 answers

Connection Error in Apache Pig

I am running Apache Pig .11.1 with Hadoop 2.0.5. Most simple jobs that I run in Pig work perfectly fine. However, whenever I try to use GROUP BY on a large dataset, or the LIMIT operator, I get these connection errors: 2013-07-29 13:24:08,591 [main]…

hadoop apache-pig

asked Jul 29 '13 at 17:42

Andy Botelho

2 answers

Pig: Get top n values per group

I have data that's already grouped and aggregated; it looks like this:

user   value   count
----   ------  -----
Alice  third   5
Alice  first   11
Alice  second  10
Alice  fourth  2
...
Bob    second  20
Bob    third   …

hadoop hdfs apache-pig

asked Jul 15 '13 at 13:56

Hoff

2 answers

How to force STORE (overwrite) to HDFS in Pig?

When developing Pig scripts that use the STORE command I have to delete the output directory for every run or the script stops and offers: 2012-06-19 19:22:49,680 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6000: Output Location Validation…

apache-pig hdfs

asked Jun 19 '12 at 22:28

valid

5 answers

Skipping the header while loading the text file using Piglatin

I have a text file and its first row contains the header. Now I want to do some operation on the data, but while loading the file using PigStorage it takes the HEADER too. I just want to skip the HEADER. Is it possible to do so (directly or through…

hadoop apache-pig

asked Oct 01 '13 at 11:44

Pawan Kumar

5 answers

Is there any Conditional IF like operator in Apache PIG?

I am writing a Pig script and want to execute a set of statements if one of the conditions is satisfied. I have set a variable and am checking its value. Suppose if flag==0 then A = LOAD 'file' using PigStorage() as…

hadoop apache-pig

asked Jul 16 '13 at 06:31

Bhavesh Shah

3 answers

How to use Cassandra's Map Reduce with or w/o Pig?

Can someone explain how MapReduce works with Cassandra .6? I've read through the word count example, but I don't quite follow what's happening on the Cassandra end vs. the "client"…

mapreduce cassandra apache-pig

asked Apr 29 '10 at 00:17

Brent

1 answer

Find if a string is present inside another string in Pig

I want to find if a string contains another string in Pig. I found that there is a built-in index function, but it only searches for characters not strings. Is there any other alternative?

string apache-pig

asked Dec 20 '12 at 10:01

Sudar

2 answers

STORE output to a single CSV?

Currently, when I STORE into HDFS, it creates many part files. Is there any way to store out to a single CSV file?

apache-pig

asked Mar 28 '12 at 15:34

JasonA

4 answers

How can I incorporate the current input filename into my Pig Latin script?

I am processing data from a set of files which contain a date stamp as part of the filename. The data within the file does not contain the date stamp. I would like to process the filename and add it to one of the data structures within the script.…

apache-pig

asked Mar 17 '12 at 16:04

Kevin Fink

6 answers

What is the best Pig plugin for Eclipse?

I'm about to start playing around with PIG-latin, and I was hoping to get some text highlighting and such for it in Eclipse. Doing a quick Google search, I saw a couple of Eclipse plugins for it. Are they all still in development? Which is the best?

eclipse eclipse-plugin editor apache-pig

asked Aug 25 '11 at 16:59

Eli

4 answers

Filtering null values with pig

It looks like a silly problem, but I can't find a way to filter null values from my rows. This is the result when I dump the object geoinfo: DUMP geoinfo; ([longitude#70.95853,latitude#30.9773]) ([longitude#-9.37944507,latitude#38.91780853]) …

hadoop apache-pig

asked Oct 31 '12 at 18:26

Arian Pasquali

2 answers

Define tuple datas in the pig script

I am currently debugging a pig script. I'd like to define a tuple in the Pig file directly (instead of the basic "Load" function). Is there a way to do it? I am looking for something like that: A= ('name#bob'','age#29';'name#paul','age#12') The…

hadoop apache-pig

asked Sep 14 '12 at 11:14

romain-nio

1 answer

Join vs COGROUP in PIG

Are there any advantages (with respect to performance or the number of MapReduce jobs) when I use COGROUP instead of JOIN in Pig? http://developer.yahoo.com/hadoop/tutorial/module6.html talks about the difference in the type of output they produce. But, ignoring the…

hadoop apache-pig

asked Sep 21 '11 at 07:23

raj
