Questions tagged [apache-pig]

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.

Pig runs in two execution modes: local mode and MapReduce mode. Pig scripts can be run in two ways: interactively (in the Grunt shell) or in batch mode (from a script file).
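The modes and ways of running combine into the usual `pig` invocations. The sketch below only builds the argument list and does not launch anything. `-x`, `-param` and `-f` are real launcher options; the function name and the script name `wordcount.pig` are hypothetical placeholders. Newer Pig releases accept further `-x` values (one question below uses tez), and this sketch simply passes the mode string through.

```python
from typing import Dict, List, Optional


def pig_command(script: Optional[str] = None,
                mode: str = "mapreduce",
                params: Optional[Dict[str, str]] = None) -> List[str]:
    """Build an argv list for the `pig` launcher (hypothetical helper).

    mode:   "local" runs in a single JVM against the local file system;
            "mapreduce" submits jobs to a Hadoop cluster.
    script: None starts the interactive Grunt shell; a path runs the
            script in batch mode via -f.
    """
    cmd = ["pig", "-x", mode]
    for name, value in (params or {}).items():
        # Each -param is its own argv element, so spaces in values survive.
        cmd += ["-param", f"{name}={value}"]
    if script is not None:
        cmd += ["-f", script]
    return cmd


print(pig_command(mode="local"))     # interactive Grunt shell, local mode
print(pig_command("wordcount.pig"))  # batch mode on the cluster
```

Passing the list to `subprocess.run` (without `shell=True`) avoids shell word splitting entirely.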

At the present time, Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs for which large-scale parallel implementations already exist (e.g. the Hadoop subproject). Pig's language layer currently consists of a textual language called Pig Latin which has the following key properties:

Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. Complex tasks composed of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write and understand.
Optimization opportunities. The declarative way in which tasks are encoded permits the system to optimize their execution plan automatically, allowing the user to focus on semantics rather than efficiency.
Extensibility. Users can create their own functions to do special-purpose processing.
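To make the data flow idea concrete, here is a minimal sketch that mirrors a small Pig Latin pipeline (shown in the comments) step by step in plain Python, including a stand-in for a user-defined function. The schema, aliases, input file and the UDF name `myudfs.Domain` are all hypothetical; a real Java or Python UDF would have to be registered with Pig first.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple
from urllib.parse import urlparse

# Hypothetical schema: (user, url, secs)
Visit = Tuple[str, str, int]


def domain(url: str) -> str:
    """Stand-in for a user-defined function: lower-cased host part of a URL."""
    return urlparse(url).netloc.lower()


def seconds_per_domain(visits: Iterable[Visit]) -> Dict[str, int]:
    # Pig Latin equivalent (aliases and the UDF name are hypothetical):
    #   visits  = LOAD 'visits.tsv' AS (user:chararray, url:chararray, secs:int);
    #   kept    = FILTER visits BY secs > 60;
    #   mapped  = FOREACH kept GENERATE myudfs.Domain(url) AS dom, secs;
    #   grouped = GROUP mapped BY dom;
    #   totals  = FOREACH grouped GENERATE group, SUM(mapped.secs);
    kept = (v for v in visits if v[2] > 60)                    # FILTER
    mapped = ((domain(url), secs) for _u, url, secs in kept)   # FOREACH + UDF
    totals: Dict[str, int] = defaultdict(int)
    for dom, secs in mapped:                                   # GROUP + SUM
        totals[dom] += secs
    return dict(totals)


print(seconds_per_domain([
    ("u1", "http://Example.com/a", 120),
    ("u2", "http://example.com/b", 30),
    ("u1", "http://pig.apache.org/", 90),
]))  # {'example.com': 120, 'pig.apache.org': 90}
```

Each Pig statement names an intermediate relation rather than prescribing how to compute it, which is what leaves the planner free to reorder or merge steps.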

5199 questions

1 answer

Left outer join on more than 2 relations at a time in PIG

I am trying to perform a left outer join on more than two relations in a single statement in Pig. Is it possible?

hadoop apache-pig

asked Aug 24 '15 at 15:34

HarishKotha
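Pig's documentation states that outer joins work only as two-way joins, so a multi-way left outer join is normally written as a chain of two-way joins (Pig Latin in the comments). The Python below is a minimal sketch of the same chaining under the assumption that column 0 of every relation is the join key; the relation contents and helper name are hypothetical.

```python
from typing import Dict, List, Tuple

Rel = List[Tuple]  # rows; column 0 is the join key (assumption)

# Pig Latin equivalent of chaining two-way outer joins:
#   ab  = JOIN a BY id LEFT OUTER, b BY id;
#   abc = JOIN ab BY a::id LEFT OUTER, c BY id;


def left_outer_join(left: Rel, right: Rel, right_width: int) -> Rel:
    """Two-way left outer join on column 0; unmatched rows get None padding."""
    index: Dict[object, List[Tuple]] = {}
    for row in right:
        index.setdefault(row[0], []).append(row)
    out: Rel = []
    for row in left:
        matches = index.get(row[0])
        if matches:
            out.extend(row + m for m in matches)
        else:
            out.append(row + (None,) * right_width)
    return out


a = [(1, "x"), (2, "y")]
b = [(1, "b1")]
c = [(2, "c2")]

ab = left_outer_join(a, b, right_width=2)
abc = left_outer_join(ab, c, right_width=2)  # key is still column 0 (a's id)
print(abc)
```

Every row of `a` survives both joins, which is the property a single multi-way left outer join would be expected to have.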

0 answers

Pig: Attempt to access non-existing field

Problem: Dumping filtered output throws an error and prints incorrect output with warnings: Error-attempt to access non-existing field in input Steps: Loaded a tab-delimited file into relation a: a = LOAD…

apache-pig

asked Aug 21 '15 at 21:08

Sathya Magesh Kumar

1 answer

LOAD csv file in PigLatin

I'm trying to load a csv file in PigLatin. Record format is as follows: "ABBOTT,DEEDEE W",GRADES 9-12 TEACHER,"52,122.10",0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010 I tried the following code: A = LOAD '/user/hduser/salaryTravel.csv' using…

csv apache-pig

asked Aug 21 '15 at 10:47

Niyas
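The record in this question has commas inside quoted fields, so a plain comma split (which is what `PigStorage(',')` does) breaks it into too many fields. A quote-aware loader such as piggybank's `org.apache.pig.piggybank.storage.CSVExcelStorage` is the usual Pig-side answer. The Python sketch below only demonstrates the difference on the question's record; stripping the thousands separator from the salary is an added assumption about how that column should be read.

```python
import csv
import io

# Record copied from the question.
line = ('"ABBOTT,DEEDEE W",GRADES 9-12 TEACHER,"52,122.10",0,LBOE,'
        'ATLANTA INDEPENDENT SCHOOL SYSTEM,2010')

naive = line.split(",")                       # plain comma split, ignores quotes
fields = next(csv.reader(io.StringIO(line)))  # quote-aware parsing

salary = float(fields[2].replace(",", ""))    # "52,122.10" -> 52122.1

print(len(naive), len(fields))
print(fields[0], salary)
```

The naive split yields 9 pieces while the quote-aware parse yields the 7 intended columns.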

1 answer

Ingesting large files into Hive on a single node Hadoop

I want to ingest large CSV files (up to 6 GB) on a regular basis into a single-node Hadoop cluster with 32 GB of RAM. The key requirement is to register the data in HCatalog. (Please do not discuss requirements, it is a functional demo.) Performance is not…

java hadoop garbage-collection apache-pig heap-memory

asked Aug 20 '15 at 06:00

Stefan Papp

0 answers

Error writing to Hive table with HCatStorer()

I'm currently pulling data from a hive table over S3 with HCatLoader(), and attempting to write back out to a hive table over S3 with HCatStorer(). I'm using the default Hive install that comes baked into AWS EMR. HCatLoader works fine, and I can…

hadoop hive apache-pig hcatalog

asked Aug 19 '15 at 20:53

MattClark

3 answers

Move file from local to HDFS

My environment uses Spark, Pig and Hive. I am having some trouble writing code in Scala (or any other language compatible with my environment) that could copy a file from a local file system to HDFS. Does anyone have any advice on how I should…

scala hadoop apache-spark hive apache-pig

asked Aug 19 '15 at 12:23

Shakile

2 answers

Pig: Is there a maximum number of columns to which the FILTER command can be applied?

I have an input file that contains 952 columns. I would like a Pig script that checks that the schema has not been altered; if it has, the script should fail. This is important because if the columns are altered or missing, my other pig…

filter apache-pig

asked Aug 17 '15 at 16:13

Sathya Magesh Kumar
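One way to express the "fail if the schema changed" requirement is a field-count check on every row before the main job runs. This is a minimal, language-agnostic sketch, not a Pig answer and not an answer to the FILTER column-limit question; it assumes tab-delimited input (PigStorage's default) and treats a changed column count as the only detectable alteration. The function name is hypothetical.

```python
from typing import Iterable

EXPECTED_COLUMNS = 952  # column count from the question


def check_column_count(lines: Iterable[str],
                       expected: int = EXPECTED_COLUMNS,
                       delimiter: str = "\t") -> int:
    """Raise ValueError on the first row whose field count differs.

    Returns the number of rows checked when every row matches.
    """
    count = 0
    for lineno, line in enumerate(lines, start=1):
        n = len(line.rstrip("\n").split(delimiter))
        if n != expected:
            raise ValueError(f"line {lineno}: expected {expected} fields, found {n}")
        count = lineno
    return count
```

A non-zero exit from a wrapper around this check could be used to stop the downstream scripts, though renamed or reordered columns with the same count would still pass.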

1 answer

Unable to run Pig Latin script on Apache Tez

I have a pseudo-distributed single-node cluster on an Ubuntu machine. I have written a simple Pig Latin script that runs fine with mapreduce as the execution mode. But when I use tez as the execution mode via the -x switch, I get the following…

hadoop apache-pig apache-tez

asked Aug 17 '15 at 12:11

infiQuanta

0 answers

pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.DBStorage'

I'm trying to store the output of a Pig script (3 fields) in a PostgreSQL database. When I dump the output, the script works fine. However, when I use DBStorage(): register /$directory/postgresql9.4-1201.jdbc41.jar; register…

java sql postgresql output apache-pig

asked Aug 14 '15 at 17:38

zaralleru

2 answers

Removing duplicates using PigLatin and retaining the last element

I am using Pig Latin, and I want to remove the duplicates from the bags while retaining the last element for each key. Input: User1 7 LA User1 8 NYC User1 9 NYC User2 3 NYC User2 4 DC Output: User1 9 NYC User2 4 DC Here…

hadoop apache-pig duplicates datastage

asked Aug 14 '15 at 16:09

Anil Savaliya
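A common Pig pattern for "keep the last row per key" is a nested FOREACH that orders each group and keeps one row (shown in the comments). The Python is a minimal sketch of the same logic on the question's sample data. It assumes "last" means the largest value in the second column; the field names `user`, `seq` and `city` are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int, str]  # (user, seq, city); field names are hypothetical

# Pig Latin equivalent (assumes A was loaded AS (user, seq:int, city)):
#   grouped = GROUP A BY user;
#   last    = FOREACH grouped {
#                 ordered = ORDER A BY seq DESC;
#                 top1    = LIMIT ordered 1;
#                 GENERATE FLATTEN(top1);
#             };


def keep_last(rows: List[Row]) -> List[Row]:
    """Per user, keep the row with the largest seq value."""
    best: Dict[str, Row] = {}
    for row in rows:
        user, seq, _city = row
        if user not in best or seq > best[user][1]:
            best[user] = row
    return sorted(best.values())


sample = [("User1", 7, "LA"), ("User1", 8, "NYC"), ("User1", 9, "NYC"),
          ("User2", 3, "NYC"), ("User2", 4, "DC")]
print(keep_last(sample))
```

If "last" instead means file order, the input would need an explicit position column, since Pig does not guarantee row order within a group.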

1 answer

Embedded Pig error when running Pig 0.15 on Hadoop 2

Whenever I run any Apache Pig code from the terminal, everything goes well and I get the result, so I conclude that my installation of Pig 0.15.0 and Hadoop 2.7.0 is all right. The problem occurs when I run PigServer from inside Java code: PigServer…

hadoop apache-pig

asked Aug 13 '15 at 22:21

Abdulrahman

1 answer

Multiply after joining data in Pig

I am trying to multiply two fields and take their sum after joining three tables in Pig. However, I keep getting this error: (Name: Multiply Type: null Uid: null)incompatible types in Multiply…

join apache-pig

asked Aug 09 '15 at 01:55

harshvardhan.agr

0 answers

How do I flatten nested Avro records in a Pig query?

Avro schema looks like this: { "type" : "record", "name" : "name1", "fields" : [ { "name" : "f1", "type" : "string" }, { "name" : "f2", "type" : { "type" : "array", "items" : …

hadoop apache-pig hdfs avro

asked Aug 07 '15 at 18:39

Vikas
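In Pig this is typically `FOREACH data GENERATE f1, FLATTEN(f2);`, assuming AvroStorage loads the array `f2` as a bag. The Python below is a minimal sketch of what FLATTEN does to one record: one output row per array element, and no rows for an empty array. Only `f1` and `f2` come from the question; the element contents are hypothetical because the item schema is truncated, and the helper name is invented for illustration.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten_array(record: Dict[str, Any], key: str,
                  array_field: str) -> Iterator[Tuple[Any, Any]]:
    """Yield one (record[key], element) pair per element of record[array_field].

    Like Pig's FLATTEN on a bag, an empty array produces no output rows;
    this sketch treats a missing or null array the same way.
    """
    for element in record.get(array_field) or []:
        yield (record[key], element)


# Element contents are hypothetical (the item schema is elided in the question).
rec = {"f1": "a", "f2": [{"x": 1}, {"x": 2}]}
print(list(flatten_array(rec, "f1", "f2")))
```

If the array items are themselves records, a second projection over the flattened element would pull out their individual fields.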

1 answer

Reading array of strings from file with Apache Pig

I'm storing a Hive table externally, and it's a pretty simple data structure. The table is created in Hive as (user string, names array<string>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY '\001' STORED AS…

arrays hadoop apache-pig

asked Aug 07 '15 at 00:32

JayC
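With the delimiters given in this question, each stored line is the user, a tab, then the array items separated by the control character `\001`. This minimal Python sketch shows that parsing; the function name is hypothetical. On the Pig side, one option is to load `names` as a chararray and split it with the built-in `STRSPLIT`, keeping in mind that it returns a tuple rather than a bag.

```python
from typing import List, Tuple


def parse_hive_row(line: str) -> Tuple[str, List[str]]:
    """Split a row stored with FIELDS TERMINATED BY '\\t' and
    COLLECTION ITEMS TERMINATED BY '\\001' into (user, names)."""
    user, names = line.rstrip("\n").split("\t", 1)
    # An empty names field means an empty array, not [""].
    return user, (names.split("\x01") if names else [])


print(parse_hive_row("alice\tbob\x01carol\n"))
```

Treating an empty field as an empty list is a design choice of this sketch; Hive may represent an empty or null array differently on disk.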

1 answer

ERROR 2999: Unexpected internal error. java.net.URISyntaxException: Relative path in absolute URI

pig -param CURR_TS=`date "+%F %H:%M:%S"` -f pig_script.pig After running this, I get the following error: ERROR 2999: Unexpected internal error. java.net.URISyntaxException: Relative path in absolute URI: 04:36:33 I know the problem is with ":"…

hadoop apache-pig

asked Aug 06 '15 at 09:52

Indrajeet Gour
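One plausible cause, offered as an assumption rather than a confirmed diagnosis, is shell word splitting: the unquoted backtick output contains a space, so the time part becomes a separate argument, and that stray `04:36:33` matches the value in the error message. Python's `shlex.split` follows POSIX-style splitting and makes the effect visible. The date part below is hypothetical; the time is taken from the error.

```python
import shlex

# Command line as the shell sees it after the backticks expand (unquoted),
# versus the same value wrapped in double quotes.
unquoted = 'pig -param CURR_TS=2015-08-06 04:36:33 -f pig_script.pig'
quoted = 'pig -param "CURR_TS=2015-08-06 04:36:33" -f pig_script.pig'

print(shlex.split(unquoted))  # the time ends up as its own argument
print(shlex.split(quoted))    # the whole timestamp stays in one -param value
```

In the shell, the corresponding change would be quoting the substitution, for example `-param CURR_TS="$(date '+%F %H:%M:%S')"`. Whether the value then needs further escaping inside the Pig script is a separate question.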