exp10

Uploaded by Sai Tejaswini

Experiment 10: Week 11

11. Install and run Pig, then write Pig Latin scripts to sort, group, join, project, and filter your data.

Relations, Bags, Tuples, Fields
Relations, Bags, Tuples, Fields
Pig Latin statements work with relations. A relation can be defined as follows:

• A relation is a bag (more specifically, an outer bag).

• A bag is a collection of tuples.

• A tuple is an ordered set of fields.

• A field is a piece of data.
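As a rough mental model, a relation can be pictured as a collection of tuples whose fields follow the schema. The sketch below is plain Python, not Pig's actual Java data types. It uses the four student rows that appear in the DUMP output of this section. The names SCHEMA, student and field are illustrative only.

```python
from typing import Any

# Schema of relation A: (name:chararray, age:int, gpa:float)
SCHEMA = ("name", "age", "gpa")

# The relation (an outer bag) as a list of tuples. A list is used only for
# convenience; Pig itself does not guarantee any order inside a bag.
student: list[tuple[Any, ...]] = [
    ("John", 18, 4.0),
    ("Mary", 19, 3.8),
    ("Bill", 20, 3.9),
    ("Joe", 18, 3.8),
]


def field(t: tuple, name: str) -> Any:
    """Return one field of a tuple by its schema alias, e.g. field(t, "age")."""
    return t[SCHEMA.index(name)]


for t in student:
    print(field(t, "name"), field(t, "age"))
```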

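Start the Grunt shell by running the pig command. Then load the student file with a schema and use DESCRIBE to print that schema. PigStorage() with no argument expects tab-separated fields.

pig
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);

DESCRIBE A;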

Dumping the values (DUMP displays the contents of a relation on the screen):

DUMP A;

(John,18,4.0F)
(Mary,19,3.8F)
(Bill,20,3.9F)
(Joe,18,3.8F)

Sort the data using “ORDER BY”

Use the ORDER ... BY operator to sort a relation by one or more of its fields. Create a new Pig script named "Pig-Sort" and enter the following commands to sort the student data by age in ascending order:
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
ordered_data = ORDER A BY age ASC;

DUMP ordered_data;
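The effect of ORDER ... BY can be sketched in plain Python on the tuples from the DUMP A output. This models the result, not Pig's distributed sort. The second statement shows a two-key sort in the form ORDER A BY age DESC, name ASC. Ties are the one subtle point. Python's sort is stable, while Pig does not promise any particular order among tuples with equal keys.

```python
# Tuples of relation A from the DUMP A output: (name, age, gpa)
A = [("John", 18, 4.0), ("Mary", 19, 3.8), ("Bill", 20, 3.9), ("Joe", 18, 3.8)]

# ORDER A BY age ASC: sort on the age field (index 1).
# Python's sort is stable, so John stays ahead of Joe; Pig makes no such promise.
ordered_data = sorted(A, key=lambda t: t[1])

# ORDER A BY age DESC, name ASC: negate the numeric key to get descending age.
ordered_desc = sorted(A, key=lambda t: (-t[1], t[0]))

for t in ordered_data:
    print(t)
```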

For reference: https://pig.apache.org/docs/r0.7.0/piglatin_ref2.html

Group

The GROUP operator groups together tuples that have the same group key (key field). The key field
will be a tuple if the group key has more than one field, otherwise it will be the same type as that of
the group key. The result of a GROUP operation is a relation that includes one tuple per group. This
tuple contains two fields:

• The first field is named "group" (do not confuse this with the GROUP operator) and is the
same type as the group key.

• The second field takes the name of the original relation and is type bag.

• The names of both fields are generated by the system as shown in the example below.
Note that the GROUP (and thus COGROUP) and JOIN operators perform similar functions.
GROUP creates a nested set of output tuples while JOIN creates a flat set of output tuples.

Example
Suppose we have relation A.
A = load 'student' AS (name: chararray, age: int, gpa: float);

DESCRIBE A;

A: {name: chararray,age: int,gpa: float}

DUMP A;
(John,18,4.0F)
(Mary,19,3.8F)
(Bill,20,3.9F)
(Joe,18,3.8F)

Now, suppose we group relation A on field "age" to form relation B. We can use the DESCRIBE
and ILLUSTRATE operators to examine the structure of relation B. Relation B has two fields. The
first field is named "group" and is type int, the same as field "age" in relation A. The second field is
named "A" after relation A and is type bag.

B = GROUP A BY age;

DESCRIBE B;

B: {group: int, A: {name: chararray,age: int,gpa: float}}

ILLUSTRATE B;
----------------------------------------------------------------------
| B | group: int | A: bag({name: chararray,age: int,gpa: float}) |
----------------------------------------------------------------------
| | 18 | {(John, 18, 4.0), (Joe, 18, 3.8)} |
| | 20 | {(Bill, 20, 3.9)} |
----------------------------------------------------------------------
DUMP B;

(18,{(John,18,4.0F),(Joe,18,3.8F)})
(19,{(Mary,19,3.8F)})
(20,{(Bill,20,3.9F)})
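The shape of B, one (group, bag) tuple per distinct key, can be modelled in plain Python as a dictionary from key to a list of whole input tuples. This is a sketch of the result only and is not Pig's implementation. The last line imitates what FOREACH B GENERATE group, COUNT(A); would return for this data.

```python
from collections import defaultdict

A = [("John", 18, 4.0), ("Mary", 19, 3.8), ("Bill", 20, 3.9), ("Joe", 18, 3.8)]


def group_by(relation, key_index):
    """Model of B = GROUP A BY <field>: one (group, bag) tuple per distinct key."""
    bags = defaultdict(list)
    for t in relation:
        bags[t[key_index]].append(t)  # the whole input tuple goes into the bag
    # Sorting by key only makes the printout predictable; Pig does not sort.
    return [(key, bags[key]) for key in sorted(bags)]


B = group_by(A, 1)  # GROUP A BY age
for group, bag in B:
    print(group, bag)

# Roughly what FOREACH B GENERATE group, COUNT(A); would produce
counts = [(group, len(bag)) for group, bag in B]
```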

Filter
Use the FILTER operator to work with tuples or rows of data (if you want to work with columns of
data, use the FOREACH ... GENERATE operation).
FILTER is commonly used to select the data that you want; or, conversely, to filter out (remove) the
data you don’t want.

Examples
Suppose we have relation A.
A = LOAD 'data' AS (a1:int,a2:int,a3:int);

DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)

In this example, the condition states that if the third field (a3) equals 3, the tuple is included in
relation X.
X = FILTER A BY a3 == 3;

DUMP X;
(1,2,3)
(4,3,3)
(8,4,3)
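A FILTER is a row-by-row test, so in plain Python it maps onto a list comprehension. The sketch below uses the six tuples of relation A and its schema field a3, the third field at index 2. The second comprehension shows the "filter out" case, which corresponds to FILTER A BY a3 != 3.

```python
# Relation A from the DUMP A output: (a1, a2, a3)
A = [(1, 2, 3), (4, 2, 1), (8, 3, 4), (4, 3, 3), (7, 2, 5), (8, 4, 3)]

# Keep tuples whose a3 field (index 2) equals 3
X = [t for t in A if t[2] == 3]

# The complementary case: FILTER A BY a3 != 3 removes those tuples instead
rest = [t for t in A if t[2] != 3]

print(X)
```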

JOIN Operator:

The JOIN operator is used to combine records from two or more relations. While performing
a join operation, we declare one field (or a group of fields) from each relation as keys. When
these keys match, the two tuples are combined into one output tuple; otherwise (in an inner join)
the records are dropped. Joins can be of the following types:

• Self-join
• Inner-join
• Outer-join: left join, right join, and full join
This section explains with examples how to use the JOIN operator in Pig Latin. Assume that we have
two files, customers.txt and orders.txt, in the /pig_data/ directory of HDFS, as shown
below.
customers.txt
1,Ramesh,32,Ahmedabad,2000.00
2,Khilan,25,Delhi,1500.00
3,kaushik,23,Kota,2000.00
4,Chaitali,25,Mumbai,6500.00
5,Hardik,27,Bhopal,8500.00
6,Komal,22,MP,4500.00
7,Muffy,24,Indore,10000.00

orders.txt
102,2009-10-08 00:00:00,3,3000
100,2009-10-08 00:00:00,3,1500
101,2009-11-20 00:00:00,2,1560
103,2008-05-20 00:00:00,4,2060

And we have loaded these two files into Pig with the relations customers and orders as shown
below.
grunt> customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',')
   as (id:int, name:chararray, age:int, address:chararray, salary:int);

grunt> orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',')
   as (oid:int, date:chararray, customer_id:int, amount:int);

Let us now perform various Join operations on these two relations.

Self - join
Self-join is used to join a table with itself as if the table were two relations, temporarily renaming at
least one relation.
In Apache Pig, a self-join is generally performed by loading the same data multiple times under
different aliases (names). Therefore, let us load the contents of the file customers.txt as two
relations, as shown below.
grunt> customers1 = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',')
   as (id:int, name:chararray, age:int, address:chararray, salary:int);

grunt> customers2 = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',')
   as (id:int, name:chararray, age:int, address:chararray, salary:int);

Syntax
Given below is the syntax of performing self-join operation using the JOIN operator.
grunt> Relation3_name = JOIN Relation1_name BY key, Relation2_name BY key ;

Example
Let us perform self-join operation on the relation customers, by joining the two relations
customers1 and customers2 as shown below.
grunt> customers3 = JOIN customers1 BY id, customers2 BY id;
Verification
Verify the relation customers3 using the DUMP operator as shown below.
grunt> Dump customers3;

Output
It will produce the following output, displaying the contents of the relation customers3.
(1,Ramesh,32,Ahmedabad,2000,1,Ramesh,32,Ahmedabad,2000)
(2,Khilan,25,Delhi,1500,2,Khilan,25,Delhi,1500)
(3,kaushik,23,Kota,2000,3,kaushik,23,Kota,2000)
(4,Chaitali,25,Mumbai,6500,4,Chaitali,25,Mumbai,6500)
(5,Hardik,27,Bhopal,8500,5,Hardik,27,Bhopal,8500)
(6,Komal,22,MP,4500,6,Komal,22,MP,4500)
(7,Muffy,24,Indore,10000,7,Muffy,24,Indore,10000)

Inner Join
Inner join is used quite frequently; it is also referred to as an equijoin. An inner join returns rows
when there is a match in both relations.
It creates a new relation by combining column values of two relations (say A and B) based upon the
join-predicate. The query compares each row of A with each row of B to find all pairs of rows
which satisfy the join-predicate. When the join-predicate is satisfied, the column values for each
matched pair of rows of A and B are combined into a result row.
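The pairing described above can be sketched in plain Python. This models the result, not how Pig executes the join, because Pig distributes the work by shuffling tuples on the join key. Building a hash index on one side is simply an easy way to find all matching pairs without comparing every row of A with every row of B. The data is copied from customers.txt and orders.txt above. Order 103's amount is taken as 2060, as in the join outputs of this section.

```python
from collections import defaultdict

# customers: (id, name, age, address, salary); orders: (oid, date, customer_id, amount)
customers = [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (2, "Khilan", 25, "Delhi", 1500),
    (3, "kaushik", 23, "Kota", 2000), (4, "Chaitali", 25, "Mumbai", 6500),
    (5, "Hardik", 27, "Bhopal", 8500), (6, "Komal", 22, "MP", 4500),
    (7, "Muffy", 24, "Indore", 10000),
]
orders = [
    (102, "2009-10-08 00:00:00", 3, 3000), (100, "2009-10-08 00:00:00", 3, 1500),
    (101, "2009-11-20 00:00:00", 2, 1560), (103, "2008-05-20 00:00:00", 4, 2060),
]


def inner_join(left, lkey, right, rkey):
    """Equi-join: concatenate every (l, r) pair whose key fields are equal."""
    index = defaultdict(list)
    for r in right:                      # build phase: hash the right relation
        index[r[rkey]].append(r)
    out = []
    for l in left:                       # probe phase: look up each left key
        for r in index.get(l[lkey], []):
            out.append(l + r)            # flat output tuple, as with Pig's JOIN
    return out


# JOIN customers BY id, orders BY customer_id; sorted only for a stable printout
customer_orders = sorted(inner_join(customers, 0, orders, 2))
for t in customer_orders:
    print(t)
```

Passing customers as both inputs, inner_join(customers, 0, customers, 0), gives the same seven doubled tuples as the self-join example.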

Syntax
Here is the syntax of performing inner join operation using the JOIN operator.
grunt> result = JOIN relation1 BY columnname, relation2 BY columnname;

Example
Let us perform inner join operation on the two relations customers and orders as shown below.
grunt> customer_orders = JOIN customers BY id, orders BY customer_id;

Verification
Verify the relation customer_orders using the DUMP operator as shown below.
grunt> Dump customer_orders;

Output
You will get the following output, which shows the contents of the relation named customer_orders.
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)

Outer Join
Unlike an inner join, an outer join returns all the rows from at least one of the relations. An
outer join operation is carried out in three ways:

• Left outer join
• Right outer join
• Full outer join

Left Outer Join

The left outer join operation returns all rows from the left relation, even if there are no matches in
the right relation. A left tuple without a match is kept, and the fields of the right relation are null.

Syntax
Given below is the syntax of performing left outer join operation using the JOIN operator.
grunt> Relation3_name = JOIN Relation1_name BY key LEFT OUTER, Relation2_name BY key;

Example
Let us perform left outer join operation on the two relations customers and orders as shown below.
grunt> outer_left = JOIN customers BY id LEFT OUTER, orders BY customer_id;

Verification
Verify the relation outer_left using the DUMP operator as shown below.
grunt> Dump outer_left;

Output
It will produce the following output, displaying the contents of the relation outer_left.
(1,Ramesh,32,Ahmedabad,2000,,,,)
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
(5,Hardik,27,Bhopal,8500,,,,)
(6,Komal,22,MP,4500,,,,)
(7,Muffy,24,Indore,10000,,,,)
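The null padding in the output above (the empty fields after Ramesh, Hardik, Komal and Muffy) can be modelled in plain Python, with None standing in for Pig's null. This is a sketch of the result under that assumption, not Pig's implementation. The right_width parameter is a hypothetical helper argument giving the number of fields in the right relation.

```python
from collections import defaultdict

customers = [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (2, "Khilan", 25, "Delhi", 1500),
    (3, "kaushik", 23, "Kota", 2000), (4, "Chaitali", 25, "Mumbai", 6500),
    (5, "Hardik", 27, "Bhopal", 8500), (6, "Komal", 22, "MP", 4500),
    (7, "Muffy", 24, "Indore", 10000),
]
orders = [
    (102, "2009-10-08 00:00:00", 3, 3000), (100, "2009-10-08 00:00:00", 3, 1500),
    (101, "2009-11-20 00:00:00", 2, 1560), (103, "2008-05-20 00:00:00", 4, 2060),
]


def left_outer_join(left, lkey, right, rkey, right_width):
    """Model of JOIN left BY lkey LEFT OUTER, right BY rkey."""
    index = defaultdict(list)
    for r in right:
        index[r[rkey]].append(r)
    out = []
    for l in left:
        matches = index.get(l[lkey], [])
        if matches:
            out.extend(l + r for r in matches)
        else:
            # No partner: keep the left tuple and fill the right fields with None
            out.append(l + (None,) * right_width)
    return out


outer_left = left_outer_join(customers, 0, orders, 2, right_width=4)
for t in outer_left:
    print(t)
```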

Right Outer Join

The right outer join operation returns all rows from the right relation, even if there are no matches in
the left relation.

Syntax
Given below is the syntax of performing right outer join operation using the JOIN operator.
grunt> Relation3_name = JOIN Relation1_name BY key RIGHT OUTER, Relation2_name BY key;
Example
Let us perform right outer join operation on the two relations customers and orders as shown
below.
grunt> outer_right = JOIN customers BY id RIGHT OUTER, orders BY customer_id;

Verification
Verify the relation outer_right using the DUMP operator as shown below.
grunt> Dump outer_right;

Output
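It will produce the following output, displaying the contents of the relation outer_right. Every order in orders.txt has a matching customer, so the result contains the same four tuples as the inner join.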
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
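A right outer join keeps every tuple of the right relation. The plain-Python sketch below models this, with None standing in for Pig's null. With the real data it returns the same four matched tuples as the output above. To make the padding visible, the sketch also runs the join with one hypothetical extra order (oid 104, customer_id 9) that is not in orders.txt. This is a model of the result, not Pig's implementation.

```python
from collections import defaultdict

customers = [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (2, "Khilan", 25, "Delhi", 1500),
    (3, "kaushik", 23, "Kota", 2000), (4, "Chaitali", 25, "Mumbai", 6500),
    (5, "Hardik", 27, "Bhopal", 8500), (6, "Komal", 22, "MP", 4500),
    (7, "Muffy", 24, "Indore", 10000),
]
orders = [
    (102, "2009-10-08 00:00:00", 3, 3000), (100, "2009-10-08 00:00:00", 3, 1500),
    (101, "2009-11-20 00:00:00", 2, 1560), (103, "2008-05-20 00:00:00", 4, 2060),
]


def right_outer_join(left, lkey, right, rkey, left_width):
    """Model of JOIN left BY lkey RIGHT OUTER, right BY rkey."""
    index = defaultdict(list)
    for l in left:
        index[l[lkey]].append(l)
    out = []
    for r in right:
        matches = index.get(r[rkey], [])
        if matches:
            out.extend(l + r for l in matches)
        else:
            out.append((None,) * left_width + r)  # left fields become None
    return out


outer_right = right_outer_join(customers, 0, orders, 2, left_width=5)

# Hypothetical extra order (not in orders.txt) whose customer_id 9 has no customer
extra = orders + [(104, "2010-01-01 00:00:00", 9, 500)]
padded = right_outer_join(customers, 0, extra, 2, left_width=5)
print(padded[-1])
```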

Full Outer Join

The full outer join operation returns all rows from both relations: matching tuples are combined, and a tuple with no match in the other relation is kept with that relation's fields set to null.

Syntax
Given below is the syntax of performing full outer join using the JOIN operator.
grunt> Relation3_name = JOIN Relation1_name BY key FULL OUTER, Relation2_name BY key;

Example
Let us perform full outer join operation on the two relations customers and orders as shown
below.
grunt> outer_full = JOIN customers BY id FULL OUTER, orders BY customer_id;

Verification
Verify the relation outer_full using the DUMP operator as shown below.
grunt> Dump outer_full;

Output
It will produce the following output, displaying the contents of the relation outer_full. Since every order matches a customer, the result is the same as the left outer join.
(1,Ramesh,32,Ahmedabad,2000,,,,)
(2,Khilan,25,Delhi,1500,101,2009-11-20 00:00:00,2,1560)
(3,kaushik,23,Kota,2000,100,2009-10-08 00:00:00,3,1500)
(3,kaushik,23,Kota,2000,102,2009-10-08 00:00:00,3,3000)
(4,Chaitali,25,Mumbai,6500,103,2008-05-20 00:00:00,4,2060)
(5,Hardik,27,Bhopal,8500,,,,)
(6,Komal,22,MP,4500,,,,)
(7,Muffy,24,Indore,10000,,,,)
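A full outer join can be thought of as a left outer join plus the right-side tuples that matched nothing. The plain-Python sketch below models that result, with None standing in for Pig's null. It is not Pig's implementation. As a check, it reproduces the eight tuples above. A hypothetical extra order (oid 104, customer_id 9, not in orders.txt) shows the right-only case.

```python
from collections import defaultdict

customers = [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (2, "Khilan", 25, "Delhi", 1500),
    (3, "kaushik", 23, "Kota", 2000), (4, "Chaitali", 25, "Mumbai", 6500),
    (5, "Hardik", 27, "Bhopal", 8500), (6, "Komal", 22, "MP", 4500),
    (7, "Muffy", 24, "Indore", 10000),
]
orders = [
    (102, "2009-10-08 00:00:00", 3, 3000), (100, "2009-10-08 00:00:00", 3, 1500),
    (101, "2009-11-20 00:00:00", 2, 1560), (103, "2008-05-20 00:00:00", 4, 2060),
]


def full_outer_join(left, lkey, right, rkey, left_width, right_width):
    """Model of JOIN left BY lkey FULL OUTER, right BY rkey."""
    index = defaultdict(list)
    for r in right:
        index[r[rkey]].append(r)
    left_keys = {l[lkey] for l in left}
    out = []
    for l in left:
        matches = index.get(l[lkey], [])
        if matches:
            out.extend(l + r for r in matches)        # matched pairs
        else:
            out.append(l + (None,) * right_width)     # left-only tuple
    for r in right:
        if r[rkey] not in left_keys:
            out.append((None,) * left_width + r)      # right-only tuple
    return out


outer_full = full_outer_join(customers, 0, orders, 2, 5, 4)

# Hypothetical extra order (not in orders.txt) for a customer_id with no customer
extra = orders + [(104, "2010-01-01 00:00:00", 9, 500)]
with_orphan = full_outer_join(customers, 0, extra, 2, 5, 4)
print(with_orphan[-1])
```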

Using Multiple Keys

We can perform JOIN operation using multiple keys.

Syntax
Here is how you can perform a JOIN operation on two relations using multiple keys.
grunt> Relation3_name = JOIN Relation1_name BY (key1, key2), Relation2_name BY (key1, key2);

Assume that we have two files namely employee.txt and employee_contact.txt in the /pig_data/
directory of HDFS as shown below.
employee.txt
001,Rajiv,Reddy,21,programmer,003
002,siddarth,Battacharya,22,programmer,003
003,Rajesh,Khanna,22,programmer,003
004,Preethi,Agarwal,21,programmer,003
005,Trupthi,Mohanthy,23,programmer,003
006,Archana,Mishra,23,programmer,003
007,Komal,Nayak,24,teamlead,002
008,Bharathi,Nambiayar,24,manager,001

employee_contact.txt
001,9848022337,[email protected],Hyderabad,003
002,9848022338,[email protected],Kolkata,003
003,9848022339,[email protected],Delhi,003
004,9848022330,[email protected],Pune,003
005,9848022336,[email protected],Bhuwaneshwar,003
006,9848022335,[email protected],Chennai,003
007,9848022334,[email protected],trivendram,002
008,9848022333,[email protected],Chennai,001

And we have loaded these two files into Pig with relations employee and employee_contact as
shown below.
grunt> employee = LOAD 'hdfs://localhost:9000/pig_data/employee.txt' USING PigStorage(',')
   as (id:int, firstname:chararray, lastname:chararray, age:int, designation:chararray, jobid:int);

grunt> employee_contact = LOAD 'hdfs://localhost:9000/pig_data/employee_contact.txt' USING PigStorage(',')
   as (id:int, phone:chararray, email:chararray, city:chararray, jobid:int);

Now, let us join the contents of these two relations using the JOIN operator as shown below.
grunt> emp = JOIN employee BY (id,jobid), employee_contact BY (id,jobid);

Verification
Verify the relation emp using the DUMP operator as shown below.
grunt> Dump emp;

Output
It will produce the following output, displaying the contents of the relation named emp as shown
below.
(1,Rajiv,Reddy,21,programmer,3,1,9848022337,[email protected],Hyderabad,3)
(2,siddarth,Battacharya,22,programmer,3,2,9848022338,[email protected],Kolkata,3)
(3,Rajesh,Khanna,22,programmer,3,3,9848022339,[email protected],Delhi,3)
(4,Preethi,Agarwal,21,programmer,3,4,9848022330,[email protected],Pune,3)
(5,Trupthi,Mohanthy,23,programmer,3,5,9848022336,[email protected],Bhuwaneshwar,3)
(6,Archana,Mishra,23,programmer,3,6,9848022335,[email protected],Chennai,3)
(7,Komal,Nayak,24,teamlead,2,7,9848022334,[email protected],trivendram,2)
(8,Bharathi,Nambiayar,24,manager,1,8,9848022333,[email protected],Chennai,1)
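With multiple keys, two tuples match only when every key field is equal. That amounts to comparing one composite key. The plain-Python sketch below models this by building tuple keys. It uses trimmed copies of the two files with only (id, firstname, jobid) and (id, city, jobid), because the other fields do not affect the match. A value such as "003" in the text files loads as the int 3 because jobid is declared :int. The helper join_on is hypothetical and is not a Pig function.

```python
from collections import defaultdict

# Trimmed relations: employee (id, firstname, jobid), employee_contact (id, city, jobid)
employee = [(1, "Rajiv", 3), (2, "siddarth", 3), (3, "Rajesh", 3), (4, "Preethi", 3),
            (5, "Trupthi", 3), (6, "Archana", 3), (7, "Komal", 2), (8, "Bharathi", 1)]
employee_contact = [(1, "Hyderabad", 3), (2, "Kolkata", 3), (3, "Delhi", 3), (4, "Pune", 3),
                    (5, "Bhuwaneshwar", 3), (6, "Chennai", 3), (7, "trivendram", 2),
                    (8, "Chennai", 1)]


def join_on(left, lkeys, right, rkeys):
    """Model of JOIN left BY (k1, k2), right BY (k1, k2): match on the whole key tuple."""
    index = defaultdict(list)
    for r in right:
        index[tuple(r[i] for i in rkeys)].append(r)
    return [l + r for l in left for r in index.get(tuple(l[i] for i in lkeys), [])]


emp = join_on(employee, (0, 2), employee_contact, (0, 2))  # BY (id, jobid)
for t in emp:
    print(t)
```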

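Nulls and the JOIN operator
An inner join disregards (filters out) tuples whose join key is null, so null keys never match. In the example below, sam and bob have a null age and do not appear in the result.

A = load 'student' as (name:chararray, age:int, gpa:float);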

B = load 'student' as (name:chararray, age:int, gpa:float);
dump B;
(joe,18,2.5)
(sam,,3.0)
(bob,,3.5)

X = join A by age, B by age;

dump X;
(joe,18,2.5,joe,18,2.5)
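This rule matters when the join is imitated in a general-purpose language, because Python's None == None is True. The plain-Python sketch below models Pig's inner-join behaviour on the three student tuples by never indexing or probing a None key. It is a model only, with None standing in for Pig's null.

```python
from collections import defaultdict

# A and B are both loaded from the same 'student' file; age is null for sam and bob
A = [("joe", 18, 2.5), ("sam", None, 3.0), ("bob", None, 3.5)]
B = [("joe", 18, 2.5), ("sam", None, 3.0), ("bob", None, 3.5)]


def inner_join(left, lkey, right, rkey):
    """Inner equi-join that, like Pig's inner JOIN, never matches null keys."""
    index = defaultdict(list)
    for r in right:
        if r[rkey] is not None:          # null keys are never indexed
            index[r[rkey]].append(r)
    out = []
    for l in left:
        if l[lkey] is None:              # a null key cannot match anything
            continue
        out.extend(l + r for r in index.get(l[lkey], []))
    return out


X = inner_join(A, 1, B, 1)   # X = join A by age, B by age;
print(X)

# Without the null checks, None == None would pair sam and bob with each other,
# giving 5 tuples instead of 1.
```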

Project-Range Expressions:

Project-range ( .. ) expressions can be used to project a range of columns from input. For example:

• .. $x : projects columns $0 through $x, inclusive

• $x .. : projects columns $x through the end, inclusive
• $x .. $y : projects columns $x through $y, inclusive
If the input relation has a schema, you can refer to columns by alias rather than by column position.
You can also combine aliases and column positions in an expression; for example, "col1 .. $5" is
valid.
Project-range can be used in all cases where the star expression ( * ) is allowed.
Project-range can be used in the following statements: FOREACH, JOIN, GROUP, COGROUP,
and ORDER BY (also when ORDER BY is used within a nested FOREACH block).
A few examples are shown here:
.....
grunt> F = foreach IN generate (int)col0, col1 .. col3;
grunt> describe F;
F: {col0: int,col1: bytearray,col2: bytearray,col3: bytearray}
.....
.....
grunt> SORT = order IN by col2 .. col3, col0, col4 ..;
.....
.....
J = join IN1 by $0 .. $3, IN2 by $0 .. $3;
.....
.....
g = group l1 by b .. c;
.....
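For positional columns, the three project-range forms map onto slicing with an inclusive upper bound. The plain-Python sketch below models them. The helper project_range and the six-column tuple row are hypothetical and exist only for illustration. Pig also resolves aliases such as col1 .. col3 to positions first, which this sketch does not attempt.

```python
def project_range(t, start=None, end=None):
    """Model of Pig's .. operator on positional columns (both ends inclusive).

    .. $x     -> project_range(t, end=x)
    $x ..     -> project_range(t, start=x)
    $x .. $y  -> project_range(t, start=x, end=y)
    """
    lo = 0 if start is None else start
    hi = len(t) if end is None else end + 1  # +1 because $end itself is included
    return t[lo:hi]


row = ("c0", "c1", "c2", "c3", "c4", "c5")  # hypothetical six-column tuple

print(project_range(row, end=2))            # .. $2
print(project_range(row, start=4))          # $4 ..
print(project_range(row, start=1, end=3))   # $1 .. $3
```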

There are some restrictions on the use of the project-to-end form of project-range (e.g. "$x ..") when the
input schema is unknown (null):

• For GROUP/COGROUP, the project-to-end form of project-range is not allowed.

• For ORDER BY, the project-to-end form of project-range is supported only as the last sort
column.
.....
grunt> describe IN;
Schema for IN unknown.

/* This statement is supported */

SORT = order IN by $2 .. $3, $6 ..;

/* This statement is NOT supported */

SORT = order IN by $6 .., $2 .. $3;
.....
