Commit 56aa850

Added pairRdd/mapValues/*.py
Author: Pedro Bernardo
1 parent: 4f0a014

2 files changed: 41 additions, 0 deletions

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+from pyspark import SparkContext
+
+if __name__ == "__main__":
+
+    '''
+    Create a Spark program to read the airport data from in/airports.text, generate a pair RDD with airport name
+    being the key and country name being the value. Then convert the country name to uppercase and
+    output the pair RDD to out/airports_uppercase.text
+
+    Each row of the input file contains the following columns:
+
+    Airport ID, Name of airport, Main city served by airport, Country where airport is located, IATA/FAA code,
+    ICAO Code, Latitude, Longitude, Altitude, Timezone, DST, Timezone in Olson format
+
+    Sample output:
+
+    ("Kamloops", "CANADA")
+    ("Wewak Intl", "PAPUA NEW GUINEA")
+    ...
+
+    '''
+
+
+
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+from pyspark import SparkContext
+from commons.Utils import Utils
+
+if __name__ == "__main__":
+
+    sc = SparkContext("local", "airports")
+    sc.setLogLevel("ERROR")
+
+    airportsRDD = sc.textFile("in/airports.text")
+
+    airportPairRDD = airportsRDD.map(lambda line: \
+        (Utils.COMMA_DELIMITER.split(line)[1], \
+         Utils.COMMA_DELIMITER.split(line)[3]))
+
+    upperCase = airportPairRDD.mapValues(lambda countryName: countryName.upper())
+
+    upperCase.saveAsTextFile("out/airports_uppercase.text")
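
The per-line logic of the solution can be checked without a Spark cluster. The sketch below is a plain-Python approximation, not the committed code. It assumes that `Utils.COMMA_DELIMITER` (not shown in this commit) is a compiled regex that splits on commas outside double-quoted fields; the local `COMMA_DELIMITER`, `to_pair`, and `map_values` names are hypothetical stand-ins, and the two input rows are illustrative values laid out in the column order described in the problem statement, not rows copied from in/airports.text.

```python
import re

# Assumption: stand-in for commons.Utils.COMMA_DELIMITER, which this commit does not show.
# It matches a comma only when an even number of double quotes follows it,
# i.e. when the comma sits outside a quoted field.
COMMA_DELIMITER = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def to_pair(line):
    """Build the (airport name, country) pair, splitting the line only once."""
    fields = COMMA_DELIMITER.split(line)
    return (fields[1], fields[3])


def map_values(pairs, fn):
    """Plain-Python analogue of RDD.mapValues: keys are untouched, fn is applied to values."""
    return [(key, fn(value)) for key, value in pairs]


# Illustrative rows (hypothetical values) in the documented column order.
lines = [
    '1,"Kamloops","Kamloops","Canada","YKA","CYKA",50.7,-120.4,1133,-8,"A","America/Vancouver"',
    '2,"Wewak Intl","Wewak","Papua New Guinea","WWK","AYWK",-3.5,143.6,19,10,"U","Pacific/Port_Moresby"',
]

pairs = [to_pair(line) for line in lines]
upper_case = map_values(pairs, lambda country_name: country_name.upper())

for pair in upper_case:
    print(pair)
```

Under this assumption the split keeps the surrounding double quotes in each field, which would explain why the sample output in the problem statement shows quoted names. Only the value of each pair is transformed, which is the property that distinguishes `mapValues` from a general `map` over the pairs.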
