
Parallel Functional Programming in Java 8

Peter Sestoft
IT University of Copenhagen

Chalmers Tekniska Högskola


Monday 2018-04-16

IT University of Copenhagen 1
The speaker
• MSc 1988 computer science and mathematics and
PhD 1991, DIKU, Copenhagen University
• KU, DTU, KVL and ITU; and Glasgow U, AT&T Bell
Labs, Microsoft Research UK, Harvard University
• Programming languages, software development, ...
• Open source software
– Moscow ML implementation, 1994…
– C5 Generic Collection Library, with Niels Kokholm, 2006…
– Funcalc spreadsheet implementation, 2014

[Figure: book covers, labelled 1993; 2002, 2005, 2016; 2004 & 2012; 2007; 2012, 2017; 2014]
Plan
• Java 8 functional programming
– Package java.util.function
– Lambda expressions, method reference expressions
– Functional interfaces, targeted function type
• Java 8 streams for bulk data
– Package java.util.stream
• High-level parallel programming
– Streams: primes, queens, van der Corput, …
– Array parallel prefix operations
• Class java.util.Arrays static methods
• A multicore performance mystery

Materials
• Java Precisely 3rd edition, MIT Press 2016
– 11.13: Lambda expressions
– 11.14: Method reference expressions
– 23: Functional interfaces
– 24: Streams for bulk data
– 25: Class Optional<T>

• Book examples are called Example154.java etc


– Get them from the book homepage
http://www.itu.dk/people/sestoft/javaprecisely/

New in Java 8
• Lambda expressions
(String s) -> s.length()
• Method reference expressions
String::length
• Functional interfaces
Function<String,Integer>
• Streams for bulk data
Stream<Integer> is = ss.map(String::length)
• Parallel streams
is = ss.parallel().map(String::length)
• Parallel array operations
Arrays.parallelSetAll(arr, i -> sin(i/PI/100.0))
Arrays.parallelPrefix(arr, (x, y) -> x+y)

Functional programming in Java
• Immutable data instead of objects with state
• Recursion instead of loops
• Higher-order functions that either
– take functions as argument
– return functions as result
class FunList<T> {                        // List of T (Example154.java)
  final Node<T> first;                    // Immutable
  protected static class Node<U> {
    public final U item;
    public final Node<U> next;
    public Node(U item, Node<U> next) { ... }
  }
  ...
}
Immutable data
• FunList<T>, linked lists of nodes
class FunList<T> {                        // Example154.java
  final Node<T> first;
  protected static class Node<U> {
    public final U item;
    public final Node<U> next;
    public Node(U item, Node<U> next) { ... }
  }
  ...
}
[Figure: list1, a list of Integer with nodes 9, 13, 0; labels "Head" and "Tail"]
Existing data do not change
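FunList<Integer> empty = new FunList<>(null),          // Example154.java
  list1 = cons(9, cons(13, cons(0, empty))),
  list2 = cons(7, list1),
  list3 = cons(8, list1),
  list4 = list1.insert(1, 12),
  list5 = list2.removeAt(3);

[Figure: nodes allocated for each list]
list1: 9 13 0
list2: 7, then shares all of list1
list3: 8, then shares all of list1
list4: 9 12, then shares the tail 13 0 of list1
list5: 7 9 13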
Recursion in insert
public FunList<T> insert(int i, T item) {                // Example154.java
  return new FunList<T>(insert(i, item, this.first));
}

static <T> Node<T> insert(int i, T item, Node<T> xs) {
  return i == 0 ? new Node<T>(item, xs)
                : new Node<T>(xs.item, insert(i-1, item, xs.next));
}

• “If i is zero, put item in a new node, and let its tail be the old list xs”
• “Otherwise, put the first element of xs in a new node, and let its tail be
  the result of inserting item in position i-1 of the tail of xs”
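A minimal sketch of how the recursive insert leaves the existing list untouched and shares its tail. The class ImmList and its toString are hypothetical stand-ins for FunList in Example154.java, not the book's code; only cons and insert follow the slide.

```java
import java.util.StringJoiner;

// Hypothetical, simplified stand-in for FunList<T> (assumption, not the book's class)
class ImmList<T> {
  static final class Node<U> {
    final U item;
    final Node<U> next;
    Node(U item, Node<U> next) { this.item = item; this.next = next; }
  }

  final Node<T> first;
  ImmList(Node<T> first) { this.first = first; }

  // Put item in front of an existing list; the old list is shared, not copied
  static <T> ImmList<T> cons(T item, ImmList<T> list) {
    return new ImmList<T>(new Node<T>(item, list.first));
  }

  // Same recursion as on the slide: copy the first i nodes, share the rest
  ImmList<T> insert(int i, T item) {
    return new ImmList<T>(insert(i, item, this.first));
  }

  static <T> Node<T> insert(int i, T item, Node<T> xs) {
    return i == 0 ? new Node<T>(item, xs)
                  : new Node<T>(xs.item, insert(i-1, item, xs.next));
  }

  @Override public String toString() {
    StringJoiner sj = new StringJoiner(", ", "[", "]");
    for (Node<T> xs = first; xs != null; xs = xs.next)
      sj.add(String.valueOf(xs.item));
    return sj.toString();
  }

  public static void main(String[] args) {
    ImmList<Integer> empty = new ImmList<Integer>(null);
    ImmList<Integer> list1 = cons(9, cons(13, cons(0, empty)));
    ImmList<Integer> list4 = list1.insert(1, 12);
    System.out.println(list1);   // list1 is unchanged by the insert
    System.out.println(list4);
    // The node holding 13 is the same object in both lists
    System.out.println(list4.first.next.next == list1.first.next);
  }
}
```

Only the nodes in front of the insertion point are newly allocated; everything after it is shared with the old list.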

Immutable data: Bad and good
• Immutability leads to more allocation
– Takes time and space
– But modern garbage collectors are fast
• Immutable data can be safely shared
– May actually reduce amount of allocation
• Immutable data are automatically threadsafe
– No (other) thread can mess with it
– And also due to visibility effects of final modifier (subtle point)
Lambda expressions 1

• One-argument lambda expressions:                       (Example64.java)
Function<String,Integer>
  fsi1 = s -> Integer.parseInt(s);       // Function that takes a string s
                                         // and parses it as an integer
... fsi1.apply("004711") ...             // Calling the function

Function<String,Integer>                 // Same, written in other ways
  fsi2 = s -> { return Integer.parseInt(s); },
  fsi3 = (String s) -> Integer.parseInt(s);

• Two-argument lambda expressions:
BiFunction<String,Integer,String>
  fsis1 = (s, i) -> s.substring(i, Math.min(i+3, s.length()));
Lambda expressions 2
• Zero-argument lambda expression:                       (Example64.java)
Supplier<String>
  now = () -> new java.util.Date().toString();

• One-argument result-less lambda (“void”):
Consumer<String>
  show1 = s -> System.out.println(">>>" + s + "<<<");

Consumer<String>
  show2 = s -> { System.out.println(">>>" + s + "<<<"); };
Method reference expressions
BiFunction<String,Integer,Character> charat             // Example67.java
  = String::charAt;                      // Same as (s,i) -> s.charAt(i)
System.out.println(charat.apply("ABCDEF", 1));

Function<String,Integer> parseint = Integer::parseInt;  // Same as fsi1, fsi2 and fsi3

Function<Integer,Character> hex1
  = "0123456789ABCDEF"::charAt;          // Conversion to hex digit

// Class and array constructors
Function<Integer,C> makeC = C::new;
Function<Integer,Double[]> make1DArray = Double[]::new;
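A small runnable sketch of the method reference forms above. The class name MethodRefSketch is hypothetical, and the class constructor reference C::new is left out because class C is not shown on the slide.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

class MethodRefSketch {
  // Unbound instance method: the receiver string becomes the first argument
  static final BiFunction<String,Integer,Character> charat = String::charAt;

  // Static method
  static final Function<String,Integer> parseint = Integer::parseInt;

  // Bound instance method: the receiver is the fixed digit string
  static final Function<Integer,Character> hex1 = "0123456789ABCDEF"::charAt;

  // Array constructor: the argument is the array length
  static final Function<Integer,Double[]> make1DArray = Double[]::new;

  public static void main(String[] args) {
    System.out.println(charat.apply("ABCDEF", 1));
    System.out.println(parseint.apply("004711"));
    System.out.println(hex1.apply(11));
    System.out.println(make1DArray.apply(3).length);
  }
}
```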
Targeted function type (TFT)
• A lambda expression or method reference expression does not have a type in itself
• Therefore it must have a targeted function type
• Lambda or method reference must appear as
– Assignment right hand side: the declared type is TFT
  • Function<String,Integer> f = Integer::parseInt;
– Argument to call: map’s argument type is TFT
  • stringList.map(Integer::parseInt)
– In a cast: the cast type is TFT
  • (Function<String,Integer>)Integer::parseInt
– Argument to return statement: enclosing method’s return type is TFT
  • return Integer::parseInt;
Functions as arguments: map
public <U> FunList<U> map(Function<T,U> f) {             // Example154.java
  return new FunList<U>(map(f, first));
}
static <T,U> Node<U> map(Function<T,U> f, Node<T> xs) {
  return xs == null ? null
    : new Node<U>(f.apply(xs.item), map(f, xs.next));
}

• Function map encodes general behavior
– Transform each list element to make a new list
– Argument f expresses the specific transformation
• Same effect as OO “template method pattern”
Calling map
• list5 is 7 9 13
FunList<Double> list8 = list5.map(i -> 2.5 * i);         // 17.5 22.5 32.5
FunList<Boolean> list9 = list5.map(i -> i < 10);         // true true false
Functions as arguments: reduce
static <T,U> U reduce(U x0, BiFunction<U,T,U> op, Node<T> xs) {
return xs == null ? x0
: reduce(op.apply(x0, xs.item), op, xs.next);
}

• list.reduce(x0, op) = x0 ◊ x1 ◊ ... ◊ xn               (Example154.java)
  if we write op.apply(x,y) as x ◊ y
• Example: list.reduce(0, (x,y) -> x+y) = 0+x1+...+xn
Calling reduce
• list8 is 17.5 22.5 32.5                                (Example154.java)
double sum = list8.reduce(0.0, (res, item) -> res + item);       // 72.5
double product = list8.reduce(1.0, (res, item) -> res * item);   // 12796.875
boolean allBig
  = list8.reduce(true, (res, item) -> res && item > 10);         // true
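The map and reduce calls above can be tried out with a minimal sketch like the following. It works directly on nodes instead of the book's FunList class; MapReduceSketch, list5() and list8() are hypothetical names that rebuild the lists from the slides.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

class MapReduceSketch {
  static final class Node<T> {
    final T item;
    final Node<T> next;
    Node(T item, Node<T> next) { this.item = item; this.next = next; }
  }

  // Recursive map, as on the slide
  static <T,U> Node<U> map(Function<T,U> f, Node<T> xs) {
    return xs == null ? null
      : new Node<U>(f.apply(xs.item), map(f, xs.next));
  }

  // Loop version of reduce, as on the slide
  static <T,U> U reduce(U x0, BiFunction<U,T,U> op, Node<T> xs) {
    while (xs != null) {
      x0 = op.apply(x0, xs.item);
      xs = xs.next;
    }
    return x0;
  }

  // The list 7 9 13 from the slides
  static Node<Integer> list5() {
    return new Node<Integer>(7, new Node<Integer>(9, new Node<Integer>(13, null)));
  }

  static Node<Double> list8() {
    return map(i -> 2.5 * i, list5());
  }

  static double sum() {
    Double s = reduce(0.0, (res, item) -> res + item, list8());
    return s;
  }

  static double product() {
    Double p = reduce(1.0, (res, item) -> res * item, list8());
    return p;
  }

  static boolean allBig() {
    Boolean b = reduce(true, (res, item) -> res && item > 10, list8());
    return b;
  }

  public static void main(String[] args) {
    System.out.println(sum());
    System.out.println(product());
    System.out.println(allBig());
  }
}
```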
Tail recursion and loops
static <T,U> U reduce(U x0, BiFunction<U,T,U> op, Node<T> xs) {
  return xs == null ? x0
    : reduce(op.apply(x0, xs.item), op, xs.next);        // Tail call
}

• A call that is the func’s last action is a tail call
• A tail-recursive func can be replaced by a loop
// Loop version of reduce (Example154.java)
static <T,U> U reduce(U x0, BiFunction<U,T,U> op, Node<T> xs) {
  while (xs != null) {
    x0 = op.apply(x0, xs.item);
    xs = xs.next;
  }
  return x0;
}
– The Java compiler does not do that automatically
Java 8 functional interfaces
• A functional interface has exactly one abstract method
interface Function<T,R> {       // Type of functions from T to R
  R apply(T x);                 // C#: Func<T,R>    F#: T -> R
}

interface Consumer<T> {         // Type of functions from T to void
  void accept(T x);             // C#: Action<T>    F#: T -> unit
}
(Too) many functional interfaces
interface IntFunction<R> {
  R apply(int x);
}
• Use instead of Function<Integer,R> to avoid (un)boxing
• Primitive-type specialized interfaces: see Java Precisely page 125
Primitive-type specialized interfaces for int, double, and long
interface Function<T,R> {
  R apply(T x);
}
interface IntFunction<R> {
  R apply(int x);
}
• Why both? What difference?
Function<Integer,String> f1 = i -> "#" + i;
IntFunction<String> f2 = i -> "#" + i;

• Calling f1.apply(i) will box i as Integer
– Allocating object in heap, takes time and memory
• Calling f2.apply(i) avoids boxing, is faster
• Purely a matter of performance
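A minimal sketch showing that the two interfaces are interchangeable in result and differ only in whether the int argument is boxed. BoxingSketch and its two helper methods are hypothetical names; the sketch does not measure the speed difference.

```java
import java.util.function.Function;
import java.util.function.IntFunction;

class BoxingSketch {
  static final Function<Integer,String> f1 = i -> "#" + i;
  static final IntFunction<String> f2 = i -> "#" + i;

  // The int argument is converted to an Integer object before the call
  static String viaBoxed(int i) { return f1.apply(i); }

  // The int argument is passed as a primitive, no Integer object is needed
  static String viaPrimitive(int i) { return f2.apply(i); }

  public static void main(String[] args) {
    System.out.println(viaBoxed(42));
    System.out.println(viaPrimitive(42));
  }
}
```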
Functions that return functions
• Conversion of n to English numeral, cases
  n < 20 : one, two, ..., nineteen
  n < 100: twenty-three, ...
  n>=100: two hundred forty-three, ...
  n>=1000: three thousand two hundred forty-three...
  n >= 1 million: ... million …
  n >= 1 billion: ... billion …
• Several of these cases follow the same pattern
// Convert n < 100 (Example158.java)
private static String less100(long n) {
  return n<20 ? ones[(int)n]
    : tens[(int)n/10-2] + after("-", ones[(int)n%10]);
}
static LongFunction<String> less(long limit, String unit,
                                 LongFunction<String> conv) {
  return n -> n<limit ? conv.apply(n)
    : conv.apply(n/limit) + " " + unit
      + after(" ", conv.apply(n%limit));
}
Functions that return functions
• Using the general higher-order function                (Example158.java)
static final LongFunction<String>
  less1K = less(          100, "hundred",  Example158::less100),
  less1M = less(        1_000, "thousand", less1K),
  less1B = less(    1_000_000, "million",  less1M),
  less1G = less(1_000_000_000, "billion",  less1B);

• Converting to English numerals:
public static String toEnglish(long n) {
  return n==0 ? "zero" : n<0 ? "minus " + less1G.apply(-n)
                             : less1G.apply(n);
}

toEnglish(2147483647) gives
  two billion one hundred forty-seven million
  four hundred eighty-three thousand six hundred forty-seven
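The slides do not show the ones and tens tables or the helper after, so the following runnable sketch fills them in as assumptions: ones[0] is taken to be the empty string, and after(d, s) is taken to put the delimiter d in front of s unless s is empty. The class name NumeralsSketch is also hypothetical. The rest follows the slide code.

```java
import java.util.function.LongFunction;

class NumeralsSketch {
  // Assumption: ones[0] is empty so that 40 becomes "forty", not "forty-"
  private static final String[] ones = {
    "", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen" };
  private static final String[] tens = {
    "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety" };

  // Assumption: delimiter d is added only in front of a non-empty s
  private static String after(String d, String s) {
    return s.isEmpty() ? "" : d + s;
  }

  // Convert n < 100
  private static String less100(long n) {
    return n<20 ? ones[(int)n]
      : tens[(int)n/10-2] + after("-", ones[(int)n%10]);
  }

  // Returns a converter for numbers with groups of size limit, built from conv
  static LongFunction<String> less(long limit, String unit,
                                   LongFunction<String> conv) {
    return n -> n<limit ? conv.apply(n)
      : conv.apply(n/limit) + " " + unit
        + after(" ", conv.apply(n%limit));
  }

  static final LongFunction<String>
    less1K = less(          100, "hundred",  NumeralsSketch::less100),
    less1M = less(        1_000, "thousand", less1K),
    less1B = less(    1_000_000, "million",  less1M),
    less1G = less(1_000_000_000, "billion",  less1B);

  public static String toEnglish(long n) {
    return n==0 ? "zero" : n<0 ? "minus " + less1G.apply(-n)
                               : less1G.apply(n);
  }

  public static void main(String[] args) {
    System.out.println(toEnglish(2147483647));
    System.out.println(toEnglish(243));
    System.out.println(toEnglish(-23));
  }
}
```

Each call to less returns a new function that closes over limit, unit and conv, so the four converters are built by composing one general pattern rather than by writing four similar methods.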
Streams for bulk data
• Stream<T> is a finite or infinite sequence of T
– Possibly lazily generated
– Possibly parallel
• Stream methods
– map, flatMap, reduce, filter, ...
– These take functions as arguments
– Can be combined into pipelines
– Java optimizes (and parallelizes) the pipelines well
• Similar to
– Java Iterators, but very different implementation
– The extension methods underlying .NET Linq

Some stream operations
• Stream<Integer> s = Stream.of(2, 3, 5)
• s.filter(p) = the x where p.test(x) holds
  s.filter(x -> x%2==0) gives 2
• s.map(f) = results of f.apply(x) for x in s
  s.map(x -> 3*x) gives 6, 9, 15
• s.flatMap(f) = a flattening of the streams created by f.apply(x) for x in s
  s.flatMap(x -> Stream.of(x,x+1)) gives 2,3,3,4,5,6
• s.findAny() = some element of s, if any, or else the absent Optional<T> value
  s.findAny() gives 2 or 3 or 5
• s.reduce(x0, op) = x0 ◊ s0 ◊ ... ◊ sn if we write op.apply(x,y) as x ◊ y
  s.reduce(1, (x,y)->x*y) gives 1*2*3*5 = 30
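The operations above can be run with a sketch like this one. Because a stream can be consumed only once, the hypothetical helper s() creates a fresh stream for every operation; StreamOpsSketch and the method names are illustrative only.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamOpsSketch {
  // A fresh stream for each use, since streams are use-once
  static Stream<Integer> s() { return Stream.of(2, 3, 5); }

  static List<Integer> filtered() {
    return s().filter(x -> x%2==0).collect(Collectors.toList());
  }

  static List<Integer> mapped() {
    return s().map(x -> 3*x).collect(Collectors.toList());
  }

  static List<Integer> flatMapped() {
    return s().flatMap(x -> Stream.of(x, x+1)).collect(Collectors.toList());
  }

  static int product() {
    return s().reduce(1, (x,y) -> x*y);
  }

  // findAny returns an Optional, which is empty only for an empty stream
  static boolean foundSome() {
    return s().findAny().isPresent();
  }

  public static void main(String[] args) {
    System.out.println(filtered());
    System.out.println(mapped());
    System.out.println(flatMapped());
    System.out.println(product());
    System.out.println(foundSome());
  }
}
```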
Similar functions are everywhere
• Java stream map is called
– map in Haskell, Scala, F#, Clojure
– Select in C#
• Java stream flatMap is called
– concatMap in Haskell
– flatMap in Scala
– collect in F#
– SelectMany in C#
– mapcat in Clojure
• Java reduce is a special (assoc. op.) case of
– foldl in Haskell
– foldLeft in Scala
– fold in F#
– Aggregate in C#
– reduce in Clojure

Counting primes on Java 8 streams
• Our old standard Java for loop (classical efficient imperative loop):
int count = 0;
for (int i=0; i<range; i++)
  if (isPrime(i))
    count++;

• Sequential Java 8 stream (pure functional programming ...):
IntStream.range(0, range)
  .filter(i -> isPrime(i))
  .count()

• Parallel Java 8 stream (... and thus parallelizable and thread-safe):
IntStream.range(0, range)
  .parallel()
  .filter(i -> isPrime(i))
  .count()
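A runnable sketch of the three variants. The slides do not show isPrime, so a simple trial-division version is assumed here; PrimeCountSketch and the three count method names are hypothetical.

```java
import java.util.stream.IntStream;

class PrimeCountSketch {
  // Assumption: plain trial division; the slides do not show isPrime
  static boolean isPrime(int n) {
    if (n < 2) return false;
    for (int k = 2; k * k <= n; k++)
      if (n % k == 0) return false;
    return true;
  }

  // Classical imperative loop
  static long countLoop(int range) {
    long count = 0;
    for (int i=0; i<range; i++)
      if (isPrime(i))
        count++;
    return count;
  }

  // Sequential stream
  static long countSequential(int range) {
    return IntStream.range(0, range)
      .filter(i -> isPrime(i))
      .count();
  }

  // Parallel stream: same pipeline with .parallel() added
  static long countParallel(int range) {
    return IntStream.range(0, range)
      .parallel()
      .filter(i -> isPrime(i))
      .count();
  }

  public static void main(String[] args) {
    int range = 100_000;
    System.out.println(countLoop(range));
    System.out.println(countSequential(range));
    System.out.println(countParallel(range));
  }
}
```

The parallel version is safe because isPrime reads only its argument and updates no shared state.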
Performance results (!!)
• Counting the primes in 0 ... 99,999
Method                  Intel i7 (ms)   AMD Opteron (ms)
Sequential for-loop           9.9            40.5
Sequential stream             9.9            40.8
Parallel stream               2.8             1.7
Best thread-parallel          3.0             4.9
Best task-parallel            2.6             1.9

• Functional streams give the simplest solution
• Nearly as fast as tasks and threads, or faster:
– Intel i7 (4 cores) speed-up: 3.6 x
– AMD Opteron (32 cores) speed-up: 24.2 x
– ARM Cortex-A7 (RP 2B) (4 cores) speed-up: 3.5 x
• The future is parallel – and functional :-)
Side-effect freedom
• From the java.util.stream package docs:
  [Quotation shown as an image on the slide, annotated: this means ”catastrophic”]
• Java compiler (type system) cannot enforce side-effect freedom
• Java runtime cannot detect it
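A hedged sketch of what can go wrong, not taken from the slides: a lambda that increments a shared counter compiles and runs without complaint, but in a parallel stream increments may be lost. SideEffectSketch and both method names are hypothetical.

```java
import java.util.stream.IntStream;

class SideEffectSketch {
  // Unsafe: the lambda mutates shared state without synchronization,
  // so concurrent increments can overwrite each other
  static int countWithSideEffect(int n) {
    final int[] counter = { 0 };
    IntStream.range(0, n).parallel().forEach(i -> counter[0]++);
    return counter[0];          // may be less than n
  }

  // Safe: no shared mutable state, the stream does the counting
  static long countSideEffectFree(int n) {
    return IntStream.range(0, n).parallel().count();
  }

  public static void main(String[] args) {
    int n = 1_000_000;
    System.out.println(countWithSideEffect(n));
    System.out.println(countSideEffectFree(n));
  }
}
```

Neither the compiler nor the runtime reports the race in the first method; the only symptom is a result that may vary from run to run.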
Creating streams 1
• Explicitly or from array, collection or map:           (Example164.java)
IntStream is = IntStream.of(2, 3, 5, 7, 11, 13);

String[] a = { "Hoover", "Roosevelt", ...};
Stream<String> presidents = Arrays.stream(a);

Collection<String> coll = ...;
Stream<String> countries = coll.stream();

Map<String,Integer> phoneNumbers = ...;
Stream<Map.Entry<String,Integer>> phones
  = phoneNumbers.entrySet().stream();

• Finite, ordered, sequential, lazily generated
Creating streams 2
• Useful special-case streams:                           (Example164.java)
  – IntStream.range(0, 10_000)
  – random.ints(5_000)
  – bufferedReader.lines()
  – bitset.stream()
• Functional iterators for infinite streams
• Imperative generators for infinite streams
• Stream.Builder<T>: eager, only finite streams
Creating streams 3: generators
• Generating 0, 1, 2, 3, ...                             (Example165.java)
// Functional
IntStream nats1 = IntStream.iterate(0, x -> x+1);

(Slide annotation: most efficient (!!), and parallelizable)

// Object imperative
IntStream nats2 = IntStream.generate(new IntSupplier() {
    private int next = 0;
    public int getAsInt() { return next++; }
  });

// Imperative, using final array for mutable state
final int[] next = { 0 };
IntStream nats3 = IntStream.generate(() -> next[0]++);
Creating streams 4: stream builders
• Convert own linked IntList to an IntStream             (Example182.java)
class IntList {
  public final int item;
  public final IntList next;
  ...
  public static IntStream stream(IntList xs) {
    IntStream.Builder sb = IntStream.builder();
    while (xs != null) {
      sb.accept(xs.item);
      xs = xs.next;
    }
    return sb.build();
  }
}

• Eager: no stream element output until end
• Finite: does not work on cyclic or infinite lists
Streams for backtracking
• Generate all n-permutations of 0, 1, ..., n-1
– Eg [2,1,0], [1,2,0], [2,0,1], [0,2,1], [0,1,2], [1,0,2]

// todo: set of numbers not yet used; tail: an incomplete permutation
// (Example175.java)
public static Stream<IntList> perms(BitSet todo, IntList tail) {
  if (todo.isEmpty())
    return Stream.of(tail);
  else
    return todo.stream().boxed()
      .flatMap(r -> perms(minus(todo, r), new IntList(r, tail)));
}

public static Stream<IntList> perms(int n) {
  BitSet todo = new BitSet(n); todo.flip(0, n);          // { 0, ..., n-1 }
  return perms(todo, null);                              // null is the empty permutation [ ]
}
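The slide code depends on IntList and minus, which are not shown. The following runnable sketch supplies both as assumptions: a small immutable IntList with a toString, and a minus that copies the set and clears one bit. PermsSketch is a hypothetical name.

```java
import java.util.BitSet;
import java.util.stream.Stream;

class PermsSketch {
  // Assumption: minimal immutable list of ints
  static final class IntList {
    final int item;
    final IntList next;
    IntList(int item, IntList next) { this.item = item; this.next = next; }

    @Override public String toString() {
      StringBuilder sb = new StringBuilder("[");
      for (IntList xs = this; xs != null; xs = xs.next)
        sb.append(xs.item).append(xs.next != null ? ", " : "");
      return sb.append("]").toString();
    }
  }

  // Assumption: returns a copy of set without r, leaving set itself unchanged
  static BitSet minus(BitSet set, int r) {
    BitSet result = (BitSet) set.clone();
    result.clear(r);
    return result;
  }

  // As on the slide: extend the partial permutation with every unused number
  static Stream<IntList> perms(BitSet todo, IntList tail) {
    if (todo.isEmpty())
      return Stream.of(tail);
    else
      return todo.stream().boxed()
        .flatMap(r -> perms(minus(todo, r), new IntList(r, tail)));
  }

  static Stream<IntList> perms(int n) {
    BitSet todo = new BitSet(n); todo.flip(0, n);
    return perms(todo, null);
  }

  public static void main(String[] args) {
    perms(3).forEach(System.out::println);
    System.out.println(perms(4).count());
  }
}
```

It matters that minus returns a new set: the same todo set is reused for every r in the flatMap, so mutating it would be exactly the kind of side effect that breaks stream pipelines.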
A closer look at generation for n=3
({0,1,2}, [])
  ({1,2}, [0])
    ({2}, [1,0])
      ({}, [2,1,0])     Output to stream
    ({1}, [2,0])
      ({}, [1,2,0])     Output to stream
  ({0,2}, [1])
    ({2}, [0,1])
      ({}, [2,0,1])     Output to stream
    ({0}, [2,1])
      ({}, [0,2,1])     Output to stream
  ({0,1}, [2])
    ...
A permutation is a rook (tårn) placement on a chessboard
[Figure: six 3x3 chessboards with three rooks each, one board per permutation:
[2, 1, 0], [1, 2, 0], [2, 0, 1], [0, 2, 1], [0, 1, 2], [1, 0, 2]]
Solutions to the n-queens problem
• For queens, just take diagonals into account:
– consider only r that are safe for the partial solution
public static Stream<IntList> queens(BitSet todo, IntList tail) {    // Example176.java
  if (todo.isEmpty())
    return Stream.of(tail);
  else
    return todo.stream()
      .filter(r -> safe(r, tail)).boxed()                // Diagonal check
      .flatMap(r -> queens(minus(todo, r), new IntList(r, tail)));
}
public static boolean safe(int mid, IntList tail) {
  return safe(mid+1, mid-1, tail);
}
public static boolean safe(int d1, int d2, IntList tail) {
  return tail==null || d1!=tail.item && d2!=tail.item && safe(d1+1, d2-1, tail.next);
}

• Simple, and parallelizable for free with .parallel(), 3.5 x faster
• Solve or generate sudokus: much the same
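A runnable sketch of the queens generator. As in the slides, IntList and minus are not shown, so simple versions are assumed here, and the entry point queens(int n) is assumed to be built like perms(int n). QueensSketch is a hypothetical name.

```java
import java.util.BitSet;
import java.util.stream.Stream;

class QueensSketch {
  // Assumption: minimal immutable list of ints
  static final class IntList {
    final int item;
    final IntList next;
    IntList(int item, IntList next) { this.item = item; this.next = next; }
  }

  // Assumption: returns a copy of set without r, leaving set itself unchanged
  static BitSet minus(BitSet set, int r) {
    BitSet result = (BitSet) set.clone();
    result.clear(r);
    return result;
  }

  // As on the slide: like perms, but only positions that are diagonally safe
  static Stream<IntList> queens(BitSet todo, IntList tail) {
    if (todo.isEmpty())
      return Stream.of(tail);
    else
      return todo.stream()
        .filter(r -> safe(r, tail)).boxed()
        .flatMap(r -> queens(minus(todo, r), new IntList(r, tail)));
  }

  static boolean safe(int mid, IntList tail) {
    return safe(mid+1, mid-1, tail);
  }

  // d1 and d2 are the two diagonal positions, moving one step away per earlier queen
  static boolean safe(int d1, int d2, IntList tail) {
    return tail==null || d1!=tail.item && d2!=tail.item && safe(d1+1, d2-1, tail.next);
  }

  // Assumption: entry point analogous to perms(int n)
  static Stream<IntList> queens(int n) {
    BitSet todo = new BitSet(n); todo.flip(0, n);
    return queens(todo, null);
  }

  public static void main(String[] args) {
    System.out.println(queens(8).count());
    System.out.println(queens(8).parallel().count());
  }
}
```

In this sketch .parallel() is applied by the caller to the outermost stream; where exactly the original example applies it is not visible in the slide text.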
Versatility of streams
• Many uses of a stream of solutions                     (Example174.java)
– Print the number of solutions
  System.out.println(queens(8).count());
– Print all solutions
  queens(8).forEach(System.out::println);
– Print an arbitrary solution (if there is one)
  System.out.println(queens(8).findAny());
– Print the first 20 solutions
  queens(8).limit(20).forEach(System.out::println);

• Much harder in an imperative version
• Separation of concerns (Dijkstra): production of solutions
  versus consumption of solutions
Streams for quasi-infinite sequences
• van der Corput numbers
– 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8, 1/16, ...
– Dense and uniform in interval [0, 1]
– For simulation and finance, Black-Scholes options
• Trick: v d Corput numbers as base-2 fractions
0.1, 0.01, 0.11, 0.001, 0.101, 0.011, 0.111 ...
are bit-reversals of 1, 2, 3, 4, 5, 6, 7, ... in binary
public static DoubleStream vanDerCorput() {              // Example183.java
  return IntStream.range(1, 31).asDoubleStream()
    .flatMap(b -> bitReversedRange((int)b));
}

private static DoubleStream bitReversedRange(int b) {
  final long bp = Math.round(Math.pow(2, b));
  return LongStream.range(bp/2, bp)
    .mapToDouble(i -> (double)(bitReverse((int)i) >>> (32-b)) / bp);
}
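A runnable sketch of the generator above. The slides do not show bitReverse; here it is assumed to behave like the standard Integer.reverse, which reverses all 32 bits of an int. VanDerCorputSketch and firstN are hypothetical names.

```java
import java.util.Arrays;
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

class VanDerCorputSketch {
  static DoubleStream vanDerCorput() {
    return IntStream.range(1, 31).asDoubleStream()
      .flatMap(b -> bitReversedRange((int)b));
  }

  // All b-bit numbers with the top bit set, bit-reversed and scaled into (0, 1)
  private static DoubleStream bitReversedRange(int b) {
    final long bp = Math.round(Math.pow(2, b));
    return LongStream.range(bp/2, bp)
      // Assumption: Integer.reverse stands in for the slide's bitReverse
      .mapToDouble(i -> (double)(Integer.reverse((int)i) >>> (32-b)) / bp);
  }

  static double[] firstN(int n) {
    return vanDerCorput().limit(n).toArray();
  }

  public static void main(String[] args) {
    // Should start with 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8
    System.out.println(Arrays.toString(firstN(7)));
  }
}
```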
Collectors: aggregation of streams
• To format an IntList as string “[2, 3, 5, 7]”
– Convert the list to an IntStream
– Convert each element to get Stream<String>
– Use a predefined Collector to build final result

public String toString() {                               // Example182.java
  return stream(this).mapToObj(String::valueOf)
    .collect(Collectors.joining(", ", "[", "]"));
}

// The alternative ”direct” solution requires care and cleverness
public static String toString(IntList xs) {
  StringBuilder sb = new StringBuilder();
  sb.append("[");
  boolean first = true;
  while (xs != null) {
    if (!first)
      sb.append(", ");
    first = false;
    sb.append(xs.item);
    xs = xs.next;
  }
  return sb.append("]").toString();
}
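A minimal sketch of the collector-based formatting, using an int array instead of the book's IntList so that it is self-contained. JoiningSketch and format are hypothetical names.

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class JoiningSketch {
  // Convert each int to a string, then join with separator, prefix and suffix
  static String format(int... xs) {
    return IntStream.of(xs).mapToObj(String::valueOf)
      .collect(Collectors.joining(", ", "[", "]"));
  }

  public static void main(String[] args) {
    System.out.println(format(2, 3, 5, 7));
    System.out.println(format());
  }
}
```

The collector takes care of the separator logic, including the empty case, which the direct loop has to handle by hand with the first flag.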
Java 8 stream properties
• Some stream dimensions
– Finite vs infinite
– Lazily generated (by iterate, generate, ...)
vs eagerly generated (stream builders)
– Ordered (map, filter, limit ... preserve element
order) vs unordered
– Sequential (all elements processed on one thread)
vs parallel
• Java streams
– can be lazily generated, like Haskell lists
– but are use-once, unlike Haskell lists
• reduces risk of space leaks
• limits expressiveness, harder to compute average …
How are Java streams implemented?
• Spliterators
interface Spliterator<T> {
long estimateSize();
void forEachRemaining(Consumer<T> action);
boolean tryAdvance(Consumer<T> action);
Spliterator<T> trySplit();
}
– Many method calls (well inlined/fused by the JIT)
• Parallelization
– Divide stream into chunks using trySplit
– Process each chunk in a task (Haskell “spark”)
– Run on thread pool using work-stealing queues
– ... thus similar to Haskell parBuffer/parListChunk
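A small sketch of the splitting step, done by hand and on a single thread. It uses only the standard Spliterator methods trySplit and forEachRemaining; SpliteratorSketch and sumInTwoChunks are hypothetical names, and the real stream implementation hands the chunks to tasks on a thread pool instead of processing them one after the other.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class SpliteratorSketch {
  static int sumInTwoChunks(List<Integer> xs) {
    Spliterator<Integer> rest = xs.spliterator();
    // trySplit hands off a chunk of the elements, or returns null if it cannot split
    Spliterator<Integer> prefix = rest.trySplit();
    final int[] sum = { 0 };    // mutable accumulator, used from one thread only
    if (prefix != null)
      prefix.forEachRemaining(x -> sum[0] += x);
    rest.forEachRemaining(x -> sum[0] += x);
    return sum[0];
  }

  public static void main(String[] args) {
    System.out.println(sumInTwoChunks(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8)));
  }
}
```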

Parallel (functional) array operations
• Simulating random motion on a line
– Take n random steps, each in the interval [-1, +1]:    (Example25.java)
double[] a = new Random().doubles(n, -1.0, +1.0)
                         .toArray();

– Compute the positions at end of each step:
  a[0], a[0]+a[1], a[0]+a[1]+a[2], ...
Arrays.parallelPrefix(a, (x,y) -> x+y);                  // NB: Updates array a

– Find the maximal absolute distance from start:
double maxDist = Arrays.stream(a).map(Math::abs)
                       .max().getAsDouble();

• A lot done, fast, without loops or assignments
– Just arrays and streams and functions
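A deterministic sketch of the same two steps, with fixed step values instead of random ones so the result can be checked by hand. PrefixSketch, positions and maxDist are hypothetical names; the array is cloned first because parallelPrefix updates its argument in place.

```java
import java.util.Arrays;

class PrefixSketch {
  // Running sums of the steps: a[0], a[0]+a[1], a[0]+a[1]+a[2], ...
  static double[] positions(double[] steps) {
    double[] a = steps.clone();           // keep the caller's array unchanged
    Arrays.parallelPrefix(a, (x, y) -> x + y);
    return a;
  }

  // Largest absolute distance from the start; assumes at least one step
  static double maxDist(double[] steps) {
    return Arrays.stream(positions(steps)).map(Math::abs)
      .max().getAsDouble();
  }

  public static void main(String[] args) {
    double[] steps = { 1.0, -2.0, 0.5, -1.0 };
    System.out.println(Arrays.toString(positions(steps)));
    System.out.println(maxDist(steps));
  }
}
```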
Array and streams and parallel ...
• Associative array aggregation
Arrays.parallelPrefix(a, (x,y) -> x+y);

• Such operations can be parallelized well


– So-called prefix scans (Blelloch 1990)

• Streams and arrays complement each other


• Streams: lazy, possibly infinite,
non-materialized, use-once, parallel pipelines
• Array: eager, always finite, materialized,
use-many-times, parallel prefix scans

Some problems with Java streams
• Streams are use-once & have other restrictions
– Probably to permit easy parallelization
• Hard to create lazy finite streams
– Probably to allow high-performance implementation
• Difficult to control resource consumption
• A single side-effect may mess all up completely
• Sometimes .parallel() hurts performance a lot
– See exercise
– And strange behavior, in parallel + limit in Sudoku generator
• Laziness in Java is subtle, easily goes wrong:         (Example216.java)
static Stream<String> getPageAsStream(String url) throws IOException {
  try (BufferedReader in
       = new BufferedReader(new InputStreamReader(
                              new URL(url).openStream()))) {
    return in.lines();
  }
}
– Useless: closes the reader too early, so any use of the Stream<String>
  causes IOException: Stream closed
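A sketch that reproduces the problem without network access and shows one possible fix, not taken from the slides: consume the lines eagerly while the reader is still open. A StringReader stands in for the URL stream; LazyCloseSketch and all its method names are hypothetical. Note that BufferedReader.lines() reports the failure wrapped in an UncheckedIOException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyCloseSketch {
  // Broken, as on the slide: try-with-resources closes the reader on return,
  // before anybody has pulled a single line from the lazy stream
  static Stream<String> linesTooLate(Reader source) throws IOException {
    try (BufferedReader in = new BufferedReader(source)) {
      return in.lines();
    }
  }

  // One possible fix: read all lines while the reader is still open
  static List<String> linesEagerly(Reader source) throws IOException {
    try (BufferedReader in = new BufferedReader(source)) {
      return in.lines().collect(Collectors.toList());
    }
  }

  // True if using the stream from the broken version fails
  static boolean brokenVersionFails() {
    try {
      linesTooLate(new StringReader("a\nb")).collect(Collectors.toList());
      return false;
    } catch (UncheckedIOException e) {
      return true;              // the reader was already closed
    } catch (IOException e) {
      throw new AssertionError(e);
    }
  }

  static List<String> demoEager() {
    try {
      return linesEagerly(new StringReader("a\nb"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(brokenVersionFails());
    System.out.println(demoEager());
  }
}
```

The eager fix gives up laziness: the whole input is held in memory, which may not be acceptable for large pages.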
A multicore performance mystery (2P)
• K-means clustering 2P: Assign – Update – Assign – Update … till convergence
Pseudocode, imperative (TestKMeansSolution.java):
while (!converged) {
  let taskCount parallel tasks do {                      // Assign
    final int from = ..., to = ...;
    for (int pi=from; pi<to; pi++)
      myCluster[pi] = closest(points[pi], clusters);
  }
  let taskCount parallel tasks do {                      // Update
    final int from = ..., to = ...;
    for (int pi=from; pi<to; pi++)
      myCluster[pi].addToMean(points[pi]);
  }
  ...
}
• Assign: writes the cluster closest to point pi to myCluster[pi]
• Update: calls addToMean on myCluster[pi]
A multicore performance mystery (2Q)
• "Improved" version 2Q:
– call addToMean directly on the closest cluster
– instead of first writing that cluster to the myCluster array
while (!converged) {
  let taskCount parallel tasks do {
    final int from = ..., to = ...;
    for (int pi=from; pi<to; pi++)
      closest(points[pi], clusters).addToMean(points[pi]);
  }
  ...
}
Performance of k-means clustering
• Sequential: as you would expect, 5% speedup
• Parallel: surprisingly bad!
                      2P      2Q     2Q/2P
Sequential           4.240   4.019   0.95
4-core parallel      1.310   2.234   1.70   Bad
24-core parallel     0.852   6.587   7.70   Very bad
Time in seconds for 200,000 points, 81 clusters, 1/8/48 tasks, 108 iterations

• Q: WHY is the “improved” code slower?
• A: Cache invalidation and false sharing
The Point and Cluster classes
class Point {
public final double x, y;
}

static class Cluster extends ClusterBase {
  private volatile Point mean;
  private double sumx, sumy;
  private int count;
  public synchronized void addToMean(Point p) {
    sumx += p.x;
    sumy += p.y;
    count++;
  }
  ...
}

Cluster object layout (maybe): mean | sumx | sumy | count
KMeans 2P
• Assignment step
– Reads each Cluster’s mean field 200,000 times
– Writes only myCluster array segments, separately
– Takes no locks at all
• Update step
– Calls addToMean 200,000 times
– Writes the 81 clusters’ sumx, sumy, count fields
200,000 times in total
– Takes Cluster object locks 200,000 times

KMeans 2Q
• Unified loop
– Reads each Cluster’s mean field 200,000 times
– Calls addToMean 200,000 times and writes the
sumx, sumy, count fields 200,000 times in total
– Takes Cluster object locks 200,000 times
• Problem in 2Q:
– mean reads are mixed with sumx, sumy, ... writes
– The writes invalidate the cached mean field
– The 200,000 mean field reads become slower
– False sharing: mean and sumx on same cache line
– (A problem on Intel i7, not on 20 x slower ARM A7)

• See http://www.itu.dk/people/sestoft/papers/cpucache-20170319.pdf

Parallel streams to the rescue, 3P
Functional version:
while (!converged) {
  final Cluster[] clustersLocal = clusters;
  // Assign
  Map<Cluster, List<Point>> groups =
    Arrays.stream(points).parallel()
      .collect(Collectors.groupingBy(p -> closest(p,clustersLocal)));
  clusters = groups.entrySet().stream().parallel()
    .map(kv -> new Cluster(kv.getKey().getMean(), kv.getValue()))
    .toArray(Cluster[]::new);
  // Update
  Cluster[] newClusters =
    Arrays.stream(clusters).parallel()
      .map(Cluster::computeMean).toArray(Cluster[]::new);
  converged = Arrays.equals(clusters, newClusters);
  clusters = newClusters;
}

                          2P      2Q      3P
Sequential               4.240   4.019   5.353
4-core parallel i7       1.310   2.234   1.350
24-core parallel Xeon    0.852   6.587   0.553
Time in seconds for 200,000 points, 81 clusters, 1/8/48 tasks, 108 iterations
Exercise: Streams & floating-point sum
• Compute series sum (the sum of 1.0/i computed by the loops below)
  for N=999,999,999                                      (TestStreamSums.java)
• For-loop, forwards summation
double sum = 0.0;
for (int i=1; i<N; i++)
  sum += 1.0/i;

• For-loop, backwards summation
double sum = 0.0;
for (int i=1; i<N; i++)
  sum += 1.0/(N-i);
– Different results?

• Could make a DoubleStream, and use .sum()
• Or parallel DoubleStream and .sum()
– Different results?
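A starting-point sketch for the exercise, with a much smaller N so it runs quickly. SumsSketch and its method names are hypothetical; the sketch only computes the four sums and leaves the comparison of their last digits to the reader.

```java
import java.util.stream.IntStream;

class SumsSketch {
  // Forwards: largest terms first
  static double forwards(int n) {
    double sum = 0.0;
    for (int i=1; i<n; i++)
      sum += 1.0/i;
    return sum;
  }

  // Backwards: smallest terms first
  static double backwards(int n) {
    double sum = 0.0;
    for (int i=1; i<n; i++)
      sum += 1.0/(n-i);
    return sum;
  }

  static double streamSum(int n) {
    return IntStream.range(1, n).mapToDouble(i -> 1.0/i).sum();
  }

  static double parallelStreamSum(int n) {
    return IntStream.range(1, n).parallel().mapToDouble(i -> 1.0/i).sum();
  }

  public static void main(String[] args) {
    int n = 1_000_000;          // the exercise uses N=999,999,999
    System.out.println(forwards(n));
    System.out.println(backwards(n));
    System.out.println(streamSum(n));
    System.out.println(parallelStreamSum(n));
  }
}
```

Floating-point addition is not associative, so the order in which the terms are added can affect the last digits of the result.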
This week
• Reading
– Java Precisely 3rd ed. 11.13, 11.14, 23, 24, 25
– Optional:
• http://www.itu.dk/people/sestoft/papers/benchmarking.pdf
• http://www.itu.dk/people/sestoft/papers/cpucache-20170319.pdf

• Exercises
– Extend immutable list class with functional
programming; use parallel array operations; use
streams of words and streams of numbers
– Alternatively: Make a faster and more scalable k-means
clustering implementation, if possible, in
any language
