Fruity Eclipse Collections
Author: Paul King
Published: 2022-10-13 11:05AM
This blog post continues on the theme of using fruit emoji from the previous post, but instead of deep learning, we’ll look at clustering using k-means after first exploring some top methods of Eclipse Collections with fruit emoji examples.
Eclipse Collections Fruit Salad
First, we’ll define a Fruit enum (it adds one additional fruit compared to the related Eclipse Collections kata):
enum Fruit {
APPLE('🍎', Color.RED),
PEACH('🍑', Color.ORANGE),
BANANA('🍌', Color.YELLOW),
CHERRY('🍒', Color.RED),
ORANGE('🍊', Color.ORANGE),
GRAPE('🍇', Color.MAGENTA)
public static ImmutableList<Fruit> ALL = Lists.immutable.with(values())
public static ImmutableList<String> ALL_EMOJI = Lists.immutable.with(*values()*.emoji)
final String emoji
final Color color
Fruit(String emoji, Color color) {
this.emoji = emoji
this.color = color
}
static Fruit of(String emoji) {
values().find{it.emoji == emoji }
}
}
We can use this enum in the following examples which show off numerous common Eclipse Collections methods:
assert Lists.mutable.with('🍎', '🍎', '🍌', '🍌').distinct() ==
Lists.mutable.with('🍎', '🍌')
var onlyBanana = Sets.immutable.with('🍌')
assert Fruit.ALL_EMOJI.select(onlyBanana::contains) == List.of('🍌')
assert Fruit.ALL_EMOJI.reject(onlyBanana::contains) ==
List.of('🍎', '🍑', '🍒', '🍊', '🍇')
assert Fruit.ALL.groupBy(Fruit::getColor) ==
Multimaps.mutable.list.empty()
.withKeyMultiValues(RED, Fruit.of('🍎'), Fruit.of('🍒'))
.withKeyMultiValues(YELLOW, Fruit.of('🍌'))
.withKeyMultiValues(ORANGE, Fruit.of('🍑'), Fruit.of('🍊'))
.withKeyMultiValues(MAGENTA, Fruit.of('🍇'))
assert Fruit.ALL.countBy(Fruit::getColor) ==
Bags.immutable.withOccurrences(RED, 2, YELLOW, 1, ORANGE, 2, MAGENTA, 1)
Fruit.ALL_EMOJI.chunk(4).with {
assert first == Lists.mutable.with('🍎', '🍑', '🍌', '🍒')
assert last == Lists.mutable.with('🍊', '🍇')
}
// For normal threads, replace 'withExistingPool' line of code with:
//GParsExecutorsPool.withPool { pool ->
GParsExecutorsPool.withExistingPool(Executors.newVirtualThreadPerTaskExecutor()) { pool ->
var parallelFruit = Fruit.ALL.asParallel(pool, 1)
var redFruit = parallelFruit.select(fruit -> fruit.color == RED).toList()
assert redFruit == Lists.mutable.with(Fruit.of('🍎'), Fruit.of('🍒'))
}
The last example calculates red fruit in parallel threads. As coded, it uses virtual threads when run on JDK19 with preview features enabled. You can follow the suggestion in the comment to run on other JDK versions or with normal threads. In addition to Eclipse Collections, we have the GPars library on our classpath. Here we are only using one method which is managing pool lifecycle for us.
Exploring emoji colors
For some fun, let’s look at whether the nominated color of each fruit matches the color of the related emoji. As in the previous blog, we’ll use the slightly nicer Noto Color Emoji fonts for our fruit as shown here:
We’ll use an Eclipse Collections BiMap
to switch back and forth between the color names
and java.awt colors:
@Field public static COLOR_OF = BiMaps.immutable.ofAll([
WHITE: WHITE, RED: RED, GREEN: GREEN, BLUE: BLUE,
ORANGE: ORANGE, YELLOW: YELLOW, MAGENTA: MAGENTA
])
@Field public static NAME_OF = COLOR_OF.inverse()
We are also going to use some helper functions to switch between RGB and HSB color values:
static hsb(int r, int g, int b) {
float[] hsb = new float[3]
RGBtoHSB(r, g, b, hsb)
hsb
}
static rgb(BufferedImage image, int x, int y) {
int rgb = image.getRGB(x, y)
int r = (rgb >> 16) & 0xFF
int g = (rgb >> 8) & 0xFF
int b = rgb & 0xFF
[r, g, b]
}
The HSB color space represents colors in a spectrum from 0 to 360 degrees:
We have two helper methods to assist with colors. The first picks out "mostly black" and "mostly white" colors while the second uses a switch expression to carve out some regions of the color space for our colors of interest:
static range(float[] hsb) {
if (hsb[1] < 0.1 && hsb[2] > 0.9) return [0, WHITE]
if (hsb[2] < 0.1) return [0, BLACK]
int deg = (hsb[0] * 360).round()
return [deg, range(deg)]
}
static range(int deg) {
switch (deg) {
case 0..<16 -> RED
case 16..<35 -> ORANGE
case 35..<75 -> YELLOW
case 75..<160 -> GREEN
case 160..<250 -> BLUE
case 250..<330 -> MAGENTA
default -> RED
}
}
Note that the JDK doesn’t have a standard color of PURPLE, so we combine purple with magenta by choosing an appropriate broad spectrum for MAGENTA.
We used a Plotly 3D interactive scatterplot (as supported by the Tablesaw Java dataframe and visualization library to visualize our emoji colors (as degrees on the color spectrum) vs the XY coordinates:
We are going to try out 3 approaches for determining the predominant color of each emoji:
-
Most common color: We find the color spectrum value for each point and count up the number of points of each color. The color with the most points will be selected. This is simple and works in many scenarios but if an apple or cherry has 100 shades of red but only one shade of green for the stalk or a leaf, green may be selected.
-
Most common range: We group each point into a color range. The range with the most points will be selected.
-
Centroid of biggest cluster: We divide our emoji image into a grid of sub-images. We will perform k-means clustering of the RGB values for each point in the sub-image. This will cluster similar colored points together in a cluster. The cluster with the most points will be selected and its centroid will be chosen as the selected pre-dominant color. This approach has the affect of pixelating our sub-image by color. This approach is inspired by this python article.
Most Common Color
Ignoring the background white color, the most common color for our PEACH emoji is a shade of orange. The graph below shows the count of each color:
Most Common Range
If instead of counting each color, we group colors into their range and count the numbers in each range, we get the following graph for PEACH:
K-Means
K-Means is an algorithm for finding cluster centroids. For k=3, we would start by picking 3 random points as our starting centroids.
We allocate all points to their closest centroid:
Given this allocation, we re-calculate each centroid from all of its points:
We repeat this process until either a stable centroid selection is found, or we have reached a certain number of iterations. We used the K-Means algorithm from Apache Commons Math.
Here is the kind of result we would expect if run on the complete set of points for the PEACH emoji. The black dots are the centroids. It has found one green, one orange and one red centroid. The centroid with the most points allocated to it should be the most predominant color. (This is another interactive 3D scatterplot.)
We can plot the number of points allocated to each cluster as a bar chart. (We used a Scala plotting library to show Groovy integration with Scala.)
The code for drawing the above chart looks like this:
var trace = new Bar(intSeq([1, 2, 3]), intSeq(sizes))
.withMarker(new Marker().withColor(oneOrSeq(colors)))
var traces = asScala([trace]).toSeq()
var layout = new Layout()
.withTitle("Centroid sizes for $fruit")
.withShowlegend(false)
.withHeight(600)
.withWidth(800)
Plotly.plot(path, traces, layout, defaultConfig, false, false, true)
K-Means with subimages
The approach we will take for our third option enhances K-Means. Instead of finding centroids for the whole image as the graphs just shown do, we divide the image into subimages and perform the K-Means on each subimage. Our overall pre-dominant color is determined to be the most common color predicated across all of our subimages.
Putting it all together
Here is the final code covering all three approaches (including printing some pretty images highlighting the third approach and the Plotly 3D scatter plots):
var results = Fruit.ALL.collect { fruit ->
var file = getClass().classLoader.getResource("${fruit.name()}.png").file as File
var image = ImageIO.read(file)
var colors = [:].withDefault { 0 }
var ranges = [:].withDefault { 0 }
for (x in 0..<image.width) {
for (y in 0..<image.height) {
def (int r, int g, int b) = rgb(image, x, y)
float[] hsb = hsb(r, g, b)
def (deg, range) = range(hsb)
if (range != WHITE) { // ignore white background
ranges[range]++
colors[deg]++
}
}
}
var maxRange = ranges.max { e -> e.value }.key
var maxColor = range(colors.max { e -> e.value }.key)
int cols = 8, rows = 8
int grid = 5 // thickness of black "grid" between subimages
int stepX = image.width / cols
int stepY = image.height / rows
var splitImage = new BufferedImage(image.width + (cols - 1) * grid, image.height + (rows - 1) * grid, image.type)
var g2a = splitImage.createGraphics()
var pixelated = new BufferedImage(image.width + (cols - 1) * grid, image.height + (rows - 1) * grid, image.type)
var g2b = pixelated.createGraphics()
ranges = [:].withDefault { 0 }
for (i in 0..<rows) {
for (j in 0..<cols) {
def clusterer = new KMeansPlusPlusClusterer(5, 100)
List<DoublePoint> data = []
for (x in 0..<stepX) {
for (y in 0..<stepY) {
def (int r, int g, int b) = rgb(image, stepX * j + x, stepY * i + y)
var dp = new DoublePoint([r, g, b] as int[])
var hsb = hsb(r, g, b)
def (deg, col) = range(hsb)
data << dp
}
}
var centroids = clusterer.cluster(data)
var biggestCluster = centroids.max { ctrd -> ctrd.points.size() }
var ctr = biggestCluster.center.point*.intValue()
var hsb = hsb(*ctr)
def (_, range) = range(hsb)
if (range != WHITE) ranges[range]++
g2a.drawImage(image, (stepX + grid) * j, (stepY + grid) * i, stepX * (j + 1) + grid * j, stepY * (i + 1) + grid * i,
stepX * j, stepY * i, stepX * (j + 1), stepY * (i + 1), null)
g2b.color = new Color(*ctr)
g2b.fillRect((stepX + grid) * j, (stepY + grid) * i, stepX, stepY)
}
}
g2a.dispose()
g2b.dispose()
var swing = new SwingBuilder()
var maxCentroid = ranges.max { e -> e.value }.key
swing.edt {
frame(title: 'Original vs Subimages vs K-Means',
defaultCloseOperation: DISPOSE_ON_CLOSE, pack: true, show: true) {
flowLayout()
label(icon: imageIcon(image))
label(icon: imageIcon(splitImage))
label(icon: imageIcon(pixelated))
}
}
[fruit, maxRange, maxColor, maxCentroid]
}
println "Fruit Expected By max color By max range By k-means"
results.each { fruit, maxRange, maxColor, maxCentroid ->
def colors = [fruit.color, maxColor, maxRange, maxCentroid].collect {
NAME_OF[it].padRight(14)
}.join().trim()
println "${fruit.emoji.padRight(6)} $colors"
}
Here are the resulting images:
And, here are the final results:
In our case, all three approaches yielded the same results. Results for other emojis may vary.
Further information
-
Repo with example code: https://github.com/paulk-asert/fruity-eclipse-collections
-
Further examples of k-means clustering: https://github.com/paulk-asert/groovy-data-science/tree/master/subprojects/Whiskey
-
Related slides for clustering: https://speakerdeck.com/paulk/groovy-data-science?slide=94
-
Eclipse collections homepage: https://www.eclipse.org/collections/