Exploring Gatherers4J with Groovy
Published: 2025-04-11 10:30AM
Let’s explore using Groovy with the Gatherers4J library and the JDK 24 gatherers API. We’ll also look at iterator variants for JDK 8/11+ users!
An interesting feature in recent JDK versions is Gatherers. JDK 24 includes a bunch of built-in gatherers, but the main goal of the gatherers API was to allow custom intermediate operations to be developed rather than provide a huge range of in-built gatherers.
We looked in an earlier blog post at how to write your own gatherer equivalents for chop, collate and other built-in Groovy functionality.
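As a quick reminder of what writing your own gatherer involves, here is a minimal sketch, assuming JDK 24. The everyNth helper is a hypothetical name used for illustration (it is not from the JDK or Gatherers4J), and the explicit "as" coercions are there so that Groovy picks the ofSequential(Supplier, Integrator) overload unambiguously:

```groovy
import java.util.concurrent.atomic.AtomicInteger
import java.util.function.Supplier
import java.util.stream.Gatherer

// Hypothetical helper: emits the elements at indices 0, n, 2n, ...
Gatherer everyNth(int n) {
    Gatherer.ofSequential(
        // state: a counter tracking the index of the current element
        { -> new AtomicInteger() } as Supplier,
        // integrator: push the element downstream only at every nth index;
        // returning true means "keep sending me elements"
        { counter, element, downstream ->
            counter.getAndIncrement() % n == 0 ? downstream.push(element) : true
        } as Gatherer.Integrator
    )
}

assert ('A'..'G').stream().gather(everyNth(3)).toList() == ['A', 'D', 'G']
```

A library like Gatherers4J saves you from writing and testing this kind of boilerplate yourself.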
Other folks have also been looking at useful gatherers, and libraries are starting to emerge. Gatherers4J is one such library. We’ll use Gatherers4J 0.11.0 (pre-release), a Groovy 5 snapshot (pre-release), and JDK 24.
Let’s now look at numerous other gatherers (and their Groovy iterator equivalents) in the Gatherers4J library.
Revisiting Collate with Gatherers4J
In an earlier blog post we showed how to provide a streams equivalent of Groovy’s collate extension method on collections and iterators. Some of its functionality is supported by the built-in windowFixed and windowSliding gatherers, and we showed how to write some custom gatherers, windowSlidingByStep and windowFixedTruncating, to handle the remaining functionality. Let’s look at instead using the window gatherer from Gatherers4J:
assert (1..5).stream().gather(Gatherers4j.window(3, 1, true)).toList() ==
[[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5], [5]]
assert (1..8).stream().gather(Gatherers4j.window(3, 2, true)).toList() ==
[[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8]]
assert (1..8).stream().gather(Gatherers4j.window(3, 2, false)).toList() ==
[[1, 2, 3], [3, 4, 5], [5, 6, 7]]
assert (1..8).stream().gather(Gatherers4j.window(3, 4, false)).toList() ==
[[1, 2, 3], [5, 6, 7]]
assert (1..8).stream().gather(Gatherers4j.window(3, 3, true)).toList() ==
[[1, 2, 3], [4, 5, 6], [7, 8]]
For comparison, here was the output shown using Groovy’s collate
extension method on collections:
assert (1..5).collate(3, 1) == [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5], [5]]
assert (1..8).collate(3, 2) == [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8]]
assert (1..8).collate(3, 2, false) == [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
assert (1..8).collate(3, 4, false) == [[1, 2, 3], [5, 6, 7]]
assert (1..8).collate(3, 3) == [[1, 2, 3], [4, 5, 6], [7, 8]]
These aren’t exact equivalents. The Groovy versions operate eagerly on collections, while the gatherer ones are stream-based. We can show a more "apples with apples" comparison by using Groovy’s iterator variants of the collate method. Let’s look at some infinite stream examples.
The gatherer version:
assert Stream.iterate(0, n -> n + 1)
.gather(Gatherers4j.window(3, 3, true))
.limit(3)
.toList() == [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
Note: Because we aren’t worried about remainders or different step sizes, we could have just used JDK 24’s built-in Gatherers.windowFixed(3) gatherer here.
Now, let’s look at Groovy’s iterator equivalent:
assert Iterators.iterate(0, n -> n + 1)
.collate(3)
.take(3)
.toList() == [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
When we say equivalent, we aren’t implying that Iterators offer the same power or flexibility as streams, but for many simple scenarios they will achieve the same results and can be more efficient.
Exploring other Gatherers4J functionality
Let’s now explore some of the other gatherers in Gatherers4J. For a somewhat "apples vs apples" comparison, we’ll compare against the equivalent Groovy Iterator functionality, unless otherwise stated. There are some cases where Groovy doesn’t offer an iterator-based equivalent, so we’ll look at collection-based solutions in those cases.
Just recall that using the collection variants of these examples might be simpler and equally suitable depending on your scenario. Streaming solutions really shine for larger streams, but we need to keep the examples simple in a blog post like this.
We’ll use a light green background for Gatherers4J examples and a light blue background for standard Groovy iterator functionality. For some of the examples, we’ll also look at using the Groovy-stream library where we’ll use a light orange background.
Gatherers4J has over 50 gatherers organised into five different categories:
- Sequence Operations - Reorder, combine, or manipulate the sequence of elements.
- Filtering and Selection - Select or remove elements based on some criteria.
- Grouping and Windowing - Collect elements into groups or windows.
- Mathematical Operations - Perform calculations over the stream.
- Validation and Constraints - Enforce conditions on the stream.
We’ll explore about half of them and subjectively pick what might be the more commonly occurring scenarios, but if you are using streams, the whole library is worth looking at. It is worth remembering that the JDK has built-in functionality that Gatherers4J doesn’t try to replicate. In that sense, Gatherers4J does have some less-used gatherers, but many are still very useful.
Our goal in performing these comparisons isn’t to pitch gatherer solutions or iterator solutions as competitors. Streams and iterators each have their benefits. If using either one, it is good to recognise what the other might look like and what it might have to offer.
Before starting, let’s create some variables we’ll use in later examples:
var abc = 'A'..'C'
var abcde = 'A'..'E'
var nums = 1..3
crossWith, combine
Let’s look at creating all pairs of combinations between two sources, the cross product.
Gatherers4J provides the crossWith
gatherer:
assert abc.stream()
.gather(Gatherers4j.crossWith(nums.stream()))
.map(pair -> pair.first + pair.second)
.toList() == ['A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3']
For collections, Groovy provides combinations or eachCombination, but for iterators you can do the following:
assert Iterators.combine(letter: abc.iterator(), number: nums.iterator())
.collectLazy(map -> map.letter + map.number)
.toList() == ['A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3']
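For completeness, here is a small sketch of the collection-based route mentioned above. Note that combinations varies the first list fastest, so the sketch sorts the result to get the same order as the earlier examples. The variable names are just for illustration:

```groovy
var chars = ['A', 'B', 'C']
var digits = [1, 2, 3]

// combinations visits ['A', 1], ['B', 1], ['C', 1], ['A', 2], ...
// and the closure receives the members of each combination as parameters
var pairs = [chars, digits].combinations { letter, digit -> letter + digit }

assert pairs.sort() == ['A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3']
```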
foldIndexed, inject+withIndex
The fold (aka inject and reduce) operation folds a stream of items into a single value. Let’s explore a variant of fold that also provides the index of the item.
Gatherers4J provides the foldIndexed
gatherer:
assert abc.stream()
.gather(Gatherers4j.foldIndexed(
() -> '', // initialValue
(index, carry, next) -> carry + next + index
))
.findFirst().get() == 'A0B1C2'
Groovy uses the name inject for fold, but doesn’t offer a special variant with index values. Instead, you use normal inject in combination with either withIndex or indexed:
assert abc.iterator().withIndex().inject('') { carry, next ->
carry + next.first + next.last
} == 'A0B1C2'
interleaveWith, interleave
The interleaveWith
gatherer interleaves the elements from two streams:
assert abc.stream()
.gather(Gatherers4j.interleaveWith(nums.stream()))
.toList() == ['A', 1, 'B', 2, 'C', 3]
Groovy supports interleave:
assert abc.iterator()
.interleave(nums.iterator())
.toList() == ['A', 1, 'B', 2, 'C', 3]
mapIndexed, collectLazy+withIndex, mapWithIndex
The map (also called transform or collect) operation transforms elements in a stream into a stream of new values.
The mapIndexed
gatherer in Gatherers4J provides access to each element and its index:
assert abc.stream()
.gather(Gatherers4j.mapIndexed (
(i, s) -> s + i
))
.toList() == ['A0', 'B1', 'C2']
In Groovy, you’d typically use withIndex
and collectLazy
for this functionality:
assert abc.iterator()
.withIndex()
.collectLazy { s, i -> s + i }
.toList() == ['A0', 'B1', 'C2']
Groovy-stream has a mapWithIndex
method for this scenario:
assert Stream.from(abc)
.mapWithIndex { s, i -> s + i }
.toList() == ['A0', 'B1', 'C2']
orderByFrequency, countBy
The orderByFrequency
gatherer in Gatherers4J counts elements in a stream
then, once the stream is complete, returns the unique values from the stream
(and their frequency count) in frequency count order (ascending or descending).
Given this behavior, it could be implemented as a collector.
We’ll look at using the built-in collectors, and then for completeness,
show the ascending and descending variations from Gatherers4J:
var letters = ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C']
assert letters.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.toString() == '[A:3, B:4, C:2]'
assert letters.stream()
.gather(Gatherers4j.orderByFrequency(Frequency.Ascending))
.map(withCount -> [withCount.value, withCount.count])
.toList()
.collectEntries()
.toString() == '[C:2, A:3, B:4]'
assert letters.stream()
.gather(Gatherers4j.orderByFrequency(Frequency.Descending))
.map(withCount -> [withCount.value, withCount.count])
.toList()
.collectEntries()
.toString() == '[B:4, A:3, C:2]'
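Since the counting needs the whole stream anyway, the ordering can also be done with the standard collectors followed by a sort. Here is a minimal sketch using only JDK streams (the variable names are illustrative) that returns just the values in frequency order:

```groovy
import java.util.function.Function
import java.util.stream.Collectors

var sample = ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C']

// count the occurrences of each distinct value
var counts = sample.stream()
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))

// sort the (value, count) entries by count, then keep just the values
var ascending = counts.entrySet().stream()
    .sorted { a, b -> a.value <=> b.value }
    .map(e -> e.key)
    .toList()

var descending = counts.entrySet().stream()
    .sorted { a, b -> b.value <=> a.value }
    .map(e -> e.key)
    .toList()

assert ascending == ['C', 'A', 'B']
assert descending == ['B', 'A', 'C']
```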
Groovy’s countBy
method on collections does a similar thing but, by default,
returns the unique items in order of first appearance. Since it is somewhat like a terminal
operator, we’ll just use the Groovy collection methods here.
The ascending and descending behaviors can be achieved with sorting:
assert letters.countBy().toString() == '[A:3, B:4, C:2]'
assert letters.countBy()
.sort{ e -> e.value }
.toString() == '[C:2, A:3, B:4]'
assert letters
.countBy()
.sort{ e -> -e.value }
.toString() == '[B:4, A:3, C:2]'
peekIndexed, tapEvery+withIndex, tapWithIndex
The peekIndexed
gatherer in Gatherers4J is similar to the JDK Streams peek
intermediate operation but also provides access to the index value:
assert abc.stream()
.gather(Gatherers4j.peekIndexed(
(index, element) -> println "Element $element at index $index"
))
.toList() == abc
Groovy’s eachWithIndex provides similar functionality but exhausts the iterator. Instead, you can use withIndex with tapEvery:
assert abc.iterator().withIndex().tapEvery { tuple ->
println "Element $tuple.first at index $tuple.last"
}*.first == abc
Groovy-stream provides a tapWithIndex
method:
assert Stream.from(abc)
.tapWithIndex { s, i -> println "Element $s at index $i" }
.toList() == abc
All of the above produce the following output:
Element A at index 0
Element B at index 1
Element C at index 2
repeat
Gatherers4J has a repeat
gatherer that allows a source of elements to be repeated a given number of times:
assert abc.stream()
.gather(Gatherers4j.repeat(3))
.toList() == ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']
Groovy offers a multiply
operator for this for collections, but also has a repeat
method for iterators:
assert abc * 3 == ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']
assert abc.iterator()
.repeat(3)
.toList() == ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']
Groovy-stream also has a repeat
method:
assert Stream.from(abc)
.repeat(3)
.toList() == ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']
repeatInfinitely
Gatherers4J has a repeatInfinitely
gatherer that allows a source of elements to be repeated in a cycle indefinitely:
assert abc.stream()
.gather(Gatherers4j.repeatInfinitely())
.limit(5)
.toList() == ['A', 'B', 'C', 'A', 'B']
Groovy provides a repeat
method for this scenario:
assert abc.iterator()
.repeat()
.take(5)
.toList() == ['A', 'B', 'C', 'A', 'B']
Groovy-stream has a repeat
method for this:
assert Stream.from(abc)
.repeat()
.take(5)
.toList() == ['A', 'B', 'C', 'A', 'B']
reverse
The reverse
gatherer in Gatherers4J returns the elements from the stream,
in reverse order, once all elements have been received:
assert abc.stream()
.gather(Gatherers4j.reverse())
.toList() == 'C'..'A'
Since the whole stream is examined before outputting elements,
there is no significant benefit to using streams here, and reverse
isn’t suitable
for use when processing infinite streams.
The same applies for Groovy’s iterator implementation,
but Groovy, like Gatherers4J, offers a reverse
extension method anyway:
assert abc.iterator()
.reverse()
.toList() == 'C'..'A'
rotate
The rotate
gatherer in Gatherers4J returns the elements from the stream,
in rotated positions, once all elements have been received.
Again, since the whole stream is processed, there is no significant benefit
here compared to working with collections. We can rotate in either direction:
var abcde = ['A', 'B', 'C', 'D', 'E']
var shift = 2
assert abcde.stream()
.gather(Gatherers4j.rotate(Rotate.Left, shift))
.toList() == ['C', 'D', 'E', 'A', 'B']
assert abcde.stream()
.gather(Gatherers4j.rotate(Rotate.Right, shift))
.toList() == ['D', 'E', 'A', 'B', 'C']
Groovy doesn’t provide any iterator-based equivalent methods.
For collections, users could piggyback on the JDK libraries using Collections.rotate
if a mutating method is acceptable:
var temp = abcde.clone() // unless mutating original is okay
Collections.rotate(temp, -shift) // -ve for left
assert temp == ['C', 'D', 'E', 'A', 'B']
temp = abcde.clone()
Collections.rotate(temp, shift) // +ve for right
assert temp == ['D', 'E', 'A', 'B', 'C']
Or otherwise Groovy’s indexing allows for very flexible slicing of collections, so rotation can be achieved via such index mangling:
assert abcde[shift..-1] + abcde[0..<shift] == ['C', 'D', 'E', 'A', 'B'] // left
assert abcde[shift<..-1] + abcde[0..shift] == ['D', 'E', 'A', 'B', 'C'] // right
An iterator-based rotate extension method might be a possible future Groovy feature. For shifting to the left, it would seem possible to not store the whole list, like the current Gatherers4J implementation does, but only the "shift" distance number of elements. For shifting to the right, you’d need the stream size minus the "shift" distance, and you won’t know the size ahead of time.
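To make the left-shift idea concrete, here is a rough sketch of what such a method might look like. The rotateLeft helper is hypothetical (it is not part of Groovy or Gatherers4J), and the sketch assumes the shift is between zero and the number of elements in the source:

```groovy
// Hypothetical helper, not part of Groovy or Gatherers4J.
// Assumes 0 <= shift <= number of elements in the source.
Iterator rotateLeft(Iterator source, int shift) {
    // buffer only the first 'shift' elements
    List head = []
    while (head.size() < shift && source.hasNext()) {
        head << source.next()
    }
    Iterator tail = head.iterator()
    // emit the remaining source elements first, then the buffered head
    return new Iterator() {
        boolean hasNext() { source.hasNext() || tail.hasNext() }
        Object next() { source.hasNext() ? source.next() : tail.next() }
    }
}

assert rotateLeft(('A'..'E').iterator(), 2).toList() == ['C', 'D', 'E', 'A', 'B']
```

Only the first two elements are held in memory here; the rest flow straight through from the source.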
scanIndexed, injectAll+withIndex
Gatherers4J provides the scanIndexed
gatherer. It’s like the JDK’s built-in scan
gatherer
but also provides access to the index:
assert abc.stream()
.gather(
Gatherers4j.scanIndexed(
() -> '',
(index, carry, next) -> carry + next + index
)
)
.toList() == ['A0', 'A0B1', 'A0B1C2']
For Groovy, the injectAll
method in combination with withIndex
can be used:
assert abc.iterator()
.withIndex()
.injectAll('') { carry, next ->
carry + next.first + next.last
}
.toList() == ['A0', 'A0B1', 'A0B1C2']
shuffle
Gatherers4J offers the shuffle
gatherer:
int seed = 42
assert Stream.of(*'A'..'G')
.gather(Gatherers4j.shuffle(new Random(seed)))
.toList() == [ 'B', 'D', 'F', 'A', 'E', 'G', 'C' ]
This is another gatherer which consumes the entire stream and holds it in memory
before producing values.
There is no significant advantage compared to using collections.
Groovy only offers collection-based functionality for this feature using the shuffled
extension method:
assert ('A'..'G').shuffled(new Random(seed)) == ['C', 'G', 'E', 'A', 'F', 'D', 'B']
withIndex
Gatherers4J and Groovy both provide withIndex
. Here is the gatherer version:
assert abc.stream()
.gather(Gatherers4j.withIndex())
.map(withIndex -> "$withIndex.value$withIndex.index")
.toList() == ['A0', 'B1', 'C2']
Here is the iterator version:
assert abc.iterator().withIndex()
.collectLazy(tuple -> "$tuple.v1$tuple.v2")
.toList() == ['A0', 'B1', 'C2']
zipWith, zip
Gatherers4J provides a zipWith
gatherer:
assert abc.stream()
.gather(Gatherers4j.zipWith(nums.stream()))
.map(pair -> "$pair.first$pair.second")
.toList() == ['A1', 'B2', 'C3']
Groovy provides zip:
assert abc.iterator()
.zip(nums.iterator())
.collectLazy { s, n -> s + n }
.toList() == ['A1', 'B2', 'C3']
Groovy-stream offers the same:
assert Stream.from(abc)
.zip(nums) { s, i -> s + i }
.toList() == ['A1', 'B2', 'C3']
distinctBy, toUnique
Gatherers4J has a distinctBy
gatherer that finds unique elements, using a function to compute
the value that is compared for equality (here, the string’s size):
assert Stream.of('A', 'BB', 'CC', 'D')
.gather(Gatherers4j.distinctBy(String::size))
.toList() == ['A', 'BB']
Groovy provides toUnique
for this:
assert ['A', 'BB', 'CC', 'D'].iterator()
.toUnique(String::size)
.toList() == ['A', 'BB']
dropEveryNth/takeEveryNth
Gatherers4J has special gatherers to take or drop every nth element:
// drop every 3rd
assert ('A'..'G').stream()
.gather(Gatherers4j.dropEveryNth(3))
.toList() == ['B', 'C', 'E', 'F']
// take every 3rd
assert ('A'..'G').stream()
.gather(Gatherers4j.takeEveryNth(3))
.toList() == ['A', 'D', 'G']
Groovy doesn’t have a specific method for taking or dropping every nth element, but you can use either findAllLazy with withIndex, or tapEvery:
// drop every 3rd
assert ('A'..'G').iterator().withIndex()
.findAllLazy { next, i -> i % 3 }
.toList()*.first == ['B', 'C', 'E', 'F']
// take every 3rd
assert ('A'..'G').iterator().withIndex()
.findAllLazy { next, i -> i % 3 == 0 }
.toList()*.first == ['A', 'D', 'G']
// also take every 3rd
var result = []
('A'..'G').iterator().tapEvery(3) { result << it }.toList()
assert result == ['A', 'D', 'G']
Groovy-stream doesn’t have a specific method either, but you can achieve a similar thing using either filterWithIndex or tapEvery:
// drop every 3rd
assert Stream.from('A'..'G')
.filterWithIndex { next, idx -> idx % 3 }
.toList() == ['B', 'C', 'E', 'F']
// take every 3rd
assert Stream.from('A'..'G')
.filterWithIndex { next, idx -> idx % 3 == 0 }
.toList() == ['A', 'D', 'G']
// also take every 3rd (starting from 3rd)
var result = []
Stream.from('A'..'G').tapEvery(3) { result << it }.toList()
assert result == ['C', 'F']
result = []
dropLast, dropRight
The dropLast
gatherer removes the last n elements from the stream:
assert abcde.stream()
.gather(Gatherers4j.dropLast(2))
.toList() == abc
Groovy provides dropRight
for this:
assert abcde.iterator().dropRight(2).toList() == abc
filterIndexed
The filterIndexed
gatherer allows filtering elements with access to the index:
assert abcde.stream()
.gather(Gatherers4j.filterIndexed{ n, s -> n % 2 == 0 })
.toList() == ['A', 'C', 'E']
assert abcde.stream()
.gather(Gatherers4j.filterIndexed{ n, s -> n < 2 || s == 'E' })
.toList() == ['A', 'B', 'E']
Groovy doesn’t have an all-in-one equivalent, but you can use withIndex plus findAllLazy:
assert abcde.iterator().withIndex()
.findAllLazy { s, n -> n % 2 == 0 }*.first == ['A', 'C', 'E']
assert abcde.iterator().withIndex()
.findAllLazy { s, n -> n < 2 || s == 'E' }*.first == ['A', 'B', 'E']
Groovy-stream provides a filterWithIndex
method:
assert Stream.from(abcde)
.filterWithIndex{ s, n -> n % 2 == 0 }
.toList() == ['A', 'C', 'E']
assert Stream.from(abcde)
.filterWithIndex{ s, n -> n < 2 || s == 'E' }
.toList() == ['A', 'B', 'E']
filterInstanceOf
The filterInstanceOf gatherer combines filter with instanceof:
var mixed = [(byte)1, (short)2, 3, (long)4, 5.0, 6.0d, '7', '42']
assert mixed.stream()
.gather(Gatherers4j.filterInstanceOf(Integer))
.toList() == [3]
assert mixed.stream()
.gather(Gatherers4j.filterInstanceOf(Number))
.toList() == [1, 2, 3, 4, 5.0, 6.0]
assert mixed.stream()
.gather(Gatherers4j.filterInstanceOf(Integer, Short))
.toList() == [2, 3]
Groovy doesn’t have an exact equivalent but does have an eager grep which lets you do similar things, and other functionality like matching regex patterns. You can also use findAllLazy in combination with getClass() or instanceof as needed:
var mixed = [(byte)1, (short)2, 3, (long)4, 5.0, 6.0d, '7', '42']
assert mixed.iterator().grep(Integer) == [3]
assert mixed.iterator().grep(Number) == [1, 2, 3, 4, 5.0, 6.0]
assert mixed.iterator().grep(~/\d/).toString() == '[1, 2, 3, 4, 7]'
assert mixed.iterator()
.findAllLazy{ it.getClass() in [Integer, Short] }
.toList() == [2, 3]
takeLast/takeRight
The takeLast
gatherer returns the last n elements from a stream:
assert abcde.stream()
.gather(Gatherers4j.takeLast(3))
.toList() == ['C', 'D', 'E']
It reads the entire stream before emitting elements.
Groovy doesn’t have an iterator equivalent but does have a takeRight
collections method:
assert abcde.takeRight(3) == 'C'..'E'
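If you do need this for an iterator, one option is to keep only the most recent n elements in a bounded buffer while consuming the source. Here is a minimal sketch, where takeLast is a hypothetical helper and not part of Groovy or Gatherers4J:

```groovy
// Hypothetical helper, not part of Groovy or Gatherers4J
List takeLast(Iterator source, int n) {
    var buffer = new ArrayDeque()
    while (source.hasNext()) {
        buffer.addLast(source.next())
        // discard the oldest element once more than n are held
        if (buffer.size() > n) buffer.removeFirst()
    }
    buffer.toList()
}

assert takeLast(('A'..'E').iterator(), 3) == ['C', 'D', 'E']
```

Like the gatherer, this has to read the whole source before returning anything, but it never holds more than n + 1 elements at a time.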
takeUntil, takeWhile, until
The takeUntil
gatherer takes elements until some condition is satisfied
including the element that satisfied the condition:
assert abcde.stream()
.gather(Gatherers4j.takeUntil{ it == 'C' })
.toList() == abc
Groovy’s takeWhile
extension method takes elements while some condition is satisfied,
stopping before the first element that fails it (so it needs the reverse condition):
assert abcde.iterator()
.takeWhile { it != 'D' }
.toList() == abc
Groovy-stream’s until
method has a condition similar to Gatherers4J’s takeUntil,
but doesn’t include the element that triggers the condition:
assert Stream.from(abcde)
.until { it == 'D' }
.toList() == abc
uniquelyOccurring
The uniquelyOccurring
gatherer reads the entire stream before returning the elements
that occur only once:
assert Stream.of('A', 'B', 'C', 'A')
.gather(Gatherers4j.uniquelyOccurring())
.toList() == ['B', 'C']
Groovy doesn’t have an equivalent, but countBy
could be used to achieve similar functionality.
Groovy’s countBy
method is an eager (think terminal) operator:
assert ['A', 'B', 'C', 'A'].iterator()
.countBy()
.findAll{ it.value == 1 }*.key == ['B', 'C']
Further information
Conclusion
We have looked at how to use gatherers, and how to achieve similar functionality using iterators.