Exploring Gatherers4J with Groovy

Author: Paul King

Published: 2025-04-11 10:30AM


Let’s explore using Groovy with the Gatherers4J library and the JDK 24 gatherers API. We’ll also look at iterator variants for JDK 8/11+ users!

An interesting feature in recent JDK versions is Gatherers. JDK 24 includes a bunch of built-in gatherers, but the main goal of the gatherers API was to allow custom intermediate operations to be developed rather than provide a huge range of in-built gatherers.

We looked in an earlier blog post at how to write your own gatherer equivalents for chop, collate and other built-in Groovy functionality.

Other folks have also been looking at useful gatherers, and libraries are starting to emerge. Gatherers4J is one such library. We’ll use Gatherers4J 0.11.0 (pre-release), a Groovy 5 snapshot (pre-release), and JDK 24.

Let’s now look at numerous other gatherers (and their Groovy iterator equivalents) in the Gatherers4J library.

Revisiting Collate with Gatherers4J


In an earlier blog post we showed how to provide a streams equivalent of Groovy’s collate extension method on collections and iterators. Some of its functionality is supported by the built-in windowFixed and windowSliding gatherers, and we showed how to write some custom gatherers, windowSlidingByStep and windowFixedTruncating, to handle the remaining functionality.
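As a reminder of where the built-in gatherers stop, here is a small sketch assuming JDK 24 and its java.util.stream.Gatherers class. windowSliding never emits the trailing partial windows that collate(3, 1) produces, while windowFixed does keep a final shorter window. Neither supports an arbitrary step size on its own.

```groovy
import java.util.stream.Gatherers

// sliding windows of size 3, step 1: no trailing [4, 5] or [5] windows
assert (1..5).stream().gather(Gatherers.windowSliding(3)).toList() ==
    [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

// fixed windows of size 3: the final, shorter window is kept
assert (1..8).stream().gather(Gatherers.windowFixed(3)).toList() ==
    [[1, 2, 3], [4, 5, 6], [7, 8]]
```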

Let’s instead look at using the window gatherer from Gatherers4J:

assert (1..5).stream().gather(Gatherers4j.window(3, 1, true)).toList() ==
    [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5], [5]]
assert (1..8).stream().gather(Gatherers4j.window(3, 2, true)).toList() ==
    [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8]]
assert (1..8).stream().gather(Gatherers4j.window(3, 2, false)).toList() ==
    [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
assert (1..8).stream().gather(Gatherers4j.window(3, 4, false)).toList() ==
    [[1, 2, 3], [5, 6, 7]]
assert (1..8).stream().gather(Gatherers4j.window(3, 3, true)).toList() ==
    [[1, 2, 3], [4, 5, 6], [7, 8]]

For comparison, here is the equivalent using Groovy’s collate extension method on collections:

assert (1..5).collate(3, 1) == [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5], [5]]
assert (1..8).collate(3, 2) == [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8]]
assert (1..8).collate(3, 2, false) == [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
assert (1..8).collate(3, 4, false) == [[1, 2, 3], [5, 6, 7]]
assert (1..8).collate(3, 3) == [[1, 2, 3], [4, 5, 6], [7, 8]]

These aren’t exact equivalents. The Groovy versions operate eagerly on collections, while the gatherer ones are stream-based. We can make a more "apples with apples" comparison using Groovy’s iterator variants of the collate method. Let’s look at some infinite stream examples.

The gatherer version:

assert Stream.iterate(0, n -> n + 1)
    .gather(Gatherers4j.window(3, 3, true))
    .limit(3)
    .toList() == [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
Note
Because we aren’t worried about remainders or different step sizes, we could have just used JDK 24’s built-in Gatherers.windowFixed(3) gatherer here.
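That variant looks like this, as a sketch assuming JDK 24’s java.util.stream.Gatherers:

```groovy
import java.util.stream.Gatherers
import java.util.stream.Stream

// fixed-size windows over an infinite stream; limit(3) stops after three windows
assert Stream.iterate(0, n -> n + 1)
    .gather(Gatherers.windowFixed(3))
    .limit(3)
    .toList() == [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```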

Now, let’s look at Groovy’s iterator equivalent:

assert Iterators.iterate(0, n -> n + 1)
    .collate(3)
    .take(3)
    .toList() == [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

When we say equivalent, we aren’t implying that Iterators offer the same power or flexibility as streams, but for many simple scenarios they will achieve the same results and can be more efficient.

Exploring other Gatherers4J functionality

Let’s now explore some of the other gatherers in Gatherers4J. For a somewhat "apples vs apples" comparison, we’ll compare against the equivalent Groovy Iterator functionality, unless otherwise stated. There are some cases where Groovy doesn’t offer an iterator-based equivalent, so we’ll look at collection-based solutions in those cases.

Just recall that using the collection variants of these examples might be simpler and equally suitable depending on your scenario. Streaming solutions really shine for larger streams, but we need to keep the examples simple in a blog post like this.

Each section shows the Gatherers4J example first, followed by the standard Groovy iterator functionality. For some of the examples, we’ll also look at the Groovy-stream library; note that its Stream class (groovy.stream.Stream) is distinct from java.util.stream.Stream.

Gatherers4J has over 50 gatherers organised into five different categories:

  • Sequence Operations - Reorder, combine, or manipulate the sequence of elements.

  • Filtering and Selection - Select or remove elements based on some criteria.

  • Grouping and Windowing - Collect elements into groups or windows.

  • Mathematical Operations - Perform calculations over the stream.

  • Validation and Constraints - Enforce conditions on the stream.

We’ll explore about half of them, subjectively picking what might be the more commonly needed scenarios, but if you are using streams, the whole library is worth a look. Bear in mind that Gatherers4J doesn’t try to replicate functionality already built into the JDK, so some of its gatherers cover less common needs, but many are still very useful.

Our goal in performing these comparisons isn’t to pitch gatherer solutions or iterator solutions as competitors. Streams and iterators each have their benefits. If using either one, it is good to recognise what the other might look like and what it might have to offer.

Before starting, let’s create some variables we’ll use in later examples:

var abc = 'A'..'C'
var abcde = 'A'..'E'
var nums = 1..3

crossWith, combine

Let’s look at creating all pairs of combinations between two sources, the cross product.

Gatherers4J provides the crossWith gatherer:

assert abc.stream()
    .gather(Gatherers4j.crossWith(nums.stream()))
    .map(pair -> pair.first + pair.second)
    .toList() == ['A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3']

For collections, Groovy provides combinations or eachCombination, but for iterators you can do the following:

assert Iterators.combine(letter: abc.iterator(), number: nums.iterator())
    .collectLazy(map -> map.letter + map.number)
    .toList() == ['A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3']
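For comparison, when the second source can be streamed more than once (a collection, for instance), plain JDK flatMap produces the same cross product without any extra library. A minimal sketch:

```groovy
var letters = ['A', 'B', 'C']
var numbers = [1, 2, 3]
// for each letter, stream over all of the numbers again
assert letters.stream()
    .flatMap(letter -> numbers.stream().map(number -> letter + number))
    .toList() == ['A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3']
```

This relies on re-streaming numbers for every letter, which isn’t possible when the second source is a one-shot stream.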

foldIndexed, inject+withIndex

The fold (aka inject and reduce) operation folds a stream of items into a single value. Let’s explore a variant of fold that also provides the index of the item.
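JDK 24 already includes a plain, non-indexed fold gatherer, Gatherers.fold. As a quick sketch:

```groovy
import java.util.stream.Gatherers

// fold emits a single result once the upstream is exhausted
assert ['A', 'B', 'C'].stream()
    .gather(Gatherers.fold(() -> '', (carry, next) -> carry + next))
    .findFirst()
    .get() == 'ABC'
```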

Gatherers4J provides the foldIndexed gatherer:

assert abc.stream()
    .gather(Gatherers4j.foldIndexed(
        () -> '',  // initialValue
        (index, carry, next) -> carry + next + index
    ))
    .findFirst().get() == 'A0B1C2'

Groovy uses the name inject for fold, but doesn’t offer a special variant with index values. Instead, you use normal inject in combination with either withIndex or indexed:

assert abc.iterator().withIndex().inject('') { carry, next -> carry + next.first + next.last } == 'A0B1C2'

interleaveWith, interleave

The interleaveWith gatherer interleaves the elements from two streams:

assert abc.stream()
    .gather(Gatherers4j.interleaveWith(nums.stream()))
    .toList() == ['A', 1, 'B', 2, 'C', 3]

Groovy supports interleave:

assert abc.iterator()
    .interleave(nums.iterator())
    .toList() == ['A', 1, 'B', 2, 'C', 3]

mapIndexed, collectLazy+withIndex, mapWithIndex

The map (also called transform or collect) operation transforms elements in a stream into a stream of new values.

The mapIndexed gatherer in Gatherers4J provides access to each element and its index:

assert abc.stream()
    .gather(Gatherers4j.mapIndexed((i, s) -> s + i))
    .toList() == ['A0', 'B1', 'C2']

In Groovy, you’d typically use withIndex and collectLazy for this functionality:

assert abc.iterator()
    .withIndex()
    .collectLazy { s, i -> s + i }
    .toList() == ['A0', 'B1', 'C2']

Groovy-stream has a mapWithIndex method for this scenario:

assert Stream.from(abc)
    .mapWithIndex { s, i -> s + i }
    .toList() == ['A0', 'B1', 'C2']
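With plain JDK streams and an indexable list, a common workaround (sketched here) is to stream the indices rather than the elements:

```groovy
import java.util.stream.IntStream

var letters = ['A', 'B', 'C']
// stream the indices, then look up each element by position
assert IntStream.range(0, letters.size())
    .mapToObj(i -> letters[i] + i)
    .toList() == ['A0', 'B1', 'C2']
```

This only works for random-access sources; for an arbitrary stream, an index-aware gatherer or iterator is still needed.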

orderByFrequency, countBy

The orderByFrequency gatherer in Gatherers4J counts elements in a stream then, once the stream is complete, returns the unique values from the stream (and their frequency count) in frequency count order (ascending or descending). Given this behavior, it could be implemented as a collector. We’ll look at using the built-in collectors, and then for completeness, show the ascending and descending variations from Gatherers4J:

var letters = ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C']
assert letters.stream()
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
    .toString() == '[A:3, B:4, C:2]'
assert letters.stream()
    .gather(Gatherers4j.orderByFrequency(Frequency.Ascending))
    .map(withCount -> [withCount.value, withCount.count])
    .toList()
    .collectEntries()
    .toString() == '[C:2, A:3, B:4]'
assert letters.stream()
    .gather(Gatherers4j.orderByFrequency(Frequency.Descending))
    .map(withCount -> [withCount.value, withCount.count])
    .toList()
    .collectEntries()
    .toString() == '[B:4, A:3, C:2]'

Groovy’s countBy method on collections does a similar thing but, by default, returns the unique items in order of first appearance. Since it is somewhat like a terminal operator, we’ll just use the Groovy collection methods here. The ascending and descending behaviors can be achieved with sorting:

assert letters.countBy().toString() == '[A:3, B:4, C:2]'
assert letters.countBy()
    .sort{ e -> e.value }
    .toString() == '[C:2, A:3, B:4]'
assert letters.countBy()
    .sort{ e -> -e.value }
    .toString() == '[B:4, A:3, C:2]'
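Since the counting step is really a collector, the descending ordering can also be sketched using only JDK classes, by streaming over the counted entries and sorting them:

```groovy
import java.util.function.Function
import java.util.stream.Collectors

var letters = ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C']
var descending = letters.stream()
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
    .entrySet().stream()
    .sorted((a, b) -> b.value <=> a.value)  // highest count first
    .map(e -> [e.key, e.value])
    .toList()
assert descending == [['B', 4L], ['A', 3L], ['C', 2L]]
```

Ties would come out in whatever order the underlying HashMap iterates; a secondary comparison on the key would make that deterministic.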

peekIndexed, tapEvery+withIndex, tapWithIndex

The peekIndexed gatherer in Gatherers4J is similar to the JDK Streams peek intermediate operation but also provides access to the index value:

assert abc.stream()
    .gather(Gatherers4j.peekIndexed(
        (index, element) -> println "Element $element at index $index"
    ))
    .toList() == abc

Groovy’s eachWithIndex provides similar functionality but exhausts the iterator. Instead, you can use withIndex with tapEvery:

assert abc.iterator().withIndex().tapEvery { tuple ->
    println "Element $tuple.first at index $tuple.last"
}*.first == abc

Groovy-stream provides a tapWithIndex method:

assert Stream.from(abc)
    .tapWithIndex { s, i -> println "Element $s at index $i" }
    .toList() == abc

All of the above produce the following output:

Element A at index 0
Element B at index 1
Element C at index 2

repeat

Gatherers4J has a repeat gatherer that allows a source of elements to be repeated a given number of times:

assert abc.stream()
    .gather(Gatherers4j.repeat(3))
    .toList() == ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']

For collections, Groovy offers the multiply operator, and for iterators it has a repeat method:

assert abc * 3 == ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']
assert abc.iterator()
    .repeat(3)
    .toList() == ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']

Groovy-stream also has a repeat method:

assert Stream.from(abc)
    .repeat(3)
    .toList() == ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']

repeatInfinitely

Gatherers4J has a repeatInfinitely gatherer that allows a source of elements to be repeated in a cycle indefinitely:

assert abc.stream()
    .gather(Gatherers4j.repeatInfinitely())
    .limit(5)
    .toList() == ['A', 'B', 'C', 'A', 'B']

Groovy’s no-argument repeat method handles this scenario:

assert abc.iterator()
    .repeat()
    .take(5)
    .toList() == ['A', 'B', 'C', 'A', 'B']

Groovy-stream has a repeat method for this:

assert Stream.from(abc)
    .repeat()
    .take(5)
    .toList() == ['A', 'B', 'C', 'A', 'B']
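Without any gatherer, JDK Stream.generate combined with flatMap gives a rough equivalent of both repeat and repeatInfinitely, provided the source is a collection that can be streamed repeatedly. A sketch:

```groovy
import java.util.stream.Stream

var letters = ['A', 'B', 'C']
// repeat a fixed number of times: generate the list 3 times, then flatten
assert Stream.generate(() -> letters).limit(3)
    .flatMap(list -> list.stream())
    .toList() == ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']
// repeat indefinitely: only the final limit bounds the stream
assert Stream.generate(() -> letters)
    .flatMap(list -> list.stream())
    .limit(5)
    .toList() == ['A', 'B', 'C', 'A', 'B']
```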

reverse

The reverse gatherer in Gatherers4J returns the elements from the stream, in reverse order, once all elements have been received:

assert abc.stream()
    .gather(Gatherers4j.reverse())
    .toList() == 'C'..'A'

Since the whole stream is examined before outputting elements, there is no significant benefit to using streams here, and reverse isn’t suitable for use when processing infinite streams.

The same applies for Groovy’s iterator implementation, but Groovy, like Gatherers4J, offers a reverse extension method anyway:

assert abc.iterator()
    .reverse()
    .toList() == 'C'..'A'
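To see why reversing has to buffer everything, here is a minimal sketch of a reversing gatherer written directly against the JDK 24 Gatherer API. The reversing method is a hypothetical name for illustration, not the Gatherers4J implementation: the integrator can only store elements, and nothing is emitted until the finisher runs after the upstream is exhausted.

```groovy
import java.util.function.BiConsumer
import java.util.function.Supplier
import java.util.stream.Gatherer

// Hypothetical reversing gatherer (illustration only, not the Gatherers4J code)
def reversing() {
    Gatherer.ofSequential(
        { new ArrayDeque() } as Supplier,        // state: a deque used as a stack
        { buffer, element, downstream ->
            buffer.push(element)                 // newest element goes to the front
            true                                 // keep consuming upstream
        } as Gatherer.Integrator,
        { buffer, downstream ->
            for (e in buffer) {                  // iterates newest to oldest
                if (!downstream.push(e)) break   // stop early if downstream is done
            }
        } as BiConsumer)
}

assert ['A', 'B', 'C'].stream().gather(reversing()).toList() == ['C', 'B', 'A']
```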

rotate

The rotate gatherer in Gatherers4J returns the elements from the stream, in rotated positions, once all elements have been received. Again, since the whole stream is processed, there is no significant benefit here compared to working with collections. We can rotate in either direction:

var abcde = ['A', 'B', 'C', 'D', 'E']
var shift = 2
assert abcde.stream()
    .gather(Gatherers4j.rotate(Rotate.Left, shift))
    .toList() == ['C', 'D', 'E', 'A', 'B']
assert abcde.stream()
    .gather(Gatherers4j.rotate(Rotate.Right, shift))
    .toList() == ['D', 'E', 'A', 'B', 'C']

Groovy doesn’t provide any iterator-based equivalent methods. For collections, users could piggyback on the JDK libraries using Collections.rotate if a mutating method is acceptable:

var temp = abcde.clone()          // unless mutating the original is okay
Collections.rotate(temp, -shift)  // negative distance rotates left
assert temp == ['C', 'D', 'E', 'A', 'B']
temp = abcde.clone()
Collections.rotate(temp, shift)   // positive distance rotates right
assert temp == ['D', 'E', 'A', 'B', 'C']

Alternatively, Groovy’s indexing allows very flexible slicing of collections, so rotation can be achieved by index mangling. A left rotation moves the first "shift" elements to the end, while a right rotation moves the last "shift" elements to the front:

assert abcde[shift..-1] + abcde[0..<shift] == ['C', 'D', 'E', 'A', 'B']  // left
assert abcde[-shift..-1] + abcde[0..<(abcde.size() - shift)] == ['D', 'E', 'A', 'B', 'C']  // right

An iterator-based rotate extension method might be a possible future Groovy feature. For shifting to the left, it should be possible to buffer only the first "shift" elements, rather than the whole list as the current Gatherers4J implementation does. For shifting to the right, nothing can be emitted until the last "shift" elements have been seen, and since the stream size isn’t known ahead of time, everything before them would still need to be buffered.
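The left-shift idea can be sketched as a lazy iterator. rotateLeftLazy is a hypothetical helper, not a Groovy or Gatherers4J API, and it assumes shift is smaller than the number of elements (it doesn’t wrap larger shifts around). Only the first "shift" elements are buffered, then replayed once the source is exhausted:

```groovy
import java.util.stream.Stream

// Hypothetical helper (not a Groovy API): buffers only the first `shift` elements
Iterator rotateLeftLazy(Iterator source, int shift) {
    List head = []
    while (head.size() < shift && source.hasNext()) {
        head << source.next()
    }
    Iterator tail = head.iterator()
    new Iterator() {
        boolean hasNext() { source.hasNext() || tail.hasNext() }
        // drain the rest of the source first, then replay the buffered head
        Object next() { source.hasNext() ? source.next() : tail.next() }
    }
}

assert rotateLeftLazy(['A', 'B', 'C', 'D', 'E'].iterator(), 2).toList() == ['C', 'D', 'E', 'A', 'B']
// also usable on an infinite source, since the buffered head is never reached
assert rotateLeftLazy(Stream.iterate(0, i -> i + 1).iterator(), 2).take(4).toList() == [2, 3, 4, 5]
```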

scanIndexed, injectAll+withIndex

Gatherers4J provides the scanIndexed gatherer. It’s like the JDK’s built-in scan gatherer but also provides access to the index:

assert abc.stream()
    .gather(Gatherers4j.scanIndexed(
        () -> '',
        (index, carry, next) -> carry + next + index
    ))
    .toList() == ['A0', 'A0B1', 'A0B1C2']

For Groovy, the injectAll method in combination with withIndex can be used:

assert abc.iterator()
    .withIndex()
    .injectAll('') { carry, next -> carry + next.first + next.last }
    .toList() == ['A0', 'A0B1', 'A0B1C2']
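For comparison, the JDK 24 built-in Gatherers.scan handles the non-indexed case. A sketch:

```groovy
import java.util.stream.Gatherers

// scan emits every intermediate accumulation (but not the initial value)
assert ['A', 'B', 'C'].stream()
    .gather(Gatherers.scan(() -> '', (carry, next) -> carry + next))
    .toList() == ['A', 'AB', 'ABC']
```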

shuffle

Gatherers4J offers the shuffle gatherer:

int seed = 42
assert Stream.of(*'A'..'G')
    .gather(Gatherers4j.shuffle(new Random(seed)))
    .toList() == ['B', 'D', 'F', 'A', 'E', 'G', 'C']

This is another gatherer which consumes the entire stream and holds it in memory before producing values. There is no significant advantage compared to using collections. Groovy only offers collection-based functionality for this feature using the shuffled extension method:

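assert ('A'..'G').shuffled(new Random(seed)) == ['C', 'G', 'E', 'A', 'F', 'D', 'B']

Note the different ordering: the same seed doesn’t guarantee the same result across different shuffle implementations.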

withIndex

Gatherers4J and Groovy both provide withIndex. Here is the gatherer version:

assert abc.stream()
    .gather(Gatherers4j.withIndex())
    .map(withIndex -> "$withIndex.value$withIndex.index")
    .toList() == ['A0', 'B1', 'C2']

Here is the iterator version:

assert abc.iterator().withIndex()
    .collectLazy(tuple -> "$tuple.v1$tuple.v2")
    .toList() == ['A0', 'B1', 'C2']

zipWith, zip

Gatherers4J provides a zipWith gatherer:

assert abc.stream()
    .gather(Gatherers4j.zipWith(nums.stream()))
    .map(pair -> "$pair.first$pair.second")
    .toList() == ['A1', 'B2', 'C3']

Groovy provides zip:

assert abc.iterator()
    .zip(nums.iterator())
    .collectLazy { s, n -> s + n }
    .toList() == ['A1', 'B2', 'C3']

Groovy-stream offers the same:

assert Stream.from(abc)
    .zip(nums) { s, i -> s + i }
    .toList() == ['A1', 'B2', 'C3']

distinctBy, toUnique

Gatherers4J has a distinctBy gatherer that finds unique elements using a function to compute the key that determines equality:

assert Stream.of('A', 'BB', 'CC', 'D')
    .gather(Gatherers4j.distinctBy(String::size))
    .toList() == ['A', 'BB']

Groovy provides toUnique for this:

assert ['A', 'BB', 'CC', 'D'].iterator()
    .toUnique(String::size)
    .toList() == ['A', 'BB']
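With plain JDK streams, a common idiom for the same behavior is a filter backed by a set of keys already seen. A sketch; note that the lambda is stateful, so it is only safe for sequential streams:

```groovy
import java.util.stream.Stream

// Set.add returns false for a key (here, the string length) that was already seen,
// so only the first element for each key passes the filter
var seenSizes = new HashSet()
assert Stream.of('A', 'BB', 'CC', 'D')
    .filter(s -> seenSizes.add(s.size()))
    .toList() == ['A', 'BB']
```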

dropEveryNth/takeEveryNth

Gatherers4J has special gatherers to take or drop every nth element:

// drop every 3rd
assert ('A'..'G').stream()
    .gather(Gatherers4j.dropEveryNth(3))
    .toList() == ['B', 'C', 'E', 'F']
// take every 3rd
assert ('A'..'G').stream()
    .gather(Gatherers4j.takeEveryNth(3))
    .toList() == ['A', 'D', 'G']

Groovy doesn’t have a specific method for taking or dropping every nth element, but you can use either findAllLazy with withIndex, or tapEvery:

// drop every 3rd
assert ('A'..'G').iterator().withIndex()
    .findAllLazy { next, i -> i % 3 }
    .toList()*.first == ['B', 'C', 'E', 'F']
// take every 3rd
assert ('A'..'G').iterator().withIndex()
    .findAllLazy { next, i -> i % 3 == 0 }
    .toList()*.first == ['A', 'D', 'G']
// also take every 3rd
var result = []
('A'..'G').iterator().tapEvery(3) { result << it }.toList()
assert result == ['A', 'D', 'G']

Groovy-stream doesn’t have a specific method either, but you can achieve a similar thing using either filterWithIndex or tapEvery:

// drop every 3rd
assert Stream.from('A'..'G')
    .filterWithIndex { next, idx -> idx % 3 }
    .toList() == ['B', 'C', 'E', 'F']
// take every 3rd
assert Stream.from('A'..'G')
    .filterWithIndex { next, idx -> idx % 3 == 0 }
    .toList() == ['A', 'D', 'G']
// also take every 3rd (starting from the 3rd)
var result = []
Stream.from('A'..'G').tapEvery(3) { result << it }.toList()
assert result == ['C', 'F']
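For comparison with the approaches above, here is a minimal sketch of a take-every-nth gatherer written directly against the JDK 24 Gatherer API. everyNth is a hypothetical name for illustration; its state is a one-element int array holding the index of the next element, and elements at indices 0, n, 2n, ... are pushed downstream:

```groovy
import java.util.function.Supplier
import java.util.stream.Gatherer

// Hypothetical every-nth gatherer (illustration only)
def everyNth(int n) {
    Gatherer.ofSequential(
        { new int[1] } as Supplier,              // mutable index counter
        { int[] index, element, downstream ->
            // push matching elements (respecting downstream rejection); skip the rest
            (index[0]++ % n == 0) ? downstream.push(element) : true
        } as Gatherer.Integrator)
}

assert ('A'..'G').stream().gather(everyNth(3)).toList() == ['A', 'D', 'G']
```

Changing the condition to `!= 0` would give the "drop every nth" variant.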

dropLast, dropRight

The dropLast gatherer removes the last n elements from the stream:

assert abcde.stream()
    .gather(Gatherers4j.dropLast(2))
    .toList() == abc

Groovy provides dropRight for this:

assert abcde.iterator().dropRight(2).toList() == abc
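Unlike takeLast, dropping the last n elements doesn’t require reading the whole source before emitting anything. The hypothetical dropLastLazy helper below (not a Groovy API) sketches why: it buffers at most n + 1 elements, and once more than n are buffered, the oldest one can’t be among the last n, so it is safe to emit:

```groovy
import java.util.stream.Stream

// Hypothetical helper (not a Groovy API): lazily drops the last n elements
Iterator dropLastLazy(Iterator source, int n) {
    var buffer = new ArrayDeque()
    new Iterator() {
        boolean hasNext() {
            // top up the buffer to n + 1 elements if the source allows
            while (buffer.size() <= n && source.hasNext()) {
                buffer.addLast(source.next())
            }
            buffer.size() > n
        }
        Object next() {
            if (!hasNext()) throw new NoSuchElementException()
            buffer.removeFirst()
        }
    }
}

assert dropLastLazy(['A', 'B', 'C', 'D', 'E'].iterator(), 2).toList() == ['A', 'B', 'C']
assert dropLastLazy(Stream.iterate(0, i -> i + 1).iterator(), 2).take(3).toList() == [0, 1, 2]
```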

filterIndexed

The filterIndexed gatherer allows filtering elements with access to the index:

assert abcde.stream()
    .gather(Gatherers4j.filterIndexed{ n, s -> n % 2 == 0 })
    .toList() == ['A', 'C', 'E']
assert abcde.stream()
    .gather(Gatherers4j.filterIndexed{ n, s -> n < 2 || s == 'E' })
    .toList() == ['A', 'B', 'E']

Groovy doesn’t have an all-in-one equivalent, but you can use withIndex plus findAllLazy:

assert abcde.iterator().withIndex()
    .findAllLazy { s, n -> n % 2 == 0 }*.first == ['A', 'C', 'E']
assert abcde.iterator().withIndex()
    .findAllLazy { s, n -> n < 2 || s == 'E' }*.first == ['A', 'B', 'E']

Groovy-stream provides a filterWithIndex method:

assert Stream.from(abcde)
    .filterWithIndex{ s, n -> n % 2 == 0 }
    .toList() == ['A', 'C', 'E']
assert Stream.from(abcde)
    .filterWithIndex{ s, n -> n < 2 || s == 'E' }
    .toList() == ['A', 'B', 'E']

filterInstanceOf

The filterInstanceOf gatherer combines filter with instanceof:

var mixed = [(byte)1, (short)2, 3, (long)4, 5.0, 6.0d, '7', '42']
assert mixed.stream()
    .gather(Gatherers4j.filterInstanceOf(Integer))
    .toList() == [3]
assert mixed.stream()
    .gather(Gatherers4j.filterInstanceOf(Number))
    .toList() == [1, 2, 3, 4, 5.0, 6.0]
assert mixed.stream()
    .gather(Gatherers4j.filterInstanceOf(Integer, Short))
    .toList() == [2, 3]

Groovy doesn’t have an exact equivalent but does have an eager grep which lets you do similar things, and other functionality like matching regex patterns. You can also use findAllLazy in combination with getClass() or instanceof as needed:

var mixed = [(byte)1, (short)2, 3, (long)4, 5.0, 6.0d, '7', '42']
assert mixed.iterator().grep(Integer) == [3]
assert mixed.iterator().grep(Number) == [1, 2, 3, 4, 5.0, 6.0]
assert mixed.iterator().grep(~/\d/).toString() == '[1, 2, 3, 4, 7]'
assert mixed.iterator()
    .findAllLazy{ it.getClass() in [Integer, Short] }
    .toList() == [2, 3]

takeLast/takeRight

The takeLast gatherer returns the last n elements from a stream:

assert abcde.stream()
    .gather(Gatherers4j.takeLast(3))
    .toList() == ['C', 'D', 'E']

It reads the entire stream before emitting elements.

Groovy doesn’t have an iterator equivalent but does have a takeRight collections method:

assert abcde.takeRight(3) == 'C'..'E'

takeUntil, takeWhile, until

The takeUntil gatherer takes elements until some condition is satisfied, including the element that satisfied the condition:

assert abcde.stream()
    .gather(Gatherers4j.takeUntil{ it == 'C' })
    .toList() == abc

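Groovy’s takeWhile extension method takes elements while some condition is satisfied, so the condition must be negated. Since takeWhile stops before the first element that fails the condition, we test against 'D' rather than 'C':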

assert abcde.iterator()
    .takeWhile { it != 'D' }
    .toList() == abc

Groovy-stream’s until method uses a condition like the Gatherers4J one, but doesn’t include the element that triggers the condition:

assert Stream.from(abcde)
    .until { it == 'D' }
    .toList() == abc
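The inclusive behavior is easy to sketch with the JDK 24 Gatherer API, since an integrator can push the current element and then return false to signal that no more input is wanted. takeUntilInclusive is a hypothetical name for illustration, not the Gatherers4J implementation:

```groovy
import java.util.function.Predicate
import java.util.stream.Gatherer
import java.util.stream.Stream

// Hypothetical gatherer (illustration only): push each element, then stop
// consuming once the condition matches, so the matching element is included
def takeUntilInclusive(Predicate condition) {
    Gatherer.ofSequential({ state, element, downstream ->
        downstream.push(element) && !condition.test(element)
    } as Gatherer.Integrator)
}

assert ['A', 'B', 'C', 'D', 'E'].stream()
    .gather(takeUntilInclusive { it == 'C' })
    .toList() == ['A', 'B', 'C']
// short-circuits, so it also terminates on an infinite stream
assert Stream.iterate(1, i -> i + 1)
    .gather(takeUntilInclusive { it % 5 == 0 })
    .toList() == [1, 2, 3, 4, 5]
```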

uniquelyOccurring

The uniquelyOccurring gatherer reads the entire stream before returning the elements that occur only once:

assert Stream.of('A', 'B', 'C', 'A')
    .gather(Gatherers4j.uniquelyOccurring())
    .toList() == ['B', 'C']

Groovy doesn’t have an equivalent, but countBy could be used to achieve similar functionality. Groovy’s countBy method is an eager (think terminal) operator:

assert ['A', 'B', 'C', 'A'].iterator()
    .countBy()
    .findAll{ it.value == 1 }*.key == ['B', 'C']

Conclusion

We have looked at a range of Gatherers4J gatherers, and at how to achieve similar functionality using Groovy’s iterator extension methods and, in some cases, the Groovy-stream library.

Update history

11/Apr/2025: Initial version