Java Stream & Why To Use It

Welcome to the series "Java Coding", brought to you by "Coding Mantis"!

We're all familiar with indexed for-loops: they were the way to iterate over a collection whenever we needed to perform some action on each of its elements.

In 2004, Java 1.5 added the enhanced for-loop, which made iterating over collection elements much cleaner and easier. Then, 10 years later, Java 8 (1.8 in the old versioning scheme) brought in a nifty little feature called Streams.
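To make this evolution concrete, here is a minimal sketch of the same task (upper-casing a list of names) written in all three styles. The class and method names are illustrative only, and Arrays.asList is used so the sketch also compiles on Java 8.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: one task, three iteration styles.
class IterationStyles {

    // Before Java 1.5: an indexed for-loop, managing the index by hand.
    static List<String> indexedLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            result.add(names.get(i).toUpperCase());
        }
        return result;
    }

    // Java 1.5: the enhanced for-loop hides the index.
    static List<String> enhancedLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            result.add(name.toUpperCase());
        }
        return result;
    }

    // Java 8: a Stream pipeline describes what to do with each element.
    static List<String> withStream(List<String> names) {
        return names.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("John", "Mary", "Joe");
        System.out.println(indexedLoop(names));  // [JOHN, MARY, JOE]
        System.out.println(enhancedLoop(names)); // [JOHN, MARY, JOE]
        System.out.println(withStream(names));   // [JOHN, MARY, JOE]
    }
}
```

All three produce the same result; the Stream version simply states the transformation instead of spelling out the loop.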

What's a Stream?

A Stream is an abstraction over a source of elements, such as a collection. It does not modify that source; instead, it functions as a pipeline that lets us specify a sequence of operations to be applied to the elements in the order they are declared. It also enables us to work in a functional style.

It's essential to note that a Stream is not a Collection: it does not hold any data, so we cannot use it to store elements as a replacement for a List, a Set, or any other Collection implementation.
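A quick way to see both points is to stream over a list and inspect it afterwards: the pipeline builds a new List, while the original list is left exactly as it was. This is a small illustrative sketch; the class name and data are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: a Stream reads from its source but does not change it.
class SourceUntouched {

    static List<String> upperCaseCopy(List<String> names) {
        // The result is a new List, built by the terminal operation collect.
        return names.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("john", "mary"));
        List<String> upper = upperCaseCopy(names);
        System.out.println(upper); // [JOHN, MARY]
        System.out.println(names); // [john, mary] : the source is unchanged
    }
}
```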

How does a Stream work?

As mentioned above, a Stream is a pipeline. It consists of a source, zero or more intermediate operations, and exactly one terminal operation.

The source can be obtained by invoking the .stream() method on a Collection, by using the static factory method Stream.of(...), or in several other ways, such as Arrays.stream(...) for arrays.
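The sketch below shows a few common ways to obtain a source. It is not an exhaustive list: Arrays.stream(...) and IntStream.range(...) are standard JDK methods included here as examples of the "other" ways, and the class name is illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Illustrative sketch: a few common ways to obtain a Stream source (not exhaustive).
class StreamSources {

    // Collection.stream(): the most common source.
    static List<String> fromCollection() {
        List<String> names = Arrays.asList("John", "Mary");
        return names.stream().collect(Collectors.toList());
    }

    // Stream.of(...): a static factory method on the Stream interface.
    static List<String> fromFactory() {
        return Stream.of("a", "b", "c").collect(Collectors.toList());
    }

    // Arrays.stream(...): a Stream over an array.
    static List<Integer> fromArray() {
        Integer[] numbers = {1, 2, 3};
        return Arrays.stream(numbers).collect(Collectors.toList());
    }

    // IntStream.range(...): a primitive Stream; boxed() converts it to Stream<Integer>.
    static List<Integer> fromRange() {
        return IntStream.range(0, 3).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fromCollection()); // [John, Mary]
        System.out.println(fromFactory());    // [a, b, c]
        System.out.println(fromArray());      // [1, 2, 3]
        System.out.println(fromRange());      // [0, 1, 2]
    }
}
```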

An intermediate operation returns a new Stream and is lazy: it does not run until a terminal operation is invoked. Think of intermediate operations as instructions that are executed, in the order they are specified, only once the pipeline's terminal operation is reached. The most common intermediate operations of Stream are map, filter, distinct, sorted, and flatMap.
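Laziness is easy to observe with a small experiment: record every call of a map lambda in a log, then check the log before and after the terminal operation. This is only an illustrative sketch; a side effect inside a lambda like this one is fine for a demo but best avoided in real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch: intermediate operations do nothing until a terminal operation runs.
class LazinessDemo {

    // Builds a pipeline with one intermediate operation (map) but no terminal operation.
    static Stream<String> buildPipeline(List<String> log) {
        return Stream.of("a", "b", "c")
                .map(s -> {
                    log.add("map " + s); // records when the lambda actually executes
                    return s.toUpperCase();
                });
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Stream<String> pipeline = buildPipeline(log);
        System.out.println(log);    // [] : map has not run yet

        List<String> result = pipeline.collect(Collectors.toList()); // terminal operation
        System.out.println(log);    // [map a, map b, map c]
        System.out.println(result); // [A, B, C]
    }
}
```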

As previously mentioned, every Stream pipeline ends with a terminal operation. This operation triggers the execution of all the specified intermediate operations, if any, and then produces a result (or, in the case of forEach, only a side effect) depending on which terminal operation was invoked. The most common terminal operations of Stream are collect, forEach, allMatch, noneMatch, anyMatch, findAny, and findFirst.

Of course, there are more intermediate and terminal operations, but these are the ones most commonly encountered, at least in my experience.
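Here is a small illustrative sketch of several of these terminal operations applied to the same data. findFirst is used rather than findAny because findAny is free to return any matching element (especially in a parallel Stream), which would make the output unpredictable. The class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Illustrative sketch: several terminal operations applied to the same data.
// Each method builds a fresh Stream from the list, because a Stream can only be consumed once.
class TerminalOps {

    static final List<String> NAMES = Arrays.asList("John", "Mary", "Trevor", "Joe");

    static boolean allHaveAtLeastThreeLetters() {
        return NAMES.stream().allMatch(name -> name.length() >= 3);
    }

    static boolean anyStartsWithT() {
        return NAMES.stream().anyMatch(name -> name.startsWith("T"));
    }

    static boolean noneIsEmpty() {
        return NAMES.stream().noneMatch(String::isEmpty);
    }

    static Optional<String> firstStartingWithJ() {
        return NAMES.stream().filter(name -> name.startsWith("J")).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(allHaveAtLeastThreeLetters()); // true
        System.out.println(anyStartsWithT());             // true
        System.out.println(noneIsEmpty());                // true
        System.out.println(firstStartingWithJ());         // Optional[John]
        NAMES.stream().forEach(System.out::println);      // prints each name on its own line
    }
}
```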

Let's see an example of a Stream:

import java.util.Arrays;
import java.util.Collection;
import java.util.stream.Collectors;

Collection<String> names = Arrays.asList("John", "Mary", "Trevor", null, "Joe");
Collection<String> result = names.stream()
  .filter(name -> name != null) // Functionally: .filter(Objects::nonNull)
  .filter(name -> name.length() > 3)
  .map(name -> name.toUpperCase()) // Functionally: .map(String::toUpperCase)
  .collect(Collectors.toList()); // result: [JOHN, MARY, TREVOR]

The above will first filter out all null elements, then filter out any names with fewer than four characters, convert each remaining name to upper case, and finally collect the three resulting names (JOHN, MARY, TREVOR) into a List.

Common Pitfalls

Execution Order

It's worth noting that even though intermediate operations are executed in the order they are declared, this does not mean that each pipeline step is completed for all elements before the next one starts.

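What actually happens is that each element passes through all of the steps before the next element is processed, so "John" from the previous example would pass through filter, filter, map and collect before "Mary" takes its turn, and an element rejected by a filter, such as "Joe", stops right there. Stateful operations such as sorted are the exception, since they have to see every element before passing any of them on. To speed things up, Stream also offers parallel processing, which keeps the same per-element logic but processes several elements at the same time, depending on how many threads are available (by default, parallel Streams run on the common ForkJoinPool).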
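On a sequential Stream this per-element order can be observed by logging inside the lambdas. In the sketch below (class name and input are illustrative), "John" goes through both steps before "Joe" is even looked at, and "Joe" never reaches map. The log shows how the JDK processes a sequential Stream in practice; in a parallel Stream the interleaving would be unpredictable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: logging inside the lambdas reveals the per-element execution order.
class ExecutionOrder {

    static List<String> run(List<String> names, List<String> log) {
        return names.stream()
                .filter(name -> {
                    log.add("filter " + name); // every element reaches the filter
                    return name.length() > 3;
                })
                .map(name -> {
                    log.add("map " + name);    // only elements that passed the filter get here
                    return name.toUpperCase();
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<String> result = run(Arrays.asList("John", "Joe", "Mary"), log);
        System.out.println(log);
        // [filter John, map John, filter Joe, filter Mary, map Mary]
        System.out.println(result); // [JOHN, MARY]
    }
}
```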

Mutation of Shared State

Populating an existing Collection from inside a Stream, for example by calling add in forEach, relies on side effects. In a parallel Stream this can lead to concurrency issues, and even in a sequential one it introduces an unnecessary variable if that Collection is only going to be returned afterward. Let's assume the following example:

class Car {
  String model;
  CarType carType;

  String getModel() { return model; }
  CarType getCarType() { return carType; }
}
enum CarType { SEDAN, HATCHBACK, OTHER }

Collection<String> models = new ArrayList<>(); // a plain ArrayList is not thread-safe
Collection<Car> cars = getPopulatedCarList();

cars.stream()
  .parallel()
  .filter(car -> CarType.HATCHBACK == car.getCarType())
  .map(car -> car.getModel()) // Functionally: .map(Car::getModel)
  .forEach(model -> models.add(model)); // side effect: concurrent add() calls on the shared list

return models;

In the above example, we are prone to concurrency issues: ArrayList is not thread-safe, so when several threads of the parallel Stream call add at the same time, elements can be lost or an exception can be thrown. Additionally, the models variable is redundant. A correct and safe way to tackle this is to let the Stream collect the models itself and either return them directly or assign them to a variable; collect is designed to work correctly with parallel Streams.

Collection<Car> cars = getPopulatedCarList();

return cars.stream()
  .parallel()
  .filter(car -> CarType.HATCHBACK == car.getCarType())
  .map(car -> car.getModel()) // Functionally: .map(Car::getModel)
  .collect(Collectors.toList());

Parallelizability Confusion

We've established that a Stream can either be sequential or parallel. However, a common misconception is expecting a Stream to behave in parallel just because we invoked the .parallelStream() or .stream().parallel() methods.

Although similar at first glance, the two aforementioned methods belong to different APIs: .parallelStream() is a member of Collection, while .parallel() is a member of Stream (inherited from BaseStream). According to its documentation, Collection.parallelStream() returns a possibly parallel Stream, so any Collection implementation is allowed to return a sequential Stream instead.

So that must mean that calling .parallel() on a Stream ensures its parallelizability, right?

Unfortunately, no. Calling .parallel() only marks the Stream as parallel. There is no guarantee about how many threads will actually be used or how the work will be split among them. Depending on the Stream's source and operations, parallel processing can reduce the computation time, but it may also bring no benefit at all.
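The isParallel() method can be used to check which mode a Stream is in, but it only reports a flag, not how many threads will do the work. The sketch below (class and method names are illustrative) also shows that a whole pipeline runs in a single mode: the last call to parallel() or sequential() wins.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Illustrative sketch: parallel() sets a mode flag; it does not promise a particular thread count.
class ParallelCheck {

    // true: the Stream is marked as parallel.
    static boolean afterParallel() {
        return Stream.of(1, 2, 3).parallel().isParallel();
    }

    // false: the last mode call applies to the entire pipeline.
    static boolean lastCallWins() {
        return Stream.of(1, 2, 3).parallel().sequential().isParallel();
    }

    // The result does not depend on how many threads actually did the work.
    static int sumInParallel() {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        return numbers.parallelStream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        System.out.println(afterParallel()); // true
        System.out.println(lastCallWins());  // false
        System.out.println(sumInParallel()); // 15
    }
}
```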

Accidental Consumption

Although a Stream does not consume its source (i.e., the collection it was created from), a terminal operation does consume the Stream itself, leaving it unusable for any further operations.

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

Stream<String> stream = Stream.of("a", "b", "c");
List<String> res0 = stream
  .map(s -> s.toUpperCase()) // Functionally: .map(String::toUpperCase)
  .collect(Collectors.toList());

List<String> res1 = stream.collect(Collectors.toList());
// The above line throws:
// Exception in thread "main" java.lang.IllegalStateException: stream has already been operated upon or closed
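When the same data has to go through more than one pipeline, one common workaround is to create a fresh Stream each time, for example through a java.util.function.Supplier. A minimal sketch follows; the names are illustrative, and for a collection source simply calling .stream() again works just as well.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch: avoid reusing a consumed Stream by building a new one per pipeline.
class ReusableSource {

    // Every get() call returns a brand-new Stream over the same elements.
    static final Supplier<Stream<String>> LETTERS = () -> Stream.of("a", "b", "c");

    static List<String> upperCase() {
        return LETTERS.get().map(String::toUpperCase).collect(Collectors.toList());
    }

    static List<String> asIs() {
        return LETTERS.get().collect(Collectors.toList());
    }

    // Reusing a single Stream instance, by contrast, fails on the second terminal operation.
    static boolean reuseThrows() {
        Stream<String> stream = LETTERS.get();
        stream.collect(Collectors.toList());
        try {
            stream.collect(Collectors.toList());
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(upperCase());   // [A, B, C]
        System.out.println(asIs());        // [a, b, c]
        System.out.println(reuseThrows()); // true
    }
}
```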

Conclusion

Streams, like other APIs, can help us describe our intentions clearly and quickly when working with collections, in very few lines of code, and they can be extended with a functional touch (e.g., method references) that definitely puts the fun in functional.

If Streams seem unclear at first, just take a deep breath and give them a shot; it's only natural to struggle initially, since functional programming requires a different way of thinking. With patience, imagination, and a bit of caution to avoid abuse, such tools can help us do wonders in just a few lines.

What do you think about the subject?