Concurrency in Elixir and the Keys to Solving “Parallel Letter Frequency”



Written by Percy Grunwald

— Last Updated March 11, 2019

The Parallel Letter Frequency problem on Exercism’s Elixir Track is a medium difficulty problem that I found really interesting to solve. If you haven’t seen the problem before, you’re tasked with writing a function that can calculate the frequency of letters in a list of strings. Sounds simple enough, right? Well, it also needs to handle non-English characters and do the calculation concurrently with a user-controlled number of workers:

iex> Frequency.frequency(["Freude", "schöner", "Götterfunken"], num_workers)
%{
  "c" => 1, 
  "d" => 1, 
  "e" => 5, 
  ...
  "ö" => 2
}

Simply throwing /[a-z]/ at this problem is not going to cut it.

Luckily for all you Alchemists out there, Elixir has core features that make meeting these extra requirements quite easy. But – if you’re like me – you might have never used or even heard of them. The features I’m talking about are Unicode matching in the Regex module and using the Task module to write asynchronous/concurrent code.

Prior to solving this problem I barely knew about these features, but I would now consider them an indispensable part of my Elixir toolbox. To me, this really demonstrates the value of Exercism even for more-experienced developers: solving challenging “synthetic” problems pushes you to learn and apply parts of your language that you may not have explored. Often this new learning lets you solve “real world” problems in your production code in a more efficient or expressive way. This was certainly the case for me with this specific problem.

In this article I’ll show you how you can apply Unicode matches and the Task module to solve the Parallel Letter Frequency problem in Elixir. I’ll also show that although concurrency might seem like a “free” way to make your code faster, in some cases it can have the opposite effect: the concurrent code in my solution ran around 80% faster on my system for certain workloads, but for others it was over 3x slower than the original code. As the benchmarks later in the post demonstrate, adding concurrency can actually reduce performance while also increasing the complexity of your code.

View my published solution on Exercism.

Determine whether a character is a letter in Elixir

How would you use Elixir to determine whether or not "a" is a letter?

I think most people would apply a regular expression like /[a-z]/:

iex> String.match?("a", ~r/^[a-z]$/)
true

What about "A"?

Adding the i (case insensitive) modifier would probably be the easiest way:

iex> String.match?("A", ~r/^[a-z]$/i)
true

Ok, now what about "ö"?

If you’re like me a week ago, you might be struggling to think of the best solution – [a-z] is definitely not going to work:

iex> String.match?("ö", ~r/^[a-z]$/i)
false

Determining whether or not "ö" is a letter is a core part of solving this Exercism problem, since one of the texts thrown at you in the tests is in German:

# Poem by Friedrich Schiller. The corresponding music is the European Anthem.
@ode_an_die_freude """
Freude schöner Götterfunken
...
"""

Initially I thought I could use a regular expression to check if a character isn’t a special character, but decided that was a dead end. A regular expression like that was going to be long, inelegant and fragile; I was not confident that I could list every special character that could possibly come up, and it seemed to be the wrong way to approach the problem anyway.

Unicode regular expressions in Elixir

Digging deeper, I learned about the u modifier in Elixir’s Regex module:

unicode (u) - enables Unicode specific patterns like \p and change modifiers like \w, \W, \s and friends to also match on Unicode.

It turns out that the u modifier – and specifically the \p pattern – was exactly what I needed. The \p pattern lets you match a character in any of the Unicode character categories. This not only includes specific categories like Ll (Letter, lowercase) and Sc (Symbol, currency), but also parent categories like L (Letter) and S (Symbol).
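As a quick illustration of the difference, a specific category like \p{Ll} matches only lowercase letters, while the parent category \p{L} matches letters of any case:

```elixir
# \p{L} matches any Unicode letter; \p{Ll} matches only lowercase ones.
IO.inspect(String.match?("Ö", ~r/^\p{L}$/u))   # true – "Ö" is a letter
IO.inspect(String.match?("Ö", ~r/^\p{Ll}$/u))  # false – "Ö" is uppercase (category Lu)
IO.inspect(String.match?("ö", ~r/^\p{Ll}$/u))  # true
```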

You can match any letter of any case in any human language covered by Unicode with the pattern \p{L}. This is super powerful, and exactly what I needed to solve the problem. Converting this regular expression to Elixir form – ~r/^\p{L}$/u – I was able to match any single Unicode letter in Elixir:

iex> String.match?("a", ~r/^\p{L}$/u)
true

iex> String.match?("A", ~r/^\p{L}$/u)
true

iex> String.match?("ö", ~r/^\p{L}$/u)
true

iex> String.match?("$", ~r/^\p{L}$/u)
false

I applied this specific regular expression in the count_letters/1 function of my solution. This function accepts a list of graphemes, ["a", "A", "ö", "$"], and returns a map that counts only the letters while ignoring the case, %{"a" => 2, "ö" => 1}:

def count_letters(graphemes) do
  Enum.reduce(graphemes, %{}, fn grapheme, acc ->
    if String.match?(grapheme, ~r/^\p{L}$/u) do
      downcased_letter = String.downcase(grapheme)
      Map.update(acc, downcased_letter, 1, fn count -> count + 1 end)
    else
      acc
    end
  end)
end

How to make your code concurrent using the Task module

The other major challenge in solving this problem was adding concurrency. The function is required to do the letter frequency calculation by distributing the work to a number of worker processes, which the user can set with the workers argument.

While Elixir allows you to very easily spawn processes with something like Kernel.spawn_link/1, you’re much better off using the awesomely powerful abstractions provided by the Task module:

The most common use case for [Task] is to convert sequential code into concurrent code by computing a value asynchronously.

The Task module lets you write unbelievably clean concurrent code in Elixir – no callback hell and no need for Promises.
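For example, a minimal sketch of the async/await style (the computations here are arbitrary placeholders):

```elixir
# Start two computations concurrently, then collect both results.
task1 = Task.async(fn -> Enum.sum(1..1_000) end)
task2 = Task.async(fn -> String.upcase("hello") end)

IO.inspect(Task.await(task1))  # 500500
IO.inspect(Task.await(task2))  # "HELLO"
```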

I decided that Task.async_stream/5 was the best option in the Task toolbox for this particular problem:

async_stream(enumerable, module, function, args, options \\ [])

Task.async_stream/5 returns a stream that runs the given module, function, and args concurrently on each item in enumerable. There are two ways to control the level of concurrency with Task.async_stream/5:

  1. The number of items in enumerable, which determines the number of workers spawned (in the absence of the :max_concurrency option)
  2. Setting the :max_concurrency option to the number of workers, given that the number of items in enumerable is greater than or equal to :max_concurrency
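To sketch the second approach, this uses the function variant Task.async_stream/3 with an illustrative :max_concurrency of 2, so at most two of the four items are processed at once:

```elixir
# Four items, but at most two worker processes run at any one time.
results =
  [1, 2, 3, 4]
  |> Task.async_stream(fn n -> n * n end, max_concurrency: 2)
  |> Enum.map(fn {:ok, result} -> result end)

IO.inspect(results)  # [1, 4, 9, 16] – results come back in input order by default
```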

I chose the first option for my solution since it was the easiest for me to understand and also meant that I could ensure that the correct number of workers was used even for very short strings. Here’s a diagram of my approach to adding concurrency to this problem:

Making the letter frequency computation concurrent

Or in words:

  1. Split up a list of graphemes into a number of chunks (equal to the number of workers)
  2. Process each chunk in a worker using Task.async_stream/5
  3. Merge the results from each worker into a single result

My implementation using Task.async_stream/5 is shown below (using the count_letters/1 function from the previous section). Notice how this reads exactly like regular sequential code, despite potentially massive concurrency. Such is the power of the abstractions provided in Task:

def frequency(texts, workers) do
  texts
  |> get_all_graphemes()
  |> split_into_chunks(workers)
  |> Task.async_stream(__MODULE__, :count_letters, [])
  |> merge_results_stream()
end

defp split_into_chunks(all_graphemes, num_chunks) do
  all_graphemes_count = Enum.count(all_graphemes)
  graphemes_per_chunk = :erlang.ceil(all_graphemes_count / num_chunks)

  Enum.chunk_every(all_graphemes, graphemes_per_chunk)
end
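To see the chunking in action, here’s what this does to seven graphemes split for three workers – ceil(7 / 3) gives chunks of three, with the last chunk holding the remainder:

```elixir
graphemes = ["a", "b", "c", "d", "e", "f", "g"]
chunk_size = :erlang.ceil(Enum.count(graphemes) / 3)

IO.inspect(chunk_size)                               # 3
IO.inspect(Enum.chunk_every(graphemes, chunk_size))  # [["a", "b", "c"], ["d", "e", "f"], ["g"]]
```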

defp merge_results_stream(results_stream) do
  Enum.reduce(results_stream, %{}, fn {:ok, worker_result}, acc ->
    Map.merge(acc, worker_result, fn _key, acc_val, worker_val ->
      acc_val + worker_val
    end)
  end)
end
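The heavy lifting here is done by Map.merge/3, which invokes the given function only for keys present in both maps – exactly what’s needed to sum overlapping letter counts:

```elixir
# Counts from two hypothetical workers; "b" appears in both and is summed.
worker1 = %{"a" => 2, "b" => 1}
worker2 = %{"b" => 3, "c" => 1}

IO.inspect(Map.merge(worker1, worker2, fn _key, v1, v2 -> v1 + v2 end))
# %{"a" => 2, "b" => 4, "c" => 1}
```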

Does concurrency actually make the code faster?

I showed how I made the frequency/2 function concurrent in the previous section, but did it actually make the code faster? Theoretically, concurrency can speed up code on a system by distributing work across all available CPU cores. However, it’s a really good idea to test this theory before accepting the additional complexity and potential fragility that concurrency adds to your code.

I split the clauses of the function based on the number of workers to allow me to test the concurrent and non-concurrent implementations separately:

def frequency([], _workers), do: %{}

def frequency(texts, 1) do
  texts
  |> get_all_graphemes()
  |> count_letters()
end

def frequency(texts, workers) do
  texts
  |> get_all_graphemes()
  |> split_into_chunks(workers)
  |> Task.async_stream(__MODULE__, :count_letters, [])
  |> merge_results_stream()
end

The original implementation of the function is called when workers == 1, and the concurrent implementation when workers > 1. I could then benchmark the function for the same input, with varying numbers of workers.

Benchmarking the function

There are a number of ways to benchmark Elixir code:

  1. Use benchee, a package made specifically for benchmarking Elixir code
  2. Use Erlang’s :timer.tc/1 to time how long an Elixir function takes to execute
  3. Use the time command found on Unix and Unix-like operating systems to time the execution of an Elixir script (.exs)
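For comparison, option 2 is a one-liner – :timer.tc/1 returns the elapsed time in microseconds alongside the function’s result (the summed range here is just a placeholder workload):

```elixir
# Time an arbitrary function call; returns {microseconds, result}.
{elapsed_us, result} = :timer.tc(fn -> Enum.sum(1..100) end)

IO.inspect(result)  # 5050
IO.puts("took #{elapsed_us} μs")
```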

I felt that option 1 was the best overall; it required the most work, but gave the best results.

Since benchee is an external dependency, I first needed to convert the Frequency module into a Mix application.

Convert Frequency into a Mix application

I created a new Mix application called frequency with mix new:

$ mix new frequency
...

Your Mix project was created successfully.
You can use "mix" to compile it, test it, and more:

    cd frequency
    mix test

Run "mix help" for more commands.

Then I installed benchee by adding it to the deps in mix.exs:

# mix.exs

defp deps do
  [
    {:benchee, "~> 0.13", only: :dev}
  ]
end

And fetched the dependency:

$ cd frequency
$ mix deps.get
Resolving Hex dependencies...
Dependency resolution completed:
New:
  benchee 0.13.2
  deep_merge 0.2.0
* Getting benchee (Hex package)
* Getting deep_merge (Hex package)

And copied the contents of the original frequency.exs file into lib/frequency.ex:

# lib/frequency.ex

defmodule Frequency do
  ...
  def frequency([], _workers), do: %{}

  def frequency(texts, 1) do
  ...

  def frequency(texts, workers) do
  ...

  defp split_into_chunks(all_graphemes, num_chunks) do
  ...

  defp merge_results_stream(results_stream) do
  ...

  defp get_all_graphemes(texts) do
  ...

  def count_letters(graphemes) do
  ...
end

Create a script that calls benchee to benchmark the code

Finally, I created an Elixir script that calls Benchee.run/2 to benchmark the function with different numbers of workers:

# benchmark.exs

duplicates =
  System.argv()
  |> List.first()
  |> String.to_integer()

text = "The quick brown fox jumps over the lazy dog"
texts = List.duplicate(text, duplicates)

Benchee.run(%{
  "original code" => fn -> Frequency.frequency(texts, 1) end,
  "concurrent code: 2 workers" => fn -> Frequency.frequency(texts, 2) end,
  "concurrent code: 4 workers" => fn -> Frequency.frequency(texts, 4) end,
  "concurrent code: 8 workers" => fn -> Frequency.frequency(texts, 8) end
})

The texts variable is created by multiplying text a certain number of times. List.duplicate/2 is a convenient way of creating a large list of strings to pass to the frequency/2 function:

iex> List.duplicate("hello", 3)
["hello", "hello", "hello"]

The duplicates variable uses the first argument when calling the script from the command line, so I was able to vary the size of texts by changing the value of the argument.
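Note that System.argv/0 returns the arguments as strings, which is why the String.to_integer/1 conversion is needed. Simulating the script’s parsing with a hard-coded argument list:

```elixir
# What the script does with e.g. `mix run benchmark.exs 1000`
argv = ["1000"]

duplicates =
  argv
  |> List.first()
  |> String.to_integer()

IO.inspect(duplicates)  # 1000
```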

For example, I could set duplicates to 1 by running:

$ mix run benchmark.exs 1

Or 1000 by running:

$ mix run benchmark.exs 1000

Concurrent vs non-concurrent benchmark results

Benchmark results: duplicates == 1

$ mix run benchmark.exs 1
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.8.0
Erlang 21.1.1

...

Name                                 ips        average  deviation         median         99th %
original code                    18.91 K       52.88 μs    ±26.19%          49 μs         114 μs
concurrent code: 2 workers        8.96 K      111.59 μs    ±26.41%         103 μs         230 μs
concurrent code: 4 workers        7.27 K      137.57 μs    ±24.44%         125 μs         285 μs
concurrent code: 8 workers        6.00 K      166.76 μs    ±22.26%         154 μs         329 μs

Comparison:
original code                    18.91 K
concurrent code: 2 workers        8.96 K - 2.11x slower
concurrent code: 4 workers        7.27 K - 2.60x slower
concurrent code: 8 workers        6.00 K - 3.15x slower

Wow! For duplicates == 1, the concurrent code is slower than the original code by a significant amount. In fact, the code gets slower as the workers increase, with 8 workers being over 3 times slower than the original code.

I sort of expected this. The text is so short that with duplicates == 1, much more time is likely spent breaking up the text, spawning separate processes and merging the results than on the actual letter frequency computation.

Benchmark results: duplicates == 100

$ mix run benchmark.exs 100
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.8.0
Erlang 21.1.1

...

Name                                 ips        average  deviation         median         99th %
concurrent code: 8 workers        190.34        5.25 ms    ±10.34%        5.08 ms        7.24 ms
concurrent code: 4 workers        182.38        5.48 ms     ±8.78%        5.34 ms        7.18 ms
concurrent code: 2 workers        176.21        5.67 ms     ±8.98%        5.52 ms        7.55 ms
original code                     159.70        6.26 ms     ±9.47%        6.07 ms        8.71 ms

Comparison:
concurrent code: 8 workers        190.34
concurrent code: 4 workers        182.38 - 1.04x slower
concurrent code: 2 workers        176.21 - 1.08x slower
original code                     159.70 - 1.19x slower

With duplicates == 100 the concurrent code started to perform better than the original code by a small margin. There was little difference between 4 and 8 workers, but both were around 20% faster than the original code.

For such a small improvement, the additional complexity added by the concurrent code is probably not worthwhile at this workload.

Benchmark results: duplicates == 1000

$ mix run benchmark.exs 1000
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.8.0
Erlang 21.1.1

...

Name                                 ips        average  deviation         median         99th %
concurrent code: 8 workers         28.04       35.66 ms    ±17.33%       33.45 ms       56.97 ms
concurrent code: 4 workers         26.72       37.42 ms    ±15.06%       35.69 ms       58.45 ms
concurrent code: 2 workers         20.84       47.98 ms    ±11.75%       46.48 ms       73.73 ms
original code                      15.34       65.21 ms    ±12.37%       62.90 ms       88.35 ms

Comparison:
concurrent code: 8 workers         28.04
concurrent code: 4 workers         26.72 - 1.05x slower
concurrent code: 2 workers         20.84 - 1.35x slower
original code                      15.34 - 1.83x slower

At duplicates == 1000 there was a significant difference between the original code and the concurrent code, with 4 and 8 workers both around 80% faster.

The difference between 4 and 8 workers was minimal, most likely because the number of physical cores on my system is 4. The value of 8 reported by benchee is due to my processor having Hyper-threading, which doubles the number of logical cores seen by the OS.
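You can check the count the BEAM sees for yourself – System.schedulers_online/0 usually matches the logical core count that benchee reports (the value naturally depends on your machine):

```elixir
# Number of scheduler threads, normally one per logical core.
IO.inspect(System.schedulers_online())
```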

Benchmark results: duplicates == 10_000

$ mix run benchmark.exs 10000
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.8.0
Erlang 21.1.1

...

Name                                 ips        average  deviation         median         99th %
concurrent code: 8 workers          2.59      386.24 ms     ±3.59%      388.54 ms      404.97 ms
concurrent code: 4 workers          2.40      417.01 ms     ±5.90%      413.12 ms      464.11 ms
concurrent code: 2 workers          1.91      523.79 ms     ±3.70%      525.62 ms      564.83 ms
original code                       1.44      696.58 ms     ±4.97%      690.44 ms      750.42 ms

Comparison:
concurrent code: 8 workers          2.59
concurrent code: 4 workers          2.40 - 1.08x slower
concurrent code: 2 workers          1.91 - 1.36x slower
original code                       1.44 - 1.80x slower

At duplicates == 10_000 the concurrent code was still the clear victor. 8 workers performed the best on my system, at around 80% faster than the original non-concurrent implementation.

On average, 8 workers computed the result over 300ms faster than the original code (386ms vs 696ms), which is pretty significant.

Conclusion

In this post I walked you through how I used 2 awesome features of Elixir to help me solve the Parallel Letter Frequency problem on Exercism’s Elixir Track. I hope it’s been useful for you to be able see my approach to solving the problem and how you might use the Task module to add concurrency to your applications. A word of warning: as the benchmarks demonstrate, depending on your specific workload, adding concurrency can actually reduce performance while also making your code more complex. When in doubt, write the cleanest, most expressive code you can and test whether or not changes actually improve performance.

I was able to make the frequency/2 function run around 80% faster on my system for certain workloads by using 4 to 8 workers. This improvement could be crucial for your application – returning a result in around half the time could be the difference between acceptable and unusable. But if your application demands such high CPU performance for acceptable usability, Elixir may not be the correct technology choice.

This shouldn’t be interpreted as a criticism of Elixir’s Task module, which is an incredibly useful toolbox for writing asynchronous code with ease. But instead of applying Task to CPU-bound work like the letter frequency calculation, a better-suited use might be making multiple HTTP requests concurrently, rather than waiting for each request to succeed or fail before making the next one. This could shave off hundreds of milliseconds and significantly improve the experience for users or your PageSpeed results.

Full solution code

defmodule Frequency do
  @doc """
  Count letter frequency in parallel.

  Returns a map of characters to frequencies.

  The number of worker processes to use can be set with 'workers'.
  """
  @spec frequency([String.t()], pos_integer) :: map
  def frequency([], _workers), do: %{}

  def frequency(texts, 1) do
    texts
    |> get_all_graphemes()
    |> count_letters()
  end

  def frequency(texts, workers) do
    texts
    |> get_all_graphemes()
    |> split_into_chunks(workers)
    |> Task.async_stream(__MODULE__, :count_letters, [])
    |> merge_results_stream()
  end

  defp split_into_chunks(all_graphemes, num_chunks) do
    all_graphemes_count = Enum.count(all_graphemes)
    graphemes_per_chunk = :erlang.ceil(all_graphemes_count / num_chunks)

    Enum.chunk_every(all_graphemes, graphemes_per_chunk)
  end

  defp merge_results_stream(results_stream) do
    Enum.reduce(results_stream, %{}, fn {:ok, worker_result}, acc ->
      Map.merge(acc, worker_result, fn _key, acc_val, worker_val ->
        acc_val + worker_val
      end)
    end)
  end

  defp get_all_graphemes(texts) do
    texts
    |> Enum.join()
    |> String.graphemes()
  end

  def count_letters(graphemes) do
    Enum.reduce(graphemes, %{}, fn grapheme, acc ->
      if String.match?(grapheme, ~r/^\p{L}$/u) do
        downcased_letter = String.downcase(grapheme)
        Map.update(acc, downcased_letter, 1, fn count -> count + 1 end)
      else
        acc
      end
    end)
  end
end
