Split, Apply, Merge in D
(Source/Credits: https://dev.to/jessekphillips/split-apply-merge-in-d-4c48)
I wanted to find Groupby, a means to iterate a list in groups (lists of lists). In that search I came across this article about split, apply, merge for datatables. It looked like what I wanted, but its focus on data science had me confused.
In D these functions are `chunkBy`, `map`, and `joiner`. The pattern of consistency continues: once our list is sorted, we just need to specify what to group on.
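To see all three steps together, here is a minimal sketch of the full pipeline. It assumes, purely for illustration, that the "apply" step keeps at most two elements from each group; `keepTwoPerGroup` is a hypothetical helper name, not a library function. `chunkBy` splits, `map` applies a function to each group, and `joiner` merges the groups back into one range.

```dlang
import std.algorithm : chunkBy, joiner, map;
import std.array : array;
import std.range : take;

// Hypothetical helper: split into runs of equal values, apply
// "keep at most two" to each run, then merge the runs back together.
int[] keepTwoPerGroup(int[] data)
{
    return data
        .chunkBy!((a, b) => a == b) // split, e.g. [[1,1,1], [2,2], [3]]
        .map!(run => run.take(2))   // apply: trim each run to two elements
        .joiner                     // merge: flatten back into one range
        .array;
}

void main()
{
    assert(keepTwoPerGroup([1, 1, 1, 2, 2, 3]) == [1, 1, 2, 2, 3]);
}
```

Any function from a group to a range could stand in for `take(2)` here; `joiner` only needs each applied result to still be a range.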
```dlang
import std.algorithm;

void main()
{
    auto data = [1,1,2,2];
    // Adjacent equal elements end up in the same chunk.
    assert(data.chunkBy!((a, b) => a==b)
        .equal!equal([[1,1],[2,2]]));
}
```
Unlike previous lambdas, this one takes two arguments, which allows elements to be grouped in interesting ways.
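```dlang
import std.algorithm;

// Order evens before odds, and ascending within each parity.
bool evenGrouping(int a, int b)
{
    if (a%2 == b%2) return a < b;
    return a%2 < b%2;
}

void main()
{
    auto data = [1,1,2,2,3,3];
    // Each chunk is itself a range, so compare with equal!equal.
    assert(data.sort!evenGrouping
        .chunkBy!((a,b) => a%2==b%2)
        .equal!equal([[2,2],[1,1,3,3]]));
}
```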
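The grouping above relies on the data being sorted: `chunkBy` only looks at neighboring elements, so equal keys that are not adjacent land in separate chunks. A small sketch using the same parity predicate on unsorted input (`parityChunks` is a hypothetical helper name):

```dlang
import std.algorithm : chunkBy, map;
import std.array : array;

// Hypothetical helper: chunk by parity WITHOUT sorting first,
// copying each chunk into its own array so the result is easy to compare.
int[][] parityChunks(int[] data)
{
    return data
        .chunkBy!((a, b) => a % 2 == b % 2)
        .map!(chunk => chunk.array)
        .array;
}

void main()
{
    // The odd numbers are split into two chunks because 2 sits between them.
    assert(parityChunks([1, 3, 2, 5, 7]) == [[1, 3], [2], [5, 7]]);
}
```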
As mentioned, sorting needs to happen first.
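```dlang
import std.algorithm;
import std.array; // provides array()

void main()
{
    auto data = [3,3,1,1,2,2];

    assert(data.sort!((a, b) => a%2 < b%2) // evens first; order within a parity is unspecified
        .chunkBy!((a,b) => a%2==b%2)
        .map!(x => x.array.sort)           // sort each chunk on its own
        .equal!equal([[2,2],[1,1,3,3]]));
}
```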
In this contrived example I decided it best to run it through a compiler. It was a good thing, as I found a difference in behavior. I'll save `map` for another day.
Two kinds of lambdas are supplied to these functions: one that takes a single argument, referred to as a unary predicate, and one that takes two, referred to as a binary predicate.
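A small sketch contrasting the two shapes with `chunkBy` on the same data; `sameGrouping` is a hypothetical helper. The binary form decides whether two elements belong together, while the unary form computes a key for each element and puts elements with equal keys in the same chunk.

```dlang
import std.algorithm : chunkBy, equal, map;

// Hypothetical helper: true when the binary and unary parity predicates
// split `data` into the same chunks.
bool sameGrouping(int[] data)
{
    // Binary predicate: (a, b) => bool
    auto binary = data.chunkBy!((a, b) => a % 2 == b % 2);

    // Unary predicate: a => key. Each element of this range is a
    // (key, chunk) tuple, so keep only the chunk for the comparison.
    auto unary = data.chunkBy!(a => a % 2).map!(t => t[1]);

    return binary.equal!equal(unary);
}

void main()
{
    assert(sameGrouping([2, 2, 1, 1, 3, 3]));
}
```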
When a unary predicate is supplied to `chunkBy`, it returns a range of tuples, each holding the key the predicate produced and the chunk of values with that key. This is an interesting optimization, but this overload should live with `group`, which already returns tuples.
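A sketch of the two tuple shapes side by side: `chunkBy` with a unary predicate pairs each key with its chunk, while `group` pairs each value with the count of its adjacent repeats. The helpers `parityKeys` and `runCounts` are hypothetical names for illustration.

```dlang
import std.algorithm : chunkBy, group, map;
import std.array : array;
import std.typecons : Tuple;

// Hypothetical helper: the key from each (key, chunk) tuple that
// chunkBy yields when given a unary predicate.
int[] parityKeys(int[] data)
{
    return data.chunkBy!(a => a % 2).map!(t => t[0]).array;
}

// Hypothetical helper: group yields (value, count) tuples for
// runs of equal adjacent values.
Tuple!(int, uint)[] runCounts(int[] data)
{
    return data.group.array;
}

void main()
{
    import std.typecons : tuple;

    auto data = [2, 2, 1, 1, 3, 3];
    assert(parityKeys(data) == [0, 1]);
    assert(runCounts(data) == [tuple(2, 2u), tuple(1, 2u), tuple(3, 2u)]);
}
```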