Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
What R users now call piping, popularized by Stefan Milton Bache and Hadley Wickham, is inline function application (this is notationally similar to, but distinct from, the powerful interprocess communication and concurrency tool introduced to Unix by Douglas McIlroy in 1973). In object-oriented languages this sort of notation for function application has been called “method chaining” since the days of Smalltalk (~1972). Let’s take a look at method chaining in Python, in terms of pipe notation.
Let’s work an example using Python’s Pandas package (and classes).
import pandas as pd

data = [['alpha', 'a', 1, 0], ['beta', 'b', 2, 10], ['gamma', 'b', 3, 10]]
df = pd.DataFrame(data, columns=['name', 'group', 'value', 'cost'])
print(df)

    name group  value  cost
0  alpha     a      1     0
1   beta     b      2    10
2  gamma     b      3    10
Method chaining is when methods return a reference to their host-object (or reference to a replacement for their host-object). This lets us call a sequence of methods one after the other as we show below.
print(df.groupby("group").agg({"value":["max", "min"], "cost":["mean"]}))
      value     cost
        max min mean
group
a         1   1    0
b         3   2   10
This may not be considered legible (especially as it was combined with print() function notation), so we use a common notation convention and insert a line-break before each method dispatch “.”. The parentheses surrounding the whole expression are a common Python convention to facilitate multi-line expressions.
(
    df
    .groupby("group")
    .agg({"value":["max", "min"], "cost":["mean"]})
    .pipe(print)
)

      value     cost
        max min mean
group
a         1   1    0
b         3   2   10
Or, to emphasize the similarity to pipes, we can use another convention (that contravenes the PEP8 style guide): end the lines with .\ which is the method dispatch “.” symbol plus a line continuation mark.
df .\
    groupby("group") .\
    agg({"value":["max", "min"], "cost":["mean"]}) .\
    pipe(print)

      value     cost
        max min mean
group
a         1   1    0
b         3   2   10
The above is just as with the Bizarro Pipe in R: the pipe is available as a convention over the existing language syntax. In Python (for method chaining enabled classes and methods) the glyph “.” is in fact already a method application operator or pipe (as is the glyph “.\EOL”, where EOL denotes the line-break or end of line). With method-chaining conventions the “.” already is “a pipe” organizing method application from left to right without the need for illegible nesting. In R the glyph “->.;” is a function application operator or pipe (which we called the Bizarro Pipe; the Bizarro Pipe is a first-rate pipe, faster than other pipes, and interferes less with debugging than other pipes).
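To make the “left to right without nesting” point concrete, here is a minimal sketch in plain Python. The Numbers class and the helper functions are hypothetical, invented only for this illustration; they are not part of Pandas or any other package.

```python
# Hypothetical toy class (an assumption for illustration): each method
# returns a new Numbers object, so calls can be chained with ".".
class Numbers:
    def __init__(self, values):
        self.values = list(values)

    def keep_even(self):
        return Numbers(v for v in self.values if v % 2 == 0)

    def scale(self, factor):
        return Numbers(v * factor for v in self.values)

    def total(self):
        return sum(self.values)


# The same steps written as plain functions, for comparison.
def keep_even(values):
    return [v for v in values if v % 2 == 0]


def scale(values, factor):
    return [v * factor for v in values]


# Nested form: the first step is innermost, so it reads inside-out.
nested_result = sum(scale(keep_even([1, 2, 3, 4]), 10))

# Chained form: the steps read top to bottom in the order they run.
chained_result = (
    Numbers([1, 2, 3, 4])
    .keep_even()
    .scale(10)
    .total()
)

print(nested_result, chained_result)
```

Both forms compute the same value; only the reading order differs.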
Both languages have had this application capability for a very long time. We are using Pandas and pipe() as our example, but any package whose methods return a reference to the object being worked on (or a reference to a replacement object) can be treated as a pipe-able object. If the class further implements one function re-director method (such as pipe()) then a lot more becomes practical. Here is another example showing how additional named and unnamed arguments can be handled.
def add_delta_to_column(df, colname, delta):
    df[colname] = df[colname] + delta
    return df

df .\
    pipe(add_delta_to_column, "cost", 5) .\
    groupby("group") .\
    agg({"value":["max", "min"], "cost":["mean"]}) .\
    pipe(print, "DEBUG1", sep = " | ")

      value     cost
        max min mean
group
a         1   1    5
b         3   2   15 | DEBUG1
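The same re-director idea can be added to your own classes. Below is a rough sketch, assuming a hypothetical Pipeable mixin and Basket class (neither comes from Pandas); its pipe() simply calls the given function with the object as the first argument and forwards everything else, which is the behavior the Pandas example above relies on.

```python
class Pipeable:
    """Hypothetical mixin (an assumption): adds a pandas-style pipe()."""

    def pipe(self, func, *args, **kwargs):
        # Call func with this object first, then forward any extra
        # positional and keyword arguments unchanged.
        return func(self, *args, **kwargs)


class Basket(Pipeable):
    """Hypothetical example class holding item prices."""

    def __init__(self, items):
        self.items = dict(items)

    def add(self, name, price):
        # Return self so further methods can be chained.
        self.items[name] = price
        return self


def apply_discount(basket, percent, minimum=0):
    # A plain function, not a method of Basket; pipe() lets it join the chain.
    for name, price in basket.items.items():
        if price >= minimum:
            basket.items[name] = price * (100 - percent) / 100
    return basket


result = (
    Basket({"apple": 10})
    .add("pear", 20)
    .pipe(apply_discount, 50, minimum=15)
)
print(result.items)
```

Because apply_discount returns the basket, the chain could continue after the pipe() step, just as groupby() follows pipe() in the Pandas example.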
So depending on your point of view: “piping is poor-person’s method chaining” or “method chaining is poor-person’s piping” (taken from the usual quote comparing objects and closures).
If one wants to go further, there are a number of Python packages adding additional significant piping capabilities (either through notation, operator overloading, or other methods).
- sklearn.pipeline
- Stack overflow notes 1
- Stack overflow notes 2
- dplython
- sspipe
- Tidyverse pipes in Pandas
- chainlearn
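As a rough idea of what the operator overloading approach can look like, here is a minimal sketch. The Pipe wrapper below is hypothetical and is not the API of any package listed above; it only shows how Python's reflected-operator hook `__ror__` can make `|` behave like a pipe.

```python
class Pipe:
    """Hypothetical wrapper (an assumption): stores a function plus extra arguments."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __ror__(self, left):
        # Python falls back to this for `left | Pipe(...)` when the type
        # of `left` does not itself handle `|` with a Pipe on the right.
        return self.func(left, *self.args, **self.kwargs)


def first_n(values, n):
    # Helper for the example: keep the first n items.
    return values[:n]


result = (
    [3, 1, 2]
    | Pipe(sorted, reverse=True)  # sort descending
    | Pipe(first_n, 2)            # keep the two largest
    | Pipe(sum)                   # add them up
)
print(result)
```

One design caveat: a left operand whose type handles `|` itself takes precedence over `__ror__`, so a production-quality package has to deal with such types more carefully than this sketch does.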
And that is piping versus method chaining.