ninepints.co
The Great Slate (published 2018-10-01, updated 2018-11-01) http://ninepints.co/2018/10/great-slate/
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>The last two years have been, to put it mildly, politically frustrating. Not a week has gone by without some terrible policy decision or profound ethical lapse (often several of both!) emanating from the current administration. I've had to dial back my news consumption for the sake of my time and my sanity.<br/><br/>With midterm elections approaching, we finally have a chance to do something about it. It goes without saying that you should vote, whatever your political affiliation. It's your duty as a member of this democracy to help us determine the future of the country. But if you live in a state where your favored candidates are overwhelmingly likely to win, as I do, casting your vote might not feel particularly satisfying.<br/><br/>The obvious solution is to work towards changing other people's votes. I recently discovered an organization called <a href="https://techsolidarity.org">Tech Solidarity</a>, and they're funding a slate of thirteen House candidates dubbed the Great Slate for the upcoming midterms. You can learn more about the project, meet the candidates, and donate by visiting <a href="https://techsolidarity.org/resources/great_slate.html">this page</a>. I also encourage you to read <a href="https://sfbay.techsolidarity.org/2017/09/notes_jess_king.htm">this 2017 meeting transcript</a> for a better look at what Tech Solidarity is about. I think they're doing good work that deserves my support, and I hope you feel the same way.<br/><br/>The end of the transcript mentions the importance of early investment. Campaign funding tends to pick up as election day approaches and the closer races become apparent, but it's most needed at the beginning of the election cycle. Intuitively, a strong base is built on early, persistent, ground-level voter engagement, not a deluge of last-minute ads.
I wish I'd thought about all of this last year, but I figure better late than never.</p><p>All of these candidates have pledged to accept no money from corporations or associated political action committees. They're running campaigns powered by individual donations, which means your dollar really counts.</p>
</div></div></div>
Reflow: Graph-Based Workflows with Checkpointing (published 2018-02-21, updated 2018-02-25) http://ninepints.co/2018/02/reflow-graph-based-workflows-with-checkpointing/
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>One of my work projects is publicly available! We call it Reflow. Written in Java, it's a library for composing individual units of work into a <a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">directed acyclic graph</a>. My team has started using it to drive a bunch of our data processing, and now you can try it out too. Source and documentation can be found <a href="https://github.com/tripadvisor/reflow">on GitHub</a>, and as of today, we're also publishing build artifacts to <a href="https://bintray.com/tripadvisor/reflow/reflow">JCenter</a>. </p><p>Once a dependency graph has been defined, Reflow enables you to run it end to end in a single method call, with multiple tasks executing in parallel when possible. And there's more:</p><ul><li>Each task can declare that it will produce some output (database tables, files on local disk, etc.), and those tasks can be skipped when the output is already present.<br/></li><li>Task definitions are extremely flexible—in fact, the only real requirement is that each task be representable with a Java object. If your tasks happen to implement the <code>Runnable</code> interface, it's easy to get them scheduled on an <code>Executor</code> of your choosing, but you can opt to handle scheduling yourself and even schedule tasks outside of the JVM.</li><li>If you do schedule tasks externally, the state of the overall workflow can be serialized even while tasks are running. This allows you to bring down one “coordinator” process and bring up another without missing a beat.</li></ul>
</div></div></div>
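<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>To make the execution model concrete, here's a stripped-down sketch of the general technique. To be clear, this is <i>not</i> Reflow's actual API (every name below is invented for illustration); it just shows the core idea of chaining each task behind its dependencies with <code>CompletableFuture</code> so that independent tasks run in parallel on a thread pool.</p>
</div></div></div>

```java
import java.util.*;
import java.util.concurrent.*;

// Illustration only: a minimal DAG runner in the spirit of Reflow,
// NOT using Reflow's actual API. Tasks are plain strings here.
public class DagDemo {
    // deps maps each task to the tasks it depends on.
    public static List<String> run(Map<String, List<String>> deps) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> completed = Collections.synchronizedList(new ArrayList<>());
        Map<String, CompletableFuture<Void>> futures = new HashMap<>();
        for (String task : deps.keySet()) {
            schedule(task, deps, futures, completed, pool);
        }
        CompletableFuture.allOf(futures.values().toArray(new CompletableFuture[0])).join();
        pool.shutdown();
        return completed;  // one valid topological completion order
    }

    private static CompletableFuture<Void> schedule(
            String task, Map<String, List<String>> deps,
            Map<String, CompletableFuture<Void>> futures,
            List<String> completed, ExecutorService pool) {
        CompletableFuture<Void> existing = futures.get(task);
        if (existing != null) {
            return existing;
        }
        // Schedule dependencies first; this task runs only after all of them.
        CompletableFuture<Void> future = CompletableFuture.allOf(
                deps.get(task).stream()
                        .map(dep -> schedule(dep, deps, futures, completed, pool))
                        .toArray(CompletableFuture[]::new))
                .thenRunAsync(() -> completed.add(task), pool);
        futures.put(task, future);
        return future;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("ingest", List.of());
        deps.put("aggregateByRegion", List.of("ingest"));  // these two can
        deps.put("aggregateByDate", List.of("ingest"));    // run in parallel
        deps.put("report", List.of("aggregateByRegion", "aggregateByDate"));
        System.out.println(run(deps));
    }
}
```

<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Each dependency edge becomes a future chained behind its upstream futures, so independent tasks (like the two aggregations above) run concurrently without any explicit thread management.</p>
</div></div></div>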
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>The library was inspired by my team's data handling needs. Every morning, we ingest data from the previous day and kick off an hours-long processing pipeline. It's important that we finish by the end of the day, and things don't always go off without a hitch. The input data is occasionally incorrect or contains unanticipated combinations of values, and such issues must be identified and resolved quickly if we want to meet our deadline.</p><p>One of the easiest ways to save time is to do several things at once. Several of the stages in our pipeline admit parallelism: for example, we need to aggregate a particular data set over multiple independent attributes, meaning each aggregation can begin as soon as the common input data is ready. With Reflow, all we have to do is define those dependencies, and the work is automatically performed in parallel.</p><p>Another easy win is avoiding work altogether. If a piece of input data turns out to be bad, only the pipeline stages downstream of that data should be affected; if a particular stage fails, there's no need to rerun everything upstream when it's been fixed. By resuming intelligently in the wake of a failure, we save both time and resources, and the experience is less stressful for everyone.</p><p>“Hang on,” you cry, “it's 2018! Surely someone has done this already?” I did spend some time searching for alternatives, and the closest thing I found was <a href="https://github.com/spotify/flo">this library from Spotify</a>. Java 8 also introduced a <a href="https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html"><code>CompletableFuture</code></a> class for chaining asynchronous computations.</p><p>In contrast to those options, Reflow is geared towards larger-scale tasks that communicate through shared external state. It makes no attempt to handle inter-task data transfer, and the in-memory graph data structure is fairly heavyweight.
If you're looking to chain together some ordinary quick-running Java methods, you'll have to handle all the input and output, and the scheduling overhead will be relatively high. But Reflow is especially well-suited to JVM-external tasks, which tend to run longer and work with JVM-external data: We often use it to coordinate jobs on a Hadoop cluster, for example.</p><p>Above all, I've tried to make a library that I enjoy working with, and I hope you like it too.</p>
</div></div></div>
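<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>The output-based skipping mentioned above can be pictured with a tiny sketch. Again, this is not Reflow's API (the names are invented for illustration); it's just the general pattern of checking for a task's declared output before running it:</p>
</div></div></div>

```java
import java.nio.file.*;

// Illustration only (not Reflow's API): skip a task whose declared
// output already exists on disk.
public class SkipDemo {
    public interface Task {
        Path output();                // the file this task promises to produce
        void run() throws Exception;
    }

    // Returns true if the task actually ran, false if it was skipped.
    public static boolean runIfNeeded(Task task) throws Exception {
        if (Files.exists(task.output())) {
            return false;  // output already present: nothing to do
        }
        task.run();
        return true;
    }

    public static void main(String[] args) throws Exception {
        Path out = Files.createTempDirectory("skipdemo").resolve("result.txt");
        Task task = new Task() {
            public Path output() { return out; }
            public void run() throws Exception { Files.writeString(out, "done"); }
        };
        System.out.println(runIfNeeded(task));  // true: ran and wrote the file
        System.out.println(runIfNeeded(task));  // false: skipped on the rerun
    }
}
```

<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>With checks like this on every task, rerunning the whole workflow after a failure naturally resumes from the first missing output instead of repeating finished work.</p>
</div></div></div>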
A Statistical View of Regression (published 2018-01-02, updated 2018-03-11) http://ninepints.co/2018/01/statistical-view-regression/
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Shifting gears! Time for a math post.</p><p>I've been working through various online machine learning courses over the last eighteen months, beginning with the <a href="https://www.coursera.org/learn/machine-learning/">Stanford/Coursera ML course</a> taught by Andrew Ng. It opens with three weeks on linear and logistic regression, covering the structure of the linear/logistic regression models, the specific loss functions involved in each, and how to minimize said functions via gradient descent.</p><p>The lectures are well executed, but I could have gone for more background on the loss functions, which are sort of handed down from above. Where did they come from? Why do they produce good regression coefficients? I think the answers to these questions are pretty neat—there's actually a straightforward statistical interpretation of what's happening.</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<h2>Problem Setting</h2><p>We're presented with a collection of data points \((\mathbf x_1, \mathbf x_2, \dotsc, \mathbf x_n), \mathbf x_i \in \mathbb R^m\) and the corresponding value of some dependent variable \(y_i\) for each point. We'd like to model how \(y\) depends on \(\mathbf x\) in order to predict its value for previously-unseen \(\mathbf x\).</p><p>This setting is general enough to capture a wide variety of real-world problems. As a holiday-inspired example, perhaps we want to predict the density of pumpkin pie custard based on the volume of milk used (other ingredient quantities held constant) and the brand (one of <i>Annie's Animals</i>, <i>Bob's Barnyard</i>, or <i>Chuck's Cowpasture</i>). Dairy brand is not a continuous variable, so we'll encode it as three indicator variables which assume the value one (for custards using a particular brand) or zero (for custards using the other two brands). So! Each vector \(\mathbf x_i \in \mathbb R^4\) will represent a single custard, with the first element \(x_i^1\) representing the volume of milk used in liters and the remaining three elements representing the milk brand. Each \(y_i\) will represent the density of the set custard in grams per liter. If we record these values every time we make custard, we'll soon have lots of data that we can use to predict the properties of future custards.</p><p>Data in hand, we're going to model the density of our custard by assuming that density is a linear function of our input variables. Our model will have the form</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8 x_scrollable">
<p>\[y = w^0 + x^1 w^1 + \dotsb + x^4 w^4 \]</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>where \(w^0\) is a bias variable representing the “default” density in the absence of any data. Our task, then, is to find a weight vector \(\mathbf w\) such that for each \(\mathbf x_i\), the vector product \(\mathbf x_i \mathbf w^T\) is close to \(y_i\). (You can think of each \(\mathbf x\) as including an element \(x^0 = 1\) for the sole purpose of being multiplied by the bias. Note that the superscripts here indicate element indices rather than exponents.)<br/></p><p>Just for fun, we're also going to model a discrete dependent variable: whether the surface of our custard cracks during baking. Let \(z_i \in \{0, 1\}\) indicate whether the i-th custard was cracked (\(1\)) or smooth (\(0\)). Predicting the value of \(z\) directly using a weighted sum as we did for density is a little bit awkward. But we can adapt that model by instead predicting the <a href="https://en.wikipedia.org/wiki/Logit">log-odds</a> of a crack. If \(h(\mathbf x, \mathbf w)\) is the hypothesized probability of cracking, our second model will look like</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8 x_scrollable">
<p>\[\log \frac {h(\mathbf x, \mathbf w)} {1-h(\mathbf x, \mathbf w)} = \mathbf x \mathbf w^T \]</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>or equivalently</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8 x_scrollable">
<p>\[h(\mathbf x, \mathbf w) = \frac 1 {1 + e^{- \mathbf x \mathbf w^T}} \]</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>The model should yield probabilities closer to one when \(z_i = 1\) and closer to zero otherwise. Equivalently, when \(z_i = 1\), we want \(\mathbf x_i \mathbf w^T > 0\).<br/></p><p>To differentiate between the weights for the density model and the weights for the crack-probability model, let's call them \(\mathbf w_y\) and \(\mathbf w_z\) respectively. To recap, we want to find \(\mathbf w_y\) and \(\mathbf w_z\) such that \(\mathbf x \mathbf w_y^T\) predicts density and \(\mathbf x \mathbf w_z^T\) predicts the log-odds of a crack. These are examples of linear regression and logistic regression models respectively.</p><h2>Selecting a Model</h2><p>In reality, our models won't be perfect. A density model that's both linear and accurate requires a linear relationship between density and milk volume, which obviously doesn't hold in real life. (Think about what happens as our milk volume approaches infinity!) The entire premise of our model is arguably flawed, but we're going to hope that the density-volume relationship is <i>locally</i> linear in the neighborhood of reasonable custard recipes, and try to discover the coefficients governing that local relationship.<br/></p><p>Given that we're not going to nail it, we need a way to choose among imperfect models.</p><p>We do this by defining and then optimizing a <b>loss function</b>, also known as a <b>cost function</b>, which takes in our data and a set of weights and tells us how well we've modeled the data. You can think of a loss function as measuring how many mistakes a model makes: bad mistakes are expensive and drive up loss, while better models that make smaller mistakes result in low loss. Our goal is to minimize the loss function with regard to the weights.</p><p>The linear regression loss function looks like this:</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8 x_scrollable">
<p>\[\mathcal L_y(X, \mathbf y, \mathbf w) = \sum_{i=1}^n (\mathbf x_i \mathbf w^T - y_i)^2 = \| X \mathbf w^T- \mathbf y \|_2^2 \]</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>…where \(X\) is the matrix created by stacking the row-vector data points \(\mathbf x_i\) on top of each other, \(\mathbf y\) is the column vector composed of individual density measurements \(y_i\), and \(\| \mathbf v \|_2^2\) denotes the squared <a href="https://en.wikipedia.org/wiki/L2_norm">L2 norm</a> of some vector \(\mathbf v\). Looking at the middle of the equation, it's clear that the loss is simply the sum of squared distances between the model's prediction and the observed value (over all prediction-observation pairs). On the right-hand side, this is expressed using a matrix product.<br/></p><p>The logistic regression loss function looks like this:</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8 x_scrollable">
<p>\[\begin{equation} \begin{split} \mathcal L_z(X, \mathbf z, \mathbf w) & = -\sum_{i=1}^n \bigl[ z_i \log h(\mathbf x_i, \mathbf w) + (1 - z_i) \log (1 - h(\mathbf x_i, \mathbf w)) \bigr] \\& = \bigl\| \mathbf z \log h(X, \mathbf w)^T + (1 - \mathbf z) \log (1 - h(X, \mathbf w))^T \bigr\|_1 \end{split} \end{equation} \]</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>where \(h(X, \mathbf w)\) is our crack probability function from before, applied to all data points at once to produce a vector of results. Note that \(z_i\) and \(1 - z_i\) are opposites in the sense that the former is equal to zero when the latter is equal to one and vice versa. That means that the logistic loss is just a matter of calculating how “off” the model's probability estimate was, taking the negative log of (one minus off-ness), and summing the resulting values over all data points. Large mistakes are penalized more heavily than small mistakes, as in linear regression.<br/></p><p>How can we find weights that minimize these loss functions? Conveniently, both functions are differentiable and convex with regard to the weights, which makes them excellent candidates for optimization algorithms like <a href="https://en.wikipedia.org/wiki/Gradient_descent">gradient descent</a>. In fact, we can be even more efficient in the case of linear regression, which admits a closed-form solution.</p><p>Today, though, we're not focused on minimizing loss. We want to know where those loss functions came from in the first place.</p><h2>Loss Function Derivation</h2><p>Setting the loss functions aside for a moment, let's think about our models from a statistical perspective.</p><p>One nice feature of our logistic regression model is that it gives us probabilities rather than an inflexible yes/no answer. It might give one custard a 5% chance of developing cracks, while another might be a toss-up. In comparison, our linear regression model is quite rigid in its density estimations. We can make it more flexible by assuming that observed densities include some normally-distributed error. We'll say that each \(y_i\) includes an error term \(\varepsilon\), independently drawn from a normal distribution with zero mean and unknown variance \(\sigma^2\) (which is common to all data points). Symbolically:</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8 x_scrollable">
<p>\[ y_i \sim \mathcal N(\mathbf x_i \mathbf w_y^T, \sigma^2) \]</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>We now have a way to express the probability of individual observations \(y_i\) and \(z_i\) under weights \(\mathbf w_y\) and \(\mathbf w_z\). Multiplying those together, we can calculate \(P(\mathbf y | \mathbf w_y)\) and \(P(\mathbf z | \mathbf w_z)\), the probabilities of the entire series of observations conditioned on the weights.<br/></p><p>What happens if we try to maximize those probabilities?</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8 x_scrollable">
<p>\[ \begin{equation} \begin{split} \mathbf w_y^* & = \operatorname*{\arg\!\max}_{\mathbf w_y} P(\mathbf y | \mathbf w_y) \\& = \operatorname*{\arg\!\max}_{\mathbf w_y} \prod_{i=1}^n P(y_i | \mathbf w_y) \\& = \operatorname*{\arg\!\max}_{\mathbf w_y} \prod_{i=1}^n \Biggl[ \frac 1 {\sqrt{2\pi\sigma^2}} \,\exp\Biggl(\frac {-(\mathbf x_i \mathbf w_y^T - y_i)^2} {2\sigma^2}\Biggr) \Biggr] \end{split} \end{equation} \]</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Because probabilities are positive and the logarithm is monotonically increasing, we can take the log of the entire right-hand side of the equation. Maximizing the log of a probability is equivalent to maximizing the probability itself, and the resulting equation is easier to work with.</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8 x_scrollable">
<p>\[ \begin{equation} \begin{split} \mathbf w_y^* & = \operatorname*{\arg\!\max}_{\mathbf w_y} \sum_{i=1}^n \Biggl[ - \log \sqrt{2\pi\sigma^2} - \frac {(\mathbf x_i \mathbf w_y^T - y_i)^2} {2\sigma^2} \Biggr] \\& = \operatorname*{\arg\!\min}_{\mathbf w_y} \sum_{i=1}^n \Biggl[ \log \sqrt{2\pi\sigma^2} + \frac {(\mathbf x_i \mathbf w_y^T - y_i)^2} {2\sigma^2} \Biggr] \\& = \operatorname*{\arg\!\min}_{\mathbf w_y} \sum_{i=1}^n (\mathbf x_i \mathbf w_y^T - y_i)^2 \end{split} \end{equation} \]</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Hey, this is the same as minimizing our loss function! Let's see if the same thing happens with the second model.</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8 x_scrollable">
<p>\[ \begin{equation} \begin{split} \mathbf w_z^* & = \operatorname*{\arg\!\max}_{\mathbf w_z} P(\mathbf z | \mathbf w_z) \\& = \operatorname*{\arg\!\max}_{\mathbf w_z} \prod_{i=1}^n \bigl[ z_i h(\mathbf x_i, \mathbf w_z) + (1 - z_i)(1 - h(\mathbf x_i, \mathbf w_z)) \bigr] \\& = \operatorname*{\arg\!\max}_{\mathbf w_z} \sum_{i=1}^n \log \bigl[ z_i h(\mathbf x_i, \mathbf w_z) + (1 - z_i)(1 - h(\mathbf x_i, \mathbf w_z)) \bigr] \\& = \operatorname*{\arg\!\max}_{\mathbf w_z} \sum_{i=1}^n \bigl[ z_i \log h(\mathbf x_i, \mathbf w_z) + (1 - z_i) \log(1 - h(\mathbf x_i, \mathbf w_z)) \bigr] \\& = \operatorname*{\arg\!\min}_{\mathbf w_z} - \sum_{i=1}^n \bigl[ z_i \log h(\mathbf x_i, \mathbf w_z) + (1 - z_i) \log(1 - h(\mathbf x_i, \mathbf w_z)) \bigr] \end{split} \end{equation} \]</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Nice. It's now clear that our loss functions have a firm basis in statistics: when we minimize the loss functions, we're actually choosing weights that maximize the probability of our observed results. This is known as the <b>maximum-likelihood estimate (MLE)</b> of the weight values.<br/></p><h2>Regularized Regression</h2><p>In scenarios with a large number of features (that is, high-dimensional data points \(\mathbf x\)) and therefore a large number of parameters, but relatively few data points, a common problem is <b>overfitting</b>. This means that the model picks up on details in the training data that are not representative of the distribution from which the data was generated. The model will be quite accurate given points from the training data, but may fail to generalize to previously-unseen inputs.</p><p>As an extreme example, consider what happens when there are more model parameters than data points. If we use only three custards to train our density-estimation model, each made with a quarter-liter of milk from one of the three brands, our input data will look like this:</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8 x_scrollable">
<p>\[ X = \begin{bmatrix} 0.25 & 1 & 0 & 0 \\ 0.25 & 0 & 1 & 0 \\ 0.25 & 0 & 0 & 1 \end{bmatrix} \]</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>…and we can perfectly match that data by setting the brand weights to the observed densities, i.e. \(\mathbf w_y^* = (0, y_1, y_2, y_3)\). This model is terrible—it says that custard density is completely independent of milk volume—but we wouldn't know it based on the stellar training-data performance.<br/></p><p>Even with more data points than parameters, it's possible for regression models to fit too closely to the training data. The problem in a nutshell is that lots of parameters can provide too much modeling flexibility. One way to counteract this is with <b>regularization</b>, which penalizes complicated models (those with heavier weights) in favor of simpler ones, all else being equal. We can express a preference for simpler models by adding a penalty term to our loss functions. Many penalty terms are possible; ours will be based on the L2 norm of the weight vector, a strategy that's variously known as L2 regularization, Tikhonov regularization, or ridge regression.</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8 x_scrollable">
<p>\[ \begin{align} \mathcal L_{y\_reg}(X, \mathbf y, \mathbf w, \lambda) & = \mathcal L_y(X, \mathbf y, \mathbf w) + \lambda \| \mathbf w \|_2^2 \\ \mathcal L_{z\_reg}(X, \mathbf z, \mathbf w, \lambda) & = \mathcal L_z(X, \mathbf z, \mathbf w) + \lambda \| \mathbf w \|_2^2 \end{align} \]</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>By penalizing larger weights based on the norm of the weight vector, we require the model to make a tradeoff between simplicity and training accuracy. The nonnegative hyperparameter \(\lambda\) dictates the “exchange rate” between the two, with larger values favoring simplicity and smaller values favoring accuracy. As \(\lambda\) approaches infinity, we get a very simple model with all weights driven to zero, while \(\lambda = 0\) returns us to unregularized regression.<br/></p><p>But where does regularization fit in our statistical interpretation of regression? Glad you asked. </p><p>Previously, we chose weights to maximize \(P(\mathbf y | \mathbf w_y)\) and \(P(\mathbf z | \mathbf w_z)\), the observation probabilities conditioned on weights. As an alternative, we can try to maximize \(P(\mathbf w_y | \mathbf y)\) and \(P(\mathbf w_z | \mathbf z)\): Out of all the weights that could have produced the data we observed, we want the most probable weights. We can relate these new probabilities to the previous ones using <a href="https://en.wikipedia.org/wiki/Bayes%27_theorem">Bayes' law</a>.</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8 x_scrollable">
<p>\[\begin{align} \mathbf w_y^{**} = \operatorname*{\arg\!\max}_{\mathbf w_y} P(\mathbf w_y | \mathbf y) = \operatorname*{\arg\!\max}_{\mathbf w_y} \frac {P(\mathbf y | \mathbf w_y)P(\mathbf w_y)} {P(\mathbf y)} \end{align}\]</p>
</div></div></div>
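<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>To preview where this is headed, suppose the prior is proportional to \(e^{- \lambda \| \mathbf w_y \|_2^2}\) (a choice we'll pin down in a moment). Dropping the denominator \(P(\mathbf y)\), which doesn't depend on the weights, and taking logs as before:</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8 x_scrollable">
<p>\[\begin{equation} \begin{split} \mathbf w_y^{**} & = \operatorname*{\arg\!\max}_{\mathbf w_y} \bigl[ \log P(\mathbf y | \mathbf w_y) + \log P(\mathbf w_y) \bigr] \\& = \operatorname*{\arg\!\min}_{\mathbf w_y} \Biggl[ \sum_{i=1}^n \frac {(\mathbf x_i \mathbf w_y^T - y_i)^2} {2\sigma^2} + \lambda \| \mathbf w_y \|_2^2 \Biggr] \\& = \operatorname*{\arg\!\min}_{\mathbf w_y} \Biggl[ \sum_{i=1}^n (\mathbf x_i \mathbf w_y^T - y_i)^2 + 2 \sigma^2 \lambda \| \mathbf w_y \|_2^2 \Biggr] \end{split} \end{equation} \]</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Relabeling the constant \(2 \sigma^2 \lambda\) as \(\lambda\), this is exactly the regularized loss \(\mathcal L_{y\_reg}\) from before.</p>
</div></div></div>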
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>The observation probability \(P(\mathbf y)\) is independent of the weights and therefore plays no role in optimization. Note also that when \(P(\mathbf w_y)\) is constant—when we assume all weights are equally likely—we have \(\mathbf w_y^{**} = \mathbf w_y^* \) and we're back to unregularized regression again.</p><p>In contrast, since regularization prefers smaller weights, it can be viewed as an assumption about the prior probability distribution of the weights \(P(\mathbf w_y)\) where smaller weights are more probable. The most probable weight vector taking prior probability into account is known as the <b>maximum a posteriori (MAP) estimate</b>.</p><p>A regularization term of \(\lambda \| \mathbf w \|_2^2\) translates to a prior proportional to \(e^{- \lambda \| \mathbf w \|_2^2}\). This is just a multivariate normal distribution with zero mean and covariance \((2\lambda)^{-1} I\), which makes sense—as we increase \(\lambda\), we're narrowing the prior weight distribution that's centered around zero.<br/></p><p>And with that, we're done! We've got a compelling statistical justification for both linear and logistic regression, with or without regularization. None of this background is strictly necessary for performing regression analysis, but it's nice to have some extra intuition for what's going on.</p>
</div></div></div>
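<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>As a closing sanity check, the unregularized linear case is easy to verify numerically. The sketch below (plain Java, no libraries; the data points are invented for illustration) minimizes \(\mathcal L_y\) by gradient descent and recovers the weights that generated the data:</p>
</div></div></div>

```java
import java.util.Arrays;

// Numerical sanity check on the linear-regression loss: minimize
// sum_i (x_i . w - y_i)^2 by gradient descent. Data is made up.
public class GradientDescentDemo {
    // The gradient of the loss with respect to w_j is
    // sum_i 2 * (x_i . w - y_i) * x_ij.
    public static double[] fit(double[][] X, double[] y, double lr, int steps) {
        int m = X[0].length;
        double[] w = new double[m];
        for (int step = 0; step < steps; step++) {
            double[] grad = new double[m];
            for (int i = 0; i < X.length; i++) {
                double err = dot(X[i], w) - y[i];
                for (int j = 0; j < m; j++) {
                    grad[j] += 2 * err * X[i][j];
                }
            }
            for (int j = 0; j < m; j++) {
                w[j] -= lr * grad[j];
            }
        }
        return w;
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) {
            s += a[j] * b[j];
        }
        return s;
    }

    public static void main(String[] args) {
        // Observations generated by y = 3 + 2x, with x^0 = 1 as the bias
        // column (the same trick described earlier in the post).
        double[][] X = {{1, 0}, {1, 1}, {1, 2}, {1, 3}};
        double[] y = {3, 5, 7, 9};
        System.out.println(Arrays.toString(fit(X, y, 0.01, 5000)));
    }
}
```

<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>With a learning rate small enough for stability, the weights converge to the interpolating solution \((3, 2)\), which here is also the maximum-likelihood estimate.</p>
</div></div></div>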
Functional Programming with Haskell (published 2017-09-10, updated 2018-10-01) http://ninepints.co/2017/09/functional-programming-haskell/
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>COMP 211, the first computer science course I ever signed up for, was taught in the Racket programming language. If you haven't heard of it, you're in the majority. You may have heard of functional programming or Lisp; Racket is one of the more obscure members of the Lisp family of functional languages.</p><p>As a college freshman, I had certainly never heard of Racket. All of my programming experience (limited as it was) involved imperative languages, and the professor had his hands full teaching us the basics. At the end of the semester I moved on to 300-level courses taught in Java. The department retired COMP 211 and my knowledge of Racket began to fade.</p><p>From time to time, though, I would come across a discussion of functional programming on the internet. The postings were full of mysterious words like “functor” and “monad”, and I couldn't help but be intrigued. What are monads? They sound pretty cool. I want to be in the monad club.</p><p>So functional programming has been in the back of my mind for a while. This spring, I finally sat down to learn it, settling on the Haskell language. I was prepared to slog through lots of esotericism, but I found Haskell quite accessible. In fact, I recommend it: It's given me a fresh perspective on the day-to-day problems we tackle at work, and though I'm hardly an expert, I'm certain it's made me a better developer.</p><p>Here are highlights of my experience. It turns out that monads have been thoroughly covered by other people—we've got <a href="https://blog.plover.com/prog/burritos.html">blog posts</a> about <a href="https://byorgey.wordpress.com/2009/01/12/abstraction-intuition-and-the-monad-tutorial-fallacy/">blog posts</a> about blog posts about monads—but I've put together a walkthrough of two other Haskell features that are powerful and easy to understand.</p>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<h2>Why Haskell?</h2><p>Actually, I'm going to start with a discussion of Java, which is more widely known and gives us some useful context. This is going to be old hat for lots of you, but bear with me!</p><p>The Java language is rooted in imperative, object-oriented programming. Code is expressed as a sequence of commands, and the commands are bundled into <b>methods</b> that can be individually invoked. Pieces of program state are bundled into <b>objects</b>. Methods are attached to objects, accept objects as input, and return objects as output, and in this way the objects can interact with each other.</p><p>As a quick illustration, let's tackle an example problem. Given a list of strings, each representing an integer, we'd like to parse the integers out of the strings and put them in another list, but only if they're greater than zero. An imperative solution might look like this:</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-java"><span class="n">List</span><span class="o"><</span><span class="n">Integer</span><span class="o">></span> <span class="nf">parsePositiveInts</span><span class="p">(</span><span class="n">List</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">strings</span><span class="p">)</span><br/><span class="p">{</span><br/> <span class="n">List</span><span class="o"><</span><span class="n">Integer</span><span class="o">></span> <span class="n">ints</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o"><></span><span class="p">();</span><br/> <span class="n">Iterator</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">iter</span> <span class="o">=</span> <span class="n">strings</span><span class="p">.</span><span class="na">iterator</span><span class="p">();</span><br/><br/> <span class="k">while</span> <span class="p">(</span><span class="n">iter</span><span class="p">.</span><span class="na">hasNext</span><span class="p">())</span><br/> <span class="p">{</span><br/> <span class="n">String</span> <span class="n">string</span> <span class="o">=</span> <span class="n">iter</span><span class="p">.</span><span class="na">next</span><span class="p">();</span><br/> <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">Integer</span><span class="p">.</span><span class="na">parseInt</span><span class="p">(</span><span class="n">string</span><span class="p">);</span><br/><br/> <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span><br/> <span class="p">{</span><br/> <span class="n">ints</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="n">i</span><span class="p">);</span><br/> <span class="p">}</span><br/> <span 
class="p">}</span><br/><br/> <span class="k">return</span> <span class="n">ints</span><span class="p">;</span><br/><span class="p">}</span><br/></code></pre>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>We've written a single method that accepts a <code>List</code> object containing strings and returns another <code>List</code> containing integers. We first call the <code>iterator()</code> method of the input list, retrieving an <code>Iterator</code> object used to iterate over the list's contents. With repeated calls to the iterator's <code>next()</code> method, we retrieve individual strings, converting each one to an integer. The integers are conditionally added to a new <code>ArrayList</code> using its <code>add()</code> method.</p><p>It's a trivial example, but similar logic appears time after time in real applications. We often find ourselves applying a transformation to each element in a collection or filtering the contents of a collection based on various criteria. In the example above, our transformation and filter predicate are embedded in the boilerplate iteration code. What if we could write the first two without the boilerplate?</p><p>Functional programming is a programming paradigm that attempts to solve this problem. It's built around <b>higher-order functions</b>: functions that accept or return other functions. The poster children, available across many languages, are <code>map</code> and <code>filter</code>. <code>map</code> applies a transformation to a collection or stream of elements, while <code>filter</code> discards the elements that fail a predicate. That's exactly what we wanted to do in the example above.</p><p>With the 2014 release of Java 8, Oracle added some functionally inspired features to the language. They formalized the concept of a “functional interface”, which is any ordinary Java interface that has a single abstract method. A functional interface represents an implementation of its method, and nothing more or less. 
Several new functional interfaces were introduced in the <code>java.util.function</code> package.</p><p>Among those interfaces is <a href="https://docs.oracle.com/javase/8/docs/api/java/util/function/ToIntFunction.html"><code>ToIntFunction</code></a>, which represents a transformation from an object to an integer primitive. Our integer-parsing transformation could be implemented like so:</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-java"><span class="n">ToIntFunction</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">parseInt</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ToIntFunction</span><span class="o"><</span><span class="n">String</span><span class="o">></span><span class="p">()</span><br/><span class="p">{</span><br/> <span class="nd">@Override</span><br/> <span class="kd">public</span> <span class="kt">int</span> <span class="nf">applyAsInt</span><span class="p">(</span><span class="n">String</span> <span class="n">s</span><span class="p">)</span><br/> <span class="p">{</span><br/> <span class="k">return</span> <span class="n">Integer</span><span class="p">.</span><span class="na">parseInt</span><span class="p">(</span><span class="n">s</span><span class="p">);</span><br/> <span class="p">}</span><br/><span class="p">};</span><br/></code></pre>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Our integer predicate could be implemented as an <a href="https://docs.oracle.com/javase/8/docs/api/java/util/function/IntPredicate.html"><code>IntPredicate</code></a>:</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-java"><span class="n">IntPredicate</span> <span class="n">isPositive</span> <span class="o">=</span> <span class="k">new</span> <span class="n">IntPredicate</span><span class="p">()</span><br/><span class="p">{</span><br/> <span class="nd">@Override</span><br/> <span class="kd">public</span> <span class="kt">boolean</span> <span class="nf">test</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span><span class="p">)</span><br/> <span class="p">{</span><br/> <span class="k">return</span> <span class="n">i</span> <span class="o">></span> <span class="mi">0</span><span class="p">;</span><br/> <span class="p">}</span><br/><span class="p">};</span><br/></code></pre>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>All of this had been possible before—indeed, some libraries had included functional interfaces, described as such, for years—but the version 8 release was a categorical endorsement of methods as first-class entities. To this end, it introduced concise syntax for declaring anonymous functional interface implementations:</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-java"><span class="n">ToIntFunction</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">parseInt</span> <span class="o">=</span> <span class="n">Integer</span><span class="p">::</span><span class="n">parseInt</span><span class="p">;</span><br/><span class="n">IntPredicate</span> <span class="n">isPositive</span> <span class="o">=</span> <span class="n">i</span> <span class="o">-></span> <span class="n">i</span> <span class="o">></span> <span class="mi">0</span><span class="p">;</span><br/></code></pre>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>The new syntax was complemented by a slew of standard library additions written in the functional style. The <a href="https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html"><code>Stream</code></a> interface available in version 8 allows streams of objects to be transformed, filtered, and generally manipulated every which way. When we replace the boilerplate in our example code with a stream, it becomes shorter, clearer, and more obviously correct.</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-java"><span class="n">List</span><span class="o"><</span><span class="n">Integer</span><span class="o">></span> <span class="nf">parsePositiveInts</span><span class="p">(</span><span class="n">List</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">strings</span><span class="p">)</span><br/><span class="p">{</span><br/> <span class="k">return</span> <span class="n">strings</span><span class="p">.</span><span class="na">stream</span><span class="p">()</span><br/> <span class="p">.</span><span class="na">mapToInt</span><span class="p">(</span><span class="n">Integer</span><span class="p">::</span><span class="n">parseInt</span><span class="p">)</span><br/> <span class="p">.</span><span class="na">filter</span><span class="p">(</span><span class="n">i</span> <span class="o">-></span> <span class="n">i</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span><br/> <span class="p">.</span><span class="na">boxed</span><span class="p">().</span><span class="na">collect</span><span class="p">(</span><span class="n">toList</span><span class="p">());</span><br/><span class="p">}</span><br/></code></pre>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>This is much nicer than the imperative approach, but there are still some pitfalls to be aware of. The <code>Stream</code> documentation recommends that user-provided functions be “non-interfering and stateless”: they should not modify the stream's source, and their results should not depend on any state that might change while the stream runs. Violating this assumption of statelessness can lead to confusing or nondeterministic behavior.</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-java"><span class="kd">class</span> <span class="nc">StatefulPredicate</span> <span class="kd">implements</span> <span class="n">IntPredicate</span><br/><span class="p">{</span><br/> <span class="kd">private</span> <span class="kt">int</span> <span class="n">maxObservedInt</span><span class="p">;</span><br/> <span class="kd">private</span> <span class="kt">boolean</span> <span class="n">observedAny</span><span class="p">;</span><br/><br/> <span class="nd">@Override</span><br/> <span class="kd">public</span> <span class="kt">boolean</span> <span class="nf">test</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span><span class="p">)</span><br/> <span class="p">{</span><br/> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">observedAny</span> <span class="o">||</span> <span class="n">i</span> <span class="o">></span> <span class="n">maxObservedInt</span><span class="p">)</span><br/> <span class="p">{</span><br/> <span class="n">maxObservedInt</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span><br/> <span class="n">observedAny</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span><br/> <span class="k">return</span> <span class="kc">true</span><span class="p">;</span><br/> <span class="p">}</span><br/> <span class="k">return</span> <span class="kc">false</span><span class="p">;</span><br/> <span class="p">}</span><br/><span class="p">}</span><br/><br/><span class="n">List</span><span class="o"><</span><span class="n">Integer</span><span class="o">></span> <span class="nf">parseAscendingInts</span><span class="p">(</span><span class="n">List</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">strings</span><span class="p">)</span><br/><span class="p">{</span><br/> <span class="k">return</span> <span class="n">strings</span><span class="p">.</span><span 
class="na">stream</span><span class="p">()</span><br/> <span class="p">.</span><span class="na">mapToInt</span><span class="p">(</span><span class="n">Integer</span><span class="p">::</span><span class="n">parseInt</span><span class="p">)</span><br/> <span class="p">.</span><span class="na">filter</span><span class="p">(</span><span class="k">new</span> <span class="n">StatefulPredicate</span><span class="p">())</span><br/> <span class="p">.</span><span class="na">boxed</span><span class="p">().</span><span class="na">collect</span><span class="p">(</span><span class="n">toList</span><span class="p">());</span><br/><span class="p">}</span><br/></code></pre>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Above is an example of a stateful predicate that only accepts integers greater than those it has already seen. In the method <code>parseAscendingInts</code>, we've replaced our old predicate with the new, stateful one. Passing in the list <code>["1", "2"]</code> will give us <code>[1, 2]</code> as before. However, passing <code>["2", "1"]</code> will yield <code>[2]</code> because the first integer is greater than the second. The output of this code depends on the order in which integers are passed to the predicate, and that's often problematic. If the input type is changed from <code>List</code> to <code>Set</code>, the input iteration order may not be well-defined. If we call <code>strings.parallelStream()</code> instead of <code>strings.stream()</code>, the result becomes nondeterministic.</p><p>We've run into trouble because it's tempting to think of a <code>Predicate</code> as a static set of criteria that a subject might satisfy (or not). But in reality, we have nothing more than a method that accepts an object and returns a boolean, and it's free to return whichever boolean it likes. Two consecutive invocations of a predicate on a particular subject can produce two conflicting answers, and it's not generally possible to catch the problem in advance.</p><p>Clearly, it's easier to reason about higher-order functions like <code>filter</code> when we don't have to worry about state, but Java has no way of requiring well-behaved functional interface implementations.<br/></p><h2>Haskell Can Fix That</h2><p>Haskell tackles the problem head-on by declaring that all functions are stateless, full stop. The result of a Haskell function must depend only on the input. Because it restricts our ability to mutate state, the compiler finds itself with an unusual degree of freedom: Stateless computations can be delayed or reordered with no effect on the correctness of the overall program. In fact, Haskell will avoid doing any computation until it absolutely has to (e.g. 
when a result must be printed to standard output). This strategy is known as lazy evaluation.</p><p>Sounds promising! So what does it look like? Let's take a crash course in Haskell, starting with a variable definition:</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-haskell"><span class="nf">foo</span> <span class="ow">::</span> <span class="kt">Int</span><br/><span class="nf">foo</span> <span class="ow">=</span> <span class="mi">2</span><br/></code></pre>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Here's the equivalent in Java:</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-java"><span class="kt">int</span> <span class="n">foo</span><span class="p">;</span><br/><span class="n">foo</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span><br/></code></pre>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Just as we can combine declaration and assignment in Java (<code>int foo = 2</code>), we can combine them in Haskell (<code>foo = 2 :: Int</code>), but the Haskell declaration is usually kept separate by convention.</p><p>On the Java side of things, we could follow our assignment with a second assignment <code>foo = 3</code> that overwrites the initial value. When we try the same thing in Haskell, the compiler puts its foot down, giving us the error “Multiple declarations of ‘foo’”. Changing the value of <code>foo</code> would constitute a change in our program's state, and we've agreed not to do that. There's an obvious sticking point here: If we can't mutate values, how do we accomplish anything? Going down that path is interesting, but complicated enough to merit a post of its own. Besides, it turns out we can do quite a lot without state.</p><p>Haskell is statically typed, but includes powerful type inference, so not every variable requires a type declaration. We can define a second variable <code>bar = foo + 1</code> and the compiler will know that it has type <code>Int</code>, the same as <code>foo</code>.</p><p>Lots of types we've seen in Java have analogues in Haskell:</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-haskell"><span class="nf">myBoolean</span> <span class="ow">::</span> <span class="kt">Bool</span><br/><span class="nf">myBoolean</span> <span class="ow">=</span> <span class="kt">True</span><br/><br/><span class="nf">myChar</span> <span class="ow">::</span> <span class="kt">Char</span><br/><span class="nf">myChar</span> <span class="ow">=</span> <span class="sc">'a'</span><br/><br/><span class="nf">myString</span> <span class="ow">::</span> <span class="kt">String</span><br/><span class="nf">myString</span> <span class="ow">=</span> <span class="s">"hello"</span><br/></code></pre>
</div></div></div>
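The type inference described above is easy to check with a short, self-contained program. In the sketch below, only `foo` carries an explicit declaration; the compiler infers the rest (`greeting` and `main` are our own additions for illustration, not part of the original example):

```haskell
foo :: Int
foo = 2

-- No declaration needed: bar is inferred to be Int, since adding 1 to an Int yields an Int.
bar = foo + 1

-- Inference works for other types too: greeting is inferred to be String.
greeting = myString ++ "!"
  where myString = "hello"

main :: IO ()
main = do
  print bar         -- prints 3
  putStrLn greeting -- prints "hello!"
```

Deleting the `foo :: Int` line works as well; the compiler will then default `foo` (and `bar`) to a numeric type on its own.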
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>There are also some types that we haven't seen before.</p><h3>Lists</h3><p>A list is a homogeneous ordered collection of zero or more items. In contrast to Java, all of the items must be of the same type. For example, you can't have a list containing both a <code>String</code> and an <code>Int</code>. The type of a list of <code>Int</code> is written <code>[Int]</code>.</p><p>Lists are defined recursively. A list is either the empty list, denoted <code>[]</code>, or an item <code>x</code> prepended to another list <code>xs</code> using a colon: <code>x:xs</code>. Note that this implies a singly-linked list implementation.</p><p>When defining multi-element lists, we can use some syntactic sugar, separating the elements with commas and wrapping the whole thing in square brackets. <code>[1,2,3,4]</code> is equivalent to <code>1:2:3:4:[]</code>. Strings are actually another example of syntactic sugar—behind the scenes, they're just lists of characters. The <code>String</code> type is an alias for <code>[Char]</code> and the string <code>"hey"</code> is equivalent to <code>'h':'e':'y':[]</code>.</p><h3>Tuples</h3><p>A tuple is a heterogeneous ordered collection of a <i>fixed</i> number of items. Whereas lists can contain any number of items of a single type, tuples contain a specific number of items with specific but potentially different types. For example, the type <code>(Int, String)</code> describes a 2-tuple, also referred to as a pair. The first element of the pair will be an integer and the second a string. We can create a tuple by separating elements with commas and wrapping the whole thing in parentheses, as in <code>(4, "oy")</code>.</p><p>The 0-tuple is called the unit, and both the type and value are written <code>()</code>.</p><h3>Functions</h3><p>Functions are essential in Haskell (as in any functional language). 
Since a Haskell function can't affect or be affected by program state, it must return the same output given the same inputs, making it closer to a function in the mathematical sense of the word. Similarly, the evaluation of a Haskell function can't cause any observable side effects such as performing I/O. Functions with these properties are known as <b>pure functions</b>.</p><p>To get a feeling for how functions work in Haskell, let's look at an example. The <code>head</code> function accepts a list and returns the first element of that list. (Taking the head of an empty list will cause your program to explode at run time, so it's important to only pass lists that are known to be nonempty.) <code>head</code> has type <code>[a] -> a</code>. The <code>a</code> here is a <b>type variable</b>, which is sort of a placeholder that gets replaced with a concrete type when the function is used in context. The <code>-></code> indicates a function which accepts the type on the left and returns the type on the right. So the overall type <code>[a] -> a</code> tells us that <code>head</code> accepts a list of any type and returns a single element of that type, as expected.</p><p>Functions are applied using prefix notation. We write the function name followed by a space and then the parameters:</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-haskell"><span class="nf">head</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">]</span><br/></code></pre>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>The above expression evaluates to <code>1</code>.</p><p>Function application is left-associative. If we have a list of lists of numbers and we want to retrieve the first number from the first inner list, writing <code>head head [[1]]</code> won't work. We're applying the function <code>head</code> to itself, which doesn't make any sense! We have to use parentheses to apply <code>head</code> to the inner list: <code>head (head [[1]])</code>.</p><p>A few functions are special in that they use infix notation by default. We've actually seen an example already in the <code>:</code> function that's used to build lists. Given an element and a list of elements of the same type, <code>:</code> prepends the first element to the list, so it has type <code>a -> [a] -> [a]</code>. Other examples of infix functions include the familiar arithmetic functions <code>+</code>, <code>-</code>, <code>*</code>, and <code>/</code>; also the equality and comparison functions <code>==</code>, <code>/=</code>, <code><=</code>, <code>>=</code>, <code><</code>, and <code>></code>.</p><p>The arithmetic functions have the precedence you would expect, so <code>1 + 2 * 3</code> yields <code>7</code> and not <code>9</code>. Prefix function application has a higher precedence than any infix application, meaning <code>head [1] + head [2]</code> is equivalent to <code>(head [1]) + (head [2])</code>.</p><p>Let's look at some more examples of functions that operate on lists and tuples.</p><ul><li><code>tail</code> is the counterpart to <code>head</code>, accepting a list and returning everything except the first element. Empty lists will cause a run time failure here too. It has type <code>[a] -> [a]</code>.</li><li><code>fst</code> and <code>snd</code> accept a 2-tuple and return the first and second tuple elements respectively. 
<code>fst</code> has type <code>(a, b) -> a</code> and <code>snd</code> type <code>(a, b) -> b</code>.</li><li><code>zip</code> takes two lists and combines their elements pairwise into tuples, stopping when it hits the end of the shorter list. For example, <code>zip [1,2,3] "abc"</code> will give us <code>[(1, 'a'), (2, 'b'), (3, 'c')]</code>. The type of <code>zip</code> is <code>[a] -> [b] -> [(a, b)]</code>.</li></ul><p>Finally, we can write functions of our own. The syntax is similar to function application. Check out this groundbreaking code:</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-haskell"><span class="nf">increment</span> <span class="ow">::</span> <span class="kt">Int</span> <span class="ow">-></span> <span class="kt">Int</span><br/><span class="nf">increment</span> <span class="n">x</span> <span class="ow">=</span> <span class="n">x</span> <span class="o">+</span> <span class="mi">1</span><br/></code></pre>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>We can use our new function to add one to any integer. <code>increment 2</code> will evaluate to <code>3</code>.</p><p>With the preliminaries covered, let's move on to the cool stuff.</p><h2>Standout Feature #1: Pattern Matching</h2><p>In many languages, defining a function parameter involves a type and a name and nothing else. The parameter can assume any of the possible values of that type, and it's up to the function implementation to figure out what sort of input it's dealing with.</p><p>In Haskell, we provide a type and also one or more <b>patterns</b> that the parameter value might match. Suppose our parameter type is <code>[Int]</code>. The simplest pattern is just a variable name: <code>foo</code> used as a pattern will match any input list, assigning it the name <code>foo</code>. But instead of a variable name, we can also write <code>[]</code>, a pattern that matches only the empty list. We can even write <code>[a,b,c]</code>, which will match any list containing three elements and assign the names <code>a</code>, <code>b</code>, and <code>c</code> to those elements. When a pattern matches, it's used to deconstruct the input value and assign variable names to its components.</p><p>This is best illustrated with another example. Here's a function that accepts a list and returns a string describing how long the list was.</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-haskell"><span class="nf">describeLength</span> <span class="ow">::</span> <span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="ow">-></span> <span class="kt">String</span><br/><span class="nf">describeLength</span> <span class="kt">[]</span> <span class="ow">=</span> <span class="s">"An empty list!"</span><br/><span class="nf">describeLength</span> <span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="ow">=</span> <span class="s">"One element."</span><br/><span class="nf">describeLength</span> <span class="p">[</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">]</span> <span class="ow">=</span> <span class="s">"Two elements."</span><br/><span class="nf">describeLength</span> <span class="n">xs</span> <span class="ow">=</span> <span class="s">"Three or more elements."</span><br/></code></pre>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Without pattern matching, this would involve a bunch of if-statements, but the patterns let us clean things up considerably. As you can see, we get to provide a different function implementation for each set of patterns. Patterns are matched from top to bottom, so the most general pattern should come last. If we moved our last line before the empty-list case, our function would always return <code>"Three or more elements."</code></p><p>What happens if none of the patterns match? We get a run time failure. It's bad practice to write functions that are undefined for some inputs, so the last pattern in a function declaration is often a catch-all.</p><p>In our example, we don't really care what the list elements are, only that they exist. We can make this explicit by using an underscore in our pattern, which does the same thing as a variable name pattern-wise but doesn't bind the matched component to a name.</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-haskell"><span class="nf">describeLength</span> <span class="kt">[]</span> <span class="ow">=</span> <span class="s">"An empty list!"</span><br/><span class="nf">describeLength</span> <span class="p">[</span><span class="kr">_</span><span class="p">]</span> <span class="ow">=</span> <span class="s">"One element."</span><br/><span class="nf">describeLength</span> <span class="p">[</span><span class="kr">_</span><span class="p">,</span> <span class="kr">_</span><span class="p">]</span> <span class="ow">=</span> <span class="s">"Two elements."</span><br/><span class="nf">describeLength</span> <span class="kr">_</span> <span class="ow">=</span> <span class="s">"Three or more elements."</span><br/></code></pre>
</div></div></div>
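Patterns aren't limited to lists, either. A tuple pattern deconstructs a pair directly in the parameter position, and patterns can be nested inside one another. The function names below are our own inventions for illustration, not standard library functions:

```haskell
-- Deconstruct a pair in the parameter list instead of calling fst and snd.
addPair :: (Int, Int) -> Int
addPair (x, y) = x + y

-- Patterns compose: match a nonempty list whose first element is a pair,
-- with a catch-all for the empty list.
firstSum :: [(Int, Int)] -> Int
firstSum ((x, y):_) = x + y
firstSum []         = 0

main :: IO ()
main = do
  print (addPair (3, 4))            -- prints 7
  print (firstSum [(1, 2), (5, 6)]) -- prints 3, the sum of the first pair
```
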
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Just to make sure we've got a handle on this, let's work through a second example. We're going to write a function that reverses lists. Because lists are singly-linked, the best approach is to make one pass over the original list and construct the reversed version as we go. Each time we visit an element, we'll add that element to the front of the growing reversed list. I like to visualize this as two stacks of books: imagine we're shifting books one at a time from the first stack to the second. It's not a perfect analogy because the first stack of books is immutable, but hopefully you get the idea.</p><p>The implementation below consists of two functions, but the first one does all the work. We can see from its type <code>[a] -> [a] -> [a]</code> that it accepts two lists and returns a third. The first parameter is the input list, or what's left of it. These are the elements that we have yet to transfer to the reversed list. The second parameter <code>acc</code> is the reversed list that we're constructing as we go. (The name "acc" is short for "accumulator", since we're accumulating a result.)</p><p>In an imperative language, we might loop over the input list, prepending each element in turn to a shared result list. But functions in Haskell are stateless, and the language has no concept of a loop. Instead, we turn to recursion.<br/></p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-haskell"><span class="nf">myReverseHelper</span> <span class="ow">::</span> <span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="ow">-></span> <span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="ow">-></span> <span class="p">[</span><span class="n">a</span><span class="p">]</span><br/><span class="nf">myReverseHelper</span> <span class="kt">[]</span> <span class="n">acc</span> <span class="ow">=</span> <span class="n">acc</span><br/><span class="nf">myReverseHelper</span> <span class="p">(</span><span class="n">x</span><span class="kt">:</span><span class="n">xs</span><span class="p">)</span> <span class="n">acc</span> <span class="ow">=</span> <span class="n">myReverseHelper</span> <span class="n">xs</span> <span class="p">(</span><span class="n">x</span><span class="kt">:</span><span class="n">acc</span><span class="p">)</span><br/><br/><span class="nf">myReverse</span> <span class="ow">::</span> <span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="ow">-></span> <span class="p">[</span><span class="n">a</span><span class="p">]</span><br/><span class="nf">myReverse</span> <span class="n">xs</span> <span class="ow">=</span> <span class="n">myReverseHelper</span> <span class="n">xs</span> <span class="kt">[]</span><br/></code></pre>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>The first line of <code>myReverseHelper</code> defines our base case, using pattern matching to identify when the input list is empty. If we're out of input, we must be finished constructing the reverse list, so our result is simply the accumulator.</p><p>The second line handles the recursive case. We know the input list is not empty; therefore, it must have the form <code>x:xs</code>, where <code>x</code> is the first element of the list and <code>xs</code> is the remainder. We use pattern matching to extract both components, then immediately recurse, passing <code>xs</code> in place of the original input and <code>x:acc</code> as the accumulator value. For each <code>x</code> in the original list, <code>x</code> is prepended to the result.</p><p>Last, we wrap <code>myReverseHelper</code> in a nicer interface by defining <code>myReverse</code>, which just delegates to the helper function with an empty list as the second parameter.</p><h2>Standout Feature #2: Partial Function Application</h2><p>We made a big deal out of higher-order functions earlier, so let's see what our friends <code>map</code> and <code>filter</code> look like in Haskell.</p><ul><li><code>map</code> takes a function and a list and applies the function to each list element, returning a new list with the results. It has type <code>(a -> b) -> [a] -> [b]</code>. Note the parentheses—this is not the same type as <code>a -> b -> [a] -> [b]</code>. The first type represents a two-parameter function (where the first parameter is itself a function) and the second represents a three-parameter function.</li><li><code>filter</code> takes a predicate and a list and removes any elements that don't satisfy the predicate, returning a new list. It has type <code>(a -> Bool) -> [a] -> [a]</code>.</li></ul><p>There's one more thing we need to recreate the Java example, and that's a way to parse integers. 
We'll use the <code>read</code> function, which, without going into too much detail, parses values from their string representations. <code>read</code> determines what kind of thing it should be parsing based on the (possibly inferred) result type.</p><p>Here's a Haskell version of <code>parsePositiveInts</code>:</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-haskell"><span class="nf">isPositive</span> <span class="ow">::</span> <span class="kt">Int</span> <span class="ow">-></span> <span class="kt">Bool</span><br/><span class="nf">isPositive</span> <span class="n">x</span> <span class="ow">=</span> <span class="n">x</span> <span class="o">></span> <span class="mi">0</span><br/><br/><span class="nf">parsePositiveInts</span> <span class="ow">::</span> <span class="p">[</span><span class="kt">String</span><span class="p">]</span> <span class="ow">-></span> <span class="p">[</span><span class="kt">Int</span><span class="p">]</span><br/><span class="nf">parsePositiveInts</span> <span class="n">strings</span> <span class="ow">=</span> <span class="n">filter</span> <span class="n">isPositive</span> <span class="p">(</span><span class="n">map</span> <span class="n">read</span> <span class="n">strings</span><span class="p">)</span><br/></code></pre>
</div></div></div>
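To see <code>read</code>'s type-driven parsing in action, here's a quick, self-contained usage sketch (the definitions are repeated from the snippet above; the sample input strings are my own):

```haskell
-- Definitions repeated from above so this example stands alone.
isPositive :: Int -> Bool
isPositive x = x > 0

parsePositiveInts :: [String] -> [Int]
parsePositiveInts strings = filter isPositive (map read strings)

main :: IO ()
main = do
  -- read parses each String as an Int because the result type is [Int]
  print (parsePositiveInts ["3", "-1", "0", "42"])  -- prints [3,42]
```

If any string doesn't parse as an integer, <code>read</code> throws a runtime error, so this sketch assumes well-formed input.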
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>One thing that jumps out is that our predicate definition is kind of verbose. In Java, we just wrote <code>x -> x > 0</code> and let the compiler figure out all the types for us. We can also omit the type here (i.e. remove the first line) and the compiler will infer it. In fact, we can do even better.</p><p>Let's take a close look at the type of the <code>map</code> function, which is <code>(a -> b) -> [a] -> [b]</code>. The return type <code>[b]</code> is separated from the parameters by a <code>-></code> token, but interestingly, there's also a <code>-></code> between the first and second parameters. What should we make of that? Do both usages mean the same thing? We know that <code>-></code> indicates a function, but it can't be left-associative: <code>(a -> b) -> [a] -> [b]</code> and <code>a -> b -> [a] -> [b]</code> are different types.</p><p>It turns out that <code>-></code> is right-associative, and <code>(a -> b) -> [a] -> [b]</code> is equivalent to <code>(a -> b) -> ([a] -> [b])</code>. In other words, <code>map</code> isn't really a two-parameter function, but a single-parameter function that returns another function. Recall that function application is left-associative. In the expression below, we're mapping the <code>increment</code> function we wrote earlier over a list:</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-haskell"><span class="nf">map</span> <span class="n">increment</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span><br/></code></pre>
</div></div></div>
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Because <code>map</code> is secretly a single-parameter function, what this actually does is apply <code>map</code> to <code>increment</code>, and then apply the resulting function to the list. We could put parentheses around <code>map increment</code> without changing the meaning of the overall expression.</p><p>That raises the question of whether <code>map increment</code> can stand on its own. It turns out it can!</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-haskell"><span class="nf">mapIncrement</span> <span class="ow">::</span> <span class="p">[</span><span class="kt">Int</span><span class="p">]</span> <span class="ow">-></span> <span class="p">[</span><span class="kt">Int</span><span class="p">]</span><br/><span class="nf">mapIncrement</span> <span class="ow">=</span> <span class="n">map</span> <span class="n">increment</span><br/></code></pre>
</div></div></div>
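To make the partial application concrete, here's a runnable sketch; it assumes <code>increment</code> is the simple add-one function from earlier in the post:

```haskell
-- Assumed definition of increment from earlier in the post.
increment :: Int -> Int
increment x = x + 1

-- Supplying only map's first argument yields a whole-list transformer.
mapIncrement :: [Int] -> [Int]
mapIncrement = map increment

main :: IO ()
main = do
  print (mapIncrement [1, 2, 3])     -- prints [2,3,4]
  print ((map increment) [1, 2, 3])  -- identical; the parentheses are optional
```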
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>If we stop after the first argument, we get a function that accepts a list of integers and returns a new list with every integer incremented. Neat! Every “multi-parameter” function in Haskell can be <b>partially applied</b> by omitting trailing arguments, which leads to extremely concise code.</p><p>Infix functions can be partially applied as well. Returning to our integer-parsing example, we can move the predicate inside the body of the main function.</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-haskell"><span class="nf">parsePositiveInts</span> <span class="ow">::</span> <span class="p">[</span><span class="kt">String</span><span class="p">]</span> <span class="ow">-></span> <span class="p">[</span><span class="kt">Int</span><span class="p">]</span><br/><span class="nf">parsePositiveInts</span> <span class="n">strings</span> <span class="ow">=</span> <span class="n">filter</span> <span class="p">(</span><span class="o">></span> <span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">map</span> <span class="n">read</span> <span class="n">strings</span><span class="p">)</span><br/></code></pre>
</div></div></div>
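A partially applied infix operator like <code>(> 0)</code> is called an operator section, and you can fix either operand. A short sketch (these particular examples are mine, not from the post):

```haskell
main :: IO ()
main = do
  print (filter (> 0) [3, -1, 0, 42])  -- right section: \x -> x > 0; prints [3,42]
  print (map (2 *) [1, 2, 3])          -- left section: \x -> 2 * x; prints [2,4,6]
  print (map (10 -) [1, 2, 3])         -- operand order matters: \x -> 10 - x; prints [9,8,7]
```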
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Finally, there's a third higher-order function that allows partial application to really shine. The <code>.</code> function (that's a period character) is an infix function that performs function composition. Its type, which is <code>(b -> c) -> (a -> b) -> a -> c</code>, tells us all we need to know about its behavior: Given a function that turns Bs into Cs and a function that turns As into Bs, it smashes the two together to produce a function that turns As into Cs.</p><p>Function composition lets us do this:</p>
</div></div></div>
<div class="parallax_section"><div class="col_container alignleft"><div class="col8 x_scrollable bg_secondary">
<pre><code class="language-haskell"><span class="nf">parsePositiveInts</span> <span class="ow">::</span> <span class="p">[</span><span class="kt">String</span><span class="p">]</span> <span class="ow">-></span> <span class="p">[</span><span class="kt">Int</span><span class="p">]</span><br/><span class="nf">parsePositiveInts</span> <span class="ow">=</span> <span class="n">filter</span> <span class="p">(</span><span class="o">></span> <span class="mi">0</span><span class="p">)</span> <span class="o">.</span> <span class="n">map</span> <span class="n">read</span><br/></code></pre>
</div></div></div>
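As a sanity check that the point-free version behaves like the original, here's a self-contained comparison (both definitions repeated from above, with a hypothetical primed name for the explicit version):

```haskell
-- Point-free version: the composition of two list transformations.
parsePositiveInts :: [String] -> [Int]
parsePositiveInts = filter (> 0) . map read

-- The same function with an explicit parameter, for comparison.
parsePositiveInts' :: [String] -> [Int]
parsePositiveInts' strings = filter (> 0) (map read strings)

main :: IO ()
main = do
  print (parsePositiveInts ["3", "-1", "7"])   -- prints [3,7]
  print (parsePositiveInts' ["3", "-1", "7"])  -- prints [3,7]
```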
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>In its most essential form, our example function is the composition of two list transformations. This way of writing functions is known as <b>point-free style</b> because we've omitted the parameter, or “point”, on which the function operates. In doing so, we've focused more on the high-level behavior of the function and less on how that behavior is implemented.</p><h2>Wrapping Up</h2><p>I won't pretend that I've started coding exclusively in Haskell. At work, the majority of our code and knowledge is grounded in Java, and a sudden switch to Haskell would be roughly equivalent to addressing my coworkers in Latin. Having said that, a big part of the value of functional programming is in the way it teaches you to approach problems. I've found myself replacing big chunks of imperative code with stream expressions and cleaning up careless state management where I might not have noticed it before.</p><p>As a next step, I highly recommend Miran Lipovača's <i>Learn You a Haskell for Great Good</i>, available at <a href="http://learnyouahaskell.com">learnyouahaskell.com</a>. It's well-paced and includes lots of practical examples, not to mention some excellent drawings. If you just want to evaluate some expressions in your browser, there's a demo available on the front page of <a href="https://www.haskell.org">haskell.org</a>. Try it out!</p>
</div></div></div>
I Cobbled Together A Blog2017-08-13T01:00:00+00:002017-09-04T19:16:17.102244+00:00http://ninepints.co/2017/08/i-cobbled-together-a-blog/
<div class="parallax_section margincollapsable"><div class="col_container alignleft"><div class="col8">
<p>Last year, I told myself I should take a shot at writing. Of course, that meant putting together my own blog application—first things first—and I promptly got lost in the weeds of web frameworks, content management systems, static site generators, etc. Next to those weeds is another patch of weeds involving CSS, and next to that is the matter of deployment. However, I'm happy to say that I made it through, a little more knowledgeable for it. The site supports prose, math, and code, but not really images, which are still popping up at full resolution and breaking my layout; even so, I think it's in an acceptable state for launch.</p><p>On the back end, I'm using <a href="https://www.djangoproject.com">Django</a> and the <a href="https://wagtail.io">Wagtail</a> CMS, then rendering a static version of the site with <a href="https://github.com/moorinteractive/wagtail-bakery">wagtail-bakery</a>. Using static content means I can throw it in the cloud for cheap and not worry about maintenance, but if I want to transition to something dynamic down the road, most of the work is done.</p><p>On the front end, it's just hand-written markup. There are plenty of UI frameworks out there, but I wanted the presentation to be entirely my own. I'm also taking inspiration from a <a href="http://idlewords.com/talks/website_obesity.htm">talk on website obesity</a> by Maciej Cegłowski: Many client-side frameworks include rarely used components that are sent to the client nonetheless. I hope that most of the bytes I send will contribute to the reading experience. (I'm making an exception for Google's analytics script, which looks relatively unobtrusive.)</p><p>I have a few posts planned for this summer, so all that's left to do is write.</p>
</div></div></div>