Change of gears! Time for a math post.

I've been working through various online machine learning courses over the last eighteen months, beginning with the Stanford/Coursera ML course taught by Andrew Ng. It opens with three weeks on linear and logistic regression, covering the structure of the linear/logistic regression models, the specific loss functions involved in each, and how to minimize said functions via gradient descent.

The lectures are well executed, but I could have gone for more background on the loss functions, which are sort of handed down from above. Where did they come from? Why do they produce good regression coefficients? I think the answers to these questions are pretty neat: there's actually a straightforward statistical interpretation of what's happening.