Parameterization
Introduction
To parameterize, or not to parameterize: that is the question. This guide explores that perennial question of data science.
Developing Intuition
In a data science context, parameterization is the process of assigning parameters (sometimes called coefficients) to develop an approximation of a function. You will recall from the Starter Guide on the meaning of f(x) in data science that the function
One thing to note about parameters is that they are typically discovered from the data; that is, we don’t assign them based on intuition or manual tuning (that falls into the realm of hyperparameters, or tuning parameters).
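To make the distinction concrete, here is a minimal sketch assuming synthetic data and a polynomial model. The data, the seed, and the "true" values 2 and 3 are made up for illustration. The polynomial degree plays the role of a hyperparameter (the analyst picks it), while the slope and intercept are parameters that a least-squares fit discovers from the data.

```python
import numpy as np

# Hypothetical noisy data generated from y = 2 + 3x (values chosen for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=x.size)

# Hyperparameter: chosen by the analyst before fitting.
degree = 1

# Parameters: discovered from the data by a least-squares fit.
# np.polyfit returns coefficients from the highest power to the lowest.
slope, intercept = np.polyfit(x, y, deg=degree)

print(f"intercept ~ {intercept:.2f}, slope ~ {slope:.2f}")
```

The fitted values should land close to the 2 and 3 used to generate the data, even though the code never tells the model what they are.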
OLS (Ordinary Least Squares) As An Example Of Parameterization
OLS is a common example of a modeling technique that utilizes parameterization. You might even know that the
are the parameters of the general OLS equation. When we fit the model, we find the
The problem of estimating
Splines As An Example Of Non-Parameterization
Splines are a great way to visualize non-parameterization. There are many ways to build splines (whole careers have been dedicated to such pursuits), but in general the idea is to draw a curve that passes through every data point by fitting a separate polynomial piece between each pair of neighboring points.
You might recall the picture of a very simple spline from the Starter Guide on the meaning of f(x) in data science.

We have 5 data points, represented by the blue dots. The black line is our spline, which we got by making a piecewise set of functions between points A to B, B to C, etc. We don’t need any parameters at all to build this piecewise function; the data itself will tell us what the function should be.
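A minimal sketch of that idea, using straight-line pieces (the simplest kind of spline) rather than higher-degree polynomials. The five coordinates and the helper name `piecewise` are assumptions for illustration and are not taken from the picture.

```python
import numpy as np

# Five hypothetical data points, standing in for A through E.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# One straight-line piece per pair of neighboring points (A-B, B-C, ...).
# Each piece is fixed entirely by its two endpoints.
slopes = np.diff(y) / np.diff(x)
intercepts = y[:-1] - slopes * x[:-1]

def piecewise(x_new):
    """Evaluate the piecewise-linear curve at x_new (inside the data range)."""
    # Locate the segment containing x_new, clamping to the last segment.
    i = np.clip(np.searchsorted(x, x_new, side="right") - 1, 0, len(slopes) - 1)
    return slopes[i] * x_new + intercepts[i]

print(piecewise(0.5))  # halfway between the first two points
print(piecewise(2.0))  # exactly at the third point
```

Nothing here is estimated or tuned: each slope and intercept is computed directly from two neighboring data points.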
The attentive reader here will notice that a piecewise polynomial still has parameters. For instance, maybe the polynomial between points A and B is
Why Would We Ever Not Parameterize?
One challenge with parameterization is that you have to make assumptions about the shape of f(x).
For those familiar with regression, basic tests are often done early in the analysis to see whether a simple linear model is ideal, or whether interaction effects or multivariable polynomial regression would be more appropriate. Here, the analyst is trying to find out the form, or shape, of f(x).
By not parameterizing, we remove the possibility that we are way off in the shape of f(x).
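As a rough illustration of what a wrong shape assumption costs, the sketch below assumes a made-up "true" function (a sine curve) and compares two approaches: a straight-line fit, which commits to a shape up front, and a piecewise-linear interpolation of the training points, which does not. The function, the sample sizes, and the variable names are all assumptions for illustration.

```python
import numpy as np

# Hypothetical "true" function that the analyst does not know.
def f(x):
    return np.sin(x)

x_train = np.linspace(0, 2 * np.pi, 25)
y_train = f(x_train)
x_test = np.linspace(0.1, 6.0, 40)

# Parametric: assume f(x) is a straight line, then discover slope and intercept.
slope, intercept = np.polyfit(x_train, y_train, deg=1)
linear_pred = slope * x_test + intercept

# Non-parametric: no assumed shape; connect neighboring training points directly.
interp_pred = np.interp(x_test, x_train, y_train)

# Mean squared error of each approach against the true function.
linear_err = np.mean((linear_pred - f(x_test)) ** 2)
interp_err = np.mean((interp_pred - f(x_test)) ** 2)
print(f"linear MSE: {linear_err:.4f}, piecewise MSE: {interp_err:.4f}")
```

In this noise-free setup the straight line misses the curve badly while the interpolation tracks it closely. With noisy data the comparison is less one-sided, since a curve that passes through every point also follows the noise.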