Lesson 11: Transformations

Nicky Wakim

2025-10-27

Distributions of transformations of random variables

It is common in many scientific, mathematical, and statistical contexts to transform variables. A function of a random variable is a random variable: if \(X\) is a random variable and \(g\) is a function, then \(Y=g(X)\) is a random variable. Since \(g(X)\) is a random variable, it has a distribution. In general, the distribution of \(g(X)\) will have a different shape than the distribution of \(X\). This section discusses some techniques for determining how a transformation changes the shape of a distribution.

Linear rescaling

In general, the distribution of \(g(X)\) will have a different shape than the distribution of \(X\). The exception is when \(g\) is a linear rescaling.

A linear rescaling is a transformation of the form \(g(u) = a + bu\), where \(a\) (intercept) and \(b\) (slope) are constants. For example, converting temperature from Celsius to Fahrenheit using \(g(u) = 32 + 1.8u\) is a linear rescaling.

A linear rescaling “preserves relative interval length” in the following sense.

  • If interval A and interval B have the same length in the original measurement units, then the rescaled intervals A and B will have the same length in the rescaled units. For example, [0, 10] and [10, 20] Celsius, both length 10 degrees Celsius, correspond to [32, 50] and [50, 68] Fahrenheit, both length 18 degrees Fahrenheit.
  • If the ratio of the lengths of interval A and B is \(r\) in the original measurement units, then the ratio of the lengths in the rescaled units is also \(r\). For example, [10, 30] is twice as long as [0, 10] in Celsius; for the corresponding Fahrenheit intervals, [50, 86] is twice as long as [32, 50].

Think of a linear rescaling as just a consistent relabeling of the variable axis; every 1 unit increment in the original scale corresponds to a \(b\) unit increment in the linear rescaling.

Suppose that SAT Math score \(X\) follows a Uniform(200, 800) distribution. (It doesn’t, but go with it for now.) One way to simulate values of \(X\) is to simulate values of \(U\) from a Uniform(0, 1) distribution and let \(X = 200 + (800 - 200)U = 200 + 600U\). Then \(X\) is a linear rescaling of \(U\), and \(X\) takes values in the interval [200, 800]. We can define and simulate values of \(X\) in Symbulate. Before looking at the results, sketch a plot of the distribution of \(X\) and make an educated guess for its mean and standard deviation.
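The lesson’s Symbulate code is not reproduced in this rendering. As a stand-in, here is a NumPy sketch of the same simulation (the 10,000-repetition count and variable names are illustrative choices, not from the lesson):

    import numpy as np

    rng = np.random.default_rng()

    u = rng.uniform(0, 1, size=10000)   # simulated values of U ~ Uniform(0, 1)
    x = 200 + 600 * u                   # linear rescaling: X = 200 + 600 U

    print(x.min(), x.max())             # values fall inside (200, 800)
    print(x.mean(), x.std())            # roughly 500 and 173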

We see that \(X\) has a Uniform(200, 800) distribution. The linear rescaling changes the range of possible values, but the general shape of the distribution is still Uniform. We can see why by inspecting a few intervals on both the original and revised scale.

| Interval of \(U\) values | Probability that \(U\) lies in the interval | Interval of \(X\) values | Probability that \(X\) lies in the interval |
|--------------------------|---------------------------------------------|--------------------------|---------------------------------------------|
| (0.0, 0.1)               | 0.1                                         | (200, 260)               | \(\frac{60}{600}\)                          |
| (0.9, 1.0)               | 0.1                                         | (740, 800)               | \(\frac{60}{600}\)                          |
| (0.0, 0.2)               | 0.2                                         | (200, 320)               | \(\frac{120}{600}\)                         |

For a Uniform distribution the long run average is the midpoint of possible values. The long run average value of \(U\) is 0.5, and of \(X\) is 500. These two values are related through the same formula mapping \(U\) to \(X\) values: \(500 = 200 + 600\times 0.5\).

For a Uniform distribution, the standard deviation is about 0.289 times the length of the interval: \(|b-a|/\sqrt{12}\). The standard deviation of \(U\) is about 0.289, and of \(X\) is about 173.

The standard deviation of \(X\) is 600 times the standard deviation of \(U\). Multiplying the \(U\) values by 600 rescales the distance between the values. Two values of \(U\) that are 0.1 units apart correspond to two values of \(X\) that are 60 units apart. A \(U\) value of 0.6 is 0.1 units above the mean of \(U\), and the corresponding \(X\) value 560 is 60 units above the mean of \(X\). However, adding the constant 200 to all values just shifts the distribution and does not affect the degree of variability.
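As a quick numerical check of these relationships (a sketch, not part of the original lesson), we can compare the simulated mean and standard deviation of \(X\) to the rescaled mean and standard deviation of \(U\):

    import numpy as np

    rng = np.random.default_rng()
    u = rng.uniform(0, 1, size=10000)
    x = 200 + 600 * u

    print(x.mean(), 200 + 600 * u.mean())   # both roughly 500
    print(x.std(), 600 * u.std())           # both roughly 600 / sqrt(12) = 173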

  1. Does \(V\) result from a linear rescaling of \(U\)?
  2. What are the possible values of \(V\)?
  3. Is \(V\) the same random variable as \(U\)?
  4. Find \(\IP(U \le 0.1)\) and \(\IP(V \le 0.1)\).
  5. Sketch a plot of what the histogram of many simulated values of \(V\) would look like.
  6. Does \(V\) have the same distribution as \(U\)?

Let’s consider a non-uniform example. Now let’s suppose that SAT Math score \(X\) follows a Normal(500, 100) distribution. We can simulate values of \(X\) by simulating \(Z\) from the standard Normal(0, 1) distribution and setting \(X = 500 + 100Z\). (Remember that the standard Normal spinner returns standardized values, so \(Z = 1\) corresponds to 1 standard deviation above the mean, that is, \(X= 600\).) The reason this works is because the linear rescaling doesn’t change the Normal shape.
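Here is a NumPy sketch of that simulation (a stand-in for the Symbulate code used in the lesson; the simulation size is arbitrary):

    import numpy as np

    rng = np.random.default_rng()
    z = rng.standard_normal(10000)           # Z ~ Normal(0, 1)
    x = 500 + 100 * z                        # X = 500 + 100 Z ~ Normal(500, 100)

    print(x.mean(), x.std())                 # roughly 500 and 100
    print(np.mean((x > 200) & (x < 800)))    # almost all values lie in (200, 800)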

The linear rescaling changes the range of observed values; almost all of the values of \(Z\) lie in the interval \((-3, 3)\) while almost all of the values of \(X\) lie in the interval \((200, 800)\). However, the distribution of \(X\) still has the general Normal shape. The means are related by the conversion formula: \(500 = 500 + 100 \times 0\). Multiplying the values of \(Z\) by 100 rescales the distance between values; two values of \(Z\) that are 1 unit apart correspond to two values of \(X\) that are 100 units apart. However, adding the constant 500 to all the values just shifts the center of the distribution and does not affect variability. Therefore, the standard deviation of \(X\) is 100 times the standard deviation of \(Z\).

In general, if \(Z\) has a Normal(0, 1) distribution then \(X = \mu + \sigma Z\) has a Normal(\(\mu\), \(\sigma\)) distribution.

  1. Is \(Y\) the same random variable as \(X\)?

  2. Does \(Y\) have the same distribution as \(X\)?

  3. Donny Don’t says that the distribution of \(-Z\) will look like an “upside-down bell”. Is Donny correct? If not, explain why not and describe the distribution of \(-Z\).

  4. Donny Don’t says that the standard deviation of \(-Z\) is -1. Is Donny correct? If not, explain why not and determine the standard deviation of \(-Z\).

Summary

  • A linear rescaling is a transformation of the form \(g(u) = a + bu\).
  • A linear rescaling of a random variable does not change the basic shape of its distribution, just the range of possible values.
    • However, remember that the possible values are part of the distribution. So a linear rescaling does technically change the distribution, even if the basic shape is the same. (For example, Normal(500, 100) and Normal(0, 1) are two different distributions.)
  • A linear rescaling transforms the mean in the same way the individual values are transformed.
  • Adding a constant to a random variable does not affect its standard deviation.
  • Multiplying a random variable by a constant multiplies its standard deviation by the absolute value of the constant.
  • Whether in the short run or the long run, \[\begin{align*} \text{Average of $a+bX$} & = a+b(\text{Average of $X$})\\ \text{SD of $a+bX$} & = |b|(\text{SD of $X$})\\ \text{Variance of $a+bX$} & = b^2(\text{Variance of $X$}) \end{align*}\]
  • If \(U\) has a Uniform(0, 1) distribution then \(X = a + (b-a)U\) has a Uniform(\(a\), \(b\)) distribution.
  • If \(Z\) has a Normal(0, 1) distribution then \(X = \mu + \sigma Z\) has a Normal(\(\mu\), \(\sigma\)) distribution.
  • Remember, do NOT confuse a random variable with its distribution.
    • The random variable is the numerical quantity being measured
    • The distribution is the long run pattern of variation of many observed values of the random variable

Nonlinear transformations of random variables

A linear rescaling does not change the shape of a distribution, only the range of possible values. But what about a nonlinear transformation, like a logarithmic or square root transformation? In contrast to a linear rescaling, a nonlinear rescaling does not preserve relative interval length, so we might expect that a nonlinear rescaling can change the shape of a distribution. We’ll investigate by considering the Uniform(0, 1) spinner and a logarithmic transformation.

Let \(U\) represent the result of a single spin of the Uniform(0, 1) spinner. We’ll basically consider \(\log(U)\), but this leads to two minor technicalities.

  • Since \(U\in[0, 1]\), \(\log(U)\le 0\). To obtain positive values we consider \(-\log(U)\), which takes values in \([0,\infty)\).
  • Technically, if we applied \(-\log(u)\) to the values on the axis of the Uniform(0, 1) spinner, the resulting values would decrease from \(\infty\) to 0 going clockwise. To make the values start at 0 and increase to \(\infty\) clockwise, we consider \(-\log(1-U)\). (We saw in the previous section that the transformation \(u \to 1-u\) basically just changes direction from clockwise to counterclockwise.)

Therefore, it’s a little more convenient to consider the random variable \(X=-\log(1-U)\), which takes values in \([0,\infty)\). It also turns out, as we saw earlier, that \(-\log(1-u)\) is the quantile function of the Exponential(1) distribution. We have already seen that \(X\) has an Exponential(1) distribution. Now we’ll take a closer look at why.

The following code defines \(X\) and plots a few simulated values.
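(The lesson’s Symbulate code is not shown in this rendering; the NumPy/Matplotlib sketch below is an equivalent stand-in, with arbitrary bin count and simulation size.)

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng()
    u = rng.uniform(0, 1, size=10000)
    x = -np.log(1 - u)                  # X = -log(1 - U), values in [0, infinity)

    plt.hist(x, bins=60, density=True)  # histogram of simulated values of X
    plt.xlabel("x")
    plt.ylabel("Density")
    plt.show()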

Notice that values near 0 occur with higher frequency than larger values. For example, there are many more simulated values of \(X\) that lie in the interval \([0, 1]\) than in the interval \([3, 4]\), even though these intervals both have length 1. Let’s see why this is happening.

| Interval of U | Length of U interval | Probability | Interval of X | Length of X interval |
|---------------|----------------------|-------------|---------------|----------------------|
| (0, 0.1)      |                      |             |               |                      |
| (0.1, 0.2)    |                      |             |               |                      |
| (0.2, 0.3)    |                      |             |               |                      |
| (0.3, 0.4)    |                      |             |               |                      |
| (0.4, 0.5)    |                      |             |               |                      |
| (0.5, 0.6)    |                      |             |               |                      |
| (0.6, 0.7)    |                      |             |               |                      |
| (0.7, 0.8)    |                      |             |               |                      |
| (0.8, 0.9)    |                      |             |               |                      |
| (0.9, 1)      |                      |             |               |                      |

Plug the endpoints into the conversion formula \(u\mapsto -\log(1-u)\) to find the corresponding \(X\) interval. For example, the \(U\) interval \((0.1, 0.2)\) corresponds to the \(X\) interval \((-\log(1-0.1), -\log(1-0.2)) = (0.105, 0.223)\). Since \(U\) has a Uniform(0, 1) distribution the probability is just the length of the \(U\) interval.

| Interval of U | Length of U interval | Probability | Interval of X  | Length of X interval |
|---------------|----------------------|-------------|----------------|----------------------|
| (0, 0.1)      | 0.1                  | 0.1         | (0, 0.105)     | 0.105                |
| (0.1, 0.2)    | 0.1                  | 0.1         | (0.105, 0.223) | 0.118                |
| (0.2, 0.3)    | 0.1                  | 0.1         | (0.223, 0.357) | 0.134                |
| (0.3, 0.4)    | 0.1                  | 0.1         | (0.357, 0.511) | 0.154                |
| (0.4, 0.5)    | 0.1                  | 0.1         | (0.511, 0.693) | 0.182                |
| (0.5, 0.6)    | 0.1                  | 0.1         | (0.693, 0.916) | 0.223                |
| (0.6, 0.7)    | 0.1                  | 0.1         | (0.916, 1.204) | 0.288                |
| (0.7, 0.8)    | 0.1                  | 0.1         | (1.204, 1.609) | 0.405                |
| (0.8, 0.9)    | 0.1                  | 0.1         | (1.609, 2.303) | 0.693                |
| (0.9, 1)      | 0.1                  | 0.1         | (2.303, Inf)   | Inf                  |
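The table above can be reproduced with a short calculation; here is a sketch (not from the original lesson):

    import numpy as np

    u_endpoints = np.linspace(0, 1, 11)             # 0, 0.1, ..., 1
    with np.errstate(divide="ignore"):              # -log(1 - 1) is infinite
        x_endpoints = -np.log(1 - u_endpoints)      # corresponding X endpoints

    for lo, hi in zip(x_endpoints[:-1], x_endpoints[1:]):
        print(f"({lo:.3f}, {hi:.3f})  length = {hi - lo:.3f}")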

We see that the logarithmic transformation does not preserve relative interval length. Each of the original intervals of \(U\) values has the same length, but the nonlinear logarithmic transformation “stretches out” these intervals in different ways. The probability that \(U\) lies in each of these intervals is 0.1. As the transformation stretches the intervals, the 0.1 probability gets “spread” over intervals of different length. Since probability/relative frequency is represented by area in a histogram, if two regions of differing length have the same area, then they must have different heights. Thus the shape of the distribution of \(X\) will not be Uniform.

The following plot illustrates the results of the example above. Each bar in the top histogram corresponds to the same color bar in the bottom histogram. All bars have area 0.1. In the top histogram, the bins have equal width so the heights are the same. However, in the bottom histogram the bars have different widths but the same area, so they must have different heights, and we start to see where the Exponential(1) shape comes from.

The following example provides a similar illustration, but from the reverse perspective.

| Interval of X | Length of X interval | Probability | Interval of U | Length of U interval |
|---------------|----------------------|-------------|---------------|----------------------|
| (0, 0.5)      |                      |             |               |                      |
| (0.5, 1)      |                      |             |               |                      |
| (1, 1.5)      |                      |             |               |                      |
| (1.5, 2)      |                      |             |               |                      |
| (2, 2.5)      |                      |             |               |                      |
| (2.5, 3)      |                      |             |               |                      |
| (3, 3.5)      |                      |             |               |                      |
| (3.5, 4)      |                      |             |               |                      |
| (4, 4.5)      |                      |             |               |                      |
| (4.5, 5)      |                      |             |               |                      |

The corresponding \(U\) intervals are obtained by applying the inverse transformation \(v\mapsto 1-e^{-v}\). For example, the \(X\) interval \((0.5, 1)\) corresponds to the \(U\) interval \((1-e^{-0.5}, 1-e^{-1}) = (0.393, 0.632)\).

| Interval of X | Length of X interval | Probability | Interval of U  | Length of U interval |
|---------------|----------------------|-------------|----------------|----------------------|
| (0, 0.5)      | 0.5                  | 0.393       | (0, 0.393)     | 0.393                |
| (0.5, 1)      | 0.5                  | 0.239       | (0.393, 0.632) | 0.239                |
| (1, 1.5)      | 0.5                  | 0.145       | (0.632, 0.777) | 0.145                |
| (1.5, 2)      | 0.5                  | 0.088       | (0.777, 0.865) | 0.088                |
| (2, 2.5)      | 0.5                  | 0.053       | (0.865, 0.918) | 0.053                |
| (2.5, 3)      | 0.5                  | 0.032       | (0.918, 0.95)  | 0.032                |
| (3, 3.5)      | 0.5                  | 0.020       | (0.95, 0.97)   | 0.020                |
| (3.5, 4)      | 0.5                  | 0.012       | (0.97, 0.982)  | 0.012                |
| (4, 4.5)      | 0.5                  | 0.007       | (0.982, 0.989) | 0.007                |
| (4.5, 5)      | 0.5                  | 0.004       | (0.989, 0.993) | 0.004                |

Since \(U\) has a Uniform(0, 1) distribution the probability is just the length of the \(U\) interval. Each of the \(X\) intervals has the same length but they correspond to intervals of differing length in the original \(U\) scale, and hence intervals of different probability.
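A sketch of the corresponding calculation (not from the original lesson), applying the inverse transformation \(v\mapsto 1-e^{-v}\) to equally spaced \(X\) endpoints:

    import numpy as np

    x_endpoints = np.arange(0, 5.5, 0.5)     # 0, 0.5, ..., 5
    u_endpoints = 1 - np.exp(-x_endpoints)   # inverse transformation: u = 1 - e^{-x}

    for lo, hi in zip(u_endpoints[:-1], u_endpoints[1:]):
        print(f"U interval ({lo:.3f}, {hi:.3f})  probability = {hi - lo:.3f}")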

The following plot illustrates the results of the example above (plots on the right). These two examples give some insight into how the transformed random variable \(X = -\log(1-U)\) has an Exponential(1) distribution.

“Spreadsheet” calculations like those in the previous two examples can help when sketching the distribution of a transformed random variable.

For a linear rescaling, we could just plug the mean of the original variable into the conversion formula to find the mean of the transformed variable. However, this will not work for nonlinear transformations.

We know that since \(U\) has a Uniform(0, 1) distribution its long run average value is 0.5, and since \(X\) has an Exponential(1) distribution its long run average value is 1, but \(-\log(1 - 0.5) \neq 1\). The nonlinear “stretching” of the axis makes some values relatively larger and others relatively smaller than they were on the original scale, which influences the average. Remember, in general, whether in the short run or the long run, \[ \text{Average of } g(X) \neq g(\text{Average of }X). \]
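A quick simulation makes the inequality concrete for \(X=-\log(1-U)\) (a sketch; the simulation size is arbitrary):

    import numpy as np

    rng = np.random.default_rng()
    u = rng.uniform(0, 1, size=100000)

    print(np.mean(-np.log(1 - u)))    # average of -log(1 - U): roughly 1
    print(-np.log(1 - np.mean(u)))    # -log(1 - average of U): roughly 0.693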

Recall that a function of a random variable is also a random variable. If \(X\) is a random variable, then \(Y=g(X)\) is also a random variable and so it has a probability distribution. Unless \(g\) represents a linear rescaling, a transformation will change the shape of the distribution. So the question is: what is the distribution of \(g(X)\)? We’ll focus on transformations of continuous random variables, in which case the key to answering the question is to work with cdfs.

  1. Identify the possible values of \(X\). (We have done this already, but this should always be your first step.)
  2. Let \(F_X\) denote the cdf of \(X\). Find \(F_X(1)\).
  3. Find \(F_X(2)\).
  4. Find the cdf \(F_X(x)\).
  5. Find the pdf \(f_X(x)\).
  6. Why should we not be surprised that \(X=-\log(1-U)\) has cdf \(F_X(x) = 1 - e^{-x}\)? Hint: what is the function \(u\mapsto -\log(1-u)\) in this case?

Figure: A plot of the function \(u\mapsto -\log(1-u)\). The dotted lines illustrate that \(-\log(1-u)\le 1\) if and only if \(u\le 1-e^{-1}\approx 0.632\).

If \(X\) is a continuous random variable whose distribution is known, the cdf method can be used to find the pdf of \(Y=g(X)\):

  • Determine the possible values of \(Y\). Let \(y\) represent a generic possible value of \(Y\).
  • The cdf of \(Y\) is \(F_Y(y) = \IP(Y\le y) = \IP(g(X) \le y)\).
  • Rearrange \(\{g(X) \le y\}\) to get an event involving \(X\). Warning: it is not always \(\{X \le g^{-1}(y)\}\). Sketching a picture of the function \(g\) helps.
  • Obtain an expression for the cdf of \(Y\) which involves \(F_X\) and some transformation of the value \(y\).
  • Differentiate the expression for \(F_Y(y)\) with respect to \(y\), and use what is known about \(F'_X = f_X\), to obtain the pdf of \(Y\). You will typically need to apply the chain rule when differentiating.

You will need to use information about \(X\) at some point in the last step above. You can either:

  • Plug in the cdf of \(X\) and then differentiate with respect to \(y\).
  • Differentiate with respect to \(y\) and then plug in the pdf of \(X\).

Either way gets you to the correct answer, but depending on the problem one way might be easier than the other. We’ll illustrate both methods in the next example.
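For instance, in the special case where \(g\) is strictly increasing (an assumption; as the warning above notes, \(\{g(X) \le y\}\) does not always reduce to \(\{X \le g^{-1}(y)\}\)), the two routes amount to either plugging in the formula for \(F_X\) before differentiating, or differentiating first via the chain rule and then plugging in \(f_X\): \[\begin{align*} F_Y(y) & = \IP(g(X) \le y) = \IP\left(X \le g^{-1}(y)\right) = F_X\left(g^{-1}(y)\right)\\ f_Y(y) & = \frac{d}{dy} F_X\left(g^{-1}(y)\right) = f_X\left(g^{-1}(y)\right)\,\frac{d}{dy} g^{-1}(y) \end{align*}\]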

  1. Identify the possible values of \(Y\).
  2. Sketch the pdf of \(Y\). Hint: consider a few equally spaced intervals of \(Y\) values and see what \(X\) values they correspond to.
  3. Run a simulation to approximate the pdf of \(Y\).
  4. Find \(F_Y(0.49)\).
  5. Use the cdf method to find the pdf of \(Y\). Is the pdf consistent with your simulation results?

Figure: A plot of the function \(x\mapsto x^2\) for \(-1<x<1\). The dotted lines illustrate that \(x^2\le 0.49\) if and only if \(-\sqrt{0.49}\le x\le \sqrt{0.49}\).

The table below helps us see how the transformation \(Y = X^2\) “pushes” density towards 0 if \(X\) has a Uniform(-1, 1) distribution.

| Y interval | X interval | Length of X interval | Probability | Length of Y interval | Height (Probability / Length of Y interval) |
|------------|------------|----------------------|-------------|----------------------|---------------------------------------------|
| (0, 0.1)   | (-0.3162, 0) ∪ (0, 0.3162)           | 0.6324 | 0.3162 | 0.1 | 3.162 |
| (0.1, 0.2) | (-0.4472, -0.3162) ∪ (0.3162, 0.4472) | 0.2620 | 0.1310 | 0.1 | 1.310 |
| (0.2, 0.3) | (-0.5477, -0.4472) ∪ (0.4472, 0.5477) | 0.2010 | 0.1005 | 0.1 | 1.005 |
| (0.3, 0.4) | (-0.6325, -0.5477) ∪ (0.5477, 0.6325) | 0.1696 | 0.0848 | 0.1 | 0.848 |
| (0.4, 0.5) | (-0.7071, -0.6325) ∪ (0.6325, 0.7071) | 0.1492 | 0.0746 | 0.1 | 0.746 |
| (0.5, 0.6) | (-0.7746, -0.7071) ∪ (0.7071, 0.7746) | 0.1350 | 0.0675 | 0.1 | 0.675 |
| (0.6, 0.7) | (-0.8367, -0.7746) ∪ (0.7746, 0.8367) | 0.1242 | 0.0621 | 0.1 | 0.621 |
| (0.7, 0.8) | (-0.8944, -0.8367) ∪ (0.8367, 0.8944) | 0.1154 | 0.0577 | 0.1 | 0.577 |
| (0.8, 0.9) | (-0.9487, -0.8944) ∪ (0.8944, 0.9487) | 0.1086 | 0.0543 | 0.1 | 0.543 |
| (0.9, 1)   | (-1, -0.9487) ∪ (0.9487, 1)           | 0.1026 | 0.0513 | 0.1 | 0.513 |

We can use the table to sketch a histogram.

If we continued the above process with narrower and narrower \(Y\) intervals we would arrive at the smooth pdf given by \(f_Y(y) = \frac{1}{2\sqrt{y}}, 0<y<1\); see the black curve in the plot below.
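As a check on this pdf (a sketch of the cdf method for this example), recall that \(X\) has a Uniform(-1, 1) distribution, so \(F_X(x) = (x+1)/2\) for \(-1<x<1\). For \(0<y<1\), \[\begin{align*} F_Y(y) & = \IP(X^2 \le y) = \IP\left(-\sqrt{y} \le X \le \sqrt{y}\right) = F_X\left(\sqrt{y}\right) - F_X\left(-\sqrt{y}\right) = \frac{\sqrt{y}+1}{2} - \frac{1-\sqrt{y}}{2} = \sqrt{y}\\ f_Y(y) & = \frac{d}{dy}\sqrt{y} = \frac{1}{2\sqrt{y}} \end{align*}\]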

Now we’ll approximate the pdf via simulation. The density blows up at 0 so it’s hard for the chunky histogram to capture that, but we see the simulated values follow a distribution described by the smooth \(f_Y(y) = \frac{1}{2\sqrt{y}}, 0<y<1\) in the black curve.
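A NumPy/Matplotlib sketch of such a simulation (a stand-in for the lesson’s code; bin count and simulation size are arbitrary):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng()
    x = rng.uniform(-1, 1, size=10000)
    y = x ** 2                                  # Y = X^2, values in (0, 1)

    plt.hist(y, bins=50, density=True)          # histogram of simulated values of Y
    grid = np.linspace(0.01, 1, 200)
    plt.plot(grid, 1 / (2 * np.sqrt(grid)), color="black")   # f_Y(y) = 1 / (2 sqrt(y))
    plt.xlabel("y")
    plt.ylabel("Density")
    plt.show()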