It is common in many scientific, mathematical, and statistical contexts to transform variables. A function of a random variable is a random variable: if \(X\) is a random variable and \(g\) is a function, then \(Y=g(X)\) is a random variable. Since \(g(X)\) is a random variable, it has a distribution. In general, the distribution of \(g(X)\) will have a different shape than the distribution of \(X\). This section discusses some techniques for determining how a transformation changes the shape of a distribution.
As noted above, the distribution of \(g(X)\) will generally have a different shape than the distribution of \(X\). The exception is when \(g\) is a linear rescaling.
A linear rescaling is a transformation of the form \(g(u) = a + bu\), where \(a\) (intercept) and \(b\) (slope) are constants. For example, converting temperature from Celsius to Fahrenheit using \(g(u) = 32 + 1.8u\) is a linear rescaling.
A linear rescaling “preserves relative interval length” in the following sense.
Think of a linear rescaling as just a consistent relabeling of the variable axis; every 1 unit increment in the original scale corresponds to a \(b\) unit increment in the linear rescaling.
Suppose that SAT Math score \(X\) follows a Uniform(200, 800) distribution. (It doesn’t, but go with it for now.) One way to simulate values of \(X\) is to simulate values of \(U\) from a Uniform(0, 1) distribution and let \(X = 200 + (800 - 200)U = 200 + 600U\). Then \(X\) is a linear rescaling of \(U\), and \(X\) takes values in the interval [200, 800]. We can define and simulate values of \(X\) in Symbulate. Before looking at the results, sketch a plot of the distribution of \(X\) and make an educated guess for its mean and standard deviation.
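A minimal sketch of such a simulation, assuming Symbulate’s standard `Uniform`, `RV`, `sim`, and `plot` calls:

```python
from symbulate import *

U = RV(Uniform(0, 1))      # a spin of the Uniform(0, 1) spinner
X = 200 + 600 * U          # linear rescaling of U onto the (200, 800) scale

x = X.sim(10000)
x.plot()                   # histogram is still flat, now over (200, 800)
x.mean(), x.sd()           # roughly 500 and 173
```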
We see that \(X\) has a Uniform(200, 800) distribution. The linear rescaling changes the range of possible values, but the general shape of the distribution is still Uniform. We can see why by inspecting a few intervals on both the original and revised scale.
Interval of \(U\) values | Probability that \(U\) lies in the interval | Interval of \(X\) values | Probability that \(X\) lies in the interval |
---|---|---|---|
(0.0, 0.1) | 0.1 | (200, 260) | \(\frac{60}{600}\) |
(0.9, 1.0) | 0.1 | (740, 800) | \(\frac{60}{600}\) |
(0.0, 0.2) | 0.2 | (200, 320) | \(\frac{120}{600}\) |
For a Uniform distribution the long run average is the midpoint of possible values. The long run average value of \(U\) is 0.5, and of \(X\) is 500. These two values are related through the same formula mapping \(U\) to \(X\) values: \(500 = 200 + 600\times 0.5\).
For a Uniform(\(a\), \(b\)) distribution, the standard deviation is about 0.289 times the length of the interval of possible values: \(|b-a|/\sqrt{12}\). The standard deviation of \(U\) is about 0.289, and that of \(X\) is about 173.
The standard deviation of \(X\) is 600 times the standard deviation of \(U\). Multiplying the \(U\) values by 600 rescales the distance between values. Two values of \(U\) that are 0.1 units apart correspond to two values of \(X\) that are 60 units apart. A \(U\) value of 0.6 is 0.1 units above the mean of \(U\), and the corresponding \(X\) value 560 is 60 units above the mean of \(X\). However, adding the constant 200 to all values just shifts the distribution and does not affect the variability.
Let’s consider a non-uniform example: suppose now that SAT Math score \(X\) follows a Normal(500, 100) distribution. We can simulate values of \(X\) by simulating \(Z\) from the standard Normal(0, 1) distribution and setting \(X = 500 + 100Z\). (Remember that the standard Normal spinner returns standardized values, so \(Z = 1\) corresponds to 1 standard deviation above the mean, that is, \(X = 600\).) This works because a linear rescaling doesn’t change the Normal shape.
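A similar sketch for the Normal case, under the same assumptions about the Symbulate calls:

```python
from symbulate import *

Z = RV(Normal(0, 1))       # standard Normal spinner
X = 500 + 100 * Z          # linear rescaling: Normal(500, 100)

x = X.sim(10000)
x.plot()                   # bell shape, centered near 500
x.mean(), x.sd()           # roughly 500 and 100
```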
The linear rescaling changes the range of observed values; almost all of the values of \(Z\) lie in the interval \((-3, 3)\) while almost all of the values of \(X\) lie in the interval \((200, 800)\). However, the distribution of \(X\) still has the general Normal shape. The means are related by the conversion formula: \(500 = 500 + 100 \times 0\). Multiplying the values of \(Z\) by 100 rescales the distance between values; two values of \(Z\) that are 1 unit apart correspond to two values of \(X\) that are 100 units apart. However, adding the constant 500 to all the values just shifts the center of the distribution and does not affect variability. Therefore, the standard deviation of \(X\) is 100 times the standard deviation of \(Z\).
In general, if \(Z\) has a Normal(0, 1) distribution then \(X = \mu + \sigma Z\) has a Normal(\(\mu\), \(\sigma\)) distribution.
Is \(Y\) the same random variable as \(X\)?
Does \(Y\) have the same distribution as \(X\)?
Donny Don’t says that the distribution of \(-Z\) will look like an “upside-down bell”. Is Donny correct? If not, explain why not and describe the distribution of \(-Z\).
Donny Don’t says that the standard deviation of \(-Z\) is -1. Is Donny correct? If not, explain why not and determine the standard deviation of \(-Z\).
A linear rescaling does not change the shape of a distribution, only the range of possible values. But what about a nonlinear transformation, like a logarithmic or square root transformation? In contrast to a linear rescaling, a nonlinear rescaling does not preserve relative interval length, so we might expect that a nonlinear rescaling can change the shape of a distribution. We’ll investigate by considering the Uniform(0, 1) spinner and a logarithmic transformation.
Let \(U\) represent the result of a single spin of the Uniform(0, 1) spinner. We’ll basically consider \(\log(U)\), but this leads to two minor technicalities: \(\log(u)\) is negative for \(0<u<1\), so we attach a minus sign to get nonnegative values, and since \(1-U\) also has a Uniform(0, 1) distribution, transforming \(1-U\) instead of \(U\) does not change the distribution of the result.
Therefore, it’s a little more convenient to consider the random variable \(X=-\log(1-U)\), which takes values in \([0,\infty)\). It also turns out, as we saw earlier, that \(-\log(1-u)\) is the quantile function of the Exponential(1) distribution. We have already seen that \(X\) has an Exponential(1) distribution; now we’ll take a closer look at why.
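As a quick check that \(-\log(1-u)\) is the Exponential(1) quantile function, set \(u\) equal to the Exponential(1) cdf and solve for \(x\):
\[
u = 1 - e^{-x} \quad\Longleftrightarrow\quad e^{-x} = 1-u \quad\Longleftrightarrow\quad x = -\log(1-u), \qquad 0<u<1.
\]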
The following code defines \(X\) and plots a few simulated values.
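A minimal sketch of such code, assuming Symbulate’s `log` can be applied to an `RV` (an equivalent definition is `RV(Uniform(0, 1), lambda u: -math.log(1 - u))`):

```python
from symbulate import *

U = RV(Uniform(0, 1))
X = -log(1 - U)            # the transformation u -> -log(1 - u)

X.sim(10000).plot()        # values pile up near 0 and thin out to the right
```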
Notice that values near 0 occur with higher frequency than larger values. For example, there are many more simulated values of \(X\) that lie in the interval \([0, 1]\) than in the interval \([3, 4]\), even though these intervals both have length 1. Let’s see why this is happening.
Interval of U | Length of U interval | Probability | Interval of X | Length of X interval |
---|---|---|---|---
(0, 0.1) | ||||
(0.1, 0.2) | ||||
(0.2, 0.3) | ||||
(0.3, 0.4) | ||||
(0.4, 0.5) | ||||
(0.5, 0.6) | ||||
(0.6, 0.7) | ||||
(0.7, 0.8) | ||||
(0.8, 0.9) | ||||
(0.9, 1) | ||||
Plug the endpoints into the conversion formula \(u\mapsto -\log(1-u)\) to find the corresponding \(X\) interval. For example, the \(U\) interval \((0.1, 0.2)\) corresponds to the \(X\) interval \((-\log(1-0.1), -\log(1-0.2)) = (0.105, 0.223)\). Since \(U\) has a Uniform(0, 1) distribution the probability is just the length of the \(U\) interval.
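For illustration, the endpoints and lengths in the table below can be reproduced with a short NumPy calculation:

```python
import numpy as np

u = np.linspace(0, 1, 11)              # U endpoints 0, 0.1, ..., 1
x = -np.log(1 - u)                     # corresponding X endpoints; the last is infinite

for a, b in zip(x[:-1], x[1:]):
    print(f"({a:.3f}, {b:.3f})   length {b - a:.3f}")
```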
Interval of U | Length of U interval | Probability | Interval of X | Length of X interval |
---|---|---|---|---
(0, 0.1) | 0.1 | 0.1 | (0, 0.105) | 0.105 |
(0.1, 0.2) | 0.1 | 0.1 | (0.105, 0.223) | 0.118 |
(0.2, 0.3) | 0.1 | 0.1 | (0.223, 0.357) | 0.134 |
(0.3, 0.4) | 0.1 | 0.1 | (0.357, 0.511) | 0.154 |
(0.4, 0.5) | 0.1 | 0.1 | (0.511, 0.693) | 0.182 |
(0.5, 0.6) | 0.1 | 0.1 | (0.693, 0.916) | 0.223 |
(0.6, 0.7) | 0.1 | 0.1 | (0.916, 1.204) | 0.288 |
(0.7, 0.8) | 0.1 | 0.1 | (1.204, 1.609) | 0.405 |
(0.8, 0.9) | 0.1 | 0.1 | (1.609, 2.303) | 0.693 |
(0.9, 1) | 0.1 | 0.1 | (2.303, Inf) | Inf |
We see that the logarithmic transformation does not preserve relative interval length. Each of the original intervals of \(U\) values has the same length, but the nonlinear logarithmic transformation “stretches out” these intervals in different ways. The probability that \(U\) lies in each of these intervals is 0.1. As the transformation stretches the intervals, the 0.1 probability gets “spread” over intervals of different length. Since probability/relative frequency is represented by area in a histogram, if two regions of differing length have the same area, then they must have different heights. Thus the shape of the distribution of \(X\) will not be Uniform.
The following plot illustrates the results of Example @ref(exm:uniform-log-transform-calcs). Each bar in the top histogram corresponds to the same color bar in the bottom histogram. All bars have area 0.1. In the top histogram, the bins have equal width so the heights are the same. However, in the bottom histogram the bars have different widths but the same area, so they must have different heights, and we start to see where the Exponential(1) shape comes from.
The following example provides a similar illustration, but from the reverse perspective.
Interval of X | Length of X interval | Probability | Interval of U | Length of U interval |
---|---|---|---|---
(0, 0.5) | ||||
(0.5, 1) | ||||
(1, 1.5) | ||||
(1.5, 2) | ||||
(2, 2.5) | ||||
(2.5, 3) | ||||
(3, 3.5) | ||||
(3.5, 4) | ||||
(4, 4.5) | ||||
(4.5, 5) | ||||
The corresponding \(U\) intervals are obtained by applying the inverse transformation \(v\mapsto 1-e^{-v}\). For example, the \(X\) interval \((0.5, 1)\) corresponds to the \(U\) interval \((1-e^{-0.5}, 1-e^{-1}) = (0.393, 0.632)\).
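Similarly, the \(U\) endpoints and probabilities in the table below follow from a short NumPy calculation:

```python
import numpy as np

x = np.linspace(0, 5, 11)              # X endpoints 0, 0.5, ..., 5
u = 1 - np.exp(-x)                     # corresponding U endpoints

for a, b in zip(u[:-1], u[1:]):
    # since U is Uniform(0, 1), the probability is the length of the U interval
    print(f"({a:.3f}, {b:.3f})   probability {b - a:.3f}")
```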
Interval of X | Length of X interval | Probability | Interval of U | Length of U interval |
---|---|---|---|---
(0, 0.5) | 0.5 | 0.393 | (0, 0.393) | 0.393 |
(0.5, 1) | 0.5 | 0.239 | (0.393, 0.632) | 0.239 |
(1, 1.5) | 0.5 | 0.145 | (0.632, 0.777) | 0.145 |
(1.5, 2) | 0.5 | 0.088 | (0.777, 0.865) | 0.088 |
(2, 2.5) | 0.5 | 0.053 | (0.865, 0.918) | 0.053 |
(2.5, 3) | 0.5 | 0.032 | (0.918, 0.95) | 0.032 |
(3, 3.5) | 0.5 | 0.020 | (0.95, 0.97) | 0.020 |
(3.5, 4) | 0.5 | 0.012 | (0.97, 0.982) | 0.012 |
(4, 4.5) | 0.5 | 0.007 | (0.982, 0.989) | 0.007 |
(4.5, 5) | 0.5 | 0.004 | (0.989, 0.993) | 0.004 |
Since \(U\) has a Uniform(0, 1) distribution the probability is just the length of the \(U\) interval. Each of the \(X\) intervals has the same length but they correspond to intervals of differing length in the original \(U\) scale, and hence intervals of different probability.
The following plot illustrates the results of Example @ref(exm:uniform-log-transform-calcs2) (plots on the right). These two examples give some insight into how the transformed random variable \(X = -\log(1-U)\) has an Exponential(1) distribution.
“Spreadsheet” calculations like those in the previous two examples can help when sketching the distribution of a transformed random variable.
For a linear rescaling, we could just plug the mean of the original variable into the conversion formula to find the mean of the transformed variable. However, this will not work for nonlinear transformations.
We know that since \(U\) has a Uniform(0, 1) distribution its long run average value is 0.5, and since \(X\) has an Exponential(1) distribution its long run average value is 1, but \(-\log(1 - 0.5) \neq 1\). The nonlinear “stretching” of the axis makes some values relatively larger and others relatively smaller than they were on the original scale, which influences the average. Remember that, in general, whether in the short run or the long run, \[ \text{Average of } g(X) \neq g(\text{Average of }X). \]
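A quick simulation check, using the same assumed Symbulate calls as above:

```python
from symbulate import *

U = RV(Uniform(0, 1))
X = -log(1 - U)

U.sim(10000).mean()        # close to 0.5
X.sim(10000).mean()        # close to 1, not -log(1 - 0.5) ≈ 0.693
```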
Recall that a function of a random variable is also a random variable. If \(X\) is a random variable, then \(Y=g(X)\) is also a random variable and so it has a probability distribution. Unless \(g\) represents a linear rescaling, a transformation will change the shape of the distribution. So the question is: what is the distribution of \(g(X)\)? We’ll focus on transformations of continuous random variables, in which case the key to answering the question is to work with cdfs.
Figure: A plot of the function \(u\mapsto -\log(1-u)\). The dotted lines illustrate that \(-\log(1-u)\le 1\) if and only if \(u\le 1-e^{-1}\approx 0.632\).
If \(X\) is a continuous random variable whose distribution is known, the cdf method can be used to find the pdf of \(Y=g(X)\):
1. Find the cdf of \(Y\): \(F_Y(y) = P(Y\le y) = P(g(X)\le y)\).
2. Rewrite the event \(\{g(X)\le y\}\) as an equivalent event involving \(X\) alone.
3. Compute the probability of that event to obtain \(F_Y(y)\), then differentiate to get the pdf, \(f_Y(y) = F_Y'(y)\).
You will need to use information about \(X\) at some point in the last step above. You can either:
- substitute the cdf of \(X\) and then differentiate, or
- differentiate first (using the chain rule) and then substitute the pdf of \(X\).
Either way gets you to the correct answer, but depending on the problem one way might be easier than the other. We’ll illustrate both methods in the next example.
Figure: A plot of the function \(x\mapsto x^2\) for \(-1<x<1\). The dotted lines illustrate that \(x^2\le 0.49\) if and only if \(-\sqrt{0.49}\le x\le \sqrt{0.49}\).
The table below helps us see how the transformation \(Y = X^2\) “pushes” density towards 0 if \(X\) has a Uniform(-1, 1) distribution.
Y interval | X interval | Length of X interval | Probability | Length of Y interval | Height (Probability / Length of Y interval) |
---|---|---|---|---|---|
(0, 0.1) | (-0.3162, 0) U (0,0.3162) | 0.6324 | 0.3162 | 0.1 | 3.162 |
(0.1, 0.2) | (-0.4472, -0.3162) U (0.3162,0.4472) | 0.2620 | 0.1310 | 0.1 | 1.310 |
(0.2, 0.3) | (-0.5477, -0.4472) U (0.4472,0.5477) | 0.2010 | 0.1005 | 0.1 | 1.005 |
(0.3, 0.4) | (-0.6325, -0.5477) U (0.5477,0.6325) | 0.1696 | 0.0848 | 0.1 | 0.848 |
(0.4, 0.5) | (-0.7071, -0.6325) U (0.6325,0.7071) | 0.1492 | 0.0746 | 0.1 | 0.746 |
(0.5, 0.6) | (-0.7746, -0.7071) U (0.7071,0.7746) | 0.1350 | 0.0675 | 0.1 | 0.675 |
(0.6, 0.7) | (-0.8367, -0.7746) U (0.7746,0.8367) | 0.1242 | 0.0621 | 0.1 | 0.621 |
(0.7, 0.8) | (-0.8944, -0.8367) U (0.8367,0.8944) | 0.1154 | 0.0577 | 0.1 | 0.577 |
(0.8, 0.9) | (-0.9487, -0.8944) U (0.8944,0.9487) | 0.1086 | 0.0543 | 0.1 | 0.543 |
(0.9, 1) | (-1, -0.9487) U (0.9487,1) | 0.1026 | 0.0513 | 0.1 | 0.513 |
We can use the table to sketch a histogram.
If we continued the above process with narrower and narrower \(Y\) intervals we would arrive at the smooth pdf given by \(f_Y(y) = \frac{1}{2\sqrt{y}}, 0<y<1\); see the black curve in the plot below.
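This is consistent with the cdf method described earlier: since \(X\) has a Uniform(-1, 1) distribution, probabilities are interval lengths divided by 2, so for \(0<y<1\),
\[
F_Y(y) = P(X^2\le y) = P(-\sqrt{y}\le X\le \sqrt{y}) = \frac{\sqrt{y} - (-\sqrt{y})}{1 - (-1)} = \sqrt{y},
\]
\[
f_Y(y) = F_Y'(y) = \frac{1}{2\sqrt{y}}, \qquad 0<y<1.
\]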
Now we’ll approximate the pdf via simulation. The density blows up near 0, so it’s hard for a chunky histogram to capture that, but we can see that the simulated values follow the distribution described by the smooth pdf \(f_Y(y) = \frac{1}{2\sqrt{y}}, 0<y<1\), shown as the black curve.
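A sketch of such a simulation, assuming the Symbulate calls used above plus matplotlib for the curve overlay:

```python
from symbulate import *
import matplotlib.pyplot as plt
import numpy as np

X = RV(Uniform(-1, 1))
Y = X ** 2

Y.sim(10000).plot()                        # histogram of simulated values of Y

y = np.linspace(0.005, 1, 200)
plt.plot(y, 1 / (2 * np.sqrt(y)), 'k-')    # overlay the pdf f_Y(y) = 1/(2*sqrt(y))
plt.show()
```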