Alpha Compositing

Transparency may not seem particularly exciting. The GIF image format which allowed some pixels to show through the background was published over 30 years ago. Almost every graphic design application released in the last two decades has supported the creation of semi-transparent content. The novelty of these concepts is long gone.

With this article I’m hoping to show you that transparency in digital imaging is actually much more interesting than it seems – there is a lot of invisible depth and beauty in something that we often take for granted.

Opacity

If you ever had a chance to look through rose-tinted glasses you may have experienced something akin to the simulation below. Try dragging the glasses around to see how they affect what’s seen through them:

The way these kinds of glasses work is that they let through a lot of red light, a decent amount of blue light, and only some amount of green light. We can write down the math behind these specific glasses using the following set of three equations. The letter R is the result of the operation and the letter D describes the destination we’re looking at. The RGB subscripts denote the red, green, and blue color components:

RR = DR × 1.0
RG = DG × 0.7
RB = DB × 0.9

This stained glass lets through the red, green, and blue components of the background with different intensities. In other words, the transparency of the rose-tinted glasses depends on the color of the incoming light. In general, transparency can vary with the wavelength of light, but in this simplified situation we’re only interested in how the glasses affect the classic RGB components.

A simulation of the behavior of regular sunglasses is much less complicated – they usually just attenuate some amount of the incoming light:

These sunglasses let through only 30% of the light passing through them. We can describe this behavior using these equations:

RR = DR × 0.3
RG = DG × 0.3
RB = DB × 0.3

All three color components are scaled by the same value – the absorption of the incoming light is uniform. We could say that the dark glasses are 30% transparent, or we could say that they’re 70% opaque. The opacity of an object defines how much light it blocks. In computer graphics we usually deal with a simplified model in which only a single value is needed to describe that property. Opacity can be spatially variant, like in the case of a column of smoke which becomes more transparent the higher it goes.
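
To make the role of those per-channel factors concrete, here is a minimal Python sketch (not part of the original interactive demos) that models both pairs of glasses as per-channel multipliers; the background color used in the example is an arbitrary value.

```python
# A per-channel light filter: each component of the incoming light is scaled
# by the filter's transmission factor for that component.
def filter_light(destination, transmission):
    return tuple(d * t for d, t in zip(destination, transmission))

rose_tinted = (1.0, 0.7, 0.9)  # lots of red, some green, most of the blue
sunglasses  = (0.3, 0.3, 0.3)  # uniform 30% transmission, i.e. 70% opacity

some_background = (0.8, 0.6, 0.4)
print(filter_light(some_background, rose_tinted))  # roughly (0.8, 0.42, 0.36)
print(filter_light(some_background, sunglasses))   # roughly (0.24, 0.18, 0.12)
```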

In the real world objects with 100% opacity are just opaque and they don’t let through any light. The world of digital imaging is slightly different. It has some, quite literally, edge cases where even solid opaque things can let through some amount of light.

Coverage

Vector graphics deal with pristine and infinitely precise descriptions of shapes that are defined using points, linear segments, Bézier curves, and other mathematical primitives. When it comes to putting the shapes onto a computer screen, those immaculate beings have to be rasterized into a bitmap:

Rasterization of a vector shape into a bitmap

The most primitive way to do it is to check if a pixel sample is on the inside or on the outside of the vector shape. In the examples below you can drag the triangle on either the zoomed or un-zoomed side; the latter lets you perform finer movements. The blue outline symbolizes the original vector geometry. As you can see, the steps on the edges of the triangle look unpleasant and they flicker heavily when the geometry is moved:

The flaw of this approach lies in the fact that we only do one test per output pixel and the results are quantized to one of the two possible values – in or out.

We could sample the vector geometry more than once per pixel to achieve a higher gradation of steps and decide that some pixels are covered only partially. One approach is to use four sample points which lets us represent five different levels of coverage: 0, 1/4, 2/4, 3/4, and 1:

The quality of the edges of the triangle is improved, but just five possible levels of coverage are very often not enough and we can easily do a much better job. While describing a pixel as a little square is frowned upon in the world of signal processing, in some contexts it is a useful model that lets us calculate the accurate coverage of a pixel by the vector geometry. An intersection of a line with a square can always be decomposed into a trapezoid and a rectangle:

A linear segment splits a square into a trapezoid and a rectangle

The area of those two parts can be calculated relatively easily and their sum divided by the square’s area gives the percentage of the pixel covered by the geometry. That coverage is calculated as an exact number and can be arbitrarily precise. The demonstration below uses that method to render significantly better looking edges that stay smooth when the triangle is dragged around:

When it comes to more complicated shapes like ellipses or Bézier curves, they’re very often subdivided into simple linear segments, allowing the coverage to be evaluated with as much accuracy as required.
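
As a rough illustration of coverage, here is a small Python sketch that estimates how much of a pixel a shape covers by testing a grid of sample points; it is the multi-sample approach described earlier rather than the exact trapezoid decomposition, and the half-plane test at the end is just an example shape.

```python
def estimate_coverage(inside, px, py, n=16):
    """Approximate the fraction of the pixel at (px, py) covered by a shape.

    `inside(x, y)` should return True when the point lies inside the shape.
    With n=2 (four samples) we get the five coverage levels mentioned above.
    """
    hits = 0
    for i in range(n):
        for j in range(n):
            # Sample at the centers of an n-by-n grid of sub-cells.
            x = px + (i + 0.5) / n
            y = py + (j + 0.5) / n
            hits += inside(x, y)
    return hits / (n * n)

# Example: the half-plane y < x covers half of the pixel at (0, 0).
print(estimate_coverage(lambda x, y: y < x, 0, 0))  # 0.46875, approaching 0.5 as n grows
```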

Partial coverage is crucial for high quality rendering of vector graphics, including, most importantly, the rendering of text. If you look closely at a screenshot of this page you’ll notice that almost all the edges of the glyphs end up covering the pixels only partially:

Text rendering heavily relies on partial coverage

With the object’s opacity in one hand and its coverage of individual pixels in the other, we’re ready to combine them into a single value.

Alpha

The product of an object’s opacity and its pixel coverage is known as alpha:

alpha = opacity × coverage

An object that’s 60% opaque and covers 30% of a pixel’s area has an alpha value of 18% in that pixel. Naturally, when an object is transparent, or it doesn’t cover a pixel at all, its value of alpha at that pixel is equal to 0. Once multiplied, the distinction between the opacity and the coverage is lost, which in some sense justifies the synonymous use of the terms “alpha” and “opacity”.

Alpha is often presented as a fourth channel of a bitmap. The usual values of red, green, and blue are accompanied by the alpha value to form the quadruple of RGBA values.

When it comes to storing the alpha value in memory, one could be tempted to use only a few bits. If you consider just the pixel coverage at the edges of opaque objects, 4 or even 3 bits of alpha often look good enough, depending on the pixel density of your display:

However, opacity also contributes to the alpha value, so a low bit-depth may be catastrophic for the quality of some smoothly varying transparencies. In the visualization below a gradient spanned between an opaque black and a clear color shows that low bit-depths result in heavily banded images:

Clearly, the more bits the better, and most commonly an alpha bit-depth of 8 is used to match the precision of the color components, making many RGBA buffers occupy 32 bits per pixel. It’s worth pointing out that unlike the color components, which are often encoded using a non-linear transformation, alpha is stored linearly – an encoded value of 0.5 corresponds to an alpha value of 0.5.

So far when talking about alpha we’ve completely ignored any color components, but other than blocking the background color a pixel can add some color of its own. The idea is fairly simple – a semi-transparent pink object blocks some of the incoming background light and emits, or reflects, some of the pink light:


Note that this behaves differently from stained glass. The glass merely blocks some of the background light with varying intensities. Looking at a pitch black object through the rose-tinted glasses maintains the blackness, since a black object doesn’t emit or reflect any light in the first place. However, the semi-transparent pink object adds its own light. Placing it on top of a black object produces a pinkish result. A good equivalent of that behavior is a fine material suspended in the air, be it haze, smoke, fog, or some colorful powder.

Visualizing the alpha channel is a little complicated – a perfectly transparent object is by definition invisible, so I’ll use two tricks to make things discernible. A checkerboard background shows which parts of the image are clear, a pattern used in many graphic design applications:

Checkerboard pattern showing clear parts

The four little squares under the image tell us we’re looking at red, green, blue, and alpha components of the image. In some cases it’s useful to see the values of the alpha channel directly and the easiest way to present them is by using shades of gray:

Showing RGB and A values as separate planes

The brighter the gray the higher the alpha value and so pure black corresponds to 0% alpha and pure white means 100% alpha. The little squares show that RGB and A components of the image have been separated into two planes.

On its own the alpha component isn’t particularly useful, but it becomes critical when we talk about compositing.

Simple Compositing

Very few 2D rendering effects can be achieved with a single operation and we use the process of compositing to combine multiple images to create the final output. As an example, a simple “Cancel” button can be created by a composition of five separate elements:

Compositing elements of a “Cancel” Button

Compositing is often performed in multiple steps where each step combines two images. The usual name for the foreground image being composed is a source. The background image that is being composed into is called a destination. It’s easy to remember if you think of it in terms of the new pixels from the source arriving at the destination.

We’ll start with compositing against an opaque background since it is a very common case. Everything you see on your screen ultimately ends up being composited into a non-transparent destination.

When the value of alpha of the source is 100% the source is opaque and it should cover the destination completely. For the alpha value of 0% the source is completely transparent and the destination should not be affected. An alpha value of 25% allows the object to emit 25% of its light and lets through 75% of the light from the background, and so on:

Compositing of purple sources with various alpha values into a yellow destination

You may see where this is all going – the simple case of alpha compositing against an opaque background is just a linear interpolation between the destination and the source colors. In the chart below the slider controls the alpha value of the source, while the red, green, and blue plots depict the values of the RGB components. The result R is just a mix between the source S and the destination D:

We can describe what’s going on using the following equations. As before, the subscript denotes the component, so SA is the alpha value of the source, while DG is the green value of the destination:

RR = SR × SA + DR × (1 − SA)
RG = SG × SA + DG × (1 − SA)
RB = SB × SA + DB × (1 − SA)

The equations for the red, green, and blue components have the same form so we can just use an RGB subscript and combine everything into a single line:

RRGB = SRGB × SA + DRGB × (1 − SA)

Moreover, since the destination is opaque and already blocks all the background light we know that the alpha value of the result is always equal to 1:

RA = 1
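
A minimal Python sketch of the equations above, assuming an opaque destination; the purple and yellow values are arbitrary examples matching the figure earlier in this section:

```python
# Compositing a source with alpha src_alpha against an opaque destination is a
# plain linear interpolation of every color channel.
def composite_over_opaque(src_rgb, src_alpha, dst_rgb):
    return tuple(s * src_alpha + d * (1.0 - src_alpha)
                 for s, d in zip(src_rgb, dst_rgb))

purple, yellow = (0.5, 0.0, 0.5), (1.0, 1.0, 0.0)
print(composite_over_opaque(purple, 0.25, yellow))  # (0.875, 0.75, 0.125)
```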

Compositing against an opaque background is simple, but unfortunately quite limited. There are many cases where a more robust solution is needed.

Intermediate Buffers

In the image below you’ll see a two step process of compositing three different layers marked as A, B, and C. I’ll use the symbol ⇨ to denote “composed onto”:

Compositing result of three layers in two steps

We first compose B onto C, and then we compose A onto that to achieve the final image. In the next example we’re going to do things slightly differently. We’ll compose the two topmost layers first, and then we’ll compose that result into the final destination:

Compositing result of three layers in two steps with different order

You may wonder if that situation happens in practice, but it’s actually very common. A lot of non-trivial compositing and rendering effects like masking and blurring require a step through an intermediate buffer that contains only partial results of compositing. That concept is known under different names: offscreen passes, transparency layers, or side buffers, but the idea is usually the same.

More importantly, almost any image with transparency can be thought of as a partial result of some rendering that, at some later time, will be composed to its final destination:

Partial composition of a button into a buffer

We want to figure out how to replace the composition of semi-transparent images A and B with a single image (A⇨B) that will have the same color and opacity. Let’s start by calculating the alpha value of the resulting buffer.

Combining Alphas

It may be unclear how to combine opacities of two objects, but it’s easier to reason about the issue when we think about the transparencies instead.

Consider some amount of light passing through a first object and then through a second object. If the transparency of the first object is 80%, it will let through 80% of the incoming light. Similarly, a second object with transparency 60% will let through 60% of the light passing through it, which is 60% × 80% = 48% of the original light. You can try this in the demonstration below; remember that the sliders control the transparency and not the opacity of the objects in the path of light:

Naturally, when either the first or the second object is opaque no light passes through even when the other one is fully transparent.

If an object D has transparency DT and object S has transparency ST, then the resulting transparency RT of those two objects combined is equal to their product:

RT = DT × ST

However, transparency is just one minus alpha, so a substitution gets us:

1 − RA = (1 − DA) × (1 − SA)

Which expands to:

1 − RA = 1 − DA − SA + DA × SA

And simplifies to:

RA = DA + SA − DA × SA

This can be further minimized to one of the two equivalent forms:

RA = SA + DA × (1 − SA)
RA = DA + SA × (1 − DA)

We’ll soon see that the former is more commonly used. As an interesting side note, notice that the resulting alpha doesn’t depend on the relative order of the objects – the opacity of the resulting pixels is the same even when the source and destination are switched. This makes a lot of sense. Light passing through two objects should be equally dimmed whether it shines through them front to back or back to front.

Combining Colors

Calculating alpha wasn’t that difficult so let’s try to reason through the math behind the RGB components. The source image has a color SRGB, but its opacity SA causes it to contribute only the product of those two values to the final result:

SRGB×SA

The destination image has a color DRGB and its opacity causes it to emit DRGB×DA light; however, some part of that light is blocked by the opacity of S, so the entire contribution from the destination is equal to:

DRGB×DA×(1 − SA)

The total light contribution from both S and D is equal to their sum:

SRGB×SA + DRGB×DA×(1 − SA)

Similarly, the contribution of the merged layers is equal to their color times their opacity:

RRGB×RA

We want those two values to match:

RRGB×RA = SRGB×SA + DRGB×DA×(1 − SA)

Which gives us the final equations:

RA = SA + DA × (1 − SA)
RRGB = (SRGB×SA + DRGB×DA×(1 − SA)) / RA

Look how complicated the second equation is! Notice that to obtain the RGB values of the result we have to do a division by the alpha value. However, any subsequent compositing step will require a multiplication by the alpha value again, since the result of a current operation will become a new source or destination in the next operation. This is just inelegant.

Let’s go back to the almost final form of RRGB for a second:

RRGB×RA = SRGB×SA + DRGB×DA×(1 − SA)

The source, the destination, and the result are all multiplied by their alpha components. This hints that a pixel’s color and its alpha like to stay together, so let’s take a step back and rethink the way we store the color information.

Premultiplied Alpha

Recall how we talked about opacity – if the object is partially opaque the contribution of its color to the output is also partial. Premultiplied alpha makes this idea explicit. The values of the RGB components are, as the name implies, premultiplied by the alpha component. Starting with a non-premultiplied color:

(1.00, 0.80, 0.30, 0.40)

Alpha premultiplication results in:

(0.40, 0.32, 0.12, 0.40)
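
A small sketch of that conversion, assuming straight RGBA values in the normalized 0.0 to 1.0 range:

```python
def premultiply(r, g, b, a):
    # Scale the color components by the alpha component.
    return (r * a, g * a, b * a, a)

def unpremultiply(r, g, b, a):
    # The inverse operation; every fully transparent color collapses to one value.
    if a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)
    return (r / a, g / a, b / a, a)

print(premultiply(1.00, 0.80, 0.30, 0.40))  # roughly (0.40, 0.32, 0.12, 0.40)
```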

Let’s look at more than one pixel at a time. The following picture shows how color information is stored with non-premultiplied alpha:

RGB and A information in a non-premultiplied image

Notice that the areas where alpha is 0 can have arbitrary RGB values, as shown by the green and cyan glitches in the image. With premultiplied alpha the color information also carries the values of the pixel’s opacity:

RGB and A information in a premultiplied image

Premultiplied alpha is sometimes called associated alpha, while non-premultiplied alpha is occasionally referred to as straight or unassociated alpha.

When the alpha component of a color is equal to 0, the premultiplication nullifies all the other components, regardless of what was in them:

(0.0, 0.0, 0.0, 0.0)

With premultiplied alpha there is only one completely transparent color – a charming simplicity.

The benefits of this treatment of color components may slowly start to become clear, but before we go back to our compositing example let’s see how premultiplied alpha helps to solve some other rendering issues.

Filtering

A Gaussian blur is a popular way to either provide an interesting unfocused background, or to reduce the high frequency details of the underlying contents in some UI elements. As we’ll see, alpha premultiplication is critical to achieve correct-looking blurs.

The image we’ll analyze is created by filling the background with 1% opaque blue color and then painting an opaque red circle on top. First, let’s consider a non-premultiplied example. I separated RGB channels from the alpha channel to make it easier to see what’s going on. The arrow symbolizes the blur operation:

Blurring non-premultiplied content

The final result has an ugly blue halo around it. This happens because the blue background seeps into the red area during the blur and then it’s weighted by the alpha at composite time.

When the colors are alpha premultiplied the final result is correct:

Blurring premultiplied content

Due to the premultiplication the blue color in the image is reduced to 1% of its original strength so its impact on the colors of the blurred circle is expectedly minimal.
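
The halo problem is easy to reproduce in a few lines. The sketch below is a simplification – a 1D row of pixels and a box filter instead of a Gaussian – that blurs the same two pixels with straight and with premultiplied values:

```python
def box_blur(row):
    # Average each RGBA pixel with its immediate neighbors, channel by channel.
    return [tuple(sum(channel) / len(channel)
                  for channel in zip(*row[max(0, i - 1):i + 2]))
            for i in range(len(row))]

# A 1%-alpha blue pixel next to an opaque red pixel, straight (R, G, B, A).
straight = [(0.0, 0.0, 1.0, 0.01), (1.0, 0.0, 0.0, 1.0)]
premult  = [(r * a, g * a, b * a, a) for r, g, b, a in straight]

print(box_blur(straight))  # the full-strength blue bleeds into the red pixel
print(box_blur(premult))   # the blue contributes only 1% of its intensity
```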

Interpolation

Rendering an image whose pixels are perfectly aligned with a destination is easy since we have a trivial one-to-one mapping between the samples. A problem arises when the simple mapping doesn’t exist, e.g. due to rotation, scaling, or translation. In the picture below you can see how the pixels of a rotated image shown with a red outline no longer align with a destination:

Relative orientation of image and destination pixels before and after rotation

There are multiple ways of deciding which color from the image should be put into the destination pixel and the easiest option is the so-called nearest-neighbor interpolation which simply chooses the nearest sample in the texture as the resulting color.

In the demonstration below the red outline shows the position of the image in the destination. The right side presents the sample positions from the point of view of the image. By dragging the slider you can rotate the quad to see how the samples pick up colors from the bitmap. I highlighted a single pixel in both the source and the destination to make it easier to see how they relate:

That approach works and the pixels are consistently colored, but the quality is unacceptable. A better way to do it is to use bilinear interpolation which calculates a weighted average of the four nearest pixels in the sampled image:

This is better, but the edges around the rectangles still don’t look right – the contents of the non-premultiplied pixels bleed in since the alpha is “applied” after interpolation. The sometimes recommended approach of bleeding the color of the valid contents out, which you can see some examples of in Adrian Courrèges’ fantastic article, is far from perfect – no color would make the one pixel gap between the red and the blue rectangle look correct.

Let’s see how things look when an image is alpha premultiplied, and, to foreshadow a little, composited using a better equation that we’ll derive in a minute:

It’s just perfect – we got rid of all the color bleeding and the jaggies are nowhere to be seen.

The problems with blurring and interpolation are ultimately closely related. Any operation that requires some combination of semi-transparent colors will likely produce incorrect results unless the colors are alpha premultiplied.
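
A tiny sketch of the bleeding problem: averaging two neighboring samples, which is the core of bilinear interpolation, behaves very differently on straight and on premultiplied values. The colors are made-up examples:

```python
def midpoint(p, q):
    # Interpolate halfway between two RGBA samples, channel by channel.
    return tuple((a + b) / 2 for a, b in zip(p, q))

transparent_green = (0.0, 1.0, 0.0, 0.0)  # invisible, but carries a green color
opaque_red        = (1.0, 0.0, 0.0, 1.0)

print(midpoint(transparent_green, opaque_red))  # (0.5, 0.5, 0.0, 0.5) – green bleeds in

premultiply = lambda c: (c[0] * c[3], c[1] * c[3], c[2] * c[3], c[3])
print(midpoint(premultiply(transparent_green), premultiply(opaque_red)))  # (0.5, 0.0, 0.0, 0.5)
```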

Compositing Done Right

Let’s jump back to compositing. We’ve left the discussion with the almost finished derivation:

RRGB×RA = SRGB×SA + DRGB×DA×(1 − SA)

If we represent the colors using premultiplied alpha all those pesky multiplications disappear since they’re already part of the color values and we end up with:

RRGB = SRGB + DRGB×(1 − SA)

Let’s have a look at the equation for the alpha:

RA = SA + DA × (1 − SA)

The factors for red, green, blue, and alpha channels are all the same, so we can just express the entire thing using a single equation and just remember that every component undergoes the same operation:

R = S + D × (1 − SA)

Look how premultiplied alpha made things beautifully simple. When we analyze the components of the equation they all fit right in. The operation masks some background light and adds new light:

R = S + D × (1 − SA)

This blending operation is known as source-over, sover, or just normal, and, without a doubt, it is the most commonly used compositing mode. Almost everything you see on this website was blended using this mode.
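
In code, source-over on premultiplied values is a one-liner per channel. A minimal Python sketch, with arbitrary example colors:

```python
# R = S + D × (1 − SA), applied to every component of a premultiplied RGBA tuple.
def source_over(src, dst):
    return tuple(s + d * (1.0 - src[3]) for s, d in zip(src, dst))

# A 50%-alpha premultiplied red over an opaque green destination.
print(source_over((0.5, 0.0, 0.0, 0.5), (0.0, 1.0, 0.0, 1.0)))  # (0.5, 0.5, 0.0, 1.0)
```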

Associativity

An important feature of source-over on premultiplied colors is that it’s associative. In a complex blending equation this lets us place the parentheses completely arbitrarily. All the compositions below are equivalent:

R = (((A⇨B)⇨C)⇨D)⇨E
R = (A⇨B)⇨(C⇨(D⇨E))
R = A⇨(B⇨(C⇨(D⇨E)))

The proof is fairly simple, but I won’t bore you with the algebraic manipulations. The practical implication is that we can perform partial renders of complicated drawings without fear that the final composition will look any different.
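
A quick numeric check of that claim, reusing the source-over sketch from above on three arbitrary premultiplied colors:

```python
def source_over(src, dst):
    return tuple(s + d * (1.0 - src[3]) for s, d in zip(src, dst))

A = (0.2, 0.1, 0.0, 0.3)
B = (0.0, 0.4, 0.1, 0.5)
C = (0.3, 0.3, 0.3, 1.0)

left  = source_over(A, source_over(B, C))   # A ⇨ (B ⇨ C)
right = source_over(source_over(A, B), C)   # (A ⇨ B) ⇨ C
print(all(abs(l - r) < 1e-12 for l, r in zip(left, right)))  # True
```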

The vast majority of the time alpha is used purely for source-over compositing, however, its power doesn’t stop there. We can use the alpha values for other useful rendering operations.

Porter-Duff

In July 1984 Thomas Porter and Tom Duff published a seminal paper entitled “Compositing Digital Images”. The authors not only introduced the concept of premultiplied alpha and derived the equation behind source-over compositing, but they also presented an entire family of alpha compositing operations, many of which are not well known despite being very useful. The introduced functions are also known as operators, since, similarly to addition or multiplication, they operate on input values to create an output value.

Over

In the following examples we’ll use the interactive demos that show how different blend modes operate. The destination has a black clubs symbol ♣, while the source has a red hearts symbol ♥. You can drag the heart around to see how the overlap of the two shapes behaves under different compositing operators. Notice the little minimap in the corner. Some blend modes are quite destructive and it may be easy to lose track of what’s where. The minimap always shows the result of simple source-over compositing which should make it easier to navigate:

R = S + D × (1 − SA)
R = S × (1 − DA) + D

If you switch to destination-over you’ll quickly realize it’s just a “flipped” source-over: the destination and source swap places in the equation and the result is equivalent to treating the source as the destination and the destination as the source. While seemingly redundant, the destination-over operator is quite useful as it lets us composite things behind already existing contents.

Out

The source-out and destination-out operators are great for punching holes in the source and destination respectively:

R = S × (1 − DA)
R = D × (1 − SA)

Destination-out is the more convenient operator of the two; it uses the alpha channel of the source to punch out the existing destination.

In

The source-in and destination-in are essentially masking operators:

R = S × DA
R = D × SA

They make it fairly easy to obtain complex intersections of complicated geometry without resorting to the relatively difficult to compute intersections of vector-based paths.

Atop

The source-atop and destination-atop operators allow overlaying new contents on top of the existing ones, while simultaneously masking them to the destination:

R = S × DA + D × (1 − SA)
R = S × (1 − DA) + D × SA

Xor

The exclusive-or operator, or just xor, keeps the source and the destination only where they don’t overlap – their overlap disappears:

R = S × (1 − DA) + D × (1 − SA)

Source, Destination, Clear

The last three classic compositing modes are rather dull. Source, also known as copy, just takes the source color. Similarly, destination ignores the source color and simply returns the destination. Finally, clear simply wipes everything:

R = S
R = D
R = 0

The usefulness of those modes is limited. A dirty buffer may be reset using clear, which can be optimized to just filling memory with zeros. Additionally, in some cases source may be cheaper to evaluate since it doesn’t require any blending at all – it just replaces the contents of a buffer with the source.

Porter-Duff in Action

With the individual operators out of the way let’s see how we can combine them. In the example below we’ll paint a nautical motif in nine steps without using any masking or complex geometrical shapes. The blue outlines show the simple geometry that is emitted. You can advance the steps by clicking/tapping on the right half of the image and go back by clicking/tapping on the left half of the image:

You should by no means forget about masks and clip paths, but Porter-Duff compositing modes are an often forgotten tool that makes some visual effects much easier to achieve.

Operators

If you ponder on the Porter-Duff operators we’ve discussed you may notice that they all have the same form. A source is always multiplied by some factor FS and added to the destination multiplied by a factor FD:

R = S×FS + D×FD

Possible values of FS are 0, 1, DA, and 1 − DA, while FD can be either 0, 1, SA, or 1 − SA. It doesn’t make much sense to multiply the source or destination by their respective alphas since they’re already premultiplied and we’d just achieve a wacky, but not particularly useful, alpha-squaring effect. We can present all the operators in a table:

FD \ FS │ 0               │ 1           │ DA          │ 1 − DA
0       │ clear           │ source      │ source-in   │ source-out
1       │ destination     │             │             │ destination-over
SA      │ destination-in  │             │             │ destination-atop
1 − SA  │ destination-out │ source-over │ source-atop │ xor

Notice the symmetry of the operators along the diagonal. The four central entries in the table are missing and that’s because they’re unlike the others.
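
Since every operator shares the R = S×FS + D×FD form, the whole table can be captured in a single routine. A hedged Python sketch, with the factor pairs written as functions of the source and destination alphas:

```python
# (FS, FD) factor pairs for the classic Porter-Duff operators.
OPERATORS = {
    "clear":            (lambda sa, da: 0.0,      lambda sa, da: 0.0),
    "source":           (lambda sa, da: 1.0,      lambda sa, da: 0.0),
    "destination":      (lambda sa, da: 0.0,      lambda sa, da: 1.0),
    "source-over":      (lambda sa, da: 1.0,      lambda sa, da: 1.0 - sa),
    "destination-over": (lambda sa, da: 1.0 - da, lambda sa, da: 1.0),
    "source-in":        (lambda sa, da: da,       lambda sa, da: 0.0),
    "destination-in":   (lambda sa, da: 0.0,      lambda sa, da: sa),
    "source-out":       (lambda sa, da: 1.0 - da, lambda sa, da: 0.0),
    "destination-out":  (lambda sa, da: 0.0,      lambda sa, da: 1.0 - sa),
    "source-atop":      (lambda sa, da: da,       lambda sa, da: 1.0 - sa),
    "destination-atop": (lambda sa, da: 1.0 - da, lambda sa, da: sa),
    "xor":              (lambda sa, da: 1.0 - da, lambda sa, da: 1.0 - sa),
}

def composite(operator, src, dst):
    """Apply a Porter-Duff operator to premultiplied RGBA tuples."""
    fs, fd = OPERATORS[operator]
    s_factor, d_factor = fs(src[3], dst[3]), fd(src[3], dst[3])
    return tuple(s * s_factor + d * d_factor for s, d in zip(src, dst))
```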

Additive Lighting

In their paper Porter and Duff presented one more operator created when both FS and FD are equal to 1. It’s known as plus, lighter, or plus-lighter:

R = S + D

This operation effectively adds source light to the destination:

Additive light with the plus operator

Green and red correctly created yellow, while green and blue produced cyan. The black color is a no-op – it doesn’t modify the color values in any way since adding zero to a number doesn’t do anything.

The three remaining operators were never named and that’s because they’re not particularly useful. They’re simply a combination of masking and additive blending.

It’s also worth noting that premultiplied alpha lets us abuse the source-over operator. Let’s look at the equation again:

R = S + D × (1 − SA)

If we manage to make the alpha value of the source equal to zero, but have non-zero values in the RGB channels, we can achieve additive lighting without using the plus operator:

Additive light with the source-over operator

Note that you have to be careful – the values are no longer correctly premultiplied. Some pieces of software may have an optimization that avoids blending colors with zero alpha completely, while others may un-premultiply the color values, do some color operations, and then premultiply again, which would wipe the color channels completely. It may also be difficult to export assets in that format, so you should stick to plus unless you control your rendering pipeline completely.

So far all the pieces I’ve talked about have fit together gracefully. Let’s take the rose-tinted glasses off and discuss some of the issues that one should be aware of when dealing with alpha compositing.

Group Opacity

Let’s look at this simple drawing of a pill done using just six primitives:

Drawing a pill using simple shapes

If we were asked to render the pill at 50% opacity, one could be tempted to just halve the opacity of every drawing operation; however, that proves to be a faulty approach:

Unexpected rendering of a pill at half opacity

To achieve the correct result we can’t just distribute the object’s opacity into each of its individual components. We have to create the object first by rendering it to a bitmap, then modify the opacity of that bitmap, and finally compose:

Expected rendering of a pill at half opacity

This is yet another case when the concept of rendering into a side buffer shows its usefulness.

Compositing Coverage

Conversion of geometrical coverage to a single alpha value has some unfortunate consequences. Consider a case in which two perfectly overlapping edges of vector geometry, depicted below with orange and blue outlines, are rendered into a bitmap. In an ideal world the results should look something like this since each pixel is perfectly covered:

An ideal result of rendering with correct coverage

However, if we first render the orange geometry and then the blue geometry the edge pixels of the result will still have some white background bleeding in:

Two-step composited result

Once the coverage is stored in the alpha channel all its geometrical information is lost and there is no way to recreate it. The blue geometry just blends with some contents in the buffer but it simply can’t know that the geometry behind the reddish pixels was intended to match it. This issue is particularly visible when the geometries are perfectly overlapping. In the picture below a white circle is painted on top of black circle. You can see the dark edges despite both circles having the exact same radius and position:

A white circle drawn on top of a black circle

One way to avoid this problem is to not calculate partial coverage of pixels, but instead use significantly scaled-up buffers. By rasterizing the vector geometry with a simple in/out coverage and then scaling the final result down to the original size we can achieve the expected outcome.

However, to perfectly match the quality of edge rendering of an 8-bit alpha channel the buffers would have to be 256 times larger in both directions – an increase in the number of pixels by a whopping factor of 2^16 (65,536). As we’ve seen, reducing the bit-depth of coverage can still produce satisfying results, so in practice smaller scales could be used.

It’s worth noting that those issues can often be avoided relatively easily even without using any super-sized bitmaps. For example, instead of drawing two overlapping circles one could just draw two squares on top of each other and then mask the result to the shape of the circle.

Linear Values

If you had a chance to brush up on your knowledge of color spaces you may remember that most of them encode the color values nonlinearly and performing correct mathematical operations requires a prior linearization. When that step is done the result of compositing looks as follows – notice the nice yellowish tint of the overlapping parts:

Fuzzy red circles composited on a green background using linear values

For better or for worse, this is not how most compositing works. The default way to do things on the web and in most graphic design software is to directly blend non-linear values:

Fuzzy red circles composited on a green background using non-linear values

See how much darker the overlap between the reds and the greens is. This is far from perfect, but there are some cases when doing things incorrectly has become deeply ingrained in the way we think about color. For example, opaque 50% gray from sRGB space looks exactly the same as pure black at 50% opacity blended with a white background:

Composition of two colors against a white background without linearization

In the picture below the sRGB colors from the source and destination are linearized, composited, and then converted back to non-linear encoding for display. This is how those colors should actually look:

Composition of two colors against a white background with linearization

We’ve now introduced a discrepancy that goes against expectations. The only way to achieve visual parity using that method is to pick all the colors using linear values, but those are quite different from what everyone is used to. A 50% gray with linear values looks like a 73.5% sRGB gray.

Additionally, special care has to be taken when dealing with premultiplied alpha. The premultiplication should happen on the linear values, i.e. before the non-linear encoding happens. This way the linearization step ends up with properly premultiplied linear values.

Premultiplied Alpha and Bit Depth

While very useful for compositing, filtering, and interpolation, premultiplied alpha isn’t a silver bullet and has some drawbacks. The biggest one is the reduction of the bit-depth of representable colors. Let’s think about an 8-bit encoding of the value 150 that is premultiplied by 20% alpha. After alpha premultiplication we get:

round(150 × 0.2) = 30

If we repeat the same procedure for a value of 151 we get:

round(151 × 0.2) = 30

The encoded value is the same despite the difference in the input value. In fact, values 148, 149, 150, 151, and 152 end up being encoded as 30 after alpha premultiplication, the original distinction between those five unique numbers is lost:

Premultiplication by 20% alpha collapses different 8-bit values into the same one

Naturally, the lower the alpha the more severe the effect is. Of the 256^4 (roughly 4.3 billion) possible combinations of 8-bit RGBA values only 25.2% end up having a unique representation after premultiplication – we’re effectively wasting almost 2 bits of the 32-bit range.
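
The 25.2% figure can be reproduced with a short script: for every 8-bit alpha we count the distinct premultiplied values a single channel can take and combine the three channels. A sketch, assuming round-to-nearest premultiplication:

```python
unique_total = 0
for a in range(256):
    # Distinct values a single 8-bit channel can take after premultiplying by alpha a.
    per_channel = len({round(v * a / 255) for v in range(256)})
    unique_total += per_channel ** 3  # three independent color channels

print(f"{unique_total / 256 ** 4:.1%}")  # 25.2%
```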

Transformation of colors between various color spaces often requires the color components to be un-premultiplied, that is divided by the alpha component, to get the original intensity of the color. That step is mandatory due to the already mentioned non-linear nature of the encoding. The existence of premultiplication reduces the accuracy of color representation and the conversions between color spaces may be imperfect.

In practice the reduction of bit depth is rarely important, especially when it comes to compositing. The lower the alpha value the less visible the color is and the less impactful its compositing becomes. If you’re aiming for pedantically accurate operations on colors you wouldn’t be using 8-bit representation in the first place – the floating point formats are much better suited for this purpose.

Further Reading

The concept of an alpha channel was created by Alvy Ray Smith and Ed Catmull – the cofounders of Pixar. The former’s “Alpha and the History of Digital Compositing” describes the history of the invention and origins of the name “alpha”, while also showing how those concepts built on and significantly surpassed the matting used in filmmaking.

For a detailed discussion of the meaning of alpha I highly recommend Andrew Glassner’s “Interpreting Alpha”. His paper provides a rigorous, but very readable mathematical derivation of alpha as an interaction between opacity and coverage.

Finally, for a thorough discourse on premultiplied alpha you can’t go wrong with “GPUs prefer premultiplication” by Eric Haines. He not only provides a great overview of the issues caused by not premultiplying, especially as they relate to 3D rendering, but also links to a bunch of other articles that discuss the problem.

Final Words

This article originally started as a simple exposé of Porter-Duff compositing operators, but all the other concepts related to alpha compositing were just too interesting to leave out.

One thing I particularly like about alpha is that it’s just an extra number that accompanies the RGB components, yet it opens up so many unique rendering opportunities. It literally created a new dimension of what could be achieved in the plain old world of compositing and 2D rendering.

The next time you look at smooth edges of vector shapes, or you notice a dark overlay making some parts of a user interface dimmer, remember it’s just a small but mighty powerful component making it all possible.

Color Spaces

For the longest time we didn’t have to pay a lot of attention to the way we talk about color. The modern display technologies capable of showing more vivid shades have, for better or for worse, changed the rules of the game. Once esoteric ideas like a gamut or a color space are becoming increasingly important.

A color can be described in many different ways. We could use words, list amounts of CMYK printer inks, enumerate quite flawed HSL and HSV values, or even quantify the responses of cells in a human retina. Those notions are useful in some contexts, but I’m not going to focus on any of them.

This article is dedicated entirely to RGB values from RGB color spaces. It may seem fairly restrictive, but considering the domination of displays as the medium for color presentation, it is a pragmatic approach and it ultimately won’t prevent us from describing everything we can see.

A dry definition of a color space is not a good way to kick things off. Instead, we’ll start by playing with one of the most common tools used to specify colors.

Color Pickers

You’ve probably seen an RGB color picker before; it usually looks like this:

By dragging all the sliders to the right you can create a white color. Lots of red and green with little blue produces shades of yellow. That color picker is not the only way to specify colors. Try using the one below:

The gist of the behavior of that new picker is the same – the red slider controls the red component, the green slider affects the green component, and the blue slider acts on the blue component. Both sets of sliders have the minimum value of 0 and the maximum value of 255. However, the shades and intensities they create are quite different. We can compare some of the colors for the same slider positions side by side. In each plate the top half shows the color from the first picker and the bottom half contains the color from the second picker:

The only color that looks the same is pure black; none of the other colors match despite the same values of red, green, and blue. You may be wondering which halves of the plates are wrong, but the truth is that they’re both correct – they just come from different RGB color spaces.

I’ll soon explain what causes those differences and we’ll eventually see what defines a color space, but for now the important lesson is that on their own the numeric values of the red, green, and blue components have no meaning. A color space is what assigns the meaning to those numeric values.

What I’ve shown you above is an extreme example in which almost everything about the two color spaces is different. Over the course of the next few paragraphs we will be looking at simpler examples which will help us get a feel for what different aspects of color spaces mean. Before we discuss those aspects we should make a quick detour to revisit how the components of an RGB color are described.

Normalized Range

When dealing with RGB colors you’ll often see them specified in a 0 to 255 range:

255, 230, 26

Or in an equivalent hexadecimal form:

ff, e6, 1a

While using the range of 0 to 255 is convenient, especially for specifying colors on the web, it ties the description of color to a specific 8-bit depth. What we actually want to express is the percentage of the maximum intensity a red, green, or blue component can represent.

In our discourse I’ll use a so-called normalized range where the minimum value is still 0.0, but the maximum value is 1.0. Calculating normalized values from the 0 to 255 range is easy – just divide the source numbers by 255.0:

1.000, 0.902, 0.102

This form may feel less familiar, but it lets us talk about the values without having to care what kind of discrete encoding scheme they’ll eventually use.

Intensity Mismatch

Let’s continue playing with the color pickers by upgrading them to show two colors at the same time. In the top half we’ll see the color obtained from the RGB values interpreted by one color space, while the bottom half shows the color obtained from the same RGB values, but interpreted by another color space:

While playing with the sliders you may have noticed something peculiar – if the sliders are at their minimum or maximum the colors look the same, otherwise they don’t. Here’s a comparison of some of the colors for different slider values:

A color space can specify how the numeric values of the red, green, and blue components map to intensity of the corresponding light source. In other words, the position of a slider may not be equal to intensity of the light the slider controls. The color space from the bottom half uses a simple linear encoding:

(intensity value) = (encoded value)

The light shining at 64% of its maximum intensity will be encoded as the number 0.64. The top color space uses an encoding with a fixed 2.0 exponent:

(intensity value) = (encoded value)^2.0

The light shining at 64% of its maximum intensity will be encoded as 0.8.
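
A tiny sketch of that encoding and its inverse, using the fixed 2.0 exponent from the example above:

```python
def encode(intensity):
    # Light intensity in, stored value out: the inverse of the 2.0-exponent curve.
    return intensity ** (1.0 / 2.0)

def decode(encoded):
    # Stored value in, light intensity out.
    return encoded ** 2.0

print(encode(0.64))  # roughly 0.8
print(decode(0.80))  # roughly 0.64
```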

This may all seem like a pointless transformation, but there is a good reason for doing this nonlinear mapping. The human eye is not a simple detector of the power of the incoming light – its response is nonlinear. A two-fold increase in the emitted number of photons per second will not be perceived as a doubling of brightness.

If we were to encode the colors using floating point numbers the need for a nonlinear encoding function would be diminished. However, the numeric values of color are often encoded using the familiar 8 bits per component, e.g. in the most common configurations of JPEG and PNG files. Using a nonlinear tone response curve, or TRC for short, lets us maintain more or less perceptual uniformity and use the chunky, quantized range to keep the detail in the darker parts.

Here are the first 14 representable values in an 8-bit encoding of a linearly increasing amount of light. You can probably tell that the brightness difference we perceive between the first two panes is much larger than between the last two panes:

First 14 representable linear shades of gray in 8 bit encoding

The extremely common sRGB color space employs a nonlinear TRC to make better use of human visual perception. Here are the first 14 representable values in an 8-bit encoding of output sRGB values. You may need to ramp up your display brightness to see them, or maybe even make the background black:

First 14 representable sRGB shades of gray in an 8 bit encoding

Notice the hollow circles under all but the first and the last colors. None of those 12 shades of gray would be representable in an 8 bit format if we were to use straightforward linear encoding. You’d actually need a range of 0 to 4095 (12 bits per component) to represent the same very dark shades of sRGB gray without using any tone response curves.

The situation at the other end of the spectrum is more difficult to visualize since the images on this website use sRGB color space, but we can at least show that the white shades in linear color space shown in the upper part of the picture get darker more slowly than corresponding white shades in sRGB at the bottom:

Last 14 representable linear (top) and sRGB (bottom) shades of gray in an 8 bit encoding

Linear encoding has the same precision of light intensity everywhere, which, given our nonlinear perception, ends up with brightness increasing very quickly at the dark end and darkness decreasing very slowly at the bright end.

Different color spaces use different TRCs. Some, like DCI P3, use a simple power function with a fixed exponent. Others, like ProPhoto, employ a piecewise combination of a linear segment with a power segment. The magnitude of the exponent differs between color spaces – there is no unanimous opinion on what constitutes the best encoding. Finally, some color spaces don’t use any special encoding and just represent the component values in a linear fashion.

While TRCs are very useful for encoding the intensities for storage, the crucial thing to remember is that mathematical operations on light only make sense when done on linear values, because they represent the actual intensities of light. It’s actually fairly easy to visualize in a gradual mix between a pure red and a pure green:

Notice how in the top halves, which use nonlinear encoding, the colors get darker in the middle, while in the bottom, linear halves the progression looks more balanced.
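
The difference is easy to reproduce numerically. The sketch below mixes a pure red and a pure green halfway, once on the encoded values and once on linearized ones; the 2.2 exponent is just an illustrative stand-in for a real tone response curve:

```python
GAMMA = 2.2  # an assumed, simplified encoding exponent

def mix_encoded(a, b, t):
    # Wrong: interpolate the nonlinearly encoded values directly.
    return a * (1 - t) + b * t

def mix_linear(a, b, t):
    # Right: decode to linear light, interpolate, then re-encode for display.
    linear = (a ** GAMMA) * (1 - t) + (b ** GAMMA) * t
    return linear ** (1 / GAMMA)

# Halfway between full red and full green, looking at the red channel only:
print(mix_encoded(1.0, 0.0, 0.5))  # 0.5  – displays too dark
print(mix_linear(1.0, 0.0, 0.5))   # roughly 0.73 – perceptually more balanced
```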

What we’ve discussed in this section is the first component of a typical RGB display color space. We can now say that one of the defining properties of a color space is:

A set of tone response curves for its red, green, and blue components.

The encoding is an essential, but nonetheless technical, part of a color space. The specification of the actual colors is what we usually care about the most.

Primary Differences

Let’s look at a different example of two color spaces in a color picker. To make things simpler, both halves operate with linear intensities:

Notice that grayscale colors now match perfectly, however, the colors in the bottom half are clearly more subdued for the same numeric value:

I’ll symbolize the red, green, and blue values from the top half with a small bar on top of the letters: R̄ḠB̄. For the values from the bottom half I’ll use a bar at the bottom: R̲G̲B̲.

You may be wondering if, despite a rather different behavior, it is possible to make the colors match. In other words, knowing the R̲G̲B̲ values we’d like to express the same color using R̄ḠB̄ values.

The notion may seem contrived, but the underlying ideas are important for understanding how the same color can be described by many different triplets of values. To see the scenario in action we can fix the bottom half of the picker and let the sliders control only the top color:

After some trial and error you may get pretty close, or perhaps even achieve the exact match. Unless you’re quite skilled, the task probably wasn’t trivial and trying to do the same manual work for every possible color would be a daunting endeavor.

A consistent reproduction of color is crucial for realizing any design – we wouldn’t want a carefully selected shade of pastel beige to look like a ripe orange when seen on some other device. We’re in dire need of a better approach for making the colors look the same despite the different meaning of the values of the RGB components.

Primaries Matched

Reproducing arbitrary colors between different color spaces is difficult, but we can try to simplify things a little by just trying to match the pure red, green, and blue colors and seeing where it gets us. Let’s start by matching the red from the bottom plate:

Making the border between the two halves disappear may take some tweaking, but eventually we can agree on values that fit perfectly. It’s a step in the right direction – since we’re dealing with linear values we can now express how much a single unit of R̲ will contribute to the R̄ḠB̄ components. For instance, to recreate the R̲G̲B̲ color (0.6, 0.0, 0.0) we’d need to use:

0.6 × 0.712 = 0.427 units of R̄
0.6 × 0.100 = 0.060 units of Ḡ
0.6 × 0.024 = 0.014 units of B̄

We can put these results into equations. When a color from the R̲G̲B̲ space has no green and no blue, but some amount of red, we could use the following formulas to see the same color in the top plate:

R̄ = 0.712×R̲
Ḡ = 0.100×R̲
B̄ = 0.024×R̲

We can repeat the experiment for pure G̲:

Having decided on the perfect match, we can write down the impact of G̲ on the R̄ḠB̄ components:

R̄ = 0.221×G̲
Ḡ = 0.690×G̲
B̄ = 0.000×G̲

All that’s left is to repeat the experiments for pure B̲. If you’re still not tired of dragging the sliders you can do it yourself, or just let me do it:

Once again, the results can be written down as:

R̄ = 0.067×B̲
Ḡ = 0.210×B̲
B̄ = 0.976×B̲

The provided equations apply only when the R̲G̲B̲ half shows just some amount of red, or just some amount of green, or just some amount of blue. Being limited to having non-zero values in only one component won’t get us far. To create a more diverse set of colors we need a way of finding out how mixes of the basic components in R̲G̲B̲ affect the values in R̄ḠB̄.

Mixing Things Up

Thankfully, Grassmann’s laws come to our rescue. Those empirical rules tell us that if two colored lights match, then after addition of another set of matching light sources their overlap, or their sum, will also match:

The sum of two matching sets of colored lights also matches

As a result we can do the three matchings for R̲, G̲, and B̲ separately, then just combine the results into one set. For instance, to obtain the final R̄ value from R̲G̲B̲ all we need to do is to sum the contribution of R̲ to R̄, the contribution of G̲ to R̄, and the contribution of B̲ to R̄. Repeating the trick for the other two components yields the following set of equations:

R̄ = 0.712×R̲ + 0.221×G̲ + 0.067×B̲
Ḡ = 0.100×R̲ + 0.690×G̲ + 0.210×B̲
B̄ = 0.024×R̲ + 0.000×G̲ + 0.976×B̲

What we just did was a matching of the primary red, green, and blue colors from one color space to another. With the equations in hand we can finally recreate the math behind the matching beige from the first example. By plugging in the original R̲G̲B̲ values of (0.9, 0.5, 0.2) we can calculate the value of the exact match:

R̄ = 0.712×0.9 + 0.221×0.5 + 0.067×0.2 = 0.765
Ḡ = 0.100×0.9 + 0.690×0.5 + 0.210×0.2 = 0.477
B̄ = 0.024×0.9 + 0.000×0.5 + 0.976×0.2 = 0.217

Seeing the Matrix

If you look closely at the three equations governing the conversion from R̲G̲B̲ to R̄ḠB̄ you may notice the coefficients form a 3×3 grid:

0.712   0.221   0.067
0.100   0.690   0.210
0.024   0.000   0.976

It’s actually a 3×3 matrix that defines the conversion from R̲G̲B̲ to R̄ḠB̄. If you’re familiar with matrices and vectors, you might have already realized that the transformation we did was a plain old matrix times vector multiplication.
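
Expressed in code, the conversion is a matrix-vector product. A short sketch using the coefficients matched in this section:

```python
M = [
    [0.712, 0.221, 0.067],
    [0.100, 0.690, 0.210],
    [0.024, 0.000, 0.976],
]

def to_top_space(rgb):
    # Multiply the 3x3 matrix by the bottom-space RGB vector.
    return tuple(sum(row[i] * rgb[i] for i in range(3)) for row in M)

print(to_top_space((0.9, 0.5, 0.2)))  # roughly (0.765, 0.477, 0.217)
```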

On its own the matrix isn’t particularly interesting, but the transformation it does can be visualized in 3D. In the following interactive diagram the outer cube defines the R̄ḠB̄ space, while the inner, skewed cube (parallelepiped) defines the R̲G̲B̲ space. You can drag the cubes around to see them from different angles and control the R̲G̲B̲ color indicator using the sliders:

As an experiment I encourage you to set the red and green components to 0 and just play with the blue slider. You should be able to see that movement along the pure blue color in R̲G̲B̲ space requires some shift in red and green in R̄ḠB̄.

Notice that the R̲G̲B̲ parallelepiped fits completely inside the R̄ḠB̄ cube, which means that one can recreate every single R̲G̲B̲ color using the R̄ḠB̄ space. The inverse, however, is not true.

There and Back Again

In the example below the background of the top half is set to a pure R̄ at maximum intensity. You may try really hard, but there is no combination of values in R̲G̲B̲ that can cause the seam between the two halves to disappear:

If we ignore the issue for a minute and treat the problem algebraically we can solve the system of 3 equations looking for pure R̄ at the output:

1.0 = 0.712×R̲ + 0.221×G̲ + 0.067×B̲
0.0 = 0.100×R̲ + 0.690×G̲ + 0.210×B̲
0.0 = 0.024×R̲ + 0.000×G̲ + 0.976×B̲

The details of the evaluation are boring, so let’s just present the solution as is:

R̲ = +1.470
G̲ = −0.201
B̲ = −0.036

All three values are outside of the 0.0 to 1.0 range – a clear indication of the unreachability of that color in R̲G̲B̲ space!

Values larger than 1.0 have a very straightforward explanation: 1.0 of R̄ is beyond the range of intensity of R̲ – you could say that R̲ is not strong enough. Less intense values of R̄ can still be matched with R̲ within its limits. For example, 0.5 of R̄ requires 0.5 × 1.470 = 0.735 of R̲.

Negative values are slightly trickier because they are imaginary, but we can still reason about them quite easily.

Negative Values

Let’s try to think about an experiment in which we’re tasked with matching a light on the right side with a combination of a red, green, and blue light on the left side:

Mixing lights to obtain the color on the right

After some tuning we may realize that even with no green and no blue light the colors still don’t match and the left patch of light continues to look a bit orangey indicating that it has too much green in it:

The left patch is still too green

We can’t decrease the green light on the left side anymore, so we have to become more creative. We could say that the left side is too green, or we could say that the right side is not green enough! Since all we care about is making the colors match we could try adding some of the green light to the other side:

Adding green light to the target side

Now the right side is too green, but if we adjust the added amount we can eventually make the colors look the same:

A match with just the right amount of green

Let’s try to write down what we’ve achieved in an equation. We have a maximum amount of red and no green or blue on the left side, while on the right side we have the target color with some green light added to it:

1.0×R + 0.0×G + 0.0×B = (target) + 0.2×G

This feels a little mischievous – we wanted to match the pure target color without any additions. Let’s try to rearrange the equation by moving the added green light from the right to the left side:

1.0×R − 0.2×G + 0.0×B = (target)

That equivalent equation tells us that getting the exact match with the starting target color requires using a negative amount of the green light on the left side. Naturally, this is something we can’t do in the physical world. Mathematically, however, a match of the very saturated red from the right side really requires removing some green light from the left side.

If you’re still not convinced consider what would happen if we somehow were able to create −0.2×G on the left side. I can’t show you a picture of that imaginary situation, but I can show you what would happen if we then added 0.2×G to both sides:

This is exactly the same image as before since both scenarios are equivalent. Recall from Grassmann’s laws that adding the same amount of light to both sides maintains the color match, so indeed the imaginary −0.2×G on the left side must have made it look like the original unmodified saturated red from the right side.

This may all feel somewhat unsatisfying; however, when you step away from the physical restrictions and think about lights as just numbers, it all, quite literally, adds up.

Seeing the Matrix

If we go back to our quest for obtaining values in the original RGB space from colors defined in the wide-gamut space, we can repeat the “system of equations” trick for the other two components and combine the results using Grassmann’s laws to end up with the full set of equations, with the original components on the left expressed in terms of the wide-gamut components on the right:

R = +1.470×R − 0.470×G + 0.000×B
G = −0.201×R + 1.513×G − 0.311×B
B = −0.036×R + 0.011×G + 1.025×B

Once again, the set of coefficients can be presented in a 3×3 matrix:

+1.470  −0.470  +0.000
−0.201  +1.513  −0.311
−0.036  +0.011  +1.025

If you’ve solved systems of linear equations before, you may have realized that this is just the inverse of the matrix of the original transformation between the two spaces. This is a very useful property: when establishing the conversion from one linear color space to another we just need to figure out the coefficients of the matrix in one direction – the matrix in the other direction is just its inverse.
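In code this property is a one-liner. Here is a small sketch, again with NumPy and the same coefficients, that recovers the matrix for the opposite direction:

    import numpy as np

    # Forward matrix: original RGB -> wide-gamut RGB (coefficients from this article).
    forward = np.array([
        [0.712, 0.221, 0.067],
        [0.100, 0.690, 0.210],
        [0.024, 0.000, 0.976],
    ])

    # The conversion in the other direction is just the matrix inverse.
    backward = np.linalg.inv(forward)
    print(np.round(backward, 3))
    # roughly:
    # [[ 1.47  -0.47   0.   ]
    #  [-0.201  1.513 -0.311]
    #  [-0.036  0.011  1.025]]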

One more time we can visualize the transformation the matrix performs. This time the original RGB space is the perfect cube, while the wide-gamut RGB space is the outer skewed cube:

You can easily see how much bigger the wide-gamut RGB space is – indeed it lets us express some colors that the original RGB space can’t.

Breaking the Boundaries

So far whenever a numeric value in a color space was outside of the 0.0 to 1.0 range we’d just clip it to the representable limits. Let’s see what happens when we remove that restriction. We can modify the sliders to allow a −1.0 to 2.0 range and try to match the wide-gamut pure red again:

With that approach we can actually represent the wide-gamut pure red using the original RGB color space. Since the numbers don’t care, the entire thing works; this approach is often called unbounded or extended range.

To actually see wide-gamut colors represented using extended-range values in the original RGB space, the display has to be capable of showing those wide-gamut colors in the first place, otherwise they would get physically clamped by the display itself. To put it differently, the transformation of values from a color space with extended range to the native color space of the display should end up with values in the standard range.

While unbounded values are very flexible, there are two important caveats one should consider when dealing with them. First of all, since the range of values is unlimited, it is possible to create colors that truly have no physical meaning, e.g. (−1.0, −1.0, −1.0).

Additionally, storing the values in a type with a limited range may not be possible. If we decide to use 8 bits per component and encode the 0.0 to 1.0 range into the 0 to 255 range, then we won’t be able to represent values outside of the normalized range since they simply won’t fit.
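As a tiny illustration, here is what a plain 8-bit encoding would do to the out-of-range values from the earlier example – a sketch in plain Python:

    def encode_8bit(value):
        # Map the nominal 0.0 to 1.0 range onto 0 to 255; anything outside
        # simply can't be represented and gets clamped to the nearest fitting value.
        return max(0, min(255, round(value * 255)))

    extended = [1.470, -0.201, -0.036]         # the out-of-range solution from before
    print([encode_8bit(v) for v in extended])  # [255, 0, 0] - the extra information is gone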

One Space to Rule Them All

All the color transformations we performed are somewhat contrived – the two color spaces we analyzed are defined in relation to each other and have no immutable attachment to the physical world. Their reds, greens, and blues are whatever we ended up seeing on the screen. We have no way of knowing how the colors would look on another device.

If we had a master color space that was derived from physical quantities we could solve the problem by finding a transformation from any color space to that common connection space. Luckily, in 1931 that very color space was specified by the CIE, resulting in what is known as the CIE 1931 XYZ color space. Before we discuss how the space was defined we need to take a quick look at physics and the human visual system.

And There Was Light

To ground the discussion of color perception in the real world we have to establish it in terms of light – the part of electromagnetic radiation that is visible to the human eye. The visible spectrum can only be approximated on a typical display, but its colors at specific wavelengths look roughly like this:

Visible spectrum with wavelengths shown in nanometers

Colors corresponding to a single wavelength are called spectral colors. For example, light with a wavelength of 570 nm would produce a pure spectral yellow. Notice that the boundaries between the colors are soft, so there is no single “spectral red”, but a wavelength is all we need to describe a spectral color accurately.

Human eyes are not simple wavelength detectors and the perception of the same color can be created in many different ways. A yellow color can be created using light with wavelength around 570 nm, or as a mixture of a green and a red light.

Since a human retina has three different types of photoreceptive cones, any color sensation can be matched using three fixed colors with varying intensities. It doesn’t mean that all colors can be recreated using just additive mixes of three colors. As we’ve discussed, sometimes one or two of those colors have negative weights and have to be added to the target color instead.

Algebraically, however, we don’t need more than three primary colors to represent everything humans can perceive (with some very rare exceptions). This property of the human visual system is called trichromacy and is exploited by various display technologies to present a very broad spectrum of colors using just three RGB primaries.

The Color Matching Experiments

In the late 1920s David Wright and John Guild independently conducted experiments with human observers in which the test subjects tried to match spectral colors in 10 nm increments with a combination of red, green, and blue light sources. The experiments were fairly similar to what we’ve been doing with the sliders while trying to match the target color as a combination of some other three colors.

The results of the experiments were standardized as rgb color matching functions that can be presented on a graph:

CIE rgb color matching functions

Notice the presence of negative values. Not all spectral colors could be matched with a combination of the selected RGB primaries, and yet again a negative value means that the light had to be added to the target color instead.

The CIE standardized the reference R of the experiments as monochromatic light with a wavelength of 700.0 nm, the reference G as 546.1 nm, and the reference B as 435.8 nm, with a specific ratio of relative intensities between them. This is a critical step in our discussion of color – we’re finally grounded in some physically measurable properties.

Let’s look at the yellow wavelength of 570 nm. Reading from the color matching plot we can tell that the color match required 0.16768 units of R, 0.17087 units of G, and −0.00135 units of B. A set of three RGB coordinates just begs for a three-dimensional presentation. If we read the color matching values for every wavelength we obtain the following plot:

You may wonder why the spectrum curve doesn’t go through the pure red, green, or blue endpoints of the CIE RGB cube, but it’s simply the result of the normalization and scaling applied to the color matching functions by the committee. Within reasonable limits a light source can be constructed to be as powerful as needed, so what constitutes a “1.0” is always defined somewhat arbitrarily.

XYZ Space

The CIE RGB color space is rooted in concrete physical properties of monochromatic lights and we could use it to define any color sensation. However, the committee strived to create a derived space that would have a few useful properties, two of which are worth noting:

  • No negative coordinates of spectral colors
  • Separation of chromaticity (hue and colorfulness) from luminance (visual perception of brightness)

After some deliberation the following set of equations was created:

X = 2.769×R + 1.752×G + 1.130×B
Y = 1.000×R + 4.591×G + 0.061×B
Z = 0.000×R + 0.057×G + 5.594×B

The factors may seem arbitrary, and in some sense they are, but this is simply yet another mapping of values from one space to another, similar to what we did with the transformation between our two RGB spaces. The Y component was chosen as the luminance component, while X and Z define the chromaticity.
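As a quick sanity check we can push the 570 nm match from the color matching experiments through these equations; a small sketch in Python with NumPy, reusing the numbers quoted earlier:

    import numpy as np

    # CIE RGB -> XYZ conversion with the coefficients listed above.
    CIE_RGB_TO_XYZ = np.array([
        [2.769, 1.752, 1.130],
        [1.000, 4.591, 0.061],
        [0.000, 0.057, 5.594],
    ])

    # Color matching values for the spectral yellow at 570 nm.
    rgb_570 = np.array([0.16768, 0.17087, -0.00135])

    print(CIE_RGB_TO_XYZ @ rgb_570)
    # roughly [0.762 0.952 0.002] - all positive, with Y carrying the luminance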

We can visualize the CIE XYZ space inside the CIE RGB space:

You can see how the XYZ space is designed around the spectrum curve to make sure it fits in the positive octant. Its smaller size is not really a concern – the scaling makes things more convenient to work with by making sure that the perceptually brightest wavelength of 555 nm has a Y value of 1. As a final step we can just show the XYZ space and the spectral colors on their own:

Since the CIE XYZ space is derived from CIE RGB it is also grounded in measurable physical quantities. The XYZ color space is the base color space used for all conversions between matrix-based RGB display color spaces. The mapping between any two such color spaces is done through a common CIE XYZ intermediate. This simplifies the color transformations since we don’t need to know how to convert colors between any two arbitrary color spaces – we just need to be able to convert both of them to the XYZ space.
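A sketch of what that pipeline could look like in code, using the CIE RGB coefficients from above for one of the spaces and a made-up “display RGB” matrix standing in for the other (the exact values of the second matrix are purely illustrative):

    import numpy as np

    # CIE RGB -> XYZ with the coefficients from the article.
    CIE_RGB_TO_XYZ = np.array([
        [2.769, 1.752, 1.130],
        [1.000, 4.591, 0.061],
        [0.000, 0.057, 5.594],
    ])

    # Hypothetical linear "display RGB" -> XYZ matrix of some other color space.
    DISPLAY_RGB_TO_XYZ = np.array([
        [0.41, 0.36, 0.18],
        [0.21, 0.72, 0.07],
        [0.02, 0.12, 0.95],
    ])

    def cie_rgb_to_display_rgb(color):
        # Go through the shared XYZ connection space instead of a direct conversion.
        xyz = CIE_RGB_TO_XYZ @ color
        return np.linalg.inv(DISPLAY_RGB_TO_XYZ) @ xyz

    print(cie_rgb_to_display_rgb(np.array([0.1, 0.2, 0.3])))

Any other pair of matrix-based color spaces would be wired up the same way – only the two matrices change.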

xy Chromaticity Diagram

Three-dimensional diagrams are fun to play with, but they’re often not particularly practical – a 2D plot is usually easier to work with and reason about. Since the Y component of the XYZ space is devoid of any colorfulness and hue, we’re left with only two components that affect the chromaticity of a color. We can perform three operations intended to reduce the dimensionality of the space:

x = X / (X + Y + Z)
y = Y / (X + Y + Z)
z = Z / (X + Y + Z)

The distinction between a lower and an upper case is important here – X is not the same as x. These seemingly arbitrary equations have a simple visual explanation – it’s a projection onto a triangle spanned between the XYZ coordinates of (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), and (0.0, 0.0, 1.0):

Notice that x, y, and z add up to 1, so we can drop the last component since it’s redundant – we can always recreate it by subtracting x and y from 1. Rejection of z is equivalent to a flat projection onto the xy plane:
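Expressed in code the projection is tiny; a small sketch assuming the color is already given in XYZ coordinates:

    def xy_chromaticity(X, Y, Z):
        # Project an XYZ color onto the xy chromaticity plane.
        total = X + Y + Z
        # z is redundant since x + y + z = 1, so only x and y are returned.
        return X / total, Y / total

    # For example, a D65-like white of roughly (0.9505, 1.0, 1.089) in XYZ
    # lands close to (0.3127, 0.3290) on the chromaticity diagram.
    print(xy_chromaticity(0.9505, 1.0, 1.089))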

If we repeat that step for every combination of spectral colors we can finally present the 2D plot known as CIE xy chromaticity diagram in its full glory. You may have seen this horseshoe shape before:

CIE xy chromaticity diagram

No RGB display technology is capable of presenting all the colors that humans can see, so many of the ones shown in the picture are actually the result of clamping to the representable range of the sRGB color space.

The colors on the inside of the plot are just combinations of the spectral colors, and the colors outside of the plot don’t exist. Notice the straight diagonal line connecting the endpoints of the red and blue areas. While every point on the outline of the horseshoe shape has a corresponding spectral wavelength, the colors on that line of purples do not – there is no wavelength of light that looks like magenta. The purples are simply how the human brain interprets mixes of red and blue light and ultimately they are no different from any other shade. Perception of every color happens in our heads.

It’s important to mention that the xy chromaticity diagram is not perceptually uniform. In some areas of the plot one has to move relatively far away from a chosen color to notice the difference in chromaticity, while in some other areas the distance to change is much smaller. Over time CIE developed more uniform chromaticity diagrams, but since the xy diagram is easily obtained from the XYZ space it continues to be used in discussion of RGB color spaces.

Gamut

The chromaticity diagram is useful for visualizing the gamut of a color space – the extent of colors that the color space can represent. It’s worth noting that a gamut is a three-dimensional construct, so a 2D projection onto an image, somewhat counterintuitively, does not present the full picture. It’s nonetheless a useful tool employed in comparisons of color spaces.

We can finally present the chromaticities of the primaries of both the original and the wide-gamut RGB color spaces in a single graph:

Comparison of primaries

I’ll discuss the meaning of the little cross in the middle soon, but the important fact is that a triangle depicts all representable chromaticities of a color space. Notice how the original RGB triangle is smaller than the wide-gamut one, showing us yet again that the latter can represent more colors.

This diagram summarizes a long journey we took to define the second component that defines every RGB color space:

The location of its red, green, and blue primaries on the CIE xy chromaticity diagram.

In the simulation below you can drag the slider to see how the extent of the chromaticity triangle corresponds to the representable colors. The base values of an image from the sRGB color space are converted to a space with a reduced gamut, clamped to the 0.0 to 1.0 range, then finally converted back to sRGB for display. You can click or tap the image to change it:


In some cases the color clamping is pretty severe, but for the image with snowy mountains the change is minimal. An almost black and white picture has little chromaticity and therefore a reduced gamut has almost no effect on it.
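Stripped of all the rendering details, the gamut-reduction pipeline from the demo could be sketched like this; the conversion matrix below is made up purely for illustration, any linear sRGB-to-narrower-gamut matrix would play the same role:

    import numpy as np

    # Hypothetical matrix from linear sRGB to some narrower-gamut linear space.
    # Rows sum to 1 so that white maps to white; the exact values are made up.
    SRGB_TO_NARROW = np.array([
        [ 1.30, -0.20, -0.10],
        [-0.10,  1.20, -0.10],
        [-0.05, -0.05,  1.10],
    ])

    def reduce_gamut(linear_srgb):
        narrow = SRGB_TO_NARROW @ linear_srgb
        clamped = np.clip(narrow, 0.0, 1.0)              # out-of-gamut colors get clamped
        return np.linalg.inv(SRGB_TO_NARROW) @ clamped   # back to sRGB for display

    print(reduce_gamut(np.array([1.0, 0.0, 0.0])))  # a saturated red changes noticeably
    print(reduce_gamut(np.array([0.5, 0.5, 0.5])))  # a neutral gray passes through unchanged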

White Point

The last aspect of color spaces we will discuss is related to the little cross in the middle of the gamut visualizations of our two RGB spaces. Similarly to how different color spaces can assign different colors to their pure red defined as (1.0, 0.0, 0.0), the white point of a color space defines its color of white – the color represented when all three components are set to one: (1.0, 1.0, 1.0).

In the color picker below both halves have the same RGB primaries, but different white points. See what happens when you drag all the sliders to the right:

The color space from the bottom half defines its white as a slightly different color than the top color space does.

A 3D visualization shows two interesting properties. Firstly, the axes of the inner and the outer cubes are collinear – they’re just scaled differently. Secondly, the “far” endpoints of the cubes no longer overlap since their white points are different:

The white point is the last piece of the color space puzzle. We can now say that a color space is also defined by:

The location of its white point on the CIE xy chromaticity diagram.

With the xy coordinates of the red, green, and blue primaries and of the white point, one can evaluate the RGB to XYZ transformation for a given color space. The details of this computation are not critical to our discussion and you can read about them in many places online, e.g. on Bruce Lindbloom’s website.
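For the curious, here is a condensed sketch of that computation in Python with NumPy, following the method described on Bruce Lindbloom’s site; the numbers plugged in at the bottom are the sRGB chromaticities discussed later in the article:

    import numpy as np

    def rgb_to_xyz_matrix(xy_r, xy_g, xy_b, xy_white):
        # Recover a full XYZ vector from an xy chromaticity, assuming Y = 1.
        def xyz_from_xy(x, y):
            return np.array([x / y, 1.0, (1.0 - x - y) / y])

        # Columns are the (unscaled) XYZ coordinates of the red, green, and blue primaries.
        primaries = np.column_stack(
            [xyz_from_xy(*xy_r), xyz_from_xy(*xy_g), xyz_from_xy(*xy_b)]
        )
        # Scale each primary so that RGB = (1, 1, 1) lands exactly on the white point.
        scale = np.linalg.solve(primaries, xyz_from_xy(*xy_white))
        return primaries * scale

    M = rgb_to_xyz_matrix((0.64, 0.33), (0.30, 0.60), (0.15, 0.06), (0.3127, 0.3290))
    print(np.round(M, 4))
    # roughly:
    # [[0.4124 0.3576 0.1805]
    #  [0.2126 0.7152 0.0722]
    #  [0.0193 0.1192 0.9505]]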

While necessary for a correct calculation of the RGB to XYZ transformation, in practice it may be difficult to notice that two color spaces have different white points. Most color conversion operations will undergo chromatic adaptation and the color (1.0, 1.0, 1.0) in the source color space will be mapped to (1.0, 1.0, 1.0) in the destination color space.

sRGB Color Space

We’ll finish our discussion with a showcase of the sRGB color space, described by its authors as “A Standard Default Color Space for the Internet”. If you have ever specified just “RGB” colors, it’s extremely likely that the components were assumed to come from the sRGB color space. In fact, it’s the color space used in the very first color picker in this article.

While the sRGB specification isn’t free, Wikipedia and the International Color Consortium provide all the information we need to describe it.

Tone Response Curve

Let’s look at the plot of the tone response curve of the sRGB color space. The light intensity value is on the vertical axis and it’s symbolized with a sun icon. The encoded value is on the horizontal axis and it’s symbolized with a binary code:

Tone response curve of sRGB

While the curve may look like a single power function, in the range of encoded values from 0 to 0.04045 it actually is a linear segment:

(intensity value) = (encoded value)/12.92

When the encoded value is larger than 0.04045 the intensity value is indeed defined by a slightly nudged power function with an exponent of 2.4:

(intensity value) = (((encoded value) + 0.055)/1.055)^2.4

The transition between the two parts is continuous and I marked its location on the plot with a little bump.
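The two-part definition translates directly into code; a small sketch of the decoding direction in Python:

    def srgb_to_linear(encoded):
        # Convert an encoded sRGB component in the 0.0 to 1.0 range to linear intensity.
        if encoded <= 0.04045:
            return encoded / 12.92                 # the short linear segment near black
        return ((encoded + 0.055) / 1.055) ** 2.4  # the slightly nudged power function

    print(srgb_to_linear(0.5))  # roughly 0.214 - a middle encoded value is darker than it looks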

The curve was designed to roughly correspond to the response curve of CRT displays, thus to some extent minimizing the need for any color management. That convenience had the unfortunate consequence of creating widespread confusion by binding two separate ideas into one.

In the general case, the purpose of tone response curves is to make the best use of the available space when storing color information in formats with limited bit depth. The non-linear response of an electron gun in a CRT display is a separate and only loosely related concept.

Chromaticity and White Point Coordinates

Chromaticity coordinates of red, green, blue, and white are just 4 pairs of x and y values and we can put them in a table:

       Red     Green   Blue    White
  x    0.6400  0.3000  0.1500  0.3127
  y    0.3300  0.6000  0.0600  0.3290

You’ll probably agree that this form is not particularly exciting, things look much nicer on the xy chromaticity diagram:

Chromaticity and white point coordinates of sRGB

On a technical note it’s worth mentioning that the primaries share the same locations as those of Rec. 709 – the standard for HDTV. Additionally, the white point location corresponds to Standard Illuminant D65 – a representation of average daylight.

While sRGB isn’t capable of representing some of the more vivid shades and clearly doesn’t contain all the colors humans can see, it has served its purpose as a standard color space of 8-bit graphics remarkably well.

Further Reading

The web is filled with interesting gems related to color. For a very good dissection of many of the major components on a pathway between hex colors and what we end up perceiving with our eyes I suggest reading Jamie Wong’s “Color: From Hexcodes to Eyeballs”. I especially like his very approachable explanation of spectral distributions.

Bruce MacEvoy’s set of pages on color vision is an amazing resource about various aspects of color. I highly recommend the first post in the series in which the author describes the fascinating details of biophysics of the human visual system.

Elle Stone authored an extensive collection of great articles on color spaces, calibration, and image editing. For a different take on what we’ve touched upon in this article you should see “Completely Painless Programmer’s Guide to XYZ, RGB, ICC, xyY, and TRCs”.

If you think you’d enjoy a deep dive into the optimization process of creating a minimal sRGB ICC profile, Clinton Ingram has got you covered in his four part series. His pragmatic approach to the problem provides a very entertaining read.

Finally, as a resource in a more traditional form, I recommend the book “Color Imaging: Fundamentals and Applications”. It covers a vast selection of topics related to color and its reproduction, all in a very readable form.

Final Words

Many of the concepts we’ve discussed were developed decades ago, but despite their age even the old ideas continue to be incredibly useful, and the science behind them is still being expanded to this day.

Color is one of those areas with a seemingly infinite depth of complexity. I hope I’ve shown you that some of that complexity isn’t actually as scary as it looks; you just need to shine a light on it.