This document details the basic operations that arise when dealing with different coordinate systems. We will look at the special case of orthogonal, right-handed and unit-length basis vectors combined with a translation, as this is a very useful subset of coordinate transforms that comes up in various fields and applications.
More general versions with scaling or non-orthogonal basis vectors can be derived in a very similar way, though they lose a bit of the geometric intuition, so we don't cover them here.
Knowledge of vectors and matrix operations is required.
Coordinate systems and transforms
A coordinate system A can be described by its origin oA and a set of n basis vectors xA,i,i=1,…,n.
We require that the basis vectors have a length of 1 and are perpendicular to each other. This can be expressed with the dot product as:
$$x_{A,i} \cdot x_{A,j} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{else} \end{cases} = \delta_{ij}$$
(The symbol δij on the right is the so-called Kronecker delta and is just a shorthand for the case distinction above.)
Furthermore, they should form a right-handed coordinate system. This basically means that the coordinate axes should "look like" the standard basis. In 2D, the x-axis points right and y up. You can rotate these two by any angle and they will still "look" the same. In 3D, the axes should conform to the right-hand rule. In general n-dimensional space, which includes the previous cases, we write the coordinate axes as the rows or columns of a matrix and call the system right-handed if that matrix has a determinant greater than zero (or exactly one in the case of our orthonormal system).
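As a quick numerical sanity check (a sketch in NumPy; the example axes below are arbitrary), both conditions are easy to test: the pairwise dot products should form the identity matrix and the determinant should be +1.

```python
import numpy as np

angle = np.radians(30)
# Axes of an example 2D coordinate system A, one axis per column.
x_axis = np.array([np.cos(angle), np.sin(angle)])
y_axis = np.array([-np.sin(angle), np.cos(angle)])
R_A = np.column_stack([x_axis, y_axis])

# Orthonormality: all pairwise dot products x_i . x_j equal delta_ij,
# i.e. R_A^T R_A is the identity matrix.
assert np.allclose(R_A.T @ R_A, np.eye(2))

# Right-handedness: the determinant of an orthonormal, right-handed basis is +1.
assert np.isclose(np.linalg.det(R_A), 1.0)
```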
We imagine a point p. This is a fixed point "in the world". In the coordinate system A it can be described by its coordinates in that system piA, where the index iA corresponds to the associated basis vector xA,i. Geometrically, we move to the origin of A (oA) and then continue to move piA units in the direction xA,i for each xA,i. This can be written as:
p=p1AxA,1+⋯+pnAxA,n+oA
A vector v does not have a fixed location in space, but just a direction and length. Therefore there is no addition of the origin.
v=v1AxA,1+⋯+vnAxA,n
To see why that is, we can define a vector as the difference of two points p−q.
Taking the dot product of both sides with one of the axis vectors allows us to retrieve the individual coordinates, thanks to the property xA,i⋅xA,j=δij.
This can be written in matrix notation, with each xA,i forming a column of a matrix RA=(xA,1…xA,n) and the coordinates piA forming a column vector pA. The upper index denotes that the coordinates are given in the system A:
p=RApA+oA
We write which coordinate system the matrix R comes from as an index, here A.
As the basis vectors are normalized and orthogonal, the matrix RA is a rotation matrix, which has the nice property that (RA)−1=(RA)T.
To actually calculate this, we will need to introduce some arbitrary coordinate system in which we can express the xA,i and oA. As there is no absolute coordinate frame, any choice is valid. You basically just need a way to measure your coordinates (or just to be able to calculate the dot products between the axes and points or vectors, which means you could even construct your first coordinates geometrically!).
We can redo the extraction of the coordinates, but in matrix notation, which extracts the whole coordinate vector at once. To get the coordinate for one of the axes, we applied that axis to the vector with a dot product. If we write all axes into the rows of a matrix and multiply from the left, that is exactly the same operation, but for all of them at once. That matrix is just (RA)T. The resulting matrix (RA)TRA contains all the pairwise dot products xA,i⋅xA,j=δij and is thus the identity matrix. So we have:
(RA)T(p−oA)=(RA)TRApA=pA
So applying (RA)T on the left hand side projects a vector onto the coordinate axes of A!
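The following sketch (NumPy; system and point chosen arbitrarily) shows both directions: projecting p−oA onto the axes to get pA, and walking along the axes to get p back.

```python
import numpy as np

angle = np.radians(30)
R_A = np.column_stack([
    [np.cos(angle), np.sin(angle)],    # x_{A,1}
    [-np.sin(angle), np.cos(angle)],   # x_{A,2}
])
o_A = np.array([2.0, 1.0])             # origin of A

p = np.array([3.0, 4.0])               # a point, expressed in our measurement frame

# Coordinates of p in A: project (p - o_A) onto the axes of A.
p_in_A = R_A.T @ (p - o_A)

# Going back: start at o_A and move along each axis by the found coordinate.
p_back = R_A @ p_in_A + o_A
assert np.allclose(p_back, p)
```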
We could also use non-normalized, non-orthogonal axis vectors. In the calculations, we would just need to replace the transpose with the inverse wherever it comes up, and everything still works the same. We can always construct axes of our special kind though, for example with the Gram-Schmidt process, so in many practical cases you might not want to deal with these more generic coordinate systems. In the following, we will stick to the nicer version. Dot products with orthogonal vectors are also geometrically easier to visualize. While you can construct things like the contravariant basis, this isn't really the focus here.
With that, we define a Transform. This isn't actually much different from the coordinate frame. From the last equations, we can see the base idea: a transform takes a coordinate vector defined in one (start) coordinate system A and gives us the vector's coordinates in the (target) system B, defined by the xB,i with origin oB.
We define a transform as the structure TAB={RAB,tA→BB} with the following semantics:
Given coordinates in the start system A, we can compute the coordinates in the target system B with TAB according to:
pB=TAB(pA)=RABpA+tA→BB
So a transform is a combination of a rotation and a translation of the coordinates. Writing the upper and lower indices in this way allows us to quickly check if we made a mistake, as a matrix multiplication will always have its lower index matching the upper index of the coordinates. The translation tA→BB is the displacement from the origin of B to the origin of A, expressed in the axes of B (in other words, the coordinates of oA in the system B).
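In code, a transform is just a small container for RAB and tA→BB. The following is a minimal sketch in Python/NumPy; the class and method names are made up for illustration.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Transform:
    """Transform from a start system A to a target system B."""
    R: np.ndarray  # rotation part R_A^B
    t: np.ndarray  # translation part t_{A->B}^B

    def apply_point(self, p_A: np.ndarray) -> np.ndarray:
        # p^B = R_A^B p^A + t_{A->B}^B
        return self.R @ p_A + self.t

    def apply_vector(self, v_A: np.ndarray) -> np.ndarray:
        # Vectors have no position, so the translation is not added.
        return self.R @ v_A
```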
In fields like computer graphics, this is often combined into a single matrix (with one extra dimension to make room for the translation), as this allows for very easy concatenation of different operations by the usual matrix multiplication. We will not use that here, but the formulas can be easily converted. Such a matrix can also include scaling and other linear transformations. You can also write it as a product of several special matrices (rotation, scale, translation), which lets us exploit the special structure of these matrices; a sketch of this follows below.
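As a sketch (conventions vary between sources; this assumes column vectors and the common translation-rotation-scale order), the homogeneous form and the split into special matrices could look like:

$$\begin{pmatrix} p^B \\ 1 \end{pmatrix}=\begin{pmatrix} R_A^B & t_{A\to B}^B \\ 0 & 1 \end{pmatrix}\begin{pmatrix} p^A \\ 1 \end{pmatrix},\qquad \begin{pmatrix} R\,S & t \\ 0 & 1 \end{pmatrix}=\underbrace{\begin{pmatrix} I & t \\ 0 & 1 \end{pmatrix}}_{\text{translation}}\underbrace{\begin{pmatrix} R & 0 \\ 0 & 1 \end{pmatrix}}_{\text{rotation}}\underbrace{\begin{pmatrix} S & 0 \\ 0 & 1 \end{pmatrix}}_{\text{scale}}$$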
In this, you could also replace the rotation matrix by a unit quaternion, which has some nice properties.
One issue you will encounter with this: if you have a hierarchy of coordinate systems with non-uniform scales and rotations, the full transform will no longer fit that decomposition, it will include some skewing! That is why you can't, for example, just retrieve a global scale for an object in engines like Unity. So this setup only works locally (which is already very useful!).
In the next sections, we will look at the formulas to work with transforms. The sections include proofs, but you can skip them if you are not interested and just want to use or check the formulas.
Defining a transform based on two coordinate systems
We are given a start coordinate system A and a target system B. The transform that transforms point coordinates from A to B according to pB=RABpA+tA→BB is given by:
RAB=(RB)TRA
tA→BB=(RB)T(oA−oB)
The upper index B comes from the fact that (RB)T by definition transforms a vector into the coordinates of B.
Important: The basis vector and origin coordinates of the two coordinate systems must be expressed in the same system. As stated before, this is arbitrary and could be any system we like. One could for example express both systems as seen from A. This would lead to the simplification RAA=(xA,1A…xA,nA)=In (In is the identity matrix) and oAA=0.
This is a common formula, found for example in the camera coordinate system transform in computer graphics. First you move to the center (−oBA) and then you project onto the camera axes. This projection comes from RAB, which contains the axis vectors in its rows. Multiplying a vector from the right is equivalent to computing the dot product with each axis vector. Since the vectors are normalized and perpendicular, this tells us "how much" our vector points in the direction of each axis, and this amount is just a coordinate. As an (n+1)-dimensional matrix product (we need one extra dimension to fit in the translation) this is:
$$\begin{pmatrix} p^B \\ 1 \end{pmatrix} = \begin{pmatrix} (R_B^A)^T & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} I_n & -o_B^A \\ 0 & 1 \end{pmatrix} \begin{pmatrix} p^A \\ 1 \end{pmatrix}$$
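As a small code sketch of the camera case (NumPy; the function name and arguments are made up here): given the camera axes and origin expressed in the world system, the world-to-camera transform is built exactly as described, first subtracting the camera center and then projecting onto the camera axes.

```python
import numpy as np

def world_to_camera(R_cam: np.ndarray, o_cam: np.ndarray):
    """Return (R, t) mapping world coordinates to camera coordinates.

    R_cam: columns are the camera axes, expressed in world coordinates.
    o_cam: the camera origin, expressed in world coordinates.
    """
    R = R_cam.T            # project onto the camera axes
    t = -R_cam.T @ o_cam   # ... after moving to the camera center first
    return R, t

# Usage: p_cam = R @ p_world + t
```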
Just to restate: the point itself does not change, we just use different coordinates appropriate for each system. We can equate both expressions for p and then solve for pB, which leaves us with a transform expression:
RBpB+oB=RApA+oA
(RB)TRBpB=(RB)T(RApA+oA−oB)
pB=(RB)TRApA+(RB)T(oA−oB)
The identity matrix comes again from the fact that xB,i⋅xB,j=δij, or alternatively from the fact that the matrix formed by the basis vectors is a rotation matrix, with the transpose being its inverse, as stated previously.
RAB is the product of two rotation matrices and thus is a rotation matrix itself. This property is easy to check.
Furthermore, the determinant is one, since detRAB=det((RB)TRA)=det((RB)T)detRA=1⋅1=1.
Next, we will check that the computed transform is independent of the coordinate system that we express our axes in (if we express them in one at all).
We can compute, as seen before, the coordinates of the axis vector xA,i in a coordinate system W as xA,iW=(RW)TxA,i. The same is true for xB,i. The origin coordinates are found by oAW=(RW)T(oA−oW), with oBW being handled the same way. Doing that for all the axes gathered in the columns of RA (and similarly RB) gives:
RAW=(RW)TRA
RBW=(RW)TRB
Note that the notation RAW is used, since (RW)TRA follows the exact same definition that we used for RAB. This means that this product just represents the projection of A's axes onto the axes of W (and similarly for any system in the lower index that is projected onto the upper one). So RAW are the coordinates of A's axes in the system W.
Furthermore, the notation also works if the axes are just given as abstract vectors; in that case we just leave the upper index blank. For example, RA transforms point coordinates from A into the "geometric space".
We can use this in the formula that we found for the transform elements:
(RBW)TRAW=((RW)TRB)T(RW)TRA=(RB)TRW(RW)TRA=(RB)TRA=RAB
(RBW)T(oAW−oBW)=(RB)TRW((RW)T(oA−oW)−(RW)T(oB−oW))=(RB)T(oA−oB)=tA→BB
So the formulas stay exactly the same, regardless of which coordinate system we express the points and vectors in, as long as that system is the same for all of them.
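The sketch below (NumPy; all systems are arbitrary examples) builds the transform from these formulas and checks that re-expressing both systems in some other frame W gives the same RAB and tA→BB.

```python
import numpy as np

def rotation_2d(angle_deg):
    a = np.radians(angle_deg)
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

def make_transform(R_A, o_A, R_B, o_B):
    """Transform from A to B; both systems must be expressed in the same frame."""
    R = R_B.T @ R_A
    t = R_B.T @ (o_A - o_B)
    return R, t

# Two coordinate systems, expressed in some initial frame.
R_A, o_A = rotation_2d(30), np.array([1.0, 2.0])
R_B, o_B = rotation_2d(-45), np.array([-2.0, 0.5])

# Re-express both systems in an arbitrary third frame W.
R_W, o_W = rotation_2d(110), np.array([3.0, -1.0])
R_A_W, o_A_W = R_W.T @ R_A, R_W.T @ (o_A - o_W)
R_B_W, o_B_W = R_W.T @ R_B, R_W.T @ (o_B - o_W)

# The resulting transform is the same, no matter which frame we worked in.
R1, t1 = make_transform(R_A, o_A, R_B, o_B)
R2, t2 = make_transform(R_A_W, o_A_W, R_B_W, o_B_W)
assert np.allclose(R1, R2) and np.allclose(t1, t2)
```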
For the special case, we just want to check that the local coordinates of the axes are the standard basis vectors (zeros everywhere, aside from the index of the axis, where there is a 1). We call this standard vector ei:
$$e_i = (0, \ldots, 0, \underbrace{1}_{i\text{-th entry}}, 0, \ldots, 0)^T$$
The i-th basis vector of the coordinate system A is given as xA,i. Writing this as a matrix product:
$$x_{A,i} = \begin{pmatrix} x_{A,1} & \ldots & x_{A,n} \end{pmatrix} e_i = R_A \, e_i$$
Computing the local coordinates in A by applying (RA)T:
xA,iA=(RA)TxA,i=(RA)TRAei=Iei=ei
So, if we write all of these in the columns of a matrix as before, we get the rotation matrix expressed in the coordinate system A: RAA=I
The notation stays consistent here: This matrix transforms coordinates from A to A, so they are unchanged!
Next up, let's calculate the origin oA in the coordinates of A, namely oAA. From the formula to get coordinates of a point we have:
oAA=(RA)T(oA−oA)=0
Thus the origin of A in the coordinates of A is, as probably expected, the zero vector 0.
We can now return to the special case. Expressing all points and vectors in the transform TAB in the coordinates of A gives us:
RAB=(RBA)TRAA=(RBA)T
tA→BB=(RBA)T(oAA−oBA)=−(RBA)ToBA
Taking a look at the inverse of a composed transform, we can see that it follows the same rule as matrix multiplication: the inverse of a composed ("multiplied") transform is the composition ("product") of the inverses of its parts in reverse order.
(TAB)−1=TBA=TMATBM=(TAM)−1(TMB)−1
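In code, this could look as follows (a sketch extending the minimal Transform class from before; the composition and inverse formulas are the usual ones obtained by plugging one transform into the other).

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Transform:
    R: np.ndarray
    t: np.ndarray

    def apply_point(self, p):
        return self.R @ p + self.t

    def compose(self, other: "Transform") -> "Transform":
        """self ∘ other: first apply `other`, then `self` (like matrix multiplication)."""
        return Transform(self.R @ other.R, self.R @ other.t + self.t)

    def inverse(self) -> "Transform":
        """Swap start and target system."""
        return Transform(self.R.T, -self.R.T @ self.t)

# Composing a transform with its inverse yields the identity:
# T.inverse().compose(T) has R = I and t = 0.
```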
As this section is the core of this document, there is also a quick example coming up on how to apply it.
Example
To give you an example of how this is applicable, let's look at a simplified hierarchy of coordinate systems that you could find as part of a humanoid animation rig or a robot.
The direction of the arrows indicates how our local transformations are defined. Basically, we define a transformation that takes us from a child to its parent. That way a child only needs to know how it is rotated and translated with respect to its parent. Therefore, we have four transforms:
TLHLS (left hand -> left shoulder)
TLSH (left shoulder -> head)
TRHRS (right hand -> right shoulder)
TRSH (right shoulder -> head)
We can now use our rules to construct the transformation that transforms points from the left hand to the right hand!
When we look at the tree above, we can try to find a path from left to right hand. In this case it is pretty easy, since the hierarchy is easy.
Left hand -> Left shoulder -> Head -> Right Shoulder -> Right hand.
Now we try to express that in transformations. With our index notation, this isn't too hard. We are looking for TLHRH. Since the order of application is from right to left, we know that the upper index of the leftmost factor will be RH and the lower index of the rightmost factor LH. From the sequence above, we can then just fill in the intermediate systems:
TLHRH=TRSRHTHRSTLSHTLHLS
To make writing a bit easier, we can also split this into two parts: Left hand -> Head and Head -> Right hand:
TLHH=TLSHTLHLS
TRHH=TRSHTRHRS
TLHRH=(TRHH)−1TLHH=(TRSHTRHRS)−1(TLSHTLHLS)
We have expressed the full transform with only our initially known transforms. Now we need to apply the inverse and compose rules to each part and we get the final values.
This is something you see in most game engines, where you can define an object hierarchy, in 3D rigging, in robots and many more applications.
You just need to find the shortest (or actually any) path in the hierarchy between your desired start and target systems. In the simplest approach, you start with the identity transform and traverse the path, at each step composing with the transform of the next edge. Depending on whether you travel along or against the direction of the graph arrow, the composition uses the transform itself (for example from the left hand to the left shoulder) or its inverse (for example from the head to the right shoulder).
With that, you will have implemented a generic coordinate transformation system!
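A possible implementation of this lookup (a sketch reusing the Transform class with compose and inverse from above; the dictionary layout and the example values are made up) stores the known child-to-parent transforms and walks the path, inverting an edge whenever it is traversed against its direction.

```python
import numpy as np

def example_transform(angle_deg, t):
    a = np.radians(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return Transform(R, np.asarray(t, dtype=float))

# Known child -> parent transforms of the rig (arbitrary 2D example values).
edges = {
    ("LH", "LS"): example_transform(10, [0.0, -3.0]),   # left hand -> left shoulder
    ("LS", "H"):  example_transform(-5, [-1.5, -0.5]),  # left shoulder -> head
    ("RH", "RS"): example_transform(-10, [0.0, -3.0]),  # right hand -> right shoulder
    ("RS", "H"):  example_transform(5, [1.5, -0.5]),    # right shoulder -> head
}

def transform_along(path, edges):
    """Compose the transform along a path of system names."""
    total = Transform(np.eye(2), np.zeros(2))      # start with the identity
    for start, target in zip(path, path[1:]):
        if (start, target) in edges:               # edge traversed along its direction
            step = edges[(start, target)]
        else:                                      # traversed against it: use the inverse
            step = edges[(target, start)].inverse()
        total = step.compose(total)                # apply the new step after what we have
    return total

# Left hand -> left shoulder -> head -> right shoulder -> right hand
T_LH_RH = transform_along(["LH", "LS", "H", "RS", "RH"], edges)
```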
Proof
For the proof, we just successively apply both transforms.
The transform TAB transforms point coordinates from A to B.
By definition, point coordinates are transformed as:
pB=RABpA+tA→BB
Vectors vA have no position in space and thus do not use the translation part.
vB=RABvA
Proof
The point formula is just the definition of the transform.
We already showed that the projection for vectors does not include the translational part, but we will write it again with the transform formula.
For the vector vA, we consider that it can be defined by two points pA and qA as vA=pA−qA. In the B coordinate system, the vector is of course still defined by the same points, but in B coordinates (pB, qB). Since we know how points transform, we just plug this in:
vB=pB−qB=(RABpA+tA→BB)−(RABqA+tA→BB)=RAB(pA−qA)=RABvA
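A quick numerical check of this argument (NumPy, arbitrary values):

```python
import numpy as np

# An arbitrary transform from A to B.
a = np.radians(25)
R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
t = np.array([4.0, -2.0])

p_A, q_A = np.array([1.0, 2.0]), np.array([-0.5, 3.0])
v_A = p_A - q_A

p_B, q_B = R @ p_A + t, R @ q_A + t
# The translation cancels in the difference, so the vector is only rotated.
assert np.allclose(p_B - q_B, R @ v_A)
```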
We can get information about the start and target coordinate axes from the rotation matrix RAB of a transform.
The i-th column is the i-th basis vector of A in coordinates of B.
The i-th row is the i-th basis vector of B in coordinates of A.
Proof
This is basically what we have already shown. The j-th coordinate of the i-th axis xA,i of A in B is found by projecting the vector onto xB,j: xA,i⋅xB,j.
Writing all of B's axis vectors into the rows of a matrix and multiplying with xA,i then gives us the full coordinate vector xA,iB.
$$x_{A,i}^B = \begin{pmatrix} (x_{B,1})^T \\ \vdots \\ (x_{B,n})^T \end{pmatrix} x_{A,i} = (R_B)^T x_{A,i}$$
Writing all the xA,i into the columns of a matrix and multiplying by (RB)T on the left is then equal to multiplying (RB)T by each xA,i and writing the resulting vectors into a new matrix:
(RB)TRA=(xA,1B…xA,nB)=RAB
Thus we have shown, that the columns of RAB contain the axes of A as expressed in B.
By the same logic, we know that RBA contains the axes of B as expressed in A. But we have already shown that RBA=(RAB)T. Thus, the columns of RBA are just the rows of RAB, which was the second statement.
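This can again be checked numerically (a sketch with arbitrary example systems):

```python
import numpy as np

def rotation_2d(angle_deg):
    a = np.radians(angle_deg)
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

R_A, R_B = rotation_2d(20), rotation_2d(75)   # columns are the axes of A and B
R_A_to_B = R_B.T @ R_A                        # R_A^B

# Column i: the i-th axis of A expressed in B's coordinates.
assert np.allclose(R_A_to_B[:, 0], R_B.T @ R_A[:, 0])
# Row i: the i-th axis of B expressed in A's coordinates.
assert np.allclose(R_A_to_B[0, :], R_A.T @ R_B[:, 0])
```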
Interactive demo
The following demo allows you to visually inspect the coordinate transform between two coordinate systems A and B. You can move around the systems by dragging the gray dots. You can also move the black point p.
The coordinates of p in the coordinate systems are displayed (can be disabled), so you can check the values.
Technically, we start by defining all the axes and points in the shared "world" coordinate system. We regard this as the default here and won't specifically annotate every vector with a W.
Below, the full transformation with all formulas is computed, if you want to check those as well. When cross-checking with the image, just keep in mind that there might sometimes be a slight discrepancy in the decimals due to rounding.