How matrices can help in games
Have you ever wondered how 3D models drawn in games really work? How these models rotate, move or even scale up and down? In this article I’ll explain how games represent these models in memory and how matrices simplify our lives in game development.
What are matrices?
Matrices are basically an arrangement of data in an N x M grid (N rows, M columns) on which we can perform operations. A matrix can be operated on by a scalar, a function or another matrix. Operations between a matrix and a scalar include multiplication, addition, division and subtraction. Operations between two matrices, however, include addition, subtraction and multiplication.
Here is a typical 2 x 2 (2 rows, 2 columns) example of a matrix:
M = [ a b ]
    [ c d ]
Out of all the operations on matrices, the most important in games is multiplication. Multiplying two matrices is more tedious than you might imagine, and the work grows quickly with size: multiplying an N x M matrix by an M x P matrix the straightforward way takes N * M * P scalar multiplications. The order of multiplication is of utmost importance: in general A * B is not the same as B * A, that is, matrix multiplication is not commutative.
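To make the non-commutativity concrete, here is a minimal sketch in plain Python. The helper name `mat_mul` and the two sample matrices are my own choices for illustration, not something a game engine would ship.

```python
def mat_mul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p); both are lists of rows."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("columns of a must equal rows of b")
    # Entry (i, j) is the dot product of row i of a with column j of b.
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

# Two arbitrary 2 x 2 matrices chosen only for the demonstration.
A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]

AB = mat_mul(A, B)
BA = mat_mul(B, A)
print(AB)  # [[2, 1], [4, 3]]
print(BA)  # [[3, 4], [1, 2]]
```

Multiplying by B on the right swaps the columns of A, while multiplying by B on the left swaps its rows, so the two products differ.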
Assuming the reader is aware of how matrix operations work we will move forward with the application of matrices in games.
How can a matrix be used to play around with the data in games?
Games usually represent a model as a mesh, which contains a set of vertices. To show what a basic mesh looks like, here is a picture:
The points representing the triangle in the image are as follows:
A = (1, 3) , B = (-2, 2), C = (2, 1)
These points are termed ‘vertices’. In games, a vertex is more than just a data point: it can also hold texture information, normal vectors and much more. This is the mathematical way of representing meshes on a graph. In memory, however, each vertex is arranged as a 1 x 2 matrix, that is, a 2-dimensional vector. Hence the points can be described as:
A = [ 1 3 ] , B = [ -2 2 ] , C = [ 2 1 ]
Together these vertices make up a complete mesh, and each mesh has its own matrices for performing the following operations:
- Translation
- Rotation
- Scaling
Of these three, rotation normally needs additional matrices for 3D meshes, one for the rotation about each axis. For simplicity's sake we will concentrate on 2D meshes for now.
Translation
Translation is the process of moving the object around in 2D or 3D space. It is simply the operation of adding scalars to the individual data points of the mesh. Consider the above example: say we want to move the whole mesh 2 units to the right (along the x-axis). To achieve this we add a scalar value (say ‘t’) to the x coordinate of each vertex. We will represent this translation as a vector ‘T’, which is also a 2-dimensional vector. Therefore
T = [ t 0 ]
Hence the point ‘A’ will have the following operation:
R = A + T
R = [1 3 ] + [ t 0 ] = [ (1+t) (3 + 0) ] = [ (1+t) 3 ] = ((1+t) , 3)
Substituting a scalar value say t = 2 in the above equation we get the following:
R = [ (1+2) 3 ] = [ 3 3 ]
Hence the new point for A is (3, 3). Similarly B and C can be recalculated as (0, 2) and (4, 1) respectively.
Similarly, we can translate the object on the respective axes by using a vector
T = [ tx, ty ]
(Where tx, ty are the scalar values representing translation of x and y axis respectively)
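The vector form of translation can be sketched in a few lines of Python. The function name `translate` and the list-of-tuples layout are assumptions made for this example; the vertices are the triangle A, B, C used above.

```python
def translate(vertices, tx, ty):
    """Return a new list of 2D vertices, each moved by the vector (tx, ty)."""
    return [(x + tx, y + ty) for (x, y) in vertices]

# The triangle from the article: A, B, C.
triangle = [(1, 3), (-2, 2), (2, 1)]

# Move the whole mesh 2 units along the x-axis and 0 along the y-axis.
moved = translate(triangle, 2, 0)
print(moved)  # [(3, 3), (0, 2), (4, 1)]
```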
But how do we represent translation in matrix form? How is the operation performed on the data points? And why would we represent it in matrix form at all?
To answer the first question, the translation matrix is usually represented as:
T = [ 1   0   0 ]
    [ 0   1   0 ]
    [ tx  ty  1 ]
As for the second question, how the data points are translated using this matrix: we normally append one additional component to the end of each vector. Consider the data points used above
A = [ 1 3 ] , B = [ -2 2 ] , C = [ 2 1 ]
We rewrite it as follows:
A = [ 1 3 1 ] , B = [ -2 2 1 ] , C = [ 2 1 1 ]
Notice that we have appended ‘1’ as an additional component to each vector. This is done so that the vector can be multiplied with the 3 x 3 translation matrix. Each point is now a 1 x 3 matrix (a 3-dimensional vector) that can be multiplied by the translation matrix to yield the intended translation. In terms of the modified vectors, we multiply each vector (A, B and C) by the translation matrix to get the translated data points.
To demonstrate, we will consider point A again and translate it by 2 on the x-axis and 0 on the y-axis. The following is the representation of the operation:
R = A * T = [ 1 3 1 ] * [ 1 0 0 ]
                        [ 0 1 0 ]
                        [ 2 0 1 ]
Now the intermediate operation will look something like this:
R = [ (1*1 + 3*0 + 1*2) (1*0 + 3*1 + 1*0) (1*0 + 3*0 + 1*1) ]
Which will result in the following:
R = [ (1 + 2) 3 1] = [ 3 3 1]
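The same multiplication can be checked with a short Python sketch. It assumes the row-vector convention used in this article (the vertex is on the left, the matrix on the right); `translation_matrix` and `transform` are hypothetical helper names.

```python
def translation_matrix(tx, ty):
    """3 x 3 translation matrix for row vectors of the form [x, y, 1]."""
    return [[1, 0, 0],
            [0, 1, 0],
            [tx, ty, 1]]

def transform(vertex, matrix):
    """Multiply a 1 x 3 row vector by a 3 x 3 matrix."""
    return [sum(vertex[k] * matrix[k][j] for k in range(3)) for j in range(3)]

T = translation_matrix(2, 0)

# A, B and C with the extra '1' appended.
triangle = [[1, 3, 1], [-2, 2, 1], [2, 1, 1]]
moved = [transform(v, T) for v in triangle]
print(moved)  # [[3, 3, 1], [0, 2, 1], [4, 1, 1]]
```

The trailing 1 in each vertex is what picks up the bottom row of the matrix, which is where tx and ty live.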
To answer the final question, ‘Why a matrix to represent translation?’: this is easier to appreciate once you understand the rotation and scaling matrices, but in short, we combine the Translation (T), Rotation (R) and Scaling (S) operations by multiplying those 3 matrices together into a single matrix, which is then multiplied with the data points to produce the combined effect. This reduces the total number of operations per data point, since we now perform a single matrix multiplication instead of applying the translation, rotation and scaling to each point one after another.
Rotation
Rotation is a topic rooted in trigonometry, specifically in compound angles. Before diving into the intricacies of the rotation matrix, we first need to understand how to represent x and y in parametric form. Consider the following example:
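The figure above shows a right-angled triangle (right-angled at B) with sides x and y and hypotenuse z, where θ is the angle between the hypotenuse and side x. Basic trigonometry will give us the following:
cos θ = x / z , sin θ = y / z
We can rewrite it as follows:
x = z cos θ , y = z sin θ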
To see how the rotation matrix is derived, we first need to work out how to rotate the point (x, y) to a new point, say (q, w). Assume that the line from the origin to (x, y) is inclined at an angle ‘ θ ’ to the x-axis. We then rotate the point by ‘ α ’, so that the line is now inclined at an angle of ( θ + α ) to the x-axis while its length z stays the same, which can be represented as below:
Here the red line is the rotated line and the blue line is the initial line. Now, we write the parametric form for the red line as follows:
q = z cos(θ + α) , w = z sin(θ + α)
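Expanding the compound angles in q = z cos(θ + α) and w = z sin(θ + α) using the identities:
cos(θ + α) = cos θ cos α - sin θ sin α
sin(θ + α) = sin θ cos α + cos θ sin α
we get the following:
q = z (cos θ cos α - sin θ sin α)
w = z (sin θ cos α + cos θ sin α)
Simplifying, we get equation ‘A’ :
q = (z cos θ) cos α - (z sin θ) sin α
w = (z sin θ) cos α + (z cos θ) sin α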
Now from the very first parametric form we have
x = z cos θ , y = z sin θ
Substituting those values in the equation ‘A’:
q = x cos α - y sin α
w = x sin α + y cos α
This is the equation for rotating a point (x, y) to a new point (q, w). Since the operation is just multiplication and addition of trigonometric values, we can represent it in matrix form as follows:
[ q w 1 ] = [ x y 1 ] * [  cos α  sin α  0 ]
                        [ -sin α  cos α  0 ]
                        [  0      0      1 ]
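As a quick sanity check of the rotation equations q = x cos α - y sin α and w = x sin α + y cos α, here is a small Python sketch. It assumes row vectors of the form [x, y, 1] and a counter-clockwise angle in radians; the helper names are hypothetical.

```python
import math

def rotation_matrix(alpha):
    """3 x 3 rotation matrix (angle in radians) for row vectors [x, y, 1]."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, s, 0.0],
            [-s, c, 0.0],
            [0.0, 0.0, 1.0]]

def transform(vertex, matrix):
    """Multiply a 1 x 3 row vector by a 3 x 3 matrix."""
    return [sum(vertex[k] * matrix[k][j] for k in range(3)) for j in range(3)]

# Rotate point A = (1, 3) by 90 degrees about the origin.
q, w, _ = transform([1, 3, 1], rotation_matrix(math.radians(90)))

# Rounding hides the tiny floating-point error in cos(90 degrees).
print(round(q, 6), round(w, 6))  # -3.0 1.0
```

With α = 90 degrees the equations reduce to q = -y and w = x, which is exactly what the output shows for A.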
Since this is a case of 2D rotation, we are looking at the object from the top (i.e. along a hypothetical z-axis), so we are actually rotating the object about the z-axis in 3D but viewing its 2D projection. A 3D object has 3 axes (x, y, z). To achieve a rotation about all three axes, we derive the equations by isolating one axis at a time (just as we isolated the z-axis here and viewed the x-y plane) and maintain a rotation matrix for each axis.
Now, looking at the matrix, you may wonder why there is a ‘1’ at the end, since it seems superfluous. It goes back to what I said about the translation matrix: since the translation matrix is of order 3x3, we have to keep all the matrices at order 3x3 to multiply them together successfully.
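The idea of keeping one rotation matrix per axis for a 3D object can be sketched as below. This is only an illustration under my own assumptions: row vectors [x, y, z], right-handed axes, angles in radians, and plain 3 x 3 matrices with no translation (with translation, 3D needs 4 x 4 matrices for the same reason 2D needs 3 x 3). The function names are hypothetical.

```python
import math

def rotation_x(a):
    """Rotation about the x-axis: x is left alone, the y-z plane rotates."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0],
            [0.0, c, s],
            [0.0, -s, c]]

def rotation_y(a):
    """Rotation about the y-axis: y is left alone, the z-x plane rotates."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, -s],
            [0.0, 1.0, 0.0],
            [s, 0.0, c]]

def rotation_z(a):
    """Rotation about the z-axis: the 2D case derived above."""
    c, s = math.cos(a), math.sin(a)
    return [[c, s, 0.0],
            [-s, c, 0.0],
            [0.0, 0.0, 1.0]]

def transform(vertex, matrix):
    """Multiply a 1 x 3 row vector by a 3 x 3 matrix."""
    return [sum(vertex[k] * matrix[k][j] for k in range(3)) for j in range(3)]

quarter = math.radians(90)
x_to_y = transform([1, 0, 0], rotation_z(quarter))  # approximately (0, 1, 0)
y_to_z = transform([0, 1, 0], rotation_x(quarter))  # approximately (0, 0, 1)
z_to_x = transform([0, 0, 1], rotation_y(quarter))  # approximately (1, 0, 0)
```

Each matrix leaves its own axis untouched and applies the 2D rotation derived above to the other two coordinates.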
Scaling
Scaling is simply the process of multiplying the x and y values (and z in the case of 3D) by their respective scalar values. To understand how scaling works, we can look at a 2D point one axis at a time. Consider a number line which has the following point marked on it:
A = 2
Now say we want to scale this point by a factor of 2. The new point A will be as follows:
A = 2 * 2 = 4
We can do the same for a point located on the y-axis.
If we combine this multiplication for a 2D point with scaling factors (sx, sy) on the x and y axes respectively, we get the operation:
P = (x * sx , y * sy)
The matrix representation of scaling is fairly simple:
S = [ sx  0   0 ]
    [ 0   sy  0 ]
    [ 0   0   1 ]
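A short Python sketch of scaling in matrix form, again assuming row vectors [x, y, 1]; `scaling_matrix` and `transform` are hypothetical helper names, and the factors (2, 3) are arbitrary.

```python
def scaling_matrix(sx, sy):
    """3 x 3 scaling matrix for row vectors of the form [x, y, 1]."""
    return [[sx, 0, 0],
            [0, sy, 0],
            [0, 0, 1]]

def transform(vertex, matrix):
    """Multiply a 1 x 3 row vector by a 3 x 3 matrix."""
    return [sum(vertex[k] * matrix[k][j] for k in range(3)) for j in range(3)]

S = scaling_matrix(2, 3)

# The triangle A, B, C in its 1 x 3 form.
triangle = [[1, 3, 1], [-2, 2, 1], [2, 1, 1]]
scaled = [transform(v, S) for v in triangle]
print(scaled)  # [[2, 9, 1], [-4, 6, 1], [4, 3, 1]]
```

Each result is (x * sx, y * sy) with the trailing 1 left unchanged, so the scaled points can still be fed into a translation or rotation matrix.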
Combined Final Matrix:
As I mentioned earlier, the matrices are used for a purpose: to simplify the operations that have to be performed on the data points. To get the combined matrix of Translation (T), Rotation (R) and Scaling (S), we need to multiply them in a particular order, otherwise the end result will not be what you expect. We’ll see why that happens by deriving both matrices. First we will use the correct order of multiplication, i.e.
Transformation = S * R * T
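Hence, we build one single matrix by multiplying the scaling, rotation and translation matrices in that order, where α is the rotation angle, (sx, sy) are the scaling factors and (tx, ty) is the translation.
Which will yield the following matrix:
Transformation = [  sx cos α   sx sin α   0 ]
                 [ -sy sin α   sy cos α   0 ]
                 [  tx         ty         1 ]
A point [ x y 1 ] multiplied by this matrix is scaled first, then rotated, and finally translated by exactly (tx, ty).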
Now, to clarify why the order of multiplication of the T, R and S matrices matters so much, recall the point made at the start of this article:
Matrix multiplication is not commutative, so A * B is not the same as B * A.
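To see this, we do the following, which is an incorrect way of applying the transformation to meshes:
Transformation = T * R * S
Which will produce the following abomination:
Transformation = [  sx cos α                   sy sin α                   0 ]
                 [ -sx sin α                   sy cos α                   0 ]
                 [  sx (tx cos α - ty sin α)   sy (tx sin α + ty cos α)   1 ]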
You can clearly see that when the points get multiplied, they are no longer translated by (tx, ty): the translation itself has been rotated and scaled, which is not what we wanted. Hence, the order of multiplying matrices matters A LOT.
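The difference between the two orders can be seen on actual numbers. The sketch below assumes the row-vector convention used throughout this article and picks arbitrary values for illustration: a scale of (2, 2), a rotation of 90 degrees and a translation of (2, 0). All helper names are hypothetical.

```python
import math

def mat_mul(a, b):
    """Multiply two 3 x 3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transform(vertex, matrix):
    """Multiply a 1 x 3 row vector by a 3 x 3 matrix."""
    return [sum(vertex[k] * matrix[k][j] for k in range(3)) for j in range(3)]

def translation(tx, ty):
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [tx, ty, 1.0]]

def rotation(alpha):
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def scaling(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

S = scaling(2.0, 2.0)
R = rotation(math.radians(90))
T = translation(2.0, 0.0)

correct = mat_mul(mat_mul(S, R), T)  # scale, then rotate, then translate
wrong = mat_mul(mat_mul(T, R), S)    # translate, then rotate, then scale

A = [1.0, 3.0, 1.0]
cx, cy, _ = transform(A, correct)
wx, wy, _ = transform(A, wrong)
print(round(cx, 6), round(cy, 6))  # -4.0 2.0
print(round(wx, 6), round(wy, 6))  # -6.0 6.0
```

In the first case A is scaled to (2, 6), rotated to (-6, 2) and then moved by exactly (2, 0). In the second case the translation is applied first, so it gets rotated and scaled along with the point.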
I hope this article clears up the basics of why matrices are so powerful and why they are preferred in games.