Mathematics for Data Science Part 1: Linear Algebra

Kiran Nagarkoti
8 min read · Dec 28, 2024



While pursuing my Master’s in Mathematics, Linear Algebra was one subject that excited me the most — closely followed by Probability and Statistics. Back then, little did I know how much these subjects would shape my professional journey. I must confess, though, that at the time, I had only a vague idea of their real-world applications.

Fast forward to today, and these mathematical concepts are the bedrock of my work in Data Science and Machine Learning. You may often hear people talk about “Mathematics for Data Science,” but it’s not always clear how these abstract theories connect to practical use cases. It can be challenging to see where exactly mathematical concepts fit into data-driven workflows.

In my previous post, I provided a comprehensive Data Science Roadmap — a guide for anyone looking to become a Data Scientist. Today, we’ll be diving into part 1 of the first topic from that roadmap: Linear Algebra.

Linear algebra forms the backbone of many data science and machine learning applications. It underpins everything from data transformations to building complex deep learning algorithms. For aspiring data scientists, mastering these concepts is not just beneficial — it’s essential for excelling in the field.

In this article, we’ll explore the foundational aspects of linear algebra: Scalars, Vectors, Matrices, Tensors, Matrix Operations, Eigenvalues, and Eigenvectors. Each concept will be explained in the context of real-world data science problems to ensure the theory feels connected to its practical applications.

Stay with me as we bridge the gap between abstract mathematics and hands-on data science, starting with the core building blocks of linear algebra.

1. Scalars

A scalar is the simplest entity in linear algebra, representing a single numerical value. Scalars are often used in data science to define constants, such as a learning rate in optimization or a single value representing the output of a loss function.

Notation:

  • Scalars are usually denoted by lowercase letters, e.g., a, b, c.

Application in Data Science:

  • Hyperparameters in Machine Learning models, such as the learning rate α = 0.01, n_estimators = 100, and max_depth = 5, are scalar values (see the sketch below).
  • Scalars are also the building blocks for more complex structures like vectors and matrices.
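
To make this concrete, here is a minimal scikit-learn sketch (the models and values are purely illustrative assumptions, not a prescribed setup) showing hyperparameters passed in as scalars:

```python
# A minimal sketch: scalar hyperparameters passed to models (values are illustrative)
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDRegressor

# n_estimators and max_depth are scalars controlling the ensemble's size and depth
forest = RandomForestClassifier(n_estimators=100, max_depth=5)

# eta0 = 0.01 is the scalar learning rate used for gradient updates
sgd = SGDRegressor(learning_rate="constant", eta0=0.01)
```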

2. Vectors

A vector is an ordered list of numbers, arranged in a single row (row vector) or a single column (column vector). Vectors are fundamental for representing data points, gradients, or weights in machine learning.

Notation:

  • Vectors are typically represented by bold lowercase letters, e.g., v.
  • A vector with n elements is written as v = [v₁, v₂, …, vₙ].

Application in Data Science:

  • Representing a data point with n features in a dataset, e.g., x = [x₁, x₂, …, xₙ], where each xᵢ is one feature value (such as age or income).
  • Weights, such as those in machine learning or neural network models. For example, a simple linear regression model predicts ŷ = w₁x₁ + w₂x₂ + … + wₙxₙ + b, where w = [w₁, w₂, …, wₙ] is the weight vector (see the sketch below).
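
A minimal NumPy sketch of both points above, with made-up feature names and weight values:

```python
import numpy as np

# A data point with n = 4 features: [age, income, purchase_frequency, tenure] (illustrative values)
x = np.array([35, 62000.0, 12, 4])

# A weight vector and a scalar bias for a simple linear regression model
w = np.array([0.4, 0.0003, 1.5, 2.0])   # illustrative weights
b = 5.0                                  # scalar intercept

# Prediction: y_hat = w1*x1 + w2*x2 + ... + wn*xn + b, i.e. a dot product plus a scalar
y_hat = np.dot(w, x) + b
print(y_hat)
```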

3. Matrices

A Matrix is a two-dimensional array of numbers arranged in rows and columns. Matrices are the cornerstone of linear transformations, data storage, and computations in data science.

Notation:

  • A matrix is denoted by a bold uppercase letter, e.g., A.
  • A matrix with m rows and n columns (an m×n matrix) is written as A = (aᵢⱼ), where aᵢⱼ is the entry in row i and column j, for i = 1, …, m and j = 1, …, n.

Application in Data Science:

  • In a feature matrix X with shape 3×4, each row corresponds to a data point (e.g., individual customers), and each column corresponds to a feature (e.g., age, income, purchase frequency).
3×4 feature matrix
  • Neural networks use matrices to store weights and biases; these weights are applied through matrix multiplication to compute layer activations. For a layer with 5 inputs (features) and 3 neurons in the hidden layer, the weight matrix has dimensions 5×3 (see the sketch below):
5×3 weight matrix
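
A small NumPy sketch of both matrices described above (all values are illustrative):

```python
import numpy as np

# 3x4 feature matrix X: 3 customers (rows) x 4 features (columns):
# [age, income, purchase_frequency, tenure] -- illustrative values
X = np.array([
    [25, 40000.0,  5, 1],
    [38, 72000.0, 12, 6],
    [51, 95000.0,  3, 9],
])
print(X.shape)   # (3, 4)

# 5x3 weight matrix W for a layer with 5 inputs and 3 hidden neurons
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
print(W.shape)   # (5, 3)
```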

4. Tensors

A tensor is a generalization of scalars (0D), vectors (1D), and matrices (2D) to higher dimensions. Tensors are commonly used in deep learning for handling multi-dimensional data, such as images, audio, and video.

Notation:

  • An image with height h, width w, and 3 colour channels (red, green, blue) can be represented as a 3D tensor with dimensions h×w×3. If we have an image with a height of 4 pixels, a width of 3 pixels, and 3 colour channels (RGB), the image can be represented as a 3D tensor with the shape 4×3×3.
4×3×3 3D Tensor

In this tensor:

  • Each row of the tensor corresponds to a row of pixels in the image.
  • Each entry within a row corresponds to a single pixel.
  • Each innermost array holds 3 values: the RGB values for that pixel.

Application in Data Science:

  • In deep learning, data often comes in batches. When training a model, we might feed a batch of images into the network. Instead of just one image, we have multiple images, each represented as a tensor.
  • Batch Size (B): The number of images in the batch.
  • Height (h): The number of rows of pixels.
  • Width (w): The number of columns of pixels.
  • Channels (c): The number of colour channels (typically 3 for RGB images).

For a batch of B images, each having a height h, width w, and 3 colour channels, we get a 4D tensor with the shape B × h × w × 3.
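
A short NumPy sketch of the image and batch tensors described above, with random integers standing in for pixel values:

```python
import numpy as np

rng = np.random.default_rng(0)

# One RGB image: height 4, width 3, 3 colour channels -> a 3D tensor of shape (4, 3, 3)
image = rng.integers(0, 256, size=(4, 3, 3))
print(image.shape)        # (4, 3, 3)
print(image[0, 0])        # RGB values of the top-left pixel

# A batch of B = 8 such images -> a 4D tensor of shape (B, h, w, 3)
batch = rng.integers(0, 256, size=(8, 4, 3, 3))
print(batch.shape)        # (8, 4, 3, 3)
```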

5. Matrix Operations

  1. Matrix Multiplication

Matrix multiplication combines two matrices to produce a third matrix. It is the foundation for many operations in machine learning and neural networks.

  • For two matrices A (of size m×n) and B (of size n×p), the resulting matrix C will be of size m×p.

Each element Cᵢⱼ in matrix C is calculated as the dot product of the i-th row of A with the j-th column of B.

Example:

If A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], then

C = AB = [[1·5 + 2·7, 1·6 + 2·8], [3·5 + 4·7, 3·6 + 4·8]] = [[19, 22], [43, 50]]

Application:

  • In neural networks, matrix multiplication is used to compute activations. The weights of a network are stored in matrices, and when an input vector is passed through the network, the input is multiplied by the weight matrices to produce the output.

For example, in a simple feedforward neural network, the pre-activation output of a layer is computed as

Z = WX + b

where W is the weight matrix, X is the input matrix, and b is the bias vector.

  • During training, backpropagation uses matrix multiplication to calculate the gradients for updating the weights. The gradients are obtained by multiplying the error of each layer by the derivative of the activation function and the weight matrices.
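
Putting the ideas above into code, here is a minimal NumPy sketch: the small worked product from the example, and a single layer computing its pre-activations. The shapes and the row-vector convention (inputs stacked as rows, so the layer computes X W + b, the row-major equivalent of Z = WX + b) are illustrative assumptions:

```python
import numpy as np

# The 2x2 example from above: C = AB
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A @ B)                  # [[19 22]
                              #  [43 50]]

# A single feedforward layer with 5 inputs and 3 neurons, matching the 5x3
# weight matrix from section 3. With inputs stored as rows of X, the layer
# computes X @ W + b, the row-vector form of Z = WX + b.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))   # 4 data points, 5 features each
W = rng.normal(size=(5, 3))   # 5x3 weight matrix
b = rng.normal(size=(3,))     # bias vector, one entry per neuron
Z = X @ W + b                 # shape (4, 3): 3 pre-activations per data point
print(Z.shape)
```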

2. Matrix Transposition

The transpose of a matrix A, denoted Aᵀ, is obtained by swapping its rows and columns.

Example:

If A = [[1, 2, 3], [4, 5, 6]] (a 2×3 matrix), then Aᵀ = [[1, 4], [2, 5], [3, 6]] (a 3×2 matrix).
Application:

  • Matrix transposition is a simple operation that plays a crucial role in data preprocessing, covariance matrix computation, and neural network computations, ensuring that data and matrices are aligned correctly for operations.
  • In neural networks, the transpose is used to align dimensions when multiplying weight matrices and input data. For example, when calculating gradients during backpropagation, matrices must be transposed so that their dimensions line up for matrix multiplication (see the sketch below).
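
A short NumPy sketch of the transpose and one common place it appears, forming XᵀX for covariance-style computations (the data values are illustrative):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
print(A.T)                     # shape (3, 2): rows and columns swapped

# Transposes make the shapes line up: for a 4x3 data matrix X,
# X.T @ X is 3x3 and is the core of covariance / least-squares computations.
X = np.array([[1.0, 2.0, 0.5],
              [0.0, 1.0, 1.5],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 2.0]])
gram = X.T @ X                 # shape (3, 3)
print(gram.shape)
```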

6. Eigenvalues and Eigenvectors

An eigenvector is a vector that, when transformed by a matrix A, does not change its direction, only its length.

In other words, if A transforms a vector v, the transformed vector Av is just a scaled version of the original:

Av = λv

where:

  • A is a square matrix (a matrix with the same number of rows and columns, i.e., n×n).

Note: Eigenvalues and eigenvectors are only defined for square matrices, because the dimensions of A and v must align for the equation Av = λv to make sense.

  • v is the eigenvector (a vector that doesn’t change direction under the transformation).
  • λ is the eigenvalue (the scaling factor that determines how much the eigenvector is stretched or compressed).

The eigenvalue λ is the scalar that represents how much the eigenvector v is scaled during the transformation.

  • If λ > 1, the eigenvector is stretched.
  • If 0 < λ < 1, the eigenvector is compressed.
  • If λ = 0, the transformation maps the eigenvector to the zero vector.
  • If λ < 0, the eigenvector is flipped (its direction is reversed).

Suppose we have a graph with three vectors:

(Figure: three vectors under the transformation; source: sciencethedata)

Application:

  • Dimensionality Reduction: Principal Component Analysis (PCA) uses eigenvalues and eigenvectors to identify directions of maximum variance in data. By projecting the data onto the eigenvectors corresponding to the largest eigenvalues, dimensionality is reduced while retaining the most critical information.
  • Feature Selection: Eigenvalues from PCA help determine the importance of each feature. Features corresponding to small eigenvalues can be removed, simplifying models while maintaining performance.
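
A minimal NumPy sketch of the ideas above: first verifying Av = λv for a small symmetric matrix, then using the eigendecomposition of a covariance matrix as a bare-bones PCA (the data is random and purely illustrative):

```python
import numpy as np

# Eigenvalues and eigenvectors of a small square matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                 # e.g. [3. 1.]
v = eigvecs[:, 0]              # eigenvector paired with the first eigenvalue
print(A @ v, eigvals[0] * v)   # A v equals lambda v (up to floating-point error)

# Bare-bones PCA on a toy data matrix X (rows = data points, columns = features)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)                  # centre the data
cov = np.cov(Xc, rowvar=False)           # 3x3 covariance matrix
vals, vecs = np.linalg.eigh(cov)         # eigh: for symmetric matrices
top = vecs[:, np.argmax(vals)]           # eigenvector with the largest eigenvalue
projected = Xc @ top                     # data projected onto the top principal component
print(projected.shape)                   # (100,)
```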

With this, we wrap up Part 1 on Linear Algebra. I hope this gave you some context for its concepts and their applications in data science.

Linear algebra is the cornerstone of data science and machine learning, forming the basis for everything from data representation to algorithm design. By understanding its core concepts — scalars, vectors, matrices, tensors, and eigenvalues — you unlock the tools to transform raw data into actionable insights and powerful predictive models. Although these topics may not make much sense right now if you’re just starting with data science, as you move forward in your journey and explore their applications, you’ll gain a clearer understanding of these concepts.

The goal here is to introduce you to these foundational terms, providing a framework you can always refer back to as you tackle more advanced topics in your data science roadmap.

Stay tuned for the next post on Probability and Statistics!

If you found this article helpful, I’d love to hear your thoughts! Feel free to leave a comment below with your feedback or any questions you have. Don’t forget to share this article with others who might find it useful. Your support means a lot — thank you for reading!
