Simple Linear Regression from Scratch Part I

Honey
6 min read · Feb 10, 2024
Linear regression

Let’s take up Simple Linear Regression, study it inside and out mathematically, and then implement it. We’re going to do it the old-fashioned way: starting from scratch, using basic equations, and no shortcuts.

Think of it as uncovering the secrets of a math problem, one step at a time. By the end, you’ll not only understand how linear regression works but also gain a deeper understanding of what happens behind the scenes when we make predictions. So, let’s roll up our sleeves and start learning linear regression, predicting salaries one step at a time.

I’ll be using a straightforward dataset called Salary_data.csv from Kaggle to illustrate the concepts. Once you’re comfortable with simple linear regression, tackling multiple linear regression will feel like a breeze.

What do I want to predict?

I aim to utilize linear regression to predict my expected salary based on my years of experience.

Salary Dataset:

What’s the concept?

My ultimate aim is to find a way to draw a line through the dataset that helps me predict my salary if I provide my years of experience.

Suppose my current work experience is 5 years. Based on the graph’s trend, I would estimate my expected salary to be about Rs. 66,000. However, after determining the best-fit regression line, if I still have 5 years of experience, my predicted salary would likely be approximately Rs. 71,000. Therefore,

For, Years of Experience = 5 years.

Actual salary = 66,000

Predicted Salary = 71,000 (We get this from the regression line)

Thus, Error = Predicted Salary − Actual Salary

Error = 71,000 − 66,000

Error = 5,000
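As a quick sanity check, here is the arithmetic above in Python (taking the error as the gap between the predicted and actual salary):

```python
# Hypothetical values from the example above (in Rs.)
actual_salary = 66_000     # salary estimated from the raw data
predicted_salary = 71_000  # salary read off the regression line

# The gap between prediction and observation; squaring it later
# (in the cost function) makes the sign irrelevant.
error = predicted_salary - actual_salary
print(error)  # 5000
```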

Generalizing the above concept with equations,

1. Mathematical Equations

Fig-1.1 Equation of Line for Simple Linear Regression

Our goal is to find the coefficients θ₀ and θ₁ of the line h(x) = θ₀ + θ₁x such that we get the best-fit line through the data above.

In our dataset below,

Dependent Variable (Y) = Salary

Independent Variable (X₁) = Years of Experience (also called a feature)

So, if we have a feature like X1 (Years of Experience) in our dataset, it allows us to predict Ŷ (Salary) based on that feature.

Dependent Variable (Y) and Independent Variable (X)
Salary Dataset

In the data above, I have a total of 30 rows of data, thus,

Number of data points (n) = 30
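Loading the data might look like the sketch below. I inline a few rows in the same shape as Salary_data.csv so the snippet is self-contained; the column names "YearsExperience" and "Salary" are an assumption based on the common Kaggle version of the file, so check your copy.

```python
import pandas as pd
from io import StringIO

# A few rows shaped like Salary_data.csv; the real file has 30 rows.
# Column names are assumed from the common Kaggle version.
csv_text = """YearsExperience,Salary
1.1,39343
3.2,64445
5.3,83088
"""

df = pd.read_csv(StringIO(csv_text))
X = df["YearsExperience"].to_numpy()  # independent variable (feature)
Y = df["Salary"].to_numpy()           # dependent variable (target)
n = len(df)                           # number of data points
print(n)  # 3
```

With the real file you would simply call `pd.read_csv("Salary_data.csv")` and `n` would be 30.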

2. Error Function / Cost Function

Error Function

Y — Actual Value of Salary

Ŷ — Predicted Value of Salary

n = number of data points
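As a sketch, the squared-error cost can be coded as below. I use the 1/(2n) scaling that makes the derivative come out cleanly; the exact constant factor in the equation above is an assumption on my part.

```python
import numpy as np

def predict(X, theta0, theta1):
    """h(x) = θ0 + θ1·x for each data point."""
    return theta0 + theta1 * X

def cost(X, Y, theta0, theta1):
    """J(θ) = (1/2n) Σ (Ŷ - Y)²: squared error, halved so the
    derivative has no stray factor of 2 (scaling is an assumption)."""
    n = len(Y)
    residuals = predict(X, theta0, theta1) - Y  # Ŷ - Y
    return (residuals ** 2).sum() / (2 * n)

# Toy check: with a perfect fit, the cost is exactly zero.
X = np.array([1.0, 2.0, 3.0])
Y = 2.0 + 3.0 * X
print(cost(X, Y, 2.0, 3.0))  # 0.0
```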

3. Minimizing the Error Function

How do we find the values of the parameters θ (θ₀ and θ₁) that minimize the error function J(θ)?

Start with some initial θ, say θ = 0, which means the vector [θ₀, θ₁] = [0, 0]. Now let’s see how the minimizing value of θ is found using the Gradient Descent algorithm.

3.1 Gradient Descent Algorithm

Gradient Descent is an optimization technique used to find local minima of a function. In linear regression, we use it to locate the minimum of the Error Function (which, being convex, has a single global minimum).

Gradient Descent animation (source: GitHub)

We use gradient descent to find the value of θ that yields the lowest error J(θ). Evaluating the Error Function at every conceivable value of θ is impractical, so instead the algorithm takes iterative steps downhill from its starting point. How well this works depends heavily on the choice of the Learning Rate, as we will explore in the subsequent discussion.

3.2 Minimize J(θ)

Minimizing Error Function J(θ)

The derivation above takes the partial derivative of the error function with respect to each parameter and plugs it into the gradient descent update rule, θⱼ := θⱼ − α · ∂J(θ)/∂θⱼ, applied repeatedly until convergence. The θⱼ values at convergence are the ones that minimize J(θ) and will be used in the line equation to find the best-fit line.

Finally, we obtain the coefficients θ₀ and θ₁, which are utilized to construct the linear regression line, referred to as the model, as

h(x) = θ₀ + θ₁x

Note: When we put our linear regression into action later on, we’ll be using matrix operations to handle the calculations.
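Putting the pieces together, here is a minimal sketch of batch gradient descent in matrix form (the learning rate, iteration count, and noiseless toy data are my own choices for illustration, not values from the article):

```python
import numpy as np

def gradient_descent(X, Y, alpha=0.01, iterations=1000):
    """Batch gradient descent for h(x) = θ0 + θ1·x.

    Matrix form: stack a column of ones so that Ŷ = A·θ,
    with A = [1, x] and θ = [θ0, θ1].
    """
    n = len(Y)
    A = np.column_stack([np.ones(n), X])  # design matrix [1, x]
    theta = np.zeros(2)                   # start from θ = [0, 0]
    for _ in range(iterations):
        residuals = A @ theta - Y         # Ŷ - Y
        grad = A.T @ residuals / n        # ∂J/∂θ for J = (1/2n)Σ(Ŷ-Y)²
        theta -= alpha * grad             # θ := θ - α·∇J(θ)
    return theta

# Toy data drawn from a known line, Y = 4 + 2·X, so we can
# verify that gradient descent recovers θ0 ≈ 4 and θ1 ≈ 2.
X = np.linspace(0, 10, 50)
Y = 4.0 + 2.0 * X
theta0, theta1 = gradient_descent(X, Y, alpha=0.02, iterations=20000)
print(round(theta0, 2), round(theta1, 2))  # 4.0 2.0
```

The same function works unchanged on the salary data: pass in years of experience as `X` and salaries as `Y`, then predict with `theta0 + theta1 * 5` for 5 years of experience.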

4. Learning Rate (α)

The learning rate, often denoted as α, plays a crucial role in how effectively the gradient descent algorithm navigates towards the minimum of our error function. Think of it as the size of the steps the algorithm takes as it adjusts the parameters (θ) to minimize the error.

If α is set too high, the algorithm may overshoot the minimum, making it difficult to converge and find the best parameters.

On the other hand, if α is too small, the algorithm takes tiny steps and needs many iterations to reach the minimum, slowing the process down significantly.

In practice, it’s common to experiment with different values of α, often trying a range of values on an exponential scale. By doing so, we can determine the optimal learning rate that allows us to converge to the minimum error most efficiently.
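A sketch of that experiment: sweep α over an exponential scale on toy data and compare the final cost after a fixed budget of iterations. The specific α values and the data are illustrative assumptions, not prescriptions.

```python
import numpy as np

def final_cost(X, Y, alpha, iterations=500):
    """Run batch gradient descent and report the final cost J(θ)."""
    n = len(Y)
    A = np.column_stack([np.ones(n), X])  # design matrix [1, x]
    theta = np.zeros(2)
    for _ in range(iterations):
        theta -= alpha * (A.T @ (A @ theta - Y)) / n
    residuals = A @ theta - Y
    return (residuals ** 2).sum() / (2 * n)

X = np.linspace(0, 10, 30)
Y = 4.0 + 2.0 * X

# Try learning rates on an exponential scale; with the same iteration
# budget, rates that are too small leave a much higher residual cost.
for alpha in (0.0001, 0.001, 0.01, 0.05):
    print(f"alpha={alpha}: J = {final_cost(X, Y, alpha):.4f}")
```

Rates beyond roughly 2/λmax of the design matrix would diverge here, which is the overshooting failure mode described above.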

Finding the right balance with the learning rate is crucial for the success of the gradient descent algorithm in training our model. It requires a bit of trial and error, but once we find the sweet spot, we can significantly improve the efficiency of our optimization process.

Summary

What is Linear Regression?

  • Linear regression is a fundamental method for predicting outcomes based on input variables.
  • Explored simple linear regression, focusing on predicting salaries based on years of experience.

Going back to basics:

  • Learnt how to perform linear regression from scratch using basic equations and no shortcuts.

Step — 1 Dataset and Goal

  • For Example, let’s use the Salary_data.csv dataset from Kaggle.
  • Our goal is to predict expected salary based on years of experience.

Step — 2 Understanding Predictions

  • From the dataset, plot the graph to visualize the relationship between years of experience and salary.
  • If, for example, we have 5 years of experience, we might estimate our expected salary to be around Rs. 66,000 based on the graph.

Step — 3 Fitting the Regression Line

  • Proceed to fit the best regression line to the dataset, aiming to make accurate predictions.
  • After fitting the line, if we still have 5 years of experience, our predicted salary might be approximately Rs. 71,000, as determined by the regression model.

Step — 4 Optimization with Gradient Descent

  • Gradient descent is an optimization technique crucial for finding the minimum of the error function.
  • The learning rate (α) determines the step size in the algorithm, affecting convergence speed.

Step — 5 Experimenting with Learning Rate (α)

  • It’s essential to experiment with different values of α to find the optimal learning rate.
  • Too large a learning rate might overshoot the minimum, while too small a rate could slow down convergence.

Conclusion

  • Linear regression provides a way to predict outcomes based on input variables.
  • By understanding the process from scratch, we gain deeper insights into how predictions are made and how the model works.

These steps provide a structured approach to understanding and implementing simple linear regression, setting the foundation for further exploration into more complex regression techniques.

In Part II, we’ll dive into the practical side of linear regression, see how it’s used in real life, and implement it in Python.

Stay tuned for hands-on coding and real-life examples!
