This project builds a Linear Regression model to predict median house prices in California using the California Housing dataset. It is built with modular Python scripts, Jupyter notebooks, and Conda environment management, and is structured for scalability and ease of automation.
The project serves as hands-on practice in building a complete data science project in Python.
- Data preprocessing and feature scaling
- Linear Regression model training
- Evaluation with MSE and R²
- Prediction on new data
- Conda environment setup with environment.yml
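The preprocessing, training, and evaluation steps listed above can be sketched as follows. This is a minimal illustration that assumes scikit-learn; the function name `train_and_evaluate`, the 80/20 split, and the seed are hypothetical choices and may not match what the scripts in this repository do.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def train_and_evaluate(X, y, test_size=0.2, seed=42):
    """Scale features, fit a linear model, and score it on a held-out split.

    Hypothetical helper for illustration; not taken from the repository.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    # Fit the scaler on the training split only, so that statistics from
    # the test split do not leak into training.
    scaler = StandardScaler().fit(X_train)

    model = LinearRegression().fit(scaler.transform(X_train), y_train)

    # Evaluate on the held-out split with the two metrics named above.
    y_pred = model.predict(scaler.transform(X_test))
    metrics = {
        "mse": mean_squared_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }
    return model, scaler, metrics
```

To try it on the real data, `X, y = fetch_california_housing(return_X_y=True)` from `sklearn.datasets` returns the feature matrix and target; the dataset is downloaded on first use.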
All dependencies are managed via Anaconda.
- Clone the repository
git clone https://github.com/ksav03/linear-regression-housing.git
cd linear-regression-housing
- Create the Conda environment from the environment file
conda env create -f environment.yml
conda activate house-price-env
- Run the main pipeline. This downloads the dataset, preprocesses it, trains a Linear Regression model, evaluates it, and saves the model and scaler.
python main.py
- Make predictions on new data. Run the prediction script or call the function in src/predict.py:
python predict.py
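The save and predict steps above can be sketched as follows. This assumes the model and scaler are scikit-learn objects persisted with joblib; the helper names `save_artifacts` and `predict_new` and the path arguments are hypothetical, and the actual function in src/predict.py and the file names written by main.py may differ.

```python
import joblib
import numpy as np


def save_artifacts(model, scaler, model_path, scaler_path):
    """Persist the fitted model and scaler (hypothetical helper)."""
    joblib.dump(model, model_path)
    joblib.dump(scaler, scaler_path)


def predict_new(rows, model_path, scaler_path):
    """Load the saved artifacts and predict for one or more new rows.

    Hypothetical helper; `rows` must have the same columns, in the same
    order, as the data the scaler and model were fitted on.
    """
    model = joblib.load(model_path)
    scaler = joblib.load(scaler_path)

    rows = np.asarray(rows, dtype=float)
    if rows.ndim == 1:
        # Treat a single flat row as a batch of one sample.
        rows = rows.reshape(1, -1)

    # Apply the training-time scaling before calling the model.
    return model.predict(scaler.transform(rows))
```

Saving the scaler next to the model matters because new inputs must be transformed with the same mean and variance that were computed during training.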
Keshav Sapkota
GitHub: @ksav03