Edit

Share via


Quickstart: Connect to a SQL database from a Jupyter Notebook

In this quickstart, you use Jupyter Notebook in Visual Studio Code to quickly derive business insights. You use the mssql-python driver for Python to connect to your SQL database and read the data that is then formatted for use in emails, reports presentations, etc.

The mssql-python driver doesn't require any external dependencies on Windows machines. The driver installs everything that it needs with a single pip install, allowing you to use the latest version of the driver for new scripts without breaking other scripts that you don't have time to upgrade and test.

mssql-python documentation | mssql-python source code | Package (PyPi) | Visual Studio Code

Prerequisites


Create a SQL database

This quickstart requires the AdventureWorks2022 Lightweight schema, on Microsoft SQL Server, SQL database in Fabric or Azure SQL Database.

Create the project and run the code

Create a new project

  1. Open a command prompt in your development directory. If you don't have one, create a new directory called python, scripts, etc. Avoid folders on your OneDrive, the synchronization can interfere with managing your virtual environment.

  2. Create a new project with uv.

    uv init jupyter-notebook-qs
    cd jupyter-notebook-qs
    

Add dependencies

In the same directory, install the mssql-python, python-dotenv, rich, pandas, and matplotlib packages. Then add ipykernel and uv as dev dependencies. VS Code requires ipykernel and uv are added to be able to interact with uv from within your notebook cells using commands like !uv add mssql_python.

uv add mssql_python dotenv rich pandas matplotlib
uv add --dev ipykernel
uv add --dev uv

Launch Visual Studio Code

In the same directory, run the following command.

code .

Update pyproject.toml

  1. The pyproject.toml contains the metadata for your project.

  2. Update the description to be more descriptive.

    description = "A quick example using the mssql-python driver and Jupyter Notebooks."
    
  3. Save and close the file.

Save the connection string

  1. Open the .gitignore file and add an exclusion for .env files. Your file should be similar to this example. Be sure to save and close it when you're done.

    # Python-generated files
    __pycache__/
    *.py[oc]
    build/
    dist/
    wheels/
    *.egg-info
    
    # Virtual environments
    .venv
    
    # Connection strings and secrets
    .env
    
  2. In the current directory, create a new file named .env.

  3. Within the .env file, add an entry for your connection string named SQL_CONNECTION_STRING. Replace the example here with your actual connection string value.

    SQL_CONNECTION_STRING="Server=<server_name>;Database={<database_name>};Encrypt=yes;TrustServerCertificate=no;Authentication=ActiveDirectoryInteractive"
    

    Tip

    The connection string used here largely depends on the type of SQL database you're connecting to. If you're connecting to an Azure SQL Database or a SQL database in Fabric, use the ODBC connection string from the connection strings tab. You might need to adjust the authentication type depending on your scenario. For more information on connection strings and their syntax, see connection string syntax reference.

Create a Jupyter Notebook

  1. Select File, then New File and Jupyter Notebook from the list. A new notebook opens.

  2. Select File, then Save As... and give your new notebook a name.

  3. Add the following imports in the first cell.

    from os import getenv
    from mssql_python import connect
    from dotenv import load_dotenv
    from rich.console import Console
    from rich.table import Table
    import pandas as pd
    import matplotlib.pyplot as plt
    
  4. Use the + Markdown button at the top of the notebook to add a new markdown cell.

  5. Add the following text to the new markdown cell.

    ## Define queries for use later
    
  6. Select the check mark in the cell toolbar or use the keyboard shortcuts Ctrl+Enter or Shift+Enter to render the markdown cell.

  7. Use the + Code button at the top of the notebook to add a new code cell.

  8. Add the following code to the new code cell.

    SQL_QUERY_ORDERS_BY_CUSTOMER = """
    SELECT TOP 5
    c.CustomerID,
    c.CompanyName,
    COUNT(soh.SalesOrderID) AS OrderCount
    FROM
    SalesLT.Customer AS c
    LEFT OUTER JOIN SalesLT.SalesOrderHeader AS soh
    ON c.CustomerID = soh.CustomerID
    GROUP BY
    c.CustomerID,
    c.CompanyName
    ORDER BY
    OrderCount DESC;
    """
    
    SQL_QUERY_SPEND_BY_CATEGORY = """
    select top 10
    pc.Name as ProductCategory,
    SUM(sod.OrderQty * sod.UnitPrice) as Spend
    from SalesLT.SalesOrderDetail sod
    inner join SalesLt.SalesOrderHeader soh on sod.salesorderid = soh.salesorderid
    inner join SalesLt.Product p on sod.productid = p.productid
    inner join SalesLT.ProductCategory pc on p.ProductCategoryID = pc.ProductCategoryID
    GROUP BY pc.Name
    ORDER BY Spend;
    """
    

Display results in a table

  1. Use the + Markdown button at the top of the notebook to add a new markdown cell.

  2. Add the following text to the new markdown cell.

    ## Print orders by customer and display in a table
    
  3. Select the check mark in the cell toolbar or use the keyboard shortcuts Ctrl+Enter or Shift+Enter to render the markdown cell.

  4. Use the + Code button at the top of the notebook to add a new code cell.

  5. Add the following code to the new code cell.

    load_dotenv()
    with connect(getenv("SQL_CONNECTION_STRING")) as conn: # type: ignore
        with conn.cursor() as cursor:
            cursor.execute(SQL_QUERY_ORDERS_BY_CUSTOMER)
            if cursor:
                table = Table(title="Orders by Customer")
                # https://rich.readthedocs.io/en/stable/appendix/colors.html
                table.add_column("Customer ID", style="bright_blue", justify="center")
                table.add_column("Company Name", style="bright_white", justify="left")
                table.add_column("Order Count", style="bold green", justify="right")
    
                records = cursor.fetchall()
    
                for r in records:
                    table.add_row(f"{r.CustomerID}",
                                    f"{r.CompanyName}", f"{r.OrderCount}")
    
                Console().print(table)
    
  6. Use the Run All button at the top of the notebook to run the notebook.

  7. Select the jupyter-notebook-qs kernel when prompted.

Display results in a chart

  1. Review the output of the last cell. You should see a table with three columns and five rows.

  2. Use the + Markdown button at the top of the notebook to add a new markdown cell.

  3. Add the following text to the new markdown cell.

    ## Display spend by category in a horizontal bar chart
    
  4. Select the check mark in the cell toolbar or use the keyboard shortcuts Ctrl+Enter or Shift+Enter to render the markdown cell.

  5. Use the + Code button at the top of the notebook to add a new code cell.

  6. Add the following code to the new code cell.

    with connect(getenv("SQL_CONNECTION_STRING")) as conn: # type: ignore
        data = pd.read_sql_query(SQL_QUERY_SPEND_BY_CATEGORY, conn)
        # Set the style - use print(plt.style.available) to see all options
        plt.style.use('seaborn-v0_8-notebook')
        plt.barh(data['ProductCategory'], data['Spend'])
    
  7. Use the Execute Cell button or Ctrl+Alt+Enter to run the cell.

  8. Review the results. Make this notebook your own.

Next step

Visit the mssql-python driver GitHub repository for more examples, to contribute ideas or report issues.