<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Datamoat]]></title><description><![CDATA[Datamoat]]></description><link>https://datamoat.dev</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1740549744995/4d7843ad-8851-41f2-bbfb-1d6fa72b5b08.png</url><title>Datamoat</title><link>https://datamoat.dev</link></image><generator>RSS for Node</generator><lastBuildDate>Tue, 21 Apr 2026 11:56:48 GMT</lastBuildDate><atom:link href="https://datamoat.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Connect to Fabric Lakehouse On-Prem]]></title><description><![CDATA[Overview
You might need to connect to data in your Fabric environment from an on-prem machine, whether to run it through a proprietary model or to back it up to an on-premises location. The code below shows how to connect to data in a Fabric Lakehouse or...]]></description><link>https://datamoat.dev/connect-to-fabric-lakehouse-on-prem</link><guid isPermaLink="true">https://datamoat.dev/connect-to-fabric-lakehouse-on-prem</guid><category><![CDATA[fabric]]></category><category><![CDATA[PowerBI]]></category><category><![CDATA[Power BI]]></category><category><![CDATA[Python]]></category><category><![CDATA[SQL]]></category><dc:creator><![CDATA[Jeremy Persing]]></dc:creator><pubDate>Wed, 26 Feb 2025 05:48:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1740550391319/c45ea64d-f8b7-46c4-9cd8-42786f258282.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-overview">Overview</h1>
<p>You might need to connect to data in your Fabric environment from an on-prem machine, whether to run it through a proprietary model or to back it up to an on-premises location. The code below shows how to connect to data in a Fabric Lakehouse or Warehouse. Ideally you will have a <a target="_blank" href="https://learn.microsoft.com/en-us/power-bi/developer/embedded/embed-service-principal?tabs=azure-portal">service principal</a> created so that you can automate whatever code you need, but if you don’t have one the code works as is. Don’t forget to fill in the connection string and the Lakehouse/Warehouse name.</p>
<h3 id="heading-prerequisites">Prerequisites</h3>
<ul>
<li><p>Python</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver16">SQL Server Driver</a></p>
</li>
<li><p><code>pip install pyodbc azure-identity</code></p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> struct
<span class="hljs-keyword">from</span> itertools <span class="hljs-keyword">import</span> chain, repeat
<span class="hljs-keyword">import</span> urllib
<span class="hljs-keyword">import</span> pyodbc
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">from</span> azure.identity <span class="hljs-keyword">import</span> InteractiveBrowserCredential
<span class="hljs-comment"># from azure.identity import ClientSecretCredential</span>

credential = InteractiveBrowserCredential() <span class="hljs-comment"># Use for proof of concept</span>

<span class="hljs-comment"># Preferred Method so you can automate your code</span>
<span class="hljs-comment"># tenant_id = "" # fill this out</span>
<span class="hljs-comment"># client_id = "" # fill this out</span>
<span class="hljs-comment"># client_secret = "" # fill this out</span>
<span class="hljs-comment"># credential = ClientSecretCredential(tenant_id, client_id, client_secret)</span>

sql_endpoint = <span class="hljs-string">""</span> <span class="hljs-comment"># fill this out</span>
database = <span class="hljs-string">""</span> <span class="hljs-comment"># fill this out</span>

<span class="hljs-comment"># Either Driver will work as of 2/25/25</span>
connection_string = <span class="hljs-string">f"Driver={{ODBC Driver 17 for SQL Server}};Server=<span class="hljs-subst">{sql_endpoint}</span>,1433;Database=<span class="hljs-subst">{database}</span>;Encrypt=Yes;TrustServerCertificate=No"</span>
<span class="hljs-comment"># connection_string = f"Driver={{ODBC Driver 18 for SQL Server}};Server={sql_endpoint},1433;Database={database};Encrypt=Yes;TrustServerCertificate=No"</span>

token_object = credential.get_token(<span class="hljs-string">"https://database.windows.net//.default"</span>) <span class="hljs-comment"># Retrieve an access token valid to connect to SQL databases</span>

<span class="hljs-comment"># Encode the access token for the ODBC driver</span>
token_as_bytes = bytes(token_object.token, <span class="hljs-string">"UTF-8"</span>) <span class="hljs-comment"># Convert the token to a UTF-8 byte string</span>
encoded_bytes = bytes(chain.from_iterable(zip(token_as_bytes, repeat(<span class="hljs-number">0</span>)))) <span class="hljs-comment"># Encode the bytes to a Windows byte string</span>
token_bytes = struct.pack(<span class="hljs-string">"&lt;i"</span>, len(encoded_bytes)) + encoded_bytes <span class="hljs-comment"># Package the token into a bytes object</span>
attrs_before = {<span class="hljs-number">1256</span>: token_bytes}  <span class="hljs-comment"># Attribute pointing to SQL_COPT_SS_ACCESS_TOKEN to pass access token to the driver</span>

connection = pyodbc.connect(connection_string, attrs_before=attrs_before)

<span class="hljs-comment"># Query using pandas</span>
df = pd.read_sql(<span class="hljs-string">"SELECT TOP (10) * FROM [dbo].[locations_with_country_2]"</span>, con=connection)

<span class="hljs-comment"># Alternative way to query</span>
<span class="hljs-comment"># cursor = connection.cursor()</span>
<span class="hljs-comment"># cursor.execute("SELECT TOP (10) * FROM [dbo].[locations_with_country_2]")</span>
<span class="hljs-comment"># rows = cursor.fetchall()</span>
<span class="hljs-comment"># print(rows)</span>
<span class="hljs-comment"># cursor.close()</span>
</code></pre>
]]></content:encoded></item><item><title><![CDATA[Pandas Cheat Sheet]]></title><description><![CDATA[#### Importing Pandas
import pandas as pd

#### Python Cheats
x = 10
result = "Greater" if x > 5 else "Smaller" # Ternary operator


#### Creating DataFrames
# From a dictionary
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

# From ...]]></description><link>https://datamoat.dev/pandas-cheat-sheet</link><guid isPermaLink="true">https://datamoat.dev/pandas-cheat-sheet</guid><category><![CDATA[Python]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[data-engineering]]></category><dc:creator><![CDATA[Jeremy Persing]]></dc:creator><pubDate>Wed, 26 Feb 2025 05:10:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1740550357009/80479316-73bf-4513-8718-24f62841a904.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<pre><code class="lang-python"><span class="hljs-comment">#### Importing Pandas</span>
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd

<span class="hljs-comment">#### Python Cheats</span>
x = <span class="hljs-number">10</span>
result = <span class="hljs-string">"Greater"</span> <span class="hljs-keyword">if</span> x &gt; <span class="hljs-number">5</span> <span class="hljs-keyword">else</span> <span class="hljs-string">"Smaller"</span> <span class="hljs-comment"># Ternary operator</span>


<span class="hljs-comment">#### Creating DataFrames</span>
<span class="hljs-comment"># From a dictionary</span>
data = {<span class="hljs-string">'A'</span>: [<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>], <span class="hljs-string">'B'</span>: [<span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>]}
df = pd.DataFrame(data)

<span class="hljs-comment"># From a CSV file</span>
df = pd.read_csv(<span class="hljs-string">'file.csv'</span>)

<span class="hljs-comment"># From an Excel file</span>
df = pd.read_excel(<span class="hljs-string">'file.xlsx'</span>)

<span class="hljs-comment">#### Viewing Data</span>
df.head()  <span class="hljs-comment"># First 5 rows</span>
df.tail()  <span class="hljs-comment"># Last 5 rows</span>
df.info()  <span class="hljs-comment"># Summary of DataFrame</span>
df.describe()  <span class="hljs-comment"># Summary statistics</span>
df.shape  <span class="hljs-comment"># Get number of rows and columns</span>
df.columns  <span class="hljs-comment"># List column names</span>

<span class="hljs-comment">#### Selecting Data</span>
df[<span class="hljs-string">'A'</span>]  <span class="hljs-comment"># Select a single column</span>
df[[<span class="hljs-string">'A'</span>, <span class="hljs-string">'B'</span>]]  <span class="hljs-comment"># Select multiple columns</span>
df.iloc[<span class="hljs-number">0</span>]  <span class="hljs-comment"># Select first row by position</span>
df.loc[<span class="hljs-number">0</span>, <span class="hljs-string">'A'</span>]  <span class="hljs-comment"># Select a specific value</span>
df[df[<span class="hljs-string">'A'</span>] &gt; <span class="hljs-number">1</span>]  <span class="hljs-comment"># Filter rows based on condition</span>
df[df[<span class="hljs-string">"date"</span>].dt.year == <span class="hljs-number">2024</span>] <span class="hljs-comment"># Filter rows where the date falls in 2024 (requires a datetime column)</span>
new_products = new_products[new_products[<span class="hljs-string">"category"</span>].notna()] <span class="hljs-comment"># Filter so column not null</span>

<span class="hljs-comment">#### Modifying Data</span>
df[<span class="hljs-string">'C'</span>] = df[<span class="hljs-string">'A'</span>] + df[<span class="hljs-string">'B'</span>]  <span class="hljs-comment"># Add a new column</span>
df.rename(columns={<span class="hljs-string">'A'</span>: <span class="hljs-string">'Alpha'</span>}, inplace=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Rename column</span>
df.drop(columns=<span class="hljs-string">'B'</span>, inplace=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Drop column</span>
df.drop(<span class="hljs-number">0</span>, axis=<span class="hljs-number">0</span>, inplace=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Drop row</span>
df.fillna(<span class="hljs-number">0</span>, inplace=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Fill missing values with 0</span>
df[<span class="hljs-string">"price"</span>] = df[<span class="hljs-string">"price"</span>].str.replace(<span class="hljs-string">"$"</span>, <span class="hljs-string">""</span>, regex=<span class="hljs-literal">False</span>).astype(float)
df[<span class="hljs-string">"price"</span>] = pd.to_numeric(df[<span class="hljs-string">"price"</span>].str.replace(<span class="hljs-string">"$"</span>, <span class="hljs-string">""</span>, regex=<span class="hljs-literal">False</span>), errors=<span class="hljs-string">"coerce"</span>) <span class="hljs-comment"># Change type</span>
df.replace({<span class="hljs-string">'old_value'</span>: <span class="hljs-string">'new_value'</span>}, inplace=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Replace values</span>
df[<span class="hljs-string">"key"</span>] = df[<span class="hljs-string">"key"</span>].str.replace(<span class="hljs-string">"McCafé® "</span>, <span class="hljs-string">""</span>, regex=<span class="hljs-literal">False</span>) <span class="hljs-comment"># Replace substring</span>

<span class="hljs-comment">#### Sorting &amp; Ordering</span>
df.sort_values(<span class="hljs-string">'A'</span>, ascending=<span class="hljs-literal">False</span>)  <span class="hljs-comment"># Sort by column</span>
df.sort_index(ascending=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Sort by index</span>

<span class="hljs-comment">#### Aggregation &amp; Grouping</span>
df.mean(numeric_only=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Column-wise mean of numeric columns</span>
df.groupby(<span class="hljs-string">'A'</span>).sum()  <span class="hljs-comment"># Group by column and sum</span>
df[<span class="hljs-string">'A'</span>].value_counts()  <span class="hljs-comment"># Count unique values</span>

<span class="hljs-comment">#### Handling Missing Data</span>
df.isnull().sum()  <span class="hljs-comment"># Count missing values</span>
df.dropna(inplace=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Drop missing values</span>
df.fillna(<span class="hljs-number">0</span>, inplace=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Fill missing values with 0</span>

<span class="hljs-comment">#### Merging &amp; Joining</span>
df1.merge(df2, on=<span class="hljs-string">'key'</span>)  <span class="hljs-comment"># Inner join</span>
df1.merge(df2, on=<span class="hljs-string">'key'</span>, how=<span class="hljs-string">'left'</span>)  <span class="hljs-comment"># Left join</span>
pd.concat([df1, df2], ignore_index=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Append rows (DataFrame.append was removed in pandas 2.0)</span>

<span class="hljs-comment">#### Exporting Data</span>
df.to_csv(<span class="hljs-string">'file.csv'</span>, index=<span class="hljs-literal">False</span>)  <span class="hljs-comment"># Save to CSV</span>
df.to_excel(<span class="hljs-string">'file.xlsx'</span>, index=<span class="hljs-literal">False</span>)  <span class="hljs-comment"># Save to Excel</span>

<span class="hljs-comment">#### Pivot Tables</span>
df.pivot_table(index=<span class="hljs-string">'A'</span>, columns=<span class="hljs-string">'B'</span>, values=<span class="hljs-string">'C'</span>, aggfunc=<span class="hljs-string">'sum'</span>)

<span class="hljs-comment">#### Working with Dates</span>
df[<span class="hljs-string">'date'</span>] = pd.to_datetime(df[<span class="hljs-string">'date'</span>])
df[<span class="hljs-string">'year'</span>] = df[<span class="hljs-string">'date'</span>].dt.year
df[<span class="hljs-string">'month'</span>] = df[<span class="hljs-string">'date'</span>].dt.month
df[<span class="hljs-string">'day'</span>] = df[<span class="hljs-string">'date'</span>].dt.day

<span class="hljs-comment">#### Applying Functions</span>
df[<span class="hljs-string">'A'</span>] = df[<span class="hljs-string">'A'</span>].apply(<span class="hljs-keyword">lambda</span> x: x*<span class="hljs-number">2</span>)  <span class="hljs-comment"># Apply function to column</span>
df.map(<span class="hljs-keyword">lambda</span> x: str(x).upper())  <span class="hljs-comment"># Apply function to all elements (use applymap in pandas &lt; 2.1)</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_price</span>(<span class="hljs-params">row</span>):</span> ...
df[<span class="hljs-string">"price"</span>] = df.apply(generate_price, axis=<span class="hljs-number">1</span>) <span class="hljs-comment"># Apply generate_price to every row (each row is passed as the argument)</span>
df[<span class="hljs-string">'price2'</span>] = df.apply(<span class="hljs-keyword">lambda</span> row: generate_price(row), axis=<span class="hljs-number">1</span>) <span class="hljs-comment"># Same via a lambda, handy when passing extra arguments</span>

<span class="hljs-comment"># 🔹 Tip: Use df.sample(5) to quickly check random rows from your DataFrame!</span>
</code></pre>
]]></content:encoded></item></channel></rss>