<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Datamoat]]></title><description><![CDATA[Datamoat]]></description><link>https://datamoat.dev</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1740549744995/4d7843ad-8851-41f2-bbfb-1d6fa72b5b08.png</url><title>Datamoat</title><link>https://datamoat.dev</link></image><generator>RSS for Node</generator><lastBuildDate>Tue, 21 Apr 2026 11:56:48 GMT</lastBuildDate><atom:link href="https://datamoat.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Connect to Fabric Lakehouse On-Prem]]></title><description><![CDATA[Overview
You might need to connect to data in your Fabric environment from an on-prem machine, whether to run it through a proprietary model or to back it up to an on-premises location. The code below shows how to connect to data in a Fabric Lakehouse or...]]></description><link>https://datamoat.dev/connect-to-fabric-lakehouse-on-prem</link><guid isPermaLink="true">https://datamoat.dev/connect-to-fabric-lakehouse-on-prem</guid><category><![CDATA[fabric]]></category><category><![CDATA[PowerBI]]></category><category><![CDATA[Power BI]]></category><category><![CDATA[Python]]></category><category><![CDATA[SQL]]></category><dc:creator><![CDATA[Jeremy Persing]]></dc:creator><pubDate>Wed, 26 Feb 2025 05:48:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1740550391319/c45ea64d-f8b7-46c4-9cd8-42786f258282.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-overview">Overview</h1>
<p>You might need to connect to data in your Fabric environment from an on-prem machine, whether to run it through a proprietary model or to back it up to an on-premises location. The code below shows how to connect to data in a Fabric Lakehouse or Warehouse. Ideally you will have a <a target="_blank" href="https://learn.microsoft.com/en-us/power-bi/developer/embedded/embed-service-principal?tabs=azure-portal">service principal</a> created so that you can automate whatever code you need, but if you don’t have one the code works as is. Don’t forget to fill in the connection string and the Lakehouse/Warehouse name.</p>
<h3 id="heading-prerequisites">Prerequisites</h3>
<ul>
<li><p>Python</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver16">SQL Server Driver</a></p>
</li>
<li><p><code>pip install pyodbc azure-identity</code></p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> struct
<span class="hljs-keyword">from</span> itertools <span class="hljs-keyword">import</span> chain, repeat
<span class="hljs-keyword">import</span> urllib
<span class="hljs-keyword">import</span> pyodbc
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">from</span> azure.identity <span class="hljs-keyword">import</span> InteractiveBrowserCredential
<span class="hljs-comment"># from azure.identity import ClientSecretCredential</span>

credential = InteractiveBrowserCredential() <span class="hljs-comment"># Use for proof of concept</span>

<span class="hljs-comment"># Preferred Method so you can automate your code</span>
<span class="hljs-comment"># tenant_id = "" # fill this out</span>
<span class="hljs-comment"># client_id = "" # fill this out</span>
<span class="hljs-comment"># client_secret = "" # fill this out</span>
<span class="hljs-comment"># credential = ClientSecretCredential(tenant_id, client_id, client_secret)</span>

sql_endpoint = <span class="hljs-string">""</span> <span class="hljs-comment"># fill this out</span>
database = <span class="hljs-string">""</span> <span class="hljs-comment"># fill this out</span>

<span class="hljs-comment"># Either Driver will work as of 2/25/25</span>
connection_string = <span class="hljs-string">f"Driver={{ODBC Driver 17 for SQL Server}};Server=<span class="hljs-subst">{sql_endpoint}</span>,1433;Database=<span class="hljs-subst">{database}</span>;Encrypt=Yes;TrustServerCertificate=No"</span>
<span class="hljs-comment"># connection_string = f"Driver={{ODBC Driver 18 for SQL Server}};Server={sql_endpoint},1433;Database={database};Encrypt=Yes;TrustServerCertificate=No"</span>

token_object = credential.get_token(<span class="hljs-string">"https://database.windows.net//.default"</span>) <span class="hljs-comment"># Retrieve an access token valid to connect to SQL databases</span>

<span class="hljs-comment"># Encode the access token for the ODBC driver</span>
token_as_bytes = bytes(token_object.token, <span class="hljs-string">"UTF-8"</span>) <span class="hljs-comment"># Convert the token to a UTF-8 byte string</span>
encoded_bytes = bytes(chain.from_iterable(zip(token_as_bytes, repeat(<span class="hljs-number">0</span>)))) <span class="hljs-comment"># Encode the bytes to a Windows byte string</span>
token_bytes = struct.pack(<span class="hljs-string">"&lt;i"</span>, len(encoded_bytes)) + encoded_bytes <span class="hljs-comment"># Package the token into a bytes object</span>
attrs_before = {<span class="hljs-number">1256</span>: token_bytes}  <span class="hljs-comment"># Attribute pointing to SQL_COPT_SS_ACCESS_TOKEN to pass access token to the driver</span>

connection = pyodbc.connect(connection_string, attrs_before=attrs_before)

<span class="hljs-comment"># Query using pandas</span>
df = pd.read_sql(<span class="hljs-string">"SELECT TOP (10) * FROM [dbo].[locations_with_country_2]"</span>, con=connection)

<span class="hljs-comment"># Alternative way to query</span>
<span class="hljs-comment"># cursor = connection.cursor()</span>
<span class="hljs-comment"># cursor.execute("SELECT TOP (10) * FROM [dbo].[locations_with_country_2]")</span>
<span class="hljs-comment"># rows = cursor.fetchall()</span>
<span class="hljs-comment"># print(rows)</span>
<span class="hljs-comment"># cursor.close()</span>
</code></pre>
]]></content:encoded></item><item><title><![CDATA[Pandas Cheat Sheet]]></title><description><![CDATA[#### Importing Pandas
import pandas as pd

#### Python Cheats
x = 10
result = "Greater" if x > 5 else "Smaller" # Ternary operator


#### Creating DataFrames
# From a dictionary
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

# From ...]]></description><link>https://datamoat.dev/pandas-cheat-sheet</link><guid isPermaLink="true">https://datamoat.dev/pandas-cheat-sheet</guid><category><![CDATA[Python]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[data-engineering]]></category><dc:creator><![CDATA[Jeremy Persing]]></dc:creator><pubDate>Wed, 26 Feb 2025 05:10:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1740550357009/80479316-73bf-4513-8718-24f62841a904.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<pre><code class="lang-python"><span class="hljs-comment">#### Importing Pandas</span>
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd

<span class="hljs-comment">#### Python Cheats</span>
x = <span class="hljs-number">10</span>
result = <span class="hljs-string">"Greater"</span> <span class="hljs-keyword">if</span> x &gt; <span class="hljs-number">5</span> <span class="hljs-keyword">else</span> <span class="hljs-string">"Smaller"</span> <span class="hljs-comment"># Ternary operator</span>


<span class="hljs-comment">#### Creating DataFrames</span>
<span class="hljs-comment"># From a dictionary</span>
data = {<span class="hljs-string">'A'</span>: [<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>], <span class="hljs-string">'B'</span>: [<span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-number">6</span>]}
df = pd.DataFrame(data)

<span class="hljs-comment"># From a CSV file</span>
df = pd.read_csv(<span class="hljs-string">'file.csv'</span>)

<span class="hljs-comment"># From an Excel file</span>
df = pd.read_excel(<span class="hljs-string">'file.xlsx'</span>)

<span class="hljs-comment">#### Viewing Data</span>
df.head()  <span class="hljs-comment"># First 5 rows</span>
df.tail()  <span class="hljs-comment"># Last 5 rows</span>
df.info()  <span class="hljs-comment"># Summary of DataFrame</span>
df.describe()  <span class="hljs-comment"># Summary statistics</span>
df.shape  <span class="hljs-comment"># Get number of rows and columns</span>
df.columns  <span class="hljs-comment"># List column names</span>

<span class="hljs-comment">#### Selecting Data</span>
df[<span class="hljs-string">'A'</span>]  <span class="hljs-comment"># Select a single column</span>
df[[<span class="hljs-string">'A'</span>, <span class="hljs-string">'B'</span>]]  <span class="hljs-comment"># Select multiple columns</span>
df.iloc[<span class="hljs-number">0</span>]  <span class="hljs-comment"># Select first row by position</span>
df.loc[<span class="hljs-number">0</span>, <span class="hljs-string">'A'</span>]  <span class="hljs-comment"># Select a specific value</span>
df[df[<span class="hljs-string">'A'</span>] &gt; <span class="hljs-number">1</span>]  <span class="hljs-comment"># Filter rows based on condition</span>
df[df[<span class="hljs-string">"date"</span>].dt.year == <span class="hljs-number">2024</span>] <span class="hljs-comment"># Filter rows where the date falls in 2024 (requires a datetime column)</span>
new_products = new_products[new_products[<span class="hljs-string">"category"</span>].notna()] <span class="hljs-comment"># Filter so column not null</span>

<span class="hljs-comment">#### Modifying Data</span>
df[<span class="hljs-string">'C'</span>] = df[<span class="hljs-string">'A'</span>] + df[<span class="hljs-string">'B'</span>]  <span class="hljs-comment"># Add a new column</span>
df.rename(columns={<span class="hljs-string">'A'</span>: <span class="hljs-string">'Alpha'</span>}, inplace=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Rename column</span>
df.drop(columns=<span class="hljs-string">'B'</span>, inplace=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Drop column</span>
df.drop(<span class="hljs-number">0</span>, axis=<span class="hljs-number">0</span>, inplace=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Drop row</span>
df.fillna(<span class="hljs-number">0</span>, inplace=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Fill missing values with 0</span>
df[<span class="hljs-string">"price"</span>] = df[<span class="hljs-string">"price"</span>].str.replace(<span class="hljs-string">"$"</span>, <span class="hljs-string">""</span>, regex=<span class="hljs-literal">False</span>).astype(float)
df[<span class="hljs-string">"price"</span>] = pd.to_numeric(df[<span class="hljs-string">"price"</span>].str.replace(<span class="hljs-string">"$"</span>, <span class="hljs-string">""</span>, regex=<span class="hljs-literal">False</span>), errors=<span class="hljs-string">"coerce"</span>) <span class="hljs-comment"># Change type</span>
df.replace({<span class="hljs-string">'old_value'</span>: <span class="hljs-string">'new_value'</span>}, inplace=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Replace values</span>
df[<span class="hljs-string">"key"</span>] = df[<span class="hljs-string">"key"</span>].str.replace(<span class="hljs-string">"McCafé® "</span>, <span class="hljs-string">""</span>, regex=<span class="hljs-literal">False</span>) <span class="hljs-comment"># Replace substring</span>

<span class="hljs-comment">#### Sorting &amp; Ordering</span>
df.sort_values(<span class="hljs-string">'A'</span>, ascending=<span class="hljs-literal">False</span>)  <span class="hljs-comment"># Sort by column</span>
df.sort_index(ascending=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Sort by index</span>

<span class="hljs-comment">#### Aggregation &amp; Grouping</span>
df.mean(numeric_only=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Column-wise mean of numeric columns</span>
df.groupby(<span class="hljs-string">'A'</span>).sum()  <span class="hljs-comment"># Group by column and sum</span>
df[<span class="hljs-string">'A'</span>].value_counts()  <span class="hljs-comment"># Count unique values</span>

<span class="hljs-comment">#### Handling Missing Data</span>
df.isnull().sum()  <span class="hljs-comment"># Count missing values</span>
df.dropna(inplace=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Drop missing values</span>
df.fillna(<span class="hljs-number">0</span>, inplace=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Fill missing values with 0</span>

<span class="hljs-comment">#### Merging &amp; Joining</span>
df1.merge(df2, on=<span class="hljs-string">'key'</span>)  <span class="hljs-comment"># Inner join</span>
df1.merge(df2, on=<span class="hljs-string">'key'</span>, how=<span class="hljs-string">'left'</span>)  <span class="hljs-comment"># Left join</span>
pd.concat([df1, df2], ignore_index=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Append rows (DataFrame.append was removed in pandas 2.0)</span>

<span class="hljs-comment">#### Exporting Data</span>
df.to_csv(<span class="hljs-string">'file.csv'</span>, index=<span class="hljs-literal">False</span>)  <span class="hljs-comment"># Save to CSV</span>
df.to_excel(<span class="hljs-string">'file.xlsx'</span>, index=<span class="hljs-literal">False</span>)  <span class="hljs-comment"># Save to Excel</span>

<span class="hljs-comment">#### Pivot Tables</span>
df.pivot_table(index=<span class="hljs-string">'A'</span>, columns=<span class="hljs-string">'B'</span>, values=<span class="hljs-string">'C'</span>, aggfunc=<span class="hljs-string">'sum'</span>)

<span class="hljs-comment">#### Working with Dates</span>
df[<span class="hljs-string">'date'</span>] = pd.to_datetime(df[<span class="hljs-string">'date'</span>])
df[<span class="hljs-string">'year'</span>] = df[<span class="hljs-string">'date'</span>].dt.year
df[<span class="hljs-string">'month'</span>] = df[<span class="hljs-string">'date'</span>].dt.month
df[<span class="hljs-string">'day'</span>] = df[<span class="hljs-string">'date'</span>].dt.day

<span class="hljs-comment">#### Applying Functions</span>
df[<span class="hljs-string">'A'</span>] = df[<span class="hljs-string">'A'</span>].apply(<span class="hljs-keyword">lambda</span> x: x*<span class="hljs-number">2</span>)  <span class="hljs-comment"># Apply function to column</span>
df.map(<span class="hljs-keyword">lambda</span> x: str(x).upper())  <span class="hljs-comment"># Apply function to all elements (use applymap in pandas &lt; 2.1)</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_price</span>(<span class="hljs-params">row</span>):</span> ...
df[<span class="hljs-string">"price"</span>] = df.apply(generate_price, axis=<span class="hljs-number">1</span>) <span class="hljs-comment"># Apply generate_price to every row (each row is passed as the argument)</span>
df[<span class="hljs-string">'price2'</span>] = df.apply(<span class="hljs-keyword">lambda</span> row: generate_price(row), axis=<span class="hljs-number">1</span>) <span class="hljs-comment"># Same via a lambda, handy when passing extra arguments</span>

<span class="hljs-comment"># 🔹 Tip: Use df.sample(5) to quickly check random rows from your DataFrame!</span>
</code></pre>
]]></content:encoded></item></channel></rss>