This post is the second part of the Master Large Datasets for Peak Performance in Streamlit post.
Efficient Data Handling in Streamlit: Optimizing Back-End Processing and UI Rendering
Streamlit is a powerful framework for building interactive data apps with minimal code. However, efficiently handling large datasets remains a challenge. Without the right tools, your app can become sluggish due to excessive memory usage and slow UI rendering.
This blog post covers two essential aspects of efficient data handling in Streamlit:
- Back-End Data Processing – Using
Polars
for fast, memory-efficient data manipulation. - Front-End Data Rendering – Using
AG-Grid
for interactive and efficient table visualization.
By leveraging Polars
and AG-Grid
, you can create highly responsive Streamlit apps that handle large datasets efficiently. Let’s dive in! 🚀
Part 1: Efficient Back-End Data Processing with Polars
Polars
is a high-performance DataFrame library built in Rust, optimized for speed and memory efficiency. It is an excellent alternative to pandas
, especially when working with large datasets.
Why Use Polars?
- Blazing Fast Execution: Thanks to Rust’s optimizations, Polars outperforms pandas in many cases.
- Lazy Execution: It delays computations until needed, optimizing query execution.
- Efficient Memory Usage: Uses columnar memory layout, reducing RAM usage.
- Parallel Execution: Takes full advantage of multi-core CPUs.
- SQL-like Query API: Makes transformations intuitive and efficient.
Loading and Processing Data in Polars
Let’s start by reading a large CSV file efficiently using Polars:
import polars as pl
# Load a large dataset
df = pl.read_csv("large_dataset.csv")
# Display first few rows
print(df.head())
Lazy Loading for Efficiency
One of the most powerful features of Polars is lazy evaluation. Instead of executing operations immediately, Polars builds a query plan and optimizes execution.
# Enable lazy evaluation
ldf = pl.scan_csv("large_dataset.csv")
# Apply transformations lazily
filtered_df = ldf.filter(pl.col("price") > 100).select(["product", "price"])
# Execute when needed
result = filtered_df.collect()
print(result)
This approach significantly reduces memory usage and speeds up processing.
Performing Efficient Queries and Transformations
Polars makes data filtering, aggregation, and transformations lightning-fast. Here’s an example:
# Group by category and calculate average price
agg_df = df.groupby("category").agg(pl.mean("price").alias("avg_price"))
print(agg_df)
Converting Polars DataFrame to Pandas for Streamlit
Since Streamlit’s st.dataframe()
works best with pandas, you can convert your Polars DataFrame:
df_pandas = df.to_pandas()
st.dataframe(df_pandas)
Using Polars ensures that all data transformations happen efficiently before sending the final result to Streamlit’s UI components.
Part 2: Efficient UI Rendering with AG-Grid
Once you’ve optimized your back-end processing with Polars, the next challenge is efficiently rendering large datasets in the UI. This is where AG-Grid
comes in.
Why Use AG-Grid?
- Handles Large Data Efficiently: Supports lazy loading and virtual scrolling.
- Rich Interactivity: Users can filter, sort, and edit data within the grid.
- Optimized Performance: Renders only visible rows, reducing memory usage.
- Customizable: Supports themes, row selection, and column formatting.
Using AG-Grid in Streamlit
To use AG-Grid in Streamlit, install the streamlit-aggrid
package:
pip install streamlit-aggrid
Here’s how to display a large dataset efficiently:
import streamlit as st
import pandas as pd
from st_aggrid import AgGrid
# Load data (using Polars, then converting to pandas for Streamlit)
import polars as pl
df = pl.read_csv("large_dataset.csv").to_pandas()
# Display using AG-Grid
st.write("Interactive Data Table")
AgGrid(df, height=400, fit_columns_on_grid_load=True)
Lazy Loading for Large Datasets
AG-Grid supports lazy loading, ensuring that only the visible rows are rendered, improving performance.
from st_aggrid import GridOptionsBuilder
# Define grid options
grid_options = GridOptionsBuilder.from_dataframe(df)
grid_options.configure_pagination(enabled=True)
grid_options.configure_side_bar()
grid_response = AgGrid(
df,
gridOptions=grid_options.build(),
enable_enterprise_modules=True,
height=400,
fit_columns_on_grid_load=True,
)
Adding Sorting, Filtering, and Editing
AG-Grid provides built-in sorting, filtering, and inline editing:
grid_options.configure_default_column(editable=True, filterable=True, sortable=True)
Now users can interact with large datasets without performance bottlenecks! 🎉
Warp Up
Efficiently handling large datasets in Streamlit requires optimizing both back-end processing and front-end rendering:
- Use Polars for fast, memory-efficient data transformations with lazy execution.
- Use AG-Grid for interactive and efficient data visualization with lazy loading.
By combining these two tools, you can build high-performance, scalable Streamlit apps that handle large data seamlessly. Try it out and see the difference! 🚀