Hosting Streamlit in Snowflake with AWS PrivateLink Access Without Authentication
READER BEWARE: THE FOLLOWING WAS WRITTEN ENTIRELY BY AI WITHOUT HUMAN EDITING.
Introduction
Snowflake’s Streamlit integration enables organizations to build and deploy data applications directly within their data cloud platform. However, there’s a unique use case where you need to expose a Streamlit app via AWS PrivateLink without requiring authentication, while maintaining robust security through defense-in-depth principles. This guide explores how to architect, secure, and implement such a solution.
Use Case Overview
The scenario involves:
- Application: Streamlit app hosted in Snowflake
- Access Method: AWS PrivateLink only (no public internet access)
- Authentication: No user login required
- Security Requirement: Multi-layered defense despite no authentication
This pattern is useful for internal dashboards, monitoring displays, or public-facing analytics where network-level security provides the primary access control.
Architecture Overview
High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│ AWS Account (Consumer) │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ VPC (10.0.0.0/16) │ │
│ │ ┌──────────────────────────────────────────────────┐ │ │
│ │ │ Private Subnet (10.0.1.0/24) │ │ │
│ │ │ ┌────────────┐ ┌─────────────────────┐ │ │ │
│ │ │ │ EC2 │ │ VPC Endpoint │ │ │ │
│ │ │ │ Instance │────────▶│ (PrivateLink) │ │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ └────────────┘ └──────────┬──────────┘ │ │ │
│ │ │ │ │ │ │
│ │ └────────────────────────────────────┼──────────────┘ │ │
│ │ │ │ │
│ └───────────────────────────────────────┼─────────────────┘ │
│ │ │
└──────────────────────────────────────────┼───────────────────┘
│
│ Private Connection
│
┌──────────────────────────────────────────┼───────────────────┐
│ │ │
│ Snowflake Account │ │
│ ┌───────────────────────────────────────▼────────────────┐ │
│ │ PrivateLink Endpoint │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ Streamlit App (UNAUTHENTICATED) │ │ │
│ │ │ ┌─────────────────────────────────────────┐ │ │ │
│ │ │ │ - Read-only queries │ │ │ │
│ │ │ │ - Public/aggregated data only │ │ │ │
│ │ │ │ - Rate limiting enabled │ │ │ │
│ │ │ │ - Audit logging active │ │ │ │
│ │ │ └─────────────────────────────────────────┘ │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────┘
Key Components
- Snowflake Streamlit App: Python application with unauthenticated access enabled
- AWS PrivateLink: Private network connectivity between AWS VPC and Snowflake
- VPC Endpoint: AWS endpoint service connected to Snowflake PrivateLink
- Security Groups: Network-level access controls
- Snowflake Network Policies: IP allowlisting and blocking rules
Defense in Depth Strategy
Since authentication is disabled, we implement multiple security layers:
Layer 1: Network Isolation (Primary Control)
PrivateLink as the Foundation
- Traffic never traverses public internet
- Only accessible from specific AWS VPC(s)
- No DNS resolution from public internet
- Encrypted in transit automatically
Why This Matters: Even without authentication, attackers cannot reach the application without first compromising your AWS infrastructure.
Layer 2: Network Policies (Essential)
Snowflake Network Policy
-- Restrict access to the VPC CIDR blocks used by PrivateLink
-- Note: any IP not in ALLOWED_IP_LIST is blocked by default, so a
-- BLOCKED_IP_LIST of 0.0.0.0/0 is unnecessary -- and would block everything,
-- since blocked entries take precedence over allowed ones
CREATE NETWORK POLICY streamlit_privatelink_only
ALLOWED_IP_LIST = ('10.0.0.0/8', '172.16.0.0/12'); -- Your VPC CIDR ranges
-- Apply to the Streamlit user/role
ALTER USER streamlit_app_user SET NETWORK_POLICY = streamlit_privatelink_only;
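To sanity-check what the allowlist admits before applying it, the policy's evaluation can be mirrored with Python's `ipaddress` module (the CIDR ranges below are the same placeholders used in the policy):

```python
import ipaddress

# Mirror of the network policy: an IP is allowed only if it falls
# inside one of the allowed CIDR ranges; everything else is blocked.
ALLOWED_CIDRS = ["10.0.0.0/8", "172.16.0.0/12"]  # your VPC CIDR ranges

def ip_allowed(ip: str) -> bool:
    """Return True if ip falls within any allowed CIDR block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ALLOWED_CIDRS)

print(ip_allowed("10.0.1.15"))  # VPC-internal address -> allowed
print(ip_allowed("8.8.8.8"))    # public address -> blocked
```

This is the same containment test Snowflake applies; running it against a handful of known-good and known-bad addresses is a cheap way to catch a typo in a CIDR range before it locks out the app.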
AWS Security Groups
- Restrict inbound traffic to specific source security groups or IP ranges
- Implement least-privilege egress rules
- Log all connection attempts
Layer 3: Application-Level Controls (Compensating Controls)
Read-Only Access
# Configure Streamlit to use a read-only database role
snowflake_connection = {
    'user': 'STREAMLIT_READONLY_USER',
    'role': 'STREAMLIT_READONLY_ROLE',
    'warehouse': 'STREAMLIT_WH',
    'database': 'PUBLIC_DATA',
    'schema': 'AGGREGATED'
}
Data Access Restrictions
- Use dedicated read-only role with minimal privileges
- Only grant SELECT on specific views/tables with aggregated data
- Never expose PII or sensitive data
- Implement row-level security even for “public” data
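As an application-side complement to the read-only role, the app can refuse to send anything but a plain SELECT to the database. This is a hypothetical sketch (keyword-based, not a SQL parser); the server-side grants remain the real control:

```python
# App-side guard: reject anything that is not a plain SELECT before it
# reaches Snowflake. Defense in depth only -- grants are the real control.
FORBIDDEN_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER",
    "CREATE", "TRUNCATE", "MERGE", "GRANT",
}

def is_safe_select(query: str) -> bool:
    """Return True only for statements that start with SELECT and
    contain none of the write/DDL keywords anywhere."""
    tokens = query.upper().split()
    if not tokens or tokens[0] != "SELECT":
        return False
    return not any(tok.strip("();,") in FORBIDDEN_KEYWORDS for tok in tokens)

print(is_safe_select("SELECT month, total_sales FROM sales_dashboard"))
print(is_safe_select("DROP TABLE raw_data.sales"))
```

A simple token check like this can false-positive on column names containing a keyword, which is acceptable for a dashboard that only ever issues a fixed set of SELECTs.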
Layer 4: Rate Limiting and Resource Controls
Snowflake Resource Monitors
-- Prevent runaway query costs
CREATE RESOURCE MONITOR streamlit_app_monitor
WITH CREDIT_QUOTA = 100
FREQUENCY = MONTHLY
START_TIMESTAMP = IMMEDIATELY
TRIGGERS
ON 75 PERCENT DO NOTIFY
ON 100 PERCENT DO SUSPEND;
-- Apply to the warehouse
ALTER WAREHOUSE streamlit_wh SET RESOURCE_MONITOR = streamlit_app_monitor;
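The resource monitor's trigger logic is simple threshold arithmetic; a small sketch of how Snowflake evaluates the two triggers against the 100-credit quota:

```python
# Sketch of the resource monitor's trigger behavior:
# NOTIFY at 75% of the credit quota, SUSPEND at 100%.
CREDIT_QUOTA = 100

def monitor_action(credits_used: float) -> str:
    """Return the action the monitor would take at this usage level."""
    pct = credits_used / CREDIT_QUOTA * 100
    if pct >= 100:
        return "SUSPEND"
    if pct >= 75:
        return "NOTIFY"
    return "OK"

print(monitor_action(50))   # under both thresholds
print(monitor_action(80))   # past the notify threshold
print(monitor_action(105))  # quota exhausted; warehouse suspends
```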
Application-Level Rate Limiting
import streamlit as st
from datetime import datetime, timedelta
import hashlib
def rate_limit_check(client_id, max_requests=100, window_minutes=60):
    """
    Simple rate limiting based on a client identifier.
    In production, use Redis or similar for distributed systems.
    """
    if 'rate_limit' not in st.session_state:
        st.session_state.rate_limit = {}

    now = datetime.now()
    # md5 is used only as a bucketing key here, not for security
    client_key = hashlib.md5(client_id.encode()).hexdigest()

    if client_key not in st.session_state.rate_limit:
        st.session_state.rate_limit[client_key] = []

    # Drop requests that fall outside the sliding window
    cutoff = now - timedelta(minutes=window_minutes)
    st.session_state.rate_limit[client_key] = [
        req_time for req_time in st.session_state.rate_limit[client_key]
        if req_time > cutoff
    ]

    # Check limit
    if len(st.session_state.rate_limit[client_key]) >= max_requests:
        return False

    st.session_state.rate_limit[client_key].append(now)
    return True
Layer 5: Monitoring and Audit (Detection Control)
Query Logging
-- Allow unredacted query text in syntax-error records for this user,
-- so failed probes are fully visible in the audit trail
ALTER USER streamlit_app_user SET ENABLE_UNREDACTED_QUERY_SYNTAX_ERROR = TRUE;
-- Monitor query history
CREATE OR REPLACE VIEW security.streamlit_audit AS
SELECT
query_id,
query_text,
user_name,
role_name,
warehouse_name,
execution_status,
error_code,
error_message,
start_time,
end_time,
total_elapsed_time,
bytes_scanned,
rows_produced
FROM snowflake.account_usage.query_history
WHERE user_name = 'STREAMLIT_APP_USER'
ORDER BY start_time DESC;
CloudWatch Integration
- Monitor VPC endpoint connection attempts
- Alert on unusual traffic patterns
- Track data transfer volumes
- Monitor query execution patterns
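"Alert on unusual traffic patterns" can be as simple as flagging a per-minute request count that sits far above the recent baseline. A minimal sketch (the three-sigma threshold and one-count stdev floor are assumptions to tune):

```python
import statistics

def is_traffic_spike(counts, k=3.0):
    """counts: per-minute request counts, newest last. Flag the newest
    value if it exceeds the mean of the preceding window by k stdevs."""
    history, latest = counts[:-1], counts[-1]
    if len(history) < 2:
        return False  # not enough baseline to judge
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    # Floor the stdev so perfectly flat baselines don't alert on noise
    return latest > mean + k * max(stdev, 1.0)

normal = [100, 110, 95, 105, 98, 102]
print(is_traffic_spike(normal))          # steady traffic
print(is_traffic_spike(normal + [500]))  # sudden burst
```

In practice the same comparison would run inside a CloudWatch anomaly-detection alarm or a scheduled job reading VPC endpoint metrics.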
Layer 6: Data Sanitization (Preventive Control)
Create Sanitized Views
-- Create views that only expose aggregated, non-sensitive data
CREATE OR REPLACE VIEW public_data.sales_dashboard AS
SELECT
DATE_TRUNC('month', sale_date) AS month,
product_category,
COUNT(*) AS transaction_count,
SUM(amount) AS total_sales,
AVG(amount) AS avg_sale
FROM raw_data.sales
WHERE sale_date >= DATEADD('year', -2, CURRENT_DATE())
GROUP BY DATE_TRUNC('month', sale_date), product_category;
-- Grant access to read-only role
GRANT SELECT ON VIEW public_data.sales_dashboard TO ROLE streamlit_readonly_role;
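The view above reduces raw transactions to per-(month, category) aggregates before the app ever sees them. The same sanitization step, sketched in plain Python with made-up rows, shows exactly what survives aggregation:

```python
from collections import defaultdict

def aggregate_sales(rows):
    """rows: (month, category, amount) tuples. Returns per-(month,
    category) aggregates; no individual transaction is exposed."""
    buckets = defaultdict(lambda: {"transaction_count": 0, "total_sales": 0.0})
    for month, category, amount in rows:
        b = buckets[(month, category)]
        b["transaction_count"] += 1
        b["total_sales"] += amount
    for b in buckets.values():
        b["avg_sale"] = b["total_sales"] / b["transaction_count"]
    return dict(buckets)

raw = [
    ("2024-01", "Widgets", 10.0),
    ("2024-01", "Widgets", 30.0),
    ("2024-01", "Gadgets", 5.0),
]
print(aggregate_sales(raw)[("2024-01", "Widgets")])
```

Note that small groups can still leak: a (month, category) bucket with one transaction reveals that transaction's amount, which is why a minimum group-size filter is worth adding to the view for genuinely public data.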
Implementation Walkthrough
Prerequisites
Before starting, ensure you have:
- Snowflake Enterprise Edition or higher (required for PrivateLink)
- AWS account with VPC created
- Appropriate Snowflake privileges (ACCOUNTADMIN role)
- AWS IAM permissions for VPC and PrivateLink management
Step 1: Enable Snowflake PrivateLink
In Snowflake (as ACCOUNTADMIN):
-- Enable PrivateLink for your account
-- (self-service via SYSTEM$AUTHORIZE_PRIVATELINK, or via Snowflake Support)
USE ROLE ACCOUNTADMIN;
-- Once enabled, retrieve your PrivateLink configuration with
-- SYSTEM$GET_PRIVATELINK_CONFIG() and note the VPC endpoint
-- service name for the AWS setup
-- Format: com.amazonaws.vpce.<region>.vpce-svc-xxxxxxxxx
Get PrivateLink Details:
-- Get your account's PrivateLink service name
SELECT SYSTEM$GET_PRIVATELINK_CONFIG();
-- This returns JSON including:
-- - privatelink-account-name: Your Snowflake account name for PrivateLink
-- - privatelink-account-url: Full connection URL
-- - privatelink-vpce-id: AWS VPC endpoint service name
-- - regionless-privatelink-account-url: Alternative URL
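SYSTEM$GET_PRIVATELINK_CONFIG() returns its configuration as a JSON string, so the value needed for the AWS CLI step can be pulled out programmatically. A sketch with made-up values (key names should be checked against your account's actual output):

```python
import json

# Sample output shaped like SYSTEM$GET_PRIVATELINK_CONFIG(); the values
# here are placeholders, not real identifiers.
sample = json.dumps({
    "privatelink-account-name": "myaccount.privatelink",
    "privatelink-account-url": "myaccount.privatelink.snowflakecomputing.com",
    "privatelink-vpce-id": "com.amazonaws.vpce.us-east-1.vpce-svc-xxxxxxxxx",
})

config = json.loads(sample)
# This is the service name to pass to `aws ec2 create-vpc-endpoint`
vpce_service = config["privatelink-vpce-id"]
print(vpce_service)
```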
Step 2: Configure AWS PrivateLink
Create VPC Endpoint in AWS:
#!/bin/bash
# Pseudo-code for AWS CLI commands
# Variables from Snowflake output
SNOWFLAKE_VPCE_SERVICE="com.amazonaws.vpce.us-east-1.vpce-svc-xxxxxxxxx"
VPC_ID="vpc-xxxxxxxxx"
SUBNET_IDS="subnet-xxxxxxxx,subnet-xxxxxxxx"
SECURITY_GROUP_ID="sg-xxxxxxxxx"
REGION="us-east-1"
# Create VPC endpoint
aws ec2 create-vpc-endpoint \
--vpc-id $VPC_ID \
--vpc-endpoint-type Interface \
--service-name $SNOWFLAKE_VPCE_SERVICE \
--subnet-ids $SUBNET_IDS \
--security-group-ids $SECURITY_GROUP_ID \
--region $REGION \
--private-dns-enabled
# Output: VPC Endpoint ID (vpce-xxxxxxxxx)
Configure Security Group:
# Allow inbound HTTPS (443) from your application subnet
aws ec2 authorize-security-group-ingress \
--group-id $SECURITY_GROUP_ID \
--protocol tcp \
--port 443 \
--cidr 10.0.1.0/24 \
--region $REGION
Step 3: Configure DNS (Optional but Recommended)
Private Hosted Zone in Route 53:
# Create private hosted zone for Snowflake domain
aws route53 create-hosted-zone \
--name "privatelink.snowflakecomputing.com" \
--vpc VPCRegion=$REGION,VPCId=$VPC_ID \
--caller-reference $(date +%s)
# Add A record pointing to VPC endpoint
aws route53 change-resource-record-sets \
--hosted-zone-id $HOSTED_ZONE_ID \
--change-batch '{
"Changes": [{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "myaccount.privatelink.snowflakecomputing.com",
"Type": "A",
"AliasTarget": {
"HostedZoneId": "VPC_ENDPOINT_HOSTED_ZONE_ID",
"DNSName": "vpce-xxxxxxxxx.vpce-svc-xxxxxxxxx.us-east-1.vpce.amazonaws.com",
"EvaluateTargetHealth": false
}
}
}]
}'
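Since the alias record is repeated per environment, it helps to generate the change batch rather than hand-edit the JSON. A small helper (the function name and parameters are illustrative) that builds the same structure as the CLI call above:

```python
def build_alias_change(record_name, vpce_dns_name, vpce_hosted_zone_id):
    """Build a Route 53 change batch creating an A-record alias that
    points a Snowflake PrivateLink hostname at a VPC endpoint."""
    return {
        "Changes": [{
            "Action": "CREATE",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": vpce_hosted_zone_id,
                    "DNSName": vpce_dns_name,
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    }

batch = build_alias_change(
    "myaccount.privatelink.snowflakecomputing.com",
    "vpce-xxxxxxxxx.vpce-svc-xxxxxxxxx.us-east-1.vpce.amazonaws.com",
    "VPC_ENDPOINT_HOSTED_ZONE_ID",  # placeholder, as in the CLI example
)
print(batch["Changes"][0]["ResourceRecordSet"]["Name"])
```

The resulting dict can be passed directly as `ChangeBatch` to boto3's `route53.change_resource_record_sets`.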
Step 4: Create Snowflake Objects
Database and Schema:
USE ROLE ACCOUNTADMIN;
-- Create dedicated database for public Streamlit apps
CREATE DATABASE IF NOT EXISTS streamlit_apps;
CREATE SCHEMA IF NOT EXISTS streamlit_apps.public_dashboards;
-- Create database for sanitized public data
CREATE DATABASE IF NOT EXISTS public_data;
CREATE SCHEMA IF NOT EXISTS public_data.aggregated;
Warehouse:
-- Create small warehouse for Streamlit queries
CREATE WAREHOUSE IF NOT EXISTS streamlit_wh
WITH
WAREHOUSE_SIZE = 'XSMALL'
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 2
SCALING_POLICY = 'STANDARD'
INITIALLY_SUSPENDED = TRUE
COMMENT = 'Warehouse for unauthenticated Streamlit apps';
-- Apply resource monitor
ALTER WAREHOUSE streamlit_wh SET RESOURCE_MONITOR = streamlit_app_monitor;
Roles and Users:
-- Create read-only role for Streamlit app
CREATE ROLE IF NOT EXISTS streamlit_readonly_role
COMMENT = 'Read-only role for public Streamlit applications';
-- Grant minimal privileges
GRANT USAGE ON DATABASE public_data TO ROLE streamlit_readonly_role;
GRANT USAGE ON SCHEMA public_data.aggregated TO ROLE streamlit_readonly_role;
GRANT SELECT ON ALL VIEWS IN SCHEMA public_data.aggregated TO ROLE streamlit_readonly_role;
GRANT USAGE ON WAREHOUSE streamlit_wh TO ROLE streamlit_readonly_role;
-- Create service user for Streamlit app
CREATE USER IF NOT EXISTS streamlit_app_user
PASSWORD = 'SECURE_GENERATED_PASSWORD'
DEFAULT_ROLE = streamlit_readonly_role
DEFAULT_WAREHOUSE = streamlit_wh
COMMENT = 'Service account for unauthenticated Streamlit apps';
-- Grant role to user
GRANT ROLE streamlit_readonly_role TO USER streamlit_app_user;
-- Apply network policy
ALTER USER streamlit_app_user SET NETWORK_POLICY = streamlit_privatelink_only;
Network Policy:
-- Create network policy allowing only PrivateLink source IPs
-- (any IP not in ALLOWED_IP_LIST is blocked by default; do not add
-- 0.0.0.0/0 to BLOCKED_IP_LIST, as blocked entries take precedence)
CREATE NETWORK POLICY streamlit_privatelink_only
ALLOWED_IP_LIST = (
    '10.0.0.0/8',    -- Your VPC CIDR
    '172.16.0.0/12'  -- Additional private ranges if needed
)
COMMENT = 'Allow access only via PrivateLink from specific VPC';
-- Apply to user
ALTER USER streamlit_app_user SET NETWORK_POLICY = streamlit_privatelink_only;
Step 5: Create Streamlit Application
Main Application File (streamlit_app.py):
import streamlit as st
import snowflake.connector
from snowflake.connector.errors import Error
import pandas as pd
import plotly.express as px
from datetime import datetime, timedelta
import hashlib
import os
# Page configuration
st.set_page_config(
    page_title="Sales Dashboard",
    page_icon="📊",
    layout="wide"
)
# Connection configuration using environment variables
# These are set in Snowflake when deploying the Streamlit app
def get_snowflake_connection():
    """
    Create a Snowflake connection using app credentials.
    Note: in Streamlit in Snowflake, the connection is handled differently;
    this is pseudo-code showing the concept.
    """
    try:
        conn = snowflake.connector.connect(
            user=os.environ.get('SNOWFLAKE_USER'),
            password=os.environ.get('SNOWFLAKE_PASSWORD'),
            account=os.environ.get('SNOWFLAKE_ACCOUNT'),
            warehouse='STREAMLIT_WH',
            database='PUBLIC_DATA',
            schema='AGGREGATED',
            role='STREAMLIT_READONLY_ROLE',
            # Use PrivateLink URL
            host=os.environ.get('SNOWFLAKE_PRIVATELINK_HOST')
        )
        return conn
    except Error as e:
        st.error(f"Connection failed: {str(e)}")
        return None
# Rate limiting function
def check_rate_limit(max_requests=100, window_minutes=60):
    """
    Basic rate limiting using session state.
    Production should use an external store (Redis, etc.).
    """
    # Use forwarded IP or session ID as identifier
    client_id = st.session_state.get('client_id', 'default')

    if 'rate_limit' not in st.session_state:
        st.session_state.rate_limit = {}

    now = datetime.now()
    # md5 is used only as a bucketing key here, not for security
    client_key = hashlib.md5(client_id.encode()).hexdigest()

    if client_key not in st.session_state.rate_limit:
        st.session_state.rate_limit[client_key] = []

    # Drop requests that fall outside the sliding window
    cutoff = now - timedelta(minutes=window_minutes)
    st.session_state.rate_limit[client_key] = [
        req_time for req_time in st.session_state.rate_limit[client_key]
        if req_time > cutoff
    ]

    # Check limit
    if len(st.session_state.rate_limit[client_key]) >= max_requests:
        return False

    st.session_state.rate_limit[client_key].append(now)
    return True
# Query execution with error handling
def execute_query(conn, query, params=None):
    """
    Execute a query with error handling and logging.
    """
    cursor = None  # initialize so the finally block is safe if cursor() fails
    try:
        cursor = conn.cursor()
        cursor.execute(query, params)
        df = cursor.fetch_pandas_all()
        return df
    except Error as e:
        st.error(f"Query execution failed: {str(e)}")
        # Log error for monitoring
        print(f"Query error: {str(e)} | Query: {query}")
        return None
    finally:
        if cursor:
            cursor.close()
# Main application
def main():
    # Check rate limit
    if not check_rate_limit():
        st.error("⚠️ Rate limit exceeded. Please try again later.")
        st.stop()

    # Header
    st.title("📊 Sales Performance Dashboard")
    st.markdown("""
    This dashboard displays aggregated sales metrics.
    Data is refreshed daily and shows trends over the past 24 months.
    """)

    # Get connection
    conn = get_snowflake_connection()
    if not conn:
        st.error("Unable to connect to data source. Please try again later.")
        st.stop()

    try:
        # Date filter
        col1, col2 = st.columns(2)
        with col1:
            start_date = st.date_input(
                "Start Date",
                value=datetime.now() - timedelta(days=90)
            )
        with col2:
            end_date = st.date_input(
                "End Date",
                value=datetime.now()
            )

        # Query for sales data
        query = """
            SELECT
                MONTH,
                PRODUCT_CATEGORY,
                TRANSACTION_COUNT,
                TOTAL_SALES,
                AVG_SALE
            FROM public_data.aggregated.sales_dashboard
            WHERE MONTH BETWEEN %s AND %s
            ORDER BY MONTH DESC, PRODUCT_CATEGORY
        """

        # Execute query
        df = execute_query(conn, query, (start_date, end_date))

        if df is not None and not df.empty:
            # Display metrics
            st.subheader("📈 Key Metrics")
            col1, col2, col3 = st.columns(3)

            with col1:
                total_sales = df['TOTAL_SALES'].sum()
                st.metric("Total Sales", f"${total_sales:,.2f}")
            with col2:
                total_transactions = df['TRANSACTION_COUNT'].sum()
                st.metric("Total Transactions", f"{total_transactions:,}")
            with col3:
                avg_transaction = total_sales / total_transactions if total_transactions > 0 else 0
                st.metric("Avg Transaction", f"${avg_transaction:,.2f}")

            # Sales by category chart
            st.subheader("💼 Sales by Category")
            fig_category = px.bar(
                df.groupby('PRODUCT_CATEGORY')['TOTAL_SALES'].sum().reset_index(),
                x='PRODUCT_CATEGORY',
                y='TOTAL_SALES',
                title='Total Sales by Product Category'
            )
            st.plotly_chart(fig_category, use_container_width=True)

            # Sales trend over time
            st.subheader("📅 Sales Trend")
            fig_trend = px.line(
                df.groupby('MONTH')['TOTAL_SALES'].sum().reset_index(),
                x='MONTH',
                y='TOTAL_SALES',
                title='Sales Trend Over Time'
            )
            st.plotly_chart(fig_trend, use_container_width=True)

            # Show data table
            with st.expander("📋 View Raw Data"):
                st.dataframe(df)
        else:
            st.warning("No data available for the selected date range.")
    finally:
        # Always close connection
        if conn:
            conn.close()

# Run the app
if __name__ == "__main__":
    main()
Environment Configuration (environment.yml):
# Streamlit in Snowflake environment configuration
name: streamlit_public_app
channels:
- snowflake
dependencies:
- python=3.8
- snowflake-connector-python
- pandas
- plotly
- streamlit
Step 6: Deploy Streamlit in Snowflake
Create Streamlit App in Snowflake:
USE ROLE ACCOUNTADMIN;
USE DATABASE streamlit_apps;
USE SCHEMA public_dashboards;
-- Create a stage to hold the app files
CREATE STAGE IF NOT EXISTS streamlit_apps.public_dashboards.streamlit_stage;

-- Upload files to the stage (PUT must be run from SnowSQL or a driver,
-- not from the web UI worksheet)
PUT file:///path/to/streamlit_app.py @streamlit_apps.public_dashboards.streamlit_stage AUTO_COMPRESS=FALSE OVERWRITE=TRUE;
PUT file:///path/to/environment.yml @streamlit_apps.public_dashboards.streamlit_stage AUTO_COMPRESS=FALSE OVERWRITE=TRUE;

-- Create the Streamlit app
CREATE STREAMLIT streamlit_apps.public_dashboards.sales_dashboard
ROOT_LOCATION = '@streamlit_apps.public_dashboards.streamlit_stage'
MAIN_FILE = 'streamlit_app.py'
QUERY_WAREHOUSE = streamlit_wh
COMMENT = 'Public sales dashboard accessible via PrivateLink';
Configure Unauthenticated Access:
-- Enable unauthenticated access for the Streamlit app
-- This is the critical setting that removes authentication requirement
ALTER STREAMLIT streamlit_apps.public_dashboards.sales_dashboard
SET UNAUTHENTICATED_ACCESS = TRUE;
-- Get the Streamlit app URL
SHOW STREAMLITS LIKE 'sales_dashboard';
-- Note the URL format for PrivateLink access:
-- https://<account>.privatelink.snowflakecomputing.com/-/streamlit/<database>/<schema>/<app_name>
Step 7: Testing and Validation
Test from AWS EC2 Instance:
#!/bin/bash
# Run from EC2 instance in your VPC with access to VPC endpoint
# Test DNS resolution
nslookup myaccount.privatelink.snowflakecomputing.com
# Test connectivity
curl -v https://myaccount.privatelink.snowflakecomputing.com/-/streamlit/streamlit_apps/public_dashboards/sales_dashboard
# Test from outside VPC (should fail)
# This validates that access is truly restricted to PrivateLink
Verify Security Controls:
-- Check network policy is applied
SHOW PARAMETERS LIKE 'NETWORK_POLICY' IN USER streamlit_app_user;
-- Verify query history
SELECT
query_text,
execution_status,
start_time,
user_name
FROM snowflake.account_usage.query_history
WHERE user_name = 'STREAMLIT_APP_USER'
ORDER BY start_time DESC
LIMIT 10;
-- Check warehouse usage
SELECT
warehouse_name,
SUM(credits_used) as total_credits,
COUNT(*) as query_count
FROM snowflake.account_usage.warehouse_metering_history
WHERE warehouse_name = 'STREAMLIT_WH'
AND start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name;
Step 8: Monitoring Setup
CloudWatch Alarms:
# Pseudo-code for CloudWatch alarm creation
# (interface endpoints publish metrics under AWS/PrivateLinkEndpoints)
import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

# Alarm for VPC endpoint connection count
cloudwatch.put_metric_alarm(
    AlarmName='StreamlitVPCEndpointConnections',
    ComparisonOperator='GreaterThanThreshold',
    EvaluationPeriods=2,
    MetricName='NewConnections',
    Namespace='AWS/PrivateLinkEndpoints',
    Period=300,
    Statistic='Sum',
    Threshold=10000,
    ActionsEnabled=True,
    AlarmDescription='Alert on high VPC endpoint traffic',
    Dimensions=[
        {
            'Name': 'VPC Endpoint Id',
            'Value': 'vpce-xxxxxxxxx'
        }
    ]
)

# Alarm for data transfer
cloudwatch.put_metric_alarm(
    AlarmName='StreamlitDataTransfer',
    ComparisonOperator='GreaterThanThreshold',
    EvaluationPeriods=1,
    MetricName='BytesProcessed',
    Namespace='AWS/PrivateLinkEndpoints',
    Period=3600,
    Statistic='Sum',
    Threshold=1000000000,  # 1 GB
    ActionsEnabled=True,
    AlarmDescription='Alert on high data transfer'
)
Snowflake Alert:
-- Create alert for suspicious query patterns
-- (assumes a notification integration named 'security_email_int' exists;
-- SYSTEM$SEND_EMAIL requires one as its first argument)
CREATE OR REPLACE ALERT streamlit_apps.public_dashboards.suspicious_queries
WAREHOUSE = streamlit_wh
SCHEDULE = '5 MINUTE'
IF (EXISTS (
    SELECT 1
    FROM snowflake.account_usage.query_history
    WHERE user_name = 'STREAMLIT_APP_USER'
    AND start_time >= DATEADD('minute', -5, CURRENT_TIMESTAMP())
    AND (
        execution_status = 'FAILED'
        OR query_text ILIKE '%DELETE%'
        OR query_text ILIKE '%UPDATE%'
        OR query_text ILIKE '%INSERT%'
        OR query_text ILIKE '%DROP%'
    )
))
THEN CALL SYSTEM$SEND_EMAIL(
    'security_email_int',
    'security_team@company.com',
    'Suspicious Streamlit App Activity',
    'Potential unauthorized activity detected on Streamlit app'
);

-- Alerts are created suspended; resume it
ALTER ALERT streamlit_apps.public_dashboards.suspicious_queries RESUME;
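The alert's filter can be prototyped in Python before wiring it into SQL, which makes it easy to test the pattern list against sample queries (the function and pattern names here are illustrative):

```python
# Python equivalent of the alert's filter: flag failed statements and
# any query text containing a write keyword, matching the ILIKE patterns.
SUSPICIOUS_PATTERNS = ["DELETE", "UPDATE", "INSERT", "DROP"]

def is_suspicious(query_text, execution_status="SUCCESS"):
    """Mirror of the SQL alert condition for offline testing."""
    if execution_status == "FAILED":
        return True
    upper = query_text.upper()
    return any(p in upper for p in SUSPICIOUS_PATTERNS)

print(is_suspicious("SELECT * FROM public_data.aggregated.sales_dashboard"))
print(is_suspicious("DROP TABLE sales"))
print(is_suspicious("SELECT 1", execution_status="FAILED"))
```

Note that substring matching, like the ILIKE '%UPDATE%' pattern it mirrors, will also flag benign queries touching columns such as LAST_UPDATED; tune the patterns before trusting the alert.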
Compensating Controls Summary
Since authentication is disabled, these compensating controls are mandatory:
| Control Layer | Control Type | Purpose | Priority |
|---|---|---|---|
| PrivateLink Only | Preventive | Prevent public internet access | Critical |
| Network Policy | Preventive | IP allowlisting at Snowflake level | Critical |
| Security Groups | Preventive | VPC-level access control | High |
| Read-Only Role | Preventive | Prevent data modification | Critical |
| Data Sanitization | Preventive | No PII/sensitive data exposure | Critical |
| Rate Limiting | Preventive | Prevent abuse | High |
| Resource Monitors | Preventive | Control costs | High |
| Query Logging | Detective | Audit trail | High |
| CloudWatch Monitoring | Detective | Traffic analysis | Medium |
| Alerts | Detective | Anomaly detection | Medium |
Security Considerations
Risks and Mitigations
Risk 1: Network-Level Attack
- Scenario: Attacker compromises AWS account
- Mitigation:
- Multi-factor authentication on AWS
- CloudTrail logging
- Regular access reviews
- Separate VPC for Streamlit access
- Security group restrictions
Risk 2: Data Exfiltration
- Scenario: Large-scale data extraction via app
- Mitigation:
- Rate limiting
- Query result size limits
- Resource monitors
- Only aggregated data in views
- CloudWatch data transfer monitoring
Risk 3: Resource Abuse
- Scenario: Excessive queries causing high costs
- Mitigation:
- Warehouse auto-suspend
- Resource monitors with suspend triggers
- Query timeout settings
- Small warehouse size
Risk 4: Lateral Movement
- Scenario: Compromised app used to access other resources
- Mitigation:
- Dedicated service account
- Minimal privilege role
- Separate database for public data
- Network segmentation
Query Safety Configuration
-- Set query timeout (5 minutes max)
ALTER USER streamlit_app_user SET STATEMENT_TIMEOUT_IN_SECONDS = 300;
-- Cancel statements that wait in the warehouse queue for more than a minute
ALTER USER streamlit_app_user SET STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 60;
-- Cap query result size (1000 rows max)
ALTER USER streamlit_app_user SET ROWS_PER_RESULTSET = 1000;
-- Restrict to single-statement queries (0 would mean unlimited)
ALTER USER streamlit_app_user SET MULTI_STATEMENT_COUNT = 1;
Advanced Configurations
Multiple VPCs
To allow access from multiple VPCs:
-- Update network policy with multiple CIDR blocks
ALTER NETWORK POLICY streamlit_privatelink_only
SET ALLOWED_IP_LIST = (
'10.0.0.0/8', -- VPC 1
'172.16.0.0/12', -- VPC 2
'192.168.0.0/16' -- VPC 3
);
Geo-Restricted Access
Add geo-blocking at CloudFront or AWS WAF level:
# Pseudo-code for AWS WAF geo-blocking
# Note: geo match sets belong to the classic WAF API; in WAFv2 the
# equivalent is a GeoMatchStatement inside a web ACL rule
import boto3

waf = boto3.client('waf', region_name='us-east-1')

# Create geo-match set, then add the country constraint (classic WAF)
response = waf.create_geo_match_set(
    Name='AllowUSOnly',
    ChangeToken='CHANGE_TOKEN'
)
waf.update_geo_match_set(
    GeoMatchSetId=response['GeoMatchSet']['GeoMatchSetId'],
    ChangeToken='CHANGE_TOKEN',
    Updates=[{
        'Action': 'INSERT',
        'GeoMatchConstraint': {'Type': 'Country', 'Value': 'US'}
    }]
)
Session Management
Even without authentication, implement basic session tracking:
import streamlit as st
from datetime import datetime
import uuid

def init_session():
    """Initialize session tracking"""
    if 'session_id' not in st.session_state:
        st.session_state.session_id = str(uuid.uuid4())
        st.session_state.session_start = datetime.now()
        st.session_state.request_count = 0

    # Increment request counter
    st.session_state.request_count += 1

    # Log session activity (log_session_activity is pseudo-code)
    log_session_activity(
        session_id=st.session_state.session_id,
        request_count=st.session_state.request_count,
        timestamp=datetime.now()
    )
Operational Considerations
Maintenance Windows
-- Schedule regular maintenance
-- Disable app during maintenance
ALTER STREAMLIT streamlit_apps.public_dashboards.sales_dashboard SUSPEND;
-- Perform maintenance tasks
-- Update data, refresh views, etc.
-- Re-enable app
ALTER STREAMLIT streamlit_apps.public_dashboards.sales_dashboard RESUME;
Backup and Disaster Recovery
-- Backup Streamlit app code and configuration
CREATE OR REPLACE PROCEDURE backup_streamlit_app()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
-- Export Streamlit configuration
-- Copy to backup location
-- Document current state
RETURN 'Backup completed';
END;
$$;
Cost Management
-- Monitor daily costs
SELECT
warehouse_name,
DATE(start_time) as usage_date,
SUM(credits_used) as daily_credits,
SUM(credits_used) * 2.0 as estimated_cost_usd -- Adjust rate
FROM snowflake.account_usage.warehouse_metering_history
WHERE warehouse_name = 'STREAMLIT_WH'
AND start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, DATE(start_time)
ORDER BY usage_date DESC;
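The cost arithmetic in that query is just credits times a per-credit rate; mirroring it in Python makes it easy to sanity-check the numbers the dashboard reports (the $2.00/credit rate is an assumption, adjust to your contract):

```python
# Mirror of the cost query's arithmetic: sum credits per day, then
# multiply by an assumed per-credit rate.
CREDIT_RATE_USD = 2.0  # placeholder rate; adjust to your contract

def daily_costs(metering_rows):
    """metering_rows: (usage_date, credits_used) tuples, possibly with
    several rows per date. Returns {date: estimated_cost_usd}."""
    totals = {}
    for usage_date, credits in metering_rows:
        totals[usage_date] = totals.get(usage_date, 0.0) + credits
    return {d: round(c * CREDIT_RATE_USD, 2) for d, c in totals.items()}

rows = [("2024-01-01", 1.5), ("2024-01-01", 0.5), ("2024-01-02", 3.0)]
print(daily_costs(rows))
```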
Troubleshooting Guide
Common Issues
Issue 1: Cannot Access App from VPC
# Diagnostic steps
# 1. Verify VPC endpoint is active
aws ec2 describe-vpc-endpoints --vpc-endpoint-ids vpce-xxxxxxxxx
# 2. Check security group rules
aws ec2 describe-security-groups --group-ids sg-xxxxxxxxx
# 3. Test DNS resolution
nslookup myaccount.privatelink.snowflakecomputing.com
# 4. Test HTTPS connectivity
curl -v https://myaccount.privatelink.snowflakecomputing.com
Issue 2: Network Policy Blocking Access
-- Verify current policy
SHOW NETWORK POLICIES;
DESC NETWORK POLICY streamlit_privatelink_only;
-- Check user's network policy
SHOW PARAMETERS LIKE 'NETWORK_POLICY' IN USER streamlit_app_user;
-- Temporarily disable for testing (DO NOT DO IN PRODUCTION)
-- ALTER USER streamlit_app_user UNSET NETWORK_POLICY;
Issue 3: High Query Costs
-- Identify expensive queries
SELECT
query_id,
query_text,
total_elapsed_time,
bytes_scanned,
rows_produced,
credits_used_cloud_services
FROM snowflake.account_usage.query_history
WHERE user_name = 'STREAMLIT_APP_USER'
AND start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
ORDER BY credits_used_cloud_services DESC
LIMIT 10;
-- Optimize expensive queries by:
-- 1. Adding filters to views
-- 2. Pre-aggregating data
-- 3. Using result caching
Conclusion
Hosting an unauthenticated Streamlit app in Snowflake via PrivateLink requires careful implementation of defense-in-depth principles. The key success factors are:
- Network Isolation: PrivateLink ensures traffic never touches public internet
- Layered Security: Multiple compensating controls prevent abuse
- Data Sanitization: Only expose aggregated, non-sensitive data
- Monitoring: Comprehensive logging and alerting detect anomalies
- Resource Controls: Rate limiting and quotas prevent runaway costs
This architecture is suitable for:
- Internal corporate dashboards
- Public-facing analytics with non-sensitive data
- Monitoring and observability displays
- Status pages and metrics dashboards
Final Security Reminder: Despite these controls, unauthenticated access should only be used when:
- Data is truly public or aggregated
- Business requirements explicitly prohibit authentication
- All compensating controls are implemented and maintained
- Regular security reviews are conducted
For applications with sensitive data or higher security requirements, always implement proper authentication and authorization, even with PrivateLink.