Hosting Streamlit in Snowflake with AWS PrivateLink Access Without Authentication

READER BEWARE: THE FOLLOWING WAS WRITTEN ENTIRELY BY AI WITHOUT HUMAN EDITING.

Introduction

Snowflake’s Streamlit integration enables organizations to build and deploy data applications directly within their data cloud platform. There is, however, a niche use case: exposing a Streamlit app over AWS PrivateLink without requiring user authentication, while still maintaining robust security through defense-in-depth principles. This guide explores how to architect, secure, and implement such a solution.

Use Case Overview

The scenario involves:

  • Application: Streamlit app hosted in Snowflake
  • Access Method: AWS PrivateLink only (no public internet access)
  • Authentication: No user login required
  • Security Requirement: Multi-layered defense despite no authentication

This pattern is useful for internal dashboards, monitoring displays, or public-facing analytics where network-level security provides the primary access control.

Architecture Overview

High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│                    AWS Account (Consumer)                    │
│  ┌────────────────────────────────────────────────────────┐ │
│  │                  VPC (10.0.0.0/16)                      │ │
│  │  ┌──────────────────────────────────────────────────┐  │ │
│  │  │         Private Subnet (10.0.1.0/24)              │  │ │
│  │  │  ┌────────────┐         ┌─────────────────────┐  │  │ │
│  │  │  │   EC2      │         │  VPC Endpoint       │  │  │ │
│  │  │  │  Instance  │────────▶│  (PrivateLink)      │  │  │ │
│  │  │  │            │         │                     │  │  │ │
│  │  │  └────────────┘         └──────────┬──────────┘  │  │ │
│  │  │                                    │              │  │ │
│  │  └────────────────────────────────────┼──────────────┘  │ │
│  │                                       │                 │ │
│  └───────────────────────────────────────┼─────────────────┘ │
│                                          │                   │
└──────────────────────────────────────────┼───────────────────┘
                                           │
                                           │ Private Connection
                                           │
┌──────────────────────────────────────────┼───────────────────┐
│                                          │                   │
│                  Snowflake Account       │                   │
│  ┌───────────────────────────────────────▼────────────────┐ │
│  │              PrivateLink Endpoint                       │ │
│  │  ┌─────────────────────────────────────────────────┐   │ │
│  │  │          Streamlit App (UNAUTHENTICATED)        │   │ │
│  │  │  ┌─────────────────────────────────────────┐    │   │ │
│  │  │  │     - Read-only queries                 │    │   │ │
│  │  │  │     - Public/aggregated data only       │    │   │ │
│  │  │  │     - Rate limiting enabled             │    │   │ │
│  │  │  │     - Audit logging active              │    │   │ │
│  │  │  └─────────────────────────────────────────┘    │   │ │
│  │  └─────────────────────────────────────────────────┘   │ │
│  └─────────────────────────────────────────────────────────┘ │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Key Components

  1. Snowflake Streamlit App: Python application with unauthenticated access enabled
  2. AWS PrivateLink: Private network connectivity between AWS VPC and Snowflake
  3. VPC Endpoint: AWS endpoint service connected to Snowflake PrivateLink
  4. Security Groups: Network-level access controls
  5. Snowflake Network Policies: IP allowlisting and blocking rules

Defense in Depth Strategy

Since authentication is disabled, we implement multiple security layers:

Layer 1: Network Isolation (Primary Control)

PrivateLink as the Foundation

  • Traffic never traverses public internet
  • Only accessible from specific AWS VPC(s)
  • No DNS resolution from public internet
  • Encrypted in transit automatically

Why This Matters: Even without authentication, attackers cannot reach the application without first compromising your AWS infrastructure.

Layer 2: Network Policies (Essential)

Snowflake Network Policy

-- Restrict access to specific VPC CIDR blocks only
-- Note: do not add 0.0.0.0/0 to BLOCKED_IP_LIST. Snowflake evaluates the
-- blocked list first, so that would block all traffic, including allowed IPs.
-- An ALLOWED_IP_LIST already implicitly denies every address not on it.
CREATE NETWORK POLICY streamlit_privatelink_only
  ALLOWED_IP_LIST = ('10.0.0.0/8', '172.16.0.0/12');  -- Your VPC CIDR ranges

-- Apply to the Streamlit user/role
ALTER USER streamlit_app_user SET NETWORK_POLICY = streamlit_privatelink_only;

AWS Security Groups

  • Restrict inbound traffic to specific source security groups or IP ranges
  • Implement least-privilege egress rules
  • Log all connection attempts

Layer 3: Application-Level Controls (Compensating Controls)

Read-Only Access

# Configure Streamlit to use read-only database role
snowflake_connection = {
    'user': 'STREAMLIT_READONLY_USER',
    'role': 'STREAMLIT_READONLY_ROLE',
    'warehouse': 'STREAMLIT_WH',
    'database': 'PUBLIC_DATA',
    'schema': 'AGGREGATED'
}

Data Access Restrictions

  • Use dedicated read-only role with minimal privileges
  • Only grant SELECT on specific views/tables with aggregated data
  • Never expose PII or sensitive data
  • Implement row-level security even for “public” data
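The restrictions above can also be enforced defensively in the app layer. As a sketch (the function name and keyword list are illustrative, not part of any Snowflake API), a guard can reject anything that is not a plain read before it ever reaches the connector; the read-only role remains the real control, this just fails fast:

```python
import re

# Keywords that indicate a write/DDL statement. Word boundaries avoid false
# positives on identifiers like "orders_deleted" or "update_time".
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|GRANT)\b",
    re.IGNORECASE,
)

def is_read_only(sql: str) -> bool:
    """Return True only for plain SELECT/WITH statements with no write keywords."""
    stripped = sql.strip().rstrip(";").strip()
    if not stripped.upper().startswith(("SELECT", "WITH")):
        return False
    return FORBIDDEN.search(stripped) is None
```

Plain-English phrases such as "last update" in a string literal would trip the filter; for a dashboard that only runs a fixed set of queries against sanitized views, that trade-off is acceptable.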

Layer 4: Rate Limiting and Resource Controls

Snowflake Resource Monitors

-- Prevent runaway query costs
CREATE RESOURCE MONITOR streamlit_app_monitor
  WITH CREDIT_QUOTA = 100
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 75 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

-- Apply to the warehouse
ALTER WAREHOUSE streamlit_wh SET RESOURCE_MONITOR = streamlit_app_monitor;

Application-Level Rate Limiting

import streamlit as st
from datetime import datetime, timedelta
import hashlib

def rate_limit_check(client_id, max_requests=100, window_minutes=60):
    """
    Simple rate limiting based on client identifier
    In production, use Redis or similar for distributed systems
    """
    if 'rate_limit' not in st.session_state:
        st.session_state.rate_limit = {}
    
    now = datetime.now()
    client_key = hashlib.md5(client_id.encode()).hexdigest()
    
    if client_key not in st.session_state.rate_limit:
        st.session_state.rate_limit[client_key] = []
    
    # Clean old requests
    cutoff = now - timedelta(minutes=window_minutes)
    st.session_state.rate_limit[client_key] = [
        req_time for req_time in st.session_state.rate_limit[client_key]
        if req_time > cutoff
    ]
    
    # Check limit
    if len(st.session_state.rate_limit[client_key]) >= max_requests:
        return False
    
    st.session_state.rate_limit[client_key].append(now)
    return True

Layer 5: Monitoring and Audit (Detection Control)

Query Logging

-- Query history is captured automatically in ACCOUNT_USAGE; optionally allow
-- unredacted syntax-error text to ease debugging of the Streamlit user's queries
ALTER USER streamlit_app_user SET ENABLE_UNREDACTED_QUERY_SYNTAX_ERROR = TRUE;

-- Monitor query history
CREATE OR REPLACE VIEW security.streamlit_audit AS
SELECT
    query_id,
    query_text,
    user_name,
    role_name,
    warehouse_name,
    execution_status,
    error_code,
    error_message,
    start_time,
    end_time,
    total_elapsed_time,
    bytes_scanned,
    rows_produced
FROM snowflake.account_usage.query_history
WHERE user_name = 'STREAMLIT_APP_USER'
ORDER BY start_time DESC;

CloudWatch Integration

  • Monitor VPC endpoint connection attempts
  • Alert on unusual traffic patterns
  • Track data transfer volumes
  • Monitor query execution patterns
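The "unusual traffic patterns" bullet needs a concrete definition to be alertable. A minimal sketch, assuming you periodically pull byte-count datapoints from CloudWatch: flag the latest interval if it deviates from the historical mean by more than a few standard deviations (the function and threshold are illustrative choices, not an AWS API):

```python
from statistics import mean, pstdev

def is_anomalous(history: list[float], latest: float, sigma: float = 3.0) -> bool:
    """Z-score check: flag `latest` if it exceeds the historical mean of
    `history` by more than `sigma` population standard deviations."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    mu, sd = mean(history), pstdev(history)
    if sd == 0:
        return latest > mu
    return (latest - mu) / sd > sigma

# A 900-byte spike against a ~110-byte baseline is clearly anomalous
print(is_anomalous([100.0, 120.0, 110.0, 105.0], 900.0))  # True
```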

Layer 6: Data Sanitization (Preventive Control)

Create Sanitized Views

-- Create views that only expose aggregated, non-sensitive data
CREATE OR REPLACE VIEW public_data.sales_dashboard AS
SELECT
    DATE_TRUNC('month', sale_date) AS month,
    product_category,
    COUNT(*) AS transaction_count,
    SUM(amount) AS total_sales,
    AVG(amount) AS avg_sale
FROM raw_data.sales
WHERE sale_date >= DATEADD('year', -2, CURRENT_DATE())
GROUP BY DATE_TRUNC('month', sale_date), product_category;

-- Grant access to read-only role
GRANT SELECT ON VIEW public_data.sales_dashboard TO ROLE streamlit_readonly_role;

Implementation Walkthrough

Prerequisites

Before starting, ensure you have:

  • Snowflake Enterprise Edition or higher (required for PrivateLink)
  • AWS account with VPC created
  • Appropriate Snowflake privileges (ACCOUNTADMIN role)
  • AWS IAM permissions for VPC and PrivateLink management

Step 1: Enable PrivateLink in Snowflake (as ACCOUNTADMIN):

-- Enable PrivateLink for your account
-- This generates the AWS PrivateLink configuration details
USE ROLE ACCOUNTADMIN;

-- Request PrivateLink enablement (contact Snowflake Support, or use the
-- self-service SYSTEM$AUTHORIZE_PRIVATELINK function where available)

-- Once enabled, retrieve the configuration (next step) and note the
-- VPC endpoint service name for the AWS setup
-- Format: com.amazonaws.vpce.<region>.vpce-svc-<id>

Get PrivateLink Details:

-- Get your account's PrivateLink service name
SELECT SYSTEM$GET_PRIVATELINK_CONFIG();

-- This returns JSON including (keys use hyphens):
-- - privatelink-account-name: Your Snowflake account name for PrivateLink
-- - privatelink-account-url: Full connection URL
-- - privatelink-vpce-id: AWS VPC Endpoint Service ID
-- - regionless-privatelink-account-url: Alternative URL
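To hand these values to the AWS automation, the JSON can be parsed in a few lines. A sketch, assuming `raw` holds the string returned by SYSTEM$GET_PRIVATELINK_CONFIG() (the sample values below are placeholders, and the key names should be verified against your account's actual output):

```python
import json

def parse_privatelink_config(raw: str) -> dict:
    """Extract the fields needed for the AWS setup from the config JSON."""
    cfg = json.loads(raw)
    return {
        "account_url": cfg.get("privatelink-account-url"),
        "vpce_service": cfg.get("privatelink-vpce-id"),
        "regionless_url": cfg.get("regionless-privatelink-account-url"),
    }

# Example with the shape of a typical response (placeholder values)
sample = json.dumps({
    "privatelink-account-name": "myaccount.us-east-1.privatelink",
    "privatelink-account-url": "myaccount.us-east-1.privatelink.snowflakecomputing.com",
    "privatelink-vpce-id": "com.amazonaws.vpce.us-east-1.vpce-svc-0123456789abcdef0",
    "regionless-privatelink-account-url": "myorg-myaccount.privatelink.snowflakecomputing.com",
})
print(parse_privatelink_config(sample)["vpce_service"])
```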

Step 2: Create VPC Endpoint in AWS:

#!/bin/bash
# AWS CLI commands (replace placeholder IDs with your own values)

# Variables from Snowflake output
SNOWFLAKE_VPCE_SERVICE="com.amazonaws.vpce.us-east-1.vpce-svc-xxxxxxxxx"
VPC_ID="vpc-xxxxxxxxx"
SUBNET_IDS="subnet-xxxxxxxx,subnet-xxxxxxxx"
SECURITY_GROUP_ID="sg-xxxxxxxxx"
REGION="us-east-1"

# Create the interface VPC endpoint. Snowflake's endpoint service does not
# support private DNS names, so omit --private-dns-enabled; DNS is handled
# via the Route 53 private hosted zone in the next step.
aws ec2 create-vpc-endpoint \
    --vpc-id $VPC_ID \
    --vpc-endpoint-type Interface \
    --service-name $SNOWFLAKE_VPCE_SERVICE \
    --subnet-ids $SUBNET_IDS \
    --security-group-ids $SECURITY_GROUP_ID \
    --region $REGION

# Output includes the VPC Endpoint ID (vpce-xxxxxxxxx)

Configure Security Group:

# Allow inbound HTTPS (443) from your application subnet
aws ec2 authorize-security-group-ingress \
    --group-id $SECURITY_GROUP_ID \
    --protocol tcp \
    --port 443 \
    --cidr 10.0.1.0/24 \
    --region $REGION

Step 3: Private Hosted Zone in Route 53:

# Create private hosted zone for Snowflake domain
aws route53 create-hosted-zone \
    --name "privatelink.snowflakecomputing.com" \
    --vpc VPCRegion=$REGION,VPCId=$VPC_ID \
    --caller-reference $(date +%s)

# Add A record pointing to VPC endpoint
aws route53 change-resource-record-sets \
    --hosted-zone-id $HOSTED_ZONE_ID \
    --change-batch '{
        "Changes": [{
            "Action": "CREATE",
            "ResourceRecordSet": {
                "Name": "myaccount.privatelink.snowflakecomputing.com",
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": "VPC_ENDPOINT_HOSTED_ZONE_ID",
                    "DNSName": "vpce-xxxxxxxxx.vpce-svc-xxxxxxxxx.us-east-1.vpce.amazonaws.com",
                    "EvaluateTargetHealth": false
                }
            }
        }]
    }'

Step 4: Create Snowflake Objects

Database and Schema:

USE ROLE ACCOUNTADMIN;

-- Create dedicated database for public Streamlit apps
CREATE DATABASE IF NOT EXISTS streamlit_apps;
CREATE SCHEMA IF NOT EXISTS streamlit_apps.public_dashboards;

-- Create database for sanitized public data
CREATE DATABASE IF NOT EXISTS public_data;
CREATE SCHEMA IF NOT EXISTS public_data.aggregated;

Warehouse:

-- Create small warehouse for Streamlit queries
CREATE WAREHOUSE IF NOT EXISTS streamlit_wh
WITH
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 2
    SCALING_POLICY = 'STANDARD'
    INITIALLY_SUSPENDED = TRUE
    COMMENT = 'Warehouse for unauthenticated Streamlit apps';

-- Apply resource monitor
ALTER WAREHOUSE streamlit_wh SET RESOURCE_MONITOR = streamlit_app_monitor;

Roles and Users:

-- Create read-only role for Streamlit app
CREATE ROLE IF NOT EXISTS streamlit_readonly_role
    COMMENT = 'Read-only role for public Streamlit applications';

-- Grant minimal privileges
GRANT USAGE ON DATABASE public_data TO ROLE streamlit_readonly_role;
GRANT USAGE ON SCHEMA public_data.aggregated TO ROLE streamlit_readonly_role;
GRANT SELECT ON ALL VIEWS IN SCHEMA public_data.aggregated TO ROLE streamlit_readonly_role;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA public_data.aggregated TO ROLE streamlit_readonly_role;
GRANT USAGE ON WAREHOUSE streamlit_wh TO ROLE streamlit_readonly_role;

-- Create service user for the Streamlit app
-- (prefer key-pair authentication over passwords for service accounts)
CREATE USER IF NOT EXISTS streamlit_app_user
    PASSWORD = 'SECURE_GENERATED_PASSWORD'  -- placeholder; generate securely
    DEFAULT_ROLE = streamlit_readonly_role
    DEFAULT_WAREHOUSE = streamlit_wh
    COMMENT = 'Service account for unauthenticated Streamlit apps';

-- Grant role to user
GRANT ROLE streamlit_readonly_role TO USER streamlit_app_user;

-- (The network policy is created and applied in the next step)

Network Policy:

-- Create network policy allowing only PrivateLink IPs.
-- No BLOCKED_IP_LIST is needed: blocking 0.0.0.0/0 would override the allowed
-- list, because Snowflake checks blocked IPs first.
CREATE NETWORK POLICY streamlit_privatelink_only
    ALLOWED_IP_LIST = (
        '10.0.0.0/8',      -- Your VPC CIDR
        '172.16.0.0/12'    -- Additional private ranges if needed
    )
    COMMENT = 'Allow access only via PrivateLink from specific VPC';

-- Apply to user
ALTER USER streamlit_app_user SET NETWORK_POLICY = streamlit_privatelink_only;

Step 5: Create Streamlit Application

Main Application File (streamlit_app.py):

import streamlit as st
import snowflake.connector
from snowflake.connector.errors import Error
import pandas as pd
import plotly.express as px
from datetime import datetime, timedelta
import hashlib
import os

# Page configuration
st.set_page_config(
    page_title="Sales Dashboard",
    page_icon="📊",
    layout="wide"
)

# Connection configuration using environment variables
# These are set in Snowflake when deploying the Streamlit app
def get_snowflake_connection():
    """
    Create a Snowflake connection using app credentials.
    Note: inside Streamlit in Snowflake, use the active session instead
    (snowflake.snowpark.context.get_active_session); this external-connector
    version is illustrative.
    """
    try:
        conn = snowflake.connector.connect(
            user=os.environ.get('SNOWFLAKE_USER'),
            password=os.environ.get('SNOWFLAKE_PASSWORD'),
            account=os.environ.get('SNOWFLAKE_ACCOUNT'),
            warehouse='STREAMLIT_WH',
            database='PUBLIC_DATA',
            schema='AGGREGATED',
            role='STREAMLIT_READONLY_ROLE',
            # Use PrivateLink URL
            host=os.environ.get('SNOWFLAKE_PRIVATELINK_HOST')
        )
        return conn
    except Error as e:
        st.error(f"Connection failed: {str(e)}")
        return None

# Rate limiting function
def check_rate_limit(max_requests=100, window_minutes=60):
    """
    Basic rate limiting using session state
    Production should use external store (Redis, etc.)
    """
    # Use forwarded IP or session ID as identifier
    client_id = st.session_state.get('client_id', 'default')
    
    if 'rate_limit' not in st.session_state:
        st.session_state.rate_limit = {}
    
    now = datetime.now()
    client_key = hashlib.md5(client_id.encode()).hexdigest()
    
    if client_key not in st.session_state.rate_limit:
        st.session_state.rate_limit[client_key] = []
    
    # Clean old requests
    cutoff = now - timedelta(minutes=window_minutes)
    st.session_state.rate_limit[client_key] = [
        req_time for req_time in st.session_state.rate_limit[client_key]
        if req_time > cutoff
    ]
    
    # Check limit
    if len(st.session_state.rate_limit[client_key]) >= max_requests:
        return False
    
    st.session_state.rate_limit[client_key].append(now)
    return True

# Query execution with error handling
def execute_query(conn, query, params=None):
    """
    Execute a query with error handling and logging
    """
    cursor = None  # ensure the name is bound even if cursor creation fails
    try:
        cursor = conn.cursor()
        if params:
            cursor.execute(query, params)
        else:
            cursor.execute(query)
        df = cursor.fetch_pandas_all()
        return df
    except Error as e:
        st.error(f"Query execution failed: {str(e)}")
        # Log error for monitoring
        print(f"Query error: {str(e)} | Query: {query}")
        return None
    finally:
        if cursor:
            cursor.close()

# Main application
def main():
    # Check rate limit
    if not check_rate_limit():
        st.error("⚠️ Rate limit exceeded. Please try again later.")
        st.stop()
    
    # Header
    st.title("📊 Sales Performance Dashboard")
    st.markdown("""
    This dashboard displays aggregated sales metrics. 
    Data is refreshed daily and shows trends over the past 24 months.
    """)
    
    # Get connection
    conn = get_snowflake_connection()
    if not conn:
        st.error("Unable to connect to data source. Please try again later.")
        st.stop()
    
    try:
        # Date filter
        col1, col2 = st.columns(2)
        with col1:
            start_date = st.date_input(
                "Start Date",
                value=datetime.now() - timedelta(days=90)
            )
        with col2:
            end_date = st.date_input(
                "End Date",
                value=datetime.now()
            )
        
        # Query for sales data
        query = """
        SELECT
            MONTH,
            PRODUCT_CATEGORY,
            TRANSACTION_COUNT,
            TOTAL_SALES,
            AVG_SALE
        FROM public_data.aggregated.sales_dashboard
        WHERE MONTH BETWEEN %s AND %s
        ORDER BY MONTH DESC, PRODUCT_CATEGORY
        """
        
        # Execute query
        df = execute_query(conn, query, (start_date, end_date))
        
        if df is not None and not df.empty:
            # Display metrics
            st.subheader("📈 Key Metrics")
            col1, col2, col3 = st.columns(3)
            
            with col1:
                total_sales = df['TOTAL_SALES'].sum()
                st.metric("Total Sales", f"${total_sales:,.2f}")
            
            with col2:
                total_transactions = df['TRANSACTION_COUNT'].sum()
                st.metric("Total Transactions", f"{total_transactions:,}")
            
            with col3:
                avg_transaction = total_sales / total_transactions if total_transactions > 0 else 0
                st.metric("Avg Transaction", f"${avg_transaction:,.2f}")
            
            # Sales by category chart
            st.subheader("💼 Sales by Category")
            fig_category = px.bar(
                df.groupby('PRODUCT_CATEGORY')['TOTAL_SALES'].sum().reset_index(),
                x='PRODUCT_CATEGORY',
                y='TOTAL_SALES',
                title='Total Sales by Product Category'
            )
            st.plotly_chart(fig_category, use_container_width=True)
            
            # Sales trend over time
            st.subheader("📅 Sales Trend")
            fig_trend = px.line(
                df.groupby('MONTH')['TOTAL_SALES'].sum().reset_index(),
                x='MONTH',
                y='TOTAL_SALES',
                title='Sales Trend Over Time'
            )
            st.plotly_chart(fig_trend, use_container_width=True)
            
            # Show data table
            with st.expander("📋 View Raw Data"):
                st.dataframe(df)
        else:
            st.warning("No data available for the selected date range.")
    
    finally:
        # Always close connection
        if conn:
            conn.close()

# Run the app
if __name__ == "__main__":
    main()

Environment Configuration (environment.yml):

# Streamlit in Snowflake environment configuration
name: streamlit_public_app
channels:
  - snowflake
dependencies:
  - python=3.8
  - snowflake-connector-python
  - pandas
  - plotly
  - streamlit

Step 6: Deploy Streamlit in Snowflake

Create Streamlit App in Snowflake:

USE ROLE ACCOUNTADMIN;
USE DATABASE streamlit_apps;
USE SCHEMA public_dashboards;

-- Create a stage to hold the app files
CREATE STAGE IF NOT EXISTS streamlit_stage;

-- Upload files to the stage (run via SnowSQL; PUT is not supported in
-- Snowsight worksheets)
PUT file:///path/to/streamlit_app.py @streamlit_stage AUTO_COMPRESS=FALSE OVERWRITE=TRUE;
PUT file:///path/to/environment.yml @streamlit_stage AUTO_COMPRESS=FALSE OVERWRITE=TRUE;

-- Create the Streamlit app
CREATE STREAMLIT streamlit_apps.public_dashboards.sales_dashboard
    ROOT_LOCATION = '@streamlit_apps.public_dashboards.streamlit_stage'
    MAIN_FILE = 'streamlit_app.py'
    QUERY_WAREHOUSE = streamlit_wh
    COMMENT = 'Public sales dashboard accessible via PrivateLink';

Configure Unauthenticated Access:

-- Enable unauthenticated access for the Streamlit app.
-- This is the critical setting that removes the authentication requirement.
-- (The parameter name shown here is illustrative; confirm the current syntax
-- for public/unauthenticated Streamlit access in the Snowflake documentation.)
ALTER STREAMLIT streamlit_apps.public_dashboards.sales_dashboard
    SET UNAUTHENTICATED_ACCESS = TRUE;

-- Get the Streamlit app URL
SHOW STREAMLITS LIKE 'sales_dashboard';

-- Note the URL format for PrivateLink access:
-- https://<account>.privatelink.snowflakecomputing.com/-/streamlit/<database>/<schema>/<app_name>
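The URL above can be assembled programmatically when wiring up smoke tests or bookmarks. A trivial helper following that format (the function name is ours; verify the path shape against what SHOW STREAMLITS reports for your account):

```python
def streamlit_privatelink_url(account: str, database: str, schema: str, app: str) -> str:
    """Build the PrivateLink URL for a Streamlit app from its qualified name."""
    host = f"{account}.privatelink.snowflakecomputing.com"
    return f"https://{host}/-/streamlit/{database}/{schema}/{app}"

url = streamlit_privatelink_url(
    "myaccount", "streamlit_apps", "public_dashboards", "sales_dashboard"
)
print(url)
# https://myaccount.privatelink.snowflakecomputing.com/-/streamlit/streamlit_apps/public_dashboards/sales_dashboard
```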

Step 7: Testing and Validation

Test from AWS EC2 Instance:

#!/bin/bash
# Run from EC2 instance in your VPC with access to VPC endpoint

# Test DNS resolution
nslookup myaccount.privatelink.snowflakecomputing.com

# Test connectivity
curl -v https://myaccount.privatelink.snowflakecomputing.com/-/streamlit/streamlit_apps/public_dashboards/sales_dashboard

# Test from outside VPC (should fail)
# This validates that access is truly restricted to PrivateLink

Verify Security Controls:

-- Check network policy is applied
SHOW PARAMETERS LIKE 'NETWORK_POLICY' IN USER streamlit_app_user;

-- Verify query history
SELECT
    query_text,
    execution_status,
    start_time,
    user_name
FROM snowflake.account_usage.query_history
WHERE user_name = 'STREAMLIT_APP_USER'
ORDER BY start_time DESC
LIMIT 10;

-- Check warehouse usage
SELECT
    warehouse_name,
    SUM(credits_used) as total_credits,
    COUNT(*) as query_count
FROM snowflake.account_usage.warehouse_metering_history
WHERE warehouse_name = 'STREAMLIT_WH'
    AND start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name;

Step 8: Monitoring Setup

CloudWatch Alarms:

# Pseudo-code for CloudWatch alarm creation
import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

# Alarm on VPC endpoint traffic
# (interface endpoint metrics are published to the AWS/PrivateLinkEndpoints
# namespace; verify the exact metric names available in your account)
cloudwatch.put_metric_alarm(
    AlarmName='StreamlitVPCEndpointConnections',
    ComparisonOperator='GreaterThanThreshold',
    EvaluationPeriods=2,
    MetricName='ActiveConnections',
    Namespace='AWS/PrivateLinkEndpoints',
    Period=300,
    Statistic='Sum',
    Threshold=10000,
    ActionsEnabled=True,
    AlarmDescription='Alert on high VPC endpoint traffic',
    Dimensions=[
        {
            'Name': 'VPC Endpoint Id',
            'Value': 'vpce-xxxxxxxxx'
        }
    ]
)

# Alarm for data transfer
cloudwatch.put_metric_alarm(
    AlarmName='StreamlitDataTransfer',
    ComparisonOperator='GreaterThanThreshold',
    EvaluationPeriods=1,
    MetricName='BytesProcessed',
    Namespace='AWS/PrivateLinkEndpoints',
    Period=3600,
    Statistic='Sum',
    Threshold=1000000000,  # 1 GB
    ActionsEnabled=True,
    AlarmDescription='Alert on high data transfer',
    Dimensions=[
        {
            'Name': 'VPC Endpoint Id',
            'Value': 'vpce-xxxxxxxxx'
        }
    ]
)

Snowflake Alert:

-- Create alert for suspicious query patterns.
-- Assumes an email notification integration named 'security_email_int' exists.
-- Note: ACCOUNT_USAGE views can lag by 45+ minutes; for near-real-time
-- detection, query INFORMATION_SCHEMA.QUERY_HISTORY instead.
CREATE OR REPLACE ALERT streamlit_apps.public_dashboards.suspicious_queries
    WAREHOUSE = streamlit_wh
    SCHEDULE = '5 MINUTE'
    IF (EXISTS (
        SELECT 1
        FROM snowflake.account_usage.query_history
        WHERE user_name = 'STREAMLIT_APP_USER'
            AND start_time >= DATEADD('minute', -5, CURRENT_TIMESTAMP())
            AND (
                execution_status = 'FAILED'
                OR query_text ILIKE '%DELETE%'
                OR query_text ILIKE '%UPDATE%'
                OR query_text ILIKE '%INSERT%'
                OR query_text ILIKE '%DROP%'
            )
    ))
    THEN CALL SYSTEM$SEND_EMAIL(
        'security_email_int',
        'security_team@company.com',
        'Suspicious Streamlit App Activity',
        'Potential unauthorized activity detected on Streamlit app'
    );

-- Alerts are created suspended; resume to activate
ALTER ALERT streamlit_apps.public_dashboards.suspicious_queries RESUME;

Compensating Controls Summary

Since authentication is disabled, these compensating controls are mandatory:

| Control Layer         | Control Type | Purpose                            | Priority |
|-----------------------|--------------|------------------------------------|----------|
| PrivateLink Only      | Preventive   | Prevent public internet access     | Critical |
| Network Policy        | Preventive   | IP allowlisting at Snowflake level | Critical |
| Security Groups       | Preventive   | VPC-level access control           | High     |
| Read-Only Role        | Preventive   | Prevent data modification          | Critical |
| Data Sanitization     | Preventive   | No PII/sensitive data exposure     | Critical |
| Rate Limiting         | Preventive   | Prevent abuse                      | High     |
| Resource Monitors     | Preventive   | Control costs                      | High     |
| Query Logging         | Detective    | Audit trail                        | High     |
| CloudWatch Monitoring | Detective    | Traffic analysis                   | Medium   |
| Alerts                | Detective    | Anomaly detection                  | Medium   |
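Because these controls are mandatory rather than optional, a deployment pipeline can enforce them. A hypothetical checklist mirroring the table above (the control names and gate are our own convention, not a Snowflake or AWS feature): refuse to go live unless every Critical control is confirmed.

```python
# Control name -> priority, mirroring the compensating-controls table
CONTROLS = {
    "privatelink_only": "Critical",
    "network_policy": "Critical",
    "readonly_role": "Critical",
    "data_sanitization": "Critical",
    "security_groups": "High",
    "rate_limiting": "High",
    "resource_monitors": "High",
    "query_logging": "High",
    "cloudwatch_monitoring": "Medium",
    "alerts": "Medium",
}

def missing_critical(implemented: set[str]) -> list[str]:
    """Return the Critical controls not present in `implemented`."""
    return sorted(name for name, prio in CONTROLS.items()
                  if prio == "Critical" and name not in implemented)

print(missing_critical({"privatelink_only", "network_policy", "readonly_role"}))
# ['data_sanitization']
```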

Security Considerations

Risks and Mitigations

Risk 1: Network-Level Attack

  • Scenario: Attacker compromises AWS account
  • Mitigation:
    • Multi-factor authentication on AWS
    • CloudTrail logging
    • Regular access reviews
    • Separate VPC for Streamlit access
    • Security group restrictions

Risk 2: Data Exfiltration

  • Scenario: Large-scale data extraction via app
  • Mitigation:
    • Rate limiting
    • Query result size limits
    • Resource monitors
    • Only aggregated data in views
    • CloudWatch data transfer monitoring

Risk 3: Resource Abuse

  • Scenario: Excessive queries causing high costs
  • Mitigation:
    • Warehouse auto-suspend
    • Resource monitors with suspend triggers
    • Query timeout settings
    • Small warehouse size

Risk 4: Lateral Movement

  • Scenario: Compromised app used to access other resources
  • Mitigation:
    • Dedicated service account
    • Minimal privilege role
    • Separate database for public data
    • Network segmentation

Query Safety Configuration

-- Set query timeout (5 minutes max)
ALTER USER streamlit_app_user SET STATEMENT_TIMEOUT_IN_SECONDS = 300;

-- Limit how long a statement may wait in a warehouse queue (60 seconds)
ALTER USER streamlit_app_user SET STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 60;

-- Restrict connections to a single statement per request
-- (1 = single statement only; 0 would allow any number)
ALTER USER streamlit_app_user SET MULTI_STATEMENT_COUNT = 1;

Advanced Configurations

Multiple VPCs

To allow access from multiple VPCs:

-- Update network policy with multiple CIDR blocks
ALTER NETWORK POLICY streamlit_privatelink_only
    SET ALLOWED_IP_LIST = (
        '10.0.0.0/8',      -- VPC 1
        '172.16.0.0/12',   -- VPC 2
        '192.168.0.0/16'   -- VPC 3
    );
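Before running an ALTER like the one above, it is worth sanity-checking the CIDR list. A hypothetical pre-flight helper (name and policy are ours): every entry must parse as a CIDR block and, for a PrivateLink-only policy, be a private range.

```python
import ipaddress

def validate_allowed_list(cidrs: list[str]) -> list[str]:
    """Return a list of problems; empty means the allowed list looks sane."""
    problems = []
    for c in cidrs:
        try:
            net = ipaddress.ip_network(c)
        except ValueError:
            problems.append(f"{c}: not a valid CIDR")
            continue
        if not net.is_private:
            # A public range in a PrivateLink-only policy is almost
            # certainly a mistake
            problems.append(f"{c}: not a private range")
    return problems

print(validate_allowed_list(["10.0.0.0/8", "1.2.3.0/24"]))
# ['1.2.3.0/24: not a private range']
```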

Geo-Restricted Access

Geo-restriction only applies if you front the VPC endpoint with an internal load balancer; traffic arriving purely over PrivateLink never passes through CloudFront or AWS WAF. If an ALB is in the path, attach a WAF web ACL with a geo-match rule:

# Sketch: wafv2 web ACL allowing only US traffic (attach to the ALB)
import boto3

waf = boto3.client('wafv2', region_name='us-east-1')

response = waf.create_web_acl(
    Name='AllowUSOnly',
    Scope='REGIONAL',
    DefaultAction={'Block': {}},
    Rules=[{
        'Name': 'AllowUS',
        'Priority': 0,
        'Statement': {'GeoMatchStatement': {'CountryCodes': ['US']}},
        'Action': {'Allow': {}},
        'VisibilityConfig': {
            'SampledRequestsEnabled': True,
            'CloudWatchMetricsEnabled': True,
            'MetricName': 'AllowUS'
        }
    }],
    VisibilityConfig={
        'SampledRequestsEnabled': True,
        'CloudWatchMetricsEnabled': True,
        'MetricName': 'AllowUSOnly'
    }
)

Session Management

Even without authentication, implement basic session tracking:

import streamlit as st
from datetime import datetime
import uuid

def init_session():
    """Initialize session tracking"""
    if 'session_id' not in st.session_state:
        st.session_state.session_id = str(uuid.uuid4())
        st.session_state.session_start = datetime.now()
        st.session_state.request_count = 0
    
    # Increment request counter
    st.session_state.request_count += 1
    
    # Log session activity (pseudo-code)
    log_session_activity(
        session_id=st.session_state.session_id,
        request_count=st.session_state.request_count,
        timestamp=datetime.now()
    )

Operational Considerations

Maintenance Windows

-- Schedule regular maintenance
-- Take the app offline during maintenance by revoking access
-- (the exact mechanism depends on your setup; pulling the usage grant
-- is one straightforward option)
REVOKE USAGE ON STREAMLIT streamlit_apps.public_dashboards.sales_dashboard
    FROM ROLE streamlit_readonly_role;

-- Perform maintenance tasks
-- Update data, refresh views, etc.

-- Restore access
GRANT USAGE ON STREAMLIT streamlit_apps.public_dashboards.sales_dashboard
    TO ROLE streamlit_readonly_role;

Backup and Disaster Recovery

-- Backup Streamlit app code and configuration
CREATE OR REPLACE PROCEDURE backup_streamlit_app()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
    -- Export Streamlit configuration
    -- Copy to backup location
    -- Document current state
    RETURN 'Backup completed';
END;
$$;

Cost Management

-- Monitor daily costs
SELECT
    warehouse_name,
    DATE(start_time) as usage_date,
    SUM(credits_used) as daily_credits,
    SUM(credits_used) * 2.0 as estimated_cost_usd  -- Adjust rate
FROM snowflake.account_usage.warehouse_metering_history
WHERE warehouse_name = 'STREAMLIT_WH'
    AND start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, DATE(start_time)
ORDER BY usage_date DESC;

Troubleshooting Guide

Common Issues

Issue 1: Cannot Access App from VPC

# Diagnostic steps
# 1. Verify VPC endpoint is active
aws ec2 describe-vpc-endpoints --vpc-endpoint-ids vpce-xxxxxxxxx

# 2. Check security group rules
aws ec2 describe-security-groups --group-ids sg-xxxxxxxxx

# 3. Test DNS resolution
nslookup myaccount.privatelink.snowflakecomputing.com

# 4. Test HTTPS connectivity
curl -v https://myaccount.privatelink.snowflakecomputing.com

Issue 2: Network Policy Blocking Access

-- Verify current policy
SHOW NETWORK POLICIES;
DESC NETWORK POLICY streamlit_privatelink_only;

-- Check user's network policy
SHOW PARAMETERS LIKE 'NETWORK_POLICY' IN USER streamlit_app_user;

-- Temporarily disable for testing (DO NOT DO IN PRODUCTION)
-- ALTER USER streamlit_app_user UNSET NETWORK_POLICY;

Issue 3: High Query Costs

-- Identify expensive queries
SELECT
    query_id,
    query_text,
    total_elapsed_time,
    bytes_scanned,
    rows_produced,
    credits_used_cloud_services
FROM snowflake.account_usage.query_history
WHERE user_name = 'STREAMLIT_APP_USER'
    AND start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
ORDER BY credits_used_cloud_services DESC
LIMIT 10;

-- Optimize expensive queries by:
-- 1. Adding filters to views
-- 2. Pre-aggregating data
-- 3. Using result caching

Conclusion

Hosting an unauthenticated Streamlit app in Snowflake via PrivateLink requires careful implementation of defense-in-depth principles. The key success factors are:

  1. Network Isolation: PrivateLink ensures traffic never touches public internet
  2. Layered Security: Multiple compensating controls prevent abuse
  3. Data Sanitization: Only expose aggregated, non-sensitive data
  4. Monitoring: Comprehensive logging and alerting detect anomalies
  5. Resource Controls: Rate limiting and quotas prevent runaway costs

This architecture is suitable for:

  • Internal corporate dashboards
  • Public-facing analytics with non-sensitive data
  • Monitoring and observability displays
  • Status pages and metrics dashboards

Final Security Reminder: Despite these controls, unauthenticated access should only be used when:

  • Data is truly public or aggregated
  • Business requirements explicitly prohibit authentication
  • All compensating controls are implemented and maintained
  • Regular security reviews are conducted

For applications with sensitive data or higher security requirements, always implement proper authentication and authorization, even with PrivateLink.
