Retrieving Data from Okta for Reporting: Python SDK, REST API, and CLI Comparison

READER BEWARE: THE FOLLOWING WRITTEN ENTIRELY BY AI WITHOUT HUMAN EDITING.

Introduction

Okta is a powerful identity and access management (IAM) platform that stores valuable data about users, groups, applications, authentication events, and system configurations. For security teams, compliance officers, and system administrators, extracting this data for reporting, auditing, and analytics is essential. Whether you need to generate compliance reports, monitor authentication patterns, audit application access, or analyze user lifecycle events, Okta provides multiple methods to retrieve this information.

This comprehensive guide explores three primary approaches for retrieving data from Okta:

  1. Okta Python SDK - Official Python library for programmatic access
  2. Okta REST API - Direct HTTP API calls for maximum flexibility
  3. Okta CLI - Command-line interface for quick queries and automation

We’ll cover authentication methods for each approach, compare their strengths and weaknesses, and provide practical examples for common reporting scenarios.

Why Extract Data from Okta?

Common Use Cases

  • Compliance Reporting: Generate reports for SOC 2, ISO 27001, HIPAA, or other compliance frameworks
  • Security Auditing: Track authentication events, failed login attempts, and suspicious activities
  • User Lifecycle Management: Monitor user provisioning, deprovisioning, and status changes
  • Application Access Reviews: Audit who has access to which applications
  • System Configuration Audits: Document Okta policies, rules, and network zones
  • Analytics and Insights: Analyze authentication patterns, adoption rates, and user behavior
  • Incident Response: Investigate security incidents and extract forensic data
  • Access Certification: Periodic review of user access rights

Method 1: Okta Python SDK

The official Okta Python SDK provides a high-level, Pythonic interface for interacting with Okta’s API. It handles authentication, pagination, rate limiting, and provides type-safe models for Okta resources.

Installation

pip install okta

Authentication Setup

The Python SDK supports multiple authentication methods: an API token, shown first, and OAuth 2.0 with a private key, shown afterwards.

First, create an API token in your Okta admin console:

  1. Navigate to Security > API > Tokens
  2. Click Create Token
  3. Give it a descriptive name (e.g., “Reporting Script”)
  4. Copy the token immediately (it won’t be shown again)

Store the token securely:

# config.py - Never commit this file!
OKTA_ORG_URL = "https://your-domain.okta.com"
OKTA_API_TOKEN = "your-api-token-here"

Configure the SDK client:

from okta.client import Client as OktaClient
from config import OKTA_ORG_URL, OKTA_API_TOKEN  # values defined in config.py above

config = {
    'orgUrl': OKTA_ORG_URL,
    'token': OKTA_API_TOKEN
}

client = OktaClient(config)

For production applications, OAuth 2.0 with private key JWT is more secure:

  1. Create an OAuth 2.0 application in Okta
  2. Generate a public/private key pair
  3. Upload the public key to Okta
  4. Configure the SDK with private key authentication

from okta.client import Client as OktaClient

config = {
    'orgUrl': 'https://your-domain.okta.com',
    'authorizationMode': 'PrivateKey',
    'clientId': 'your-client-id',
    'scopes': ['okta.users.read', 'okta.groups.read', 'okta.apps.read', 'okta.logs.read'],
    'privateKey': 'path/to/private-key.pem'
}

client = OktaClient(config)

Basic Data Retrieval Examples

Retrieving All Users

import asyncio
from okta.client import Client as OktaClient

async def get_all_users():
    config = {
        'orgUrl': 'https://your-domain.okta.com',
        'token': 'your-api-token-here'
    }
    
    client = OktaClient(config)
    
    users, resp, err = await client.list_users()
    
    all_users = []
    while True:
        for user in users:
            all_users.append({
                'id': user.id,
                'email': user.profile.email,
                'firstName': user.profile.first_name,
                'lastName': user.profile.last_name,
                'status': user.status,
                'created': user.created,
                'lastLogin': user.last_login
            })
        
        if resp.has_next():
            users, err = await resp.next()
        else:
            break
    
    await client.close()
    return all_users

# Run the async function
users = asyncio.run(get_all_users())
print(f"Retrieved {len(users)} users")
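
The loop above discards the err value that the SDK returns next to each page. As a minimal sketch, assuming the response object exposes has_next() and next() as used above, the paging and error checks can be folded into one helper. The names collect_all_pages and FakeResponse are hypothetical; FakeResponse only stands in for the real response object so the example runs without an Okta org.

```python
import asyncio


async def collect_all_pages(first_page, resp, err):
    """Drain an SDK-style paginated result into one list, raising on any error."""
    if err is not None:
        raise RuntimeError(f"Okta request failed: {err}")
    items = list(first_page or [])
    while resp is not None and resp.has_next():
        page, err = await resp.next()
        if err is not None:
            raise RuntimeError(f"Okta pagination failed: {err}")
        items.extend(page or [])
    return items


class FakeResponse:
    """Hypothetical stand-in for the SDK response object (demo only)."""

    def __init__(self, pages):
        self._pages = list(pages)

    def has_next(self):
        return bool(self._pages)

    async def next(self):
        # Mirrors the (items, error) pair that the loop above unpacks.
        return self._pages.pop(0), None


all_items = asyncio.run(
    collect_all_pages(["u1", "u2"], FakeResponse([["u3"], ["u4", "u5"]]), None)
)
print(all_items)
```

With a real client, the three values returned by a list call such as client.list_users() would be passed straight into the helper.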

Retrieving Users with Filters

async def get_filtered_users():
    config = {
        'orgUrl': 'https://your-domain.okta.com',
        'token': 'your-api-token-here'
    }
    
    client = OktaClient(config)
    
    # Get active users only
    query_params = {'filter': 'status eq "ACTIVE"'}
    users, resp, err = await client.list_users(query_params)
    
    # Get users created in the last 30 days
    # ("created" is queried with the search parameter; the timestamp must be UTC to match the "Z" suffix)
    from datetime import datetime, timedelta, timezone
    thirty_days_ago = (datetime.now(timezone.utc) - timedelta(days=30)).strftime('%Y-%m-%dT%H:%M:%S.000Z')
    query_params = {'search': f'created gt "{thirty_days_ago}"'}
    recent_users, resp, err = await client.list_users(query_params)
    
    # Search users whose email starts with a prefix ("sw" means "starts with", so it cannot match a domain suffix)
    query_params = {'search': 'profile.email sw "admin"'}
    company_users, resp, err = await client.list_users(query_params)
    
    await client.close()
    return users, recent_users, company_users

asyncio.run(get_filtered_users())
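
Date comparisons in these expressions depend on the timestamp being expressed in UTC; formatting a local time with a literal "Z" suffix silently shifts the window by the local offset. The following sketch builds the expression from a timezone-aware datetime. The helper names okta_timestamp and created_since_expression are hypothetical, not part of the SDK.

```python
from datetime import datetime, timedelta, timezone


def okta_timestamp(moment):
    """Format an aware datetime as an ISO 8601 UTC string with millisecond precision."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    return utc.strftime('%Y-%m-%dT%H:%M:%S.') + f'{utc.microsecond // 1000:03d}Z'


def created_since_expression(days, now=None):
    """Build a 'created gt' expression covering the last `days` days."""
    now = now or datetime.now(timezone.utc)
    return f'created gt "{okta_timestamp(now - timedelta(days=days))}"'


# A fixed reference time in UTC+2 shows the conversion to UTC.
fixed_now = datetime(2025, 1, 31, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))
expression = created_since_expression(30, now=fixed_now)
print(expression)  # created gt "2025-01-01T10:00:00.000Z"
```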

Retrieving Groups and Members

async def get_groups_and_members():
    config = {
        'orgUrl': 'https://your-domain.okta.com',
        'token': 'your-api-token-here'
    }
    
    client = OktaClient(config)
    
    # Get all groups
    groups, resp, err = await client.list_groups()
    
    groups_data = []
    for group in groups:
        # Get group members
        members, resp, err = await client.list_group_users(group.id)
        
        member_list = []
        for user in members:  # the SDK returns a plain list, so a regular for loop is required
            member_list.append({
                'email': user.profile.email,
                'name': f"{user.profile.first_name} {user.profile.last_name}"
            })
        
        groups_data.append({
            'id': group.id,
            'name': group.profile.name,
            'description': group.profile.description,
            'memberCount': len(member_list),
            'members': member_list
        })
    
    await client.close()
    return groups_data

groups = asyncio.run(get_groups_and_members())
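
For reporting, a nested structure like groups_data is often easier to work with as one row per membership. The sketch below assumes the dictionary shape built above (name, members, and each member's email and name); group_membership_rows is a hypothetical helper and the sample data is invented for illustration.

```python
def group_membership_rows(groups_data):
    """Flatten nested group data into one row per (group, member) pair."""
    rows = []
    for group in groups_data:
        for member in group.get('members', []):
            rows.append({
                'group': group['name'],
                'email': member['email'],
                'name': member['name'],
            })
    return rows


# Invented sample data in the same shape as groups_data above.
sample_groups = [
    {'name': 'Engineering', 'members': [
        {'email': 'ana@example.com', 'name': 'Ana Diaz'},
        {'email': 'raj@example.com', 'name': 'Raj Patel'},
    ]},
    {'name': 'Empty Group', 'members': []},
]

membership_rows = group_membership_rows(sample_groups)
for row in membership_rows:
    print(row['group'], row['email'])
```

Groups without members produce no rows, so an empty group has to be reported separately if it matters for the audit.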

Retrieving Applications and Assignments

async def get_applications_report():
    config = {
        'orgUrl': 'https://your-domain.okta.com',
        'token': 'your-api-token-here'
    }
    
    client = OktaClient(config)
    
    # Get all applications
    apps, resp, err = await client.list_applications()
    
    apps_data = []
    for app in apps:
        # Get users assigned to this application
        assignments, resp, err = await client.list_application_users(app.id)
        
        user_count = 0
        assigned_users = []
        for assignment in assignments:  # plain list, so a regular for loop is required
            user_count += 1
            user, resp, err = await client.get_user(assignment.id)
            assigned_users.append({
                'email': user.profile.email,
                'assignedDate': assignment.created
            })
        
        apps_data.append({
            'id': app.id,
            'name': app.label,
            'status': app.status,
            'created': app.created,
            'userCount': user_count,
            'assignedUsers': assigned_users
        })
    
    await client.close()
    return apps_data

apps = asyncio.run(get_applications_report())

Retrieving System Logs

async def get_system_logs():
    config = {
        'orgUrl': 'https://your-domain.okta.com',
        'token': 'your-api-token-here'
    }
    
    client = OktaClient(config)
    
    # Get logs from the last 24 hours
    from datetime import datetime, timedelta
    since = (datetime.utcnow() - timedelta(days=1)).strftime('%Y-%m-%dT%H:%M:%S.000Z')
    
    query_params = {
        'since': since,
        'limit': 1000
    }
    
    logs, resp, err = await client.get_logs(query_params)
    
    log_entries = []
    for log in logs:  # plain list, so a regular for loop is required
        log_entries.append({
            'timestamp': log.published,
            'eventType': log.event_type,
            'actor': log.actor.display_name if log.actor else 'System',
            'target': log.target[0].display_name if log.target else 'N/A',
            'outcome': log.outcome.result,
            'clientIp': log.client.ip_address if log.client else 'N/A'
        })
    
    await client.close()
    return log_entries

logs = asyncio.run(get_system_logs())
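
Raw log entries usually need aggregating before they are useful in a report. As a sketch, assuming entries shaped like the dictionaries built in get_system_logs() above, collections.Counter can tally events by type and outcome and count failures per client IP. The function name and the sample entries are hypothetical.

```python
from collections import Counter


def summarize_log_entries(log_entries):
    """Count entries by (eventType, outcome) and failures by client IP."""
    by_event = Counter((e['eventType'], e['outcome']) for e in log_entries)
    failures_by_ip = Counter(
        e['clientIp'] for e in log_entries if e['outcome'] == 'FAILURE'
    )
    return by_event, failures_by_ip


# Invented entries in the same shape as log_entries above.
sample_entries = [
    {'eventType': 'user.session.start', 'outcome': 'SUCCESS', 'clientIp': '198.51.100.4'},
    {'eventType': 'user.session.start', 'outcome': 'FAILURE', 'clientIp': '203.0.113.7'},
    {'eventType': 'user.session.start', 'outcome': 'FAILURE', 'clientIp': '203.0.113.7'},
    {'eventType': 'user.account.lock', 'outcome': 'SUCCESS', 'clientIp': '203.0.113.7'},
]

by_event, failures_by_ip = summarize_log_entries(sample_entries)
print(by_event.most_common())
print(failures_by_ip.most_common(1))
```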

Complete Reporting Example with Python SDK

import asyncio
import csv
from datetime import datetime, timedelta
from okta.client import Client as OktaClient

async def generate_user_access_report():
    """
    Generate comprehensive user access report including:
    - User details
    - Group memberships
    - Application assignments
    - Recent authentication activity
    """
    
    config = {
        'orgUrl': 'https://your-domain.okta.com',
        'token': 'your-api-token-here'
    }
    
    client = OktaClient(config)
    
    print("Fetching users...")
    users, resp, err = await client.list_users()
    
    report_data = []
    
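    # Note: this iterates the first page only; use resp.has_next() and resp.next()
    # as in the earlier example to cover every user
    for user in users: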
        print(f"Processing {user.profile.email}...")
        
        # Get user's groups
        groups, resp, err = await client.list_user_groups(user.id)
        group_names = []
        for group in groups:
            group_names.append(group.profile.name)
        
        # Get user's application assignments
        apps, resp, err = await client.list_assigned_applications_for_user(user.id)
        app_names = []
        for app in apps:
            app_names.append(app.label)
        
        # Get recent login activity, newest first so the first entry is the latest login
        since = (datetime.utcnow() - timedelta(days=30)).strftime('%Y-%m-%dT%H:%M:%S.000Z')
        query_params = {
            'filter': f'actor.id eq "{user.id}" and eventType eq "user.session.start"',
            'since': since,
            'sortOrder': 'DESCENDING',
            'limit': 10  # login_count below is therefore capped at 10
        }
        logs, resp, err = await client.get_logs(query_params)
        
        login_count = 0
        last_login = None
        for log in logs:
            login_count += 1
            if not last_login:
                last_login = log.published
        
        report_data.append({
            'Email': user.profile.email,
            'First Name': user.profile.first_name,
            'Last Name': user.profile.last_name,
            'Status': user.status,
            'Created': user.created,
            'Last Login': last_login or 'Never',
            'Login Count (30d)': login_count,
            'Groups': ', '.join(group_names),
            'Applications': ', '.join(app_names),
            'Group Count': len(group_names),
            'App Count': len(app_names)
        })
    
    await client.close()
    
    # Write to CSV
    output_file = f'user_access_report_{datetime.now().strftime("%Y%m%d_%H%M%S")}.csv'
    with open(output_file, 'w', newline='') as csvfile:
        fieldnames = report_data[0].keys()
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(report_data)
    
    print(f"\nReport generated: {output_file}")
    print(f"Total users: {len(report_data)}")
    
    return report_data

# Run the report
if __name__ == "__main__":
    asyncio.run(generate_user_access_report())
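
The report above awaits each user's groups, applications, and logs one after another, which becomes slow for large orgs. One option, sketched here under the assumption that the per-user work is wrapped in a single coroutine, is to run several users concurrently while capping the number of in-flight calls with asyncio.Semaphore. The names gather_bounded and fake_fetch_groups are hypothetical; the fake worker replaces real SDK calls so the example runs offline.

```python
import asyncio


async def gather_bounded(worker, items, max_concurrency=5):
    """Run worker(item) for every item, with at most max_concurrency running at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item):
        async with semaphore:
            return await worker(item)

    # gather() returns results in the same order as the input items.
    return await asyncio.gather(*(run_one(item) for item in items))


active = 0
peak = 0


async def fake_fetch_groups(user_id):
    """Stand-in for per-user SDK calls; also records peak concurrency."""
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)
    active -= 1
    return {'user': user_id, 'groups': [f'group-for-{user_id}']}


results = asyncio.run(
    gather_bounded(fake_fetch_groups, ['u1', 'u2', 'u3'], max_concurrency=2)
)
print([r['user'] for r in results], "peak concurrency:", peak)
```

A low cap is a deliberate choice here: higher concurrency finishes sooner but consumes the org's rate limit faster.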

Advantages of Python SDK

  • Typed Models: Resource objects (users, groups, applications) expose documented attributes, which reduces key-name mistakes
  • Pagination Helpers: Responses expose has_next() and next(), so you do not parse Link headers yourself
  • Rate Limiting: Built-in rate limit handling
  • Error Handling: Each call returns an error value alongside the result
  • Documentation: Autocompletion and type hints in IDEs
  • Maintainability: High-level abstractions make code cleaner
  • Best Practices: Follows Python conventions and patterns

Disadvantages of Python SDK

  • Learning Curve: Need to learn SDK-specific APIs
  • Async Only: Requires understanding of asyncio
  • Updates Needed: SDK must be updated for new API features
  • Overhead: Additional layer between your code and API
  • Limited Flexibility: Some advanced API features may not be exposed

Method 2: Okta REST API

The Okta REST API provides direct HTTP access to all Okta functionality. This approach offers maximum flexibility and control, making it ideal for custom integrations, edge cases, and when the SDK doesn’t support a specific feature.

Authentication

The REST API supports two primary authentication methods:

API Token Authentication

import requests

OKTA_ORG_URL = "https://your-domain.okta.com"
API_TOKEN = "your-api-token-here"

headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': f'SSWS {API_TOKEN}'
}

response = requests.get(f'{OKTA_ORG_URL}/api/v1/users', headers=headers)
users = response.json()

OAuth 2.0 Bearer Token Authentication

import requests
import jwt
import time

def get_access_token():
    """
    Get OAuth 2.0 access token using private key JWT
    """
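    # Read the key with a context manager so the file handle is closed promptly
    with open('private-key.pem', 'r') as key_file:
        private_key = key_file.read()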
    
    # Create JWT
    payload = {
        'aud': f'https://your-domain.okta.com/oauth2/v1/token',
        'iss': 'your-client-id',
        'sub': 'your-client-id',
        'iat': int(time.time()),
        'exp': int(time.time()) + 3600
    }
    
    client_assertion = jwt.encode(payload, private_key, algorithm='RS256')
    
    # Request access token
    token_url = f'https://your-domain.okta.com/oauth2/v1/token'
    data = {
        'grant_type': 'client_credentials',
        'scope': 'okta.users.read okta.groups.read okta.apps.read okta.logs.read',
        'client_assertion_type': 'urn:ietf:params:oauth:client-assertion-type:jwt-bearer',
        'client_assertion': client_assertion
    }
    
    response = requests.post(token_url, data=data)
    response.raise_for_status()  # surface a rejected assertion instead of a KeyError below
    return response.json()['access_token']

# Use the access token
access_token = get_access_token()
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {access_token}'
}
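
Calling get_access_token() before every request would mint a new token each time, although the token response also carries an expires_in lifetime. The sketch below caches a token until shortly before it expires. It assumes a fetch function that returns both the token and its lifetime in seconds; CachedToken and fake_fetch are hypothetical names, and the clock is injectable so the behaviour can be shown without waiting.

```python
import time


class CachedToken:
    """Reuse an access token until it is within refresh_margin seconds of expiry."""

    def __init__(self, fetch_token, clock=time.monotonic, refresh_margin=60):
        self._fetch_token = fetch_token  # callable returning (token, expires_in_seconds)
        self._clock = clock
        self._refresh_margin = refresh_margin
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._refresh_margin:
            self._token, expires_in = self._fetch_token()
            self._expires_at = now + expires_in
        return self._token


calls = []


def fake_fetch():
    """Stand-in for a real token request (demo only)."""
    calls.append(1)
    return f'token-{len(calls)}', 3600


fake_now = [0.0]
cache = CachedToken(fake_fetch, clock=lambda: fake_now[0])

first = cache.get()    # fetches token-1
fake_now[0] = 1000.0
second = cache.get()   # still valid, reuses token-1
fake_now[0] = 3550.0
third = cache.get()    # inside the 60-second margin, fetches token-2
print(first, second, third)
```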

Basic Data Retrieval Examples

Retrieving Users

import requests

OKTA_ORG_URL = "https://your-domain.okta.com"
API_TOKEN = "your-api-token-here"

headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': f'SSWS {API_TOKEN}'
}

def get_all_users():
    """
    Retrieve all users with pagination
    """
    users = []
    url = f'{OKTA_ORG_URL}/api/v1/users'
    
    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        
        users.extend(response.json())
        
        # Handle pagination via Link header
        links = response.links
        url = links['next']['url'] if 'next' in links else None
    
    return users

def get_filtered_users():
    """
    Retrieve users with filters
    """
    # Active users only
    params = {'filter': 'status eq "ACTIVE"'}
    response = requests.get(f'{OKTA_ORG_URL}/api/v1/users', headers=headers, params=params)
    active_users = response.json()
    
    # Search by email
    params = {'search': 'profile.email eq "user@example.com"'}
    response = requests.get(f'{OKTA_ORG_URL}/api/v1/users', headers=headers, params=params)
    search_results = response.json()
    
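    # Users with specific attribute (custom profile attributes are queried with "search";
    # "filter" accepts only a small fixed set of properties)
    params = {'search': 'profile.department eq "Engineering"'}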
    response = requests.get(f'{OKTA_ORG_URL}/api/v1/users', headers=headers, params=params)
    dept_users = response.json()
    
    return active_users, search_results, dept_users

users = get_all_users()
print(f"Retrieved {len(users)} users")

Retrieving Groups

def get_groups_with_members():
    """
    Retrieve all groups and their members
    """
    groups = []
    url = f'{OKTA_ORG_URL}/api/v1/groups'
    
    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        
        for group in response.json():
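            # Get group members, following pagination so large groups are counted in full
            members = []
            members_url = f'{OKTA_ORG_URL}/api/v1/groups/{group["id"]}/users'
            while members_url:
                members_response = requests.get(members_url, headers=headers)
                members_response.raise_for_status()
                members.extend(members_response.json())
                members_url = members_response.links.get('next', {}).get('url')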
            
            groups.append({
                'id': group['id'],
                'name': group['profile']['name'],
                'description': group['profile'].get('description', ''),
                'memberCount': len(members),
                'members': [{'email': m['profile']['email'], 'name': f"{m['profile']['firstName']} {m['profile']['lastName']}"} for m in members]
            })
        
        links = response.links
        url = links['next']['url'] if 'next' in links else None
    
    return groups

groups = get_groups_with_members()

Retrieving Applications

def get_applications_with_assignments():
    """
    Retrieve all applications and their user assignments
    """
    apps = []
    url = f'{OKTA_ORG_URL}/api/v1/apps'
    
    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        
        for app in response.json():
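            # Get application users, following pagination so userCount is not capped at one page
            app_users = []
            app_users_url = f'{OKTA_ORG_URL}/api/v1/apps/{app["id"]}/users'
            while app_users_url:
                app_users_response = requests.get(app_users_url, headers=headers)
                app_users_response.raise_for_status()
                app_users.extend(app_users_response.json())
                app_users_url = app_users_response.links.get('next', {}).get('url')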
            
            apps.append({
                'id': app['id'],
                'name': app['label'],
                'status': app['status'],
                'created': app['created'],
                'userCount': len(app_users),
                'users': [{'id': u['id'], 'username': u.get('credentials', {}).get('userName', 'N/A')} for u in app_users]
            })
        
        links = response.links
        url = links['next']['url'] if 'next' in links else None
    
    return apps

apps = get_applications_with_assignments()
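
Access reviews are usually read per user rather than per application. As a sketch, assuming the list returned by get_applications_with_assignments() above (each entry with a name and a users list carrying username), the mapping can be inverted with a defaultdict. The function name apps_by_user and the sample data are hypothetical.

```python
from collections import defaultdict


def apps_by_user(apps):
    """Invert app -> users into username -> sorted list of app names."""
    access = defaultdict(list)
    for app in apps:
        for user in app.get('users', []):
            access[user['username']].append(app['name'])
    return {username: sorted(names) for username, names in access.items()}


# Invented sample data in the same shape as the apps list above.
sample_apps = [
    {'name': 'Salesforce', 'users': [
        {'id': '00u1', 'username': 'ana@example.com'},
        {'id': '00u2', 'username': 'raj@example.com'},
    ]},
    {'name': 'Jira', 'users': [{'id': '00u1', 'username': 'ana@example.com'}]},
]

user_access = apps_by_user(sample_apps)
print(user_access)
```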

Retrieving System Logs

from datetime import datetime, timedelta
from urllib.parse import quote

def get_system_logs(hours=24, event_type=None):
    """
    Retrieve system logs with filters
    """
    since = (datetime.utcnow() - timedelta(hours=hours)).strftime('%Y-%m-%dT%H:%M:%S.000Z')
    
    params = {
        'since': since,
        'limit': 1000,
        'sortOrder': 'DESCENDING'
    }
    
    if event_type:
        params['filter'] = f'eventType eq "{event_type}"'
    
    logs = []
    url = f'{OKTA_ORG_URL}/api/v1/logs'
    
    while url:
        # Query parameters apply only to the first request; each "next" link already encodes them
        response = requests.get(url, headers=headers, params=params)
        response.raise_for_status()
        
        page = response.json()
        if not page:
            break  # Stop on an empty page so a polling-style "next" link cannot loop forever
        logs.extend(page)
        
        links = response.links
        url = links['next']['url'] if 'next' in links else None
        params = None  # Only use params on first request
    
    return logs

# Get all authentication events
auth_logs = get_system_logs(hours=24, event_type='user.session.start')

# Get failed login attempts by filtering the events already fetched above
failed_logins = [log for log in auth_logs if log['outcome']['result'] == 'FAILURE']

Complete Reporting Example with REST API

import requests
import csv
import json
from datetime import datetime, timedelta
from collections import defaultdict

OKTA_ORG_URL = "https://your-domain.okta.com"
API_TOKEN = "your-api-token-here"

headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': f'SSWS {API_TOKEN}'
}

def generate_security_audit_report():
    """
    Generate comprehensive security audit report
    """
    print("Generating security audit report...")
    
    # 1. Get failed login attempts
    print("Analyzing authentication failures...")
    since = (datetime.utcnow() - timedelta(days=7)).strftime('%Y-%m-%dT%H:%M:%S.000Z')
    params = {
        'since': since,
        'filter': 'eventType eq "user.session.start"',
        'limit': 1000
    }
    
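    logs = []
    logs_url = f'{OKTA_ORG_URL}/api/v1/logs'
    while logs_url:  # Follow "next" links so the analysis covers more than the first 1000 events
        response = requests.get(logs_url, headers=headers, params=params)
        response.raise_for_status()
        page = response.json()
        if not page:
            break  # An empty page means the time window is exhausted
        logs.extend(page)
        logs_url = response.links.get('next', {}).get('url')
        params = None  # The "next" link already carries the query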
    
    failed_logins = defaultdict(int)
    suspicious_ips = defaultdict(int)
    
    for log in logs:
        if log['outcome']['result'] == 'FAILURE':
            actor = log.get('actor', {}).get('displayName', 'Unknown')
            failed_logins[actor] += 1
            
            ip = log.get('client', {}).get('ipAddress', 'Unknown')
            suspicious_ips[ip] += 1
    
    # 2. Get users with excessive privileges
    print("Analyzing user privileges...")
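    users = []
    users_url = f'{OKTA_ORG_URL}/api/v1/users'
    while users_url:  # Follow Link headers so the totals cover every user, not just the first page
        response = requests.get(users_url, headers=headers)
        response.raise_for_status()
        users.extend(response.json())
        users_url = response.links.get('next', {}).get('url')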
    
    privileged_users = []
    for user in users:
        # Get user's groups
        groups_response = requests.get(f'{OKTA_ORG_URL}/api/v1/users/{user["id"]}/groups', headers=headers)
        groups = groups_response.json()
        
        # Check for admin groups
        admin_groups = [g for g in groups if 'admin' in g['profile']['name'].lower()]
        if admin_groups:
            privileged_users.append({
                'email': user['profile']['email'],
                'groups': [g['profile']['name'] for g in admin_groups]
            })
    
    # 3. Get inactive users
    print("Identifying inactive users...")
    thirty_days_ago = (datetime.utcnow() - timedelta(days=30)).strftime('%Y-%m-%dT%H:%M:%S.000Z')
    
    inactive_users = []
    for user in users:
        last_login = user.get('lastLogin')
        if not last_login or last_login < thirty_days_ago:
            inactive_users.append({
                'email': user['profile']['email'],
                'status': user['status'],
                'lastLogin': last_login or 'Never',
                'created': user['created']
            })
    
    # 4. Generate report
    report = {
        'timestamp': datetime.utcnow().isoformat(),
        'summary': {
            'totalUsers': len(users),
            'failedLoginAttempts': sum(failed_logins.values()),
            'usersWithFailedLogins': len(failed_logins),
            'suspiciousIPs': len([ip for ip, count in suspicious_ips.items() if count > 10]),
            'privilegedUsers': len(privileged_users),
            'inactiveUsers': len(inactive_users)
        },
        'failedLogins': dict(sorted(failed_logins.items(), key=lambda x: x[1], reverse=True)[:20]),
        'suspiciousIPs': dict(sorted(suspicious_ips.items(), key=lambda x: x[1], reverse=True)[:10]),
        'privilegedUsers': privileged_users,
        'inactiveUsers': inactive_users[:50]
    }
    
    # Save to JSON
    output_file = f'security_audit_{datetime.now().strftime("%Y%m%d_%H%M%S")}.json'
    with open(output_file, 'w') as f:
        json.dump(report, f, indent=2)
    
    print(f"\nSecurity audit report generated: {output_file}")
    print(f"Summary:")
    print(f"  - Total users: {report['summary']['totalUsers']}")
    print(f"  - Failed login attempts: {report['summary']['failedLoginAttempts']}")
    print(f"  - Suspicious IPs: {report['summary']['suspiciousIPs']}")
    print(f"  - Privileged users: {report['summary']['privilegedUsers']}")
    print(f"  - Inactive users: {report['summary']['inactiveUsers']}")
    
    return report

if __name__ == "__main__":
    generate_security_audit_report()
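
The inactive-user check above compares timestamps as strings, which is only reliable while both sides share exactly the same format. A sketch of the same check using parsed, timezone-aware datetimes follows. It assumes lastLogin values shaped like "2025-01-15T08:30:00.000Z"; parse_okta_time and find_inactive are hypothetical helpers, and the sample users are invented.

```python
from datetime import datetime, timedelta, timezone


def parse_okta_time(value):
    """Parse a timestamp such as '2025-01-15T08:30:00.000Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace('Z', '+00:00'))


def find_inactive(users, days=30, now=None):
    """Return emails of users who never logged in or last logged in before the cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    inactive = []
    for user in users:
        last_login = user.get('lastLogin')
        if last_login is None or parse_okta_time(last_login) < cutoff:
            inactive.append(user['profile']['email'])
    return inactive


# Invented sample users in the shape returned by /api/v1/users.
sample_users = [
    {'profile': {'email': 'a@example.com'}, 'lastLogin': '2025-02-20T09:00:00.000Z'},
    {'profile': {'email': 'b@example.com'}, 'lastLogin': '2024-12-01T09:00:00.000Z'},
    {'profile': {'email': 'c@example.com'}, 'lastLogin': None},
]

reference = datetime(2025, 3, 1, tzinfo=timezone.utc)
inactive = find_inactive(sample_users, days=30, now=reference)
print(inactive)  # ['b@example.com', 'c@example.com']
```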

Advantages of REST API

  • Maximum Flexibility: Access all API features directly
  • No Dependencies: Only requires HTTP client (requests)
  • Language Agnostic: Easy to translate to other languages
  • Fine-Grained Control: Complete control over requests
  • Immediate Updates: Access new API features immediately
  • Debugging: Easy to test with curl or Postman
  • Lightweight: No SDK overhead

Disadvantages of REST API

  • More Code: Need to handle pagination, rate limiting manually
  • Error Prone: No type safety, easy to make mistakes
  • Boilerplate: More repetitive code for common operations
  • Maintenance: Breaking changes require code updates
  • Documentation: Need to reference API docs constantly
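
Manual rate limiting with the REST API typically means reacting to HTTP 429 responses. Okta reports the time at which the limit resets in the X-Rate-Limit-Reset header (epoch seconds), so one approach is to sleep until then and retry. The following is a sketch rather than a complete client: get_with_rate_limit, seconds_until_reset, and the one-second padding are assumptions for illustration, and send is any callable that performs the request (for example a small wrapper around requests.get).

```python
import time


def seconds_until_reset(headers, now=None):
    """Seconds to wait until the rate limit window resets, padded by one second."""
    now = time.time() if now is None else now
    reset = headers.get('X-Rate-Limit-Reset')
    if reset is None:
        return 1.0  # no header present: fall back to a short pause
    return max(0.0, float(reset) - now) + 1.0


def get_with_rate_limit(send, url, max_attempts=3, sleep=time.sleep):
    """Call send(url), waiting and retrying while the response status is 429."""
    for attempt in range(max_attempts):
        response = send(url)
        if response.status_code != 429 or attempt == max_attempts - 1:
            return response
        sleep(seconds_until_reset(response.headers))


class FakeResponse:
    """Stand-in for an HTTP response object (demo only)."""

    def __init__(self, status_code, headers=None):
        self.status_code = status_code
        self.headers = headers or {}


# First call is rate limited with a reset time in the past, second succeeds.
responses = [FakeResponse(429, {'X-Rate-Limit-Reset': '0'}), FakeResponse(200)]
slept = []
result = get_with_rate_limit(
    lambda url: responses.pop(0),
    'https://your-domain.okta.com/api/v1/users',
    sleep=slept.append,
)
print(result.status_code, slept)
```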

Method 3: Okta CLI

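The Okta CLI is a command-line tool that provides quick access to Okta from the shell for scripting and automation. While not as feature-rich as the SDK or API, it’s suited to rapid queries and one-off reports. Subcommand coverage differs between CLI releases (the official Okta CLI centers on developer setup tasks such as okta login and okta apps), so confirm with okta help that the resource commands shown in this section exist in your installed version before scripting against them.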

Installation

macOS:

brew install okta/tap/okta-cli

Linux:

curl -L https://cli.okta.com/install.sh | bash

Windows:

# Using Chocolatey
choco install okta-cli

# Or using Scoop
scoop bucket add okta https://github.com/okta/scoop-okta-cli
scoop install okta-cli

Authentication Setup

Initialize the CLI with your Okta credentials:

# Interactive setup
okta login

# You'll be prompted for:
# - Okta domain (e.g., your-domain.okta.com)
# - Choose authentication method (browser or API token)

The CLI stores configuration in ~/.okta/okta.yaml.

Using API Token

# Set environment variable
export OKTA_CLIENT_TOKEN="your-api-token-here"
export OKTA_CLIENT_ORGURL="https://your-domain.okta.com"

# Or configure in profile
okta login --token your-api-token-here --url https://your-domain.okta.com

Using OAuth 2.0

# Configure OAuth app
okta apps create

# Login with OAuth
okta login --org https://your-domain.okta.com

Basic Data Retrieval Examples

Listing Users

# List all users
okta users list

# List users with filters
okta users list --filter 'status eq "ACTIVE"'

# Search users
okta users list --search 'profile.email sw "example.com"'

# Get specific user
okta users get user@example.com

# Export users to JSON
okta users list --format json > users.json

# List users in CSV format
okta users list --format csv > users.csv

Listing Groups

# List all groups
okta groups list

# Get group details
okta groups get "Engineering Team"

# List group members
okta groups list-users "Engineering Team"

# Export group membership
okta groups list-users "Engineering Team" --format json > group-members.json

Listing Applications

# List all applications
okta apps list

# Get application details
okta apps get "Salesforce"

# List users assigned to an application
okta apps list-users "Salesforce"

# Export application assignments
okta apps list-users "Salesforce" --format json > app-assignments.json

Retrieving Logs

# Get recent logs (last 24 hours by default)
okta logs get

# Get logs with date range
okta logs get --since 2025-11-01T00:00:00Z --until 2025-11-19T23:59:59Z

# Filter by event type
okta logs get --filter 'eventType eq "user.session.start"'

# Export logs to JSON
okta logs get --since 2025-11-01T00:00:00Z --format json > logs.json

# Get failed login attempts
okta logs get --filter 'eventType eq "user.session.start" and outcome.result eq "FAILURE"'

Shell Scripting Examples

User Access Report

#!/bin/bash
# user-access-report.sh - Generate user access report

OUTPUT_DIR="./reports/$(date +%Y%m%d)"
mkdir -p "$OUTPUT_DIR"

echo "Generating user access report..."

# Export all users
echo "Exporting users..."
okta users list --format json > "$OUTPUT_DIR/users.json"

# Count active vs inactive users
ACTIVE=$(okta users list --filter 'status eq "ACTIVE"' --format json | jq '. | length')
INACTIVE=$(okta users list --filter 'status ne "ACTIVE"' --format json | jq '. | length')

# Export groups
echo "Exporting groups..."
okta groups list --format json > "$OUTPUT_DIR/groups.json"

# Generate summary report
cat > "$OUTPUT_DIR/summary.txt" << EOF
User Access Report
Generated: $(date)
==================================

User Statistics:
- Active Users: $ACTIVE
- Inactive Users: $INACTIVE
- Total Users: $((ACTIVE + INACTIVE))

Groups: $(okta groups list --format json | jq '. | length')
Applications: $(okta apps list --format json | jq '. | length')
EOF

echo "Report generated in $OUTPUT_DIR"
cat "$OUTPUT_DIR/summary.txt"

Security Audit Script

#!/bin/bash
# security-audit.sh - Daily security audit

DATE=$(date +%Y%m%d)
OUTPUT_FILE="security_audit_$DATE.txt"

{
    echo "Security Audit Report"
    echo "Date: $(date)"
    echo "========================================"
    echo ""
    
    echo "Failed Login Attempts (Last 24 hours):"
    okta logs get --filter 'eventType eq "user.session.start" and outcome.result eq "FAILURE"' \
        --format json | jq -r '.[] | "\(.published) - \(.actor.displayName) - \(.client.ipAddress)"'
    echo ""
    
    echo "Users with Admin Access:"
    okta groups list-users "Administrators" --format json | jq -r '.[].profile.email'
    echo ""
    
    echo "Recently Created Users (Last 7 days):"
    # GNU date syntax; on macOS/BSD use: date -u -v-7d +%Y-%m-%dT%H:%M:%SZ
    SEVEN_DAYS_AGO=$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)
    okta users list --filter "created gt \"$SEVEN_DAYS_AGO\"" --format json | \
        jq -r '.[] | "\(.profile.email) - Created: \(.created)"'
    echo ""
    
    echo "Suspended Users:"
    okta users list --filter 'status eq "SUSPENDED"' --format json | jq -r '.[].profile.email'
    
} > "$OUTPUT_FILE"

echo "Security audit complete: $OUTPUT_FILE"

# Email the report
if [ -s "$OUTPUT_FILE" ]; then
    mail -s "Daily Security Audit - $DATE" security-team@company.com < "$OUTPUT_FILE"
fi

Application Access Review

#!/bin/bash
# app-access-review.sh - Generate application access review

APP_NAME="${1:-Salesforce}"
OUTPUT="app_access_review_$(echo $APP_NAME | tr ' ' '_')_$(date +%Y%m%d).csv"

echo "Application Access Review: $APP_NAME"
echo "Generated: $(date)"
echo ""

# Get application ID
APP_ID=$(okta apps list --format json | jq -r ".[] | select(.label == \"$APP_NAME\") | .id")

if [ -z "$APP_ID" ]; then
    echo "Application not found: $APP_NAME"
    exit 1
fi

# Get assigned users
echo "Email,First Name,Last Name,Status,Assigned Date" > "$OUTPUT"
okta apps list-users "$APP_NAME" --format json | \
    jq -r '.[] | "\(.profile.email),\(.profile.firstName),\(.profile.lastName),\(.status),\(.created)"' \
    >> "$OUTPUT"

echo "Access review exported to: $OUTPUT"
echo "Total users with access: $(tail -n +2 "$OUTPUT" | wc -l)"
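
Building CSV rows with jq string interpolation, as above, does not quote fields, so a name containing a comma shifts every later column. If the export feeds a spreadsheet or another tool, one alternative is to convert the JSON with Python's csv module, which quotes such fields. This sketch assumes the JSON is a list of objects with profile, status, and created keys, matching the fields used in the script; app_users_to_csv is a hypothetical helper and the sample record is invented.

```python
import csv
import io
import json


def app_users_to_csv(app_users_json):
    """Convert a JSON array of assigned users into CSV text with proper quoting."""
    users = json.loads(app_users_json)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['Email', 'First Name', 'Last Name', 'Status', 'Assigned Date'])
    for user in users:
        profile = user.get('profile') or {}
        writer.writerow([
            profile.get('email', ''),
            profile.get('firstName', ''),
            profile.get('lastName', ''),
            user.get('status', ''),
            user.get('created', ''),
        ])
    return buffer.getvalue()


# Invented sample record whose last name contains a comma.
sample_json = json.dumps([{
    'profile': {'email': 'pat@example.com', 'firstName': 'Pat', 'lastName': "O'Neil, Jr."},
    'status': 'ACTIVE',
    'created': '2025-01-02T03:04:05.000Z',
}])

csv_text = app_users_to_csv(sample_json)
print(csv_text)
```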

Advantages of Okta CLI

  • Quick & Easy: Fast for one-off queries
  • Shell Integration: Works seamlessly with bash/zsh scripts
  • No Coding Required: Simple command-line interface
  • Multiple Formats: JSON, CSV, table output
  • Interactive: Browser-based authentication option
  • Scriptable: Easy to automate with cron

Disadvantages of Okta CLI

  • Limited Features: Not all API endpoints available
  • Less Control: Can’t customize requests as much
  • Performance: Slower for large-scale operations
  • Error Handling: Limited error handling options
  • Complex Logic: Difficult for complex data processing
  • Dependencies: Requires CLI installation and updates

Comparison Matrix

Feature          | Python SDK      | REST API            | Okta CLI
-----------------|-----------------|---------------------|---------------
Learning Curve   | Medium          | Low                 | Low
Setup Complexity | Medium          | Low                 | Low
Type Safety      | ✅ High         | ❌ None             | ❌ None
Pagination       | ✅ Automatic    | ⚠️ Manual           | ✅ Automatic
Rate Limiting    | ✅ Built-in     | ⚠️ Manual           | ✅ Built-in
Error Handling   | ✅ Excellent    | ⚠️ Basic            | ⚠️ Basic
Performance      | ⚡ Fast         | ⚡ Fast             | ⚠️ Moderate
Flexibility      | ⚠️ Good         | ✅ Excellent        | ❌ Limited
API Coverage     | ⚠️ Good         | ✅ Complete         | ❌ Partial
Scripting        | ✅ Excellent    | ✅ Excellent        | ✅ Good
Complex Logic    | ✅ Excellent    | ✅ Excellent        | ❌ Limited
Debugging        | ✅ Good         | ✅ Excellent        | ⚠️ Basic
Dependencies     | Python + SDK    | Python + requests   | CLI binary
Update Frequency | SDK releases    | Always current      | CLI releases
Best For         | Production apps | Custom integrations | Quick queries

When to Use Each Method

Use Python SDK When:

  • Building production applications
  • Need type safety and IDE support
  • Want automatic pagination and rate limiting
  • Working in Python ecosystem already
  • Building long-term maintainable code
  • Need comprehensive error handling

Example scenarios:

  • Automated user provisioning system
  • Compliance reporting dashboard
  • Identity governance application
  • User lifecycle automation

Use REST API When:

  • Need access to newest API features immediately
  • Building in non-Python language
  • Require maximum flexibility and control
  • SDK doesn’t support specific endpoint
  • Building custom integration
  • Need to minimize dependencies

Example scenarios:

  • Custom webhook handlers
  • Microservices integration
  • Multi-language environments
  • Edge cases not covered by SDK

Use Okta CLI When:

  • Performing ad-hoc queries
  • Writing quick shell scripts
  • Automating simple tasks
  • Learning Okta API
  • Troubleshooting issues
  • One-off data exports

Example scenarios:

  • Daily email reports
  • Manual audits
  • Quick data exports
  • Cron job automation

Best Practices

Security

  1. Never hardcode credentials

    # ❌ Bad
    API_TOKEN = "00abc123def456..."
    
    # ✅ Good
    import os
    API_TOKEN = os.environ.get('OKTA_API_TOKEN')
    
  2. Use OAuth 2.0 for production

    • More secure than API tokens
    • Supports scoped access
    • Better audit trail

  3. Rotate credentials regularly

    • Revoke API tokens that are no longer needed; Okta expires a token after 30 days without use
    • Rotate OAuth keys quarterly
    • Monitor for compromised credentials

  4. Implement least privilege

    • Request only necessary scopes
    • Create API tokens from a read-only admin account when possible, since a token inherits its creator's permissions
    • Create separate tokens per application
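
The credential practices above can be enforced with a small startup check. The following is a minimal sketch, not an Okta-provided helper: `load_okta_config` is a hypothetical function name, and it assumes the `OKTA_ORG_URL` and `OKTA_API_TOKEN` environment variables used in this guide's other examples. It fails fast when a credential is missing and rejects non-HTTPS org URLs, so a misconfigured job stops before sending any request.

```python
import os


def load_okta_config(env=None):
    """Build an SDK-style config dict from environment variables.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to exercise in tests.
    """
    env = os.environ if env is None else env

    # Collect every missing variable so the error names all of them at once
    missing = [name for name in ("OKTA_ORG_URL", "OKTA_API_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    org_url = env["OKTA_ORG_URL"].rstrip("/")
    if not org_url.startswith("https://"):
        raise ValueError("OKTA_ORG_URL must use https://")

    return {"orgUrl": org_url, "token": env["OKTA_API_TOKEN"]}
```

The returned dictionary has the same `orgUrl` and `token` keys that the SDK examples in this guide pass to the client constructor.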

Performance

  1. Implement rate limiting

    from ratelimit import limits, sleep_and_retry
    
    # Client-side throttle. 100 calls per minute is an example budget;
    # actual Okta limits vary by endpoint and org.
    @sleep_and_retry
    @limits(calls=100, period=60)
    def call_okta_api():
        # Your API call
        pass
    
  2. Use pagination efficiently

    # Process in batches
    BATCH_SIZE = 200
    params = {'limit': BATCH_SIZE}
    
  3. Cache when appropriate

    from functools import lru_cache
    
    @lru_cache(maxsize=100)
    def get_user_groups(user_id):
        # Cached for repeated calls
        pass
    
  4. Parallelize independent requests

    import concurrent.futures
    
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(get_user, user_id) for user_id in user_ids]
        results = [f.result() for f in concurrent.futures.as_completed(futures)]
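
One detail worth knowing about `as_completed` is that it yields results in completion order, which may differ from the order of `user_ids`. When each result has to stay paired with its ID, `executor.map` is an alternative because it preserves input order. The sketch below illustrates that variation; `fetch_all` and `fetch_user` are hypothetical names, and `fetch_user` is a placeholder for a real API call.

```python
import concurrent.futures


def fetch_all(fetch_one, ids, max_workers=5):
    """Call fetch_one(id) for every id in parallel and return {id: result}.

    executor.map yields results in input order, so zip() pairs each id
    with its own result even when requests finish out of order.
    """
    ids = list(ids)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        return dict(zip(ids, executor.map(fetch_one, ids)))


def fetch_user(user_id):
    # Hypothetical placeholder: a real version would call the Okta API here.
    return {"id": user_id}


users_by_id = fetch_all(fetch_user, ["00u1", "00u2", "00u3"])
```

Keeping `max_workers` small is a deliberate choice here: more threads finish sooner but also consume the rate limit budget faster.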
    

Error Handling

  1. Implement retry logic

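    import requests
    from tenacity import retry, stop_after_attempt, wait_exponential
    
    # Retry up to 3 times, waiting between 2 and 10 seconds with exponential backoff
    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10)
    )
    def fetch_users():
        # url and headers must be defined by the caller
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        return response.json()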
    
  2. Log errors comprehensively

    import logging
    
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)
    
    try:
        users = fetch_users()
    except Exception as e:
        logger.error(f"Failed to fetch users: {e}", exc_info=True)
    
  3. Handle rate limits gracefully

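    import time
    
    if response.status_code == 429:
        # X-Rate-Limit-Reset is a Unix timestamp (seconds), not a duration
        reset_at = int(response.headers.get('X-Rate-Limit-Reset', time.time() + 60))
        time.sleep(max(0, reset_at - time.time()))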
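
The retry and rate limit handling described in this list can be combined into one wrapper. The sketch below is one possible arrangement, not an SDK feature: `seconds_until_reset` and `call_with_rate_limit_retry` are hypothetical names. It relies on `X-Rate-Limit-Reset` carrying a Unix timestamp in seconds, so the wait is the difference between that timestamp and the current time. The `send` argument is assumed to be any zero-argument callable that returns an object with `status_code` and `headers`, such as a `requests.Response`.

```python
import time


def seconds_until_reset(headers, now=None, default=60.0):
    """Return how long to wait before retrying after an HTTP 429."""
    now = time.time() if now is None else now
    try:
        reset_at = float(headers["X-Rate-Limit-Reset"])
    except (KeyError, TypeError, ValueError):
        return default  # header missing or malformed: fall back to a fixed wait
    return max(0.0, reset_at - now)


def call_with_rate_limit_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() and repeat while it returns 429, up to max_attempts calls.

    `sleep` is injectable so the wrapper can be tested without waiting.
    """
    response = send()
    for _ in range(max_attempts - 1):
        if response.status_code != 429:
            break
        # The extra second guards against small clock differences
        sleep(seconds_until_reset(response.headers) + 1)
        response = send()
    return response
```

A call might look like `call_with_rate_limit_retry(lambda: requests.get(url, headers=headers, timeout=30))`, with `url` and `headers` defined by the caller.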
    

Data Management

  1. Export to multiple formats

    import pandas as pd
    
    df = pd.DataFrame(users)
    df.to_csv('users.csv', index=False)
    df.to_excel('users.xlsx', index=False)  # requires the openpyxl package
    df.to_json('users.json', orient='records')
    
  2. Validate data quality

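    # Check for required fields. assert statements are skipped under python -O,
    # so collect the failures and raise explicitly instead.
    missing_email = [user['id'] for user in users if not user.get('profile', {}).get('email')]
    if missing_email:
        raise ValueError(f"Missing email for users: {missing_email}")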
    
  3. Archive reports

    import gzip
    import shutil
    
    # Compress old reports
    with open('report.json', 'rb') as f_in:
        with gzip.open('report.json.gz', 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
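
Raw user records from the REST API nest fields such as the email address under `profile`, so a direct tabular export tends to leave one column holding whole dictionaries. The sketch below shows one way to flatten records before export, using only the standard library. `flatten_user` and `users_to_csv` are hypothetical helper names, and the sample records are illustrative rather than real API output.

```python
import csv
import io


def flatten_user(user):
    """Turn a nested user dict into a flat row with 'profile.email'-style keys."""
    row = {}
    for key, value in user.items():
        if isinstance(value, dict):
            # Flatten one level of nesting, e.g. profile -> profile.email
            for sub_key, sub_value in value.items():
                row[f"{key}.{sub_key}"] = sub_value
        else:
            row[key] = value
    return row


def users_to_csv(users):
    """Render users as CSV text, using the union of all flattened keys as columns."""
    rows = [flatten_user(user) for user in users]
    fieldnames = sorted({name for row in rows for name in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative records only; real API responses carry many more fields.
sample = [
    {"id": "00u1", "status": "ACTIVE", "profile": {"email": "a@example.com"}},
    {"id": "00u2", "status": "SUSPENDED", "profile": {}},
]
csv_text = users_to_csv(sample)
```

Taking the union of keys means a record with a missing field produces an empty cell rather than a misaligned row, which also makes gaps easy to spot during data quality checks.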
    

Automation Examples

Scheduled Daily Report

# daily_report.py
import asyncio
import os
from datetime import datetime, timezone
from okta.client import Client as OktaClient
import pandas as pd

async def generate_daily_report():
    config = {
        'orgUrl': os.environ['OKTA_ORG_URL'],
        'token': os.environ['OKTA_API_TOKEN']
    }
    
    client = OktaClient(config)
    
    # Get active users, then keep those created today
    users, resp, err = await client.list_users({'filter': 'status eq "ACTIVE"'})
    if err:
        raise RuntimeError(f"Failed to list users: {err}")
    
    # Okta timestamps are in UTC, so compare against today's UTC date
    today = datetime.now(timezone.utc).date()
    new_users = []
    
    # list_users returns one page as a plain list; follow resp.next() for the rest
    while users:
        for user in users:
            created_date = datetime.fromisoformat(user.created.replace('Z', '+00:00')).date()
            if created_date == today:
                new_users.append({
                    'Email': user.profile.email,
                    'Name': f"{user.profile.first_name} {user.profile.last_name}",
                    'Created': user.created
                })
        if not resp.has_next():
            break
        users, err = await resp.next()
    
    await client.close()
    
    # Generate report
    if new_users:
        df = pd.DataFrame(new_users)
        filename = f'new_users_{today}.csv'
        df.to_csv(filename, index=False)
        print(f"Report generated: {filename}")
        
        # Send email (using your email service)
        # send_email(filename)
    else:
        print("No new users today")

if __name__ == "__main__":
    asyncio.run(generate_daily_report())

Crontab entry:

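# Run daily at 6 AM
# cron does not load your shell profile, so define OKTA_ORG_URL and
# OKTA_API_TOKEN in the crontab or in a wrapper script
0 6 * * * cd /path/to/scripts && python daily_report.py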

Compliance Report Generator

# compliance_report.py
import asyncio
import os
from datetime import datetime, timedelta
from okta.client import Client as OktaClient
import json

async def generate_compliance_report():
    """
    Generate SOC 2 compliance report covering:
    - User access reviews
    - Administrative access
    - Authentication logs
    - Configuration changes
    """
    
    config = {
        'orgUrl': os.environ['OKTA_ORG_URL'],
        'token': os.environ['OKTA_API_TOKEN']
    }
    
    client = OktaClient(config)
    
    report = {
        'generated': datetime.utcnow().isoformat(),
        'period': '30 days',
        'sections': {}
    }
    
    # 1. User Access Review
    print("Generating user access review...")
    users, resp, err = await client.list_users()
    
    user_summary = {
        'total': 0,
        'active': 0,
        'suspended': 0,
        'deprovisioned': 0
    }
    
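    # list_users returns one page as a plain list; follow resp.next() for the rest
    while users:
        for user in users:
            user_summary['total'] += 1
            user_summary[user.status.lower()] = user_summary.get(user.status.lower(), 0) + 1
        if not resp.has_next():
            break
        users, err = await resp.next()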
    
    report['sections']['user_access'] = user_summary
    
    # 2. Administrative Access
    print("Auditing administrative access...")
    groups, resp, err = await client.list_groups()
    
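    admin_users = []
    while groups:
        for group in groups:
            # Name matching is a heuristic: it only finds groups with "admin" in the name
            if 'admin' not in group.profile.name.lower():
                continue
            # Use a separate response variable so group pagination is not overwritten
            members, member_resp, err = await client.list_group_users(group.id)
            while members:
                for member in members:
                    admin_users.append({
                        'email': member.profile.email,
                        'group': group.profile.name
                    })
                if not member_resp.has_next():
                    break
                members, err = await member_resp.next()
        if not resp.has_next():
            break
        groups, err = await resp.next()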
    
    report['sections']['administrative_access'] = {
        'count': len(admin_users),
        'users': admin_users
    }
    
    # 3. Authentication Events
    print("Analyzing authentication events...")
    since = (datetime.utcnow() - timedelta(days=30)).strftime('%Y-%m-%dT%H:%M:%S.000Z')
    query_params = {
        'filter': 'eventType eq "user.session.start"',
        'since': since,
        'limit': 1000
    }
    
    logs, resp, err = await client.get_logs(query_params)
    
    auth_summary = {
        'total_attempts': 0,
        'successful': 0,
        'failed': 0
    }
    
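    # Stop on an empty page: the System Log can keep returning a next link
    while logs:
        for log in logs:
            auth_summary['total_attempts'] += 1
            if log.outcome.result == 'SUCCESS':
                auth_summary['successful'] += 1
            else:
                auth_summary['failed'] += 1
        if not resp.has_next():
            break
        logs, err = await resp.next()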
    
    report['sections']['authentication'] = auth_summary
    
    await client.close()
    
    # Save report
    filename = f'compliance_report_{datetime.now().strftime("%Y%m%d")}.json'
    with open(filename, 'w') as f:
        json.dump(report, f, indent=2)
    
    print(f"\nCompliance report generated: {filename}")
    return report

if __name__ == "__main__":
    asyncio.run(generate_compliance_report())

Troubleshooting

Common Issues

Authentication Errors

# Check token validity
import os
import requests

API_TOKEN = os.environ['OKTA_API_TOKEN']

response = requests.get(
    'https://your-domain.okta.com/api/v1/users/me',
    headers={'Authorization': f'SSWS {API_TOKEN}'},
    timeout=30
)

if response.status_code == 401:
    print("Token is invalid or expired")
elif response.status_code == 403:
    print("Token lacks required permissions")
elif response.ok:
    print("Token is valid")
else:
    print(f"Unexpected response: {response.status_code}")

Rate Limiting

# Check rate limit headers
print(f"Rate limit: {response.headers.get('X-Rate-Limit-Limit')}")
print(f"Remaining: {response.headers.get('X-Rate-Limit-Remaining')}")
print(f"Reset: {response.headers.get('X-Rate-Limit-Reset')}")

Pagination Issues

# Ensure you're following links correctly
if 'next' in response.links:
    next_url = response.links['next']['url']
    # Continue pagination
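
The link-following check above can be wrapped in a generator so callers simply iterate over pages. This is a minimal sketch rather than an official client: `iter_pages` is a hypothetical name, and `get` is assumed to be any callable that takes a URL and returns an object exposing `.json()` and `.links` the way `requests.Response` does. It stops on an empty page as well as on a missing `next` link, because some endpoints (the System Log when polled, for example) may keep returning a `next` link after the data runs out.

```python
def iter_pages(get, first_url):
    """Yield each page of results, following rel="next" links until none remain."""
    url = first_url
    seen = set()
    while url and url not in seen:  # `seen` guards against a link that repeats
        seen.add(url)
        response = get(url)
        page = response.json()
        if not page:
            break  # an empty page means there is nothing further to read
        yield page
        # requests parses the Link header into response.links
        url = response.links.get("next", {}).get("url")
```

With `requests`, `get` could be `lambda u: requests.get(u, headers=headers, timeout=30)`, where `headers` holds the authorization header. Checking the status code inside that callable before returning keeps error pages from being treated as data.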

Conclusion

Extracting data from Okta for reporting and analytics supports security monitoring, compliance audits, and day-to-day operations. Each method (Python SDK, REST API, and CLI) has its strengths:

  • Python SDK: Best for production applications requiring robust error handling and type safety
  • REST API: Ideal for custom integrations and maximum flexibility
  • Okta CLI: Perfect for quick queries and simple automation

Choose the method that best fits your use case, technical requirements, and team expertise. For many organizations, a combination of all three provides the best balance of capabilities:

  • Use the CLI for ad-hoc queries and troubleshooting
  • Use the Python SDK for production reporting systems
  • Use the REST API for custom integrations and edge cases

Key Takeaways

  1. Security First: Always use secure authentication methods (OAuth 2.0 in production)
  2. Handle Errors: Implement retry logic and comprehensive error handling
  3. Respect Limits: Implement rate limiting and pagination
  4. Automate: Schedule regular reports for consistency
  5. Document: Keep your scripts well-documented and maintainable
  6. Monitor: Track your reporting scripts and alert on failures

Next Steps

  • Set up authentication for your chosen method
  • Start with simple queries to understand the data
  • Build your first reporting script
  • Automate your reports
  • Share insights with your team

Resources