High-performance time-series data warehouse built on DuckDB, Parquet, and MinIO.
⚠️ Alpha Release - Technical Preview
Arc Core is currently in active development and evolving rapidly. While the system is stable and functional, it is not recommended for production workloads at this time. We are continuously improving performance, adding features, and refining the API. Use in development and testing environments only.
- High-Performance Ingestion: MessagePack binary protocol (recommended), InfluxDB Line Protocol (drop-in replacement), JSON
- DuckDB Query Engine: Fast analytical queries with SQL
- Distributed Storage with MinIO: S3-compatible object storage for unlimited scale and cost-effective data management (recommended). Also supports local disk, AWS S3, and GCS
- Data Import: Import data from InfluxDB, TimescaleDB, HTTP endpoints
- Query Caching: Configurable result caching for improved performance
- Production Ready: Docker deployment with health checks and monitoring
Arc achieves 1.89M records/sec with MessagePack binary protocol!
| Metric | Value | Notes |
|---|---|---|
| Throughput | 1.89M records/sec | MessagePack binary protocol |
| p50 Latency | 21ms | Median response time |
| p95 Latency | 204ms | 95th percentile |
| Success Rate | 99.9998% | Production-grade reliability |
| vs Line Protocol | 7.9x faster | 240K → 1.89M RPS |
Tested on Apple M3 Max (14 cores), native deployment with MinIO
🎯 Optimal Configuration:
- Workers: 3x CPU cores (e.g., 14 cores = 42 workers)
- Deployment: Native mode (~3.3x faster than Docker)
- Storage: MinIO native (not containerized)
- Protocol: MessagePack binary (`/write/v2/msgpack`)
Native deployment delivers 1.89M RPS vs 570K RPS in Docker (~3.3x faster).
# One-command start (auto-installs MinIO, auto-detects CPU cores)
./start.sh native
# Alternative: Manual setup
python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Start MinIO natively (auto-configured by start.sh)
brew install minio/stable/minio minio/stable/mc # macOS
# OR download from https://min.io/download for Linux
# Start Arc (auto-detects optimal worker count: 3x CPU cores)
./start.sh native
The Arc API will be available at http://localhost:8000, and the MinIO Console at http://localhost:9001 (credentials: minioadmin/minioadmin).
# Start Arc Core with MinIO
docker-compose up -d
# Check status
docker-compose ps
# View logs
docker-compose logs -f arc-api
# Stop
docker-compose down
Note: Docker mode achieves ~570K RPS. For maximum performance (1.89M RPS), use native deployment.
Deploy Arc Core to a remote server:
# Docker deployment
./deploy.sh -h your-server.com -u ubuntu -m docker
# Native deployment
./deploy.sh -h your-server.com -u ubuntu -m native
Arc Core uses a centralized `arc.conf` configuration file (TOML format). This provides:
- Clean, organized configuration structure
- Environment variable overrides for Docker/production
- Production-ready defaults
- Comments and documentation inline
Edit the `arc.conf` file for all settings:
# Server Configuration
[server]
host = "0.0.0.0"
port = 8000
workers = 8 # Adjust based on load: 4=light, 8=medium, 16=high
# Authentication
[auth]
enabled = true
default_token = "" # Leave empty to auto-generate
# Query Cache
[query_cache]
enabled = true
ttl_seconds = 60
# Storage Backend (MinIO recommended)
[storage]
backend = "minio"
[storage.minio]
endpoint = "http://minio:9000"
access_key = "minioadmin"
secret_key = "minioadmin123"
bucket = "arc"
use_ssl = false
# For AWS S3
# [storage]
# backend = "s3"
# [storage.s3]
# bucket = "arc-data"
# region = "us-east-1"
# For Google Cloud Storage
# [storage]
# backend = "gcs"
# [storage.gcs]
# bucket = "arc-data"
# project_id = "my-project"
Configuration Priority (highest to lowest):
1. Environment variables (e.g., `ARC_WORKERS=16`)
2. `arc.conf` file
3. Built-in defaults
You can override any setting via environment variables:
# Server
ARC_HOST=0.0.0.0
ARC_PORT=8000
ARC_WORKERS=8
# Storage
STORAGE_BACKEND=minio
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin123
MINIO_BUCKET=arc
# Cache
QUERY_CACHE_ENABLED=true
QUERY_CACHE_TTL=60
# Logging
LOG_LEVEL=INFO
Legacy Support: `.env` files are still supported for backward compatibility, but `arc.conf` is recommended.
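To make the precedence order concrete, here is a minimal sketch of how an override resolution could look, using Python 3.11's built-in `tomllib`. This is illustrative only and is not Arc's internal configuration loader; the default values shown are examples.

```python
import os
import tomllib  # Python 3.11+

# Built-in defaults (lowest priority) -- illustrative values only
defaults = {"host": "0.0.0.0", "port": 8000, "workers": 8}

# arc.conf overrides the defaults
with open("arc.conf", "rb") as f:
    conf = tomllib.load(f).get("server", {})

# Environment variables override arc.conf (highest priority)
env = {
    "host": os.getenv("ARC_HOST"),
    "port": os.getenv("ARC_PORT"),
    "workers": os.getenv("ARC_WORKERS"),
}

settings = {**defaults, **conf, **{k: v for k, v in env.items() if v is not None}}
print(settings)
```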
After starting Arc Core, create an admin token for API access:
# Docker deployment
docker exec -it arc-api python3 -c "
from api.auth import AuthManager
auth = AuthManager(db_path='/data/historian.db')
token = auth.create_token('my-admin', description='Admin token')
print(f'Admin Token: {token}')
"
# Native deployment
cd /path/to/arc-core
source venv/bin/activate
python3 -c "
from api.auth import AuthManager
auth = AuthManager()
token = auth.create_token('my-admin', description='Admin token')
print(f'Admin Token: {token}')
"
Save this token - you'll need it for all API requests.
All API endpoints, except the public health and documentation endpoints, require authentication via a Bearer token:
# Set your token
export ARC_TOKEN="your-token-here"
curl http://localhost:8000/health
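The same pattern works from Python with the `requests` library. A minimal sketch using the `/auth/verify` endpoint listed in the API reference below (the exact response body is not documented here):

```python
import os
import requests

token = os.environ["ARC_TOKEN"]  # the admin token created above
headers = {"Authorization": f"Bearer {token}"}

# Public health check (no auth required)
print(requests.get("http://localhost:8000/health").json())

# Authenticated request: verify the token is valid
resp = requests.get("http://localhost:8000/auth/verify", headers=headers)
resp.raise_for_status()
print(resp.json())
```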
MessagePack binary protocol offers 3x faster ingestion with zero-copy PyArrow processing:
import msgpack
import requests
from datetime import datetime
token = "your-token-here"  # admin token created in the authentication step

# Prepare data in MessagePack format
data = {
"database": "metrics",
"table": "cpu_usage",
"records": [
{
"timestamp": int(datetime.now().timestamp() * 1e9), # nanoseconds
"host": "server01",
"cpu": 0.64,
"memory": 0.82
},
{
"timestamp": int(datetime.now().timestamp() * 1e9),
"host": "server02",
"cpu": 0.45,
"memory": 0.71
}
]
}
# Send via MessagePack
response = requests.post(
"http://localhost:8000/write/v2/msgpack",
headers={
"Authorization": f"Bearer {token}",
"Content-Type": "application/msgpack"
},
data=msgpack.packb(data)
)
print(response.json())
Batch ingestion (for high throughput):
# Send 10,000 records at once
records = [
{
"timestamp": int(datetime.now().timestamp() * 1e9),
"sensor_id": f"sensor_{i}",
"temperature": 20 + (i % 10),
"humidity": 60 + (i % 20)
}
for i in range(10000)
]
data = {
"database": "iot",
"table": "sensors",
"records": records
}
response = requests.post(
"http://localhost:8000/write/v2/msgpack",
headers={
"Authorization": f"Bearer {token}",
"Content-Type": "application/msgpack"
},
data=msgpack.packb(data)
)
For use as a drop-in replacement for InfluxDB, compatible with Telegraf and InfluxDB clients:
# InfluxDB 1.x compatible endpoint
curl -X POST "http://localhost:8000/write/line?db=mydb" \
-H "Authorization: Bearer $ARC_TOKEN" \
-H "Content-Type: text/plain" \
--data-binary "cpu,host=server01 value=0.64 1633024800000000000"
# Multiple measurements
curl -X POST "http://localhost:8000/write/line?db=metrics" \
-H "Authorization: Bearer $ARC_TOKEN" \
-H "Content-Type: text/plain" \
--data-binary "cpu,host=server01,region=us-west value=0.64 1633024800000000000
memory,host=server01,region=us-west used=8.2,total=16.0 1633024800000000000
disk,host=server01,region=us-west used=120.5,total=500.0 1633024800000000000"
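The same Line Protocol endpoint can be called from Python as well. A minimal sketch using the `requests` library; the measurement and field values here are arbitrary examples:

```python
import time
import requests

token = "your-token-here"  # Arc token from the authentication step

# Build InfluxDB line protocol: measurement,tags fields timestamp(ns)
now_ns = time.time_ns()
lines = "\n".join([
    f"cpu,host=server01,region=us-west value=0.64 {now_ns}",
    f"memory,host=server01,region=us-west used=8.2,total=16.0 {now_ns}",
])

resp = requests.post(
    "http://localhost:8000/write/line",
    params={"db": "metrics"},
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "text/plain",
    },
    data=lines,
)
resp.raise_for_status()
```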
Telegraf configuration (drop-in InfluxDB replacement):
[[outputs.influxdb]]
urls = ["http://localhost:8000"]
database = "telegraf"
skip_database_creation = true
# Authentication
username = "" # Leave empty
password = "$ARC_TOKEN" # Use your Arc token as password
# Or use HTTP headers
[outputs.influxdb.headers]
Authorization = "Bearer $ARC_TOKEN"
curl -X POST http://localhost:8000/query \
-H "Authorization: Bearer $ARC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"database": "mydb",
"query": "SELECT * FROM cpu_usage WHERE host = '\''server01'\'' ORDER BY timestamp DESC LIMIT 100"
}'
Advanced queries with DuckDB SQL:
# Aggregations
curl -X POST http://localhost:8000/query \
-H "Authorization: Bearer $ARC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"database": "metrics",
"query": "SELECT host, AVG(cpu) as avg_cpu, MAX(memory) as max_memory FROM cpu_usage WHERE timestamp > now() - INTERVAL 1 HOUR GROUP BY host"
}'
# Time-series analysis
curl -X POST http://localhost:8000/query \
-H "Authorization: Bearer $ARC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"database": "iot",
"query": "SELECT time_bucket(INTERVAL '\''5 minutes'\'', timestamp) as bucket, AVG(temperature) as avg_temp FROM sensors GROUP BY bucket ORDER BY bucket"
}'
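The same queries can be issued from Python. A sketch that assumes the `/query` endpoint returns JSON; the exact response shape is not documented here, so adjust the parsing to what your deployment returns:

```python
import requests

token = "your-token-here"  # Arc token

resp = requests.post(
    "http://localhost:8000/query",
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    },
    json={
        "database": "metrics",
        "query": (
            "SELECT host, AVG(cpu) AS avg_cpu "
            "FROM cpu_usage "
            "WHERE timestamp > now() - INTERVAL 1 HOUR "
            "GROUP BY host"
        ),
    },
)
resp.raise_for_status()
print(resp.json())  # field names depend on the API version
```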
┌─────────────────────────────────────────────────────────────┐
│ Client Applications │
│ (Telegraf, Python, Go, JavaScript, curl, etc.) │
└──────────────────┬──────────────────────────────────────────┘
│
│ HTTP/HTTPS
▼
┌─────────────────────────────────────────────────────────────┐
│ Arc API Layer (FastAPI) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Line Protocol│ │ MessagePack │ │ Query Engine │ │
│ │ Endpoint │ │ Binary API │ │ (DuckDB) │ │
│ └──────────────┘ └──────────────┘ └──────────────────┘ │
└──────────────────┬──────────────────────────────────────────┘
│
│ Write Pipeline
▼
┌─────────────────────────────────────────────────────────────┐
│ Buffering & Processing Layer │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ ParquetBuffer (Line Protocol) │ │
│ │ - Batches records by measurement │ │
│ │ - Polars DataFrame → Parquet │ │
│ │ - Snappy compression │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ ArrowParquetBuffer (MessagePack Binary) │ │
│ │ - Zero-copy PyArrow RecordBatch │ │
│ │ - Direct Parquet writes (3x faster) │ │
│ │ - Columnar from start │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────────┬──────────────────────────────────────────┘
│
│ Parquet Files
▼
┌─────────────────────────────────────────────────────────────┐
│ Storage Backend (Pluggable) │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ MinIO (Recommended - S3-compatible) │ │
│ │ ✓ Unlimited scale ✓ Distributed │ │
│ │ ✓ Cost-effective ✓ Self-hosted │ │
│ │ ✓ High availability ✓ Erasure coding │ │
│ │ ✓ Multi-tenant ✓ Object versioning │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ Alternative backends: Local Disk, AWS S3, Google Cloud │
└─────────────────────────────────────────────────────────────┘
│
│ Query Path (Direct Parquet reads)
▼
┌─────────────────────────────────────────────────────────────┐
│ Query Engine (DuckDB) │
│ - Direct Parquet reads from object storage │
│ - Columnar execution engine │
│ - Query cache for common queries │
│ - Full SQL interface (Postgres-compatible) │
└─────────────────────────────────────────────────────────────┘
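To illustrate the query path (this is not Arc's internal code), here is how DuckDB itself can read Parquet directly from S3-compatible storage such as MinIO. The bucket and object layout shown are hypothetical; substitute the credentials and paths from your own setup:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_endpoint='localhost:9000';")
con.execute("SET s3_access_key_id='minioadmin';")
con.execute("SET s3_secret_access_key='minioadmin';")
con.execute("SET s3_use_ssl=false;")
con.execute("SET s3_url_style='path';")

# Hypothetical object layout: s3://arc/<database>/<table>/*.parquet
rows = con.execute(
    "SELECT host, avg(cpu) AS avg_cpu "
    "FROM read_parquet('s3://arc/metrics/cpu_usage/*.parquet') "
    "GROUP BY host"
).fetchall()
print(rows)
```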
Arc Core is designed with MinIO as the primary storage backend for several key reasons:
- Unlimited Scale: Store petabytes of time-series data without hitting storage limits
- Cost-Effective: Commodity hardware or cloud storage at a fraction of traditional database costs
- Distributed Architecture: Built-in replication and erasure coding for data durability
- S3 Compatibility: Works with any S3-compatible storage (AWS S3, GCS, Wasabi, etc.)
- Performance: Direct Parquet reads from object storage with DuckDB's efficient execution
- Separation of Compute & Storage: Scale storage and compute independently
- Self-Hosted Option: Run on your own infrastructure without cloud vendor lock-in
The MinIO + Parquet + DuckDB combination provides the perfect balance of cost, performance, and scalability for analytical time-series workloads.
Arc Core has been benchmarked using ClickBench, the industry-standard analytical database benchmark, with a 100M-row dataset (14GB) and 43 analytical queries.
Hardware: AWS c6a.4xlarge (16 vCPU AMD EPYC 7R13, 32GB RAM, 500GB gp2)
- Cold Run Total: 35.18s (sum of 43 queries, first execution)
- Hot Run Average: 0.81s (average per query after caching)
- Aggregate Performance: ~2.8M rows/sec cold, ~123M rows/sec hot (across all queries)
- Storage: MinIO (S3-compatible)
- Success Rate: 43/43 queries (100%)
Hardware: Apple M3 Max (14 cores ARM, 36GB RAM)
- Cold Run Total: 23.86s (sum of 43 queries, first execution)
- Hot Run Average: 0.52s (average per query after caching)
- Aggregate Performance: ~4.2M rows/sec cold, ~192M rows/sec hot (across all queries)
- Storage: Local NVMe SSD
- Success Rate: 43/43 queries (100%)
- Columnar Storage: Parquet format with Snappy compression
- Query Engine: DuckDB with default settings (ClickBench compliant)
- Result Caching: 60s TTL for repeated queries (production mode)
- End-to-End: All timings include HTTP/JSON API overhead
Fastest queries:
| Query | Time (avg) | Description |
|---|---|---|
| Q1 | 0.021s | Simple aggregation |
| Q8 | 0.034s | String parsing |
| Q27 | 0.086s | Complex grouping |
| Q41 | 0.048s | URL parsing |
| Q42 | 0.044s | Multi-column filter |
Slowest queries:
| Query | Time (avg) | Description |
|---|---|---|
| Q29 | 7.97s | Heavy string operations |
| Q19 | 1.69s | Multiple joins |
| Q33 | 1.86s | Complex aggregations |
Benchmark Configuration:
- Dataset: 100M rows, 14GB Parquet (ClickBench hits.parquet)
- Protocol: HTTP REST API with JSON responses
- Caching: Disabled for benchmark compliance
- Tuning: None (default DuckDB settings)
See full results and methodology at ClickBench Results (Arc submission pending).
The `docker-compose.yml` includes:
- arc-api: Main API server (port 8000)
- minio: S3-compatible storage (port 9000, console 9001)
- minio-init: Initializes MinIO buckets on startup
# Run with auto-reload
uvicorn api.main:app --reload --host 0.0.0.0 --port 8000
# Run tests (if available in parent repo)
pytest tests/
Health check endpoint:
curl http://localhost:8000/health
Logs:
# Docker
docker-compose logs -f arc-api
# Native (systemd)
sudo journalctl -u arc-api -f
- `GET /` - API information
- `GET /health` - Service health check
- `GET /ready` - Readiness probe
- `GET /docs` - Swagger UI documentation
- `GET /redoc` - ReDoc documentation
- `GET /openapi.json` - OpenAPI specification
Note: All other endpoints require Bearer token authentication.
MessagePack Binary Protocol (Recommended - 3x faster):
- `POST /write/v2/msgpack` - Write data via MessagePack
- `POST /api/v2/msgpack` - Alternative endpoint
- `GET /write/v2/msgpack/stats` - Get ingestion statistics
- `GET /write/v2/msgpack/spec` - Get protocol specification
Line Protocol (InfluxDB compatibility):
- `POST /write` - InfluxDB 1.x compatible write
- `POST /api/v1/write` - InfluxDB 1.x API format
- `POST /api/v2/write` - InfluxDB 2.x API format
- `POST /api/v1/query` - InfluxDB 1.x query format
- `GET /write/health` - Write endpoint health check
- `GET /write/stats` - Write statistics
- `POST /write/flush` - Force flush write buffer
- `POST /query` - Execute DuckDB SQL query
- `POST /query/estimate` - Estimate query cost
- `POST /query/stream` - Stream large query results
- `GET /query/{measurement}` - Get measurement data
- `GET /query/{measurement}/csv` - Export measurement as CSV
- `GET /measurements` - List all measurements/tables
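For example, pulling a measurement as CSV with the `GET /query/{measurement}/csv` endpoint above. A sketch; any query parameters for time ranges are not documented here:

```python
import requests

token = "your-token-here"  # Arc token

resp = requests.get(
    "http://localhost:8000/query/cpu_usage/csv",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

# Save the exported CSV to disk
with open("cpu_usage.csv", "wb") as f:
    f.write(resp.content)
```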
- `GET /auth/verify` - Verify token validity
- `GET /auth/tokens` - List all tokens
- `POST /auth/tokens` - Create new token
- `GET /auth/tokens/{id}` - Get token details
- `PATCH /auth/tokens/{id}` - Update token
- `DELETE /auth/tokens/{id}` - Delete token
- `POST /auth/tokens/{id}/rotate` - Rotate token (generate new)
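A sketch of token management against these endpoints. The request body fields for token creation are assumptions based on the CLI example earlier, so check `/docs` for the exact schema:

```python
import requests

admin_token = "your-admin-token"  # bootstrap token created earlier
headers = {"Authorization": f"Bearer {admin_token}"}
base = "http://localhost:8000"

# Create a token for a new client (field names are assumed)
resp = requests.post(
    f"{base}/auth/tokens",
    headers=headers,
    json={"name": "telegraf-writer", "description": "Telegraf ingestion"},
)
resp.raise_for_status()
print(resp.json())

# List existing tokens
print(requests.get(f"{base}/auth/tokens", headers=headers).json())
```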
- `GET /health` - Service health check
- `GET /ready` - Readiness probe
- `GET /metrics` - Prometheus metrics
- `GET /metrics/timeseries/{type}` - Time-series metrics
- `GET /metrics/endpoints` - Endpoint statistics
- `GET /metrics/query-pool` - Query pool status
- `GET /metrics/memory` - Memory profile
- `GET /logs` - Application logs
InfluxDB Connections:
- `GET /connections/influx` - List InfluxDB connections
- `POST /connections/influx` - Create InfluxDB connection
- `PUT /connections/influx/{id}` - Update connection
- `DELETE /connections/{type}/{id}` - Delete connection
- `POST /connections/{type}/{id}/activate` - Activate connection
- `POST /connections/{type}/test` - Test connection
Storage Connections:
- `GET /connections/storage` - List storage backends
- `POST /connections/storage` - Create storage connection
- `PUT /connections/storage/{id}` - Update storage connection
- `GET /jobs` - List all export jobs
- `POST /jobs` - Create new export job
- `PUT /jobs/{id}` - Update job configuration
- `DELETE /jobs/{id}` - Delete job
- `GET /jobs/{id}/executions` - Get job execution history
- `POST /jobs/{id}/run` - Run job immediately
- `POST /jobs/{id}/cancel` - Cancel running job
- `GET /monitoring/jobs` - Monitor job status
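A sketch of driving export jobs from Python with the endpoints above. Job configuration fields are not documented here, so this only shows the run/monitor flow, and the response shapes (including the `id` field) are assumptions:

```python
import requests

token = "your-token-here"  # Arc token
headers = {"Authorization": f"Bearer {token}"}
base = "http://localhost:8000"

# List configured export jobs, trigger the first one, then check its history
jobs = requests.get(f"{base}/jobs", headers=headers).json()
if jobs:
    job_id = jobs[0]["id"]  # assumes the list returns objects with an "id" field
    requests.post(f"{base}/jobs/{job_id}/run", headers=headers).raise_for_status()
    print(requests.get(f"{base}/jobs/{job_id}/executions", headers=headers).json())
```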
- `POST /api/http-json/connections` - Create HTTP/JSON connection
- `GET /api/http-json/connections` - List connections
- `GET /api/http-json/connections/{id}` - Get connection details
- `PUT /api/http-json/connections/{id}` - Update connection
- `DELETE /api/http-json/connections/{id}` - Delete connection
- `POST /api/http-json/connections/{id}/test` - Test connection
- `POST /api/http-json/connections/{id}/discover-schema` - Discover schema
- `POST /api/http-json/export` - Export data via HTTP
- `GET /cache/stats` - Cache statistics
- `GET /cache/health` - Cache health status
- `POST /cache/clear` - Clear query cache
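A quick sketch for inspecting and clearing the query cache via these endpoints; the response fields are not documented here:

```python
import requests

token = "your-token-here"  # Arc token
headers = {"Authorization": f"Bearer {token}"}
base = "http://localhost:8000"

# Inspect cache statistics, then clear the cache
print(requests.get(f"{base}/cache/stats", headers=headers).json())
requests.post(f"{base}/cache/clear", headers=headers).raise_for_status()
```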
Arc Core includes auto-generated API documentation:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- OpenAPI JSON: http://localhost:8000/openapi.json
Arc Core is under active development. Current focus areas:
- Performance Optimization: Further improvements to ingestion and query performance
- API Stability: Finalizing core API contracts
- Enhanced Monitoring: Additional metrics and observability features
- Documentation: Expanded guides and tutorials
- Production Hardening: Testing and validation for production use cases
We welcome feedback and feature requests as we work toward a stable 1.0 release.
Arc Core is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
This means:
- ✅ Free to use - Use Arc Core for any purpose
- ✅ Free to modify - Modify the source code as needed
- ✅ Free to distribute - Share your modifications with others
⚠️ Share modifications - If you modify Arc and run it as a service, you must share your changes under AGPL-3.0
AGPL-3.0 ensures that improvements to Arc benefit the entire community, even when run as a cloud service. This prevents the "SaaS loophole" where companies could take the code, improve it, and keep changes proprietary.
For organizations that require:
- Proprietary modifications without disclosure
- Commercial support and SLAs
- Enterprise features and managed services
Please contact us at: enterprise[at]basekick[dot]net
We offer dual licensing and commercial support options.
- Community Support: GitHub Issues
- Enterprise Support: enterprise[at]basekick[dot]net
- General Inquiries: support[at]basekick[dot]net
Arc Core is provided "as-is" in alpha state. While we use it extensively for development and testing, it is not yet production-ready. Features and APIs may change without notice. Always back up your data and test thoroughly in non-production environments before considering any production deployment.