AutoML Guide

Published on December 25, 2025

SynapCores AutoML Guide

Build powerful machine learning models directly in SQL without writing any Python code.

Overview

SynapCores AutoML lets you define machine learning experiments in SQL. You can train, tune, and deploy models using familiar database commands.

Task Types

| Task Type | Description | Default Metric |
|---|---|---|
| regression | Continuous value prediction | R-squared |
| binary_classification | Two-class classification | AUC |
| classification/multiclass | Multi-class classification | Accuracy |
| clustering | Unsupervised grouping | Silhouette Score |
| anomaly | Anomaly detection | F1 Score |
| time_series | Time series forecasting | MAPE |

Creating AutoML Experiments

Basic Syntax

Option 1: AS Syntax

CREATE EXPERIMENT <experiment_name> AS
<SELECT_query>
WITH (<options>)

Option 2: USING Syntax

CREATE EXPERIMENT <experiment_name>
USING (<SELECT_query>)
TARGET <target_column>
OPTIONS (<options>)
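
As an illustration of Option 2, the churn experiment from the Complete Examples section could be written in the USING form roughly as follows. This is a sketch rather than verified syntax: it assumes that TARGET takes a bare column name and that OPTIONS accepts the same key=value pairs as WITH. The experiment name churn_prediction_v2 is hypothetical.

```sql
-- Sketch only: assumes TARGET takes an unquoted column name and
-- OPTIONS accepts the same key=value pairs as WITH.
-- churn_prediction_v2 is a hypothetical experiment name.
CREATE EXPERIMENT churn_prediction_v2
USING (
  SELECT age, tenure, monthly_charges, total_charges, churned
  FROM customers
)
TARGET churned
OPTIONS (
  task_type='binary_classification',
  max_trials=50
);
```

Because the target is named in the TARGET clause, target_column is left out of the options list here.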

Configuration Options

General Options

| Option | Type | Default | Description |
|---|---|---|---|
| task_type | string | 'binary_classification' | Type of ML task |
| target_column | string | Required | Column to predict |
| max_trials | integer | 100 | Maximum training trials |
| time_budget_minutes | integer | 60 | Maximum time budget |
| validation_split | float | 0.2 | Validation data proportion |
| cv_folds | integer | 5 | Cross-validation folds |
| optimization_metric | string | Task-dependent | Metric to optimize |
| ensemble | boolean | true | Create ensemble models |
| early_stopping_patience | integer | 10 | Trials without improvement |
| random_seed | integer | 42 | Random seed for reproducibility |
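
To show how a few of these general options combine, here is a minimal sketch that overrides four defaults for a quick regression run. The experiment name house_price_quick is hypothetical, the values are illustrative, and the trailing comments are annotations that can be removed if inline comments are not accepted inside the options list.

```sql
-- Sketch only: house_price_quick is a hypothetical experiment name.
-- housing_data and price come from the House Price Regression example.
CREATE EXPERIMENT house_price_quick AS
SELECT * FROM housing_data
WITH (
  task_type='regression',
  target_column='price',
  time_budget_minutes=30,      -- default is 60
  validation_split=0.25,       -- default is 0.2
  early_stopping_patience=5,   -- default is 10
  ensemble=false               -- default is true
);
```

Options that are not listed, such as max_trials and random_seed, keep their default values.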

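Available Algorithms

Pass one or more of these names in the algorithms option, for example algorithms=['random_forest', 'xgboost']: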

  • 'linear_regression' - Linear Regression
  • 'logistic_regression' - Logistic Regression
  • 'decision_tree' - Decision Tree
  • 'random_forest' - Random Forest
  • 'gradient_boosting' - Gradient Boosting
  • 'xgboost' - XGBoost
  • 'neural_network' - Neural Network
  • 'knn' - K-Nearest Neighbors
  • 'naive_bayes' - Naive Bayes
  • 'svm' - Support Vector Machine

Algorithm Selection Strategies

  • 'all' - Try all available algorithms
  • 'fast' - Only fast algorithms (linear models, decision trees, naive bayes, knn)
  • 'accurate' - Only highly accurate algorithms (random forest, gradient boosting, xgboost, neural networks)
  • 'interpretable' - Only interpretable algorithms (linear regression, logistic regression, decision trees)
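
When you want the same set as a strategy but under explicit control, you can list the algorithms by name. A minimal sketch, assuming that an explicit list behaves the same as the 'interpretable' strategy described above:

```sql
-- Explicit list of the algorithms named under the 'interpretable' strategy.
-- Assumption: this is equivalent to selecting that strategy.
WITH (
  task_type='binary_classification',
  algorithms=['linear_regression', 'logistic_regression', 'decision_tree']
)
```

For a classification task, linear_regression would presumably be filtered out as incompatible, as noted under Best Practices.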

Algorithm-Specific Options

Random Forest

| Hyperparameter | Type | Default | Description |
|---|---|---|---|
| n_estimators | integer | 100 | Number of trees |
| max_depth | integer | None | Maximum tree depth |
| min_samples_split | integer | 2 | Minimum samples to split |
| max_features | string/float | 'sqrt' | Features to consider |

Example:
WITH (
  task_type='classification',
  algorithms=['random_forest'],
  n_estimators=200,
  max_depth=10
)

Neural Network

| Hyperparameter | Type | Default | Description |
|---|---|---|---|
| hidden_layers | array | [100] | Hidden layer sizes |
| learning_rate | float | 0.001 | Initial learning rate |
| batch_size | integer | 32 | Mini-batch size |
| n_epochs | integer | 100 | Maximum epochs |
| activation | string | 'relu' | Activation function |
| dropout_rate | float | 0.0 | Dropout rate |

Example:
WITH (
  task_type='classification',
  algorithms=['neural_network'],
  hidden_layers=[128, 64, 32],
  dropout_rate=0.2
)

Gradient Boosting / XGBoost

| Hyperparameter | Type | Default | Description |
|---|---|---|---|
| n_estimators | integer | 100 | Number of boosting stages |
| learning_rate | float | 0.1 | Learning rate |
| max_depth | integer | 3 | Maximum tree depth |
| subsample | float | 1.0 | Fraction of samples |
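
As with Random Forest and Neural Network, these hyperparameters go in the options list. The fragment below is a sketch with illustrative values, not tuned recommendations; pairing a lower learning_rate with more boosting stages is a common starting point.

```sql
-- Illustrative values only; all four hyperparameters are listed in the table above.
WITH (
  task_type='regression',
  algorithms=['gradient_boosting', 'xgboost'],
  n_estimators=300,
  learning_rate=0.05,
  max_depth=4,
  subsample=0.8
)
```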

Feature Engineering Options

| Option | Type | Default | Description |
|---|---|---|---|
| auto_features | boolean | true | Auto-generate features |
| polynomial_degree | integer | 2 | Polynomial feature degree |
| interaction_features | boolean | false | Generate interaction features |
| scaling | string | 'standard' | Feature scaling method |
| missing_values | string | 'mean' | Missing value handling |
| categorical_encoding | string | 'onehot' | Categorical encoding method |

Scaling Methods

  • 'standard' - Standardization (zero mean, unit variance)
  • 'minmax' - Min-Max scaling to [0, 1]
  • 'robust' - Robust scaling using median and IQR
  • 'none' - No scaling
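
For reference, the three scaling methods are conventionally defined as below for a feature value x. These are the standard textbook formulas and are given here as an assumption; the exact variant SynapCores applies (for example, population versus sample standard deviation) may differ.

```latex
\begin{aligned}
\text{standard:} \quad & x' = \frac{x - \mu}{\sigma} \\
\text{minmax:}   \quad & x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \\
\text{robust:}   \quad & x' = \frac{x - \operatorname{median}(x)}{Q_3 - Q_1}
\end{aligned}
```

Here μ and σ are the mean and standard deviation of the feature, and Q1 and Q3 are its first and third quartiles, so Q3 - Q1 is the IQR. As a quick check, a value of 10 in a feature that ranges from 0 to 40 maps to (10 - 0) / (40 - 0) = 0.25 under 'minmax'.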

Categorical Encoding

  • 'onehot' - One-hot encoding
  • 'label' - Label encoding
  • 'target' - Target encoding
  • 'ordinal' - Ordinal encoding

Complete Examples

Customer Churn Prediction

CREATE EXPERIMENT churn_prediction AS
SELECT customer_id, age, tenure, monthly_charges, total_charges, churned
FROM customers
WITH (
  task_type='binary_classification',
  target_column='churned',
  max_trials=50,
  validation_split=0.2
);

House Price Regression

CREATE EXPERIMENT house_price_model AS
SELECT * FROM housing_data
WITH (
  task_type='regression',
  target_column='price',
  algorithms=['random_forest', 'xgboost', 'gradient_boosting'],
  max_trials=100,
  n_estimators=200
);

Fraud Detection with Feature Engineering

CREATE EXPERIMENT fraud_detection AS
SELECT * FROM transactions
WITH (
  task_type='binary_classification',
  target_column='is_fraud',
  algorithms=['xgboost', 'neural_network'],
  auto_features=true,
  polynomial_degree=2,
  interaction_features=true,
  scaling='robust',
  categorical_encoding='target',
  max_trials=150
);

Time Series Forecasting

CREATE EXPERIMENT sales_forecast AS
SELECT date, product_id, sales, promotions, holidays
FROM sales_data
WITH (
  task_type='time_series',
  target_column='sales',
  algorithms=['gradient_boosting', 'neural_network'],
  cv_folds=5
);

Interpretable Model for Compliance

CREATE EXPERIMENT loan_approval AS
SELECT * FROM loan_applications
WITH (
  task_type='binary_classification',
  target_column='approved',
  algorithms=['logistic_regression', 'decision_tree'],
  max_depth=5
);

Model Operations

Show All Models

SHOW MODELS;

Deploy a Model

DEPLOY MODEL best_model FROM EXPERIMENT churn_prediction
WITH (replicas=3, memory='2Gi');

Make Predictions

PREDICT churn_probability, risk_score
USING churn_model
AS SELECT customer_id, age, tenure FROM new_customers;

Describe a Model

DESCRIBE MODEL churn_model;

Drop a Model

DROP MODEL old_model;

Best Practices

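  1. Parameter Tuning: Algorithm-specific options apply to every selected algorithm that supports them. For example, n_estimators=200 in the House Price Regression example applies to random_forest, xgboost, and gradient_boosting alike.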

  2. Default Values: Every option except the required target_column has a default. Specify only the options you want to change.

  3. Resource Limits: Experiments respect both max_trials and time_budget_minutes and stop as soon as either limit is reached.

  4. Reproducibility: Set random_seed for consistent results across runs.

  5. Algorithm Compatibility: The system automatically filters incompatible algorithms for each task type.
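
Practices 3 and 4 can be combined in one experiment. In the sketch below the run ends after 50 trials or 30 minutes, whichever comes first, and the fixed seed makes repeated runs comparable. The experiment name churn_prediction_capped and the specific values are hypothetical.

```sql
-- Sketch only: churn_prediction_capped is a hypothetical experiment name.
-- Stops at 50 trials or 30 minutes, whichever limit is reached first.
CREATE EXPERIMENT churn_prediction_capped AS
SELECT age, tenure, monthly_charges, total_charges, churned
FROM customers
WITH (
  task_type='binary_classification',
  target_column='churned',
  max_trials=50,
  time_budget_minutes=30,
  random_seed=7
);
```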


Document Version: 1.0
Last Updated: December 2025
Website: https://synapcores.com