SynapCores AutoML Guide
Build powerful machine learning models directly in SQL without writing any Python code.
Overview
SynapCores AutoML lets you define machine learning experiments with SQL statements. You can train, tune, and deploy models using familiar database commands.
Task Types
| Task Type | Description | Default Metric |
|---|---|---|
| `regression` | Continuous value prediction | R-squared |
| `binary_classification` | Two-class classification | AUC |
| `classification` / `multiclass` | Multi-class classification | Accuracy |
| `clustering` | Unsupervised grouping | Silhouette Score |
| `anomaly` | Anomaly detection | F1 Score |
| `time_series` | Time series forecasting | MAPE |
Creating AutoML Experiments
Basic Syntax
Option 1: AS Syntax
CREATE EXPERIMENT <experiment_name> AS
<SELECT_query>
WITH (<options>)
Option 2: USING Syntax
CREATE EXPERIMENT <experiment_name>
USING (<SELECT_query>)
TARGET <target_column>
OPTIONS (<options>)
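The complete examples later in this guide all use the AS form. As a hedged sketch of the USING form, the churn experiment might be written as follows. It assumes that `TARGET` takes the place of the `target_column` option and that `OPTIONS` accepts the same key-value pairs as `WITH`; this guide states neither explicitly.

```sql
-- Sketch only: the churn example's data and settings, rewritten in USING form.
CREATE EXPERIMENT churn_prediction
USING (
    SELECT customer_id, age, tenure, monthly_charges, total_charges, churned
    FROM customers
)
TARGET churned                 -- assumed to replace target_column='churned'
OPTIONS (
    task_type='binary_classification',
    max_trials=50,
    validation_split=0.2
);
```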
Configuration Options
General Options
| Option | Type | Default | Description |
|---|---|---|---|
| `task_type` | string | `'binary_classification'` | Type of ML task |
| `target_column` | string | Required | Column to predict |
| `max_trials` | integer | 100 | Maximum training trials |
| `time_budget_minutes` | integer | 60 | Maximum time budget |
| `validation_split` | float | 0.2 | Validation data proportion |
| `cv_folds` | integer | 5 | Cross-validation folds |
| `optimization_metric` | string | Task-dependent | Metric to optimize |
| `ensemble` | boolean | true | Create ensemble models |
| `early_stopping_patience` | integer | 10 | Trials without improvement |
| `random_seed` | integer | 42 | Random seed for reproducibility |
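To illustrate how several general options combine, here is a minimal sketch that reuses the `customers` table from the churn example in this guide. The experiment name `churn_quick` is hypothetical and the values are illustrative, not recommendations.

```sql
-- Sketch only: overrides a few defaults from the table above.
CREATE EXPERIMENT churn_quick AS
SELECT customer_id, age, tenure, monthly_charges, total_charges, churned
FROM customers
WITH (
    task_type='binary_classification',
    target_column='churned',
    cv_folds=3,                   -- fewer folds than the default 5
    early_stopping_patience=5,    -- default is 10
    ensemble=false                -- skip ensemble creation (default true)
);
```

Options left out, such as `max_trials` and `random_seed`, keep the defaults listed in the table.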
Available Algorithms
- `'linear_regression'`: Linear Regression
- `'logistic_regression'`: Logistic Regression
- `'decision_tree'`: Decision Tree
- `'random_forest'`: Random Forest
- `'gradient_boosting'`: Gradient Boosting
- `'xgboost'`: XGBoost
- `'neural_network'`: Neural Network
- `'knn'`: K-Nearest Neighbors
- `'naive_bayes'`: Naive Bayes
- `'svm'`: Support Vector Machine
Algorithm Selection Strategies
- `'all'`: Try all available algorithms
- `'fast'`: Only fast algorithms (linear models, decision trees, naive bayes, knn)
- `'accurate'`: Only highly accurate algorithms (random forest, gradient boosting, xgboost, neural networks)
- `'interpretable'`: Only interpretable algorithms (linear regression, logistic regression, decision trees)
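This guide does not show which option accepts a strategy name, so the sketch below spells out the algorithm list instead. It approximates the `'interpretable'` strategy for a regression task, using the `housing_data` table from the examples in this guide. The experiment name `house_price_interpretable` is hypothetical.

```sql
-- Sketch only: an explicit list standing in for the 'interpretable' strategy.
CREATE EXPERIMENT house_price_interpretable AS
SELECT * FROM housing_data
WITH (
    task_type='regression',
    target_column='price',
    -- the interpretable algorithms that apply to a regression target
    algorithms=['linear_regression', 'decision_tree']
);
```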
Algorithm-Specific Options
Random Forest
| Hyperparameter | Type | Default | Description |
|---|---|---|---|
| `n_estimators` | integer | 100 | Number of trees |
| `max_depth` | integer | None | Maximum tree depth |
| `min_samples_split` | integer | 2 | Minimum samples to split |
| `max_features` | string/float | `'sqrt'` | Features to consider |
WITH (
task_type='classification',
algorithms=['random_forest'],
n_estimators=200,
max_depth=10
)
Neural Network
| Hyperparameter | Type | Default | Description |
|---|---|---|---|
| `hidden_layers` | array | [100] | Hidden layer sizes |
| `learning_rate` | float | 0.001 | Initial learning rate |
| `batch_size` | integer | 32 | Mini-batch size |
| `n_epochs` | integer | 100 | Maximum epochs |
| `activation` | string | `'relu'` | Activation function |
| `dropout_rate` | float | 0.0 | Dropout rate |
WITH (
task_type='classification',
algorithms=['neural_network'],
hidden_layers=[128, 64, 32],
dropout_rate=0.2
)
Gradient Boosting / XGBoost
| Hyperparameter | Type | Default | Description |
|---|---|---|---|
| `n_estimators` | integer | 100 | Number of boosting stages |
| `learning_rate` | float | 0.1 | Learning rate |
| `max_depth` | integer | 3 | Maximum tree depth |
| `subsample` | float | 1.0 | Fraction of samples |
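By analogy with the Random Forest and Neural Network fragments in this section, a boosting configuration might look like the following sketch. The values are illustrative, not recommendations.

```sql
-- Sketch only: overrides all four boosting hyperparameters from the table.
WITH (
    task_type='regression',
    algorithms=['gradient_boosting', 'xgboost'],
    n_estimators=300,      -- more boosting stages than the default 100
    learning_rate=0.05,    -- smaller step than the default 0.1
    max_depth=4,           -- default is 3
    subsample=0.8          -- use 80% of the samples (default 1.0)
)
```

Lowering `learning_rate` while raising `n_estimators` is a common pairing, since smaller steps usually need more stages to reach a comparable fit.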
Feature Engineering Options
| Option | Type | Default | Description |
|---|---|---|---|
| `auto_features` | boolean | true | Auto-generate features |
| `polynomial_degree` | integer | 2 | Polynomial feature degree |
| `interaction_features` | boolean | false | Generate interaction features |
| `scaling` | string | `'standard'` | Feature scaling method |
| `missing_values` | string | `'mean'` | Missing value handling |
| `categorical_encoding` | string | `'onehot'` | Categorical encoding method |
Scaling Methods
- `'standard'`: Standardization (zero mean, unit variance)
- `'minmax'`: Min-Max scaling to [0, 1]
- `'robust'`: Robust scaling using median and IQR
- `'none'`: No scaling
Categorical Encoding
- `'onehot'`: One-hot encoding
- `'label'`: Label encoding
- `'target'`: Target encoding
- `'ordinal'`: Ordinal encoding
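As a sketch of how the feature engineering options fit together, the fragment below turns off most preprocessing for a tree-only experiment. It uses only option values listed in this section. Whether `auto_features=false` also disables polynomial features is not stated in this guide.

```sql
-- Sketch only: minimal preprocessing for tree-based algorithms.
WITH (
    task_type='classification',
    algorithms=['decision_tree', 'random_forest'],
    auto_features=false,             -- no auto-generated features
    scaling='none',                  -- tree splits are unaffected by monotonic rescaling
    categorical_encoding='ordinal'   -- one integer column per categorical feature
)
```

Ordinal encoding keeps the column count unchanged, whereas one-hot encoding adds a column per category.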
Complete Examples
Customer Churn Prediction
CREATE EXPERIMENT churn_prediction AS
SELECT customer_id, age, tenure, monthly_charges, total_charges, churned
FROM customers
WITH (
task_type='binary_classification',
target_column='churned',
max_trials=50,
validation_split=0.2
);
House Price Regression
CREATE EXPERIMENT house_price_model AS
SELECT * FROM housing_data
WITH (
task_type='regression',
target_column='price',
algorithms=['random_forest', 'xgboost', 'gradient_boosting'],
max_trials=100,
n_estimators=200
);
Fraud Detection with Feature Engineering
CREATE EXPERIMENT fraud_detection AS
SELECT * FROM transactions
WITH (
task_type='binary_classification',
target_column='is_fraud',
algorithms=['xgboost', 'neural_network'],
auto_features=true,
polynomial_degree=2,
interaction_features=true,
scaling='robust',
categorical_encoding='target',
max_trials=150
);
Time Series Forecasting
CREATE EXPERIMENT sales_forecast AS
SELECT date, product_id, sales, promotions, holidays
FROM sales_data
WITH (
task_type='time_series',
target_column='sales',
algorithms=['gradient_boosting', 'neural_network'],
cv_folds=5
);
Interpretable Model for Compliance
CREATE EXPERIMENT loan_approval AS
SELECT * FROM loan_applications
WITH (
task_type='binary_classification',
target_column='approved',
algorithms=['logistic_regression', 'decision_tree'],
max_depth=5
);
Model Operations
Show All Models
SHOW MODELS;
Deploy a Model
DEPLOY MODEL best_model FROM EXPERIMENT churn_prediction
WITH (replicas=3, memory='2Gi');
Make Predictions
PREDICT churn_probability, risk_score
USING churn_model
AS SELECT customer_id, age, tenure FROM new_customers;
Describe a Model
DESCRIBE MODEL churn_model;
Drop a Model
DROP MODEL old_model;
Best Practices
- Parameter Tuning: Algorithm-specific options apply to all selected algorithms where compatible.
- Default Values: All options have sensible defaults. Only specify options that differ from defaults.
- Resource Limits: Experiments respect both `max_trials` and `time_budget_minutes`. An experiment stops when either limit is reached.
- Reproducibility: Set `random_seed` for consistent results across runs.
- Algorithm Compatibility: The system automatically filters incompatible algorithms for each task type.
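The resource-limit and reproducibility points can be combined in one experiment. A minimal sketch, reusing the `transactions` table from the fraud example; the experiment name `fraud_quick` is hypothetical and the values are illustrative.

```sql
-- Sketch only: bounded search with a fixed seed.
CREATE EXPERIMENT fraud_quick AS
SELECT * FROM transactions
WITH (
    task_type='binary_classification',
    target_column='is_fraud',
    max_trials=200,            -- upper bound on the number of trials
    time_budget_minutes=30,    -- upper bound on time (default 60)
    random_seed=7              -- fixed seed so reruns are comparable
);
```

Whichever of the two limits is reached first ends the search.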
Document Version: 1.0
Last Updated: December 2025
Website: https://synapcores.com