Recommendation Engine Architecture¶
This page explains the internal architecture of the recommend_encoding() and EncodingDecisionTree.decide() systems — how they score, rank, and select encodings for your problem.
Looking for a quick recommendation?
If you just want to pick an encoding, see Which Encoding? or the Decision Flowchart. This page is for understanding how the engine works under the hood.
System Overview¶
flowchart LR
subgraph INPUT["User Input"]
direction TB
I1["n_features"]
I2["n_samples"]
I3["task"]
I4["hardware"]
I5["priority"]
I6["data_type"]
I7["symmetry"]
I8["trainable"]
I9["problem_structure"]
I10["feature_interactions"]
end
subgraph KB["Knowledge Base"]
direction TB
K1["ENCODING_RULES<br/>(16 encodings)"]
K2["best_for tags"]
K3["avoid_when tags"]
K4["hard constraints"]
K5["metadata"]
end
subgraph OUTPUT["Output"]
direction TB
O1["Recommendation"]
O2["encoding_name"]
O3["explanation"]
O4["alternatives\[\]"]
O5["confidence"]
end
INPUT --> KB --> OUTPUT
Full Pipeline¶
The recommendation is produced in two phases, preceded by input validation.
flowchart TD
START([User calls recommend_encoding]) --> INPUT[/Collect all 10 input parameters/]
INPUT --> VALIDATE["Validate all inputs<br/>(ValueError on bad values)"]
VALIDATE --> PHASE_A["<b>PHASE A: Hard Filter</b><br/>Structural elimination"]
PHASE_A --> LOOP{For each of<br/>16 encodings}
LOOP --> HC1{data_type in<br/>requires_data_type?}
HC1 -- "Fails" --> ELIMINATE[Eliminate encoding]
HC1 -- "Passes / N/A" --> HC2{n_features ==<br/>requires_n_features?}
HC2 -- "Fails" --> ELIMINATE
HC2 -- "Passes / N/A" --> HC3{n_features even?<br/>if requires_even}
HC3 -- "Fails" --> ELIMINATE
HC3 -- "Passes / N/A" --> HC4{n_features <=<br/>max_features?}
HC4 -- "Fails" --> ELIMINATE
HC4 -- "Passes / N/A" --> HC5{symmetry ==<br/>requires_symmetry?}
HC5 -- "Fails" --> ELIMINATE
HC5 -- "Passes / N/A" --> HC6{trainable == True?<br/>if requires_trainable}
HC6 -- "Fails" --> ELIMINATE
HC6 -- "Passes" --> SURVIVE[Add to candidate pool]
ELIMINATE --> NEXT_ENC{More encodings?}
SURVIVE --> NEXT_ENC
NEXT_ENC -- "Yes" --> LOOP
NEXT_ENC -- "No" --> EMPTY_CHECK{Candidate pool<br/>empty?}
EMPTY_CHECK -- "Yes" --> FALLBACK["Return angle encoding<br/>confidence = 0.50<br/>(safe general fallback)"]
EMPTY_CHECK -- "No" --> PHASE_B["<b>PHASE B: Soft Scoring</b><br/>Score each candidate 0.0 → 1.0<br/>(9 scoring categories)"]
PHASE_B --> SCORE["Compute weighted score<br/>(see scoring breakdown)"]
SCORE --> RANK["Sort candidates by score<br/>descending"]
RANK --> SELECT["Best = rank #1<br/>Alternatives = ranks #2–#4"]
SELECT --> CONFIDENCE["Map raw score → confidence<br/>(0.50 – 0.95)"]
CONFIDENCE --> EXPLAIN["Generate explanation<br/>from template"]
EXPLAIN --> OUTPUT([Return Recommendation])
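The pipeline above can be condensed into a short control-flow sketch. This is a minimal illustration, not the library's actual code: the `hard_ok` and `score` callables are hypothetical stand-ins for Phase A and Phase B, and the confidence mapping is omitted.

```python
def recommend(user, rules, hard_ok, score):
    """Sketch of the two-phase pipeline: hard filter, then soft scoring.

    `rules` is a list of encoding names; `hard_ok` and `score` are
    hypothetical stand-ins for the Phase A gates and Phase B scorer.
    """
    # Phase A: structural elimination; one failed gate removes the encoding
    candidates = [name for name in rules if hard_ok(name, user)]
    if not candidates:
        # No survivors: fall back to the safe general encoding
        return "angle", [], 0.50
    # Phase B: score each candidate, then rank descending
    ranked = sorted(candidates, key=lambda name: score(name, user), reverse=True)
    # Best = rank #1, alternatives = ranks #2-#4
    return ranked[0], ranked[1:4], None
```

The real function additionally validates all ten inputs up front (raising `ValueError` on bad values) and maps the winning raw score to a confidence.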
Phase A: Hard Constraint Filter¶
Each constraint is a binary gate. If any single gate fails, the encoding is eliminated regardless of everything else.
flowchart LR
subgraph "Hard Constraint Chain (ALL must pass)"
direction LR
A["1. Data Type<br/>─────────<br/>basis requires<br/>binary/discrete"] --> B["2. Exact Features<br/>─────────<br/>SO2 requires<br/>exactly 2"]
B --> C["3. Even Features<br/>─────────<br/>swap_equivariant<br/>requires even"]
C --> D["4. Max Features<br/>─────────<br/>IQP caps at 12<br/>HOAngle caps at 10"]
D --> E["5. Symmetry<br/>─────────<br/>cyclic_eq requires<br/>symmetry='cyclic'"]
E --> F["6. Trainable<br/>─────────<br/>trainable enc<br/>requires opt-in"]
end
F -- "All pass" --> PASS([CANDIDATE])
A -- "Any fail" --> FAIL([ELIMINATED])
B -- "Any fail" --> FAIL
C -- "Any fail" --> FAIL
D -- "Any fail" --> FAIL
E -- "Any fail" --> FAIL
Hard Constraints by Encoding¶
| Encoding | requires_data_type | requires_n_features | requires_even | max_features | requires_symmetry | requires_trainable |
|---|---|---|---|---|---|---|
| angle | -- | -- | -- | -- | -- | -- |
| basis | binary, discrete | -- | -- | -- | -- | -- |
| higher_order_angle | -- | -- | -- | 10 | -- | -- |
| iqp | -- | -- | -- | 12 | -- | -- |
| zz_feature_map | -- | -- | -- | 12 | -- | -- |
| pauli_feature_map | -- | -- | -- | 12 | -- | -- |
| data_reuploading | -- | -- | -- | 8 | -- | -- |
| hardware_efficient | -- | -- | -- | -- | -- | -- |
| amplitude | -- | -- | -- | -- | -- | -- |
| qaoa | -- | -- | -- | -- | -- | -- |
| hamiltonian | -- | -- | -- | -- | -- | -- |
| trainable | -- | -- | -- | -- | -- | YES |
| symmetry_inspired | -- | -- | -- | -- | general | -- |
| so2_equivariant | -- | 2 | -- | 2 | rotation | -- |
| cyclic_equivariant | -- | -- | -- | -- | cyclic | -- |
| swap_equivariant | -- | -- | YES | -- | permutation_pairs | -- |
Note
"--" means no constraint (always passes).
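The constraint chain reads naturally as a single short-circuiting function over a rule record. The dict-based rule representation below is an assumption for illustration; only the `requires_*` / `max_features` keys, which mirror the table above, come from the source.

```python
def passes_hard_constraints(rule, user):
    """Return False as soon as any hard gate fails (sketch; rule keys
    mirror the 'Hard Constraints by Encoding' table, absent key = no constraint)."""
    if rule.get("requires_data_type") and user["data_type"] not in rule["requires_data_type"]:
        return False                                     # gate 1: data type
    if rule.get("requires_n_features") and user["n_features"] != rule["requires_n_features"]:
        return False                                     # gate 2: exact feature count
    if rule.get("requires_even") and user["n_features"] % 2 != 0:
        return False                                     # gate 3: even feature count
    if rule.get("max_features") and user["n_features"] > rule["max_features"]:
        return False                                     # gate 4: feature cap
    if rule.get("requires_symmetry") and user.get("symmetry") != rule["requires_symmetry"]:
        return False                                     # gate 5: declared symmetry
    if rule.get("requires_trainable") and not user.get("trainable"):
        return False                                     # gate 6: trainable opt-in
    return True                                          # all gates passed: candidate
```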
Phase B: Soft Scoring¶
After the hard filter, each surviving encoding gets a score built from 9 scoring categories applied in sequence. Weights are extracted as named constants (_W_*) for tuning.
flowchart TD
subgraph SCORE["Score Computation (per candidate)"]
direction TB
START_SCORE["score = 0.0"] --> S1
subgraph S1["Step 1 — Hard-Precondition Bonuses (dominant)"]
direction TB
DT["+0.50 _W_DATA_TYPE_BONUS<br/>encoding declares a required data_type<br/>AND user's data_type is in that list"]
SYM["+0.45 _W_SYMMETRY_BONUS<br/>encoding requires a symmetry type<br/>AND user specified that exact type"]
NF["+0.10 _W_N_FEATURES_BONUS<br/>encoding requires exact n_features<br/>AND user's n_features matches"]
TR["+0.40 _W_TRAINABLE_BONUS<br/>encoding requires trainable<br/>AND user set trainable=True"]
end
S1 --> S2
subgraph S2["Step 2 — Binary/Discrete Penalty"]
direction TB
PEN1["-0.20 _W_BINARY_PENALTY<br/>user has binary/discrete data BUT<br/>encoding has no data_type requirement"]
end
S2 --> S3
subgraph S3["Step 3 — Priority Matching"]
direction TB
PR["+0.10 per tag _W_PRIORITY_PER_TAG (max 2 tags)<br/>user's priority mapped to tags via _PRIORITY_TAG_MAP<br/>each tag found in best_for adds +0.10"]
end
S3 --> S4
subgraph S4["Step 4 — Problem Structure"]
direction TB
PS["+0.36 _W_STRUCTURE_BONUS<br/>user's problem_structure mapped to tags<br/>any tag found in best_for"]
end
S4 --> S5
subgraph S5["Step 5 — Feature Interactions"]
direction TB
FI["+0.35 _W_INTERACTION_BONUS<br/>polynomial → 'polynomial_features' in best_for<br/>custom_pauli → 'custom_pauli' in best_for"]
end
S5 --> S6
subgraph S6["Step 6 — Task Matching"]
direction TB
TM["+0.04 _W_TASK_BONUS<br/>classification → 'kernel_methods' in best_for<br/>regression → 'universal_approximation' in best_for<br/><i>Only when problem_structure AND<br/>feature_interactions are both None</i>"]
end
S6 --> S7
subgraph S7["Step 7 — Hardware Suitability"]
direction TB
HW1["+0.10 _W_HARDWARE_NISQ_BONUS<br/>hardware != simulator AND<br/>encoding has nisq/native/noise tags in best_for"]
HW2["-0.15 _W_HARDWARE_DEEP_PENALTY<br/>hardware != simulator AND<br/>circuit_depth == 'deep'"]
HW3["-0.08 _W_AVOID_WHEN_PENALTY<br/>hardware != simulator AND<br/>encoding has 'noisy_hardware' or 'nisq_hardware'<br/>in avoid_when tags"]
end
S7 --> S8
subgraph S8["Step 8 — Feature Count"]
direction TB
LOG["+0.15 _W_LOGARITHMIC_BONUS<br/>n_features > 8 AND<br/>qubit_scaling == 'logarithmic'"]
ACC["+0.12 _W_ACCURACY_DEFAULT_BONUS<br/>priority == 'accuracy' AND encoding is<br/>the default for this feature range"]
SMALL["+0.03 _W_SMALL_FEATURE_BONUS<br/>n_features <= 4 AND<br/>max_features >= n_features"]
end
S8 --> S9
subgraph S9["Step 9 — Sample Count"]
direction TB
TINY["+0.03 _W_SMALL_SAMPLE_BONUS<br/>n_samples < 100 AND<br/>simulable == True"]
end
S9 --> CLAMP["Clamp to 0.0 – 1.0"]
end
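A partial sketch of the scoring loop, covering only Steps 1, 3, and 4. The weight values are taken from the diagram above; the rule representation and function signature are hypothetical.

```python
# Weights as named constants, mirroring the _W_* convention described above.
_W_DATA_TYPE_BONUS  = 0.50
_W_SYMMETRY_BONUS   = 0.45
_W_PRIORITY_PER_TAG = 0.10
_W_STRUCTURE_BONUS  = 0.36

def soft_score(rule, user, priority_tags, structure_tags):
    """Partial scoring sketch (Steps 1, 3, 4 only); rule keys are hypothetical."""
    s = 0.0
    # Step 1: hard-precondition bonuses dominate everything else
    if rule.get("requires_data_type") and user["data_type"] in rule["requires_data_type"]:
        s += _W_DATA_TYPE_BONUS
    if rule.get("requires_symmetry") and user.get("symmetry") == rule["requires_symmetry"]:
        s += _W_SYMMETRY_BONUS
    best_for = set(rule.get("best_for", ()))
    # Step 3: +0.10 per matched priority tag, capped at two tags
    s += _W_PRIORITY_PER_TAG * min(len(best_for & set(priority_tags)), 2)
    # Step 4: any structure tag match earns the domain bonus once
    if best_for & set(structure_tags):
        s += _W_STRUCTURE_BONUS
    return max(0.0, min(1.0, s))            # clamp to [0.0, 1.0]
```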
Weight Hierarchy¶
Step 1 bonuses (0.10 – 0.50) ██████████████████████████ Structural match — "this encoding EXISTS for your use case"
Step 2 penalty (-0.20) ████████ Binary mismatch — "wrong data type assumption"
Step 3 bonuses (0.10 – 0.20) ██████████ Priority match — "this encoding FITS your priorities"
Step 4–5 bonuses(0.35 – 0.36) ██████████████████ Domain match — "this encoding is DESIGNED for your domain"
Step 6 bonus (0.04) ██ Task match — "small nudge for task-relevant encodings"
Step 7 mixed (-0.15 – +0.10) ████████ Hardware — "NISQ bonus / deep penalty / avoid_when penalty"
Step 8 bonuses (0.03 – 0.15) ██████ Feature count — "qubit efficiency tiebreakers"
Step 9 bonus (0.03) ██ Sample count — "prefer cheap for tiny datasets"
Key Insight
A single Step 1 match (+0.40 to +0.50) outranks any individual Step 3–9 bonus, and in practice outweighs the combinations that realistically co-fire. This ensures that specialized encodings reliably beat general-purpose ones when the user's constraints match.
Tag Mappings¶
The user's string parameters are expanded into best_for tags for matching.
Priority Tags¶
flowchart LR
subgraph "User Priority → Matched Tags"
SPEED["priority = 'speed'"] --> SPEED_TAGS["speed, simplicity"]
NOISE["priority = 'noise_resilience'"] --> NOISE_TAGS["nisq_hardware, native_gates,<br/>noise_resilience"]
TRAIN["priority = 'trainability'"] --> TRAIN_TAGS["trainability, task_specific,<br/>optimization"]
ACC["priority = 'accuracy'"] --> ACC_TAGS["expressibility, quantum_advantage,<br/>universal_approximation,<br/>kernel_methods"]
end
Problem Structure Tags¶
flowchart LR
subgraph "Problem Structure → Matched Tags"
COMB["'combinatorial'"] --> COMB_TAGS["combinatorial,<br/>graph_optimization,<br/>qaoa_structure"]
PHYS["'physics_simulation'"] --> PHYS_TAGS["physics_simulation,<br/>time_evolution"]
TS["'time_series'"] --> TS_TAGS["periodic_data,<br/>cyclic_symmetry,<br/>time_series"]
end
Task Tags¶
Applied in Step 6, only when no domain-specific parameter (problem_structure, feature_interactions) is active.
flowchart LR
subgraph "Task Type → Matched Tags (via _TASK_TAG_MAP)"
CLS["task = 'classification'"] --> CLS_TAGS["kernel_methods"]
REG["task = 'regression'"] --> REG_TAGS["universal_approximation"]
end
Warning
Task matching is suppressed when problem_structure or feature_interactions is specified, to prevent interference with higher-priority domain signals.
Confidence Mapping¶
Raw scores are mapped to a human-interpretable confidence via a continuous piecewise-linear function with three bands:
Raw Score Confidence Band
───────── ────────── ────
>= 0.50 ───→ 0.85 – 0.95 HIGH "Strong structural match"
0.30–0.49 ───→ 0.65 – 0.85 MEDIUM "Good priority/structure match"
< 0.30 ───→ 0.50 – 0.65 LOWER "Weak match / fallback"
Fallback ───→ 0.50 MINIMAL "No candidates survived hard filter"
(= _score_to_confidence(0.0))
The function is continuous at both band boundaries: score=0.30 → 0.65, score=0.50 → 0.85.
flowchart LR
SCORE["Raw Score"] --> CHECK1{">= 0.50?"}
CHECK1 -- "Yes" --> HIGH["0.85 + (score - 0.50) * 0.20<br/>capped at 0.95"]
CHECK1 -- "No" --> CHECK2{">= 0.30?"}
CHECK2 -- "Yes" --> MED["0.65 + (score - 0.30) * 1.00"]
CHECK2 -- "No" --> LOW["0.50 + score * 0.50"]
Decision Tree (Alternative Path)¶
EncodingDecisionTree.decide() provides a deterministic single-answer alternative to the scored recommender. It walks a fixed priority chain -- the first matching level returns immediately:
flowchart TD
START([decide]) --> L1{"<b>Level 1</b><br/>data_type?"}
L1 -- "binary / discrete" --> BASIS([basis])
L1 -- "continuous" --> L2{"<b>Level 2</b><br/>symmetry?"}
L2 -- "rotation<br/>(n_features=2)" --> SO2([so2_equivariant])
L2 -- "cyclic" --> CYC([cyclic_equivariant])
L2 -- "permutation_pairs<br/>(even features)" --> SWAP([swap_equivariant])
L2 -- "general" --> SYMI([symmetry_inspired])
L2 -- "none" --> L3{"<b>Level 3</b><br/>trainable?"}
L3 -- "yes" --> TRAINABLE([trainable])
L3 -- "no" --> L4{"<b>Level 4</b><br/>problem_structure?"}
L4 -- "combinatorial" --> QAOA([qaoa])
L4 -- "physics_simulation" --> HAM([hamiltonian])
L4 -- "time_series" --> DRU2([data_reuploading])
L4 -- "none" --> L5{"<b>Level 5</b><br/>feature_interactions?"}
L5 -- "polynomial" --> HOA([higher_order_angle])
L5 -- "custom_pauli" --> PFM([pauli_feature_map])
L5 -- "none" --> L6{"<b>Level 6</b><br/>priority?"}
L6 -- "speed" --> ANGLE([angle])
L6 -- "noise_resilience" --> HWE([hardware_efficient])
L6 -- "trainability" --> DRU([data_reuploading])
L6 -- "accuracy" --> L7{"<b>Level 7</b><br/>n_features?"}
L7 -- "<= 4" --> IQP([iqp])
L7 -- "5 – 8" --> ZZ([zz_feature_map])
L7 -- "> 8" --> AMP([amplitude])
style BASIS fill:#4a9,stroke:#333,color:#fff
style SO2 fill:#4a9,stroke:#333,color:#fff
style CYC fill:#4a9,stroke:#333,color:#fff
style SWAP fill:#4a9,stroke:#333,color:#fff
style SYMI fill:#4a9,stroke:#333,color:#fff
style TRAINABLE fill:#4a9,stroke:#333,color:#fff
style QAOA fill:#4a9,stroke:#333,color:#fff
style HAM fill:#4a9,stroke:#333,color:#fff
style HOA fill:#4a9,stroke:#333,color:#fff
style PFM fill:#4a9,stroke:#333,color:#fff
style ANGLE fill:#4a9,stroke:#333,color:#fff
style HWE fill:#4a9,stroke:#333,color:#fff
style DRU fill:#4a9,stroke:#333,color:#fff
style DRU2 fill:#4a9,stroke:#333,color:#fff
style IQP fill:#4a9,stroke:#333,color:#fff
style ZZ fill:#4a9,stroke:#333,color:#fff
style AMP fill:#4a9,stroke:#333,color:#fff
Note
All 16 encodings are reachable. data_reuploading is reachable by two distinct paths: problem_structure="time_series" (Level 4) and priority="trainability" (Level 6).
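The priority chain reads as straight-line Python. This is a condensed illustration of the levels above: it accepts a plain dict rather than whatever signature the real `decide()` uses, and it omits the extra feature-count checks the real tree applies at Level 2 (e.g. rotation requires n_features=2).

```python
def decide(user):
    """Sketch of the fixed priority chain; the first matching level returns."""
    if user.get("data_type") in ("binary", "discrete"):         # Level 1: data type
        return "basis"
    sym = user.get("symmetry")                                  # Level 2: symmetry
    if sym:
        return {"rotation": "so2_equivariant",
                "cyclic": "cyclic_equivariant",
                "permutation_pairs": "swap_equivariant",
                "general": "symmetry_inspired"}[sym]
    if user.get("trainable"):                                   # Level 3: trainable
        return "trainable"
    ps = user.get("problem_structure")                          # Level 4: structure
    if ps:
        return {"combinatorial": "qaoa",
                "physics_simulation": "hamiltonian",
                "time_series": "data_reuploading"}[ps]
    fi = user.get("feature_interactions")                       # Level 5: interactions
    if fi:
        return {"polynomial": "higher_order_angle",
                "custom_pauli": "pauli_feature_map"}[fi]
    pr = user.get("priority")                                   # Level 6: priority
    if pr == "speed":
        return "angle"
    if pr == "noise_resilience":
        return "hardware_efficient"
    if pr == "trainability":
        return "data_reuploading"
    n = user["n_features"]                                      # Level 7: accuracy path
    if n <= 4:
        return "iqp"
    if n <= 8:
        return "zz_feature_map"
    return "amplitude"
```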
Recommender vs Decision Tree¶
| Aspect | recommend_encoding() | EncodingDecisionTree.decide() |
|---|---|---|
| Output | Top pick + 3 alternatives + confidence + explanation | Single encoding name |
| Method | Hard filter → soft scoring → ranking | Fixed priority-level walk |
| Nuance | Weighs multiple signals simultaneously | First match wins |
| Confidence | Returns 0.50–0.95 confidence score | No confidence (deterministic) |
| Fallback | Returns angle with confidence 0.50 | Always returns something |
| Validation | ValueError on invalid inputs | ValueError on invalid inputs |
| Use case | "Explore trade-offs" | "Just tell me what to use" |
| Consistency | Top pick usually agrees with decision tree | Always deterministic |
Worked Example¶
Input:
from encoding_atlas.guide import recommend_encoding
rec = recommend_encoding(
n_features=6,
priority="accuracy",
hardware="ibm",
data_type="continuous",
)
Phase A -- Hard Filter¶
angle — no constraints → SURVIVES
basis — requires binary/discrete → ELIMINATED (continuous)
higher_order_angle — max_features=10 → SURVIVES (6 <= 10)
iqp — max_features=12 → SURVIVES (6 <= 12)
zz_feature_map — max_features=12 → SURVIVES
pauli_feature_map — max_features=12 → SURVIVES
data_reuploading — max_features=8 → SURVIVES (6 <= 8)
hardware_efficient — no constraints → SURVIVES
amplitude — no constraints → SURVIVES
qaoa — no constraints → SURVIVES
hamiltonian — no constraints → SURVIVES
trainable — requires trainable=True → ELIMINATED (trainable=False)
symmetry_inspired — requires symmetry=general → ELIMINATED (symmetry=None)
so2_equivariant — requires symmetry=rotation → ELIMINATED
cyclic_equivariant — requires symmetry=cyclic → ELIMINATED
swap_equivariant — requires symmetry=perm. → ELIMINATED
Candidates: 10 survive
Phase B -- Scoring (top candidates)¶
Encoding Step 1 Step 2 Step 3 Step 6 Step 7 Step 8 Total
(precon) (bin) (priority) (task) (hardware) (feat. cnt)
────────── ──────── ────── ────────── ────── ──────────── ─────────── ─────
iqp +0.00 — +0.20 +0.04 -0.08 (avoid) +0.03 ≈ 0.19
(2 acc tags) (classif) (noisy_hw+nisq) (small feat)
zz_feature_map +0.00 — +0.10 +0.04 +0.00 +0.12 ≈ 0.26
(kernel) (classif) (acc default)
pauli_feature_map +0.00 — +0.10 +0.00 +0.00 +0.03 ≈ 0.13
(kernel) (small feat)
data_reuploading +0.00 — +0.10 +0.00 -0.15 (deep) +0.00 ≈ 0.00
(univ.) -0.08 (avoid)
hardware_efficient +0.00 — +0.00 +0.00 +0.10 (nisq) +0.00 ≈ 0.10
amplitude +0.00 — +0.00 +0.00 -0.15 (deep) +0.00 ≈ 0.00
-0.08 (avoid)
angle +0.00 — +0.00 +0.00 +0.00 +0.00 ≈ 0.00
hamiltonian +0.00 — +0.00 +0.00 -0.15 (deep) +0.00 ≈ 0.00
-0.08 (avoid)
Result¶
Recommendation(
encoding_name = "zz_feature_map",
explanation = "ZZ Feature Map provides standard pairwise feature interactions ...",
alternatives = ["iqp", "pauli_feature_map", "hardware_efficient"],
confidence = 0.63, # lower band — weak soft match only
)
Why the low confidence?
No Step 1 bonuses fired because the user didn't specify symmetry, data type constraints, or trainable. The system is saying: "ZZ Feature Map is my best guess, but I'm not very confident because your requirements are generic."
The avoid_when penalty (Step 7) penalises IQP on real hardware ("noisy_hardware" and "nisq_hardware" in its avoid_when tags), allowing ZZ Feature Map to overtake it despite IQP having more priority tag matches.