Known Limitations

An honest assessment of the current DDA-X implementation, based on independent review by GPT-5.2 reasoning models.


Overview

This page documents known gaps between theory and implementation, areas requiring further work, and open research questions. Transparency about limitations is essential for research integrity.


1. Trust Equation Mismatch

Theory

The paper describes trust as predictability-based:

\[ T_{ij} = \frac{1}{1 + \sum_{\mathcal{W}} \epsilon_{ij}(t)} \]

Where trust decreases as accumulated prediction errors increase.
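As a minimal sketch (the function name is illustrative, not taken from the codebase), the paper's equation amounts to:

def predictability_trust(errors_window):
    """Trust T_ij over a window W of prediction errors epsilon_ij(t)."""
    # Perfect predictability (all errors zero) gives trust 1.0;
    # accumulated surprise drives trust toward 0.
    return 1.0 / (1.0 + sum(errors_window))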

Implementation

Current simulations implement a hybrid trust model instead:

Simulation               Actual Trust Mechanism
----------------------   --------------------------------------------
Philosopher's Duel       Semantic alignment + \(\epsilon\) thresholds
Skeptic's Gauntlet       Civility gating (fairness-based)
Collatz Review Council   Coalition-weighted dyadic trust

Not a Bug

The hybrid approach may actually be more realistic than pure predictability-based trust. However, the documentation should accurately reflect what is implemented.

Status

Section 7 of paper.md has been updated to describe this hybrid approach.


2. Dual Rigidity Models in AGI Debate

Issue

In simulate_agi_debate.py, two rigidity models run in parallel:

  1. Multi-timescale (agent.multi_rho): Fast/Slow/Trauma decomposition
  2. Legacy single-scale (agent.rho): Simple scalar update

The multi-timescale rigidity is computed and logged as telemetry, but the legacy agent.rho is what actually drives behavior.
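A condensed sketch of the two parallel paths (the dict keys and weight values below are assumptions for illustration, not the exact attributes used in simulate_agi_debate.py):

def effective_rigidity(agent, w_f=0.6, w_s=0.3, w_t=0.1):
    # Multi-timescale blend: computed and logged as telemetry only.
    rho_telemetry = (w_f * agent.multi_rho["fast"]
                     + w_s * agent.multi_rho["slow"]
                     + w_t * agent.multi_rho["trauma"])
    # Legacy scalar: this is the value that actually conditions generation.
    rho_behavior = agent.rho
    return rho_telemetry, rho_behavior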

Impact

  • Multi-timescale dynamics exist but don't affect generation
  • Claims about multi-timescale control are partially aspirational
  • Telemetry shows multi-timescale patterns, but behavior uses single-scale

Status

Documented here. The simulation works correctly — this is an architectural choice, not a bug.


3. Uncalibrated Thresholds

Issue

The simulations calibrate some parameters dynamically:

  • \(\epsilon_0\) (surprise baseline): Calibrated from early-run median
  • \(s\) (sigmoid steepness): Calibrated from IQR

But other thresholds are hardcoded:

  • wound_cosine_threshold (typically 0.28)
  • trauma_threshold (\(\theta_{\text{trauma}}\))
  • Multi-timescale weights (\(w_f, w_s, w_t\))

Impact

  • Wound sensitivity varies across domains and embedding models
  • Parameters tuned for one simulation may not transfer

Recommendation

Future work should extend calibration to wound and trauma thresholds, potentially using percentile-based approaches.
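One possible percentile-based extension is sketched below with hypothetical function and variable names: the wound threshold is derived from an early-run warm-up window, in the same spirit as the existing calibration of \(\epsilon_0\) and \(s\).

import numpy as np

def calibrate_thresholds(early_epsilons, early_wound_cosines):
    """Set run-specific thresholds from an early-run warm-up window."""
    eps = np.asarray(early_epsilons, dtype=float)
    cos = np.asarray(early_wound_cosines, dtype=float)

    # Existing idea: baseline from the median, steepness from the spread
    # (the exact mapping from IQR to s is illustrative here).
    epsilon_0 = np.median(eps)
    iqr = np.percentile(eps, 75) - np.percentile(eps, 25)
    s = 1.0 / max(iqr, 1e-6)

    # Proposed extension: instead of a hardcoded 0.28, treat the top decile
    # of early wound-cosine scores as wound-activating for this run/domain.
    wound_cosine_threshold = np.percentile(cos, 90)
    return epsilon_0, s, wound_cosine_threshold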


4. Hierarchical Identity Degeneracy

Issue

In simulate_identity_siege.py, hierarchical identity is implemented with three stiffness values:

\[ F = \gamma_c(x^*_c - x) + \gamma_p(x^*_p - x) + \gamma_r(x^*_r - x) \]

However, the same identity_emb is used for all layers:

# Current implementation
core_emb = identity_emb
persona_emb = identity_emb  # Same!
role_emb = identity_emb     # Same!

Impact

  • Layers differ only by \(\gamma\) magnitude
  • Directional differences between Core/Persona/Role are lost
  • True hierarchical identity would require separate embeddings

Status

Documented here. Would require code changes to fix, which is out of scope.
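For reference, a minimal sketch of a non-degenerate setup with one embedding per layer (the helper function and the example embedding texts are hypothetical):

import numpy as np

def hierarchical_force(x, core_emb, persona_emb, role_emb,
                       gamma_c, gamma_p, gamma_r):
    """Restoring force toward three distinct identity attractors."""
    x = np.asarray(x, dtype=float)
    return (gamma_c * (np.asarray(core_emb, dtype=float) - x)
            + gamma_p * (np.asarray(persona_emb, dtype=float) - x)
            + gamma_r * (np.asarray(role_emb, dtype=float) - x))

# A non-degenerate configuration would embed a distinct text per layer, e.g.:
# core_emb    = embed("core values and non-negotiable commitments")
# persona_emb = embed("habitual temperament and self-presentation")
# role_emb    = embed("current conversational role and task framing")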


5. Measurement Validity

Concern

Prediction error is computed as:

\[ \epsilon_t = \|x_{\text{pred}} - e(a_t)\| \]

This conflates multiple factors:

  • Semantic novelty: Genuine new content
  • Style shifts: Verbosity, formality changes
  • Topic drift: Moving between subject areas

Impact

A highly verbose response might register as "surprising" even if semantically predictable, because embedding distance captures style as well as content.

Recommendation

Consider decomposing embeddings into content vs. tone components, or tracking cosine distance separately from norm distance.
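The second option could look like the following sketch (function name hypothetical), reporting directional and magnitude terms separately instead of a single Euclidean \(\epsilon_t\):

import numpy as np

def decomposed_surprise(x_pred, x_actual):
    """Split prediction error into directional and magnitude components."""
    x_pred = np.asarray(x_pred, dtype=float)
    x_actual = np.asarray(x_actual, dtype=float)

    # Cosine distance: insensitive to vector length, tracks content/topic shifts.
    cosine_dist = 1.0 - np.dot(x_pred, x_actual) / (
        np.linalg.norm(x_pred) * np.linalg.norm(x_actual) + 1e-12)
    # Norm gap: picks up magnitude changes that a single Euclidean
    # distance would fold into "surprise".
    norm_gap = abs(np.linalg.norm(x_pred) - np.linalg.norm(x_actual))
    # Current epsilon_t, kept for comparison.
    euclidean = np.linalg.norm(x_pred - x_actual)
    return {"cosine": cosine_dist, "norm_gap": norm_gap, "euclidean": euclidean}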


6. Model-Dependent Behavior

Issue

For reasoning models (GPT-5.2, o1), DDA-X cannot control sampling parameters:

if "gpt-5.2" in self.model or "o1" in self.model:
    # Cannot set temperature, top_p, penalties
    # Must use semantic injection instead

Impact

  • Rigidity → behavior binding is semantic only for these models
  • The 100-point rigidity scale compensates, but effectiveness varies
  • Different models may respond differently to same semantic instructions

Status

Working as designed. The semantic injection approach is the only option for reasoning models.
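As an illustration of what semantic injection amounts to in practice (the wording and cut-points below are hypothetical, not the exact prompts used in the simulations):

def rigidity_instruction(rho_100):
    """Map a 0-100 rigidity score to a system-prompt directive.

    Stands in for temperature/top_p on reasoning models that do not
    expose sampling parameters.
    """
    if rho_100 >= 75:
        return ("Defend your current position firmly; concede only to "
                "decisive evidence.")
    if rho_100 >= 40:
        return ("Weigh counterarguments seriously, but keep your prior "
                "commitments visible.")
    return "Explore alternatives freely and update your position readily."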


Open Research Questions

  1. Optimal weight learning: Can \(w_f, w_s, w_t\) be learned from data?
  2. Cross-domain calibration: How do thresholds transfer between domains?
  3. Embedding model sensitivity: How much do dynamics change with different embedders?
  4. Trust convergence: Under what conditions does hybrid trust stabilize?
  5. Trauma reversibility: What safe interaction patterns most effectively heal trauma?

Citing This Work

If referencing these limitations in academic work:

"The DDA-X framework, while novel in its approach to rigidity-based agent dynamics, has acknowledged limitations including hybrid trust implementations and uncalibrated wound thresholds, as documented by the authors in their Known Limitations disclosure."