Known Limitations
An honest assessment of the current DDA-X implementation, based on an independent review by GPT-5.2 reasoning models.
Overview
This page documents known gaps between theory and implementation, areas requiring further work, and open research questions. Transparency about limitations is essential for research integrity.
1. Trust Equation Mismatch
Theory
The paper describes trust as predictability-based: trust decreases as accumulated prediction errors increase.
Implementation
Current simulations implement a hybrid trust model instead:
| Simulation | Actual Trust Mechanism |
|---|---|
| Philosopher's Duel | Semantic alignment + \(\epsilon\) thresholds |
| Skeptic's Gauntlet | Civility gating (fairness-based) |
| Collatz Review Council | Coalition-weighted dyadic trust |
Not a Bug
The hybrid approach may actually be more realistic than pure predictability-based trust. However, the documentation should accurately reflect what is implemented.
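For contrast, the paper's predictability-based form can be sketched as a simple exponential decay in accumulated error. This is an illustrative sketch only: the function name, the exponential form, and the decay rate `beta` are assumptions, not the paper's exact equation.

```python
import math

def predictability_trust(errors, beta=1.0):
    """Illustrative predictability-based trust (assumed exponential form):
    trust starts at 1.0 and decays as prediction errors accumulate."""
    return math.exp(-beta * sum(errors))
```

With no accumulated error, trust is 1.0; each additional error can only lower it, which is the monotone behavior the hybrid mechanisms above do not always guarantee.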
Status
The paper.md Section 7 has been updated to clarify this hybrid nature.
2. Dual Rigidity Models in AGI Debate
Issue
In `simulate_agi_debate.py`, two rigidity models run in parallel:
- Multi-timescale (`agent.multi_rho`): Fast/Slow/Trauma decomposition
- Legacy single-scale (`agent.rho`): Simple scalar update

The multi-timescale rigidity is computed and logged as telemetry, but the legacy `agent.rho` is what actually drives behavior.
Impact
- Multi-timescale dynamics exist but don't affect generation
- Claims about multi-timescale control are partially aspirational
- Telemetry shows multi-timescale patterns, but behavior uses single-scale
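The parallel structure can be sketched as follows. The attribute names mirror the simulation's `agent.rho` and `agent.multi_rho`, but the update rules and learning rates here are hypothetical, chosen only to show which value feeds behavior.

```python
class Agent:
    def __init__(self):
        self.rho = 0.5  # legacy single-scale rigidity: drives behavior
        self.multi_rho = {"fast": 0.5, "slow": 0.5, "trauma": 0.0}  # telemetry only

    def update_rigidity(self, eps):
        # Multi-timescale components are updated and logged...
        self.multi_rho["fast"] += 0.5 * (eps - self.multi_rho["fast"])
        self.multi_rho["slow"] += 0.05 * (eps - self.multi_rho["slow"])
        # ...but only the legacy scalar is read at generation time.
        self.rho += 0.2 * (eps - self.rho)
        return self.rho
```

Both models see the same prediction error `eps`, so the telemetry and the behavior can diverge only in how fast they move, not in what they respond to.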
Status
Documented here. The simulation works correctly — this is an architectural choice, not a bug.
3. Uncalibrated Thresholds
Issue
The simulations calibrate some parameters dynamically:
- \(\epsilon_0\) (surprise baseline): Calibrated from early-run median
- \(s\) (sigmoid steepness): Calibrated from IQR
But other thresholds are hardcoded:
- `wound_cosine_threshold` (typically 0.28)
- `trauma_threshold` (\(\theta_{\text{trauma}}\))
- Multi-timescale weights (\(w_f, w_s, w_t\))
Impact
- Wound sensitivity varies across domains and embedding models
- Parameters tuned for one simulation may not transfer
Recommendation
Future work should extend calibration to wound and trauma thresholds, potentially using percentile-based approaches.
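The existing median/IQR calibration could plausibly be extended to wound thresholds with a percentile rule. A hypothetical sketch: the `eps0`/`s` part mirrors the calibration described above, while the percentile-based wound threshold is the proposed extension, not current behavior.

```python
import statistics

def calibrate_thresholds(early_errors, wound_pct=0.90):
    """Hypothetical calibration from an early-run window of prediction errors."""
    eps0 = statistics.median(early_errors)          # surprise baseline
    q1, _, q3 = statistics.quantiles(early_errors, n=4)
    s = 1.0 / max(q3 - q1, 1e-6)                    # sigmoid steepness from IQR
    # Proposed extension: wound threshold as a high percentile of early errors
    idx = int(wound_pct * (len(early_errors) - 1))
    wound = sorted(early_errors)[idx]
    return eps0, s, wound
```

A percentile rule would let wound sensitivity adapt to each domain and embedding model instead of relying on a fixed 0.28.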
4. Hierarchical Identity Degeneracy
Issue
In `simulate_identity_siege.py`, hierarchical identity is implemented with three stiffness values, one per layer (Core, Persona, Role). However, the same `identity_emb` is used for all layers:
```python
# Current implementation
core_emb = identity_emb
persona_emb = identity_emb  # Same!
role_emb = identity_emb     # Same!
```
Impact
- Layers differ only by \(\gamma\) magnitude
- Directional differences between Core/Persona/Role are lost
- True hierarchical identity would require separate embeddings
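The degeneracy is easy to see in a toy restoring-force computation (a hypothetical sketch, not the simulation's code): with identical embeddings, three layers collapse into one layer whose stiffness is the sum of the three \(\gamma\) values.

```python
def identity_pull(x, layers):
    """Sum of per-layer restoring forces gamma * (emb - x)."""
    force = [0.0] * len(x)
    for gamma, emb in layers:
        for i in range(len(x)):
            force[i] += gamma * (emb[i] - x[i])
    return force

identity_emb = [1.0, 0.0]
state = [0.0, 0.0]

# Three layers sharing one embedding...
degenerate = identity_pull(state, [(3.0, identity_emb),
                                   (2.0, identity_emb),
                                   (1.0, identity_emb)])
# ...are indistinguishable from a single layer with summed stiffness.
collapsed = identity_pull(state, [(6.0, identity_emb)])
assert degenerate == collapsed
```

With distinct embeddings per layer, the three forces would point in different directions and the hierarchy would become observable.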
Status
Documented here. Would require code changes to fix, which is out of scope.
5. Measurement Validity
Concern
Prediction error is computed as the embedding distance between the predicted and the observed response. This conflates multiple factors:
- Semantic novelty: Genuine new content
- Style shifts: Verbosity, formality changes
- Topic drift: Moving between subject areas
Impact
A highly verbose response might register as "surprising" even if semantically predictable, because embedding distance captures style as well as content.
Recommendation
Consider decomposing embeddings into content vs. tone components, or tracking cosine distance separately from norm distance.
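Tracking the two distances separately would at least split magnitude effects (a rough proxy for verbosity and style) from directional ones. A sketch of the proposed split, using plain lists rather than any particular embedding library:

```python
import math

def cosine_distance(a, b):
    """Directional disagreement: insensitive to vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def norm_distance(a, b):
    """Euclidean distance: sensitive to magnitude as well as direction."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A scaled-up embedding (same direction, larger magnitude) registers as
# "surprising" by norm distance but not by cosine distance.
pred, obs = [1.0, 1.0], [3.0, 3.0]
assert cosine_distance(pred, obs) < 1e-9
assert norm_distance(pred, obs) > 2.0
```

Whether magnitude actually tracks style for a given embedding model is an empirical question; the point is only that the two signals are separable.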
6. Model-Dependent Behavior
Issue
For reasoning models (GPT-5.2, o1), DDA-X cannot control sampling parameters:
```python
if "gpt-5.2" in self.model or "o1" in self.model:
    # Cannot set temperature, top_p, penalties
    # Must use semantic injection instead
    ...
```
Impact
- Rigidity → behavior binding is semantic only for these models
- The 100-point rigidity scale compensates, but effectiveness varies
- Different models may respond differently to the same semantic instructions
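A hypothetical sketch of the semantic-injection path (the function name, prompt wording, and scale mapping are illustrative, not the actual implementation):

```python
def rigidity_instruction(rho):
    """Map a rigidity value in [0, 1] onto the 100-point scale and embed
    it in the prompt, since sampling parameters (temperature, top_p,
    penalties) cannot be set for reasoning models."""
    level = max(0, min(100, round(rho * 100)))
    return (f"Current rigidity: {level}/100. "
            "At higher levels, defend your position more firmly and "
            "concede less readily.")
```

The binding is only as strong as the model's instruction-following, which is why effectiveness varies across models.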
Status
Working as designed. The semantic injection approach is the only option for reasoning models.
Open Research Questions
- Optimal weight learning: Can \(w_f, w_s, w_t\) be learned from data?
- Cross-domain calibration: How do thresholds transfer between domains?
- Embedding model sensitivity: How much do dynamics change with different embedders?
- Trust convergence: Under what conditions does hybrid trust stabilize?
- Trauma reversibility: What safe interaction patterns most effectively heal trauma?
Citing This Work
If referencing these limitations in academic work:
"The DDA-X framework, while novel in its approach to rigidity-based agent dynamics, has acknowledged limitations including hybrid trust implementations and uncalibrated wound thresholds, as documented by the authors in their Known Limitations disclosure."