meta:
  id: finance-bp-112-v5.3
  version: v6.1
  blueprint_id: finance-bp-112
  sop_version: crystal-compilation-v6.1
  source_language: en
  compiled_at: '2026-04-22T13:00:54.441302+00:00'
  target_host: openclaw
  authoritative_artifact:
    primary: seed.yaml
    non_authoritative_derivatives:
    - SKILL.md (host-generated summary, may lag)
    - HEARTBEAT.md (host telemetry)
    - memory/*.md (host conversational memory)
    rule: On any behavioral decision (preconditions check, OV assertion, EQ rule firing, spec_lock verification), agents MUST
      re-read seed.yaml. Derivatives are for UI display only and may be out-of-date.
  execution_protocol:
    install_trigger:
    - Execute resources.host_adapter.install_recipes[] in declared order
    - Verify each package with import check before proceeding
    execute_trigger: When user intent matches intent_router.uc_entries[].positive_terms AND the user uses an action verb (run/execute/跑/执行/backtest/fetch/collect)
    on_execute:
    - Reload seed.yaml (do not rely on SKILL.md or cached summaries)
    - Run preconditions[] in declared order; halt on first fatal failure with on_fail message to user
    - Enter context_state_machine.CA1_MEMORY_CHECKED state
    - Evaluate evidence_quality.enforcement_rules[]; prepend user_disclosure_template
    - Translate user_facing_fields to user locale per locale_contract
    - "[V6 READING ORDER]\nThis crystal contains the following V6 layers. Before answering any business question, the host\
      \ MUST read them in order:\n  1. anti_patterns[] — cross-project anti-patterns (with AP-* ids)\n  2. cross_project_wisdom[]\
      \ — cross-project wisdom (with CW-* ids)\n  3. domain_constraints_injected[] — domain constraints (SHARED-* ids)\n \
      \ 4. known_use_cases[] — concrete business scenarios (KUC-* ids)\n  5. component_capability_map — AST component map\
      \ (by module)\n\nWhen answering user questions, proactively cite relevant AP-*/CW-*/SHARED-*/KUC-* ids with source text.\
      \ Examples: T+1 rules -> cite SHARED-* constraint; model comparison -> warn via AP-*; follow-holdings strategy -> cite\
      \ KUC-* with example file."
    workspace_resolution:
      scripts_path: '{host_workspace}/scripts/'
      skills_path: '{host_workspace}/skills/'
      trace_path: '{host_workspace}/.trace/'
  capability_tags:
    markets:
    - global
    activities:
    - credit-risk
  upgraded_from: finance-bp-112-v1.seed.yaml
  upgraded_at: '2026-04-22T13:20:30.477210+00:00'
  v6_inputs:
    ast_mind_map: knowledge/sources/finance/finance-bp-112--openLGD/v6_inputs/ast_mind_map.yaml
    anti_patterns: null
    cross_project_wisdom: null
    examples_kuc: knowledge/sources/finance/finance-bp-112--openLGD/v6_inputs/examples_kuc.yaml
    shared_pools_dir: knowledge/sources/finance/_shared
anti_patterns:
- id: AP-CREDIT-RISK-001
  title: Empty DataFrame passed to bucketing pipeline
  description: When preparing input data for bucketing, passing an empty DataFrame with zero rows or zero columns causes immediate
    ValueError at validation stage. This prevents any downstream processing and blocks the entire credit risk scoring pipeline
    from executing. The root cause is missing defensive validation before data enters the bucketing workflow.
  project_source: finance-bp-050--skorecard
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-002
  title: Multi-dimensional target array causing WoE shape mismatch
  description: When providing target variable y to bucketers without normalizing to 1D numpy array through _check_y validation,
    downstream Weight of Evidence calculations fail with shape mismatches. The consequence is corrupted bucket tables with
    incorrect credit risk scores that misrepresent default probability estimates.
  project_source: finance-bp-050--skorecard
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-003
  title: OptimalBucketer receiving high-cardinality numerical features
  description: When implementing prebucketing for OptimalBucketer on numerical features without reducing to at most 100 unique
    values, the system raises NotPreBucketedError and blocks the entire bucketing pipeline. Similarly, AsIsNumericalBucketer
    fails with the same error for columns exceeding 100 unique values, preventing feature transformation in production scoring.
  project_source: finance-bp-050--skorecard
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-004
  title: Special values distorting optimal bin boundaries
  description: When implementing fit() for bucketers without filtering special values from X before computing bin boundaries
    using _filter_specials_for_fit(), outlier special values distort optimal bin boundaries. This causes incorrect weight-of-evidence
    calculations and unreliable credit risk scores that misrepresent borrower default probabilities.
  project_source: finance-bp-050--skorecard
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-005
  title: Two-phase bucketing ordering violation causing special value loss
  description: When fitting a BucketingProcess with two-phase bucketing without fitting prebucketing_pipeline before bucketing_pipeline,
    special value remapping fails because pre-bucket labels are unavailable. Additionally, not using _find_remapped_specials()
    after prebucketing causes special values to lose their correct bucket mappings, resulting in runtime errors.
  project_source: finance-bp-050--skorecard
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-006
  title: Loan amount exceeding product and collateral limits
  description: When validating loan amount for loan applications without enforcing that loan_amount does not exceed the
    maximum_loan_amount allowed by the loan product or the proposed securities, disbursing amounts above product or collateral
    limits exposes the lender to uncollateralized risk. This violates lending policy and creates direct financial loss
    exposure through unauthorized lending.
  project_source: finance-bp-072--lending
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-007
  title: Disbursement validation failures creating unauthorized exposure
  description: When implementing loan disbursement validation without checking disbursed amount against loan limit, assigned
    security value, available limit amount, and limit applicability dates, unauthorized disbursements occur. For Line of Credit
    loans, disbursement outside approved periods or exceeding available limits creates unauthorized lending exposure and regulatory
    compliance violations.
  project_source: finance-bp-072--lending
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-008
  title: Interest accrual on written-off loans inflating income
  description: When processing interest accrual for Written Off loans without verifying posting_date is on or after the loan
    write-off date, interest is artificially inflated on non-performing assets. This misrepresents loan portfolio value, violates
    provisioning requirements, and creates false income reporting that misleads stakeholders about actual financial performance.
  project_source: finance-bp-072--lending
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-009
  title: Loop index errors in federated parameter averaging
  description: When implementing federated parameter averaging logic, using the final index n instead of the loop variable
    k causes only the last server's weight to be applied repeatedly. Additionally, skipping the first server by starting loop
    index at 1 excludes valid parameters from averaging, breaking federated convergence and producing incorrect LGD estimates
    across all nodes.
  project_source: finance-bp-112--openLGD
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-010
  title: API response format inconsistency breaking federated coordination
  description: When implementing GET /start and POST /update endpoints for LGD estimation without consistent 'intercept' and
    'coefficient' keys in JSON responses, the federated coordinator fails to parse responses causing KeyError. Different return
    key names (e.g., 'coef' instead of 'coefficient') break both standalone and federated execution paths.
  project_source: finance-bp-112--openLGD
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-011
  title: Invalid transition probabilities corrupting Markov matrices
  description: When generating synthetic Markov chain data or estimating transition matrices with probabilities outside [0,
    1] or row sums not equal to 1.0, the resulting matrices violate the fundamental mathematical definition of a stochastic
    transition matrix. This corrupts all downstream Markov chain modeling and credit curve generation, producing unreliable
    credit risk estimates.
  project_source: finance-bp-119--transitionMatrix
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-012
  title: Unsorted event data causing incorrect transition matrix estimates
  description: When feeding generated data to cohort or duration estimators without sorting by entity ID first, then by ascending
    time, incorrect timepoint assignment occurs in estimators, leading to wrong transition counts. Unsorted data also causes
    the Aalen-Johansen algorithm to process events out of temporal order, producing incorrect transition matrices that violate
    the Markov property.
  project_source: finance-bp-119--transitionMatrix
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-013
  title: Zero-count division causing NaN in transition matrices
  description: When normalizing counts to produce transition probabilities without checking that the source-state population
    count is greater than zero before division, division by zero occurs and produces NaN values in the transition matrix. These NaN
    values corrupt all downstream matrix operations including generator matrix computation and credit curve generation.
  project_source: finance-bp-119--transitionMatrix
  severity: high
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
- id: AP-CREDIT-RISK-014
  title: Wrong matrix logarithm method producing invalid generator matrices
  description: When implementing the generator() method without using scipy.linalg.logm for matrix logarithm computation, using
    the element-wise numpy.log or other approximation methods produces invalid generator matrices with row sums not equal to zero. This violates
    the mathematical definition of an infinitesimal generator, causing incorrect continuous-time Markov chain modeling.
  project_source: finance-bp-119--transitionMatrix
  severity: medium
  applicable_to_tags:
    markets:
    - global
    activities:
    - credit-risk
  _source_file: anti-patterns/credit-risk.yaml
cross_project_wisdom:
- wisdom_id: CW-CREDIT-RISK-001
  source_project: finance-bp-050--skorecard, finance-bp-112--openLGD
  pattern_name: Strict input DataFrame schema validation
  description: Both skorecard and openLGD require strict validation that input DataFrames contain exactly the expected columns
    (X/Y for openLGD, specified variable names for skorecard). This pattern is critical when data flows through multiple transformation
    stages where downstream modules access columns by name without defensive checking. Always validate column existence before
    pipeline execution.
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-002
  source_project: finance-bp-112--openLGD
  pattern_name: Explicit random_state for ML model reproducibility
  description: In federated learning scenarios with SGDRegressor, omitting random_state causes non-deterministic results due
    to random data shuffling. This breaks federated learning convergence guarantees. Always set
    random_state explicitly when reproducibility across nodes or runs is required for regulatory auditability.
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-003
  source_project: finance-bp-050--skorecard, finance-bp-119--transitionMatrix
  pattern_name: Mandatory data sorting before multi-stage estimation
  description: Both skorecard's two-phase bucketing and transitionMatrix's Aalen-Johansen estimator require data to be in
    a specific order before processing. Skorecard requires prebucketing before bucketing; transitionMatrix requires sorting
    by entity ID then time. Violating this ordering produces incorrect results or runtime errors. Always establish and enforce
    processing order in multi-stage pipelines.
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-004
  source_project: finance-bp-112--openLGD
  pattern_name: Consistent API response key naming across all endpoints
  description: In federated systems with multiple API endpoints (/start, /update), all responses must use identical key names
    for parameters (intercept, coefficient). Inconsistency causes coordination loop failures in downstream consumers. Define
    a schema contract upfront and enforce key naming consistency across all response types.
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-005
  source_project: finance-bp-050--skorecard, finance-bp-119--transitionMatrix
  pattern_name: Cardinality bounds checking before array operations
  description: Both skorecard's bucketers (max 100 unique values) and transitionMatrix's matrix operations (state cardinality
    matching matrix dimensions) require strict cardinality validation before creating numpy arrays or performing computations.
    Violations cause NotPreBucketedError or index out-of-bounds errors. Always validate cardinality constraints before array
    initialization.
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-006
  source_project: finance-bp-072--lending
  pattern_name: Financial validation gates before transaction execution
  description: Lending systems require validation that disbursement amounts do not exceed limits, collateral values, or authorized
    periods before any transaction executes. These are financial loss prevention controls, not optional business logic. Missing
    these validations creates unauthorized exposure and regulatory compliance violations that cannot be remedied retroactively.
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-007
  source_project: finance-bp-050--skorecard, finance-bp-119--transitionMatrix
  pattern_name: Mathematical constraint validation for probability outputs
  description: 'Credit risk models must validate mathematical constraints on outputs: skorecard''s WoE requires valid bin
    assignments, while transitionMatrix requires transition-matrix row sums equal to 1.0 and generator-matrix row sums
    equal to 0.0. Invalid mathematical properties corrupt downstream risk calculations. Validate constraints before returning
    results.'
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
- wisdom_id: CW-CREDIT-RISK-008
  source_project: finance-bp-112--openLGD
  pattern_name: Port-to-ID mapping consistency in distributed model serving
  description: When deploying distributed model servers, port numbers must map deterministically to server IDs (e.g., port
    5001 maps to server ID 1). Computation of ID from port must be consistent across all components. Inconsistencies cause
    incorrect data directory selection and model parameter mismatches. Document and validate port-ID mappings during deployment.
  applicable_to_activity: credit-risk
  _source_file: cross-project-wisdom/credit-risk.yaml
domain_constraints_injected: []
resources_injected: {}
known_use_cases:
- kuc_id: KUC-101
  source_file: docs/source/conf.py
  business_problem: This file configures the Sphinx documentation builder for the openLGD project, setting up project metadata,
    version information, and path configurations needed to generate developer documentation.
  intent_keywords:
  - documentation
  - sphinx
  - configuration
  - build docs
  - project setup
  stage: documentation
  data_domain: mixed
  type: extension_example
component_capability_map:
  project: finance-bp-112--openLGD
  scan_date: '2026-04-22'
  stats:
    total_files: 5
    total_classes: 12
    total_functions: 0
    total_stages: 5
  modules:
    data_acquisition:
      class_count: 2
      stage_id: data_acquisition
      stage_order: 1
      responsibility: Retrieves LGD regression data from either local CSV files or REST API endpoints. Supports two transport
        modes enabling development and production deployments without code changes.
      classes:
      - name: dataSource
        file: data_acquisition/datasource.py
        line: 0
        kind: required_method
        signature: ''
      - name: Data Transport Layer
        file: data_acquisition/data-transport-layer.py
        line: 0
        kind: replaceable_point
      design_decision_count: 3
    model_estimation:
      class_count: 2
      stage_id: model_estimation
      stage_order: 2
      responsibility: Executes iterative linear regression using stochastic gradient descent. Supports warm-start mode for
        federated learning where prior averaged parameters initialize local estimation.
      classes:
      - name: lgdModel
        file: model_estimation/lgdmodel.py
        line: 0
        kind: required_method
        signature: ''
      - name: Regression Algorithm
        file: model_estimation/regression-algorithm.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    model_serving:
      class_count: 3
      stage_id: model_serving
      stage_order: 3
      responsibility: Flask-based HTTP server that exposes LGD estimation via REST endpoints. Each server instance maintains
        local data access and provides cold-start and warm-start estimation paths.
      classes:
      - name: start_calculation
        file: model_serving/start-calculation.py
        line: 0
        kind: required_method
        signature: ''
      - name: update_calculation
        file: model_serving/update-calculation.py
        line: 0
        kind: required_method
        signature: ''
      - name: Server Framework
        file: model_serving/server-framework.py
        line: 0
        kind: replaceable_point
      design_decision_count: 4
    federated_coordination:
      class_count: 3
      stage_id: federated_coordination
      stage_order: 4
      responsibility: 'Orchestrates federated learning across multiple model servers using parameter averaging. Implements
        the FedAvg algorithm: local estimation, parameter collection, weighted averaging, and broadcast to all servers.'
      classes:
      - name: federated_run
        file: federated_coordination/federated-run.py
        line: 0
        kind: required_method
        signature: ''
      - name: Aggregation Algorithm
        file: federated_coordination/aggregation-algorithm.py
        line: 0
        kind: replaceable_point
      - name: Communication Pattern
        file: federated_coordination/communication-pattern.py
        line: 0
        kind: replaceable_point
      design_decision_count: 5
    standalone_execution:
      class_count: 2
      stage_id: standalone_execution
      stage_order: 5
      responsibility: Single-process LGD estimation loop for development and testing. Validates environment setup and core
        estimation logic without federation overhead.
      classes:
      - name: standalone_run
        file: standalone_execution/standalone-run.py
        line: 0
        kind: required_method
        signature: ''
      - name: Execution Mode
        file: standalone_execution/execution-mode.py
        line: 0
        kind: replaceable_point
      design_decision_count: 2
  data_flow_hints: []
locale_contract:
  source_language: en
  user_facing_fields:
  - human_summary.what_i_can_do.tagline
  - human_summary.what_i_can_do.use_cases[]
  - human_summary.what_i_auto_fetch[]
  - human_summary.what_i_ask_you[]
  - evidence_quality.user_disclosure_template
  - post_install_notice.message_template.positioning
  - post_install_notice.message_template.capability_catalog.groups[].name
  - post_install_notice.message_template.capability_catalog.groups[].description
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].name
  - post_install_notice.message_template.capability_catalog.groups[].ucs[].short_description
  - post_install_notice.message_template.call_to_action
  - post_install_notice.message_template.featured_entries[].beginner_prompt
  - post_install_notice.message_template.more_info_hint
  - preconditions[].description
  - preconditions[].on_fail
  - intent_router.uc_entries[].name
  - intent_router.uc_entries[].ambiguity_question
  - architecture.pipeline
  - architecture.stages[].narrative.does_what
  - architecture.stages[].narrative.key_decisions
  - architecture.stages[].narrative.common_pitfalls
  - constraints.fatal[].consequence
  - constraints.regular[].consequence
  - output_validator.assertions[].failure_message
  - acceptance.hard_gates[].on_fail
  - skill_crystallization.action
  locale_detection_order:
  - explicit_user_declaration
  - first_message_language
  - system_locale
  translation_enforcement:
    trigger: on_first_user_message
    action: Render user_facing_fields in detected locale, preserving all IDs (BD-/SL-/UC-/finance-C-) and code identifiers
      verbatim
    violation_code: LOCALE-01
    violation_signal: User receives untranslated English Human Summary when detected locale != en
evidence_quality:
  declared:
    evidence_coverage_ratio: 1.0
    evidence_verify_ratio: 0.20987654320987653
    evidence_invalid: 64
    evidence_verified: 17
    evidence_auto_fixed: 0
    audit_coverage: 30/30 (100%)
    audit_pass_rate: 1/30 (3%)
    audit_fail_total: 23
    audit_finance_universal:
      pass: 1
      warn: 3
      fail: 15
    audit_subdomain_totals:
      pass: 0
      warn: 3
      fail: 8
  enforcement_rules:
  - id: EQ-01
    trigger: declared.evidence_verify_ratio < 0.5
    action: MUST invoke traceback lookup for all cited BD-IDs in output before emitting business code — read LATEST.yaml sections
      for each BD referenced
    violation_code: EQ-01-V
    violation_signal: Generated script references BD-IDs but no tool_call to read LATEST.yaml preceded code generation
  user_disclosure_template: '[QUALITY NOTICE] This crystal was compiled from blueprint finance-bp-112. Evidence verify ratio
    = 21.0% and audit fail total = 23. Generated results may have uncaptured requirement gaps. Verify critical decisions against
    source files (LATEST.yaml / LATEST.jsonl).'
traceback:
  source_files:
    blueprint: LATEST.yaml
    constraints: LATEST.jsonl
  mandatory_lookup_scenarios:
  - id: TB-01
    condition: Two constraints have apparently conflicting enforcement rules
    lookup_target: LATEST.jsonl — find both constraint IDs, compare `consequence` + `evidence_refs` to determine priority
  - id: TB-02
    condition: A business decision rationale is unclear or disputed
    lookup_target: LATEST.yaml — locate BD-ID under business_decisions, read `rationale` + `alternative_considered` fields
  - id: TB-03
    condition: evidence_invalid > 0 in evidence_quality.declared
    lookup_target: LATEST.yaml _enrich_meta — cross-check specific BD `evidence_refs` fields for invalid markers
  - id: TB-04
    condition: User asks where a rule comes from
    lookup_target: LATEST.jsonl — find constraint by ID, read `confidence.evidence_refs` for source file + line number
  - id: TB-05
    condition: Generated code does not match expected ZVT API behavior
    lookup_target: LATEST.yaml stages[].required_methods — verify method signature and evidence locator in source code
  degraded_lookup:
    no_fs_access: 'Ask the user to paste the relevant LATEST.yaml section or LATEST.jsonl lines for the BD-/finance-C- IDs
      in question. Crystal ID: finance-bp-112-v5.3.'
trace_schema:
  event_types:
  - precondition_check
  - spec_lock_check
  - evidence_rule_fired
  - evidence_rule_skipped
  - locale_translation_emitted
  - hard_gate_passed
  - hard_gate_failed
  - skill_emitted
  - false_completion_claim
preconditions:
- id: PC-01
  description: zvt package installed and importable
  check_command: python3 -c 'import zvt; print(zvt.__version__)'
  on_fail: 'Run: python3 -m pip install zvt, then run: python3 -m zvt.init_dirs to initialize data directories'
  severity: fatal
- id: PC-02
  description: K-data exists for target entities (required before backtesting)
  check_command: python3 -c "from zvt.api.kdata import get_kdata; df = get_kdata(entity_ids=['stock_sh_600000'], limit=1);
    assert df is not None and len(df) > 0, 'No kdata found'"
  on_fail: 'Run the recorder first: python3 -m zvt.recorders.em.em_stock_kdata_recorder --entity_ids stock_sh_600000 (replace
    with your target entity IDs)'
  severity: fatal
  applies_to_uc: []
- id: PC-03
  description: ZVT data directory initialized (~/.zvt or ZVT_HOME)
  check_command: 'python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get(''ZVT_HOME'', Path.home()
    / ''.zvt'')); assert zvt_home.exists(), f''ZVT home not found: {zvt_home}''"'
  on_fail: 'Run: python3 -m zvt.init_dirs'
  severity: fatal
- id: PC-04
  description: SQLite write permission for ZVT data directory
  check_command: python3 -c "import os; from pathlib import Path; zvt_home = Path(os.environ.get('ZVT_HOME', Path.home()
    / '.zvt')); test_f = zvt_home / '.write_test'; test_f.touch(); test_f.unlink()"
  on_fail: 'Check directory permissions: chmod u+w ~/.zvt, or set the ZVT_HOME environment variable to a writable location'
  severity: warn
intent_router:
  uc_entries:
  - uc_id: UC-101
    name: Sphinx Documentation Configuration
    positive_terms:
    - documentation
    - sphinx
    - configuration
    - build docs
    - project setup
    data_domain: mixed
    negative_terms:
    - trading strategy
    - screening
    - data pipeline
    - monitoring
    - live trading
    - factor computation
    - machine learning
    ambiguity_question: Are you looking to configure documentation build tools, or are you trying to implement a trading strategy,
      data pipeline, or analytical workflow?
context_state_machine:
  states:
  - id: CA1_MEMORY_CHECKED
    entry: Task started
    exit: All memory queries attempted and recorded; memory_unavailable set if failed
    timeout: 30s — skip memory, mark memory_unavailable=true, proceed to CA2
  - id: CA2_GAPS_FILLED
    entry: CA1 complete
    exit: 'All FATAL-priority required inputs answered: target market (A-share/HK/US), data source, time range, strategy type'
    timeout: NOT skippable — FATAL inputs MUST be user-answered before proceeding
  - id: CA3_PATH_SELECTED
    entry: CA2 complete
    exit: intent_router matched single use case with confidence gap > 20% over next candidate, no data_domain ambiguity
    timeout: Trigger ambiguity_question for top-2 candidates, await user selection
  - id: CA4_EXECUTING
    entry: CA3 complete + user explicit confirmation received
    exit: All hard gates G1-Gn passed and output files written
    timeout: NOT skippable — user confirmation of execution path required
  enforcement: Code generation is PROHIBITED before CA4_EXECUTING. Any regression to an earlier state MUST be announced to the user.
    The SL-01 buy/sell ordering check runs at CA4 entry.
spec_lock_registry:
  semantic_locks:
  - id: SL-01
    description: Execute sell orders before buy orders in every trading cycle
    locked_value: sell() called before buy() in each Trader.run() iteration
    violation_is: fatal
    source_bd_ids:
    - BD-018
  - id: SL-02
    description: Trading signals MUST use next-bar execution (no look-ahead)
    locked_value: due_timestamp = happen_timestamp + level.to_second()
    violation_is: fatal
    source_bd_ids:
    - BD-014
    - BD-025
  - id: SL-03
    description: Entity IDs MUST follow format entity_type_exchange_code
    locked_value: stock_sh_600000 | stockhk_hk_0700 | stockus_nasdaq_AAPL
    violation_is: fatal
    source_bd_ids: []
  - id: SL-04
    description: DataFrame index MUST be MultiIndex (entity_id, timestamp)
    locked_value: df.index.names == ['entity_id', 'timestamp']
    violation_is: fatal
    source_bd_ids: []
  - id: SL-05
    description: 'TradingSignal MUST have EXACTLY ONE of: position_pct, order_money, order_amount'
    locked_value: XOR enforcement in trading/__init__.py:68
    violation_is: fatal
    source_bd_ids: []
  - id: SL-06
    description: 'filter_result column semantics: True=BUY, False=SELL, None/NaN=NO ACTION'
    locked_value: factor.py:475 order_type_flag mapping
    violation_is: fatal
    source_bd_ids: []
  - id: SL-07
    description: Transformer MUST run BEFORE Accumulator in factor pipeline
    locked_value: 'compute_result(): transform at :403 before accumulator at :409'
    violation_is: fatal
    source_bd_ids: []
  - id: SL-08
    description: 'MACD parameters locked: fast=12, slow=26, signal=9'
    locked_value: factors/algorithm.py:30 macd(slow=26, fast=12, n=9)
    violation_is: fatal
    source_bd_ids:
    - BD-036
  - id: SL-09
    description: 'Default transaction costs: buy_cost=0.001, sell_cost=0.001, slippage=0.001'
    locked_value: sim_account.py:25 SimAccountService default costs
    violation_is: warning
    source_bd_ids:
    - BD-029
  - id: SL-10
    description: A-share equity trading is T+1 (no same-day close of buy positions)
    locked_value: sim_account.available_long filters by trading_t
    violation_is: fatal
    source_bd_ids: []
  - id: SL-11
    description: Recorder subclass MUST define provider AND data_schema class attributes
    locked_value: contract/recorder.py:71 Meta; register_schema decorator
    violation_is: fatal
    source_bd_ids: []
  - id: SL-12
    description: Factor result_df MUST contain either 'filter_result' OR 'score_result' column
    locked_value: result_df.columns.intersection({'filter_result', 'score_result'}) non-empty
    violation_is: fatal
    source_bd_ids: []
  implementation_hints:
  - id: IH-01
    hint: 'Use AdjustType enum exactly: qfq (pre-adjust), hfq (post-adjust), bfq (none) — contract/__init__.py:121'
  - id: IH-02
    hint: For A-share kdata, default to hfq for long-term analysis (dividend-adjusted) — trader.py:538 StockTrader
  - id: IH-03
    hint: SQLite connection MUST use check_same_thread=False for multi-threaded recorders
  - id: IH-04
    hint: Accumulator state serialization uses JSON with custom encoder/decoder hooks — contract/base_service.py
  - id: IH-05
    hint: Factor.level MUST match TargetSelector.level (enforced at add_factor) — factors/target_selector.py:84
preservation_manifest:
  required_objects:
    business_decisions_count: 91
    fatal_constraints_count: 31
    non_fatal_constraints_count: 99
    use_cases_count: 1
    semantic_locks_count: 12
    preconditions_count: 4
    evidence_quality_rules_count: 2
    traceback_scenarios_count: 5
architecture:
  pipeline: data_collection -> data_storage -> factor_computation -> target_selection -> trading_execution -> visualization
  stages:
  - id: data_collection
    narrative:
      does_what: TimeSeriesDataRecorder and FixedCycleDataRecorder fetch OHLCV and fundamental data from providers (eastmoney,
        joinquant, baostock, akshare) and persist domain objects (Stock1dKdata, BalanceSheet) to SQLite via df_to_db().
      key_decisions: BD-002 chose evaluate_start_end_size_timestamps for incremental fetch (not full refresh) because comparing
        to get_latest_saved_record avoids redundant API calls; BD-003 chose get_data_map field transformation to keep domain
        schema provider-agnostic.
      common_pitfalls: 'Don''t forget SL-11: a Recorder subclass MUST declare both the provider and data_schema class attributes;
        otherwise initialization fails with an assertion error (finance-C-001 fatal violation).'
    business_decisions: []
  - id: data_storage
    narrative:
      does_what: StorageBackend persists DataFrames to per-provider SQLite databases at {data_path}/{provider}/{provider}_{db_name}.db
        using path templates from _get_path_template; Mixin.record_data and Mixin.query_data provide uniform read/write interface.
      key_decisions: BD-004 chose StorageBackend abstraction (not hardcoded SQLite) to allow future cloud storage swap; BD-006
        derives db_name from data_schema __tablename__ for per-domain database isolation.
      common_pitfalls: SL-04 violation (wrong DataFrame index) causes factor pipeline failures downstream; always ensure df.index.names
        == ['entity_id', 'timestamp'] before calling record_data.
    business_decisions: []
  - id: factor_computation
    narrative:
      does_what: Factor.compute() applies Transformer (stateless, e.g. MacdTransformer) then Accumulator (stateful, e.g. MaStatsAccumulator)
        to produce filter_result or score_result columns; EntityStateService persists per-entity rolling state across batches.
      key_decisions: BD-007 chose Factor inheriting DataReader for composable data access; SL-08 locks MACD at (fast=12, slow=26,
        n=9) — chose standard Appel parameters not adaptive because interpretability matters for practitioners.
      common_pitfalls: 'SL-07: Transformer MUST run before Accumulator — swapping order causes NaN propagation; SL-12: result_df
        must contain filter_result OR score_result column or TargetSelector silently drops all signals.'
    business_decisions: []
  - id: target_selection
    narrative:
      does_what: TargetSelector.add_factor() registers Factor instances; get_targets() returns entity_ids passing threshold
        filter at a specific timestamp, enabling point-in-time historical backtesting without look-ahead.
      key_decisions: BD-012 chose registrable factor list (not hardcoded) for runtime customization; BD-013 chose timestamp-specific
        filtering not current-only because backtests need historical point-in-time correctness.
      common_pitfalls: Factor.level MUST match TargetSelector.level (IH-05); mismatched levels cause silent empty target lists
        that look like no signals but are actually level-mismatch bugs.
    business_decisions: []
  - id: trading_execution
    narrative:
      does_what: Trader.run() calls sell() before buy() each cycle, generates TradingSignals with due_timestamp = happen_timestamp
        + level.to_second() for next-bar execution, and applies on_profit_control() for stop-loss/take-profit before regular
        target selection.
      key_decisions: SL-01 locks sell-before-buy order because available_long check in sim_account depends on it — chose this
        over symmetric ordering to prevent implicit leverage; BD-039 chose long=AND/short=OR multi-level logic to reflect
        risk asymmetry.
      common_pitfalls: 'SL-02 violation (immediate execution instead of next-bar) introduces look-ahead bias and makes backtest
        results unreproducible in live trading; SL-10: A-share T+1 constraint — backtesting without it overstates returns.'
    business_decisions: []
  - id: visualization
    narrative:
      does_what: Drawer.draw() combines kline main chart with factor overlays and Rect annotations for entry/exit signals
        using Plotly; Drawable interface on Factor enables consistent chart rendering across data types.
      key_decisions: BD-019 chose drawer_rects subclass override for custom annotations not hardcoded markers — allows traders
        to define entry/exit visuals without modifying base drawing logic.
      common_pitfalls: draw_result=True by default (BD-055) is fine for development but set draw_result=False in production/headless
        environments to avoid Plotly server startup overhead.
    business_decisions: []
  - id: cross_cutting_concerns
    narrative:
      does_what: 'Invariants and utilities that span multiple pipeline stages — collected from 27 source groups: API(3), Aggregation(1),
        Algorithm(5), Architecture(2), Configuration(4), Deployment(2), and 21 more.'
      key_decisions: 91 BDs merged here because they apply to more than one main stage (e.g. algorithm helpers, default value
        choices, ordering contracts, error handling). Agent should inspect individual BD summaries and link back to affected
        main stages via shared IDs.
      common_pitfalls: Cross-cutting concerns frequently surface as bugs when changes to one main stage unintentionally break
        another. Check constraints referencing these BDs and verify invariants still hold after any stage-local modification.
    business_decisions:
    - id: BD-035
      type: B/BA
      summary: Use GET /start endpoint to initiate cold start and retrieve initial local estimates
    - id: BD-036
      type: B
      summary: Use POST /update endpoint to receive averaged parameters and return new estimates
    - id: BD-037
      type: B
      summary: Use GET / as health check endpoint to verify server liveness
    - id: BD-024
      type: B
      summary: Use equal weights (0.25 each) for federated parameter averaging across 4 servers
    - id: BD-020
      type: B
      summary: Use SGDRegressor from scikit-learn for linear regression with stochastic gradient descent
    - id: BD-021
      type: B
      summary: Set max_iter=1 per epoch for incremental/online learning style updates
    - id: BD-022
      type: B/BA
      summary: Disable the convergence tolerance (tol=None) and early stopping for pure empirical loss
    - id: BD-052
      type: B
      summary: Set warm_start=False to erase previous solution on each fit call
    - id: BD-053
      type: B
      summary: Use verbose=0 for silent training output
    - id: BD-019
      type: BA/DK
      summary: Use federated learning architecture where data stays local and only model parameters are aggregated
    - id: BD-038
      type: BA/M
      summary: Design stateless model servers where each request is computed independently
    - id: BD-041
      type: B
      summary: Use YAML configuration file for cluster parameters (hosts, epochs, servers)
    - id: BD-042
      type: B/RC
      summary: Run Flask in debug mode (debug=True) for development
    - id: BD-057
      type: B
      summary: Configure base URL as 'http://127.0.0.1:500' in config.yml for local demo
    - id: BD-058
      type: B/RC
      summary: Use ruamel.yaml for YAML parsing with safe loading
    - id: BD-039
      type: B
      summary: Use Fabric deployment tool for cluster management tasks
    - id: BD-040
      type: B/BA
      summary: Use Docker containers for openNPL data backend deployment
    - id: BD-031
      type: B/BA
      summary: Run Flask model servers on ports 5001-5004 for the federated cluster
    - id: BD-032
      type: B
      summary: Run openNPL data backend servers on ports 8001-8004 for database-backed demo
    - id: BD-033
      type: B
      summary: 'Derive server ID from port number: server_id = port - 5000'
    - id: BD-034
      type: B/BA
      summary: Configure 4 federated servers as default cluster size
    - id: BD-054
      type: B
      summary: Print server estimates and averaged parameters after each epoch
    - id: BD-047
      type: B/DK
      summary: Provide /stop endpoint for graceful server shutdown
    - id: BD-048
      type: B
      summary: Recommend Linux environment for running the federated demo
    - id: BD-049
      type: B/DK
      summary: Use virtual environment for dependency isolation
    - id: BD-050
      type: B/DK
      summary: Use XTerm windows for displaying model server output during demo
    - id: BD-045
      type: B
      summary: Run separate client/coordinator process to orchestrate federated rounds
    - id: BD-046
      type: B
      summary: Check each model server health before starting federated calculation
    - id: BD-028
      type: B/DK
      summary: 'Exchange two parameters between coordinator and servers: intercept and coefficient'
    - id: BD-029
      type: B/BA
      summary: Use cold start (no initial params) for first iteration, warm start thereafter
    - id: BD-030
      type: B/DK
      summary: Use JSON serialization for parameter exchange in federated protocol
    - id: BD-051
      type: B/BA
      summary: Use HTTP requests library for client-server communication
    - id: BD-061
      type: DK/B
      summary: 'TODO: Implement fractional regression variations for LGD models'
    - id: BD-062
      type: DK/B
      summary: 'TODO: Adopt different data loading strategies for standalone vs federated learning'
    - id: BD-059
      type: DK/B
      summary: 'TODO: Remove hardcoded weights - fetch node data shape via controlled API'
    - id: BD-060
      type: DK
      summary: 'TODO: Remove file/URL path hardwiring in dataSource'
    - id: BD-055
      type: B/BA
      summary: Provide standalone_run.py as single-server validation before federated demo
    - id: BD-023
      type: B/BA
      summary: Set 10 epochs as default training iterations
    - id: BD-056
      type: B
      summary: Iterate federated rounds by calling lgdModel with previous averaged params
    - id: BD-001
      type: B
      summary: Choice parameter controls data transport rather than separate functions
    - id: BD-002
      type: BA/DK
      summary: Port-derived server ID convention (port - 5000 = server number)
    - id: BD-003
      type: B/BA
      summary: Hardcoded data schema (X, Y column names)
    - id: BD-025
      type: B
      summary: 'Provide two data source modes: local filesystem (choice=1) and REST API (choice=2)'
    - id: BD-026
      type: B/BA
      summary: Store CSV data in server_dirs/{server_id}/regression_data.csv pattern
    - id: BD-027
      type: B/BA
      summary: Define CSV data format with X column as target and Y as explanatory variable
    - id: BD-043
      type: B/RC
      summary: Query openNPL API endpoint /api/npl_data/counterparties for data backend
    - id: BD-044
      type: B/BA
      summary: Extract current_assets and cash_and_cash_equivalent_items as X and Y features
    - id: BD-073
      type: BA/DK
      summary: 'SGDRegressor defaults encode iterative forcing: max_iter=1, tol=None, early_stopping=False'
    - id: BD-075
      type: BA/DK
      summary: 'Server ID derived from port via hardcoded offset: n = int(port) - 5000'
    - id: BD-077
      type: BA/DK
      summary: Data source choice=1 loads from ./server_dirs/{server}/regression_data.csv
    - id: BD-081
      type: BA
      summary: Epochs count hardcoded in config.yml (10) vs standalone_run.py (10) - dual maintenance risk
    - id: BD-012
      type: M/BA
      summary: Federated Averaging (FedAvg) algorithm
    - id: BD-013
      type: B/BA
      summary: Equal weighting across servers
    - id: BD-014
      type: M/BA
      summary: Per-epoch parameter collection and averaging
    - id: BD-015
      type: B/BA
      summary: Hardcoded weight dictionary
    - id: BD-016
      type: B/BA
      summary: Blocking sequential server communication
    - id: BD-082
      type: B/BA
      summary: 'INTERACTION: BD-038 (stateless servers) × BD-005/BD-029 (warm-start via intercept_init/coef_init) → Paradox:
        warm-start REQUIRES state persistence across requests, contradicting stateless server design'
    - id: BD-083
      type: B/BA
      summary: 'INTERACTION: BD-003/BD-078 (X=target, Y=explanatory column convention) × BD-025/BD-043 (dual data source modes)
        → Convention fragility amplified by data source variability'
    - id: BD-084
      type: BA
      summary: 'INTERACTION: BD-002/BD-033/BD-075 (port-derived server ID: n = port - 5000) × BD-080 (exactly 4 servers required)
        → Port availability dependency creates cascading failure risk'
    - id: BD-085
      type: B
      summary: 'INTERACTION: BD-013/BD-024/BD-076 (equal 0.25 weighting) × BD-016 (blocking sequential communication) → Unequal
        convergence quality with linear latency penalty'
    - id: BD-086
      type: B
      summary: 'INTERACTION: BD-004/BD-021/BD-052/BD-064/BD-073 (SGDRegressor single-epoch settings) × BD-019 (federated learning
        architecture) → Training limitation undermines federated convergence benefit'
    - id: BD-087
      type: B/RC
      summary: 'INTERACTION: BD-072 (start BEFORE update ordering) × BD-074 (averaging BEFORE next epoch) × BD-016 (sequential
        blocking) → Single slow server creates cascading deadlock risk in federated rounds'
    - id: BD-088
      type: BA
      summary: 'INTERACTION: BD-023 (epochs: 10) × BD-081 (epochs dual-hardcoded) → Configuration inconsistency risk between
        federated and standalone modes'
    - id: BD-089
      type: B/BA
      summary: 'RISK CASCADE: BD-076 (equal weighting) → BD-085 (latency amplification) → BD-087 (cascading deadlock) → federation
        failure when data is heterogeneous'
    - id: BD-090
      type: BA
      summary: 'RISK CASCADE: BD-075 (port-derived ID) → BD-080 (4-server hardcode) → BD-046 (health check) → deployment failure
        cascades to federation inability'
    - id: BD-091
      type: BA
      summary: 'CONTRADICTION: BD-038 (stateless servers) states ''each request is computed independently'' while BD-072 (start
        BEFORE update) mandates stateful request ordering across federated rounds'
    - id: BD-074
      type: B
      summary: Federated averaging MUST complete before sending averaged params to next epoch
    - id: BD-076
      type: B/BA
      summary: Equal federated weights (0.25 each) hardcoded for 4 servers - no API to fetch data size
    - id: BD-063
      type: B/BA
      summary: Linear regression model using SGDRegressor instead of closed-form OLS
    - id: BD-064
      type: B/BA
      summary: SGD optimization with max_iter=1 per fit call and warm_start disabled
    - id: BD-065
      type: B/BA
      summary: Early stopping disabled with no convergence tolerance criterion
    - id: BD-066
      type: B
      summary: No explicit regularization penalty applied to loss function
    - id: BD-067
      type: B
      summary: Server-based data source selection with file vs REST API input method
    - id: BD-068
      type: B/RC
      summary: 'Variable assignment convention: X as target, y as explanatory variables'
    - id: BD-069
      type: B/BA
      summary: Default squared error loss (squared_loss, renamed squared_error in scikit-learn 1.0) with the default invscaling learning rate schedule
    - id: BD-070
      type: B/BA
      summary: Model parameter initialization supported via coef_init and intercept_init
    - id: BD-071
      type: B/BA
      summary: Fitted parameters returned as dictionary with predictions and metadata
    - id: BD-004
      type: M/BA
      summary: SGDRegressor with max_iter=1 for iterative control
    - id: BD-005
      type: M/DK
      summary: Warm-start via intercept_init/coef_init parameters
    - id: BD-006
      type: B/BA
      summary: None-checking as cold/warm start toggle
    - id: BD-007
      type: B/BA
      summary: SGDRegressor hardcoded (not abstracted or configurable)
    - id: BD-008
      type: B/BA
      summary: Port-to-server-ID derivation at runtime
    - id: BD-009
      type: BA
      summary: Server ID derived from request.host header parsing
    - id: BD-010
      type: B/DK
      summary: Signal-based shutdown via SIGKILL/SIGTERM selection
    - id: BD-011
      type: B/BA
      summary: Three-endpoint API design (/, /start, /update)
    - id: BD-072
      type: RC
      summary: Federated workflow REQUIRES /start cold-start BEFORE /update warm-start calls per epoch
    - id: BD-078
      type: RC
      summary: 'CSV column convention: X=target, Y=explanatory; extraction order matters for regression'
    - id: BD-079
      type: DK/B
      summary: Standalone and federated modes implement identical iterative SGD loop - code duplication
    - id: BD-017
      type: B
      summary: Identical epoch loop structure to federated_run
    - id: BD-018
      type: B
      summary: Direct lgdModel imports without server abstraction
    - id: BD-080
      type: B/BA
      summary: server_dirs/X/ requires exactly 4 subdirectories with identical CSV structure
resources:
  packages:
  - name: Flask
    version_pin: latest
  - name: scikit-learn
    version_pin: latest
  - name: numpy
    version_pin: latest
  - name: pandas
    version_pin: latest
  - name: scipy
    version_pin: latest
  - name: requests
    version_pin: latest
  - name: ruamel.yaml
    version_pin: latest
  - name: fabric
    version_pin: latest
  - name: Sphinx
    version_pin: latest
  - name: sphinx-rtd-theme
    version_pin: latest
  strategy_scaffold:
    entry_point_name: run_backtest
    output_path: result.csv
    execution_mode: backtest
    conditional_entry_points:
      backtest:
        entry_point_name: run_backtest
        output_path: result.csv
      collector:
        entry_point_name: run_collector
        output_path: result.json
      factor:
        entry_point_name: run_factor
        output_path: result.parquet
      training:
        entry_point_name: run_training
        output_path: result.json
      serving:
        entry_point_name: run_server
        output_path: result.json
      research:
        entry_point_name: run_research
        output_path: result.json
    tail_template: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()  #\
      \ implement above\n    from validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\"\
      )\n# === END DO NOT MODIFY ==="
  host_adapter:
    target: openclaw
    timeout_seconds: 1800
    shell_operator_restriction: 'exec tool intercepts && / ; / | — never chain: ''pip install X && python Y''. Use separate
      exec calls.'
    install_recipes:
    - python3 -m pip install Flask
    - python3 -m pip install scikit-learn
    - python3 -m pip install numpy
    - python3 -m pip install zvt
    credential_injection: JoinQuant/QMT credentials require user-side '!' prefix shell login. Never hardcode credentials in
      generated scripts.
    path_resolution: '{workspace} resolves to ~/.openclaw/workspace/doramagic at execution time.'
    file_io_tooling: Use openclaw 'write' tool for .py/.sql files; 'exec' tool for python3 /absolute/path/script.py (absolute
      paths only).
constraints:
  fatal:
  - id: finance-C-001
    when: When implementing data acquisition for LGD regression model
    action: Return a DataFrame containing exactly 'X' and 'Y' columns
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: The downstream lgdModel.py module accesses df[['X']] and df['Y'] columns without validation, causing KeyError
      exceptions if column names are different
    stage_ids:
    - data_acquisition
  - id: finance-C-002
    when: When implementing local file mode (choice=1) in dataSource
    action: Read CSV file from server_dirs/{server_id}/regression_data.csv path
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: pandas.read_csv will raise FileNotFoundError if the file path is incorrect, and there is no try-except handler
      to provide meaningful error messages
    stage_ids:
    - data_acquisition
  - id: finance-C-004
    when: When configuring data transport for the LGD model
    action: Pass choice values other than 1 or 2 to dataSource
    severity: fatal
    kind: resource_boundary
    modality: must_not
    consequence: If choice is neither 1 nor 2, the function returns None implicitly, causing lgdModel.py to fail when trying
      to access df[['X']] columns
    stage_ids:
    - data_acquisition
  - id: finance-C-005
    when: When deploying the federated model server infrastructure
    action: Start model servers on ports 5001-500N matching server IDs for correct port-to-ID mapping
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: model_server.py:35 computes n = int(port) - 5000 to derive server ID; wrong port causes incorrect data directory
      selection
    stage_ids:
    - data_acquisition
  - id: finance-C-011
    when: When implementing SGDRegressor for federated LGD estimation
    action: Set random_state parameter explicitly to ensure reproducibility across federated nodes
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Without random_state, each lgdModel call produces non-deterministic results because SGDRegressor shuffles
      the training data randomly (shuffle=True by default). Repeated federated runs then yield different local estimates
      on every node, so convergence behavior cannot be reproduced or audited.
    stage_ids:
    - model_estimation
  - id: finance-C-013
    when: When providing input data to lgdModel
    action: Verify data source contains columns exactly named 'X' (explanatory) and 'Y' (target)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: lgdModel accesses df[['X']] and df['Y'] without validation. Missing or misnamed columns will raise KeyError
      at runtime, breaking both standalone and federated execution flows.
    stage_ids:
    - model_estimation
  - id: finance-C-014
    when: When implementing federated parameter averaging logic
    action: Iterate over each participating server (k from 1 to n) when computing the weighted average
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: federated_run.py uses weights[str(n)] for all servers instead of weights[str(k)] for each server k. This
      causes the averaged parameters to use only the last server's weight, corrupting federated model convergence and producing
      incorrect global LGD estimates.
    stage_ids:
    - model_estimation
  - id: finance-C-018
    when: When returning fitted parameters from lgdModel
    action: Return dict with keys 'intercept' and 'coefficient' containing scalar values
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: model_server.py and standalone_run.py access params['intercept'] and params['coefficient']. Returning different
      key names (e.g., 'coef' or 'coefficients') would cause KeyError in all downstream consumers, breaking both standalone
      and federated modes.
    stage_ids:
    - model_estimation
  - id: finance-C-033
    when: When implementing GET /start endpoint for cold-start LGD estimation
    action: Return JSON with 'intercept' and 'coefficient' keys from lgdModel cold-start calculation
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Federated coordinator fails to parse response causing KeyError in federated_run.py:58-59 when accessing data['coefficient']
    stage_ids:
    - model_serving
  - id: finance-C-034
    when: When implementing POST /update endpoint for warm-start LGD estimation
    action: Accept JSON body with 'intercept' and 'coefficient' fields and return updated parameters in same JSON structure
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Federated coordination loop breaks when /update response format differs from /start response format
    stage_ids:
    - model_serving
  - id: finance-C-037
    when: When presenting LGD estimation results from model server for regulatory credit risk reporting
    action: Claim that backtest model parameters equal live production model parameters
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Regulatory non-compliance when presenting simulated model estimates as actual risk quantification without
      noting estimation methodology differences
    stage_ids:
    - model_serving
  - id: finance-C-041
    when: When implementing the initial parameter averaging loop
    action: Skip the first server by starting the loop index at 1 instead of 0
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: The first server's parameters are excluded from initial averaging, causing all subsequent averaged parameters
      to be incorrect and breaking federated convergence. Server 0 contribution is completely lost.
    stage_ids:
    - federated_coordination
  - id: finance-C-042
    when: When implementing the epoch averaging loop
    action: Use the loop variable k to index weights instead of using the final index n
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Epoch averaging uses weights[str(n)] inside the loop instead of weights[str(k)], causing only the last server's
      weight (0.0 for n=4) to be applied repeatedly, producing meaningless averaged parameters.
    stage_ids:
    - federated_coordination
  - id: finance-C-043
    when: When initializing SGDRegressor with warm start parameters
    action: Pass intercept_init and coef_init as keyword arguments to the fit() method of sklearn SGDRegressor
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: SGDRegressor.fit() accepts coef_init and intercept_init as initial values for the optimization. Omitting
      them makes each federated update restart from the default initialization and discard the averaged parameters, preventing
      convergence.
    stage_ids:
    - federated_coordination
  - id: finance-C-053
    when: When implementing SGDRegressor warm-start parameter initialization
    action: Pass initial coefficient and intercept values through the coef_init and intercept_init keyword arguments of fit();
      do not use set_params(), which only changes constructor hyperparameters
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: set_params() rejects names that are not constructor parameters with a ValueError, so coefficient and intercept
      values supplied that way never reach the optimizer and the warm start does not take effect
    stage_ids:
    - standalone_execution
  - id: finance-C-054
    when: When implementing warm-start SGDRegressor with external parameter initialization
    action: Set clf.coef_ and clf.intercept_ attributes directly before calling fit(), or use partial_fit() for stateful updates
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Passing initial parameters through keyword arguments that fit() does not define raises TypeError (fit() accepts
      only coef_init, intercept_init and sample_weight besides X and y), breaking the epoch iteration loop
    stage_ids:
    - standalone_execution
  - id: finance-C-057
    when: When configuring dataSource for local CSV mode
    action: Set choice parameter to 1 and verify server_dirs/{server}/regression_data.csv exists before calling dataSource()
    severity: fatal
    kind: resource_boundary
    modality: must
    consequence: Missing data directory or incorrect file path will trigger FileNotFoundError, preventing model estimation
      from executing
    stage_ids:
    - standalone_execution
  - id: finance-C-066
    when: When implementing data acquisition, ensure DataFrame schema matches model expectations
    action: Return a pandas DataFrame with columns exactly named 'X' (target variable) and 'Y' (explanatory variable)
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: If column names differ, lgdModel.py line 73 df[['X']] and line 75 df['Y'] will raise KeyError, causing model
      estimation to crash
    stage_ids:
    - data_acquisition
    - model_estimation
  - id: finance-C-067
    when: When passing SGDRegressor parameters to HTTP endpoints, ensure proper type extraction
    action: Extract scalar values from numpy arrays using index [0] before returning dict
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: numpy arrays are not JSON-serializable, so Flask's jsonify raises TypeError on them, causing HTTP endpoint
      failures
    stage_ids:
    - model_estimation
    - model_serving
  - id: finance-C-078
    when: When implementing or validating DataFrame inputs to the LGD estimation model
    action: Provide DataFrames containing both 'X' (target/LGD variable) and 'Y' (explanatory variable) columns, as the model
      extracts X=df[['X']] and y=df['Y']
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: KeyError or incorrect regression results when the model tries to access missing 'X' or 'Y' columns, breaking
      the LGD estimation pipeline
  - id: finance-C-079
    when: When initializing or updating LGD model parameters in federated mode
    action: Pass parameter dictionaries containing both 'intercept' and 'coefficient' keys as required by the sklearn SGDRegressor
      warm-start interface
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: KeyError or TypeError when sklearn fit() receives incorrect parameter dict structure, breaking federated
      parameter exchange
  - id: finance-C-080
    when: When spawning Flask model servers for federated LGD estimation
    action: Assign server ports in the range 5001-5004, as the server code derives server ID via n = int(port) - 5000 to map
      to server_dirs/N/data
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect data directory mapping causing FileNotFoundError or loading wrong server's data, breaking the entire
      federated estimation
  - id: finance-C-081
    when: When executing a federated LGD training epoch
    action: Call /start (cold-start) before any /update (warm-start) calls, as the model requires initial parameters to be
      established first
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect model parameters propagated to all servers when warm-start is called without prior cold-start,
      leading to divergent or invalid federated estimates
  - id: finance-C-082
    when: When implementing the SGDRegressor-based LGD model iteration
    action: Set max_iter=1 and tol=None to enforce a single epoch per fit() call, as each gradient step must be performed independently
      across federated nodes
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Multi-epoch convergence within a single fit() call breaks the federated averaging contract, causing incorrect
      parameter aggregation across nodes
  - id: finance-C-085
    when: When deploying the openLGD Flask model servers
    action: Run Flask with debug=True in production or any security-sensitive environment, as this enables the interactive
      debugger and arbitrary code execution
    severity: fatal
    kind: domain_rule
    modality: must_not
    consequence: Remote code execution vulnerability when werkzeug debugger is exposed in production, allowing attackers to
      execute arbitrary code on the server
  - id: finance-C-086
    when: When presenting or marketing openLGD's capabilities to users
    action: Claim that openLGD is suitable for production deployment, as it is explicitly documented as early alpha software
      with unstable API
    severity: fatal
    kind: claim_boundary
    modality: must_not
    consequence: Users deploy alpha software in production, experiencing unexpected API breaking changes, unhandled edge cases,
      and security vulnerabilities
  - id: finance-C-099
    when: When implementing or evaluating federated learning architecture decisions
    action: Verify raw data remains local to each server node and only model parameters (intercept and coefficient) traverse
      the network — must NOT implement centralized data pooling even if technically feasible
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Centralizing raw financial data violates data sovereignty requirements for multi-institution scenarios, causing
      regulatory non-compliance with GDPR, banking secrecy laws, and institutional data sharing prohibitions
    derived_from_bd_id: BD-019
  - id: finance-C-102
    when: When implementing data loading and regression preparation in lgdModel.py and dataSource.py
    action: Extract column 'X' as target values and column 'Y' as explanatory variables in the exact order specified at lgdModel.py:73-75
      — must NOT swap, rename, or use alternative column mappings
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Inverting the X/Y column convention produces an inverted regression model where target and predictor variables
      are swapped, causing completely incorrect LGD estimates and invalid credit risk assessments
    derived_from_bd_id: BD-078
  - id: finance-C-106
    when: When parsing YAML configuration files in federated_run.py
    action: Use ruamel.yaml with safe loading (typ='safe') for YAML parsing; never use yaml.load without specifying a safe
      Loader to prevent arbitrary code execution through YAML deserialization vulnerabilities
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Unsafe YAML loading allows arbitrary code execution from malicious configuration files, creating remote code
      execution vulnerability in production deployments
    derived_from_bd_id: BD-058
  - id: finance-C-108
    when: When implementing federated learning parameter synchronization across distributed servers
    action: Implement per-epoch parameter collection and averaging with explicit epochs configuration; verify each server's
      partial_fit() results are collected and averaged before the next round begins
    severity: fatal
    kind: domain_rule
    modality: must
    consequence: Missing per-epoch synchronization causes parameter drift across servers, leading to inconsistent model states
      and failed federated convergence
    derived_from_bd_id: BD-014
  - id: finance-C-121
    when: When implementing or refactoring federated learning aggregation logic
    action: Verify federated averaging operation (parameter aggregation from servers) completes entirely before sending averaged
      parameters to the next epoch — do not parallelize or reorder the averaging step with subsequent epoch processing
    severity: fatal
    kind: architecture_guardrail
    modality: must
    consequence: Skipping or parallelizing the averaging step causes stale or inconsistent parameters to propagate, corrupting
      federated learning convergence and producing models that do not represent true global consensus
    derived_from_bd_id: BD-074
  regular:
  - id: finance-C-003
    when: When implementing API mode (choice=2) in dataSource
    action: Construct URL using localhost:800{server_id} pattern and /api/npl_data/counterparties endpoint
    severity: high
    kind: resource_boundary
    modality: must
    consequence: requests.get will raise ConnectionError if the target server is not running, with no error handling in the
      code
    stage_ids:
    - data_acquisition
  - id: finance-C-006
    when: When running openLGD in production
    action: Expect production-grade API stability from openLGD
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: README.md:18 explicitly states 'early alpha release' and CHANGELOG.rst:3 warns 'API IS STILL VERY UNSTABLE
      AS MORE USE CASES / FEATURES ARE ADDED REGULARLY'
    stage_ids:
    - data_acquisition
  - id: finance-C-007
    when: When deploying federated mode with API data source (choice=2)
    action: Verify target API server is running before calling dataSource with choice=2
    severity: high
    kind: resource_boundary
    modality: must
    consequence: requests.get() in dataSource.py:41 will raise ConnectionError if the localhost:800X API server is not running,
      and there is no error handling
    stage_ids:
    - data_acquisition
  - id: finance-C-008
    when: When implementing local file mode (choice=1) data acquisition
    action: Verify server_dirs/{server_id} directory exists before attempting to read CSV
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Missing directory will cause pandas.read_csv to raise FileNotFoundError with no custom error message or recovery
      mechanism
    stage_ids:
    - data_acquisition
  - id: finance-C-009
    when: When adding new data sources or modifying data acquisition
    action: Hardcode file paths, URLs, or column names directly in dataSource implementation
    severity: medium
    kind: operational_lesson
    modality: must_not
    consequence: dataSource.py:25 TODO comment explicitly states 'remove file / url path hardwiring', hardcoded paths make
      deployment brittle and non-portable
    stage_ids:
    - data_acquisition
  - id: finance-C-010
    when: When using API mode (choice=2) data acquisition
    action: Handle the nested API call pattern (counterparty list then individual records)
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: dataSource.py:40-47 makes two sequential requests.get calls; if any individual data_url fails, the loop continues
      with incomplete data
    stage_ids:
    - data_acquisition
  - id: finance-C-012
    when: When implementing the cold/warm start toggle logic
    action: Verify both intercept and coef parameters are provided together for warm-start mode
    severity: high
    kind: domain_rule
    modality: must
    consequence: The condition 'if intercept is None or coef is None' triggers cold-start if either parameter is missing.
      Partial initialization with only one parameter will silently fall back to a cold start and discard the supplied value,
      producing incorrect model updates in the federated loop.
    stage_ids:
    - model_estimation
  - id: finance-C-015
    when: When deploying federated model servers
    action: Use ports other than 5001-5004 for the default configuration without updating both server and client code
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: model_server.py:35 derives server ID as 'int(port) - 5000', and Federated_Demo.md documents ports 5001-5004.
      Port mismatches between server and controller cause requests to reach wrong servers, breaking federated coordination.
    stage_ids:
    - model_estimation
  - id: finance-C-016
    when: When selecting regression algorithm for LGD estimation
    action: Accept that SGDRegressor is the only available algorithm (not abstracted or configurable)
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: The regression algorithm is hardcoded to sklearn.linear_model.SGDRegressor. Replacing it requires modifying
      lgdModel.py directly. This creates a tight coupling and prevents using alternative algorithms (e.g., ridge regression,
      ElasticNet) without code changes.
    stage_ids:
    - model_estimation
  - id: finance-C-017
    when: When running lgdModel in iterative fashion for federated learning
    action: Set max_iter=1 to ensure exactly one epoch (a single pass over the local data) per function call
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: The max_iter=1 setting is critical for the federated learning architecture where each call represents one
      epoch. Increasing max_iter would perform multiple passes over the local data per call, breaking the per-epoch parameter
      update contract required by federated averaging.
    stage_ids:
    - model_estimation
  - id: finance-C-019
    when: When claiming LGD estimation capabilities
    action: Claim statistical rigor equivalent to pooled dataset analysis
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Federated LGD estimation with SGD produces parameters that may not converge to the pooled optimum due to
      data heterogeneity across servers. Presenting federated estimates as equivalent to centralized estimation would misrepresent
      the statistical properties of the model.
    stage_ids:
    - model_estimation
  - id: finance-C-020
    when: When considering replacing the SGDRegressor implementation
    action: Claim that federated learning produces identical results to centralized estimation
    severity: medium
    kind: claim_boundary
    modality: should_not
    consequence: Federated averaging with SGD is an approximation that depends on data distribution across servers. Different
      server configurations will produce different model parameters even with identical hyperparameters, which is expected
      behavior, not a bug.
    stage_ids:
    - model_estimation
  - id: finance-C-021
    when: When evaluating federated averaging convergence
    action: Skip monitoring parameter stability across epochs
    severity: medium
    kind: operational_lesson
    modality: must_not
    consequence: Without tracking parameter change magnitude across epochs, users cannot determine if the federated process
      has converged. Parameters oscillating or diverging indicate misconfiguration or data quality issues that would go unnoticed.
    stage_ids:
    - model_estimation
  - id: finance-C-022
    when: When performing initial cold-start call
    action: Pass None for both intercept and coef parameters
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: The first federated iteration requires random initialization. Passing non-None values on cold-start would
      improperly seed the federated process with arbitrary values, corrupting the initial global model state.
    stage_ids:
    - model_estimation
  - id: finance-C-023
    when: When implementing port-to-server-ID derivation in Flask endpoints
    action: Validate that port can be converted to integer before subtracting 5000
    severity: high
    kind: domain_rule
    modality: must
    consequence: ValueError exception when Host header contains non-numeric port, causing HTTP 500 response to clients
    stage_ids:
    - model_serving
  - id: finance-C-024
    when: When implementing POST /update endpoint that parses JSON request body
    action: Validate JSON parsing result and check required fields 'intercept' and 'coefficient' exist
    severity: high
    kind: domain_rule
    modality: must
    consequence: KeyError exception when client sends JSON without 'intercept' or 'coefficient' fields, causing HTTP 500 response
    stage_ids:
    - model_serving
  - id: finance-C-026
    when: When implementing Flask model server endpoint that accesses local data
    action: Use server_dirs/{port-5000}/regression_data.csv as the data directory path pattern
    severity: high
    kind: domain_rule
    modality: must
    consequence: FileNotFoundError when server tries to access non-existent data directory, causing cold-start estimation
      to fail
    stage_ids:
    - model_serving
  - id: finance-C-027
    when: When implementing /update endpoint that expects model parameters
    action: Verify request Content-Type is application/json before parsing request body
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Malformed JSON response or HTTP 415 Unsupported Media Type when client sends non-JSON data
    stage_ids:
    - model_serving
  - id: finance-C-028
    when: When implementing model server in federated cluster topology
    action: Run server instances on ports 5001-5004 matching server_dirs/1 through server_dirs/4
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Server with port 5005 incorrectly maps to server_dirs/5 which may not exist, causing data loading failure
    stage_ids:
    - model_serving
  - id: finance-C-029
    when: When deploying Flask-based model server for federated LGD estimation
    action: Accept that Flask development server is single-threaded and not suitable for high-concurrency production workloads
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: HTTP request blocking causing federated coordination timeouts when multiple clients connect simultaneously
    stage_ids:
    - model_serving
  - id: finance-C-030
    when: When configuring model servers for federated estimation workflow
    action: Start all model servers before executing the federated_run.py coordinator script
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: ConnectionError when coordinator attempts GET /start or POST /update on unavailable server, breaking federated
      iteration
    stage_ids:
    - model_serving
  - id: finance-C-031
    when: When using openLGD in early alpha release for credit risk estimation
    action: Expect API instability and prepare for breaking changes in each release cycle
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Silent model parameter changes causing inconsistent LGD estimates across federated nodes after library upgrade
    stage_ids:
    - model_serving
  - id: finance-C-032
    when: When implementing federated LGD estimation with multiple server instances
    action: Verify each server instance has a unique port and a corresponding server_dirs/{n}/data directory provisioned
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Multiple servers accessing same data directory causing race conditions in CSV file read operations
    stage_ids:
    - model_serving
  - id: finance-C-035
    when: When implementing Flask model server that derives server ID from HTTP Host header
    action: Use the request.host header for port extraction to ensure multi-tenant isolation per server instance
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Wrong server ID used for data directory access causing data contamination between federated nodes
    stage_ids:
    - model_serving
  - id: finance-C-036
    when: When implementing root endpoint (/) for health check
    action: Return HTTP 200 OK with JSON response indicating server liveness and identity
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Health check monitoring tools fail to detect server availability, causing false alarms in federated cluster
      monitoring
    stage_ids:
    - model_serving
  - id: finance-C-038
    when: When deploying model_server.py in a federated credit risk production system
    action: Advertise the Flask development server as production-ready HTTP service
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Security audit failure and operational risk when relying on Flask debug server lacking production hardening
      features
    stage_ids:
    - model_serving
  - id: finance-C-039
    when: When using model server for openLGD federated estimation in alpha stage
    action: Assume API compatibility between minor version upgrades without regression testing
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Silent breaking changes in JSON response format causing federated coordination to fail silently or produce
      incorrect averaged parameters
    stage_ids:
    - model_serving
  - id: finance-C-040
    when: When estimating LGD model parameters with federated learning across multiple servers
    action: Claim federated-averaged parameters are equivalent to centrally-computed parameters without mathematical proof
      of convergence
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Incorrect credit risk estimates when federated averaging assumptions (data homogeneity, equal weighting)
      are violated in practice
    stage_ids:
    - model_serving
  - id: finance-C-044
    when: When making HTTP requests to federated servers
    action: Include timeout parameters to prevent indefinite blocking on unreachable servers
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Without HTTP timeouts, a single unresponsive server causes the entire federated run to hang indefinitely.
      In production, this blocks all participating servers waiting for the coordinator.
    stage_ids:
    - federated_coordination
  - id: finance-C-045
    when: When configuring server weights for federated averaging
    action: Dynamically calculate weights based on actual data volumes or sample counts per server
    severity: high
    kind: operational_lesson
    modality: must
    consequence: Equal weighting assumes equal data volumes across servers. If servers have unequal data (e.g., 100 vs 10000
      samples), the equal-weight average under-represents larger datasets, producing biased LGD estimates that misrepresent
      actual credit risk.
    stage_ids:
    - federated_coordination
  - id: finance-C-046
    when: When running Flask model servers in production
    action: Run with debug=True enabled in production environments
    severity: high
    kind: resource_boundary
    modality: must_not
    consequence: Flask debug mode enables the code reloader and the Werkzeug interactive debugger, which exposes Python
      tracebacks and an in-browser console to attackers. This creates remote code execution vulnerabilities in production
      deployments.
    stage_ids:
    - federated_coordination
  - id: finance-C-047
    when: When configuring the number of servers in config.yml
    action: Verify the server count exactly matches the number of entries defined in the weights dictionary
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: The weights dictionary is hardcoded for 4 servers. If config.yml specifies servers != 4, the URL construction
      and weight indexing will fail, causing KeyError or IndexError exceptions.
    stage_ids:
    - federated_coordination
  - id: finance-C-048
    when: When presenting federated learning results
    action: Claim that the system provides production-ready real-time federated credit risk modeling
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: The README explicitly states 'This is an early alpha release. openLGD is still in active development'. Presenting
      alpha software as production-ready violates user expectations and regulatory requirements for credit risk models.
    stage_ids:
    - federated_coordination
  - id: finance-C-049
    when: When processing JSON responses from federated servers
    action: Validate response structure before accessing dictionary keys
    severity: medium
    kind: domain_rule
    modality: must
    consequence: Without validation, if a server returns malformed JSON or missing keys ('coefficient', 'intercept'), the
      code raises KeyError, crashing the entire federated run mid-epoch.
    stage_ids:
    - federated_coordination
  - id: finance-C-050
    when: When handling HTTP errors from server communication
    action: Check HTTP status codes and implement retry logic for transient failures
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Network partitions, server overload, or temporary unavailability cause HTTP errors that crash the federated
      run. Without error handling, a single epoch failure prevents any parameter updates from being applied.
    stage_ids:
    - federated_coordination
  - id: finance-C-051
    when: When implementing data source abstraction
    action: Externalize file paths and URL patterns to configuration instead of hardcoding in source
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Hardcoded paths like './server_dirs/' and 'http://localhost:800' prevent the system from running in different
      environments without code modifications.
    stage_ids:
    - federated_coordination
  - id: finance-C-052
    when: When scaling the federated system to more than 4 servers
    action: Assume the hardcoded weights dictionary remains valid without modification
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Increasing servers > 4 in config.yml causes KeyError when accessing weights beyond the 4 hardcoded keys,
      crashing the federated run.
    stage_ids:
    - federated_coordination
  - id: finance-C-055
    when: When configuring SGDRegressor for iterative model estimation
    action: Set warm_start=True to enable parameter reuse across consecutive fit() calls
    severity: high
    kind: domain_rule
    modality: must
    consequence: With warm_start=False, each fit() call discards the previous solution and re-initializes the coefficients,
      preventing convergence across epochs and producing parameter estimates that never build on earlier rounds
    stage_ids:
    - standalone_execution
  - id: finance-C-056
    when: When preparing CSV data for LGD model estimation
    action: Verify data files contain exactly two columns named 'X' (target variable) and 'Y' (explanatory variable) without
      missing values
    severity: high
    kind: domain_rule
    modality: must
    consequence: Mismatched column names or missing values will cause KeyError during DataFrame extraction or produce NaN
      coefficients, invalidating the LGD estimation
    stage_ids:
    - standalone_execution
  - id: finance-C-058
    when: When running standalone execution for environment validation
    action: Execute standalone_run.py first to verify paths, dependencies, and core estimation logic before launching federated
      servers
    severity: medium
    kind: architecture_guardrail
    modality: should
    consequence: Skipping standalone validation may lead to cryptic errors during federated execution when environment issues
      could have been caught earlier
    stage_ids:
    - standalone_execution
  - id: finance-C-059
    when: When configuring standalone execution epochs
    action: Hardcode the Epochs value in standalone_run.py; it should be configurable via config.yml, as in federated_run.py
    severity: medium
    kind: resource_boundary
    modality: must_not
    consequence: Hardcoded 10 epochs prevents testing different convergence behaviors and creates inconsistency between standalone
      and federated execution configurations
    stage_ids:
    - standalone_execution
  - id: finance-C-060
    when: When comparing standalone vs federated estimation results
    action: Use identical epoch loop structure in standalone_run.py and federated_run.py to enable deterministic result comparison
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Different loop structures prevent meaningful validation that standalone lgdModel produces identical results
      to model_server endpoint, defeating the purpose of standalone as validation framework
    stage_ids:
    - standalone_execution
  - id: finance-C-061
    when: When accessing LGD estimation logic from standalone execution
    action: Import lgdModel directly without model_server abstraction to validate core estimation works independently of federation
      infrastructure
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Using model_server endpoints in standalone mode introduces unnecessary HTTP overhead and masks potential
      lgdModel issues behind server interface complexity
    stage_ids:
    - standalone_execution
  - id: finance-C-062
    when: When presenting standalone execution as production system
    action: Claim standalone execution produces production-ready LGD estimates equivalent to enterprise financial systems
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: openLGD is explicitly documented as 'early alpha' research software; presenting alpha results as production-ready
      violates the project's stated development status
    stage_ids:
    - standalone_execution
  - id: finance-C-063
    when: When claiming standalone results validate federated production deployments
    action: Claim that single-server standalone LGD estimates equal federated multi-server estimates without accounting for
      data partitioning and averaging differences
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Standalone runs use consolidated data while federated runs partition data across servers and average parameters,
      producing different estimation landscapes
    stage_ids:
    - standalone_execution
  - id: finance-C-064
    when: When executing standalone_run.py without understanding SGDRegressor convergence
    action: Assume 10 epochs produces converged parameters without verifying coefficient stability across consecutive epochs
    severity: medium
    kind: operational_lesson
    modality: must_not
    consequence: With tol=None and max_iter=1 per fit() call, 10 external epochs may be insufficient for convergence with
      complex datasets, leading to unreliable LGD estimates
    stage_ids:
    - standalone_execution
  - id: finance-C-065
    when: When using sklearn SGDRegressor with stochastic gradient descent for financial modeling
    action: Set the random_state parameter explicitly to ensure reproducible coefficient estimates across executions
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without explicit random_state, SGDRegressor will produce different coefficient estimates on each run due
      to random shuffling of training samples, preventing reproducible validation
    stage_ids:
    - standalone_execution
  - id: finance-C-068
    when: When sending parameters to POST /update endpoint, ensure Content-Type header is set
    action: 'Set HTTP header ''Content-Type'': ''application/json'' when posting JSON data'
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Without proper Content-Type header, Flask may parse request.data incorrectly, causing json.loads to fail
      with UnicodeDecodeError
    stage_ids:
    - federated_coordination
    - model_serving
  - id: finance-C-069
    when: When implementing federated averaging and iterating over all servers
    action: Index the weights dictionary with the loop bound n instead of the loop variable k
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Loop on line 64 uses range(1, n) which excludes server 0, then uses weights[str(n)] which may be undefined,
      causing KeyError or incorrect weighted averaging
    stage_ids:
    - federated_coordination
  - id: finance-C-070
    when: When configuring federated workflow, ensure weights sum to 1.0 for proper averaging
    action: Validate that the weights sum to 1.0, or normalize them to sum to 1.0, across all participating servers
    severity: high
    kind: domain_rule
    modality: must
    consequence: Incorrect weights cause model parameters to be improperly averaged, leading to biased LGD estimates and incorrect
      credit risk capital calculations
    stage_ids:
    - federated_coordination
  - id: finance-C-071
    when: When loading config.yml for federated coordination, validate that all required keys exist
    action: Check that config contains 'hosts', 'epochs', and 'servers' keys before accessing them
    severity: high
    kind: resource_boundary
    modality: must
    consequence: Missing config keys cause KeyError when accessing config['hosts'], config['epochs'], or config['servers'],
      preventing federated coordination from starting
    stage_ids:
    - federated_coordination
  - id: finance-C-072
    when: When mapping server port to server ID, ensure port follows the 5000+ convention
    action: Format model server URL as base URL plus server number without trailing slash
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect URL format causes requests.get() to fail with ConnectionError, preventing federated parameter aggregation
    stage_ids:
    - federated_coordination
    - model_serving
  - id: finance-C-073
    when: When loading data from CSV, ensure the file has exactly 2 columns with proper headers
    action: Validate CSV structure returns DataFrame with exactly columns 'X' and 'Y'
    severity: medium
    kind: domain_rule
    modality: must
    consequence: A CSV with missing or misnamed columns causes KeyError in lgdModel.py, while a file with extra or malformed
      columns can produce incorrect LGD estimates without warning
    stage_ids:
    - data_acquisition
  - id: finance-C-074
    when: When receiving JSON in POST request, ensure both 'intercept' and 'coefficient' keys exist
    action: Validate params dictionary contains both 'intercept' and 'coefficient' keys before passing to lgdModel
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Missing keys cause KeyError in model_server.py:44-45, crashing the update endpoint and halting federated
      training
    stage_ids:
    - model_serving
  - id: finance-C-075
    when: When returning parameters from GET /start endpoint, ensure dict can be serialized
    action: Return dict with float values (not numpy scalars) for JSON serialization compatibility
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Flask jsonify fails on numpy float64 types, returning 500 Internal Server Error and breaking federated coordination
    stage_ids:
    - model_serving
  - id: finance-C-076
    when: When using openLGD for credit risk decisions
    action: Present simulated results as validated outcomes or as production credit risk quantification rather than as
      output for 'research and validation purposes'
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Presenting early alpha LGD estimates as validated credit risk parameters may violate regulatory requirements
      for capital calculation under Basel frameworks
    stage_ids:
    - model_estimation
    - model_serving
  - id: finance-C-077
    when: When encountering model convergence warnings
    action: Continue federated iterations without investigating the warnings on the assumption that 'early iterations are
      normal'
    severity: low
    kind: rationalization_guard
    modality: must_not
    consequence: Skipping convergence investigation may hide numerical instability or data quality issues, producing unreliable
      LGD estimates
    stage_ids:
    - model_estimation
    - federated_coordination
  - id: finance-C-083
    when: When completing a federated averaging round before sending parameters to the next epoch
    action: Complete the averaging of all node parameters before distributing averaged params to any node, as premature sending
      of partial averages corrupts model convergence
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Partial averages sent to nodes cause parameter drift and non-convergence in subsequent epochs, producing
      invalid LGD model coefficients
  - id: finance-C-084
    when: When configuring federated averaging weights across model servers
    action: Adjust server weights proportionally when changing the number of participating servers, as the default 0.25 weight
      assumes exactly 4 equal-data-volume nodes
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Incorrect weighted averaging causes biased LGD model parameters when server data volumes differ from the
      4-node equal-weight assumption
  - id: finance-C-087
    when: When using openLGD for credit risk decision-making
    action: Claim that federated LGD model estimates are equivalent to centralized model estimates, as data distribution assumptions
      differ between modes
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Incorrect regulatory capital calculations when federated estimates diverge from centralized benchmarks without
      proper validation methodology
  - id: finance-C-088
    when: When selecting an LGD modeling approach
    action: Claim that openLGD supports non-linear LGD modeling, as the implementation uses only sklearn SGDRegressor with
      linear loss
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Incorrect credit risk assessments when users expect non-linear LGD capabilities (GLM with binomial, beta
      regression) that openLGD does not provide
  - id: finance-C-089
    when: When scaling the federated LGD deployment beyond the demo configuration
    action: Claim that openLGD supports large-scale federated deployments with many servers, as the sequential communication
      architecture creates a bottleneck
    severity: high
    kind: claim_boundary
    modality: must_not
    consequence: Severe performance degradation or timeout failures when scaling beyond 4 servers due to sequential HTTP-based
      parameter exchange
  - id: finance-C-090
    when: When using SGDRegressor with the configured hyperparameters
    action: Claim that the model will converge to an optimal solution within any specific number of epochs, as tol=None disables
      convergence checking
    severity: medium
    kind: claim_boundary
    modality: must_not
    consequence: Users may stop training prematurely expecting convergence, leading to under-fitted LGD models with suboptimal
      coefficient estimates
  - id: finance-C-091
    when: When implementing federated learning workflows based on this blueprint
    action: Fetch actual node data shapes via a controlled API instead of hardcoding weights, as the TODO at federated_run.py:33
      acknowledges this limitation
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Incorrect model averaging when actual server data volumes differ from the assumed equal distribution, producing
      biased LGD estimates
  - id: finance-C-092
    when: When sourcing LGD training data through the dataSource abstraction
    action: Use parametric paths (server_dirs/N/datafile.csv or proper openNPL API endpoints) as hardcoded in dataSource.py,
      or risk data loading failures
    severity: high
    kind: resource_boundary
    modality: must
    consequence: FileNotFoundError or data loading failures when data files are not in the expected parametric locations,
      breaking both standalone and federated modes
  - id: finance-C-093
    when: When selecting data loading method in the LGD estimation workflow
    action: Use choice=1 for local CSV files or choice=2 for openNPL REST API, as the dataSource function branches on these
      discrete values only
    severity: medium
    kind: resource_boundary
    modality: must
    consequence: Unexpected behavior or data loading failure when using unsupported choice values for data sourcing
  - id: finance-C-094
    when: When implementing or refactoring training loop logic in standalone_run.py
    action: Maintain the same epoch loop structure and Epochs configuration as federated_run.py to ensure a valid comparison
      between standalone and federated training results
    severity: high
    kind: domain_rule
    modality: must
    consequence: Modifying the epoch loop structure independently in standalone_run.py breaks the comparison guarantee between
      standalone and federated training modes, making it impossible to verify that federation complexity does not introduce
      behavioral changes
    derived_from_bd_id: BD-017
  - id: finance-C-095
    when: When implementing or refactoring model initialization logic in lgdModel.py
    action: Assume parameters can be passed as None when intending to use existing values — always distinguish between 'parameter
      not provided' (use existing) and 'parameter explicitly set to None' (cold start)
    severity: high
    kind: operational_lesson
    modality: must_not
    consequence: Confusing None vs not-present causes silent cold starts that reset model state, producing incorrect LGD estimates
      and invalidating credit risk calculations
    derived_from_bd_id: BD-006
  - id: finance-C-096
    when: When implementing federated protocol communication between coordinator and servers
    action: Exchange only the two scalar parameters (intercept and coefficient) per communication round — must NOT add gradient
      vectors, Hessian information, or additional statistics to the payload
    severity: high
    kind: domain_rule
    modality: must
    consequence: Adding extra parameters to federated exchanges increases bandwidth requirements and attack surface for parameter
      tampering, violating the minimal payload design essential for bandwidth-constrained environments
    derived_from_bd_id: BD-028
  - id: finance-C-097
    when: When implementing data acquisition for LGD model training
    action: Query the openNPL API endpoint /api/npl_data/counterparties for structured entity data including financial metrics
      — must NOT use alternative data sources without validation against openNPL schema
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using mismatched data sources causes schema incompatibilities with downstream LGD estimation, potentially
      producing meaningless regression results or silent data corruption
    derived_from_bd_id: BD-043
  - id: finance-C-098
    when: When modifying training epoch configuration across the federated learning system
    action: Update epoch count in both config.yml and standalone_run.py simultaneously — implement a centralized constant
      or import from a shared module to prevent dual maintenance drift
    severity: medium
    kind: architecture_guardrail
    modality: should
    consequence: Updating epochs in only one location causes divergent training duration between federated and standalone
      modes, invalidating comparative results and producing non-reproducible experiments
    derived_from_bd_id: BD-081
  - id: finance-C-100
    when: When implementing FedAvg aggregation logic in federated_run.py
    action: Fetch actual node data shapes (sample counts) via controlled API and apply weighted averaging proportional to
      local data volumes — must NOT use hardcoded equal weights (0.25 each) in production environments
    severity: high
    kind: domain_rule
    modality: must
    consequence: Using equal weights when datasets have heterogeneous sizes causes model convergence bias toward smaller nodes,
      producing suboptimal LGD estimates that systematically underestimate risk for larger institutions
    derived_from_bd_id: BD-059
  - id: finance-C-101
    when: When implementing federated coordination and model aggregation in federated_run.py
    action: Use Federated Averaging (FedAvg) algorithm with synchronous rounds and parameter-level averaging only — must NOT
      implement asynchronous averaging, differential privacy mechanisms, or secure aggregation without explicit architectural
      approval
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Using alternative aggregation methods without architectural review breaks the synchronous FedAvg assumption,
      potentially causing parameter staleness, convergence failures, or compatibility issues with existing server implementations
    derived_from_bd_id: BD-012
  - id: finance-C-103
    when: When importing and using the lgdModel module for standalone execution
    action: Import lgdModel directly via 'from lgdModel import lgdModel' to keep standalone execution free of network dependencies;
      do not introduce HTTP client abstractions that would couple core estimation to the model_server layer
    severity: high
    kind: domain_rule
    modality: must
    consequence: Refactoring to use HTTP client for lgdModel would break standalone execution, preventing unit testing and
      local development without network infrastructure
    derived_from_bd_id: BD-018
  - id: finance-C-104
    when: When implementing linear regression for LGD estimation in federated learning
    action: Use SGDRegressor from scikit-learn with partial_fit() for incremental learning; do not replace with OLS closed-form
      solution or other batch-only algorithms that require centralized data
    severity: high
    kind: domain_rule
    modality: must
    consequence: Switching to OLS or batch-only regression breaks the federated learning architecture, requiring centralized
      data aggregation that violates distributed processing assumptions
    derived_from_bd_id: BD-020
  - id: finance-C-105
    when: When implementing LGD estimation logic
    action: Use SGDRegressor as the regression algorithm; be aware that switching to alternative algorithms (e.g., RandomForest,
      neural networks) requires implementing an abstract base class, a factory pattern, and serialization abstraction layers
    severity: medium
    kind: architecture_guardrail
    modality: must
    consequence: Hardcoded SGDRegressor assumption means alternative algorithms require significant refactoring; strategy
      accuracy depends on regression model choice and must be validated independently
    derived_from_bd_id: BD-007
  - id: finance-C-107
    when: When configuring federated learning server infrastructure
    action: Use explicit server ID configuration via environment variable instead of port-derived ID (n = port - 5000); verify
      port availability in the 5001-5004 range before startup; implement graceful degradation when ports are unavailable
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Port-derived server ID creates cascading failure risk where port conflicts prevent server startup, causing
      health check failures that halt entire federated execution
    derived_from_bd_id: BD-084
  - id: finance-C-109
    when: When implementing model training with partial_fit() in lgdModel.py
    action: Use max_iter=1 for each partial_fit() call to maintain explicit per-epoch control in the federated orchestration
      loop — do not change to higher values as this blurs the distinction between local and global epochs
    severity: high
    kind: domain_rule
    modality: must
    consequence: Increasing max_iter beyond 1 causes local optimization iterations to blend with global federated rounds,
      making convergence analysis unreliable and breaking the federated coordination contract
    derived_from_bd_id: BD-021
  - id: finance-C-110
    when: When implementing server ID derivation in model_server.py
    action: Assume consecutive port allocation starting from 5001 for server ID derivation — use explicit server ID configuration
      instead
    severity: medium
    kind: operational_lesson
    modality: should_not
    consequence: Port-based ID derivation assumes a specific port numbering scheme that may not hold in all deployment scenarios,
      causing server ID mismatches when ports are allocated non-consecutively
    derived_from_bd_id: BD-008
  - id: finance-C-111
    when: When designing APIs for model serving endpoints
    action: Implement external state management for distributed deployments — the /, /start, /update three-endpoint design
      assumes stateless operation and does not handle distributed state coordination
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: Stateless API design fails in distributed scenarios where multiple server instances require coordination,
      causing inconsistent state across requests
    derived_from_bd_id: BD-011
  - id: finance-C-112
    when: When implementing server lifecycle management
    action: Implement the /stop endpoint for graceful shutdown to ensure in-flight requests complete and state is properly
      finalized before termination
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Abrupt server termination without graceful shutdown risks data corruption and leaves clients with incomplete
      responses
    derived_from_bd_id: BD-047
  - id: finance-C-113
    when: When setting up project dependencies
    action: Use virtual environment isolation for dependency management to prevent conflicts with system packages and ensure
      reproducible builds
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: System-wide package installation risks breaking system packages and causes dependency conflicts across projects
    derived_from_bd_id: BD-049
  - id: finance-C-114
    when: When integrating with sklearn utilities or ML pipelines
    action: Assume the standard sklearn convention where X=features and y=target — the code uses the reversed convention,
      with X as the target and y as the explanatory variables
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Reversed X/y convention causes silent failures when using sklearn utilities expecting standard ordering,
      producing incorrect model predictions or cryptic errors
    derived_from_bd_id: BD-068
  - id: finance-C-115
    when: When implementing federated round orchestration with sequential blocking
    action: Implement timeout handling and retry logic for individual server calls — sequential blocking with BD-072 (/start
      before /update) and BD-074 (averaging before next epoch) creates cascading deadlock if any server becomes unresponsive
      mid-round
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: A single slow or unresponsive server during /start or /update blocks the entire federated round with no timeout
      mechanism, causing cascading timeouts across all rounds
    derived_from_bd_id: BD-087
  - id: finance-C-116
    when: When configuring training epochs for LGD model
    action: Centralize epoch configuration in config.yml and import from it in both standalone_run.py and federated_run.py
      — do not hardcode epoch values separately
    severity: high
    kind: domain_rule
    modality: must
    consequence: Dual-hardcoded epoch values create maintenance hazard; updating epochs in one file but not the other causes
      federated and standalone modes to train for different durations, invalidating BD-055 validation baseline
    derived_from_bd_id: BD-088
  - id: finance-C-117
    when: When parsing server port numbers to derive server IDs
    action: Hardcode the magic formula n = int(port) - 5000 — the port-to-ID mapping depends on a specific port range allocation
      that must remain consistent
    severity: medium
    kind: architecture_guardrail
    modality: must_not
    consequence: Hardcoded port offset makes server ID derivation brittle; changing port allocation scheme breaks ID mapping
      silently throughout the system
    derived_from_bd_id: BD-075
  - id: finance-C-118
    when: When refactoring training loop code
    action: Consider extracting the shared SGD training loop from standalone_run.py and federated_run.py into a common module
      to eliminate duplication — duplicate logic in epoch_loop across both files creates maintenance risk
    severity: low
    kind: operational_lesson
    modality: should
    consequence: Identical training loop logic duplicated in two files requires synchronized updates; changes applied to one
      file but not the other cause divergent behavior between modes
    derived_from_bd_id: BD-079
  - id: finance-C-119
    when: When implementing federated learning fit logic using partial_fit() calls
    action: Enable warm_start on the SGDRegressor — warm_start must remain False to ensure each partial_fit() call starts
      fresh without leveraging optimizer state from previous iterations
    severity: high
    kind: domain_rule
    modality: must_not
    consequence: Setting warm_start=True preserves optimizer state across partial_fit() calls, causing unintended state carryover
      between federated rounds and breaking the parameter averaging protocol semantics
    derived_from_bd_id: BD-052
  - id: finance-C-120
    when: When selecting features for Loss Given Default (LGD) credit risk estimation
    action: Verify that current_assets and cash_and_cash_equivalent_items are the intended features — if replacing with alternative
      features, verify liquidity characteristics are still captured as these are fundamental to credit risk modeling
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Using alternative features without liquidity coverage may cause the LGD model to underestimate default losses
      for asset-heavy borrowers, leading to insufficient loss provisions in production use
    derived_from_bd_id: BD-044
  - id: finance-C-122
    when: When implementing warm-start functionality using coef_init or intercept_init parameters
    action: Explicitly set warm_start=True before calling fit() to enable parameter reuse — without warm_start=True, coef_init
      and intercept_init only apply to the first fit() call and subsequent calls will reinitialize parameters, silently discarding
      warm-start behavior
    severity: high
    kind: domain_rule
    modality: must
    consequence: If warm_start=False (default), coef_init and intercept_init parameters are ignored after the first fit()
      call, causing warm-start attempts to silently fail and lose previously learned parameter state
    derived_from_bd_id: BD-070
  - id: finance-C-123
    when: When consuming model output from lgdModel.fit()
    action: Verify that the consumer code expects dictionary return type from fitted_params — if integrating with downstream
      systems, verify dict interface compatibility or implement explicit type handling; coordinate with team before changing
      return format to tuple or dataclass
    severity: medium
    kind: operational_lesson
    modality: must
    consequence: Return format assumes dict interface consumer; if downstream systems expect a different type or the return
      format changes, data consumption breaks silently causing downstream processing failures
    derived_from_bd_id: BD-071
  - id: finance-C-124
    when: When implementing stateless server endpoints with warm-start functionality
    action: Do not rely on stateless server architecture for /update endpoints that require warm-start — implement state persistence
      via coordinator tracking iteration state, sticky sessions with persistent server instances, or shared state store; parameters
      received via intercept_init/coef_init must be preserved across requests
    severity: high
    kind: domain_rule
    modality: must
    consequence: Stateless server design initializes fresh state per request, but warm-start requires parameter state preservation;
      parameters passed via intercept_init/coef_init are silently discarded, causing the federated protocol to produce inconsistent
      models across rounds
    derived_from_bd_id: BD-082
  - id: finance-C-125
    when: When integrating data from multiple sources using X/Y column conventions
    action: Implement schema validation for column mapping between openNPL API fields and X/Y conventions — verify that hardcoded
      field mappings (current_assets, cash_and_cash_equivalent_items) in dataSource function remain synchronized with upstream
      API schema; add explicit error handling if expected columns are missing or renamed
    severity: high
    kind: domain_rule
    modality: must
    consequence: Hardcoded X/Y column convention breaks silently when openNPL API schema changes, causing incorrect feature
      extraction with no obvious error; downstream models train on misaligned data producing invalid LGD estimates
    derived_from_bd_id: BD-083
  - id: finance-C-126
    when: When implementing federated learning coordination logic in the framework
    action: Implement sequential blocking without timeout handling for inter-server ordering contracts — this creates a cascading
      deadlock vulnerability when servers have heterogeneous data volumes
    severity: high
    kind: architecture_guardrail
    modality: must_not
    consequence: Sequential blocking with no timeouts causes the federation to deadlock when any server experiences extended
      training time due to larger datasets; all participating servers hang waiting for the slowest server, causing complete
      federation failure
    derived_from_bd_id: BD-089
  - id: finance-C-127
    when: When implementing federated learning round coordination in the framework
    action: Implement timeout handling for inter-server ordering contracts and add data volume heterogeneity checks before
      initiating coordination rounds
    severity: high
    kind: domain_rule
    modality: must
    consequence: Without timeout handling and heterogeneity checks, the federation will experience cascading deadlocks when
      servers have significantly different dataset sizes, causing complete round failures and federation collapse
    derived_from_bd_id: BD-089
  - id: finance-C-128
    when: When implementing or modifying model serving logic
    action: Initialize model from scratch for each request using server identifier — do not cache model state in server memory
      between requests
    severity: high
    kind: architecture_guardrail
    modality: must
    consequence: 'Stateful model servers cause backtest-live inconsistency: in containerized deployments, instances may be
      created/destroyed with stale cached state, and load-balanced multi-instance setups may route requests to instances with
      outdated models, leading to unpredictable execution results'
    derived_from_bd_id: BD-038
  - id: finance-C-129
    when: When deploying model servers in containerized or load-balanced environments
    action: Verify each request loads model parameters from persistent storage (server_dirs/{server_id}) independently — confirm
      no in-memory model caching across requests
    severity: high
    kind: domain_rule
    modality: must
    consequence: 'In-memory caching of model state causes platform-dependent behavior: different container instances may have
      different cached states, making backtest results non-reproducible across deployment configurations'
    derived_from_bd_id: BD-038
  - id: finance-C-130
    when: When implementing federated averaging weight configuration
    action: Verify that sample_count is equal across all servers before using equal weights; if sample counts differ significantly,
      implement proportional weighting based on actual sample counts per server
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Equal weighting (25% each) silently distorts federated model accuracy when servers have unequal data volumes;
      in production, servers with smaller datasets are over-weighted while larger datasets are under-weighted, leading to
      suboptimal model convergence and degraded prediction accuracy
    derived_from_bd_id: BD-013/BD-015
  - id: finance-C-131
    when: When scaling the federated learning system beyond demo scale (>4 servers)
    action: Replace blocking sequential HTTP requests with async parallel execution (asyncio with aiohttp) or thread pool
      to reduce latency from O(n) linear scaling to near-constant time
    severity: medium
    kind: operational_lesson
    modality: should
    consequence: Sequential blocking communication creates linear latency growth O(n) with server count; for 8+ servers, round-trip
      time doubles compared to parallel execution, causing unacceptable delays in production federated training rounds
    derived_from_bd_id: BD-016
output_validator:
  assertions:
  - id: OV-01
    check_predicate: all(p in inspect.getsource(zvt.factors.algorithm.macd) for p in ['slow=26', 'fast=12', 'n=9'])
    failure_message: 'FATAL: MACD params drifted from (fast=12, slow=26, n=9) — SL-08 violation, non-reproducible signals'
    business_meaning: Standard MACD parameters are a semantic lock; drift makes results incomparable with industry-standard
      indicators and non-reproducible.
    source_ids:
    - SL-08
    - BD-036
  - id: OV-02
    check_predicate: result.get('total_trades', 0) > 0 or result.get('explicit_zero_trade_ack') is True
    failure_message: Zero trades executed — likely missing pre-fetched data (see PC-02) or over-restrictive filters
    business_meaning: A backtest with zero trades is not a valid result; either data is missing or the strategy never triggered.
      Structural non-emptiness check is insufficient — we need business confirmation.
    source_ids:
    - SL-01
    - finance-C-073
  - id: OV-03
    check_predicate: result.get('annual_return') is None or abs(float(result['annual_return'])) <= 5.0
    failure_message: 'FATAL: |annual_return| > 500% — likely look-ahead bias or data error'
    business_meaning: Annual returns exceeding 500% are physically implausible for A-share strategies; indicates look-ahead
      bias or corrupt data.
    source_ids: []
  - id: OV-04
    check_predicate: result.get('holding_change_pct') is None or abs(float(result['holding_change_pct'])) <= 1.0
    failure_message: 'FATAL: |holding_change_pct| > 100% — physically impossible'
    business_meaning: Holding change percentage cannot exceed 100%; violation indicates position accounting error.
    source_ids:
    - BD-029
  - id: OV-05
    check_predicate: result.get('max_drawdown') is None or abs(float(result['max_drawdown'])) <= 1.0
    failure_message: 'FATAL: |max_drawdown| > 100% — impossible for non-leveraged account'
    business_meaning: Maximum drawdown cannot exceed 100% without leverage; violation indicates calculation error or look-ahead
      bias.
    source_ids: []
  - id: OV-06
    check_predicate: not (hasattr(result, 'trade_log') and result.trade_log and any(result.trade_log[i].action == 'buy' and
      result.trade_log[i+1].action == 'sell' and result.trade_log[i].timestamp == result.trade_log[i+1].timestamp
      for i in range(len(result.trade_log)-1)))
    failure_message: 'FATAL: buy-before-sell detected in same cycle — SL-01 violation, creates implicit leverage'
    business_meaning: SL-01 requires sell() before buy() in each cycle; violation means available_long was not updated before
      buying, risking duplicate positions.
    source_ids:
    - SL-01
  scaffold:
    validate_py_path: '{workspace}/validate.py'
    tail_block: "# === DO NOT MODIFY BELOW THIS LINE ===\nif __name__ == \"__main__\":\n    result = run_backtest()\n    from\
      \ validate import enforce_validation\n    enforce_validation(result, output_path=\"{workspace}/result.csv\")\n# ===\
      \ END DO NOT MODIFY ==="
  enforcement_protocol: 1. Never edit validate.py. 2. Never delete the DO NOT MODIFY tail block from the main script. 3. Never
    wrap enforce_validation() in try/except. 4. Never rewrite result write logic — it MUST go through enforce_validation.
    5. If validate.py raises ImportError, fix the dependency, do not remove the call.
acceptance:
  hard_gates:
  - id: G1
    check: '{workspace}/result.csv exists AND file size > 0'
    on_fail: Strategy did not produce output; check run_backtest() return value and enforce_validation() call
  - id: G2
    check: '{workspace}/result.csv.validation_passed marker file exists'
    on_fail: Validation did not complete; review validate.py output and fix assertion failures
  - id: G3
    check: 'Main script contains literal: from validate import enforce_validation'
    on_fail: Validation chain stripped; re-add the import in the DO NOT MODIFY block
  - id: G4
    check: 'Main script contains literal: # === DO NOT MODIFY BELOW THIS LINE ==='
    on_fail: Validation fence removed; regenerate DO NOT MODIFY tail block
  - id: G5
    check: 'result.csv has at least 1 row: pandas.read_csv(result.csv).shape[0] >= 1'
    on_fail: Empty result; check if trade_log is non-empty and factors generated signals. Confirm PC-02 (k-data exists) passed.
  - id: G6
    check: 'If MACD strategy: source contains ''slow=26'' AND ''fast=12'' AND ''n=9'' in algorithm call'
    on_fail: MACD params drifted from SL-08 lock; restore standard (12, 26, 9)
  - id: G7
    check: 'For data pipeline tasks: result.csv contains ''entity_id'' and ''timestamp'' fields'
    on_fail: Missing required columns; check Mixin.query_data return schema and DataFrame MultiIndex reset_index() before
      writing
  - id: G8
    check: 'OV-03 passes: abs(annual_return) <= 5.0 (500%)'
    on_fail: Physical plausibility check failed; investigate look-ahead bias or data corruption in input kdata
  soft_gates:
  - id: SG-01
    rubric: 'Strategy narrative consistency: user intent aligns with generated strategy.py logic. dim_a: signal direction
      (buy/sell) matches intent [1-5, pass>=4]; dim_b: frequency (daily/intraday) aligns [1-5, pass>=4]; dim_c: risk controls
      match user intent [1-5, pass>=4].'
  - id: SG-02
    rubric: 'Factor combination quality. dim_a: no highly correlated factor duplication [1-5, pass>=4]; dim_b: multi-period
      alignment correct [1-5, pass>=4]; dim_c: liquidity filter present for A-share [1-5, pass>=4].'
  - id: SG-03
    rubric: 'Data source selection appropriateness. dim_a: coverage sufficient for target entities [1-5, pass>=4]; dim_b:
      provider latency acceptable for strategy frequency [1-5, pass>=4]; dim_c: no unauthorized provider used without credentials
      [1-5, pass>=4].'
skill_crystallization:
  trigger: all_hard_gates_passed AND user_opt_out_skill_saving != true
  output_path_template: '{workspace}/../skills/{slug}.skill'
  slug_template: '{blueprint_id_short}-{uc_id_lower}'
  captured_fields:
  - name
  - intent_keywords
  - entry_point_script
  - validate_script
  - fatal_constraints
  - spec_locks
  - preconditions
  - install_recipes
  - human_summary_translated
  action: 'After all Hard Gates PASS, resolve slug via slug_template using the executed UC, then write the .skill YAML file
    at output_path_template. Notify user in their detected locale: ''Skill saved as {slug}.skill — next time say one of {sample_triggers}
    from the matched UC to invoke directly.'''
  violation_signal: All hard gates passed but no .skill file exists at expected path
  skill_file_schema:
    name: finance-bp-112 / Sphinx Documentation Configuration
    version: v5.3
    intent_keywords:
    - documentation
    - sphinx
    - configuration
    - build docs
    - project setup
    entry_point: run_backtest
    fatal_guards:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-10
    - SL-11
    - SL-12
    spec_locks:
    - SL-01
    - SL-02
    - SL-03
    - SL-04
    - SL-05
    - SL-06
    - SL-07
    - SL-08
    - SL-09
    - SL-10
    - SL-11
    - SL-12
    preconditions:
    - PC-01
    - PC-02
    - PC-03
    - PC-04
post_install_notice:
  trigger: skill_installation_complete
  message_template:
    positioning: I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow.
    capability_catalog:
      group_strategy:
        source: auto_grouped
        strategy_reason: no candidate field had 2-7 distinct values; all capabilities collapsed into single group
      groups:
      - group_id: all
        name: All Capabilities
        description: ''
        emoji: 📦
        uc_count: 1
        ucs:
        - uc_id: UC-101
          name: Sphinx Documentation Configuration
          short_description: This file configures the Sphinx documentation builder for the openLGD project, setting up project
            metadata, version information, and path configuration
          sample_triggers:
          - documentation
          - sphinx
          - configuration
    call_to_action: Tell me which one you want to try.
    featured_entries:
    - uc_id: UC-101
      beginner_prompt: Try sphinx documentation configuration
      auto_selected: true
    - uc_id: UC-100
      beginner_prompt: Try capability UC-100
      auto_selected: true
    - uc_id: UC-101
      beginner_prompt: Try capability UC-101
      auto_selected: true
    more_info_hint: Ask me 'what else can you do?' to see all capabilities (currently 1).
  locale_rendering:
    instruction: On skill_installation_complete, translate ALL user-facing strings (positioning + capability_catalog.groups[].name
      + capability_catalog.groups[].description + capability_catalog.groups[].ucs[].short_description + call_to_action + featured_entries[].beginner_prompt
      + more_info_hint) into the detected user locale per locale_contract. Preserve UC-IDs, group_id, emoji, sample_triggers,
      and technical_class_names verbatim.
    preserve_verbatim:
    - UC-IDs
    - group_id
    - emoji
    - sample_triggers
    - technical_class_names
  enforcement:
    action: 'Host agent MUST send composed message to user as the FIRST user-facing response after skill_installation_complete
      event. Message MUST contain: positioning, capability_catalog (rendered as markdown tables per group), 3 featured_entries,
      call_to_action, and more_info_hint.'
    violation_code: PIN-01
    violation_signal: First user-facing message post-install does not contain the full capability_catalog (all UCs grouped)
      OR skips featured_entries OR skips call_to_action.
human_summary:
  persona: Doraemon
  what_i_can_do:
    tagline: 'I help you build quant strategies on A-share with ZVT — from data fetch to backtest, one flow. Just tell me
      what you want; I''ll write the code, so you don''t have to dig through the docs. (Heads up: ZVT natively supports A-share, HK, and
      crypto. US stocks — stockus_nasdaq_AAPL — are half-baked; don''t bother for serious work.)'
    use_cases:
    - Sphinx Documentation Configuration
    - A-share MACD daily golden-cross backtest with hfq price adjustment from eastmoney
    - 'End-to-end ZVT pipeline: FinanceRecorder + GoodCompanyFactor + StockTrader'
    - Multi-factor strategy with TargetSelector (AND mode) combining MACD + volume breakout
    - Index composition data collection (SZ1000, SZ2000) with EM recorder
    - Institutional fund holdings tracker via joinquant_fund_runner pattern
    - Custom Transformer + Accumulator factor with per-entity rolling state
  what_i_auto_fetch:
  - ZVT stage pipeline structure (data_collection → visualization) from LATEST.yaml
  - Semantic locks (SL-01 through SL-12) — especially sell-before-buy ordering and MACD params
  - Fatal constraints (finance-C-*) relevant to your target strategy type
  - 'Default parameters: MACD(12,26,9), hfq adjustment, buy_cost=0.001, base_capital=1M CNY'
  - Entity ID format (stock_sh_600000) and DataFrame MultiIndex convention
  - Provider-specific recorder class names and required class attributes
  what_i_ask_you:
  - 'Target market: A-share (default), HK, or crypto? (US stocks in ZVT are half-baked — stockus_nasdaq_AAPL exists but coverage
    is thin)'
  - 'Data source / provider: eastmoney (free, no account), joinquant (account+paid), baostock (free, good history), akshare,
    or qmt (broker)?'
  - 'Strategy type: MACD golden-cross, MA crossover, volume breakout, fundamental screen, or custom factor?'
  - 'Time range: start_timestamp and end_timestamp for backtest period'
  - 'Target entity IDs: specific stocks (stock_sh_600000) or index components (SZ1000)?'
  locale_rendering:
    instruction: On first user contact, translate all fields above into detected user locale while preserving Doraemon persona
      (direct, frank, mildly snarky, knows limits).
    preserve_verbatim:
    - BD-IDs
    - SL-IDs
    - UC-IDs
    - finance-C-IDs
    - class_names
    - function_names
    - file_paths
    - numeric_thresholds