Install
openclaw skills install avro-schema-manager
Manage Apache Avro schemas across your data platform. Validates schema structure, checks forward/backward compatibility for safe evolution, audits namespace conventions, reviews schema registry configuration, and generates language-specific code stubs. Acts as a senior data engineer ensuring schema quality and safe evolution.
Basic: Validate the Avro schemas in /path/to/schemas/
Focused: Check compatibility between schema v3 and v4 | Review namespace conventions | Generate Java POJOs from this schema | Plan migration for adding a field
find /path/to/project -name "*.avsc" -o -name "*.avdl" -o -name "*.avpr"
grep -r "schema.registry.url" /path/to/project --include="*.properties"
grep -rl "avro.schema\|AvroSchema\|fastavro" /path/to/project --include="*.py" --include="*.java"
Parses schema type (record, enum, fixed), namespace/name, fields with types/defaults/docs, logical types (date, timestamp, decimal, uuid), complex types (unions, arrays, maps), and cross-schema references.
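The parsing step can be pictured as a recursive walk over the schema JSON. The sketch below is illustrative only and is not the skill's actual implementation: `walk_types` and the sample schema are hypothetical, it uses plain dictionaries rather than an Avro library, and it does not qualify nested names with the enclosing namespace.

```python
def walk_types(node, found=None):
    """Recursively collect named types and logical types from an Avro schema node."""
    if found is None:
        found = {"named": [], "logical": []}
    if isinstance(node, list):
        # A JSON array is a union: visit every branch.
        for branch in node:
            walk_types(branch, found)
    elif isinstance(node, dict):
        if "logicalType" in node:
            found["logical"].append(node["logicalType"])
        kind = node.get("type")
        if kind in ("record", "enum", "fixed"):
            # Simplification: nested types are not given the enclosing namespace.
            ns = node.get("namespace")
            found["named"].append(f'{ns}.{node["name"]}' if ns else node["name"])
        if kind == "record":
            for field in node.get("fields", []):
                walk_types(field["type"], found)
        elif kind == "array":
            walk_types(node["items"], found)
        elif kind == "map":
            walk_types(node["values"], found)
        elif isinstance(kind, (dict, list)):
            # {"type": {...}} wrappers: descend into the inner definition.
            walk_types(kind, found)
    return found


# Hypothetical schema used only to exercise the walk.
user_event = {
    "type": "record",
    "name": "UserEvent",
    "namespace": "com.company.events.user",
    "fields": [
        {"name": "event_id", "type": {"type": "string", "logicalType": "uuid"}},
        {"name": "occurred_at", "type": {"type": "long", "logicalType": "timestamp-millis"}},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {"name": "event_type",
         "type": {"type": "enum", "name": "EventType", "symbols": ["LOGIN", "LOGOUT"]}},
    ],
}
summary = walk_types(user_event)
print(summary["named"])
print(summary["logical"])
```

String references to other schemas (for example "com.company.model.Product") are left untouched here; resolving them would require loading the other .avsc files.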
Schema Validation: UserEvent (com.company.events.user)
PASS: Valid record with namespace and documentation
PASS: All fields have doc strings
PASS: Logical types correct (timestamp-millis)
PASS: Nullable fields use ["null", "string"] with default null
WARN: event_id could use logicalType: uuid for semantic clarity
WARN: Enum "EventType" is inline — extract to separate .avsc for reuse
Schema Validation: OrderCreated (com.company.events.order)
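FAIL: Nullable field "discount_code" has wrong union order
"discount_code": ["string", "null"]
Avro has traditionally required a default to match the first union branch, so a null default needs "null" listed first
FIX: ["null", "string"], default: null (null-first convention)
FAIL: Missing default on nullable field "referral_source"
Type ["null", "string"] but no default; readers fail on records written before the field existed
FIX: Add "default": null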
FAIL: "amount" is bytes without logical type
FIX: {"type": "bytes", "logicalType": "decimal", "precision": 10, "scale": 2}
FAIL: Mixed naming — "firstName" vs snake_case elsewhere
FIX: "first_name" with alias: {"aliases": ["firstName"]}
WARN: 4/8 fields missing documentation
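The nullable-field and documentation findings above could be produced by a check along these lines. This is a minimal sketch under stated assumptions: `check_nullable_fields` is a hypothetical helper, the sample schema is invented for illustration, and only top-level record fields are inspected.

```python
import json


def check_nullable_fields(schema: dict) -> list[str]:
    """Return findings for nullable unions and missing docs in a record schema."""
    findings = []
    for field in schema.get("fields", []):
        name, ftype = field["name"], field["type"]
        if isinstance(ftype, list) and "null" in ftype:
            if ftype[0] != "null":
                # Null-first convention: "null" should be the first branch.
                findings.append(
                    f'FAIL: "{name}" union is not null-first: {json.dumps(ftype)}')
            elif "default" not in field:
                findings.append(f'FAIL: "{name}" is nullable but has no default')
        if "doc" not in field:
            findings.append(f'WARN: "{name}" has no doc string')
    return findings


# Hypothetical schema that reproduces two of the failures reported above.
order_created = {
    "type": "record",
    "name": "OrderCreated",
    "namespace": "com.company.events.order",
    "fields": [
        {"name": "order_id", "type": "string", "doc": "Order identifier"},
        {"name": "discount_code", "type": ["string", "null"], "doc": "Promo code"},
        {"name": "referral_source", "type": ["null", "string"]},
    ],
}
findings = check_nullable_fields(order_created)
for line in findings:
    print(line)
```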
Compatibility Analysis: UserEvent v3 -> v4 (mode: BACKWARD)
1. ADD "device_type" (string, default: "unknown")
BACKWARD: YES | FORWARD: YES | FULL: YES
2. REMOVE "legacy_flag"
BACKWARD: YES (v4 readers ignore the field in old data) | FORWARD: NO unless "legacy_flag" had a default in v3
RISK: Consumers still on v3 get the default if one exists; otherwise they cannot read v4 data
ACTION: Verify no consumer depends on this field
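3. MODIFY "user_id": int -> long
BACKWARD: YES (Avro promotes int to long on read) | FORWARD: NO | FULL: NO
RISK: Consumers still on the v3 schema cannot read long values as int
FIX: Upgrade every consumer to v4 before any producer, or use a three-phase migration if FORWARD or FULL compatibility is required:
Phase 1: Add "user_id_v2" (long, with a default), keep "user_id" (int)
Phase 2: Migrate all consumers to user_id_v2
Phase 3: Remove "user_id" in future version
4. ADD enum symbol "REFUND" to EventType
BACKWARD: YES | FORWARD: NO
RISK: Readers on v3 cannot resolve REFUND unless EventType declares a default symbol; consumers without a default case in switch/match will throw
VERDICT: PASS under BACKWARD mode; Schema Registry would REJECT under FORWARD or FULL
NOTE: Changes 2, 3, and 4 are not forward compatible; upgrade consumers before producers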
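To make the BACKWARD/FORWARD distinction concrete, here is a simplified sketch of the field-level rules. It is not the Schema Registry's algorithm: the function names and sample schemas are hypothetical, only flat records with primitive field types are handled, and the promotion table follows the Avro specification's schema resolution rules (a reader may promote int to long, float, or double, and so on). BACKWARD asks whether the new schema can read old data; FORWARD asks whether the old schema can read new data.

```python
# Writer type -> reader types that Avro schema resolution may promote it to.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def can_read(reader_type, writer_type) -> bool:
    """True if a reader expecting reader_type can decode writer_type."""
    if reader_type == writer_type:
        return True
    if isinstance(reader_type, str) and isinstance(writer_type, str):
        return reader_type in PROMOTIONS.get(writer_type, set())
    return False  # complex types are out of scope for this sketch


def reader_problems(reader: dict, writer: dict) -> list[str]:
    """List reasons the reader record schema cannot decode the writer's data."""
    problems = []
    writer_fields = {f["name"]: f for f in writer["fields"]}
    for field in reader["fields"]:
        written = writer_fields.get(field["name"])
        if written is None:
            # The reader needs a default for fields the writer never wrote.
            if "default" not in field:
                problems.append(f'{field["name"]}: missing in writer, no default')
        elif not can_read(field["type"], written["type"]):
            problems.append(
                f'{field["name"]}: cannot read {written["type"]} as {field["type"]}')
    return problems


def compatibility(old: dict, new: dict) -> dict:
    backward = not reader_problems(reader=new, writer=old)  # new reads old data
    forward = not reader_problems(reader=old, writer=new)   # old reads new data
    return {"BACKWARD": backward, "FORWARD": forward, "FULL": backward and forward}


# Hypothetical v3 and v4 of a record, mirroring changes 1 to 3 above.
v3 = {"type": "record", "name": "UserEvent", "fields": [
    {"name": "user_id", "type": "int"},
    {"name": "legacy_flag", "type": "boolean"},
]}
v4 = {"type": "record", "name": "UserEvent", "fields": [
    {"name": "user_id", "type": "long"},
    {"name": "device_type", "type": "string", "default": "unknown"},
]}
print(compatibility(v3, v4))  # {'BACKWARD': True, 'FORWARD': False, 'FULL': False}
```

A real gate should rely on the registry's own compatibility check or an Avro library; the sketch only shows why widening int to long is readable by new consumers but not by old ones.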
8 namespaces found
FAIL: "UserActivity" has no namespace — collision risk, default package
FIX: Add namespace "com.company.events.user"
FAIL: Inconsistent prefix: "events.user" vs "com.company.events.user"
FIX: Standardize to com.company.<domain>.<subdomain>
FAIL: "Config" uses too-broad "com.company"
FIX: Use "com.company.common" or "com.company.config"
FAIL: Schema "OrderItem" references "com.company.model.Product"
but no .avsc file defines it — unversioned dependency
FIX: Keep all schema files in repository
RECOMMEND: Document namespace convention:
com.company.events.* (event schemas)
com.company.model.* (data model schemas)
com.company.common (shared types)
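A namespace convention like the one recommended above can be enforced with a small check. The sketch below is an assumption-laden illustration: `audit_namespaces` is a hypothetical helper, and the regular expression encodes com.company.<domain>[.<subdomain>] with lowercase segments, which may be stricter or looser than a given team's rules.

```python
import re

# Assumed convention: "com.company" followed by at least one lowercase segment.
NAMESPACE_PATTERN = re.compile(r"^com\.company(\.[a-z][a-z0-9_]*)+$")


def audit_namespaces(schemas: list[dict]) -> list[str]:
    """Flag schemas with a missing or non-conforming namespace."""
    findings = []
    for schema in schemas:
        name, ns = schema["name"], schema.get("namespace")
        if not ns:
            findings.append(f'FAIL: "{name}" has no namespace')
        elif not NAMESPACE_PATTERN.match(ns):
            findings.append(f'FAIL: "{name}" uses non-standard namespace "{ns}"')
    return findings


# Hypothetical inventory mirroring the findings above.
inventory = [
    {"name": "UserActivity"},
    {"name": "Login", "namespace": "events.user"},
    {"name": "Config", "namespace": "com.company"},
    {"name": "UserEvent", "namespace": "com.company.events.user"},
]
namespace_findings = audit_namespaces(inventory)
for line in namespace_findings:
    print(line)
```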
Confluent Schema Registry | 34 subjects | 127 versions
Global compatibility: BACKWARD
WARN: "internal-logs-value" set to NONE — any change accepted
RISK: Breaking changes reach prod. FIX: Set BACKWARD minimum
PASS: "payment-events-value" uses BACKWARD_TRANSITIVE
Checks against ALL prior versions — good for critical data
FAIL: 3 subjects with non-standard naming
"UserEvent" (no -value suffix), "order.created.v2" (dots + version)
FIX: Standardize to TopicNameStrategy: <topic>-key, <topic>-value
WARN: "order-events-value" has 8 versions — review if all consumed
Soft-delete unused versions after consumer migration
FAIL: No schema validation in CI/CD
FIX: Add PR gate check:
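curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "<escaped schema JSON>"}' <registry-url>/compatibility/subjects/<sub>/versions/latest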
Block merge if compatibility fails
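The PR gate could also be scripted rather than run through curl. The following sketch uses only the Python standard library and assumes the Confluent Schema Registry REST API, where the compatibility endpoint accepts the candidate schema as an escaped JSON string and answers with an `is_compatible` flag. The function names, registry URL, and subject are placeholders, and authentication and error handling are omitted.

```python
import json
import urllib.parse
import urllib.request


def build_compatibility_request(registry_url: str, subject: str, schema: dict):
    """Build the POST request that tests a schema against the latest version."""
    # The registry expects the schema itself as a JSON-encoded string.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    subject_path = urllib.parse.quote(subject, safe="")
    url = (f"{registry_url.rstrip('/')}/compatibility/subjects/"
           f"{subject_path}/versions/latest")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


def is_compatible(registry_url: str, subject: str, schema: dict) -> bool:
    """Send the request and return the registry's verdict."""
    request = build_compatibility_request(registry_url, subject, schema)
    with urllib.request.urlopen(request, timeout=10) as response:
        return bool(json.load(response)["is_compatible"])


# Example CI usage (placeholder URL and subject, not executed here):
#   if not is_compatible("http://localhost:8081", "order-events-value", schema):
#       raise SystemExit("Schema is not compatible; blocking merge")
```

Keeping request construction separate from the network call makes the gate easy to unit test without a running registry.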
Safe path for adding "payment_method" to OrderCreated:
Step 1: Add as nullable with default
{"name": "payment_method", "type": ["null", "string"], "default": null}
Compatible in ALL modes. Old producers: consumers get null.
Step 2: Register new version — compatibility check passes
Step 3: Deploy producers that populate the field
Step 4: (Optional future) Make required — ONLY after all producers send it
DANGEROUS changes requiring special handling:
Rename field: NEVER directly — use aliases instead
Change type: Add new field with new type, deprecate old
Remove enum symbol: NEVER — prefix with DEPRECATED_ instead
Reorder fields: SAFE in Avro (name-based), may break raw byte consumers
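The alias approach to renaming can be sketched as a small schema transformation. `rename_field_with_alias` is a hypothetical helper operating on plain dictionaries. In Avro, aliases on the reader's schema let it match data written under the old name, so this helps consumers on the new schema; consumers still on the old schema are not covered by it.

```python
import copy


def rename_field_with_alias(schema: dict, old_name: str, new_name: str) -> dict:
    """Return a copy of a record schema with old_name renamed and kept as an alias."""
    updated = copy.deepcopy(schema)  # never mutate the registered version
    for field in updated["fields"]:
        if field["name"] == old_name:
            field["name"] = new_name
            aliases = field.setdefault("aliases", [])
            if old_name not in aliases:
                aliases.append(old_name)
            return updated
    raise KeyError(f"field {old_name!r} not found")


# Hypothetical record with a camelCase field to migrate.
customer = {
    "type": "record",
    "name": "Customer",
    "fields": [{"name": "firstName", "type": "string"}],
}
renamed = rename_field_with_alias(customer, "firstName", "first_name")
print(renamed["fields"][0])
```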
Java: avro-maven-plugin generates SpecificRecord classes
Python: fastavro parse_schema() + writer/reader
TypeScript: avsc.Type.forSchema() for encoder/decoder
Go: github.com/linkedin/goavro/v2 codec from schema JSON
RECOMMEND: Automate in CI/CD build step
avro-tools compile schema src/main/avro target/generated
Never hand-edit generated classes
# Avro Schema Management Report
## Overall Health Score: 61/100
| Category | Score |
|---|---|
| Schema structure | 7/10 |
| Field conventions | 5/10 |
| Compatibility | 6/10 |
| Namespaces | 4/10 |
| Registry config | 6/10 |
| Evolution practices | 5/10 |
| CI/CD integration | 3/10 |
| Documentation | 5/10 |
## Critical Issues
1. int->long type change on user_id: not forward compatible; consumers on the old int schema cannot read new data
2. "UserActivity" has no namespace — collision risk
3. No CI/CD compatibility validation
4. Referenced schema "Product" not in repository
5. Nullable field with wrong union order — deserialization risk
## High Priority
6. Non-standard subject naming (3 subjects)
7. "internal-logs" compatibility set to NONE
8. No code generation automation
9. Inline enums preventing cross-schema reuse
10. 30% of fields missing documentation