Skip to content

USDL: Universal Schema Definition Language

If UDM is the universal representation for data, USDL is the universal representation for schemas. Just as UDM lets you write format-agnostic transformations, USDL lets you write format-agnostic schema definitions — one schema that can be read from any schema format, edited, and written to any other schema format.

What USDL Is

USDL is NOT a separate language or file format. It is a convention — a set of property names starting with % that represent schema information in a format-agnostic way:

utlx
{
  "%namespace": "com.example.orders",
  "%types": {
    "Order": {
      "%kind": "object",
      "%fields": {
        "id": { "%type": "string", "%required": true },
        "total": { "%type": "number", "%minimum": 0 }
      }
    }
  }
}

These % properties can live alongside any other data in a UDM tree. The % prefix is reserved for USDL — it never collides with raw format properties (XML elements don't start with %, JSON keys don't start with %, etc.).

How USDL Works — Automatically

USDL works through three automatic mechanisms. No special tags or declarations are needed.

1. Enrichment (Input — Tier 2 Schema Formats)

When UTL-X reads a schema format (XSD, JSON Schema, Avro, Protobuf, EDMX, Table Schema), the parser automatically enriches the UDM with USDL % properties alongside the raw structure:

utlx
%utlx 1.0
input xsd
output json
---
// Both views available on the same $input:
{
  rawElementName: $input["xs:element"].@name,         // raw XSD access
  types: $input["%types"],                             // USDL access
  pattern: $input.^xsdPattern                          // metadata access
}

The raw schema structure is preserved AND the USDL abstraction is layered on top. No information loss.

2. Detection (Output — Tier 2 Schema Serializers)

All six Tier 2 serializers automatically detect %types in the UDM and generate the native schema format:

utlx
%utlx 1.0
input json
output xsd
---
// Write USDL in the body — the XSD serializer detects %types and generates XSD
{
  "%namespace": "com.example",
  "%types": {
    "Customer": {
      "%kind": "object",
      "%fields": {
        "name": { "%type": "string", "%required": true },
        "email": { "%type": "string", "%format": "email" }
      }
    }
  }
}

The serializer doesn't care HOW %types got there — enrichment, data, or literal. It just detects and generates.

3. Carrier (YAML/JSON as Human-Readable Schema)

YAML and JSON naturally carry USDL properties as regular data. This enables a powerful round-trip:

1. Read schema:    input xsd → $input has raw + USDL %types
2. Dump to YAML:   output yaml → human-readable YAML with %types
3. Edit:           developer modifies the YAML in a text editor
4. Read back:      input yaml → %types flows through as data
5. Generate:       output xsd → serializer sees %types → new XSD

YAML is the human-readable editing layer for schemas. Read any schema as YAML, edit in any text editor, write back as any schema format.

USDL enrichment is fully implemented. All 6 Tier 2 parsers add % properties alongside the raw structure on input. All 6 serializers detect %types on output. Both raw and USDL access work simultaneously: $input["xs:element"] (raw) and $input["%types"] (USDL). Enrichment failure never blocks — $input["%_diagnostics"]["%_status"] reports "complete", "partial", or "failed".

The USDL Directive Classification

USDL organizes its directives in three levels of portability — from universal concepts that every schema format shares, down to features unique to a single standard:

LevelScopeWhat it coversApplies to
CorePortableTypes, names, descriptions — the fundamentalsAll formats
SharedWidespreadConstraints, validation — supported by 80%+Most formats
Format-nativeExclusiveFeatures unique to one format familyOne format
ReservedFutureFuture USDL versionsPlanned

This classification is the key to understanding USDL. Core and Shared directives work everywhere. Format-native directives target a specific output format. When converting USDL to a format that doesn't support a Format-native directive, it's preserved as a comment or metadata.

Core Directives (Universal)

These directives are supported by every schema format:

DirectiveWhat it definesExample
%namespaceSchema namespace/package%namespace: "com.example.orders"
%versionSchema version%version: "1.0.0"
%typesType definitions block%types: { Customer: {...} }
%kindType kind (object, enum, union)%kind: "object"
%nameType or field name%name: "Customer"
%typeField data type%type: "string"
%descriptionHuman-readable description%description: "Customer ID"
%fieldsField definitions block%fields: { id: {...} }
%valuesEnum values%values: ["ACTIVE", "INACTIVE"]

Example: Customer schema using only Core directives

This works in ALL output formats because every format understands types, names, and descriptions:

utlx
%utlx 1.0
input auto
output json
---
{
  "%namespace": "com.example.crm",
  "%version": "1.0.0",
  "%types": {
    "Customer": {
      "%kind": "object",
      "%description": "A customer record",
      "%fields": {
        "id": { "%type": "string", "%description": "Unique identifier" },
        "name": { "%type": "string", "%description": "Full name" },
        "email": { "%type": "string", "%description": "Email address" },
        "active": { "%type": "boolean" }
      }
    }
  }
}

Shared Directives (Most Formats) (80%+ Format Support)

These express constraints and validation rules supported by most formats:

DirectiveWhat it constrainsSupported byExample value
%requiredField is mandatoryAll formatstrue
%nullableField can be nullJSON Schema, Avro, Prototrue
%arrayField is an arrayAll formatstrue
%defaultDefault valueJSON Schema, Avro, XSD"EUR"
%minLengthMin string lengthJSON Schema, XSD1
%maxLengthMax string lengthJSON Schema, XSD255
%patternRegex patternJSON Schema, XSDsee code examples
%minimumMin number valueJSON Schema, XSD0
%maximumMax number valueJSON Schema, XSD999999
%enumAllowed valuesAll formatssee code examples
%formatSemantic format hintJSON Schema"email"
%exampleExample valueJSON Schema, OpenAPIsee code examples

Example: Customer with Core + Shared constraints

utlx
%utlx 1.0
input auto
output json
---
{
  "%namespace": "com.example.crm",
  "%version": "1.0.0",
  "%types": {
    "Customer": {
      "%kind": "object",
      "%fields": {
        "id": { "%type": "string", "%required": true, "%pattern": "^C-[0-9]{5}$" },
        "name": { "%type": "string", "%required": true, "%minLength": 1, "%maxLength": 100 },
        "email": { "%type": "string", "%format": "email" },
        "age": { "%type": "integer", "%minimum": 0, "%maximum": 150 },
        "status": { "%type": "string", "%enum": ["ACTIVE", "INACTIVE"], "%default": "ACTIVE" },
        "orders": { "%type": "Order", "%array": true, "%required": false }
      }
    },
    "Order": {
      "%kind": "object",
      "%fields": {
        "orderId": { "%type": "string", "%required": true },
        "total": { "%type": "decimal", "%minimum": 0 },
        "currency": { "%type": "string", "%enum": ["EUR", "USD", "GBP"] }
      }
    }
  }
}

Formats that don't support a constraint (e.g., Protobuf has no %pattern) preserve it as a documentation comment.

Format-Native Directives

These target a specific schema format. They express features that only make sense in one context.

Protobuf-Specific (proto)

DirectiveWhat it setsExample value
%fieldNumberProtobuf field tag number1
%packedPacked encoding for repeated fieldstrue
%oneofProtobuf oneof group name"payment_method"
%mapProtobuf map typetrue
%reservedReserved field numberssee code examples
utlx
%utlx 1.0
input auto
output proto
---
{
  "%namespace": "com.example.orders",
  "%types": {
    "Order": {
      "%kind": "object",
      "%fields": {
        "id": { "%type": "string", "%required": true, "%fieldNumber": 1 },
        "customer": { "%type": "string", "%fieldNumber": 2 },
        "total": { "%type": "double", "%fieldNumber": 3 },
        "items": { "%type": "LineItem", "%array": true, "%fieldNumber": 4 }
      }
    }
  }
}

Generates:

proto
syntax = "proto3";
package com.example.orders;

message Order {
  string id = 1;
  string customer = 2;
  double total = 3;
  repeated LineItem items = 4;
}

The %fieldNumber directive is Protobuf-specific — it has no meaning in JSON Schema or XSD.

Avro-Specific (avro)

DirectiveWhat it setsExample value
%logicalTypeAvro logical type"date"
%aliasesAlternative field namessee code example
%precisionDecimal precision (digits)10
%scaleDecimal scale (decimal places)2
utlx
%utlx 1.0
input auto
output avro
---
{
  "%namespace": "com.example.finance",
  "%types": {
    "Transaction": {
      "%kind": "object",
      "%fields": {
        "id": { "%type": "string", "%required": true },
        "amount": { "%type": "decimal", "%precision": 10, "%scale": 2 },
        "date": { "%type": "date", "%logicalType": "date" },
        "oldId": { "%type": "string", "%aliases": ["legacy_id"] }
      }
    }
  }
}

Generates Avro schema with logical types, decimal precision, and field aliases.

XSD-Specific (xsd)

DirectiveWhat it setsExample value
%elementFormDefaultXSD element form"qualified"
%attributeFormDefaultXSD attribute form"unqualified"
%choiceXSD choice group (xs:choice)true
%allXSD all group (xs:all)true
%xmlXML-specific optionswrapped: true
utlx
%utlx 1.0
input auto
output xsd
---
{
  "%namespace": "urn:example:invoicing:v1",
  "%elementFormDefault": "qualified",
  "%types": {
    "Invoice": {
      "%kind": "object",
      "%fields": {
        "id": { "%type": "string", "%required": true },
        "issueDate": { "%type": "date", "%required": true },
        "paymentMethod": {
          "%choice": true,
          "%fields": {
            "creditCard": { "%type": "CreditCardPayment" },
            "bankTransfer": { "%type": "BankTransferPayment" }
          }
        }
      }
    }
  }
}

The %choice directive generates an xs:choice group — unique to XSD.

Database/SQL-Specific (future — no DDL output format yet)

DirectiveWhat it setsExample value
%primaryKeyPrimary key fieldtrue
%autoIncrementAuto-incrementtrue
%foreignKeyForeign key reference"customers.id"
%indexCreate index on fieldtrue
%uniqueUnique constrainttrue
%sqlTypeOverride SQL type"VARCHAR(50)"

These directives describe database concerns. UTL-X has no output sql format — there is no DDL serializer. These directives are preserved as metadata when converting between schema formats, and could be consumed by external code generators or a future DDL output format. They can be included in any USDL schema for documentation purposes. Example:

utlx
%utlx 1.0
input auto
output json
---
{
  "%types": {
    "Customer": {
      "%kind": "object",
      "%fields": {
        "id": { "%type": "integer", "%primaryKey": true, "%autoIncrement": true },
        "name": { "%type": "string", "%maxLength": 100, "%required": true, "%index": true },
        "email": { "%type": "string", "%maxLength": 255, "%unique": true },
        "country": { "%type": "string", "%sqlType": "CHAR(2)" }
      }
    },
    "Order": {
      "%kind": "object",
      "%fields": {
        "id": { "%type": "integer", "%primaryKey": true, "%autoIncrement": true },
        "customerId": { "%type": "integer", "%foreignKey": "Customer.id", "%index": true },
        "total": { "%type": "decimal", "%sqlType": "DECIMAL(10,2)" }
      }
    }
  }
}

OData-Specific (osch)

DirectiveWhat it setsExample value
%entityTypeOData entity type markertrue
%navigationNavigation propertytrue
%targetNavigation target entity"Orders"
%cardinalityRelationship cardinality"many"

These generate OData EDMX/CSDL metadata documents. Example:

utlx
%utlx 1.0
input auto
output osch
---
{
  "%namespace": "com.example.crm",
  "%types": {
    "Customer": {
      "%kind": "object",
      "%entityType": true,
      "%fields": {
        "id": { "%type": "integer", "%required": true },
        "name": { "%type": "string" },
        "orders": { "%type": "Order", "%navigation": true, "%target": "Orders", "%cardinality": "many" }
      }
    }
  }
}

JSON Schema-Specific (jsch)

DirectiveWhat it setsExample value
%additionalPropertiesAllow extra propertiesfalse
%prefixItemsTuple validation (array positional types)see code examples
%contentEncodingBinary content encoding"base64"
%contentMediaTypeContent MIME type"application/json"
%discriminatorPolymorphism discriminator field"type"
%if / %then / %elseConditional schema applicationsee code examples
%dependentSchemasSchema dependencies between fieldssee code examples
%dependentRequiredRequired field dependenciessee code examples

These are JSON Schema 2020-12 keywords that have no equivalent in XSD, Avro, or Protobuf. When converting to other formats, they are preserved as documentation comments.

Table Schema-Specific (tsch — Frictionless Data)

DirectiveWhat it setsExample value
%missingValuesValues treated as null/missingsee code examples
%groupCharThousands separator for number parsing","
%decimalCharDecimal separator"."
%bareNumberAllow non-numeric characters in number fieldsfalse
%trueValuesStrings treated as boolean truesee code examples
%falseValuesStrings treated as boolean falsesee code examples

Table Schema directives describe CSV/tabular data conventions — how numbers are formatted, what values mean "missing," and how booleans are represented in text. These are unique to tabular data and have no equivalent in XML or JSON schema formats.

How Directive Levels Interact During Conversion

When converting USDL to a specific format, each level is handled differently:

LevelBehavior during conversion
CoreAlways converted — every format supports these
SharedConverted when supported, preserved as comment when not
Format-nativeConverted only for the matching format, ignored/commented for others
ReservedPassed through as metadata

Example: converting a schema with %fieldNumber: 3 and %pattern: "^[A-Z]+$":

  • To Protobuf: %fieldNumber becomes = 3 (native), %pattern becomes a comment

  • To JSON Schema: %fieldNumber ignored, %pattern becomes "pattern": "^[A-Z]+$" (native)

  • To XSD: %fieldNumber ignored, %pattern becomes xs:pattern (native)

  • To Avro: both become metadata properties

This is the power of the classification system: write ONE schema with ALL the information, and each output format takes what it understands.

Converting Between Formats via USDL

XSD to JSON Schema

utlx
%utlx 1.0
input xsd
output jsch
---
$input

Avro to Protobuf

utlx
%utlx 1.0
input avro
output proto
---
$input

Any Schema to Human-Readable YAML

utlx
%utlx 1.0
input xsd
output yaml
---
$input

Outputs the schema as clean, readable YAML with USDL directives — useful for documentation and understanding unfamiliar schemas.

USDL for Validation

USDL schemas can be referenced for transformation validation:

utlx
%utlx 1.0
input json {schema: "customer.usdl.json"}
output xml {schema: "invoice.usdl.yaml"}
---
// transformation body — validated against both schemas

The validation orchestrator converts the USDL to the format-appropriate validator.

USDL as Documentation

USDL output in YAML is human-readable by design — it serves as documentation without additional tooling:

utlx
%utlx 1.0
input xsd
output yaml
---
$input

Reading the YAML output, you see field names, types, constraints, and descriptions in one place — no need to decode XSD's verbose XML syntax or JSON Schema's nested structure.

USDL vs Standalone Schema Languages

FeatureUSDLJSON SchemaXSDAvro
Multi-format outputYes (all 6)JSON onlyXML onlyAvro only
Classification systemYes (Core/Shared/Format-native)NoNoNo
Format-specific featuresFormat-nativeJSON keywordsXML facetsAvro types
Human-readableVery (YAML)ModerateLow (XML)Moderate
Part of UTL-XYes (dialect)StandaloneStandaloneStandalone

USDL is not a replacement for JSON Schema or XSD — those have their own ecosystems. USDL is the bridge between them: write schema information once using classified directives, generate whatever format you need. The Rosetta Stone for schemas — and it lives inside %utlx 1.0.

Released under AGPL-3.0.