ResEthiq / Product
v1.4.2 · 23M Parameter Architecture

The world's most comprehensive
dataset integrity engine.

95 forensic fingerprints across 15 categories. 21 statistical tests. Cryptographic sealing via Merkle proofs and Ed25519 signing. No academic paper, no commercial product, no open-source library has implemented all 95. This is genuinely new ground.

95 · Forensic fingerprints
15 · Forensic categories
21 · Statistical tests built
23M · Parameter architecture
Our own earlier design stopped at 12 fingerprints. ResEthiq v1 implements all 95 — no prior academic paper, commercial product, or open-source library has done the same.
Read the research →
Architecture

How it works end to end.

Dataset Input: Parquet · CSV · Arrow · any format
Trust Kernel: freeze · canonicalize · rk · Rust
23M Engine: 95 fingerprints · 23M params
Policy Engine: 21 stat tests · configurable
Verdict: APPROVED / REJECTED · binary · CI/CD
Ed25519 Sign: Merkle root locked · spo.cbor
Audit Bundle: PDF + CBOR + ZIP · regulator-ready
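The sealing stage above can be illustrated with a minimal sketch. This is a generic binary Merkle construction (SHA-256 leaves, last node duplicated on odd levels), an assumption for illustration rather than ResEthiq's actual implementation; the Ed25519 signature would then be applied to the returned root.

```python
import hashlib

def merkle_root(leaves: list[bytes]) -> bytes:
    """Binary Merkle tree root over raw leaves (e.g. canonicalized rows)."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```

Changing any single row changes the root, which is what makes the signed root a tamper-evident seal over the whole dataset.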
Forensic Engine

95 fingerprints.
15 categories.

Every dataset that passes through ResEthiq is examined across all 15 forensic dimensions simultaneously. From frequency-domain artifacts to human fabrication fingerprints to cryptographic integrity — nothing escapes the engine.

Phase A: 40 built · Phase B: 25 in progress · Phase C: 30 planned
Full fingerprint registry
Statistical Engine

21 tests.
Already built.

Five Python modules covering the full statistical spectrum — from Benford's Law analysis to Bayesian synthesis with FDR correction. Every test is deterministic, reproducible, and feeds the policy engine.

statistical-engine · v1.4.2
# ResEthiq Statistical Engine
Initializing 5 test modules...
digits.py         Benford x2, terminal digit, pair freq
distributions.py  KS, AD, SW, PSI, JS divergence
structural.py     CUSUM, Chow, Runs, Durbin-Watson
correlations.py   VIF, Pearson anomaly, MI, partial corr
extreme.py        Grubbs, GESD, IQR 3x, Isolation Forest
Running Bayesian synthesis...
bayes.py          FDR correction · credible intervals
21 tests complete · 0 failures · 6 warnings
VERDICT: APPROVED
Bayesian confidence: 0.9847
FDR-corrected p-value: 0.0021
# Signing with Ed25519...
merkle root a3f7c29d14b8e6f2...
ed25519 sig applied
spo_v3.cbor sealed
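The Benford check listed under digits.py can be sketched as a first-digit chi-square goodness-of-fit test. This is a generic scipy illustration, not ResEthiq's actual code, and the function name is hypothetical:

```python
import numpy as np
from scipy import stats

def benford_first_digit_test(values):
    """Chi-square goodness-of-fit of leading digits against Benford's Law."""
    digits = np.array([int(str(abs(v)).lstrip("0.")[0])
                       for v in values if v != 0])
    observed = np.bincount(digits, minlength=10)[1:10]
    # Benford expectation: P(d) = log10(1 + 1/d) for d = 1..9
    expected = np.log10(1 + 1 / np.arange(1, 10)) * digits.size
    chi2, p = stats.chisquare(observed, expected)
    return chi2, p
```

A low p-value means the leading-digit distribution deviates from Benford's Law, which in financial or measurement data is a classic fabrication flag.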
Cryptographic Foundation

Four layers. Zero ambiguity.

Roadmap

Shipping in three phases.

Integration

Fits into any pipeline. No GPU. No cloud dependency.

One instance per customer — isolated, air-gap capable. Two-layer token system. Runs on standard enterprise servers. Machine-readable exit codes for CI/CD gates.
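The machine-readable exit codes become a hard gate in any runner. A minimal sketch of the pattern follows; the command passed in is a placeholder, not the real rk invocation:

```python
import subprocess
import sys

def integrity_gate(cmd: list[str]) -> None:
    """Run an integrity check and propagate its exit code as a hard gate."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print("integrity check REJECTED - blocking pipeline", file=sys.stderr)
        sys.exit(result.returncode)
    print("integrity check APPROVED - continuing")
```

Because the verdict is binary and carried in the exit code, the same gate works unchanged in GitHub Actions, Jenkins, or a cron job.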

Python SDK
pip install resethiq · drop-in to any ML pipeline
CLI — rk and resethiq-verify
shell-native · exit codes 0/1 · CI/CD ready
REST API
JWT token auth · air-gap deployable · self-hosted
Verifier CLI · anyone, anywhere
public key only · no ResEthiq account · fully offline
CI/CD gate · GitHub Actions
# .github/workflows/data-integrity.yml
- name: ResEthiq integrity check
  run: |
    pip install resethiq
    rk freeze \
      --input data/train.parquet \
      --spec policy/prod_v3.yaml \
      --sign --verify
# Exit 0 = APPROVED → pipeline continues
# Exit 1 = REJECTED → pipeline blocked

95 fingerprints PASS
21 stat tests PASS
merkle root a3f7c29d...
ed25519 signed
exit 0 · APPROVED · spo_v3.cbor
Documentation

Everything you need to integrate ResEthiq.

From a single CLI command to full enterprise pipeline integration — the Trust Kernel is designed to be operational in hours, not weeks.

Quick Start
Step 1 — Install
pip install resethiq
# or: cargo install rk
Step 2 — Freeze your dataset
rk freeze \
  --input dataset.parquet \
  --spec policy.yaml \
  --sign
Step 3 — Verify (by anyone)
resethiq-verify \
  --bundle spo_v3.cbor \
  --pubkey public.pem
# exit 0 = verified · no account needed
Policy Specification (YAML)
# policy_v3.yaml — example
version: "3.0"
domain: "healthcare"

fingerprints:
  frequency: all  # F01-F05
  information: all  # I01-I07
  generative: all  # V01-V07
  human_fab: all  # H01-H07

rules:
  - id: no_synthetic_data
    test: "V01.score < 0.05"
    severity: REJECT
  - id: benford_law
    test: "F01.chi2_p > 0.01"
    severity: REJECT
  - id: max_missing_rate
    test: "M01.mcar_p > 0.05"
    severity: WARN

output:
  sign: true
  format: "cbor"
  audit_bundle: true
CLI Reference
rk — Trust Kernel
freeze, examine, sign, verify, bundle. Full flag reference with examples.
View CLI docs →
Python SDK
resethiq Python Client
Drop-in integration for pandas, PyArrow, and any ML pipeline. Full API reference.
View SDK docs →
Policy Engine
YAML Policy Specification
How to write, compose, and version policies. Domain templates for healthcare, finance, legal.
View policy docs →
SPO Format
Signed Policy Object (CBOR)
Full CBOR schema, field definitions, verification algorithm. How to decode and present to regulators.
View SPO spec →
Research Foundations

The science behind the 95 fingerprints.

Each forensic fingerprint is grounded in peer-reviewed mathematics. ResEthiq synthesises five decades of signal processing, information theory, topological data analysis, and statistical process control into a single unified integrity engine.

Frequency Domain Analysis

Detecting synthetic data through spectral artifacts

GANs and VAEs leave measurable fingerprints in the frequency domain — checkerboard artifacts, periodic generator signatures, and 1/f noise deviations. ResEthiq's Fourier, Wavelet, and Hilbert-Huang transforms detect these at sub-column granularity.

Ref: Durall et al. 2019 · Dzanic et al. 2020
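A toy version of the spectral screen: compare the strongest non-DC periodogram peak against mean spectral power. This is a generic numpy sketch; the function name and any threshold are illustrative, not ResEthiq's actual detector:

```python
import numpy as np

def spectral_peak_ratio(x: np.ndarray) -> float:
    """Strongest non-DC periodogram peak relative to mean spectral power.

    Periodic generator artifacts concentrate power into narrow bins,
    pushing this ratio far above what broadband real-world noise yields.
    """
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    power = power[1:]  # drop the DC bin
    return float(power.max() / power.mean())
```

A pure periodic signature scores orders of magnitude higher than white noise, which is why spectral artifacts are among the cheapest synthetic-data tells to detect.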
Topological Data Analysis

Shape of data in high dimensions reveals manipulation

Persistent homology captures topological features — connected components, loops, voids — that survive across scale. Real datasets exhibit characteristic Betti number signatures. Synthetic or manipulated data deforms these signatures in detectable ways.

Ref: Edelsbrunner et al. 2002 · Carlsson 2009
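Betti-0 (the component count) at a single filtration scale gives the flavor of the approach; the full persistence computation sweeps the scale parameter and tracks feature births and deaths. A generic scipy sketch, not the production TDA stack:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def betti0(points: np.ndarray, eps: float) -> int:
    """Betti-0: connected components of the eps-neighborhood graph."""
    # Pairwise Euclidean distances, then threshold into an adjacency matrix
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    adjacency = csr_matrix(d <= eps)
    n_components, _ = connected_components(adjacency, directed=False)
    return n_components
```

As eps grows, components merge; the scales at which they merge form the Betti-0 persistence signature that real datasets exhibit characteristically.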
Information Theory

Entropy signatures separate real from fabricated data

Shannon entropy, Kolmogorov complexity, and transfer entropy form a three-layer information-theoretic screen. Real-world data has characteristic complexity gradients. Human fabrication produces too-uniform entropy. Synthetic generation produces too-regular complexity ratios.

Ref: Shannon 1948 · Kolmogorov 1968 · Schreiber 2000
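The first layer of that screen, Shannon entropy over a column's empirical distribution, is nearly a one-liner (illustrative sketch):

```python
import numpy as np

def shannon_entropy(values) -> float:
    """Shannon entropy in bits of the empirical value distribution."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

A constant column scores 0 bits; a column uniform over k values scores log2(k). Columns whose entropy is implausibly flat across the dataset are the "too-uniform" signal described above.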
Statistical Process Control

Industrial-grade anomaly detection for data pipelines

The full Western Electric and Nelson rule batteries — originally developed for manufacturing quality control — are repurposed for dataset integrity. All 8 Western Electric rules and 10 Nelson patterns run against every numeric column. Hotelling T² and CUSUM extend detection to multivariate shifts.

Ref: Western Electric 1956 · Nelson 1984 · Montgomery 2009
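The tabular CUSUM that underpins the multivariate extensions looks like this in its one-sided form. This is the standard textbook recursion; the defaults k=0.5 and h=5 are the conventional sigma-unit choices, not ResEthiq's configured values:

```python
import numpy as np

def cusum_upper(x, target: float, sigma: float,
                k: float = 0.5, h: float = 5.0) -> int:
    """Tabular upper CUSUM: index of the first alarm, or -1 if none.

    k is the slack and h the decision threshold, both in sigma units.
    """
    s = 0.0
    for i, xi in enumerate(x):
        s = max(0.0, s + (xi - target) / sigma - k)
        if s > h:
            return i
    return -1
```

CUSUM accumulates small, persistent deviations that a single-point rule would never flag, which is exactly the behavior of a slow drift injected into a data pipeline.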
Human Fabrication Detection

How humans generate data — and how to catch them

Anchor bias, fatigue patterns, copy-increment sequences, and symmetric distribution preferences are cognitive fingerprints of manual data entry. ResEthiq's Category 11 detects all seven known human fabrication patterns simultaneously, providing a composite fabrication score.

Ref: Tversky & Kahneman 1974 · Diekmann 2007
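One of those seven patterns, terminal-digit preference, reduces to a uniformity test over last digits (generic scipy sketch, hypothetical function name):

```python
import numpy as np
from scipy import stats

def terminal_digit_pvalue(values) -> float:
    """Chi-square p-value for uniformity of terminal digits (0-9).

    Honest measurements tend toward uniform last digits; manual
    fabrication over-uses round favorites such as 0 and 5.
    """
    last = [abs(int(v)) % 10 for v in values]
    observed = np.bincount(last, minlength=10)
    return float(stats.chisquare(observed).pvalue)
```

A near-zero p-value here does not prove fabrication on its own; it is one signal feeding the composite fabrication score.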
Bayesian Synthesis

95 signals unified into one defensible verdict

Individual fingerprints are synthesised via a Bayesian evidence accumulation framework with Benjamini-Hochberg FDR correction. The result is a single posterior probability with calibrated credible intervals — not a heuristic score, but a statistically rigorous, court-defensible statement of dataset integrity.

Ref: Benjamini & Hochberg 1995 · Gelman et al. 2013
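The Benjamini-Hochberg step-up procedure itself is short; a reference sketch in plain numpy (the correction stage only, not the full Bayesian accumulator):

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha: float = 0.05) -> np.ndarray:
    """Boolean discovery mask under Benjamini-Hochberg FDR control."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Step-up: find the largest i with p_(i) <= alpha * i / m
    below = ranked <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        cutoff = np.max(np.nonzero(below)[0])
        reject[order[: cutoff + 1]] = True
    return reject
```

With 95 simultaneous fingerprint tests, FDR control is what keeps a handful of chance-level p-values from tipping an otherwise clean dataset into REJECTED.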
Full research paper available on request.
95-fingerprint methodology · Bayesian synthesis framework · Benchmark comparisons against prior art.
View research →

See the 95 fingerprints in action on your data.

We run the full forensic engine on your actual datasets and produce a court-defensible audit bundle.

Request pilot → Read docs