Data Infrastructure

ARFlow

v1.0.0 GA

Batch ETL orchestration

Scheduled data movement between PostgreSQL and Oracle. Full, incremental and computed loads. CRON-driven runs.

Problem

What this product solves.

Enterprises still run scheduled cross-database loads — Oracle into PostgreSQL, PostgreSQL into PostgreSQL, snapshots into warehouses — and the production tooling for that work is typically a graveyard of shell scripts, undocumented cron entries and one engineer who remembers why a column is type-cast. ARFlow gives those workloads a control plane: declarative flows, full / incremental / computed modes, batch writers tuned for COPY, run history, and the same identity model as the rest of the platform.

Core capabilities

What the product does.

cap_01

PostgreSQL and Oracle connectors

JDBC sources and targets out of the box. Source query and target table mapping declared in the flow; ARFlow handles the transport.

cap_02

Full / incremental / computed modes

Full reloads a whole table. Incremental advances by a watermark column. Computed runs an expression-driven transformation before writing.
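To make the three modes concrete, here is a minimal sketch of how a run might derive its source query from the flow's mode. Everything named here (`FlowSpec`, `build_source_query`, `apply_mode`, the `?` bind placeholder) is hypothetical and chosen for illustration; ARFlow's actual flow definition format is not shown in this document.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Row = dict[str, Any]


@dataclass
class FlowSpec:
    # Hypothetical field names, not ARFlow's real flow schema.
    mode: str                                   # "full", "incremental" or "computed"
    source_query: str                           # base SELECT declared in the flow
    watermark_column: Optional[str] = None      # required for incremental mode
    transform: Optional[Callable[[Row], Row]] = None  # used by computed mode


def build_source_query(spec: FlowSpec, last_watermark: Any = None) -> tuple[str, list[Any]]:
    """Return (sql, bind_params) for one run of the flow."""
    if spec.mode == "incremental":
        if not spec.watermark_column:
            raise ValueError("incremental mode needs a watermark column")
        # The column name is interpolated, so it must come from a trusted flow definition.
        col = spec.watermark_column
        if last_watermark is None:
            # First run: nothing to compare against, so read everything in order.
            return f"SELECT * FROM ({spec.source_query}) src ORDER BY {col}", []
        sql = f"SELECT * FROM ({spec.source_query}) src WHERE {col} > ? ORDER BY {col}"
        return sql, [last_watermark]
    if spec.mode in ("full", "computed"):
        # Both read the whole source; computed differs only in the per-row transform.
        return spec.source_query, []
    raise ValueError(f"unknown mode: {spec.mode}")


def apply_mode(spec: FlowSpec, row: Row) -> Row:
    """Computed mode transforms each row before it is written; other modes pass it through."""
    if spec.mode == "computed" and spec.transform is not None:
        return spec.transform(row)
    return row
```

Treating computed mode as a per-row transform is one possible reading of "expression-driven transformation"; an implementation could equally push the expression into the source SQL.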

cap_03

CRON scheduler with run history

Schedules attached to flows. Every run records start, end, rows moved and the error, if any. Manual one-off runs share the same history surface.

cap_04

COPY-optimised batch writer

PostgreSQL target writer uses COPY for throughput; Oracle target writer uses batched inserts. Batch size is configurable per flow.
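As an illustration of what a COPY-oriented batch writer has to do, the sketch below splits a row stream into batches of a configurable size and encodes each batch in PostgreSQL's COPY text format (tab-separated fields, `\N` for NULL, backslash escapes for control characters). The helper names `batches`, `copy_field` and `copy_payload` are assumptions made for this example and say nothing about ARFlow's internal writer.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, Sequence


def batches(rows: Iterable[Sequence[Any]], batch_size: int) -> Iterator[list[Sequence[Any]]]:
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def copy_field(value: Any) -> str:
    r"""Encode one value for COPY text format; None becomes \N."""
    if value is None:
        return r"\N"
    text = str(value)
    # Escape the backslash first so the escapes added afterwards are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def copy_payload(batch: Iterable[Sequence[Any]]) -> str:
    """Render a batch as the text a `COPY ... FROM STDIN` call would consume."""
    return "".join("\t".join(copy_field(v) for v in row) + "\n" for row in batch)
```

A writer built this way would send one payload per batch through the driver's COPY interface, so the per-flow batch size directly controls how much data each round trip carries.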

cap_05

Asynchronous run execution

ExecutorService-backed run engine. Multiple flows execute concurrently; per-flow concurrency is bounded so a slow target does not block the rest.
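The bounded per-flow concurrency described above can be sketched as follows. This is an illustration only: Python's `ThreadPoolExecutor` stands in for the Java `ExecutorService`, and the `RunEngine` class, its parameters and the "reject when the flow is at its limit" policy are assumptions, not ARFlow's implementation. The slot is claimed at submit time, so a pool thread is never parked waiting for a busy flow.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Optional


class RunEngine:
    """Shared worker pool with a cap on concurrent runs per flow (hypothetical)."""

    def __init__(self, max_workers: int = 8, per_flow_limit: int = 1) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._per_flow_limit = per_flow_limit
        self._slots: dict[str, threading.BoundedSemaphore] = {}
        self._lock = threading.Lock()

    def _slot(self, flow_id: str) -> threading.BoundedSemaphore:
        with self._lock:
            if flow_id not in self._slots:
                self._slots[flow_id] = threading.BoundedSemaphore(self._per_flow_limit)
            return self._slots[flow_id]

    def submit(self, flow_id: str, job: Callable[[], Any]) -> Optional[Future]:
        """Start a run, or return None if the flow is already at its limit."""
        slot = self._slot(flow_id)
        if not slot.acquire(blocking=False):
            return None  # caller may retry later or record the run as skipped
        try:
            future = self._pool.submit(job)
        except BaseException:
            slot.release()
            raise
        # Free the slot when the run finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _f: slot.release())
        return future

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

With this shape, a run stuck on a slow target holds one worker and one slot for its own flow, while runs of other flows continue to be scheduled on the remaining workers.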

cap_06

JWT identity end-to-end

Same identity flow as ARCloud, ARStudio and ARStreams. Operators see flows for the tenants they own; admins see everything.

cap_07

Run log surface

REST + audit log. Every flow modification, schedule change and run trigger lands with operator, timestamp and target.

Operator surface

What operators actually see.

Captured from a live evaluation environment. Same UI customers run; nothing reproduced from a brochure.

ARFlow — flow list — shot_01
Flow registry — CRON schedule, last-run outcome, rows-moved counter and group taxonomy (functional / multi-step / performance regression). Same surface ops uses to triage failing pipelines.

ARFlow — run monitor — shot_02
Run monitor — every triggered or scheduled run with duration, rows and final state.

ARFlow — connections — shot_03
Source and target databases — JDBC connections managed in the platform, not in scripts.

Architecture

How it is built.

ARFlow is a Spring Boot service backed by PostgreSQL for state, with a React control surface. The data plane is a runtime that reads from a JDBC source, decodes rows according to the flow definition, and writes them to a JDBC target using a strategy chosen per target. There is no broker between source and target — the platform owns the entire batch lifecycle in-process.

arch_01

Stateless run engine

Each run is a fresh execution under ExecutorService. State for resumability lives in the metadata database, not in process memory.

arch_02

PostgreSQL-resident state

Flow definitions, connection metadata, schedule, run history, audit — all in the ARFlow metadata database. Flyway-versioned schema (V1–V5 today).

arch_03

Source extractor abstraction

Source query is parameterised by mode (full / incremental / computed). Watermarks advance only after the target acknowledges the batch.
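The rule that a watermark advances only after the target acknowledges the batch can be sketched like this. The function `run_incremental` and its callback parameters are hypothetical; the point is the ordering of the write and the watermark save, under the assumption that a failed write raises an exception.

```python
from typing import Any, Callable, Iterable, Sequence

Row = dict[str, Any]


def run_incremental(
    batches: Iterable[Sequence[Row]],
    write_batch: Callable[[Sequence[Row]], None],
    save_watermark: Callable[[Any], None],
    watermark_column: str,
    start: Any = None,
) -> Any:
    """Write batches in order and persist the watermark after each acknowledged write."""
    watermark = start
    for batch in batches:
        if not batch:
            continue
        # Assumed contract: write_batch raises if the target does not accept the batch,
        # so the lines below only run once the write has been acknowledged.
        write_batch(batch)
        batch_max = max(row[watermark_column] for row in batch)
        watermark = batch_max if watermark is None else max(watermark, batch_max)
        save_watermark(watermark)
    return watermark
```

If the second batch fails, the stored watermark still points at the end of the first batch, so the next run re-reads only the rows that were never confirmed.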

arch_04

Pluggable target writers

PgWriter, OracleWriter, BatchWriter. Adding a target type means implementing the writer interface and registering it.
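"Implementing the writer interface and registering it" might look roughly like the sketch below. The `TargetWriter` protocol, the `register_writer` decorator and the in-memory writer are invented for this example; the real interface is part of the Java service and its method signatures are not given in this document.

```python
from typing import Any, Callable, Protocol, Sequence

Row = Sequence[Any]


class TargetWriter(Protocol):
    """Hypothetical writer contract: write rows to a table, return rows written."""

    def write(self, table: str, rows: Sequence[Row]) -> int: ...


_WRITERS: dict[str, Callable[[], TargetWriter]] = {}


def register_writer(target_type: str):
    """Class decorator that makes a writer available under a target type name."""
    def decorator(factory):
        if target_type in _WRITERS:
            raise ValueError(f"writer already registered for {target_type!r}")
        _WRITERS[target_type] = factory
        return factory
    return decorator


def writer_for(target_type: str) -> TargetWriter:
    """Look up and instantiate the writer for a target type."""
    try:
        factory = _WRITERS[target_type]
    except KeyError:
        raise LookupError(f"no writer registered for target type {target_type!r}") from None
    return factory()


@register_writer("memory")
class InMemoryWriter:
    """Stand-in target used only for this example."""

    def __init__(self) -> None:
        self.tables: dict[str, list[Row]] = {}

    def write(self, table: str, rows: Sequence[Row]) -> int:
        self.tables.setdefault(table, []).extend(rows)
        return len(rows)
```

The run engine then only depends on `writer_for(...)`, which is what keeps a new target type a local change.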

arch_05

Run history persistence

Every run is stored as a row in the metadata database: start, end, rows in / out, exit status, error trace. Surfaced through REST and the React console.

arch_06

JWT-secured REST surface

CRUD on flows, connections, schedules, runs. Same identity tokens ARCloud and ARStudio issue.

REST API

Driven by a real REST surface.

Every product action available in the UI is reachable through a JWT-secured REST API. The control plane is the API; the UI is one of its consumers.

api_01 JWT

List flows

Returns every flow with its schedule, last run state and rows moved.

GET /api/v1/flows

Response

[
  {
    "id": "flow_oracle_billing_nightly",
    "mode": "incremental",
    "schedule": "0 2 * * *",
    "last_run": { "state": "ok", "rows": 184232, "ended_at": "2026-05-17T02:04:11Z" }
  }
]
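A minimal client sketch for this endpoint, using only the Python standard library. The host name in the usage note, the `Authorization: Bearer` header scheme and the helper names are assumptions; the path and the response fields come from the sample above.

```python
import json
import urllib.request
from typing import Any


def flows_request(base_url: str, token: str) -> urllib.request.Request:
    """Build the GET /api/v1/flows request; sending the JWT as a Bearer token is assumed."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/flows",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def list_flows(base_url: str, token: str) -> list[dict[str, Any]]:
    """Call the endpoint and decode the JSON array of flows."""
    with urllib.request.urlopen(flows_request(base_url, token), timeout=30) as resp:
        return json.load(resp)


def last_run_states(flows: list[dict[str, Any]]) -> dict[str, Any]:
    """Map flow id to the state of its last run (None if the flow has no last_run)."""
    return {f["id"]: (f.get("last_run") or {}).get("state") for f in flows}
```

For example, `last_run_states(list_flows("https://arflow.example.internal", token))` would give a quick per-flow status map for a dashboard or an on-call check; the host here is a placeholder.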

api_02 JWT

Trigger a flow run

Starts an ad-hoc run; same code path as the scheduled one. Returns the run id for polling.

POST /api/v1/flows/{id}/runs

Response

{
  "run_id": "run_018f3c2a",
  "state": "running",
  "started_at": "2026-05-17T08:30:02Z"
}
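The trigger-then-poll pattern could be sketched as below. The request builder follows the documented path; the Bearer header, the empty request body and the helper names are assumptions. How a single run's state is fetched is not specified here, so `wait_for_run` takes that lookup as a caller-supplied function rather than guessing an endpoint.

```python
import time
import urllib.parse
import urllib.request
from typing import Callable


def trigger_request(base_url: str, token: str, flow_id: str) -> urllib.request.Request:
    """Build POST /api/v1/flows/{id}/runs; an empty body is assumed to be acceptable."""
    flow = urllib.parse.quote(flow_id, safe="")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/flows/{flow}/runs",
        data=b"",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="POST",
    )


def wait_for_run(
    fetch_state: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the run leaves the "running" state, then return the final state."""
    deadline = clock() + timeout_seconds
    while True:
        state = fetch_state()
        if state != "running":
            return state
        if clock() >= deadline:
            raise TimeoutError("run did not finish before the timeout")
        sleep(poll_seconds)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting, and lets a caller swap in backoff if a fixed interval is too chatty.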

api_03 JWT

Read run history for a flow

Paginated history with rows moved, duration, errors. Same data the UI lists.

GET /api/v1/flows/{id}/runs?limit=50

Operator CLI

Operated from the terminal too.

The `arctl` CLI talks to the same control plane as the UI. Same primitives, scriptable, suitable for CI and on-call.

cli_01 arctl

Trigger a flow run

Blocks until completion and exits non-zero on failure, so it can be wired into CI.

$ arctl flow run flow_oracle_billing_nightly --wait

Output

RUN     run_018f3c2a
STATE   running ...
STATE   ok
ROWS    184232  duration=4m12s
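One way to wire this into a CI step is a thin wrapper that runs the command, echoes its output and propagates the exit code. The command line is the one shown above; the wrapper functions are hypothetical, and parsing the `ROWS` line assumes the output keeps the layout of this sample.

```python
import re
import subprocess
import sys
from typing import Optional

# Matches a line such as "ROWS    184232  duration=4m12s" (layout taken from the sample).
_ROWS_LINE = re.compile(r"^ROWS\s+(\d+)", re.MULTILINE)


def run_command(flow_id: str, arctl: str = "arctl") -> list[str]:
    """Argument vector for a blocking flow run."""
    return [arctl, "flow", "run", flow_id, "--wait"]


def rows_moved(output: str) -> Optional[int]:
    """Extract the row count from the command output, if a ROWS line is present."""
    match = _ROWS_LINE.search(output)
    return int(match.group(1)) if match else None


def run_flow_in_ci(flow_id: str, arctl: str = "arctl") -> int:
    """Run the flow, echo its output, and return the exit code for the CI job."""
    result = subprocess.run(run_command(flow_id, arctl), capture_output=True, text=True)
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    if result.returncode == 0:
        print(f"rows moved: {rows_moved(result.stdout)}")
    return result.returncode
```

A CI script would typically end with `sys.exit(run_flow_in_ci("flow_oracle_billing_nightly"))` so that a failed run fails the pipeline.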

cli_02 arctl

List flow runs with status

Recent runs across the platform — same surface as the run-history page.

$ arctl flow runs --since=24h

cli_03 arctl

Inspect a single run

Full run trace: per-step rows, durations, errors.

$ arctl flow run-status run_018f3c2a

Integrations

What it connects to.

integration

PostgreSQL (source + target)

JDBC. Target writer uses PostgreSQL COPY for high-throughput inserts. Source query is arbitrary — no schema introspection required.

integration

Oracle (source + target)

JDBC. Oracle target writer uses batched inserts. Watermark-based incremental runs work on any indexable column.

integration

ARCloud

PostgreSQL clusters provisioned by ARCloud are registered as ARFlow connections automatically once a tenant grants visibility.

integration

ARStudio

Source and target clusters watched by ARStudio surface their ARFlow throughput in real time. One identity gates both.

integration

ARStreams

Streaming and batch coexist. ARStreams owns continuous CDC; ARFlow owns scheduled reconciliation and one-off loads. They share a metadata schema family.

integration

CRON-driven schedulers

Standard cron expressions. Time zones per flow. Manual override available at any time.

Use cases

Where it runs in production.

case_01
Oracle → PostgreSQL nightly loads
Reference data, slowly-changing dimensions, regulatory snapshots — moved on a CRON schedule with full audit. The same engine runs the one-off backfill.
case_02
Cross-cluster PostgreSQL reconciliation
Pull from a source-of-truth Postgres into satellite databases that drive operational reporting. Incremental mode keeps lag bounded; full mode handles cold start.
case_03
Computed loads for compliance reports
Computed mode runs expression transformations before writing — useful for regulator-shaped outputs that diverge from operational schemas.
case_04
Streaming companion for reconciliation
ARStreams handles continuous CDC; ARFlow re-validates totals nightly. Two systems, one identity, one audit surface.
case_05
Migration playbook
During a multi-month migration ARFlow runs the bridge loads. Run history doubles as evidence of correctness for the cutover review.

Deployment

How it is operated.

ARFlow ships as a Spring Boot JAR with a React control surface and a small PostgreSQL database for state. It needs outbound JDBC access to source and target databases and nothing else. It runs on bare metal, Proxmox VE, or in a container.

Single JAR, systemd unit, optional nginx reverse-proxy on top.
Outbound JDBC reachability to each source and target — read on source, write on target.
No broker, no scheduler service besides ARFlow itself.
Air-gap-ready: no outbound calls beyond your perimeter; no SaaS dependency.
Concurrent flows scale on a single instance; horizontal scale is on the roadmap.

Evaluate this product.

Open the workspace if you already hold credentials, or request guided access through the briefing flow.
