01 — Numerical Weather Prediction Foundation

The Governing Equations

The atmosphere obeys the Navier-Stokes equations on a rotating sphere. These are not approximations — they are the exact laws of fluid dynamics and thermodynamics that define how weather evolves. Solving them numerically on a discrete grid is what modern NWP is.

Momentum Equation (Primitive Form — Rotating Frame)

∂v/∂t + (v·∇)v + 2Ω×v = -∇p/ρ + g + F

Left side: local acceleration + nonlinear advection + Coriolis acceleration (2Ω×v, where |Ω| = 7.292×10⁻⁵ rad/s is Earth's rotation rate). Right side: pressure gradient force (-∇p/ρ) + gravity (g = 9.81 m/s²) + friction/turbulence parameterization (F). The Coriolis term is not an approximation: it is the exact consequence of Newton's second law in a rotating reference frame. It is responsible for cyclone rotation (counterclockwise in the NH, clockwise in the SH) and for the geostrophic balance that governs large-scale flow: fv = (1/ρ)∂p/∂x, where f = 2Ω sin φ is the Coriolis parameter at latitude φ.
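As a rough illustration of the geostrophic relation fv = (1/ρ)∂p/∂x, the sketch below computes the Coriolis parameter and the resulting meridional wind for a given pressure gradient. The function names and the near-surface density of 1.2 kg/m³ are assumptions chosen for the example, not values from NexiAtmos.

```python
import math

OMEGA = 7.292e-5  # Earth's rotation rate (rad/s), as quoted in the text


def coriolis_parameter(lat_deg: float) -> float:
    """Coriolis parameter f = 2 * Omega * sin(latitude), in 1/s."""
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))


def geostrophic_v(dp_dx: float, lat_deg: float, rho: float = 1.2) -> float:
    """Meridional geostrophic wind (m/s) from f * v = (1/rho) * dp/dx.

    dp_dx is the eastward pressure gradient in Pa/m; rho (kg/m^3) defaults
    to an assumed near-surface value.
    """
    f = coriolis_parameter(lat_deg)
    if abs(f) < 1e-6:
        raise ValueError("geostrophic balance breaks down near the equator")
    return dp_dx / (rho * f)


# Example: pressure rising eastward by 1 hPa per 100 km at 45°N
v_g = geostrophic_v(dp_dx=100.0 / 100_000.0, lat_deg=45.0)
print(f"v_g = {v_g:.1f} m/s")
```

A gradient of 1 hPa per 100 km at mid-latitudes gives a wind of several metres per second, which is why modest pressure gradients on a synoptic chart correspond to noticeable winds.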

Thermodynamic Energy Equation (First Law of Thermodynamics)

∂T/∂t + v·∇T - (RT / c_p p)(∂p/∂t + v·∇p) = Q / c_p

Governs temperature evolution under advection and adiabatic compression/expansion. Q = diabatic heating rate (W/kg): solar shortwave absorption, longwave emission, latent heat release from condensation (L_v = 2.501 × 10⁶ J/kg), surface sensible heat flux. c_p = 1004 J/(kg·K) for dry air. By the ideal gas law the coefficient RT/(c_p p) equals 1/(ρ c_p), which links pressure changes to temperature; under hydrostatic balance (dp = -ρg dz) the term -(RT/c_p p)(dp/dt) becomes +Γ_d w, the dry adiabatic lapse rate times the vertical velocity. Adiabatic compression warms air as it descends, the mechanism for Foehn winds and upper-level jet dynamics. Dry adiabatic lapse rate: Γ_d = g/c_p ≈ 9.8 K/km. Saturated: Γ_s ≈ 5–7 K/km (latent heat release reduces the cooling rate).
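The lapse-rate figures above can be checked with a few lines of arithmetic. The sketch below computes Γ_d = g/c_p and a simple Foehn estimate; the 6 K/km saturated rate and the assumption that the air stays saturated for the whole ascent are idealizations introduced only for this example.

```python
G = 9.81          # gravitational acceleration (m/s^2), from the text
CP_DRY = 1004.0   # specific heat of dry air at constant pressure (J/(kg K))


def dry_adiabatic_lapse_rate_k_per_km() -> float:
    """Gamma_d = g / c_p, converted from K/m to K/km."""
    return G / CP_DRY * 1000.0


def foehn_warming(ridge_height_m: float,
                  saturated_rate_k_per_km: float = 6.0) -> float:
    """Net warming (K) for air that rises saturated and descends dry.

    Assumes saturation during the entire ascent (an idealization) and a
    constant saturated lapse rate picked from the 5-7 K/km range.
    """
    km = ridge_height_m / 1000.0
    cooling_on_ascent = saturated_rate_k_per_km * km
    warming_on_descent = dry_adiabatic_lapse_rate_k_per_km() * km
    return warming_on_descent - cooling_on_ascent


gamma_d = dry_adiabatic_lapse_rate_k_per_km()
net = foehn_warming(2000.0)
print(f"Gamma_d = {gamma_d:.2f} K/km, net Foehn warming over 2 km = {net:.1f} K")
```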

Continuity Equation (Mass Conservation)

∂ρ/∂t + ∇·(ρv) = 0

Air mass is conserved. In pressure coordinates (used by most operational NWP for their favorable conservation properties), this becomes the diagnostic relation: ∇_p·v + ∂ω/∂p = 0, where ω = dp/dt is the vertical velocity in pressure coordinates (units: Pa/s). Upward motion is ω < 0. This diagnostic form allows ω to be derived from the horizontal divergence field — no prognostic equation needed for ω in hydrostatic models.
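To make the diagnostic character of ω concrete, the sketch below integrates ∂ω/∂p = -∇_p·v upward from the surface, assuming ω = 0 at the lowest interface. The layer structure, the function name, and the divergence values are hypothetical and chosen only to show low-level convergence producing upward motion (ω < 0).

```python
def omega_profile(p_interfaces_pa, layer_divergence):
    """Integrate d(omega)/dp = -divergence from the surface upward.

    p_interfaces_pa: interface pressures ordered surface -> top (decreasing, Pa).
    layer_divergence: layer-mean horizontal divergence (1/s), one per layer.
    Returns omega (Pa/s) at every interface, with omega = 0 at the surface
    (an assumed lower boundary condition for flat terrain).
    """
    if len(layer_divergence) != len(p_interfaces_pa) - 1:
        raise ValueError("need exactly one divergence value per layer")
    omega = [0.0]
    for k, div in enumerate(layer_divergence):
        thickness = p_interfaces_pa[k] - p_interfaces_pa[k + 1]  # > 0
        # omega(p_upper) = omega(p_lower) + divergence * layer thickness
        omega.append(omega[-1] + div * thickness)
    return omega


# Convergence near the surface, divergence aloft (illustrative values)
p = [100_000.0, 85_000.0, 50_000.0, 20_000.0]
div = [-1e-5, 0.0, 5e-6]
omega = omega_profile(p, div)
print(omega)  # negative values mean upward motion
```

In this example the low-level convergence is exactly compensated by divergence aloft, so ω returns to roughly zero at the top interface.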

Moisture Continuity (Prognostic Water Vapor)

∂q/∂t + v·∇q = E - C

q = specific humidity (kg water vapor / kg moist air). E = evaporation source terms (surface latent heat flux). C = condensation sink terms (microphysics parameterization). Condensation begins when q exceeds the saturation specific humidity q_sat(T,p) = 0.622 e_s(T) / [p - 0.378 e_s(T)], where e_s(T) follows an empirical (Bolton-type) fit to the Clausius-Clapeyron relation: e_s = 6.112 exp(17.67T / (T + 243.5)) hPa, with T in °C. Modern models carry additional prognostic variables (cloud liquid water q_c, cloud ice q_i, rain q_r, snow q_s) for full bulk microphysics.
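The saturation relations can be evaluated directly. This sketch takes T in °C (as the empirical fit requires) and uses the specific-humidity denominator p - 0.378 e_s; with p - e_s the same expression gives the saturation mixing ratio instead. Function names are illustrative.

```python
import math

EPSILON = 0.622  # ratio of molecular masses, water vapor / dry air


def e_sat_hpa(t_celsius: float) -> float:
    """Saturation vapor pressure (hPa) from the empirical fit in the text."""
    return 6.112 * math.exp(17.67 * t_celsius / (t_celsius + 243.5))


def q_sat(t_celsius: float, p_hpa: float) -> float:
    """Saturation specific humidity (kg/kg) at temperature T and pressure p."""
    e_s = e_sat_hpa(t_celsius)
    return EPSILON * e_s / (p_hpa - (1.0 - EPSILON) * e_s)


def is_supersaturated(q: float, t_celsius: float, p_hpa: float) -> bool:
    """True when q exceeds saturation, i.e. condensation would begin."""
    return q > q_sat(t_celsius, p_hpa)


print(f"e_s(20 C) = {e_sat_hpa(20.0):.2f} hPa")
print(f"q_sat(20 C, 1000 hPa) = {q_sat(20.0, 1000.0) * 1000:.2f} g/kg")
```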

Equation of State (Ideal Gas Law with Moisture Correction)

p = ρ R_d T_v, where T_v = T(1 + 0.608q) and R_d = 287.05 J/(kg·K)

T_v = virtual temperature. Moist air is less dense than dry air at the same T and p, because water vapor (molecular mass 18 g/mol) is lighter than dry air (28.97 g/mol). This closes the system: six prognostic equations (three momentum components, temperature, moisture, and mass continuity) plus the diagnostic equation of state determine the seven unknowns (u, v, w or ω, T, q, ρ, p), a fully determined initial-value problem given boundary conditions (surface fluxes, top-of-atmosphere radiation).
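A short numerical check of the moisture correction: the sketch below computes virtual temperature and density from the equation of state and compares moist and dry air at the same T and p. The sample values (20 °C, 10 g/kg, 1013.25 hPa) are arbitrary illustrations.

```python
R_DRY = 287.05  # gas constant for dry air, J/(kg K), from the text


def virtual_temperature(t_kelvin: float, q: float) -> float:
    """T_v = T * (1 + 0.608 q), with q the specific humidity in kg/kg."""
    return t_kelvin * (1.0 + 0.608 * q)


def air_density(p_pa: float, t_kelvin: float, q: float = 0.0) -> float:
    """Density (kg/m^3) from p = rho * R_d * T_v."""
    return p_pa / (R_DRY * virtual_temperature(t_kelvin, q))


rho_dry = air_density(101_325.0, 293.15)
rho_moist = air_density(101_325.0, 293.15, q=0.010)
print(f"dry: {rho_dry:.4f} kg/m^3, moist: {rho_moist:.4f} kg/m^3")
```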

Numerical Discretization — The CFL Stability Criterion

C = c × Δt / Δx ≤ 1 (Courant-Friedrichs-Lewy condition)

These PDEs have no closed-form analytical solution for real atmospheric states. Models discretize onto a 3D grid and integrate forward in time; for explicit schemes, the CFL condition above ties the time step Δt to the grid spacing Δx and the fastest signal speed c. Semi-Lagrangian advection with semi-implicit time stepping allows Δt larger than explicit CFL limits. This was the approach of the spectral GFS (T1534, ~13km effective resolution) before its 2019 move to the FV3 finite-volume cubed-sphere core, and it remains the approach of ECMWF IFS, which uses a cubic octahedral reduced Gaussian grid (O1280, ~9km) with 4D-Var data assimilation over a 12-hour assimilation window. Grid staggering (Arakawa C-grid: u on east face, v on north face, scalars at cell center) preserves energy and enstrophy conservation properties critical for long integrations.
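The CFL criterion translates into a concrete upper bound on the explicit time step. The sketch below assumes a signal speed of 300 m/s (roughly the fastest gravity and acoustic modes) and a 13 km grid; both are illustrative inputs, not model configuration values.

```python
def courant_number(c: float, dt: float, dx: float) -> float:
    """C = c * dt / dx for signal speed c (m/s), time step dt (s), spacing dx (m)."""
    return c * dt / dx


def max_stable_dt(c: float, dx: float, c_max: float = 1.0) -> float:
    """Largest time step (s) that keeps the Courant number at or below c_max."""
    return c_max * dx / c


dt_max = max_stable_dt(c=300.0, dx=13_000.0)
print(f"explicit limit: dt <= {dt_max:.1f} s")
print(f"Courant number at dt = 600 s: {courant_number(300.0, 600.0, 13_000.0):.1f}")
```

A 600 s step would violate the explicit limit by more than an order of magnitude on this grid, which is the motivation for semi-implicit and semi-Lagrangian schemes.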

02 — Operational NWP Model Specifications

Four Models. One Calibrated Output.

NexiAtmos ingests GRIB2 output from four major operational NWP systems. No single model dominates across all variables, seasons, and regions — ensemble fusion exploits their complementary strengths.

NOAA / NCEP

GFS — Global Forecast System

United States National Weather Service · NCEP

Horizontal Resolution: ~13 km native grid (FV3 finite-volume cubed-sphere core since 2019; T1534 spectral before that), with output distributed on a 0.25° lat/lon grid (~28 km at equator)
Vertical Levels: 127 hybrid-sigma levels, surface to 0.2 hPa (~60 km) — high-density near surface
Forecast Range: 384 hours (16 days) — 4 cycles/day at 00Z, 06Z, 12Z, 18Z UTC
Physics Schemes: RRTMG (Rapid Radiative Transfer Model for GCMs) · Simplified Arakawa-Schubert (SAS) convection · Noah-MP Land Surface Model · GFDL microphysics · MYNN PBL (planned)
Data Assimilation: GSI hybrid 4D-EnVar, which blends a static background error covariance with the flow-dependent covariance from the 80-member GDAS EnKF ensemble
Data Access: NOAA NOMADS — freely available GRIB2. ~500 GB/day. No license fee.

Hybrid EnKF/Var DA: assimilates 3M+ observations per 6h cycle

ECMWF

IFS — Integrated Forecasting System

European Centre for Medium-Range Weather Forecasts · Reading, UK

HRES Resolution: 0.1° (~9 km) deterministic — cubic octahedral O1280 Gaussian grid, 4D-Var 12h window
Vertical Levels: 137 hybrid-eta levels, surface to 0.01 hPa (~80 km, through the stratosphere into the mesosphere); resolves stratospheric warming events
ENS Ensemble: 51 members at 0.2° (O640, ~18km), 360-hour range. Initial perturbations: singular vectors + EDA (ensemble of data assimilations, each a 4D-Var). Stochastic physics: SPPT + SKEB.
Physics Schemes: ecRad radiation (McRad heritage) · Tiedtke-Bechtold mass-flux convection · HTESSEL land surface (carbon cycle) · ECTRANS spectral transform library
Data Access: ECMWF Open Data, a subset of HRES + ENS products free for research and commercial use (CC BY 4.0), retrieved by direct download or the ecmwf-opendata client

Gold standard — leads WMO 500 hPa anomaly correlation for all forecast ranges

NOAA

NAM — North American Mesoscale

NOAA National Centers for Environmental Prediction

Resolution: 12 km CONUS, NMMB (Nonhydrostatic Multiscale Model on the B-grid) dynamical core with 3 km CONUS nest (explicit convection, no cumulus parameterization)
Vertical Levels: 60 hybrid levels — terrain-following eta coordinates near surface, isobaric above 700 hPa
Forecast Range: 84 hours, 4 cycles/day. Hourly output for first 36h, 3-hourly thereafter.
Physics Schemes: Ferrier-Aligo microphysics · MYJ (Mellor-Yamada-Janjic) 2.5-order PBL · Betts-Miller-Janjic (BMJ) convection (12km only) · Noah land surface
Advantage: Mesoscale resolution captures sea-breeze circulations, orographic precipitation, MCS organization, urban heat island

3 km explicit convection — resolves convective-scale updraft dynamics

NOAA / ESRL

HRRR — High-Resolution Rapid Refresh

NOAA Earth System Research Laboratories · Boulder, CO

Resolution: 3 km horizontal, 50 vertical levels — WRF-ARW (Advanced Research WRF) dynamical core
Update Cycle: Hourly updates — 18-hour forecasts, 3-km CONUS domain. 48h runs at 00Z and 12Z.
Radar Assimilation: 3D-Var with Doppler radial velocity + reflectivity from NEXRAD WSR-88D network — assimilated every 15 minutes within each cycle
Physics Schemes: Thompson aerosol-aware double-moment microphysics · MYNN level-2.5 PBL · RRTMG radiation · RUC land surface model
Optimal Window: 0–18 hour convective nowcasting: thunderstorm initiation, flash flood risk, supercell tracking, severe wind timing

Radar DA every 15 min — best-in-class 0–6h convective forecast skill

03 — Data Assimilation

Ensemble Kalman Filter (80 Members)

Before a forecast can run, the model state must be initialized from observations. EnKF estimates the forecast error covariance from ensemble spread — solving the problem of what to trust: the model or the observations.

Analysis Equation (Bayes Optimal Estimate)

x_a = x_f + K(y - Hx_f)

x_a = analysis state vector (best estimate given observations).
x_f = forecast/background state vector (N-dimensional, N = ~10⁷ for global models).
y = observation vector (M-dimensional, M = ~10⁶ per 6h window).
H = forward observation operator mapping model state to observation space.
(y - Hx_f) = innovation vector: by how much do observations differ from model?

Kalman Gain Matrix

K = P_f H^T (H P_f H^T + R)^(-1)

P_f = N×N forecast error covariance — too large to store explicitly (10¹⁴ elements). Estimated from 80-member ensemble: P_f ≈ (1/(m-1)) X'X'^T where X' = ensemble perturbation matrix.
R = M×M observation error covariance (instrument noise + representativeness error).
K governs the trust allocation: when spread (P_f) is small → trust model. When R is small (precise observations) → trust observations. Covariance localization (Gaspari-Cohn taper, radius ~1000km) prevents spurious long-range correlations from finite ensemble size (sampling error).
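The trust allocation and the localization taper can be sketched for the simplest case: one state variable updated by one scalar observation, for which the gain reduces to cov(x, Hx) / (var(Hx) + R) multiplied by the Gaspari-Cohn factor. This is an illustrative scalar version, not the DART or GSI implementation; the half-width parameter (the taper reaches zero at twice this distance) and the sample values are assumptions.

```python
import statistics


def gaspari_cohn(dist_km: float, half_width_km: float) -> float:
    """Gaspari-Cohn fifth-order taper: 1 at zero distance, 0 beyond 2 * half_width."""
    r = abs(dist_km) / half_width_km
    if r <= 1.0:
        return -0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3 - (5.0 / 3.0) * r**2 + 1.0
    if r <= 2.0:
        return (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3 + (5.0 / 3.0) * r**2
                - 5.0 * r + 4.0 - 2.0 / (3.0 * r))
    return 0.0


def update_mean(x_members, hx_members, y, obs_var, dist_km, half_width_km):
    """Localized ensemble update of the mean of one state variable.

    x_members: ensemble values of the state variable being updated.
    hx_members: the same members mapped to observation space (H applied).
    Returns (analysis mean, localized gain).
    """
    m = len(x_members)
    x_bar = statistics.fmean(x_members)
    hx_bar = statistics.fmean(hx_members)
    cov_x_hx = sum((x - x_bar) * (h - hx_bar)
                   for x, h in zip(x_members, hx_members)) / (m - 1)
    var_hx = statistics.variance(hx_members)  # sample variance, 1/(m-1)
    gain = gaspari_cohn(dist_km, half_width_km) * cov_x_hx / (var_hx + obs_var)
    return x_bar + gain * (y - hx_bar), gain


# Five-member toy ensemble of 2 m temperature (K), observed directly (H = 1)
members = [281.0, 282.5, 280.2, 283.1, 281.7]
x_a, k = update_mean(members, members, y=283.0, obs_var=0.25,
                     dist_km=0.0, half_width_km=500.0)
print(f"gain = {k:.3f}, analysis mean = {x_a:.2f} K")
```

With a large ensemble spread relative to the observation error, the gain is close to 1 and the analysis moves most of the way to the observation; at distances beyond twice the half-width the taper sets the gain to zero.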

Ensemble Inflation (Covariance Maintenance)

P_f^inflated = (1 + δ) P_f where δ ≈ 0.05–0.15

Finite ensemble underestimates spread due to sampling error — inflation prevents filter divergence (ensemble collapse). Adaptive inflation (Anderson 2007) estimates δ locally from observation-space diagnostics: if innovations > expected spread, inflate; if smaller, deflate.
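Multiplicative inflation is usually applied to the ensemble perturbations rather than to an explicit covariance matrix. The sketch below scales deviations from the mean by sqrt(1 + δ), which multiplies the sample variance by (1 + δ) and leaves the mean unchanged; the fixed δ = 0.10 is an illustrative value from the quoted range, not an adaptive estimate.

```python
import math
import statistics


def inflate(members, delta: float):
    """Scale perturbations about the ensemble mean by sqrt(1 + delta)."""
    mean = statistics.fmean(members)
    scale = math.sqrt(1.0 + delta)
    return [mean + scale * (x - mean) for x in members]


members = [281.0, 282.5, 280.2, 283.1, 281.7]
inflated = inflate(members, delta=0.10)
ratio = statistics.variance(inflated) / statistics.variance(members)
print(f"variance ratio after inflation: {ratio:.3f}")
```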

Standard Assimilation Pressure Levels (hPa)

1000 975 950 925 900 850 800 750 700 650 600 550 500 450 400 350 300 250 200 150 100 70 50 30 20 10 + terrain-following σ levels (surface to 850 hPa)

Layer grouping (color-coded in the original chart): PBL/boundary layer · lower troposphere · mid-troposphere · upper troposphere/tropopause · stratosphere

Observation Sources (80-Member EnKF Cycle)

Radiosondes (TEMP) — ~1,500 global upper-air stations, 00Z/12Z daily balloon soundings: T, q, u, v on mandatory/significant levels
Surface observations (SYNOP/METAR) — 10,000+ hourly reports: MSLP, T_2m, Td_2m, wind_10m, precipitation
Aircraft (AMDAR/ACARS) — automated cruise-level T, wind from commercial fleet: ~700,000 obs/day
AMSU-A microwave radiances — 15-channel brightness temperatures, tropospheric temperature weighting functions
Infrared sounders (IASI, CrIS, AIRS) — hyperspectral radiances, 616–1305 channels, bias-corrected via Variational Bias Correction (VarBC)
GPS Radio Occultation (COSMIC-2, Sentinel-6) — bending angle profiles, all-weather, self-calibrating refractivity
Scatterometers (ASCAT) — 25 km ocean surface wind vectors from C-band radar backscatter
NEXRAD Doppler radar (HRRR only) — WSR-88D radial velocity + reflectivity, 15-min assimilation cycle

04 — Ensemble Fusion

Bayesian Model Averaging (BMA)

No single NWP model dominates across all variables, regions, and lead times. BMA constructs a calibrated probabilistic forecast by estimating model weights from recent skill and blending predictive distributions — producing reliable probability estimates that outperform any individual model.

p(y | f₁, f₂, ..., f_K) = Σᵢ₌₁ᴷ wᵢ g(y | fᵢ, σᵢ²)

The BMA predictive PDF is a weighted mixture of the K model-conditional distributions g(·), one per model forecast fᵢ.

Weight Estimation (EM Algorithm)

Weights wᵢ constrained: Σwᵢ = 1, wᵢ ≥ 0. Estimated via Expectation-Maximization on a rolling 30-day training window against METAR/SYNOP verifying observations.

Weights vary by: geographic region (GFS outperforms ECMWF on short-range CONUS precipitation; ECMWF dominates day 7-14 geopotential), season, variable, and lead time. A 2D weight field (lat/lon) is maintained per variable per lead time.
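A minimal sketch of the EM weight estimation for Normal kernels follows. It makes simplifying assumptions that are not stated in the text: a single shared σ for all models, forecasts already bias-corrected, and a synthetic training set in which the first model has smaller errors than the second. The function names are illustrative.

```python
import math


def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bma_pdf(y, forecasts, weights, sigma):
    """Mixture density: sum_k w_k * N(y; f_k, sigma^2)."""
    return sum(w * normal_pdf(y, f, sigma) for w, f in zip(weights, forecasts))


def fit_bma(train_forecasts, train_obs, n_iter=200, tol=1e-6):
    """EM for mixture weights and a shared sigma.

    train_forecasts[t][k] is the forecast of model k for training case t.
    """
    n, k_models = len(train_obs), len(train_forecasts[0])
    weights, sigma = [1.0 / k_models] * k_models, 1.0
    for _ in range(n_iter):
        # E-step: responsibility of each model for each observation
        z = []
        for fs, y in zip(train_forecasts, train_obs):
            dens = [w * normal_pdf(y, f, sigma) for w, f in zip(weights, fs)]
            total = max(sum(dens), 1e-300)  # guard against underflow
            z.append([d / total for d in dens])
        # M-step: weights are mean responsibilities; sigma^2 is the
        # responsibility-weighted mean squared error
        new_weights = [sum(row[k] for row in z) / n for k in range(k_models)]
        var = sum(row[k] * (y - fs[k]) ** 2
                  for row, fs, y in zip(z, train_forecasts, train_obs)
                  for k in range(k_models)) / n
        new_sigma = max(math.sqrt(var), 1e-6)
        converged = max(abs(a - b) for a, b in zip(new_weights, weights)) < tol
        weights, sigma = new_weights, new_sigma
        if converged:
            break
    return weights, sigma


# Synthetic training data: model 0 has small errors, model 1 large ones
obs = [15.0 + 5.0 * math.sin(0.3 * t) for t in range(60)]
fcst = [[y + 0.5 * math.sin(1.1 * t), y + 2.0 * math.cos(0.7 * t)]
        for t, y in enumerate(obs)]
weights, sigma = fit_bma(fcst, obs)
print([round(w, 3) for w in weights], round(sigma, 3))
```

The model with the smaller training errors ends up with the larger weight. A production system would fit separate σᵢ per model and repeat the fit per variable, region, and lead time, as described above.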

Kernel g(·) by Variable

2m Temperature, geopotential: Normal N(fᵢ, σᵢ²) — bias-corrected mean
Precipitation: zero-inflated gamma Γ(α, β) — mass at zero + continuous part
Wind direction: von Mises distribution VM(μᵢ, κᵢ) — circular statistics
Wind speed, dewpoint depression: truncated normal (≥ 0 constraint)
Visibility, ceiling: log-normal (heavy right-tail for IFR events)

Verification and Calibration Scores

CRPS — Continuous Ranked Probability Score

CRPS(F, x) = ∫₋∞^∞ [F(y) - 1{y ≥ x}]² dy

F(y) = predictive CDF; x = verifying observation. Measures both calibration and sharpness simultaneously. Strictly proper: the expected score is minimized only when the forecast distribution matches the true distribution. Collapses to the absolute error for a deterministic forecast (MAE when averaged over cases). CRPS is the primary optimization target for BMA weight estimation in NexiAtmos.
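For an ensemble, the CRPS integral has a well-known sample form: the mean absolute error of the members minus half the mean absolute difference between member pairs. The sketch below uses that estimator (treating the ensemble as an empirical CDF) and shows the collapse to absolute error for a single deterministic forecast.

```python
def crps_ensemble(members, obs: float) -> float:
    """CRPS of the empirical CDF defined by the ensemble members.

    CRPS = mean|x_i - obs| - 0.5 * mean|x_i - x_j| over all member pairs.
    """
    m = len(members)
    accuracy = sum(abs(x - obs) for x in members) / m
    sharpness = sum(abs(a - b) for a in members for b in members) / (2.0 * m * m)
    return accuracy - sharpness


print(crps_ensemble([3.0], 5.0))            # single member: absolute error
print(crps_ensemble([1.0, 2.0, 3.0], 2.0))  # spread centred on the observation
```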

Brier Skill Score (BSS)

BSS = 1 - BS / BS_clim

BS = (1/N) Σ (p̂ₙ - oₙ)² over N forecast-event pairs, with oₙ = 1 if the event occurred and 0 otherwise. BS_clim = Brier score of the climatological base-rate forecast. BSS = 0: no skill vs. climatology. BSS = 1: perfect. Negative: worse than saying "use the historical average." Applied per threshold (e.g., P(rain ≥ 1mm), P(T < 0°C), P(wind ≥ 10 m/s)).
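A worked example of the Brier skill score follows. As a simplification, the climatological base rate is taken from the verification sample itself; an operational system would more likely use a long-term climatology.

```python
def brier_score(probs, outcomes) -> float:
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes) -> float:
    """BSS = 1 - BS / BS_clim, with the base rate estimated from the sample."""
    base_rate = sum(outcomes) / len(outcomes)
    bs_clim = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / bs_clim


probs = [0.9, 0.1, 0.8, 0.2]   # forecast P(rain >= 1 mm)
outcomes = [1, 0, 1, 0]        # 1 = event observed
print(f"BS = {brier_score(probs, outcomes):.3f}, "
      f"BSS = {brier_skill_score(probs, outcomes):.2f}")
```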

Reliability Diagram + Platt Scaling

E[obs | p̂ = p] = p ← calibrated

Perfect calibration: the observed frequency of events equals the forecast probability in each bin. Overconfident: the curve is flatter than the 45° line, falling below it at high forecast probabilities and above it at low ones. Platt scaling, a logistic regression on ensemble spread, pulls the calibration curve toward the diagonal. Recalibration is applied per variable, per lead-time bin.
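The data behind a reliability diagram is a simple binned table. The sketch below groups forecasts into equal-width probability bins and reports mean forecast probability, observed frequency, and count per non-empty bin; the bin count and sample data are illustrative.

```python
def reliability_table(probs, outcomes, n_bins: int = 10):
    """Return (mean forecast prob, observed frequency, count) per non-empty bin."""
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes in the last bin
        bins[b][0] += p
        bins[b][1] += o
        bins[b][2] += 1
    return [(sp / n, so / n, n) for sp, so, n in bins if n > 0]


# An overconfident forecaster: 10% forecasts verify 25% of the time,
# 90% forecasts verify only 75% of the time
probs = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 1, 1, 1, 0]
table = reliability_table(probs, outcomes)
for mean_p, obs_freq, count in table:
    print(f"forecast {mean_p:.2f} -> observed {obs_freq:.2f} (n={count})")
```

Plotting observed frequency against mean forecast probability gives the reliability curve; here it is flatter than the diagonal, the signature of overconfidence.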

Spread-Skill Relationship (Ensemble Reliability Criterion)

E[σ²_ens] ≈ RMSE² ↔ E[(x̄ - x_truth)²] ≈ E[(1/m) Σ(xᵢ - x̄)²]

A well-calibrated ensemble has spread equal to RMS error over many forecast cases. Operational models are typically 10-20% underdispersive (overconfident) due to model error not captured in initial-condition perturbations alone. NexiAtmos applies variance inflation (per-variable, per-level) estimated from 90-day verification climatology to restore spread-skill alignment before BMA weighting.
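The spread-skill criterion can be monitored with a single ratio. The sketch below compares the root-mean ensemble variance (using the 1/m convention of the formula above) with the RMSE of the ensemble mean, and derives the variance inflation factor that would bring the two into line. The toy cases are illustrative, and this simple global factor stands in for the per-variable, per-level estimate described in the text.

```python
import math
import statistics


def spread_skill_ratio(ensembles, truths) -> float:
    """sqrt(mean ensemble variance) / RMSE of the ensemble mean.

    A ratio below 1 indicates an underdispersive (overconfident) ensemble.
    """
    mean_var = statistics.fmean([statistics.pvariance(e) for e in ensembles])
    mse = statistics.fmean([(statistics.fmean(e) - t) ** 2
                            for e, t in zip(ensembles, truths)])
    return math.sqrt(mean_var / mse)


def variance_inflation_factor(ratio: float) -> float:
    """Factor to multiply ensemble variance by so that spread matches RMSE."""
    return 1.0 / ratio**2


ensembles = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
truths = [4.0, 8.0]
ratio = spread_skill_ratio(ensembles, truths)
factor = variance_inflation_factor(ratio)
print(f"spread/skill = {ratio:.2f}, variance inflation = {factor:.1f}x")
```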

05 — Predictability Theory

The Lorenz Problem — Why 45+ Days Is Achievable

Understanding the predictability horizon — and the scientifically rigorous case for sub-seasonal probabilistic forecasting.

Deterministic Chaos and the Predictability Horizon

Edward Lorenz (1963) showed, using a simplified model of atmospheric convection, that deterministic flow can be chaotic: two states separated by an arbitrarily small initial error diverge exponentially. This is not a computational limitation. It is a mathematical property of the governing equations.

Error Growth — Lyapunov Exponent

δ(t) ≈ δ(0) · e^(λt) where λ ≈ 0.9 day⁻¹ for the troposphere

An initial analysis error δ(0) grows by a factor e^(0.9t) after t days (about 2.5× per day). Doubling time: t_d = ln(2)/λ ≈ 0.77 days (~18.5 hours). Over 14 days, unchecked exponential growth would amplify the error by e^(12.6) ≈ 300,000×; in practice the error saturates at climatological amplitude well before that, at which point individual trajectory prediction has no skill. The practical predictability limit for synoptic-scale deterministic forecasts is ~10-14 days (depending on flow regime: highly predictable blocking patterns can extend this; chaotic trough amplification can shorten it).
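The growth figures quoted here follow directly from the exponential law. The sketch below reproduces them for λ = 0.9 day⁻¹; it models only the unsaturated exponential phase and does not attempt to represent saturation.

```python
import math

LAMBDA = 0.9  # leading Lyapunov exponent (1/day), value quoted in the text


def doubling_time_days(lam: float = LAMBDA) -> float:
    """Time for an error to double under exponential growth."""
    return math.log(2.0) / lam


def growth_factor(t_days: float, lam: float = LAMBDA) -> float:
    """Amplification delta(t) / delta(0) = exp(lambda * t)."""
    return math.exp(lam * t_days)


t_d = doubling_time_days()
print(f"doubling time: {t_d:.2f} days ({t_d * 24:.1f} h)")
print(f"growth after 1 day: {growth_factor(1.0):.2f}x")
print(f"growth after 14 days: {growth_factor(14.0):,.0f}x")
```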

Days 0–14: Deterministic Regime

Individual trajectory prediction has real skill (ACCs > 0.6 for 500 hPa geopotential)
Point forecasts: "It will rain Tuesday, 14mm expected, 60% confidence"
ECMWF HRES ACC_500 drops below 0.6 at ~9 days; GFS at ~8 days
Ensemble mean outperforms deterministic beyond day 4 (error cancellation)
Better DA and resolution push the boundary but cannot transcend it

Days 14–46+: Statistical/Sub-Seasonal Regime

Individual trajectory prediction: no deterministic skill
But statistical properties (regime frequencies) remain predictable
MJO (Madden-Julian Oscillation) phase prediction: measurable skill to 20-30 days
MJO modulates precipitation globally via teleconnections (PNA pattern, AAO)
ECMWF sub-seasonal ENS: BSS > 0 for large-scale T anomalies at 46 days
NexiAtmos: probability distributions ("above-normal precipitation week 3: 68%") — not point forecasts

NexiAtmos beyond day 14 provides calibrated probability distributions over climate states — not predictions of what day it will rain. This distinction is scientifically rigorous and is exactly the operational approach of ECMWF Extended Range, CPC (Climate Prediction Center), and WMO Lead Centre for Long-Range Forecast Verification.

06 — Domain Translation Layer

8 Industry Verticals

Raw NWP probabilistic output means nothing to a farm manager or maritime captain. NexiAtmos translates calibrated model output into domain-specific variables, industry thresholds, and actionable alert language.

⚓

Maritime

Beaufort scale (0–12)
Significant wave height H_s
Swell period T_s & direction
Sea surface temperature (SST)
Fog probability (T - Td < 2°C)
Port operation safety window

🌾

Agriculture

GDD = max(0, T_mean - T_base)
Frost risk (T_2m < 0°C probability)
ET₀ (Penman-Monteith equation)
Soil moisture index (Noah-MP output)
Spray window (wind < 4 m/s, dry)
Harvest window probability (%)
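Growing degree days accumulate the daily GDD formula over a season. The sketch below assumes T_mean is the average of the daily maximum and minimum and uses a 10 °C base temperature; both are common conventions but crop-specific, so treat them as placeholders.

```python
def daily_gdd(t_max: float, t_min: float, t_base: float = 10.0) -> float:
    """GDD = max(0, T_mean - T_base), with T_mean = (T_max + T_min) / 2."""
    t_mean = (t_max + t_min) / 2.0
    return max(0.0, t_mean - t_base)


def accumulated_gdd(daily_extremes, t_base: float = 10.0) -> float:
    """Sum of daily GDD over a sequence of (T_max, T_min) pairs in °C."""
    return sum(daily_gdd(t_max, t_min, t_base) for t_max, t_min in daily_extremes)


days = [(25.0, 15.0), (12.0, 4.0), (30.0, 20.0)]  # illustrative forecast
print(accumulated_gdd(days))  # cool second day contributes nothing
```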

✈️

Aviation

Crosswind V_xw = V_w · sin(Δθ)
Ceiling & visibility (IFR/MVFR/VFR)
Icing: SLD probability (supercooled)
Turbulence: EDR (m^(2/3) s⁻¹)
Convective SIGMET windows
LLWS alert (shear within 2000 ft AGL)
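The crosswind relation V_xw = V_w · sin(Δθ) can be sketched as below, together with the matching headwind component. The sign convention (positive crosswind from the right of the runway heading) and the sample runway are choices made for this example.

```python
import math


def wind_components(wind_dir_deg: float, wind_speed: float,
                    runway_heading_deg: float):
    """Return (crosswind, headwind) in the units of wind_speed.

    wind_dir_deg is the direction the wind blows FROM. Crosswind is positive
    when the wind comes from the right of the runway heading; headwind is
    negative for a tailwind.
    """
    delta = math.radians(wind_dir_deg - runway_heading_deg)
    return wind_speed * math.sin(delta), wind_speed * math.cos(delta)


# Runway 27 (heading 270°), wind 300° at 20 kt
crosswind, headwind = wind_components(300.0, 20.0, 270.0)
print(f"crosswind {crosswind:.1f} kt, headwind {headwind:.1f} kt")
```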

🚚

Logistics

Road surface T (ice formation risk)
Precipitation type probability (%)
Visibility for trucking regulations
Wind gust impact on high-sided vehicles
Snow accumulation rate (cm/h)
Route delay probability index

🧗

Outdoor Rec

Heat index (NWS Rothfusz eqn)
Wind chill (NOAA/MSC formula)
UV index (WMO scale 0–11+)
Trail wetness & condition index
Lightning probability (CAPE-based)
Outdoor comfort score (0–100)

⚡

Energy

Solar irradiance GHI (W/m²)
Wind power density P = ½ρv³
HDD/CDD (heating/cooling degree days)
Cloud cover index for PV systems
Curtailment risk windows
Capacity factor probability distribution
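Wind power density depends on the cube of wind speed, which is why small forecast errors in v matter so much for energy users. The sketch below evaluates P = ½ρv³ with an assumed standard sea-level density of 1.225 kg/m³.

```python
def wind_power_density(v: float, rho: float = 1.225) -> float:
    """Power per unit swept area, P = 0.5 * rho * v^3, in W/m^2."""
    return 0.5 * rho * v**3


p10 = wind_power_density(10.0)
p11 = wind_power_density(11.0)
print(f"10 m/s: {p10:.1f} W/m^2, 11 m/s: {p11:.1f} W/m^2 "
      f"(+{(p11 / p10 - 1) * 100:.0f}% for +10% wind)")
```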

🏗️

Construction

Precipitation P(>0.1mm) probability
Wind speed at 10m, 30m, 50m AGL
Concrete curing temperature window
Crane wind limit alert (≥12 m/s)
Frost heave risk (T_soil < 0°C)
Workable days probability (30-day)

🎪

Event Planning

P(outdoor suitable weather) score
Severe weather risk index (0–5)
Temperature comfort band (18–26°C)
Wind gust percentile distribution
Lightning clearance window (30-30 rule)
Rain-free window duration (hours)

07 — System Architecture

Data Flow

From raw operational NWP GRIB2 ingest through calibrated probabilistic output and vertical-specific API endpoints.

DATA SOURCES (free / low-cost)
├── NOAA NOMADS → GFS (GRIB2, 0.25°, 384h, ~500 GB/day)
├── ECMWF Open Data → IFS (GRIB2, 0.1° HRES + 51-mbr ENS, 240–360h)
├── NOAA NCEP → NAM (GRIB2, 12km + 3km CONUS nest, 84h)
├── NOAA ESRL → HRRR (GRIB2, 3km, hourly cycles, 18h)
└── Obs Networks → METAR + SYNOP + radiosonde (verification/DA)
        ↓
NEXIATMOS ENGINE (Python/Rust)
├── Ingest Layer
│   ├── cfgrib / xarray-grib GRIB2 parser (lazy loading)
│   ├── Temporal interpolation → common 1h grid, 0.25° resolution
│   ├── Variable extraction: T, q, u, v, ω, Z, MSLP, CAPE, CIN, precip, LCL
│   └── Quality control: gross error check, time consistency, spatial buddy check
├── Data Assimilation
│   ├── 80-member EnKF (DART framework: NCAR Data Assimilation Research Testbed)
│   ├── Observation operator H (linear interpolation + radiative transfer for radiances)
│   ├── Covariance localization (Gaspari-Cohn, 800km horizontal / 0.5 lnp vertical)
│   └── Adaptive inflation (Anderson 2007, per-variable, per-level)
├── Ensemble Fusion (BMA)
│   ├── EM weight estimation on 30-day rolling METAR/SYNOP verification window
│   ├── Per-variable kernels: Normal / zero-inflated Gamma / von Mises / log-Normal
│   ├── Spatial weight fields (per variable, per lead time, per 2.5° box)
│   └── CRPS optimization convergence check (EM tolerance: 10⁻⁶)
├── Confidence Calibration
│   ├── Platt scaling: sigmoid(a×spread + b) → calibrated P per threshold
│   ├── Isotonic regression for non-parametric calibration (rare events)
│   ├── CRPS, BSS, reliability diagram monitoring (auto-recalibrate on drift)
│   └── Output: 10th/25th/50th/75th/90th percentile bands per variable
└── Vertical Persona Engine
    ├── 8 domain interpreters: threshold evaluation, variable derivation
    ├── Alert logic engine: configurable thresholds per customer
    └── NLP summary layer: "Crane wind limit likely exceeded Tuesday 14:00–18:00 UTC"
        ↓
OUTPUTS
├── REST API (JSON: hourly to day 7, 6-hourly to day 45+, percentile bands)
├── Dashboard (per-vertical Svelte UI: ensemble plumes, alert timelines)
└── Webhooks (push alerts when threshold probability exceeds configurable level)

08 — Access Tiers

Pricing

Structured around API call volume, vertical access, and forecast range. Enterprise includes custom BMA weight tuning to your region and dedicated compute.

Free

Explorer

30-day trial

✓ 1 vertical of choice

✓ 7-day forecast range

✓ 50 API calls/month

✓ GFS + NAM fusion

✗ ECMWF IFS access

✗ Extended range (14d+)

✗ Calibrated percentiles

Pro — Most Popular

Professional

$29

per month

✓ 3 verticals

✓ 14-day full range

✓ 1,000 API calls/month

✓ All 4 NWP models (incl. ECMWF)

✓ BMA calibrated output

✓ 10th/50th/90th percentiles

✗ 45-day extended range

Business

$199

per month

✓ All 8 verticals

✓ 45-day extended range

✓ 20,000 API calls/month

✓ ECMWF ENS 51-member

✓ Full 5-percentile bands

✓ Webhook alert delivery

✓ CRPS/BSS score reports

Enterprise

$500+

custom pricing

✓ Custom BMA weight tuning

✓ Unlimited API calls

✓ Dedicated infrastructure

✓ 99.9% uptime SLA

✓ White-label dashboard

✓ Private obs assimilation

✓ On-prem deployment option

09 — Financial Model

Revenue Projections

Data costs are near-zero (NOAA NOMADS is free, ECMWF Open Data is free for commercial use). Compute is modest — BMA ensemble processing is CPU-bound, not GPU-dependent. Margins are high.

Year 1 — MVP + Soft Launch

$180K

150 Pro ($29/mo) + 5 Business ($199/mo) + 1 Enterprise ($500/mo). Verticals: agriculture and maritime. API-first, developer GTM. Target: AgTech integration partners.

Year 2 — Channel + API Platform

$890K

600 Pro + 30 Business + 8 Enterprise. Aviation and energy verticals added. White-label channel deal with 1 maritime SaaS platform (est. 200 sub-users).

Year 3 — Vertical Depth + White-Label

$2.4M

1,500 Pro + 100 Business + 25 Enterprise. 3 white-label deals: agtech, maritime, energy. International expansion: EU market (ECMWF native, strong pull).
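The subscriber mixes above can be converted into a list-price recurring-revenue floor with simple arithmetic. The sketch below prices Enterprise at the $500/month floor and excludes white-label and channel revenue, so its totals are lower bounds under those assumptions and are not expected to reproduce the headline projections, which presumably include custom Enterprise pricing and channel income.

```python
LIST_PRICE_PER_MONTH = {"pro": 29, "business": 199, "enterprise": 500}


def annual_list_price_revenue(pro: int, business: int, enterprise: int) -> int:
    """Annual subscription revenue (USD) if every seat pays list price all year.

    Enterprise is taken at its $500/month floor; white-label and channel
    deals are not included.
    """
    monthly = (pro * LIST_PRICE_PER_MONTH["pro"]
               + business * LIST_PRICE_PER_MONTH["business"]
               + enterprise * LIST_PRICE_PER_MONTH["enterprise"])
    return monthly * 12


plans = {
    "Year 1": (150, 5, 1),
    "Year 2": (600, 30, 8),
    "Year 3": (1500, 100, 25),
}
for year, mix in plans.items():
    print(f"{year}: ${annual_list_price_revenue(*mix):,} list-price floor")
```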

Year 5 — Market Leadership

$8M+

Proprietary observation network ingestion (Tempest, Tomorrow.io partnership). Custom model post-processing per enterprise customer. South America and SE Asia coverage.

12-Month Development Budget

Line Item	Description	Monthly	Annual
Cloud Compute (AWS/GCP)	GRIB2 ingest, BMA processing, API serving, storage	$1,800	$21,600
ECMWF Commercial Tier	If usage exceeds free Open Data limits (high volume)	$400	$4,800
Engineering (2 FTE equiv.)	NWP pipeline, BMA engine, API, dashboard development	$12,000	$144,000
Meteorologist Advisory	Domain expert review (ex-NOAA/ECMWF consultant, 20h/mo)	$2,000	$24,000
GTM + Channel Development	AgTech/maritime partnership development, content	$3,000	$36,000
Total Year 1 Budget		$19,200/mo	$230,400
