01 / A working hypothesis
You are building natural eutectic systems at industrial scale. The science is 30+ SKUs deep, patents are filed, Syensqo wrote the check, and the applications lab in Tulsa is coming online. The next constraint is not chemistry.
REACH dossiers. FDA ingredient filings. Customer-specific technical data sheets. Toxicology packages that need to be restated for every jurisdiction a customer sells into. Each new partner opens a documentation queue, and every document pulls a chemist or a founder out of the lab for a day.
"Nature designs chemistry through systems, not isolated molecules." The same applies to the paperwork around the molecule.
Teams reach for a large language model when a template plus a database would ship the answer faster, cheaper, and with guarantees a regulator will actually accept. The work is deciding which part of the dossier belongs on which of three layers.
EU REACH format, section ordering, required fields.
Deterministic. Versioned. Auditable. A rules engine produces the same artifact the same way every time.
Should never touch an LLM.
Data lives in a database. Queried. Composed into the template. Traceable back to the batch it came from.
Should never touch an LLM.
The narrow band where a reviewed LLM output earns its place. Heavy human sign-off, tight context, receipts on every claim.
Should touch an LLM, with discipline.
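The three layers above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real REACH layout: the section headings, the `batches` table, the `SKU-001` record, and the `narrative_slot` sign-off gate are all hypothetical names chosen for the example.

```python
import hashlib
import sqlite3
from string import Template

# Layer 1: a versioned template. The section layout is a made-up placeholder,
# not the actual REACH format.
TEMPLATE_VERSION = "tds-v1"
SECTION_TEMPLATE = Template(
    "1. Identification\n"
    "Product: $sku\n"
    "Batch: $batch_id\n"
    "2. Properties\n"
    "Viscosity (mPa.s): $viscosity\n"
)


def load_batch(conn, batch_id):
    """Layer 2: facts come from a query, so each value traces to a batch row."""
    row = conn.execute(
        "SELECT sku, batch_id, viscosity FROM batches WHERE batch_id = ?",
        (batch_id,),
    ).fetchone()
    if row is None:
        raise KeyError(batch_id)
    return dict(zip(("sku", "batch_id", "viscosity"), row))


def render(record):
    """Same template version plus same record gives identical text and digest."""
    text = SECTION_TEMPLATE.substitute(record)
    digest = hashlib.sha256((TEMPLATE_VERSION + "\n" + text).encode()).hexdigest()
    return text, digest


def narrative_slot(draft, approved_by=None):
    """Layer 3: generated prose is held back until a named reviewer signs off."""
    if not approved_by:
        raise PermissionError("narrative draft requires human sign-off")
    return f"{draft}\n(Reviewed by {approved_by})"


# Hypothetical in-memory data so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batches (sku TEXT, batch_id TEXT, viscosity REAL)")
conn.execute("INSERT INTO batches VALUES ('SKU-001', 'B-2024-07', 41.5)")
text, digest = render(load_batch(conn, "B-2024-07"))
```

Only `narrative_slot` would ever receive model output in this sketch. The template and the query never see it, which is what keeps the first two layers reproducible.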
Your AI screening platform is already doing the hard version of this inside the lab. The question is whether the same discipline applies to the regulatory and commercial motion outside the lab.
The methodology behind it is published. Interpretable Context Methodology (ICM) frames agent context as a layered filesystem: identity at the top, working artifacts at the bottom, with measurable interpretability and reproducibility gains between them. Submitted to ACM TiiS.
Regulatory dossiers are a layered-folder problem in disguise. Per-SKU, per-jurisdiction, per-customer, per-batch. The structure is already there. The work is making it legible to an agent that can assemble the output.
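One way to picture the layered-folder idea is a resolver that walks from SKU down to batch and merges whatever context it finds at each level. This is a rough sketch and not the ICM reference implementation: the nesting order, the `context.json` file name, and the `SKU-001` / `EU` / `acme` / `B-2024-07` values are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical nesting order, taken from the per-SKU, per-jurisdiction,
# per-customer, per-batch framing in the text.
LAYERS = ("sku", "jurisdiction", "customer", "batch")


def resolve_context(root, **keys):
    """Walk root/sku/jurisdiction/customer/batch, merging context.json files.

    Deeper folders override shallower ones. `sources` records which file
    supplied the final value of each key, so every field stays traceable.
    """
    merged, sources = {}, {}
    folder = Path(root)
    for layer in LAYERS:
        folder = folder / keys[layer]
        ctx_file = folder / "context.json"
        if ctx_file.is_file():
            for key, value in json.loads(ctx_file.read_text()).items():
                merged[key] = value
                sources[key] = ctx_file.relative_to(root).as_posix()
    return merged, sources


# Throwaway folder tree with invented values, so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    leaf = root / "SKU-001" / "EU" / "acme" / "B-2024-07"
    leaf.mkdir(parents=True)
    (root / "SKU-001" / "context.json").write_text(
        json.dumps({"name": "Example eutectic", "format": "generic"})
    )
    (root / "SKU-001" / "EU" / "context.json").write_text(
        json.dumps({"format": "REACH"})
    )
    (leaf / "context.json").write_text(json.dumps({"viscosity": 41.5}))
    context, sources = resolve_context(
        root, sku="SKU-001", jurisdiction="EU", customer="acme", batch="B-2024-07"
    )
```

The `sources` map is the part that matters for an audit: for any field in the assembled output, it names the folder level that set it.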
The paper
Free to read. MIT license on the reference implementation.
Bring one recent dossier or customer-RFI cycle that ate more time than it should have. We will walk back through it and tell you honestly whether this is a real orchestration problem or whether you should wait a year. If it is out of our scope, Eduba partners with NLP Logix for work that sits below the orchestration layer. NLP Logix has been in machine learning since 2011 and runs over 150 data scientists.
Book the 30 minutes with Matt