Your research deserves better than copy-paste.
Extract structured knowledge from scientific documents with confidence scoring, human verification, and full provenance.
What researchers deal with
Valuable data, stuck in documents
Tables, measurements, chemical properties, experimental results: all locked inside PDFs and papers. Getting them out means hours of manual work.
Extraction you can't trust
Copy-pasting into spreadsheets loses context. Which paper was that number from? Who verified it? When did it change? Good luck tracing that.
Collaboration that falls apart
Your lab has five people extracting the same kinds of data in five different ways. No shared schemas, no shared datasets, no single source of truth.
Analysis disconnected from source
By the time data reaches your analysis tools, it's been through so many hands that reproducibility is a hope, not a guarantee.
From document to dataset
Built for how researchers actually work
Paper-to-dataset pipeline
Turn a stack of papers into a structured dataset for your meta-analysis or systematic review. No more spreadsheet gymnastics.
Reusable schema library
Build extraction schemas for your research domain once. Apply them across every new paper instantly. Your workflow compounds.
Provenance for publication
Every number in your dataset links back to the exact paragraph, table, or figure in the source document. Reviewers can verify anything.
Trust isn't a feature. It's the architecture.
Your data stays yours
No permanent document storage by default. Configurable retention policies. Full deletion controls. We never use your documents to train models.
Every value has a history
Every extraction is versioned. Every verification is attributed. Every change is logged. You can reproduce any result at any point in time.
Know who did what, when
Full audit trail: who extracted, who verified, who approved, what changed and why. From individual work to lab-wide governance.
Humans remain the authority
AI-generated suggestions never auto-promote to verified knowledge. Every value requires human judgment before it enters a dataset.
Where we're headed
No dates: we ship when it's right. But we're accountable for every phase listed here, and we'll share our progress openly.
- Core extraction pipeline, documents to structured data
- Schema-based and open extraction modes
- Confidence scoring and source provenance
- Split-pane verification interface
- Individual workspaces and datasets
Be among the first to use Ontelya
We're building this for researchers who care about doing extraction right. Early supporters get priority access and a founding-member discount.