Your research deserves better than copy-paste.

Extract structured knowledge from scientific documents with confidence scoring, human verification, and full provenance.

What researchers deal with

Valuable data, stuck in documents

Tables, measurements, chemical properties, and experimental results are all locked inside PDFs and papers. Getting them out means hours of manual work.

Extraction you can't trust

Copy-pasting into spreadsheets loses context. Which paper was that number from? Who verified it? When did it change? Good luck tracing that.

Collaboration that falls apart

Your lab has five people extracting the same kinds of data in five different ways. No shared schemas, no shared datasets, no single source of truth.

Analysis disconnected from source

By the time data reaches your analysis tools, it's been through so many hands that reproducibility is a hope, not a guarantee.

From document to dataset

Upload → Parse → Structure → Verify → Dataset

Built for how researchers actually work

01

Paper-to-dataset pipeline

Turn a stack of papers into a structured dataset for your meta-analysis or systematic review. No more spreadsheet gymnastics.

02

Reusable schema library

Build extraction schemas for your research domain once. Apply them across every new paper instantly. Your workflow compounds.

03

Provenance for publication

Every number in your dataset links back to the exact paragraph, table, or figure in the source document. Reviewers can verify anything.
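
As a rough illustration of what such a link could look like, here is a minimal Python sketch. The names `SourceLocation`, `ExtractedValue`, and `cite` are hypothetical and the data is made up; this is not Ontelya's actual data model, only one way a value might carry its source with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLocation:
    """Hypothetical pointer to where a value appears in a source document."""
    document: str  # e.g. a file name or DOI
    page: int
    kind: str      # "paragraph", "table", or "figure"
    label: str     # e.g. "Table 2" or "paragraph 4"


@dataclass(frozen=True)
class ExtractedValue:
    """Hypothetical dataset entry that always carries its provenance."""
    name: str
    value: float
    unit: str
    source: SourceLocation


def cite(entry: ExtractedValue) -> str:
    """Format a human-readable trace from a value back to its source."""
    src = entry.source
    return (f"{entry.name} = {entry.value} {entry.unit} "
            f"({src.document}, p. {src.page}, {src.label})")


# Illustrative data only, not taken from a real paper.
entry = ExtractedValue(
    name="melting_point",
    value=121.5,
    unit="degC",
    source=SourceLocation("example-paper.pdf", 7, "table", "Table 2"),
)
print(cite(entry))
```

Because the source is a required field, a value without a location cannot be constructed at all, which is the property a reviewer would rely on.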

Trust isn't a feature. It's the architecture.

Your data stays yours

No permanent document storage by default. Configurable retention policies. Full deletion controls. We never use your documents to train models.

Every value has a history

Every extraction is versioned. Every verification is attributed. Every change is logged. You can reproduce any result at any point in time.

Know who did what, when

Full audit trail: who extracted, who verified, who approved, what changed and why. From individual work to lab-wide governance.

Humans remain the authority

AI-generated suggestions never auto-promote to verified knowledge. Every value requires human judgment before it enters a dataset.
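
One way to picture that rule is as a gate in front of the dataset. The Python sketch below is an assumption for illustration only: `Suggestion`, `Dataset`, and `verify` are hypothetical names, not Ontelya's API. It shows a dataset that refuses any value without a named human reviewer, regardless of how confident the model was.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Suggestion:
    """Hypothetical AI-generated value awaiting human review."""
    name: str
    value: float
    confidence: float                  # model confidence in [0, 1]
    verified_by: Optional[str] = None  # set only when a person verifies it


@dataclass
class Dataset:
    """Hypothetical dataset that accepts only human-verified values."""
    entries: List[Suggestion] = field(default_factory=list)

    def add(self, s: Suggestion) -> None:
        # Confidence is deliberately ignored: even a 0.99 suggestion
        # is rejected until a named person has verified it.
        if s.verified_by is None:
            raise PermissionError(f"{s.name!r} has not been verified by a human")
        self.entries.append(s)


def verify(s: Suggestion, reviewer: str) -> Suggestion:
    """Record which human approved the suggestion."""
    s.verified_by = reviewer
    return s


dataset = Dataset()
suggestion = Suggestion("melting_point", 121.5, confidence=0.99)

blocked = False
try:
    dataset.add(suggestion)  # unverified, so the dataset refuses it
except PermissionError:
    blocked = True

dataset.add(verify(suggestion, reviewer="alice"))  # accepted after review
```

The point of the design is that confidence informs the reviewer but never replaces them: there is no threshold above which a suggestion promotes itself.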

Where we're headed

No dates, because we ship when it's right. But we're accountable for every phase listed here, and we'll share our progress openly.

  • Core extraction pipeline: documents to structured data
  • Schema-based and open extraction modes
  • Confidence scoring and source provenance
  • Split-pane verification interface
  • Individual workspaces and datasets

Be among the first to use Ontelya

We're building this for researchers who care about doing extraction right. Early supporters get priority access and a founding-member discount.