Founder Updates

Building in public. Here's what's happening with Ontelya.

Why I'm Building Ontelya

The honest story behind Ontelya: what we're building, why it matters, and where we're headed.

If you've ever spent an afternoon pulling numbers out of research papers into a spreadsheet, you know the pain. You're reading a PDF, copying a value, pasting it, going back to check which table it came from, wondering if you got the units right, and repeating this dozens of times. By the end, you have a spreadsheet full of data and no easy way to tell where any of it actually came from.

That's the problem I kept running into. And the more I looked around, the more I realized it wasn't just me. Labs everywhere are doing this: manually extracting data from papers, each person using their own system, nobody quite sure whose numbers are the latest version.

So I'm building Ontelya.

It's a tool that takes research documents (papers, reports, theses) and turns them into structured, organized data. Not by replacing the researcher, but by doing the tedious parts so you can focus on what actually matters: understanding the science.

Here's what makes it different from just another AI tool:

  • Every value traces back to its source. Click any number and you can see exactly where it came from in the original document (there's a rough sketch of what that record could look like right after this list). No more "where did this data point come from?"
  • You stay in control. The AI suggests, but nothing becomes part of your dataset until you review and approve it. Your expertise is the final word.
  • It gets better the more you use it. Build schemas for your research area once, and apply them across every new paper. Your second extraction is faster than your first.
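
To make the provenance idea concrete, here's a rough sketch of the kind of record a pipeline like this might keep for every extracted number. To be clear: this is an illustration, not Ontelya's actual schema, and every name in it (Provenance, ExtractedValue, locator, and so on) is made up for the example.

```python
from dataclasses import dataclass
from enum import Enum

class ReviewStatus(Enum):
    SUGGESTED = "suggested"   # proposed by the extractor, not yet reviewed
    APPROVED = "approved"     # confirmed by the researcher
    REJECTED = "rejected"     # discarded during review

@dataclass
class Provenance:
    """Where a value came from in its source document."""
    document_id: str   # e.g. a DOI or an internal upload ID
    page: int          # page of the source PDF
    locator: str       # e.g. "Table 2, row 3" or a character span
    snippet: str       # surrounding source text, for a one-click check

@dataclass
class ExtractedValue:
    """A single data point that never loses its source."""
    name: str          # the field being extracted, e.g. "melting_point"
    value: float
    unit: str          # kept explicit so units are never guessed
    confidence: float  # extractor's confidence score, 0.0 to 1.0
    provenance: Provenance
    status: ReviewStatus = ReviewStatus.SUGGESTED
```

The design choice the sketch is meant to show: the value, its unit, the confidence score, and the pointer back to the source travel together as one record, and nothing counts as data until a human flips its status to approved.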

I'm not going to pretend this is easy to build. It's a hard problem with a lot of moving parts: document parsing, intelligent extraction, confidence scoring, version control for scientific data. But it's a problem worth solving, and I'd rather build something that does it right than ship something half-baked.

Where we are right now: We've started building the foundation: the core extraction pipeline, the verification interface, and the data model that makes provenance possible. It's early days, but the architecture is solid and the pieces are coming together.

What's next: Our MVP is on the way. The first version will focus on the basics: upload a paper, extract structured data, verify it against the source, and export a clean dataset. Simple, but done properly.
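
Building on the toy classes sketched above, the export step of that flow might look something like this. Again, hypothetical code, not the product: the point is just that only approved values leave the pipeline, and each row keeps its pointer back to the source.

```python
import csv

def export_approved(values: list[ExtractedValue], path: str) -> None:
    """Write only human-approved values, with their provenance, to CSV."""
    approved = [v for v in values if v.status is ReviewStatus.APPROVED]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "value", "unit", "document", "locator"])
        for v in approved:
            writer.writerow([v.name, v.value, v.unit,
                             v.provenance.document_id, v.provenance.locator])
```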

If this sounds like something that would help your work, I'd love for you to join the waitlist. Early supporters get priority access, and your feedback will directly shape what we build.

More updates soon. We're building this in the open, and I'll share progress as we hit milestones.

Yours truly, 0D