This is a running list of things I've worked on -- some professional, some personal, some finished at 2am. I try to keep it updated. Some of these have code available; some are lost to the void. If anything catches your eye, hit me up -- I'm happy to share what I have.

Digital health has been
let down by hype.

Massively over-indexed on novelty, under-indexed on rigor. My goal is to bring drug-discovery-level diligence to building care systems that actually work.

๐Ÿฅ

Actually Health

Work in progress

Actually Health is a continuous, closed-loop care system for patients with complex chronic conditions, built around a proprietary clinical decision engine I've spent the last year architecting from scratch. The core thesis: longitudinal data compounds in a way that episodic care structurally cannot. Every new signal makes prior signals more meaningful. The accumulated picture drives better decisions over time. Most digital health companies optimize for the first encounter. Actually Health is built around what happens after the hundredth.

The technical foundation is a structured architecture that normalizes patient state, runs clinical reasoning, surfaces ranked actions for clinician review, and captures intent in a way that feeds back into the system. Every decision is attributed. Every outcome loops back. The system gets more accurate as it scales, which breaks the linear hiring constraint that has historically capped how much care a clinical team can deliver.

⌚

The Future of Wearables

Work in progress

I've always been fascinated by how physiological signals are distributed across the body in ways most people never think about. Your heart isn't just beating in your chest -- its electrical activity propagates everywhere, all the time, waiting to be measured. The question I keep coming back to is: what if the hardware to do that measuring is already in everyone's pocket? This project is about extracting clinically meaningful biomarkers from the sensors that a majority of the population already carries in their daily lives. I can't get into specifics because I'm pursuing patent protection, but the broad strokes involve pushing consumer hardware well past its intended operating envelope, a Raspberry Pi acting as a ground truth reference, and a learned transfer function that bridges the gap between the two. The signal processing rabbit hole goes deep. The thing that drives me here isn't building a product, at least not yet. It's answering a more fundamental question: how much clinical signal is already being captured by devices we own, and simply thrown away by software that was never designed to look for it? I think the answer is going to surprise people.
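Stripped of everything patent-sensitive, the calibration idea reduces to fitting a mapping between simultaneous readings from the two devices. Everything below -- the linear model, the numbers, the sensor's bias -- is invented purely to illustrate the concept of a learned transfer function:

```python
# Toy version of a learned transfer function: calibrate a cheap, biased
# sensor against a trusted reference by fitting a mapping between
# simultaneous readings. Real signals, features, and models are omitted.

def fit_linear(xs, ys):
    """Ordinary least squares for y ≈ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

# Hypothetical simultaneous readings: reference device vs. a consumer
# sensor with a systematic gain and offset error.
reference = [60.0, 72.0, 85.0, 100.0, 118.0]
consumer = [0.8 * r + 5.0 for r in reference]

a, b = fit_linear(consumer, reference)
corrected = [a * c + b for c in consumer]
```

In practice the transfer function is learned, not linear, and the ground-truth side is the Raspberry Pi rig; the structure of the problem is the same.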

NLP before it was cool. 👴

My first scripts summarized books via TF-IDF so I could dump them to .txt and listen on my commute -- take that, Blinkist. It escalated from there.
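Those early scripts looked something like this -- a toy TF-IDF sentence scorer, reconstructed from memory rather than the original code:

```python
import math
import re

# Treat each sentence as a "document", weight words by rarity across
# sentences, and keep the highest-scoring sentences in original order.

def summarize(text: str, k: int = 2) -> str:
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    n = len(sentences)
    # Inverse document frequency computed over sentences.
    df = {}
    for toks in tokenized:
        for w in set(toks):
            df[w] = df.get(w, 0) + 1
    idf = {w: math.log(n / c) for w, c in df.items()}

    def score(toks):
        if not toks:
            return 0.0
        tf = {w: toks.count(w) / len(toks) for w in set(toks)}
        return sum(tf[w] * idf[w] for w in set(toks)) / len(set(toks))

    ranked = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    keep = sorted(ranked[:k])  # restore reading order
    return " ".join(sentences[i] for i in keep)
```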

📚

NLP BookSum

Built a local book summarization tool where I can upload EPUBs, parse them into chapter-level summaries, and pass specific points of interest to focus the output -- anecdotes, analogies, conceptual frameworks, whatever is actually useful for understanding rather than just recalling. Running Llama 3 7B locally meant no API costs and no context window anxiety on long texts. The more interesting feature was talk-to-the-book: a RAG pipeline that lets you query salient sections of a book against something specific to your situation. Asking Andrew Chen's Cold Start Problem about a startup I was actually building produced advice that was noticeably higher resolution than base GPT -- because the retrieval is grounding the generation in the author's actual reasoning rather than a lossy compression of it. To validate this properly, I built an automated Q&A evaluation pipeline that scored the local RAG setup against the ChatGPT API on a set of book-derived questions. Local won. Not a surprise, but satisfying to have the number.
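The retrieval step of talk-to-the-book, shrunk to its skeleton: the real pipeline used dense embeddings and a local Llama model, both swapped here for bag-of-words cosine similarity so the sketch stays self-contained:

```python
import math
import re

# Embed chunks and query as bag-of-words vectors, retrieve the closest
# chunks by cosine similarity, and build a grounded prompt from them.

def vectorize(text: str) -> dict:
    counts = {}
    for w in re.findall(r"[a-z']+", text.lower()):
        counts[w] = counts.get(w, 0) + 1
    return counts

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    qv = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(vectorize(c), qv), reverse=True)
    return ranked[:k]

def grounded_prompt(chunks: list[str], query: str) -> str:
    context = "\n".join(retrieve(chunks, query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The grounding is the whole trick: the model answers from the author's retrieved reasoning, not its compressed recollection of the book.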

🦾

Heuristic Tagging at Scale

Training a useful model requires labeled data. Getting labeled data required either a lot of time or a lot of money, usually both. At least it did back in 2021 when I built this.

We were inspired by Snorkel's core insight: instead of hand-labeling thousands of records, write rules that label them for you. But we kept pulling on one thread. What if instead of writing the rules, you let the model figure out what the rules should be?

The result is a small tool we built for clinical NER. You seed it with terms you trust: a handful of drug names, a few diagnoses, some lab values. It labels a large corpus automatically using that dictionary, trains a fast model on those labels, then runs the model back over the same text asking a single question: where are you uncertain? Wherever the model thinks it's looking at something clinically meaningful but the dictionary is silent, that's a gap. The system surfaces those gaps ranked by confidence and frequency. You review, add what makes sense, and run again.

The kind of thing it's designed to catch: a model trained on "metformin" and "lisinopril" generalizing to flag "SGLT2 inhibitor" or "empagliflozin" as likely drug mentions before either term ever entered the dictionary. A model that learned "A1C" pointing you toward "glycated hemoglobin" and "HbA1c" on its own.
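The loop above can be sketched in miniature. The real tool trained a proper NER model; here the "model" is just previous-word statistics learned from the seed terms, which is enough to show the structure of dictionary → weak labels → uncertainty → review queue:

```python
import re
from collections import Counter

# Toy gap-surfacing loop: learn which contexts precede known seed terms,
# then rank out-of-dictionary words that appear in those same contexts.

def candidate_gaps(corpus: list[str], seed: set[str], top_n: int = 3) -> list[str]:
    contexts = Counter()   # words observed immediately before a seed term
    mentions = Counter()   # (prev_word, out_of_dictionary_word) -> count
    for sentence in corpus:
        toks = re.findall(r"[A-Za-z0-9']+", sentence.lower())
        for prev, word in zip(toks, toks[1:]):
            if word in seed:
                contexts[prev] += 1
            else:
                mentions[(prev, word)] += 1
    # Confidence: how often the word shows up in contexts the seeds taught us.
    scores = Counter()
    for (prev, word), count in mentions.items():
        if contexts[prev] > 0:
            scores[word] += count * contexts[prev]
    return [w for w, _ in scores.most_common(top_n)]
```

A reviewer accepts or rejects the surfaced candidates, they join the dictionary, and the loop runs again.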

๐Ÿฑโ€๐Ÿ’ป

Domain-Adapted Language Models

Patients describe their health in ways that clinical corpora never trained on. "My chest feels tight when I walk upstairs" doesn't map cleanly to anything in PubMed.

We built a 10GB corpus of colloquial patient language, domain-adapted PubMedBioBERT on it, and hit state-of-the-art on a named entity recognition task using real-world patient-generated text. The labeling pipeline that made it possible was an early version of the tagging library above.

The model comparison tells you something real about where domain adaptation actually earns its keep. RoBERTa out of the box matched BioBERT on general tasks, which makes sense. Patient language is conversational and relatively plain English. But on NER for symptoms, diseases, and drugs, the domain-adapted model pulled ahead. Drug names are morphologically alien ("empagliflozin," "liraglutide"). Symptom descriptions blend colloquial and clinical vocabulary in ways general corpora don't cover. Disease mentions often appear as fragments or abbreviations that only resolve with medical context.

Domain adaptation is mostly neutral until you hit entity types with specialized surface forms, and then it matters a lot.
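One way to see why specialized surface forms matter: a general-domain subword vocabulary shatters a drug name into fragments that carry little reusable signal, while everyday symptom words survive intact. A toy greedy longest-match tokenizer over an invented "general English" vocabulary (not the actual BERT tokenizer) makes the effect visible:

```python
# Hypothetical general-domain subword vocabulary: common English words
# survive whole; drug-name fragments exist only as short pieces.
GENERAL_VOCAB = {"chest", "tight", "walk", "up", "stairs",
                 "em", "pa", "gli", "flo", "zin", "li", "ra", "glu", "tide"}

def wordpiece(word: str, vocab: set) -> list[str]:
    """Greedy longest-match segmentation, falling back to single characters."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest substring first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])          # unknown single character
            i += 1
    return pieces
```

"tight" stays one piece; "empagliflozin" becomes five, so a model pre-trained only on general text has never seen that word as a unit. Domain adaptation lets those fragments (or merged forms) accumulate meaning.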

🌊

Surf Swell Forecasting

Not NLP, but this was the first coding project I ever did, and it thus holds a unique place in my heart. I scraped years of NOAA buoy data and wave height records to find the actual best surf windows along the New England coast. The existing forecasting apps treat swell period, height, and wind direction as equally weighted when they're not, especially in a region where the geography creates weird refraction effects. Built a personal model that weights the variables by what I know about specific breaks. "Just go in winter" turns out to be technically correct but operationally useless advice.
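The per-break weighting idea, in sketch form. Weights, ideal wind angles, and saturation points below are invented for illustration, not the fitted values:

```python
# Score a surf window from swell height, period, and wind direction,
# with weights and ideal wind angles that differ per break.

BREAKS = {
    # name: ideal offshore wind bearing and (height, period, wind) weights
    "point_break": {"ideal_wind": 315, "w": (0.3, 0.5, 0.2)},
    "beach_break": {"ideal_wind": 270, "w": (0.5, 0.2, 0.3)},
}

def score(break_name: str, height_ft: float, period_s: float, wind_deg: float) -> float:
    cfg = BREAKS[break_name]
    wh, wp, ww = cfg["w"]
    height_score = min(height_ft / 6.0, 1.0)    # saturates at 6 ft
    period_score = min(period_s / 12.0, 1.0)    # long-period swell favored
    # Wind alignment: 1.0 when exactly offshore, 0.0 when dead onshore.
    diff = abs((wind_deg - cfg["ideal_wind"] + 180) % 360 - 180)
    wind_score = 1.0 - diff / 180.0
    return wh * height_score + wp * period_score + ww * wind_score
```

The point is that a point break forgives weak height if the period is long, while a beach break mostly wants size; equal weighting averages both into mush.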

I love biology.

I got my start co-founding Abyssinia Bio out of undergrad. Since then I've had the pleasure of working with everyone from Mayo Clinic to many of the top pharma companies on everything from trial design to drug repurposing.

🧬

Abyssinia Biologics

Co-founded out of undergrad to pursue pre-clinical diagnostic and therapeutic development for a portfolio of anti-amyloid monoclonal antibodies. The work started in a literal attic laboratory, driven by a core question: whether therapeutics could target amyloid aggregation earlier in the disease cascade than existing approaches. In collaboration with Dennis Selkoe's lab at Brigham and Women's Hospital, we characterized our lead antibodies across multiple media and benchmarked them against competing clinical-stage assets. The results demonstrated exceptional specificity and sensitivity to toxic Aβ conformations, distinguishing pathological oligomeric and protofibrillar species from inert monomeric amyloid with high fidelity. That selectivity profile was the foundation of the therapeutic thesis: intervening on the conformations that actually drive neurotoxicity, rather than bulk amyloid clearance.

⚗️

Drug Repurposing via Hypergraph

Built a hypergraph combining discrete drug features (mechanism, target class, indication history) with semantic embeddings from biomedical literature to identify repurposing candidates for rare Mendelian diseases. Nodes live in continuous embedding space while hyperedges encode higher-order feature overlap, capturing transitive multi-way relationships that pairwise similarity and standard knowledge graphs miss.

Scored candidates using three independent signals: embedding proximity, spectral coordinates from the normalized hypergraph Laplacian, and random-walk diffusion across hyperedge structure. Validated against known drug-disease pairs (indicated drugs ranked in the top 2.2nd percentile vs 50.5th for non-indicated, 23x enrichment). The system recovered lenalidomide as a top repurposing candidate for beta-thalassemia from first principles, a hypothesis with active clinical investigation based on CRBN-mediated fetal hemoglobin induction.

Data sourced from OpenTargets GraphQL API (200 drugs, 104 diseases, 2,106 hyperedges across 8 edge types). Embeddings via PubMedBERT sentence transformer.
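Of the three signals, the random-walk diffusion is the easiest to sketch: mass spreads from a disease node through shared hyperedges, and drugs are scored by how much reaches them. The mini-graph below is invented; the real system ran this over 2,106 hyperedges alongside the embedding and spectral signals:

```python
# Node -> hyperedge -> node random walk with restart. A drug connected to
# the disease only transitively (via a shared target) still accumulates
# mass, which is the multi-way relationship pairwise similarity misses.

def diffuse(hyperedges, start, steps=3, restart=0.15):
    incidence = {}
    for edge in hyperedges:
        for node in edge:
            incidence.setdefault(node, []).append(edge)
    mass = {start: 1.0}
    for _ in range(steps):
        nxt = {start: restart}  # restart keeps mass anchored at the query node
        for node, m in mass.items():
            edges = incidence.get(node, [])
            if not edges:
                continue
            share = (1 - restart) * m / len(edges)
            for edge in edges:
                per_node = share / len(edge)
                for neighbor in edge:
                    nxt[neighbor] = nxt.get(neighbor, 0.0) + per_node
        mass = nxt
    return mass
```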

📊

Biomarker Imputation from Sparse EHR Data

Longitudinal biomarker data in EHRs is almost always incomplete. Patients miss labs, monitoring cadences vary, and the gaps get worse for exactly the populations where continuity matters most. Using a cohort of ~9 million Mayo Clinic EHR patients, this project built imputation models for common cardiometabolic and CKD labs (creatinine, albumin, and others) by training on patients with high-resolution lab histories as ground truth. The feature set combined structured EHR signals: diagnosis codes, medication histories, vitals, procedure records, demographics, and temporal context like time since last encounter and lab ordering frequency. Predicting exact continuous values proved too noisy, but predicting clinically relevant ranges (normal, elevated, critically abnormal) worked well enough to reconstruct approximate biomarker trajectories across time. That made it possible to flag patients drifting toward high-risk ranges even when recent labs were missing.
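The reframing that made the problem tractable can be shown directly: predictions that miss on exact values can still agree on the clinically relevant range. Bin edges below are illustrative, not the cohort's actual thresholds:

```python
# Map continuous creatinine values into clinically relevant ranges and
# evaluate agreement at the range level rather than the exact value.

CREATININE_BINS = [
    (0.0, 1.2, "normal"),
    (1.2, 2.0, "elevated"),
    (2.0, float("inf"), "critically_abnormal"),
]

def to_range(value: float) -> str:
    for low, high, label in CREATININE_BINS:
        if low <= value < high:
            return label
    raise ValueError(f"no bin for {value}")

def range_accuracy(predicted: list[float], actual: list[float]) -> float:
    """Exact values may miss, but range agreement is the clinical target."""
    hits = sum(to_range(p) == to_range(a) for p, a in zip(predicted, actual))
    return hits / len(actual)
```

A patient whose reconstructed trajectory drifts from "normal" toward "elevated" gets flagged even when the point estimates are off.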

Things that have nothing
to do with healthcare.

Music

Playing guitar and banjo, mostly for myself, sometimes for the cat. Fingerstyle, rarely percussive, eternally working on cleaner pull-offs.

In the Water

Surfing, mostly in New England winters. My insulation situation handles it fine.

Gaming

World of Warcraft Classic, Fury Warrior. Not ashamed. Kinda ashamed. Peep my DBS parses iykyk.

Cooking

Stovetop only, experimental, no baking. Baking requires following rules and being literate.