When someone asks how The Ruff detects pain, the answer they usually get is: “It uses AI.” That’s true but unhelpful — like saying a car “uses combustion.” The interesting part is how that AI works, what data it learns from, and why it outperforms human observation in specific, measurable ways.
This post is the technical explanation we’ve been promising since launch.
The Core Problem: Pain Is Multimodal #
Pain doesn’t have a single signature. There’s no biomarker — no hormone or enzyme — that unambiguously indicates “this dog is experiencing a 6/10 pain level right now.” Pain is expressed through behavior, physiology, and movement, all simultaneously, all in patterns that interact with each other in complex ways.
That complexity is actually what makes AI well-suited to the problem. Humans are remarkably good at detecting emotional states through faces — we’ve had millions of years of evolutionary pressure to do so. But we’re not optimized to simultaneously process gait kinematics, heart rate variability patterns, respiratory waveforms, thermal gradients, and vocalizations — all in real time, all at 300 samples per second.
Machines are.
The Training Data #
The Ruff’s pain detection model was trained on a proprietary dataset assembled over 26 months in partnership with 11 veterinary hospitals and 3 academic veterinary programs. The dataset contains:
- 24,316 hours of continuous multi-sensor recordings
- 8,847 clinical pain assessments performed by board-certified veterinary professionals, cross-referenced with concurrent sensor data
- 412 distinct dogs, spanning 38 breeds, ages 8 months to 14 years, body weights 4.2–68.1 kg
- Pain contexts including: post-surgical recovery (31%), orthopedic conditions (28%), acute injury (19%), chronic osteoarthritis (16%), dental/oral (6%)
Every 15-minute window in the dataset was labeled with a validated Glasgow Composite Measure Pain Scale score by a blinded assessor who reviewed video alongside clinical notes. The sensor data and the clinical label were never seen together during labeling — a critical step to prevent confirmation bias in the annotations.
The Model Architecture #
We use a multi-input temporal convolutional network (TCN) with attention pooling — not a transformer, not a simple LSTM. Here’s why:
Why TCN? Pain events have characteristic temporal structure at multiple timescales simultaneously. A gait asymmetry might unfold over 800ms. A pain-induced HRV suppression might develop over 4 minutes. A behavioral posture shift might stabilize over 2 hours. TCNs with dilated convolutions handle this multi-scale temporal structure more efficiently than recurrent architectures, and they parallelize well for on-device inference.
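To make the multi-scale point concrete, here is a back-of-the-envelope sketch of how dilated convolutions grow a TCN's receptive field. The kernel size and dilation schedule below are illustrative choices, not The Ruff's actual hyperparameters:

```python
# Sketch: receptive field of a stack of dilated causal convolutions.
# Kernel size and dilation schedule are illustrative, not the model's
# real hyperparameters.

def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Number of input samples a single output sample can 'see'."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Doubling dilations grow the receptive field exponentially with depth,
# which is how one stack can cover both sub-second and multi-second events:
for depth in (4, 8, 12):
    dilations = [2 ** i for i in range(depth)]
    rf = receptive_field(3, dilations)
    print(f"{depth} layers -> {rf} samples (~{rf / 300:.1f} s at 300 Hz)")
```

With kernel size 3, four layers see about a tenth of a second at 300 Hz (enough for a gait asymmetry), while twelve layers see roughly half a minute, all from the same parameter-efficient stack.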
Why not a transformer? Attention mechanisms in transformers are powerful but expensive. Running a standard transformer architecture at our sampling rate (300Hz across 9 sensors) on a microcontroller with a 3.2MHz clock budget isn’t feasible without severe quantization that degrades model quality. The TCN achieves comparable accuracy at 8x lower inference cost.
Attention pooling aggregates the temporal representations across the input window, allowing the model to weight moments of high diagnostic significance — a yelp, a loading asymmetry, an HRV dip — more heavily than baseline periods.
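In its simplest form, attention pooling is a softmax-weighted average over time. The sketch below shows that mechanism in numpy; the shapes and the scoring vector are hypothetical stand-ins, not the model's learned parameters:

```python
import numpy as np

def attention_pool(features: np.ndarray, score_vec: np.ndarray) -> np.ndarray:
    """Collapse (T, D) temporal features into one (D,) summary.
    score_vec is a hypothetical learned scoring vector."""
    scores = features @ score_vec        # one relevance score per time step
    w = np.exp(scores - scores.max())    # numerically stable softmax
    w /= w.sum()
    return w @ features                  # attention-weighted average

rng = np.random.default_rng(0)
window = rng.normal(size=(120, 32))      # e.g. TCN output for one window
pooled = attention_pool(window, rng.normal(size=32))
print(pooled.shape)                      # (32,)
```

With a zero scoring vector this reduces to a plain mean over time; a trained vector shifts the weight toward diagnostically loud moments.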
The final model has 847K parameters, runs at 4-bit quantization on the device, and performs a pain score update in 340ms end-to-end.
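For readers unfamiliar with 4-bit quantization: each float weight is mapped to one of 16 signed integer levels via a scale factor. A minimal sketch (the scale value and rounding scheme here are illustrative, not the deployed scheme):

```python
def quantize_4bit(w: float, scale: float) -> int:
    """Map a float weight to a signed 4-bit integer in [-8, 7]."""
    q = round(w / scale)
    return max(-8, min(7, q))

def dequantize(q: int, scale: float) -> float:
    return q * scale

scale = 0.05  # per-tensor scale factor, illustrative only
for w in (0.12, -0.31, 0.9):
    q = quantize_4bit(w, scale)
    # large weights saturate at the 4-bit range limit
    print(f"{w:+.2f} -> {q:+d} -> {dequantize(q, scale):+.2f}")
```

The trade-off is exactly the one described above: 4-bit storage shrinks the 847K-parameter model enough for on-device inference, at the cost of rounding and saturation error.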
What the Model Actually Sees #
Here are the six input streams, and what the model has learned to extract from each:
1. Bilateral Accelerometry #
The dual IMU configuration enables computation of Symmetry Index (SI) — a normalized measure of how differently the dog is loading its left vs. right limbs during the swing and stance phases of each step. Healthy dogs have an SI below 8%. Dogs with confirmed lameness typically show SI values of 15–40%. The model also extracts step frequency variability (pain disrupts the metronomic regularity of gait) and vertical force proxy (estimated from impact peaks in the z-axis).
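The post does not spell out the exact SI formula; one common normalized formulation, applied here to hypothetical per-limb impact peaks, looks like this:

```python
def symmetry_index(left: float, right: float) -> float:
    """Left/right asymmetry as a percentage. A common formulation;
    the on-device feature extraction may differ."""
    return abs(left - right) / (0.5 * (left + right)) * 100.0

# Per-step vertical impact peaks (arbitrary units, illustrative values):
print(symmetry_index(9.8, 9.5))   # near-symmetric loading -> small SI
print(symmetry_index(9.8, 6.1))   # offloading one limb -> large SI
```

A perfectly symmetric gait yields SI = 0%, and the value grows as the dog shifts load away from a painful limb.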
2. Heart Rate Variability #
Not just heart rate — HRV. Specifically, RMSSD (root mean square of successive RR interval differences) and LF/HF ratio (the balance between sympathetic and parasympathetic nervous system activity). Pain activates the sympathetic nervous system, suppressing parasympathetic tone and collapsing RMSSD — a relationship well-documented in veterinary pain research. In our validation cohort, RMSSD was the single most predictive individual signal, with an AUC of 0.81 for detecting pain ≥4.
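RMSSD itself is simple to compute from a sequence of RR intervals (the LF/HF ratio additionally requires a spectral estimate, omitted here). The interval values below are illustrative, not clinical data:

```python
import math

def rmssd(rr_intervals_ms: list[float]) -> float:
    """Root mean square of successive RR-interval differences, in ms."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Illustrative values: relaxed beats vary more than pain-stressed ones.
relaxed  = [620, 655, 610, 670, 625, 660]   # high beat-to-beat variability
stressed = [500, 505, 498, 503, 500, 502]   # suppressed variability
print(rmssd(relaxed), ">", rmssd(stressed))
```

The collapse of RMSSD under sympathetic activation is what makes it such a strong single-signal predictor.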
3. Respiratory Waveform #
The strain gauge captures breathing as a continuous waveform. Pain-associated respiratory patterns include: increased rate (tachypnea), reduced tidal volume (shallow breathing to minimize thoracic movement), irregular rhythm (especially in abdominal pain), and paradoxical breathing (abdomen and chest moving in opposite directions, a red flag for respiratory distress). The model extracts these features via spectral decomposition of the waveform.
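One building block of that spectral decomposition is finding the dominant breathing frequency. A minimal sketch on a synthetic waveform, assuming an illustrative resampled sensor rate (the real pipeline is not described in this detail):

```python
import numpy as np

def dominant_breathing_rate(wave: np.ndarray, fs: float) -> float:
    """Estimate breaths per minute from the spectral peak of a
    respiratory waveform. Windowing choices are illustrative."""
    windowed = wave * np.hanning(len(wave))     # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(wave), 1.0 / fs)
    spectrum[freqs < 0.1] = 0.0                 # ignore sub-0.1 Hz drift
    return freqs[spectrum.argmax()] * 60.0

fs = 25.0                              # assumed resampled sensor rate, Hz
t = np.arange(0, 30, 1 / fs)           # 30-second analysis window
wave = np.sin(2 * np.pi * 0.5 * t)     # synthetic 0.5 Hz breathing signal
print(dominant_breathing_rate(wave, fs))  # ~30 breaths per minute
```

Rate is only one of the extracted features; irregularity and paradoxical motion require looking at phase and amplitude structure beyond a single spectral peak.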
4. Temperature Gradient #
Readings from two temperature sensors — one near the skin surface, one measuring ambient — are combined into a corrected skin temperature. Acute inflammation elevates local skin temperature in ways that follow a characteristic time curve. The model tracks the derivative of temperature change, not just absolute temperature, because onset rate is diagnostically more meaningful than absolute value.
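Tracking the derivative rather than the level can be as simple as a least-squares slope over a sliding window. A sketch with illustrative temperatures and an assumed one-minute sampling interval:

```python
def onset_rate(temps_c: list[float], interval_min: float = 1.0) -> float:
    """Least-squares slope of corrected skin temperature, in °C per
    minute. Sampling interval is an assumption for illustration."""
    n = len(temps_c)
    xs = [i * interval_min for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(temps_c) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, temps_c))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

stable   = [38.1, 38.1, 38.2, 38.1, 38.2]   # normal fluctuation
inflamed = [38.1, 38.3, 38.6, 38.9, 39.1]   # rapid local warming
print(onset_rate(stable), "<", onset_rate(inflamed))
```

Two dogs can sit at the same absolute skin temperature while only one of them is warming at an inflammation-like rate, which is why the slope carries the diagnostic signal.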
5. Vocalization Classification #
The microphone feeds a secondary classification model (a small CNN trained on 18,000+ labeled audio clips) that identifies vocalization events: whimper, yelp, groan, panting, normal bark, and ambient noise. Each classification and its temporal position feeds into the primary pain model as a binary event feature. Vocalization alone has low sensitivity — many dogs in significant pain never vocalize. But when combined with other signals, its presence provides strong positive predictive value.
6. Inactivity Pattern Analysis #
Distinguishing restorative sleep from pain-induced stillness is one of the subtler challenges. The model uses a combination of HRV patterns (deep sleep is associated with high HRV and parasympathetic dominance), respiratory rate (slow and regular in sleep), and micro-movement patterns (sleep involves periodic repositioning; pain-induced stillness involves sustained rigidity with occasional guarding) to classify inactivity periods.
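The learned model combines these cues continuously, but the intuition can be captured in a rule-of-thumb sketch. The thresholds below are illustrative, not the model's learned decision boundaries:

```python
def classify_stillness(rmssd_ms: float, resp_rate_bpm: float,
                       repositions_per_hour: float) -> str:
    """Rule-of-thumb sketch of the sleep-vs-guarding distinction.
    All thresholds are hypothetical, for illustration only."""
    # Deep sleep: high HRV (parasympathetic dominance), slow regular
    # breathing, periodic repositioning.
    if rmssd_ms > 60 and resp_rate_bpm < 20 and repositions_per_hour >= 2:
        return "restorative sleep"
    # Guarding: suppressed HRV and sustained rigidity.
    if rmssd_ms < 30 and repositions_per_hour < 1:
        return "possible pain-induced stillness"
    return "indeterminate"

print(classify_stillness(75, 14, 4))   # sleep-like profile
print(classify_stillness(22, 28, 0))   # rigid, sympathetic-dominant profile
```

The real classifier works on continuous feature streams rather than hard cut-offs, which is what lets it handle the ambiguous middle ground that this sketch simply labels indeterminate.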
Validation Results #
Across the held-out test set (20% of the dataset, stratified by breed, age, and pain context):
| Metric | Value |
|---|---|
| Overall accuracy (vs. clinical score ±1) | 87.3% |
| Sensitivity (detecting pain ≥4) | 89.1% |
| Specificity (correctly identifying no pain) | 84.7% |
| AUC (ROC) | 0.934 |
| Mean absolute error | 0.91 score points |
For context: published studies of owner-reported pain detection accuracy in moderate-pain dogs range from 38% to 55%.
The Honest Limitations #
No model is perfect. Ours has known limitations we’re actively working to address:
- Breed bias: the training dataset over-represents Labrador Retrievers (18% of dogs) and Golden Retrievers (12%). Underrepresented breeds include brachycephalic breeds (Bulldogs, Pugs — 2.4% combined), sighthounds (Greyhounds, Whippets — 1.9%), toy breeds (Chihuahuas, Pomeranians, Yorkies — 3.1%), and giant breeds (Great Danes, Mastiffs — 2.6%). Pain expression varies significantly across these groups — brachycephalic breeds have altered respiratory baselines that complicate respiratory pain signals, while sighthounds have atypical gait kinematics. Breed-specific fine-tuning is in progress for v2.1.
- Senior dog baseline drift: dogs with advanced osteoarthritis establish a new chronic pain baseline. Our model can detect changes from that baseline reliably, but absolute scores may be compressed relative to clinical assessment.
- Environmental confounds: extreme cold reduces peripheral circulation in ways that can mimic some pain signatures. We apply an ambient temperature correction, but it’s imperfect below -10°C.
- Novel pain locations: visceral pain (gastrointestinal, cardiac) presents differently than musculoskeletal pain and was underrepresented in our training set. Our sensitivity for visceral pain is lower (~72% for pain ≥6).
We publish these numbers because trust requires transparency. The Ruff is not a diagnostic tool — it’s a monitoring and alerting tool. It tells you when to seek veterinary care, not what’s causing the pain. The diagnosis belongs to your veterinarian.
Our full model architecture and validation methodology are described in a preprint under review at the Journal of Veterinary Internal Medicine. We’ll share the link when it’s published.