How I Would Use AI to Evaluate a De Novo Clinic Site Today

Healthcare M&AI is written by Shawn Rothlis. If someone forwarded this to you and you want access to the full workflow, subscribe at the link.

When I was the Director of Corporate Development at an oncology platform, we were a value-based care play. Capitated contracts with health plans and risk-bearing entities. Revenue was driven by how many attributed members we could bring under management - which meant network coverage wasn't optional. We needed to fill geographic gaps or we'd be leaving capitation dollars on the table.

Acquisitions were the primary engine. But acquisitions are bounded by where oncologists happen to be willing to sell. They don't sell in every market you need. So we partnered with CBRE and their healthcare real estate advisory team to scout de novo sites that could fill the gaps.

What that process looked like: evaluating medical office buildings, mapping 20-minute drive-time radii against census tract data for 65+ populations, estimating renovation costs, checking zoning and CON requirements. Then LoopNet. Crexi. Calling CBRE reps. Scheduling tours. Waiting on callbacks.

From first market screen to a shortlist worth touring, that process took weeks.

The free AI tools available now can, in my estimate, compress the front-end analytical work by 60-70% before you ever engage CBRE, Buxton SCOUT, or Trilliant Health. They don't replace those firms - they handle the work that used to burn weeks of analyst time. Here's what that workflow looks like.

The Four-Layer Framework

The de novo evaluation process has four distinct analytical layers. AI has different levels of leverage at each one.

Layer 1: Market screening - which geographies are worth looking at.

Layer 2: Site-level evaluation - once you have a specific address, rapid pre-tour scoring against your criteria.

Layer 3: Financial modeling - building the pro forma before deeper diligence.

Layer 4: Deal execution - lease negotiation, CON navigation, physician recruitment. AI has very limited utility here. More on that below.

Layer 1: Market Screening with Free Public Data

The foundational question in oncology site selection is: is there enough of a patient pool here, with the right payer mix, that a de novo clinic can reach break-even within a reasonable ramp period?

Three free public datasets, all queryable via API, answer roughly 70% of that question:

Census ACS API - Age distribution, income levels, and insurance coverage type down to the census tract level. For oncology, the operative variable is 65+ population density combined with commercial insurance penetration. High age concentration without commercial coverage means a Medicare-heavy payer mix - which matters because the spread between commercial and Medicare reimbursement on chemotherapy drugs can swing site-level EBITDA by 30-40%. ACS carries a 12-18 month lag, but it's the best publicly available payer mix proxy for initial screening.

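SEER API - The NCI's Surveillance, Epidemiology, and End Results program. County-level cancer incidence rates by site (breast, lung, colorectal, prostate), overlaid against ACS age data to estimate raw patient pool size. One practical caveat: the SEER API at api.seer.cancer.gov mostly serves reference data (staging, site recodes, drug information), so county-level incidence rates are often easier to pull from NCI's State Cancer Profiles site, which combines SEER and CDC registry data. SEER's reporting lag is improving - previously 22+ months from diagnosis to publication, they're now targeting a 2-month submission cycle using ML-based extraction from pathology reports.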

CMS Provider and Utilization Data - The Medicare Physician and Other Supplier Public Use File maps service delivery volume by specialty and geography - useful for identifying existing oncology practice concentration and white space. The Provider of Services (POS) file adds facility characteristics for all Medicare-certified providers. Trilliant Health also launched a free chatbot called Oria that lets you query hospital price transparency files in natural language - ask "What is Cigna's negotiated rate for infusion administration (CPT 96413) at hospitals in Nashville, TN?" and get an actual answer.
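The "overlay" step in the SEER paragraph is simple arithmetic. Incidence rates are usually published per 100,000 people, so expected annual new cases are roughly the relevant population times the rate, divided by 100,000. The sketch below assumes you already have each cancer site's rate for your county, and the figures are placeholders, not real SEER values. Breast and prostate rates are normally reported per 100,000 women or men, so the sketch takes a separate denominator for each site. Applying age-adjusted rates to a raw headcount is only an approximation.

```python
def expected_annual_cases(population: int, rate_per_100k: float) -> float:
    """Expected new cases per year = population x (rate per 100,000) / 100,000."""
    return population * rate_per_100k / 100_000


def patient_pool(inputs: dict[str, tuple[int, float]]) -> dict[str, float]:
    """Map cancer site -> (denominator population, rate per 100k) to expected cases.

    The denominator should match how the rate was reported
    (e.g. female population for breast, male population for prostate).
    """
    return {site: round(expected_annual_cases(pop, rate), 1)
            for site, (pop, rate) in inputs.items()}


if __name__ == "__main__":
    # Placeholder county: all populations and rates are illustrative only.
    county = {
        "breast": (128_000, 130.0),     # female population, rate per 100k women
        "prostate": (122_000, 115.0),   # male population, rate per 100k men
        "lung": (250_000, 55.0),
        "colorectal": (250_000, 36.0),
    }
    pool = patient_pool(county)
    print(pool, "total:", round(sum(pool.values()), 1))
```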

Claude or ChatGPT can write Python scripts to query all three APIs, export structured dataframes, and analyze them for you. The only cost is your LLM subscription.
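The Census piece of that script is small enough to sketch with the Python standard library alone. This version assumes the ACS 5-year data profile endpoint (api.census.gov/data/<year>/acs/acs5/profile) and ZCTA-level geography. The variable code is a placeholder: data profile variable numbers shift between ACS releases, so check each one against that year's variable list. The API returns a JSON array whose first row holds the column names.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

ACS_PROFILE_URL = "https://api.census.gov/data/{year}/acs/acs5/profile"


def build_acs_profile_url(year: int, variables: list[str], api_key: str) -> str:
    """Build a query for the given data profile variables across all ZCTAs."""
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": "zip code tabulation area:*",
        "key": api_key,
    }
    # quote (not quote_plus) encodes spaces as %20; keep ':', '*' and ',' readable.
    query = urlencode(params, safe=":*,", quote_via=quote)
    return ACS_PROFILE_URL.format(year=year) + "?" + query


def parse_census_json(text: str) -> list[dict[str, str]]:
    """Turn the Census [[header...], [row...], ...] payload into a list of dicts."""
    rows = json.loads(text)
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def fetch_acs_profile(year: int, variables: list[str], api_key: str) -> list[dict[str, str]]:
    """Network call; returns every ZCTA nationally, so filter to your market afterwards."""
    with urlopen(build_acs_profile_url(year, variables, api_key), timeout=120) as resp:
        return parse_census_json(resp.read().decode("utf-8"))
```

Values come back as strings, so convert them to numbers before computing shares. ZCTAs do not nest inside counties. Pulling all of them and filtering locally with a ZCTA-to-county crosswalk is usually simpler than trying to restrict the query geographically.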

Layer 2: Site-Level Evaluation and Real Estate Monitoring

This is where the CoStar-LoopNet-Crexi dynamic requires some navigation.

LoopNet is the largest commercial real estate marketplace in the U.S. It has native email alert functionality that works well - you can configure saved searches for MOB listings by geography, size range, and listing type. However, CoStar's terms of service explicitly prohibit AI scraping agents from programmatically reading their platforms, including LoopNet. Their terms specifically state that "no Information is exposed to an environment susceptible to access or use, directly or indirectly by any third party, including without limitation open artificial intelligence tools." That's a hard wall. The workaround: configure LoopNet's native alerts, then forward the emails to Claude or a GPT workspace. The AI can analyze the listing data from the email - it just can't scrape the site directly.

Crexi is more permissive and, in my experience, useful for medical/healthcare property filters. It offers real-time activity alerts and exposes more listing metadata at the property record level - zoning district descriptions, parcel numbers, CBSA classification, demographics data (1/3/5 mile radius). Same workflow: use the native alerts to populate your deal pipeline, then paste listing details into Claude with a structured evaluation prompt.

Once you have a specific address, geospatial analysis is where AI adds significant leverage. Google Maps Platform's Routes API - the successor to the legacy Distance Matrix API - runs $5 per 1,000 requests with 10,000 free requests per month (route matrix calls are billed per origin-destination element). For a de novo project evaluating 50-100 candidate sites with 10/20/30-minute drive-time radius analysis, you're looking at under $100 in API costs total. Claude can write the Python code to run the analysis and generate isochrone maps - geographic polygons showing the catchment area reachable within a given drive time.
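The "under $100" figure is easy to sanity-check. This sketch assumes each site-to-destination pair in a route matrix counts as one billable request, at the $5 per 1,000 rate with 10,000 free per month quoted above. Check Google's current price list before relying on it. The destination counts in the example are hypothetical.

```python
def routes_api_cost(n_sites: int, n_destinations: int,
                    price_per_1k: float = 5.0,
                    free_per_month: int = 10_000) -> tuple[int, float]:
    """Return (total matrix elements, estimated USD cost) for one month of calls."""
    elements = n_sites * n_destinations           # one element per origin-destination pair
    billable = max(0, elements - free_per_month)  # free tier absorbs the first 10,000
    return elements, billable * price_per_1k / 1000


if __name__ == "__main__":
    # Hypothetical: 100 candidate sites, each checked against 200 ZCTA centroids.
    print(routes_api_cost(100, 200))
```

Under those assumptions, 100 sites against 200 destinations comes to 20,000 elements, or about $50 after the free tier.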

For bulk screening across hundreds of locations before you narrow to a shortlist, OSRM (Open Source Routing Machine) is free and surprisingly accurate for routes under 50 minutes. A Harvard study found it could process 32.8 million origin-destination pairs in under 6 minutes. That's plenty of horsepower for market-level screening.
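OSRM's table service returns a full duration matrix in one call, which is what makes bulk screening cheap. This sketch assumes a self-hosted OSRM server at a placeholder address; the public demo server is rate-limited and not meant for bulk work. OSRM takes coordinates as longitude,latitude and returns durations in seconds, with null where no route is found.

```python
import json
from typing import Optional
from urllib.request import urlopen


def osrm_table_url(base_url: str, site: tuple[float, float],
                   destinations: list[tuple[float, float]]) -> str:
    """Build a /table request with the site as the only source; inputs are (lat, lon)."""
    # OSRM expects longitude,latitude pairs separated by semicolons.
    coords = ";".join(f"{lon},{lat}" for lat, lon in [site, *destinations])
    return f"{base_url}/table/v1/driving/{coords}?sources=0&annotations=duration"


def minutes_from_site(table_json: str) -> list[Optional[float]]:
    """Convert the single source row from seconds to minutes, dropping site-to-site."""
    data = json.loads(table_json)
    if data.get("code") != "Ok":
        raise ValueError(f"OSRM returned {data.get('code')}")
    row = data["durations"][0][1:]  # index 0 is the site itself
    return [None if s is None else s / 60 for s in row]


def count_within(minutes: list[Optional[float]],
                 thresholds: tuple[int, ...] = (10, 20, 30)) -> dict[int, int]:
    """Count destinations reachable within each drive-time threshold (minutes)."""
    return {t: sum(1 for m in minutes if m is not None and m <= t) for t in thresholds}


def drive_times(base_url: str, site: tuple[float, float],
                destinations: list[tuple[float, float]]) -> list[Optional[float]]:
    """Network call against your own OSRM server (placeholder URL in the example)."""
    with urlopen(osrm_table_url(base_url, site, destinations), timeout=60) as resp:
        return minutes_from_site(resp.read().decode("utf-8"))
```

A call would look like `drive_times("http://localhost:5000", (36.16, -86.78), [(36.10, -86.80)])`. The base URL and coordinates here are illustrative only.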

Layer 3: Financial Modeling

The JLL 2026 benchmark for all-in MOB fit-out cost is $412 per square foot - covering hard construction (~$226/SF) plus soft costs, design fees, AV/IT, contingency, and FF&E. That's the warm white box baseline for standard outpatient. In my experience with oncology specifically, you need to adjust upward:

Moderate acuity (expanded imaging, specialized procedure rooms): roughly +10% to ~$453/SF
High acuity (infusion suites, radiation shielding, imaging with heavy structural requirements): add another ~20% over moderate, putting you at $544/SF or higher
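The acuity tiers compound rather than add: the high-acuity figure is 20% on top of the moderate figure, not 30% on top of the baseline. The check below uses the JLL baseline and the adjustments above. The 8,500 SF size is only an example.

```python
BASELINE_PER_SF = 412.0           # JLL 2026 all-in MOB fit-out benchmark
MODERATE_UPLIFT = 0.10            # expanded imaging, specialized procedure rooms
HIGH_UPLIFT_OVER_MODERATE = 0.20  # infusion suites, shielding, heavy imaging


def fit_out_per_sf(acuity: str) -> float:
    """Per-SF fit-out cost for 'standard', 'moderate', or 'high' acuity."""
    moderate = BASELINE_PER_SF * (1 + MODERATE_UPLIFT)
    tiers = {
        "standard": BASELINE_PER_SF,
        "moderate": moderate,
        "high": moderate * (1 + HIGH_UPLIFT_OVER_MODERATE),  # compounds on moderate
    }
    return tiers[acuity]


def total_fit_out(square_feet: int, acuity: str) -> float:
    return square_feet * fit_out_per_sf(acuity)


if __name__ == "__main__":
    for tier in ("standard", "moderate", "high"):
        print(tier, round(fit_out_per_sf(tier), 2), round(total_fit_out(8_500, tier)))
```

That puts an 8,500 SF high-acuity build at roughly $4.6M before any tenant improvement allowance.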

Radiation vaulting is in a different category entirely - $1M+ per treatment room for concrete shielding alone, plus $2-5M for the linear accelerator. Most community-based de novo oncology sites stay in medical oncology (infusion/chemotherapy) and treat radiation as a Phase 2 expansion or a partnership arrangement, precisely because of those capital requirements.

AI can take those inputs - square footage, acuity level, payer mix estimate, CPT code reimbursement rates from CMS fee schedules, FTE ramp schedule - and build a structured pro forma with break-even analysis, sensitivity tables, and 5-year cash flow projections. The modeling is only as good as your input assumptions, and payer mix is the most consequential variable in oncology. But AI handles the modeling mechanics well once you feed it clean inputs.
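Payer mix dominates because the same visit volume produces a different blended rate under each mix. This sketch indexes the Medicare rate to 100 and uses a commercial multiplier of 1.3, inside the 1.2x-1.4x range used in the Stage 4 prompt. The 0.7 Medicaid multiplier is a hypothetical placeholder, so substitute your state's fee schedule.

```python
import math

# Multipliers relative to Medicare. The Medicaid value is a hypothetical placeholder.
MULTIPLIERS = {"commercial": 1.3, "medicare": 1.0, "medicaid": 0.7}


def blended_rate(medicare_rate: float, mix: dict[str, float],
                 multipliers: dict[str, float] = MULTIPLIERS) -> float:
    """Revenue per visit weighted by payer share; shares must sum to 1."""
    if not math.isclose(sum(mix.values()), 1.0):
        raise ValueError("payer mix shares must sum to 1")
    return sum(share * medicare_rate * multipliers[payer] for payer, share in mix.items())


if __name__ == "__main__":
    scenarios = {
        "base": {"commercial": 0.30, "medicare": 0.60, "medicaid": 0.10},
        "+10 pts Medicaid": {"commercial": 0.30, "medicare": 0.50, "medicaid": 0.20},
        "+10 pts commercial": {"commercial": 0.40, "medicare": 0.50, "medicaid": 0.10},
    }
    for name, mix in scenarios.items():
        print(name, round(blended_rate(100.0, mix), 1))
```

In this toy example, shifting 10 points between payers moves the blended rate by roughly 3% in either direction.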

The Paid Platforms Still Matter

What the free workflow cannot replicate:

Buxton SCOUT has 600+ proprietary datasets including psychographic segmentation across 128M+ U.S. households, covered lives by payer type at the zip code level, and automated site scoring models for 24 healthcare service lines. When you need to know not just who lives within 20 minutes of a site but what proportion of those households carry commercial insurance vs. Medicaid, and how that compares to your existing site performance benchmarks - that's Buxton. Enterprise pricing that runs six figures annually.

Trilliant Health addresses something important: traditional healthcare demand forecasting assumed perpetual growth based on demographic trends and national forecasts. Their platform uses actual claims data to avoid that bias. AdventHealth used it to evaluate net-new ambulatory site selection. Enterprise pricing as well.

Definitive Healthcare gives you KOL mapping, referral pattern data, and physician affiliation intelligence - things no public dataset can approximate. Critical for referral network analysis once you've identified a market.

CBRE Dimension is what we used. It's not a standalone software product you can purchase - it's deployed in conjunction with CBRE's advisory team. It incorporates their broker network, off-market deal flow, historical lease comps, and claims-based demand forecasting. The value is the combination of the platform with the relationship capital.

The free AI workflow is most useful as pre-work before engaging those platforms and advisors - so you arrive with a pre-ranked shortlist and sharper questions rather than starting from scratch.

What AI Cannot Do

From what I've seen, there are four areas where no amount of prompt engineering substitutes for human judgment and local relationships:

Physician recruitment - AI can identify that a market has oncology supply gaps. It cannot tell you whether qualified oncologists are recruitable into that market, whether the major platform consolidators (US Oncology, American Oncology Network, OneOncology) have already locked up the available talent pool, or how a physician's family situation affects their willingness to relocate. Physician search firms and your existing professional network are the only reliable intelligence sources here.

Referral network dynamics - AI can map the geographic concentration of primary care providers in a target market. It cannot assess contractual exclusivity arrangements between PCPs and competing oncology platforms, informal referral loyalties built over years of relationships, or whether a health system's employed physician network will refer outside their system. In oncology specifically, the referral pathway is the business.

CON navigation - 35 states plus DC still have active Certificate of Need laws. AI can tell you which states have CON and which service types trigger review. It cannot navigate the process. A recent North Carolina case involving WakeMed's radiation oncology CON application illustrates this well - approved in 2023, challenged by Duke and UNC Health, overturned by an administrative law judge in February 2025, then settled. Three years, hundreds of thousands in legal fees, delayed patient access. CON navigation requires specialized healthcare regulatory attorneys and state-specific political relationships.

Actual lease negotiation - Tenant improvement allowances directly offset the JLL $412/SF benchmark, and they are highly negotiated. The range depends on landlord credit quality, your covenant strength as a tenant, local market vacancy, and broker leverage. CBRE's advisors bring transaction history and comparable TI data that no public AI tool can access.

That's the framework. The free tools can do the heavy lifting on market screening and site-level quantitative analysis. The proprietary platforms and expert advisors earn their fees on the intelligence that isn't publicly available.

The prompts, API workflows, and step-by-step process to actually run this - including copy-paste prompts for each stage - are in the subscriber section below.

The Full AI Workflow: Step-by-Step with Copy-Paste Prompts

Stage 1: Market Screening (Before You Look at a Single Listing)

Tools needed: Claude or ChatGPT (for API scripting and analysis), Perplexity AI (for real-time market intelligence)

Step 1.1 - Demographic and cancer incidence pull

Ask Claude to write a Python script querying the Census ACS API and SEER API for your target geography. You don't need to know Python. Give Claude this prompt:

❝

"Write a Python script that does the following:

Queries the Census ACS 5-year estimates API (api.census.gov/data) for all ZIP Code Tabulation Areas (ZCTAs, the Census approximation of zip codes) and keeps those in [target MSA or county FIPS code]. Pull tables DP03 (economic characteristics), DP05 (demographic characteristics), and B27001 (health insurance coverage by age). Export the results as a CSV.
Queries the SEER API (api.seer.cancer.gov) for [State FIPS] to pull county-level cancer incidence rates for breast, colorectal, lung, and prostate cancer. Export as a CSV.
Maps each ZCTA to its county using the Census ZCTA-to-county relationship file (the cancer incidence data is county-level), joins the two datasets on county, calculates a weighted oncology demand score for each zip code using: (65+ population share) x (cancer incidence rate relative to national average) x (commercial insurance penetration rate), and outputs a ranked table from highest to lowest score.

Use the Census API key [YOUR_KEY] and the SEER API key [YOUR_SEER_KEY]. Flag any zip codes where data confidence is low due to small population size."

Census API keys are free - register at api.census.gov. SEER API keys are also free; you get one by registering at api.seer.cancer.gov.
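Once the data is in hand, the scoring step in that prompt reduces to a few lines. This sketch assumes each ZCTA has already been assigned to a single county, for example the county holding most of its population in the Census ZCTA-to-county relationship file, and that shares are fractions. The 5,000-person low-confidence cutoff and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Zcta:
    zcta: str
    county_fips: str         # assigned via a ZCTA-to-county crosswalk (assumed done upstream)
    total_pop: int
    share_65_plus: float     # e.g. 0.18 for 18%
    commercial_share: float  # private insurance coverage as a fraction


def demand_scores(zctas: list[Zcta], county_incidence: dict[str, float],
                  national_incidence: float, min_pop: int = 5_000) -> list[dict]:
    """Score = 65+ share x (county incidence / national incidence) x commercial share."""
    ranked = []
    for z in zctas:
        relative_incidence = county_incidence[z.county_fips] / national_incidence
        ranked.append({
            "zcta": z.zcta,
            "score": z.share_65_plus * relative_incidence * z.commercial_share,
            "low_confidence": z.total_pop < min_pop,  # hypothetical small-population flag
        })
    return sorted(ranked, key=lambda r: r["score"], reverse=True)


if __name__ == "__main__":
    sample = [  # illustrative values only
        Zcta("A", "county-1", 20_000, 0.20, 0.50),
        Zcta("B", "county-2", 3_000, 0.25, 0.30),
    ]
    rates = {"county-1": 500.0, "county-2": 450.0}  # placeholder all-site rates per 100k
    for row in demand_scores(sample, rates, national_incidence=450.0):
        print(row)
```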

Step 1.2 - Competitive supply mapping

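Download the Medicare Physician & Other Practitioners data (formerly the Physician and Other Supplier PUF) from data.cms.gov and filter for hematology/oncology, medical oncology, and radiation oncology provider types in your target county or MSA. The Provider of Services (POS) file describes facilities rather than physician specialties, so use it for facility context only. Upload the filtered CSV to Claude: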

❝

"Using the attached CMS provider data filtered to [County/MSA], identify: (1) all oncology and hematology practices by address, (2) their Medicare utilization volume where available from the Physician PUF, (3) which appear to be affiliated with major health systems vs. independent. Map the provider density against the zip code demand scores from the prior analysis and flag zip codes with high demand scores and below-average provider density as priority targets."

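Making the "high demand, below-average density" filter explicit lets you check what the model did. This sketch assumes you have demand scores and population by ZIP, plus one ZIP per oncology provider taken from the CMS address fields. Treating "high demand" as at or above the median score is a hypothetical cutoff. Per-ZIP density is also noisy, because practices draw patients from well beyond their own ZIP.

```python
from collections import Counter
from statistics import median


def priority_targets(demand: dict[str, float], population: dict[str, int],
                     provider_zips: list[str]) -> list[str]:
    """ZIPs with demand >= median (hypothetical cutoff) and below-average provider density."""
    counts = Counter(provider_zips)  # providers per ZIP; missing ZIPs count as 0
    density = {z: counts[z] / population[z] * 100_000 for z in demand}  # per 100k residents
    average_density = sum(density.values()) / len(density)
    demand_cutoff = median(demand.values())
    return sorted(z for z in demand
                  if demand[z] >= demand_cutoff and density[z] < average_density)


if __name__ == "__main__":
    demand = {"A": 0.11, "B": 0.075, "C": 0.09}        # illustrative scores
    population = {"A": 20_000, "B": 30_000, "C": 25_000}
    print(priority_targets(demand, population, ["B", "B", "C"]))
```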
Step 1.3 - Real-time market intelligence check

For anything the static datasets miss - new competitor openings, health system real estate announcements, recent consolidation activity - use Perplexity. Direct API access isn't required; just query it manually with targeted questions:

"Which oncology practices have opened, expanded, or been acquired in [target market] in the past 18 months?"
"What are the announced health system real estate plans in [county or city]?"
"What is the Medicaid managed care penetration rate in [state], and which MA plans are dominant in [MSA]?"
"Does [state] have active CON laws for outpatient oncology or infusion services, and what is the current application timeline?"

Perplexity returns inline citations you can verify, which matters when you're presenting a market analysis to a board or investment committee.

Stage 2: Site-Level Evaluation (You Have a Specific Address)

Use case: A LoopNet or Crexi alert has flagged an 8,500 SF medical office building in a target market. Pre-tour scoring before spending time on a site visit.

Step 2.1 - Drive-time catchment analysis

If you're running a handful of sites, use Google Maps API (ask Claude to write the script). If you're evaluating dozens of locations in a bulk screening phase, use OSRM (free, and Claude can write that script too).

Prompt for Claude (Google Maps API version):

❝

"I'm evaluating a medical office building at [full address] for a de novo oncology clinic. Using the Google Maps Routes API with my key [YOUR_KEY], write a Python script that:

Geocodes the candidate site address
Pulls the 10 largest population-weighted zip code centroids within a 25-mile radius using Census population data
Calculates drive times from the candidate site to each zip code centroid
Reports: what percentage of the target county's total 65+ population (from ACS DP05) lives within a 10-minute drive, 20-minute drive, and 30-minute drive of the site

Output a summary table and flag whether this site meets the threshold of 40%+ of target population within 20 minutes."
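Once the drive times come back, the coverage percentage and the 40% test in that prompt are simple aggregation. This sketch assumes you have the 65+ population for each sampled ZCTA and its drive time in minutes, with None where no route was found. The sample values are hypothetical.

```python
from typing import Optional


def coverage_share(zctas: list[tuple[int, Optional[float]]], county_65_total: int,
                   thresholds: tuple[int, ...] = (10, 20, 30)) -> dict[int, float]:
    """Share of the county's 65+ population living within each drive-time threshold."""
    return {
        t: sum(pop for pop, minutes in zctas if minutes is not None and minutes <= t)
        / county_65_total
        for t in thresholds
    }


def meets_threshold(shares: dict[int, float], minutes: int = 20,
                    required: float = 0.40) -> bool:
    """The prompt's screen: at least 40% of the target population within 20 minutes."""
    return shares[minutes] >= required


if __name__ == "__main__":
    # (65+ population, drive minutes) for sampled ZCTAs; illustrative values only.
    sampled = [(4_000, 8.0), (3_000, 15.0), (2_500, 27.0), (1_500, None)]
    shares = coverage_share(sampled, county_65_total=20_000)
    print(shares, meets_threshold(shares))
```

Because the prompt samples only the 10 largest ZCTAs, the result is a lower bound. Unsampled ZCTAs inside the drive-time band are not counted.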

Step 2.2 - Full site evaluation scorecard

After pulling listing details from the Crexi or LoopNet email alert, paste them into Claude with this prompt:

❝

"I'm evaluating a medical office building listing for an oncology clinic conversion. Analyze the following property:

Address: [address]
Square footage: [SF]
Asking rent or price: [rate]
Year built: [year]
Current/prior tenant: [if known]
Parking ratio: [spaces per 1,000 SF]
Listing notes: [paste from email alert]

Evaluate across four dimensions and produce a Green/Yellow/Red scorecard:

Renovation cost estimate: Starting from JLL's 2026 benchmark of $412/SF warm white box baseline, apply the high-acuity oncology adjustment for infusion suite requirements (structural, plumbing, power): +10% to ~$453/SF, then a further ~20%, or roughly $544/SF. What is the estimated total hard + soft renovation cost for this space? If the building pre-dates 1990, flag Yellow for potential asbestos/environmental risk and HVAC capacity constraints.
Competitive context: Based on the CMS provider data for [County] (I'll provide a filtered list), identify the oncology providers within a 10-mile radius and flag any that appear health system-affiliated.
Zoning and regulatory flags: For [State/County], note whether this address is likely to require CON review for outpatient oncology. Flag if [State] has active CON laws. Note standard permitting timelines for medical office in this jurisdiction if known.
Structural concerns relevant to oncology: Based on year built and SF, flag any likely structural limitations for heavy or high-power medical equipment (MRI: ~14,000 lbs floor load requirement; linear accelerator vault: 5-6 ft concrete shielding requirement). Red flag if the building's electrical service is unlikely to support infusion suite loads.

Add a summary recommendation: worth scheduling a tour, or pass."
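If you want the summary recommendation to follow a fixed rule rather than the model's judgment, a worst-flag-wins roll-up is one simple option. This is a hypothetical convention, not part of the original prompt: any Red means pass, and otherwise the site is worth a tour. The pre-1990 rule mirrors the prompt's age flag.

```python
SEVERITY = {"Green": 0, "Yellow": 1, "Red": 2}


def age_flag(year_built: int) -> str:
    """Pre-1990 buildings get Yellow for environmental and HVAC capacity risk."""
    return "Yellow" if year_built < 1990 else "Green"


def roll_up(scorecard: dict[str, str]) -> tuple[str, str]:
    """Hypothetical rule: overall color is the worst dimension; any Red means pass."""
    for color in scorecard.values():
        if color not in SEVERITY:
            raise ValueError(f"unknown color: {color}")
    worst = max(scorecard.values(), key=SEVERITY.__getitem__)
    return worst, ("pass" if worst == "Red" else "schedule a tour")


if __name__ == "__main__":
    card = {  # illustrative scorecard for a 1985 building
        "renovation": age_flag(1985),
        "competition": "Green",
        "regulatory": "Yellow",
        "structural": "Green",
    }
    print(roll_up(card))
```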

Stage 3: Ranking a Shortlist of Markets

Use case: You've identified 10-15 MSAs as potential targets for next-year de novo expansion. Rank them before engaging CBRE or Buxton.

Step A - Perplexity pass on each market (run in parallel, one query per market):

❝

"For [City/MSA], provide: (1) estimated annual new cancer cases (SEER data or state cancer registry), (2) commercial insurance market penetration rate, (3) major oncology groups and health systems present and their approximate market share, (4) any announced oncology platform expansions or acquisitions in the past 18 months, (5) CON law status for outpatient oncology in [State]. Cite your sources."

Step B - Claude scoring matrix (after compiling Step A outputs):

❝

"I have market intelligence on [N] candidate MSAs for oncology clinic de novo expansion. Using the attached data, build a ranked scoring matrix using these weighted criteria:

Cancer incidence and addressable patient pool size: 25%
Commercial insurance penetration (higher = better for a commercial-first platform): 20%
Competitive supply gap (fewer specialized oncology practices per capita = higher score): 20%
Population growth trajectory (5-year ACS trend): 15%
CON law friction (no CON or limited CON scope for outpatient oncology = higher score): 10%
Real estate market conditions (MOB vacancy, estimated asking rents per Crexi/news sources): 10%

Output: A ranked list with a score on each dimension (0-10), a weighted composite score, and a one-sentence rationale for each market's ranking. Flag any markets where your data confidence is low and flag what additional data would improve confidence."
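The Step B matrix is a straightforward weighted sum, which makes the model's arithmetic easy to audit. This sketch uses the weights listed above. The dimension keys and sample scores are illustrative names, not a required schema.

```python
import math

WEIGHTS = {
    "patient_pool": 0.25,
    "commercial_penetration": 0.20,
    "supply_gap": 0.20,
    "population_growth": 0.15,
    "con_friction": 0.10,
    "real_estate": 0.10,
}
if not math.isclose(sum(WEIGHTS.values()), 1.0):
    raise ValueError("weights must sum to 1")


def composite(scores: dict[str, float]) -> float:
    """Weighted composite of 0-10 dimension scores (result is also on a 0-10 scale)."""
    return sum(weight * scores[dim] for dim, weight in WEIGHTS.items())


def rank_markets(markets: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (market, composite) pairs sorted from highest to lowest composite."""
    ranked = [(name, round(composite(s), 2)) for name, s in markets.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    markets = {  # illustrative scores only
        "Market Y": {"patient_pool": 8, "commercial_penetration": 6, "supply_gap": 9,
                     "population_growth": 7, "con_friction": 4, "real_estate": 5},
        "Market X": {dim: 10 for dim in WEIGHTS},
    }
    print(rank_markets(markets))
```

Because the weights sum to 1, the composite stays on the same 0-10 scale as the inputs.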

Stage 4: AI-Assisted Pro Forma Build

Once you have a site shortlisted and renovation cost estimates, use Claude to build a working pro forma. Give it this structure:

❝

"Build a 36-month de novo clinic pro forma in a structured table format for an oncology clinic with the following inputs:

Service mix: medical oncology (infusion + E&M)
Square footage: [SF] at [estimated rent $/SF NNN]
Renovation cost: [$ total, per JLL $412/SF + acuity adjustment]
Tenant improvement allowance assumption: [$/SF from landlord if known, otherwise use $50/SF as conservative baseline]
Starting FTEs: [1 oncologist, 2 RNs, 1 MA, 1 patient access coordinator, 0.5 billing]
Patient volume ramp: assume months 1-6 at 20% capacity, months 7-12 at 50%, months 13-24 at 80%, month 25+ at 100%
Steady-state active patient count: [target number]
Payer mix assumption: [% commercial / % Medicare / % Medicaid]
Revenue per patient visit assumptions: use CMS 2026 Medicare fee schedule as baseline, apply commercial multiplier of [1.2x-1.4x] for commercial payer
Include: monthly revenue, direct clinical expenses (labor, drugs/supplies), occupancy, G&A, EBITDA
Include: break-even month, cumulative cash required through break-even, 3-year IRR assuming a [8x] EBITDA exit multiple

Build in a sensitivity table showing break-even month and 3-year IRR under three payer mix scenarios: [your base case], [10 points more Medicaid], [10 points more commercial]."

The model will usually ask clarifying questions or flag missing inputs. That's actually useful - it forces you to pressure-test your assumptions before the spreadsheet looks polished.
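To spot-check the model's break-even month and IRR, the core mechanics fit in a short script. This sketch rests on simplifying assumptions that are mine, not part of the prompt:

- Revenue scales with the ramp schedule in the prompt.
- Drugs and supplies are a fixed percentage of revenue.
- Labor, occupancy, and G&A are a flat monthly cost.
- Capex is spent at month 0.
- The exit is valued at 8x trailing-12-month EBITDA.

The revenue and cost figures are hypothetical.

```python
def capacity_pct(month: int) -> float:
    """Ramp from the prompt: 20% (1-6), 50% (7-12), 80% (13-24), 100% (25+)."""
    if month <= 6:
        return 0.20
    if month <= 12:
        return 0.50
    if month <= 24:
        return 0.80
    return 1.00


def monthly_ebitda(steady_revenue: float, variable_cost_ratio: float,
                   fixed_cost: float, months: int = 36) -> list[float]:
    """Variable costs (drugs, supplies) scale with revenue; fixed costs are flat."""
    return [steady_revenue * capacity_pct(m) * (1 - variable_cost_ratio) - fixed_cost
            for m in range(1, months + 1)]


def break_even_month(ebitda: list[float]):
    """First month with non-negative EBITDA, or None if it never gets there."""
    return next((m for m, e in enumerate(ebitda, start=1) if e >= 0), None)


def peak_operating_deficit(ebitda: list[float]) -> float:
    """Deepest cumulative EBITDA hole, i.e. operating cash needed beyond capex."""
    cumulative, trough = 0.0, 0.0
    for e in ebitda:
        cumulative += e
        trough = min(trough, cumulative)
    return -trough


def irr(cash_flows: list[float], lo: float = -0.99, hi: float = 1.0) -> float:
    """Periodic IRR by bisection; assumes a single sign change (outflow first)."""
    def npv(rate: float) -> float:
        return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

    if npv(lo) * npv(hi) > 0:
        raise ValueError("IRR not bracketed between lo and hi")
    for _ in range(200):
        mid = (lo + hi) / 2
        if npv(lo) * npv(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def three_year_irr(capex: float, ebitda: list[float], exit_multiple: float = 8.0) -> float:
    """Annualized IRR on monthly flows, with the exit added to the final month."""
    flows = [-capex, *ebitda]
    flows[-1] += exit_multiple * sum(ebitda[-12:])  # exit on trailing-12 EBITDA
    return (1 + irr(flows)) ** 12 - 1


if __name__ == "__main__":
    # Hypothetical: $500K/month steady revenue, 60% variable costs, $120K/month fixed.
    e = monthly_ebitda(500_000, 0.60, 120_000)
    print(break_even_month(e), round(peak_operating_deficit(e)),
          round(three_year_irr(4_200_000, e), 3))
```

With these placeholder inputs, the clinic turns EBITDA-positive in month 13 and needs about $600K of operating cash beyond capex. Swap in your own payer-mix-driven revenue to see how quickly those answers move.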

Tool Cost Summary

Tool	Cost	Primary Use
Census ACS API	Free	Demographic screening, payer mix proxy
SEER API	Free	Cancer incidence market sizing
CMS data.cms.gov	Free	Provider density, utilization benchmarking
Trilliant Health Oria	Free	Hospital price transparency queries
Google Maps Routes API	$5/1K requests (10K free/month)	Drive-time catchment analysis
OSRM	Free (open source)	Bulk drive-time screening
Perplexity AI	Free / $20/month Pro	Real-time market intelligence
Claude / ChatGPT	$20-200/month	Scripting, analysis, modeling, prompts
LoopNet alerts	Free (native alerts only)	MOB listing monitoring
Crexi alerts	Free (native alerts only)	MOB listing monitoring
Buxton SCOUT	Enterprise (~$100K+/year)	Full-service analytics with proprietary psychographics
Trilliant Health Site Selection	Enterprise	Multi-market growth opportunity analysis
Definitive Healthcare	Enterprise (~$50K+/year)	Oncology KOL mapping, payer mix intelligence
CBRE Dimension	Via CBRE engagement	Healthcare real estate portfolio analytics

The free stack handles the analytical scaffolding. The paid platforms and advisors own the proprietary intelligence that actually closes the gap between a good-looking screen and a defensible investment decision.

One more thing: if you found this useful and want to see the same workflow applied to an acquisition pipeline - how AI changes the buy-side process for evaluating platform acquisitions - I covered that in an earlier piece. Start here with the pipeline analysis to see how the due diligence framing translates to live deal evaluation.