April 12, 2023 • 8 min read


PARCL LABS PRICE FEED WHITEPAPER




EXECUTIVE SUMMARY

 * The Parcl Labs Price Feed (PLPF) is an indicator that tracks daily price
   changes in residential real estate across multiple markets and property types
   using a simple metric: price per square foot.
 * Existing data sources such as the Case Shiller Index provide a distorted view:
   they cover only specific types of properties (single-family houses), consider
   only repeat sales, are released with a lag, and often lack granularity below
   the level of metropolitan areas.
 * Parcl Labs developed an enterprise-level ETL process that ingests, cleans, and
   transforms hundreds of millions of individual data points and leverages
   spatial data science to generate price estimates at multiple levels of
   geography.
 * Our methodology looks at representative segments of the market, filters
   outliers and other irregularities, creates smooth time series, and is tested
   on a daily basis to guarantee the quality of our indicator.
 * The PLPF empowers users to make better-informed decisions and can be accessed
   through our user-friendly API.


INTRODUCTION

Residential real estate is the cornerstone of wealth accumulation for the typical
individual, and it is the largest asset class in the world, with a staggering
value of 258 trillion dollars. Despite its dominance within the global economy,
the industry is supported by an information ecosystem that is decades past its
prime and riddled with incomplete and potentially biased data.

Residential real estate is heavily fragmented: its information lives in silos and
is concentrated among local actors. Even where the information exists and can be
accessed, it tends to be lagged and/or heavily transformed. As a result, buyers,
sellers, and investors get only a fragmented and distorted view of their local
real estate markets and can only make educated guesses about the assets they plan
to acquire.

Existing data sources that aim to bring clarity to a market have serious
shortcomings in how they present information. The best-known and most widely
referenced residential real estate metric in the United States is the Case
Shiller Index, which produces indexes for 20 metropolitan areas as well as for
the country in aggregate. However, we contend that the methodology and output of
this index are far from ideal. The index covers only specific types of properties
(single-family houses); its data cleaning process is opaque and has proven
difficult for other researchers to replicate, even with an identical dataset; and
it is published with a lag of more than two months.

Further, the index considers only repeat sales, which excludes a substantial and
growing number of multi-family home transactions. Analysis of Parcl Labs data for
2022 shows that the Case Shiller approach to calculating price changes would
discard 50 percent of all transactions in the New York metropolitan region. For
metros like Boston and Miami, the Case Shiller methodology would exclude 61 and
64 percent of all transactions, respectively. The figure below shows these
coverage gaps for the metropolitan areas that make up the Case Shiller 10.
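To make the coverage-gap arithmetic concrete, here is a minimal sketch that computes the share of a market's transactions that a single-family, repeat-sales-only filter would drop. The `Sale` record, its fields, and the sample data are hypothetical, and the real index methodology applies additional screens, so this illustrates the idea rather than reproducing the Case Shiller rules.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    # Hypothetical transaction record used only for illustration.
    property_id: str
    property_type: str    # e.g. "single_family", "condo", "townhome"
    has_prior_sale: bool  # True if the property has an earlier recorded sale


def repeat_sales_gap(sales: list[Sale]) -> float:
    """Share of transactions dropped by a single-family, repeat-sales-only filter."""
    if not sales:
        return 0.0
    kept = [s for s in sales
            if s.property_type == "single_family" and s.has_prior_sale]
    return 1 - len(kept) / len(sales)


sales = [
    Sale("a", "single_family", True),
    Sale("b", "single_family", False),  # first recorded sale (e.g. new construction)
    Sale("c", "condo", True),           # not a single-family home
    Sale("d", "townhome", False),
]
print(f"{repeat_sales_gap(sales):.0%} of transactions excluded")  # 75% of transactions excluded
```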


Figure 1. Market Gaps in Transactions of Case Shiller 10 metros for 2022

The Parcl Labs Price Feed (PLPF) is the only currently available daily estimate
of residential real estate price per square foot across multiple markets with
varying geographic scopes (states, metros, cities, etc.) that looks at all
available transactions in a systematic and standardized way. The PLPF allows
users to monitor real-time price fluctuations in their markets with minimal lag,
using an indicator (median price per square foot) that is easy to understand and
that allows more representative comparisons with other markets.

This approach represents a significant departure from existing solutions such as
the Case Shiller Index, which provides users only with an index comparing the
value of the average home in a given month with its value in 2000, the most
recent baseline reference year. Even if users could independently establish the
average price of a home in 2000, they would need to perform unintuitive
transformations to obtain the average price of a home over a given period. And
even after that transformation, the result would be the average price of a home
in a given market, which is not standardized for comparison across geographies: a
similarly priced condo in New York City is not necessarily comparable to a
single-family home in Tampa, Florida, or in Washington, DC. The PLPF addresses
this with a normalized metric (median price per square foot) that can be compared
across markets and used to gauge historical price fluctuations, since a square
foot of property is the same unit in every market that uses the imperial system.
Future markets that use the metric system will use the square meter as the unit
of analysis.
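The "unintuitive transformation" mentioned above can be sketched as follows: an index level only becomes a price once it is rescaled by a known base-period price, and even then it is a whole-home price rather than a per-square-foot one. This minimal sketch assumes the index is normalized to 100 in the base period; the base price, index level, and home size are made-up numbers.

```python
def index_to_price(index_value: float, base_price: float, base_index: float = 100.0) -> float:
    """Convert an index level into an implied price, given a known base-period price."""
    return base_price * index_value / base_index


def price_per_sqft(price: float, square_feet: float) -> float:
    """Normalize a whole-home price to price per square foot."""
    return price / square_feet


# Hypothetical inputs: a base-period average price and a current index level.
implied = index_to_price(index_value=300.0, base_price=150_000.0)
print(implied)                           # 450000.0
print(price_per_sqft(implied, 1_800.0))  # 250.0
```

Note that the second step needs a representative home size for the market, which the index itself does not provide.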


Figure 2. Changes in Price per Square Foot in Case Shiller 10 metros using Parcl
Labs data vs latest Case Shiller data (January 2023)

At Parcl Labs we have developed a price feed (PLPF) that tracks price changes on
a daily basis across the widest variety of markets for all property types, such
that users can understand the real-time dynamics of pricing in their markets and
in other markets across the country, and eventually around the world. The Parcl
Labs Price Feed addresses the following:

 * Provides an intuitive and easy-to-understand metric for evaluating
   residential real estate price movements
 * Closes the gap between the records held by counties and other administrative
   authorities and current market conditions, resulting in the first-ever
   real-time tracking indicator of residential real estate
 * Generates estimates based on all available information within a given market,
   rather than relying solely on a subset of transactions
 * Standardizes and integrates data from different recording systems to create a
   single source of truth
 * Incorporates more timely information such as listings and other data feeds
   into the depiction of real estate market conditions
 * Reduces asymmetries in access to residential real estate data, thereby
   empowering users
 * Offers an index that dynamically adjusts to the velocity and volume of
   transactions occurring within any given market to provide an unbiased
   assessment of real estate prices

Having a price feed that updates daily, with clean and standardized data from
multiple sources, allows us to provide a more complete picture than estimates for
large geographies can. Our research has shown that markets do not behave as a
monolith and that significant divergences can occur. As illustrated in the
following figure, there are marked differences in price dynamics between the New
York Metropolitan Statistical Area (MSA), Manhattan, and Brooklyn.


Figure 3. PLPF for New York Metro Area, Brooklyn and Manhattan

In the third quarter of 2023, Brooklyn and Manhattan both exhibited small
increases followed by a downward trend in prices. During the same period,
however, the New York metropolitan area, which encompasses cities in states such
as Pennsylvania, Connecticut, and New Jersey, followed a distinct trend,
indicating a split in the New York metropolitan market. Indexes such as the Case
Shiller or the House Price Index from the U.S. Federal Housing Finance Agency
fail to capture these regional divergences in a timely manner or with adequate
geographic detail. Further, the price per square foot in Manhattan is nearly one
thousand dollars higher than in the New York Metro Area.

We are able to provide a timely, accurate, and detailed price feed thanks to our
use of data science, data engineering, and expertise in the real estate market.
As with any model, creating the best price indicator for real estate starts with
high-quality, clean, and standardized data.


DATA

Our approach solves the problems that have been present in real estate
information since transactions were first recorded, namely an incomplete and
inaccurate data universe that lives in silos. Unlike indexes that rely only on
single-family homes with a previous recorded sale, our data universe contains
information from new construction, repeat sales, and properties with no previous
sale information. Further, we include multiple types of housing, such as
single-family homes, townhomes, and condos, and are therefore able to present a
more accurate depiction of real estate markets.

The data enrichment process also tackles one of the most important challenges in
getting accurate and timely data: the reliance on historical records from county
registrars, which have heterogeneous timelines for publishing information. When a
property is sold, it can take anywhere from 2 weeks to 6 months for the sale to
appear in the corresponding county register. To address this issue, we draw on a
range of sources and compare them against historical records to improve the
accuracy and timeliness of our data. This is especially relevant in markets where
the volume of transactions is volatile and additional data points are required
for an accurate estimate of real estate prices. Our research has shown that
listing prices and sale prices are strongly correlated, with a correlation
coefficient of 0.89, as depicted in Figure 4.


Figure 4. Correlation between Price Per Square foot of Sales and Price Per
Square foot of Listing Prices for properties in Miami and Los Angeles in 2022

Collecting and integrating the data is only the beginning. To extract valuable
insights, we have developed an enterprise-level ETL (extract, transform, load)
process that can handle hundreds of millions of data points. Because each data
source has its own idiosyncrasies and time lags, we also undertake a rigorous data
cleaning and harmonization process that includes the following steps:

 * Cleaning, de-duplicating, and standardizing property addresses to obtain the
   highest-quality record of each property. For example, the address 2323 West Av
   can also appear as 2323 w Av or 2323 w Avenue; we compare these variants and
   consolidate them into a single source of truth.
 * Validating the data against a third-party source to further increase
   confidence in its integrity. This is an important step, as it ensures that our
   assumptions are checked against external sources.
 * After an initial data pass, we run a reconciliation process to ensure that the
   most current information associated with each record is consistent. For
   instance, a property may be recorded as having 1,700 square feet in one source
   and 1,600 in another. To resolve such discrepancies, we run an iterative
   process that examines various sources and time periods until we arrive at a
   reliable resolution.
 * Once the data has been cleaned, standardized, and processed, we use the latest
   geographic information system (GIS) technology to assign properties to
   multiple types of markets. Using spatial data science, we assign each property
   to its corresponding neighborhood, city, county, or other geographic unit,
   which gives us the flexibility to build the PLPF at any desired level of
   geography.
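The address-standardization and reconciliation steps above can be sketched in a few lines. This is a toy illustration, not Parcl Labs' pipeline: the abbreviation table is a small hypothetical subset (production address standardization typically follows postal conventions such as USPS Publication 28), and the reconciliation rule shown, taking the value from the most recently updated source, is an assumed stand-in for the iterative multi-source process described.

```python
import re
from datetime import date

# Hypothetical, deliberately tiny abbreviation table.
ABBREVIATIONS = {
    "w": "W", "west": "W",
    "e": "E", "east": "E",
    "av": "AVE", "ave": "AVE", "avenue": "AVE",
    "st": "ST", "street": "ST",
}


def normalize_address(raw: str) -> str:
    """Lowercase, strip punctuation, and map common variants to one canonical token."""
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t.upper()) for t in tokens)


def reconcile_sqft(records: list[tuple[str, int, date]]) -> int:
    """Assumed rule: take the square footage from the most recently updated source.

    Each record is (source_name, square_feet, last_updated).
    """
    return max(records, key=lambda r: r[2])[1]


variants = ["2323 West Av", "2323 w Av", "2323 w Avenue"]
print({normalize_address(v) for v in variants})  # {'2323 W AVE'}

records = [("source_a", 1700, date(2021, 5, 1)), ("source_b", 1600, date(2022, 9, 15))]
print(reconcile_sqft(records))  # 1600
```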

Once the data standardization process is complete, we have a unique,
state-of-the-art database from which to build a scalable and timely PLPF for any
desired level of geography. This is a necessary step before we can build the most
reliable and timely price feed for residential real estate.
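Assigning a geocoded property to a neighborhood, city, or county so that a feed can be built at any level of geography comes down, at its core, to a point-in-polygon test. The sketch below uses the standard ray-casting (even-odd) rule on plain coordinate tuples. The boundaries are hypothetical rectangles; a production system would more likely use a spatial library, real boundary files, and a spatial index.

```python
Point = tuple[float, float]


def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Ray casting: count how many polygon edges a horizontal ray from pt crosses.

    Points lying exactly on an edge may be classified either way.
    """
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_geographies(pt: Point, geographies: dict[str, list[Point]]) -> list[str]:
    """Return the names of every geography whose boundary contains the point."""
    return [name for name, poly in geographies.items() if point_in_polygon(pt, poly)]


# Hypothetical nested boundaries: a county containing a smaller neighborhood.
geographies = {
    "county": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "neighborhood": [(2, 2), (4, 2), (4, 4), (2, 4)],
}
print(assign_geographies((3.0, 3.0), geographies))  # ['county', 'neighborhood']
print(assign_geographies((8.0, 8.0), geographies))  # ['county']
```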


PARCL LABS PRICE FEED (PLPF) METHODOLOGY

The Parcl Labs Price Feed (PLPF) is based on a multi-stage approach that ensures
the reliability and consistency of our price estimates. We take the data created
in the previous step and transform it into a final time series for each market.
This process consists of three stages that correct for volatility and market
idiosyncrasies, combine historical and more timely series in a logical and
consistent manner, and test the estimates produced to ensure the reliability of
the data.

Time Series Correction and Smoothing

We use a correction method that is robust to outliers and representative of the
markets we cover to adjust our daily estimates. Real estate data is often skewed
and can be distorted by large outliers. The next figure shows that the vast
majority of sales in the Los Angeles MSA fall between $11 and $1,900 per square
foot, even though luxury properties extend the right tail of the distribution. A
simple average gives a price per square foot of $653, while the more
representative median is $563, a whopping $90 difference.
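A toy example shows why the median is preferred for skewed data: a handful of luxury sales pulls the mean up far more than the median. The numbers below are invented and are not the Los Angeles figures quoted above.

```python
import statistics

# Hypothetical price-per-sq-ft sample: mostly typical sales plus two luxury outliers.
ppsf = [420, 450, 480, 500, 510, 530, 560, 590, 2400, 3100]

mean = statistics.mean(ppsf)
median = statistics.median(ppsf)
print(f"mean = {mean:.0f}, median = {median:.0f}, gap = {mean - median:.0f}")
# mean = 954, median = 520, gap = 434
```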


Figure 5. Price per Square Foot distribution for Sales in Los Angeles MSA in
2022

Given how skewed the data is, we only look at sales that fall between the 35th
and 65th percentiles of the price distribution, limiting the impact of outliers
on our analysis. With this more representative sample, we then use a moving
median to build daily sales price estimates. This sample captures the movement of
the most representative part of each market, adjusts dynamically to shifts in
the underlying distribution of transactions, and offers a more stable price
estimate.
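The 35th to 65th percentile filter can be sketched as a rank-based trim: sort the prices, compute the percentile positions on the sorted data, and keep everything in between, rounding outward so the band is never empty. The whitepaper does not say how percentiles are computed, so this rounding convention is an assumption.

```python
def trim_to_percentile_band(prices: list[float], lower: int = 35, upper: int = 65) -> list[float]:
    """Keep the (sorted) observations whose rank lies between two percentiles.

    Positions run from 0 to n - 1 and are rounded outward (floor for the lower
    bound, ceiling for the upper), so at least one observation always survives.
    """
    ranked = sorted(prices)
    n = len(ranked)
    if n == 0:
        return []
    lo_idx = (lower * (n - 1)) // 100       # floor
    hi_idx = -((-upper * (n - 1)) // 100)   # ceiling, with integer arithmetic
    return ranked[lo_idx:hi_idx + 1]


prices = [float(p) for p in range(1, 101)]  # hypothetical prices 1..100
band = trim_to_percentile_band(prices)
print(len(band), band[0], band[-1])  # 32 35.0 66.0
```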

The PLPF further refines how we impute information for each market by using a
dynamic lookback window based on the volume and volatility of transactions.
Before deciding how far back in time to look when building a sample, we check
how many transactions are available in each market over a given period. While
the abundance of data in a metropolitan area may allow a short window, a sparser
geography may require a longer period to build stable samples. The following
figure shows the difference in the volume of transactions available for
Manhattan and for Tribeca, a popular, sought-after neighborhood in New York City.
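One simple way to implement a volume-dependent lookback is to widen the window until it contains enough transactions. The candidate window lengths and the minimum count of 30 below are hypothetical parameters chosen for illustration; the whitepaper does not publish the values it uses.

```python
def choose_window(daily_counts: list[int], min_obs: int = 30,
                  candidates: tuple[int, ...] = (7, 14, 30, 60, 90, 180, 365)) -> int:
    """Return the shortest lookback (in days) whose trailing window has >= min_obs sales.

    daily_counts holds the number of transactions per day, oldest first; the
    window ends at the most recent day. Falls back to the longest candidate.
    """
    for window in candidates:
        if sum(daily_counts[-window:]) >= min_obs:
            return window
    return candidates[-1]


dense = [40] * 365         # hypothetical busy market: about 40 sales a day
sparse = [0, 0, 1] * 122   # hypothetical sparse neighborhood: 1 sale every 3 days
print(choose_window(dense), choose_window(sparse))  # 7 90
```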


Figure 6. Sparsity of Manhattan vs Tribeca


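To ensure that the sample is representative, we select only observations within
the 35th to 65th percentiles of the distribution. Next, we calculate the daily
median price per square foot for each market, using a window that is appropriate
for the characteristics of that particular market. This process can be
represented by the following formula:

$$\tilde{p}_{i,d} = \operatorname{median}\left\{ x_k \;:\; d - t_i < d_k \le d,\ \ Q_{35} \le x_k \le Q_{65} \right\}$$

where $x_k$ and $d_k$ are the price per square foot and the date of transaction
$k$ in market $i$, $Q_{35}$ and $Q_{65}$ are the 35th and 65th percentiles of the
price distribution, and $t_i$ is the dynamic window for market $i$.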
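Putting the filtering and windowing together, the daily estimate for a market can be sketched as a trailing-window median of trimmed observations. This is an illustrative reading of the step described above under stated assumptions: observations are (day, price-per-sq-ft) pairs with integer day indices, the window (playing the role of `t_i`) is supplied by the caller, the percentile band is recomputed within each window, and the rank-based trim rounds outward so it never empties the sample.

```python
import statistics
from typing import Optional


def trim_band(prices: list[float], lower: int = 35, upper: int = 65) -> list[float]:
    """Rank-based percentile trim; floor/ceiling rounding keeps at least one value."""
    ranked = sorted(prices)
    n = len(ranked)
    lo_idx = (lower * (n - 1)) // 100
    hi_idx = -((-upper * (n - 1)) // 100)
    return ranked[lo_idx:hi_idx + 1]


def trailing_trimmed_median(obs: list[tuple[int, float]], day: int, window: int) -> Optional[float]:
    """Median price per sq ft over days (day - window, day] after the percentile trim.

    obs holds (day_index, price_per_sqft) pairs for one market.
    Returns None when the window contains no observations.
    """
    prices = [p for d, p in obs if day - window < d <= day]
    if not prices:
        return None
    return statistics.median(trim_band(prices))


# Hypothetical observations for one market, including two outliers (2500 and 90).
obs = [(1, 500.0), (2, 510.0), (2, 2500.0), (3, 495.0), (4, 505.0), (5, 90.0), (5, 515.0)]
daily = {d: trailing_trimmed_median(obs, d, window=3) for d in range(1, 6)}
print(daily)  # {1: 500.0, 2: 510.0, 3: 505.0, 4: 507.5, 5: 500.0}
```

In a full pipeline, the `window` argument would come from a per-market rule such as the volume-based lookback described earlier.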

We apply this smoothing and filtering process to both historical sales and more
timely data across all markets. By using a dynamic moving median, we are able to
smooth out price fluctuations, capture short- and long-term trends, and
regularize market idiosyncrasies. The figure below illustrates the effects of
applying our filtering process to sales data in the Pittsburgh MSA versus using
raw, unfiltered data.


Figure 7. Unadjusted Daily Price per SQ FT for Pittsburgh MSA vs PLPF in 2022

Calculation of Price Levels Using Exponential Decay Weights

After smoothing the sales information and incorporating more up-to-date data
sources, we generate a new estimate that combines historical sales data with
real-time information. This approach enables us to identify rapidly changing
market conditions that may not be reflected in traditional indicators such as
sales data alone. For instance, if a market is entering a downturn, metrics such
as listings will capture the emerging trend first. To blend these two time
series, we use a weighted average with exponentially decaying weights. This
process assigns greater weight to recent observations while still preserving the
influence of more distant data points, which is particularly relevant for more
timely data sources. The index can be represented as:

$$P_i = w^s_i \, p^s_i + w^t_i \, p^t_i$$

where $p^s_i$ is the moving median price from sales in market $i$ and $w^s_i$ is
the weight for sales, while $p^t_i$ is the median price from timely sources and
$w^t_i$ is the weight for timely sources. The weight $w^t_i$ is defined by the
decay factor λ. By combining traditional sales with timely sources, this formula
generates a daily estimate. Our model can still provide estimates for markets
with limited timely sources by relying solely on sales transactions when timely
data is not available. Finally, we apply 7-day smoothing to further minimize the
impact of market fluctuations.
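The blending step can be sketched as below. The whitepaper states only that the timely-source weight is governed by a decay factor λ, so the specific form used here is our assumption: w_t = exp(-λ · age), where age is how many days ago the estimated day lies, and w_s = 1 - w_t, so recent days lean on timely sources while older days, whose sales records are more complete, lean on recorded sales. The λ value and the series are hypothetical. When no timely value exists, the estimate falls back to the sales median, as the text describes, and a trailing 7-day mean stands in for the final smoothing (the text does not say whether that smoothing is a mean or a median).

```python
import math
from typing import Optional


def blend(sales_median: float, timely_median: Optional[float], age_days: int,
          lam: float = 0.05) -> float:
    """Blend sales-based and timely-source medians for one market and day.

    ASSUMED weighting (not specified in the whitepaper):
    w_t = exp(-lam * age_days) and w_s = 1 - w_t.
    """
    if timely_median is None:  # no timely data: rely on sales alone
        return sales_median
    w_t = math.exp(-lam * age_days)
    w_s = 1.0 - w_t
    return w_s * sales_median + w_t * timely_median


def trailing_mean(values: list[float], window: int = 7) -> list[float]:
    """Trailing moving average; early points average whatever values exist so far."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical 10-day series for one market, oldest first.
sales = [500.0, 502.0, 501.0, 503.0, 504.0, 503.0, 505.0, 506.0, 505.0, 507.0]
timely = [510.0, None, 512.0, 511.0, 509.0, 508.0, None, 506.0, 504.0, 503.0]
ages = list(range(len(sales) - 1, -1, -1))  # 9, 8, ..., 0 days before today
raw = [blend(s, t, a) for s, t, a in zip(sales, timely, ages)]
smoothed = trailing_mean(raw, window=7)
print([round(v, 1) for v in smoothed])
```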

While seasonal adjustment is the norm for monthly and quarterly series, there is
no consensus on the best methodology for adjusting daily time series. This is
because intra-week and market-specific irregular factors can vary significantly
between markets, making it challenging to develop a one-size-fits-all approach.
Additionally, even in seasonally adjusted monthly estimates, the seasonal
components exhibit irregularities that accentuate the influence of lagging data
points, which has led to calls to use unadjusted estimates in periods of market
volatility such as the one we are living through at the end of the Covid-19
housing boom. Our method of adjusting the data for irregularities also allows us
to scale to thousands of different markets across the USA, with global markets
coming soon.

Daily Dynamic Testing of the Data for Price Irregularities

To guarantee the consistency and reliability of our data, we run tests tailored
to each of the markets available in our API before publishing a data update.
These tests take into account abnormal behavior in the different data sources
that make up our database, the local market idiosyncrasies that explain
volatility in volume and prices, and a geographic factor that further adjusts for
the volatility of our series. As a result, each time series is rigorously tested
for sudden movements in price per square foot.
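The whitepaper does not publish its test criteria, so the following is only a hypothetical illustration of one common kind of check: flag a day whose price-per-square-foot change is unusually large relative to the market's recent changes, measured with a robust scale (the median absolute deviation, MAD). The threshold `k = 5` and the 30-day lookback are assumptions.

```python
import statistics


def flag_sudden_moves(series: list[float], lookback: int = 30, k: float = 5.0) -> list[int]:
    """Return indices of days reached by an outlying day-over-day change.

    A change is flagged when it deviates from the median of the previous
    `lookback` changes by more than k times their median absolute deviation.
    """
    changes = [b - a for a, b in zip(series, series[1:])]
    flagged = []
    for i in range(lookback, len(changes)):
        history = changes[i - lookback:i]
        center = statistics.median(history)
        mad = statistics.median(abs(c - center) for c in history)
        if mad > 0 and abs(changes[i] - center) > k * mad:
            flagged.append(i + 1)  # changes[i] is the move from day i to day i + 1
    return flagged


# Hypothetical series: small daily wiggles plus one abrupt jump.
steps = [2.0, -1.0, 1.0, -2.0] * 11
steps[35] = 60.0
series = [500.0]
for s in steps:
    series.append(series[-1] + s)
print(flag_sudden_moves(series))  # [36]
```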

Finally, as part of our effort to ensure data consistency and transparency, we
performed a correlation analysis between our PLPF and the metros in the Case
Shiller 20 index. We compared the monthly median prices from our PLPF with the
non-seasonally adjusted Case Shiller index for each metro where data was
available. The table below demonstrates the close alignment of our numbers, with
an average positive correlation of 0.98. And while the Case Shiller is published
with a lag of three and a half months, our PLPF is updated daily.


Table 1. Correlation Coefficient of PLPF with Metros in Case Shiller 20


CONCLUSION

Residential real estate information tends to be siloed, and the snapshots of the
market available to users are incomplete, either because estimates are limited to
a specific type of property and transaction (e.g., repeat sales of single-family
homes) or because of long lags. As a result, everyday users lack complete and
reliable information.

The Parcl Labs Price Feed (PLPF) provides a daily estimate of price per square
foot of residential real estate across multiple markets and property types. We do
this by cleaning and standardizing millions of traditional and real-time data
points, ingesting them into our data warehouse, and applying time series
correction and smoothing to hundreds of different time series.

We break down asymmetries in access to residential real estate data to empower
users to make better and more informed decisions around real estate. The PLPF
provides a comprehensive solution to the problems associated with the real
estate information ecosystem, offering a reliable and timely price feed that
allows buyers, sellers, and investors to make informed decisions based on
accurate and real-time market data.


Written by
Jesus Leal Trujillo
Principal Data Scientist

© 2023 Parcl Labs