November 14th, 2023


REAL-TIME DATA PLATFORMS: SINGLESTORE VS. DATABRICKS

Dave Eyler

Senior Director, Product Management



SingleStore and Databricks are both exceptional data platforms that address
important challenges for their customers.



However, when it comes to performance and cost, SingleStore has several major
advantages because it’s built from the ground up for performance, which in turn
leads to lower cost. This blog is the first of a multi-part series in which we
will examine these differences, beginning with real-time analytics and
operations, an area in which SingleStore excels.



Additionally, we have observed that SingleStore also has cost and performance
advantages in non-real-time, batch ETL jobs — we will cover those in a
follow-up blog.




UNDERSTANDING THE VALUE OF REAL-TIME DATA

To begin, let's establish the significance of real-time data. Why do customers
value it? The simple answer is that in many use cases, the value of data
diminishes as it ages. Whether you're optimizing a marketing campaign,
monitoring trade speeds, pushing real-time inventory updates, observing network
hiccups or watching security events, delays in reacting translate to financial
losses. The events generated by these sources arrive continuously — in a stream
— which has led to the rise of streaming technologies. Databricks' recent blog,
"Latency goes subsecond in Apache Spark Structured Streaming," aptly describes
this:

“In our conversations with many customers, we have encountered use cases that
require consistent sub-second latency. Such low latency use cases arise from
applications like operational alerting and real time monitoring, a.k.a
"operational workloads."



At SingleStore, we deal in milliseconds, because that’s what matters to our
customers. Let’s call this quality latency, and define it as the time it takes
for one event to enter the platform, reach its destination and generate value.
There are other important factors to consider, and Databricks correctly points
out two more in their blog, which describes “giv[ing] users the flexibility to
balance the tradeoff between throughput, cost and latency”. We’ll add two more,
simplicity and availability, to complete our goals for the ideal real-time data
platform:

 1. Minimize latency
 2. Maximize throughput
 3. Minimize cost
 4. Maximize availability
 5. Maximize simplicity




HOW SINGLESTORE HANDLES REAL-TIME USE
CASESHOW-SINGLE-STORE-HANDLES-REAL-TIME-USE-CASES

First, we’d like to discuss SingleStore’s recommended approach to real-time data
use cases, which is to ingest streaming data into SingleStore and query it,
illustrated in the following figure.





At this point you are probably thinking: huh? That’s it? There must be more to
it than that! How could one data platform ingest in real time AND serve
analytical queries without sacrificing real-time SLAs? I hear companies talking
about adding new, specialized streaming products all the time. What do they do?




HOW DATABRICKS HANDLES REAL-TIME USE CASES

As it turns out, Databricks is one such company.  Let’s examine their approach
in their recent blog, Latency goes subsecond in Apache Spark Structured
Streaming, which includes two illustrations.  In the first illustration,

“Analytical workloads typically ingest, transform, process and analyze data in
real time and write the results into Delta Lake backed by object storage” [where
it stops being real time]



That’s not the end of the story, as the blog also contains an entirely separate
‘operational workloads’ configuration. While the existence of this
configuration is, by itself, compelling evidence that the analytical workloads
configuration stops being real time once it reaches Delta Lake, Databricks also
pretty much admits this in their blog:

“On the other hand, operational workloads, ingest and process data in real time
and automatically trigger a business process.” [that is also in real time]







The curious thing about this second figure is that it ends in a message bus.
The data never lands, and nothing ends up using it. Databricks’ solution for
real time is to read from Kafka, do transformations and write back to either
Kafka or…



“fast key value stores like Apache Cassandra or Redis for downstream integration
to business process”



...or other databases! Why would a data platform company like Databricks tell
their customers to store data in another database?  Because those databases
offer something that Databricks doesn’t: fast point reads and writes (CRUD).
They use a key-value format to enable this capability, at the expense of
analytical queries, which neither those databases nor Kafka can do easily and
efficiently. 



SingleStoreDB, with its patented Universal Storage, can do both transactional
and analytical queries. In fact, SingleStore is more than the sum of Databricks
and a key-value store, since it provides a single SQL interface to perform
reads and writes with (sample queries follow this list):

 1. High selectivity (OLTP, including CRUD)
 2. Medium selectivity (real-time analytics) — only SingleStore can do this
 3. Low selectivity (large-scale analytics and bulk insert)
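
To make these three levels concrete, here is a hedged sketch of what each might
look like through that single SQL interface. The orders table and its columns
are hypothetical, not from the original post:

-- 1. High selectivity (OLTP / CRUD): point read and update by key.
SELECT * FROM orders WHERE order_id = 123456;
UPDATE orders SET status = 'SHIPPED' WHERE order_id = 123456;

-- 2. Medium selectivity (real-time analytics): aggregate over a narrow,
--    recent slice of the data.
SELECT customer_id, SUM(total) AS spend
FROM orders
WHERE order_ts >= NOW() - INTERVAL 15 MINUTE
GROUP BY customer_id;

-- 3. Low selectivity (large-scale analytics and bulk insert): scan most of the
--    table, or load and move data in bulk.
SELECT region, COUNT(*) AS orders, AVG(total) AS avg_total
FROM orders
GROUP BY region;
INSERT INTO orders_archive SELECT * FROM orders WHERE order_ts < '2023-01-01';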



While this is certainly enough to explain why Databricks recommends Cassandra or
Redis for real time, there is another compelling reason: SingleStore and those
databases are more highly available than Databricks. SingleStore has automatic
redundancy within the nodes of its clusters (Standard Edition) and even across
availability zones with the push of a button (Premium Edition). Databricks, on
the other hand, doesn’t have a page about high availability in its docs.
Instead, Databricks talks about how AWS S3, a component of their system, is
highly available (which does not mean the whole system is highly available).



The absence of this feature explains the existence of this AWS deployment
guide, which describes how, with considerable effort, you can deploy Databricks
clusters in two AZs. Note that this still does not make your cluster cross-AZ;
it merely means separate clusters exist in two AZs. If you want your
Databricks-powered app to be truly tolerant of an AZ failure, you have to do
that yourself by configuring the above and changing your app to talk to two
clusters, both of which come at the price of a lot more effort, expense and
complexity.



With all of this in mind, this illustration of Databricks’ proposal is a more
complete representation of their proposed Rube Goldberg Machine — cough, we mean
real-time data platform, along with its drawbacks.





Databricks' recommended configuration of operational streaming pipelines can be
greatly simplified by replacing all of it with SingleStore, which is built for
real time and requires only a single message bus for ingestion.





Option 3: Simple analytical queries, highly available and real time


HOW SINGLESTORE WORKS UNDER THE HOOD

Wondering how we do it?  We’re glad you asked! Let’s take a deeper dive into the
architecture that makes SingleStore a simple and performant platform for
real-time analytics.



Streaming data originates from the source, and events are ingested by
SingleStore’s Pipelines, which are fully parallelized and can read data from
Kafka and a variety of other sources in many popular formats. Another possible
source of real-time data is DML statements that insert, update, delete and
upsert data. These can run with high throughput, concurrently with streaming
ingest, thanks to row-level locking — which means that individual rows, rather
than whole tables, are locked for writes. This greatly increases the throughput
of the end-to-end system. A sketch of both ingestion paths follows below.
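
As a rough sketch of both ingestion paths (the broker address, topic, table and
column names here are hypothetical, not from the original post), a parallelized
Kafka ingest plus a concurrent upsert might look like this:

-- Parallelized ingest: pipeline partitions read from the Kafka topic in
-- parallel and write into the target table.
CREATE PIPELINE clicks_pipeline AS
LOAD DATA KAFKA 'kafka-broker:9092/clicks'
INTO TABLE clicks
FORMAT JSON
(click_id <- click_id, user_id <- user_id, ts <- ts);

START PIPELINE clicks_pipeline;

-- Concurrent DML runs alongside the pipeline; row-level locking means this
-- upsert only locks the rows it touches, not the whole table.
INSERT INTO clicks (click_id, user_id, ts)
VALUES (42, 7, NOW())
ON DUPLICATE KEY UPDATE ts = VALUES(ts);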



Transformations can be applied with stored procedures, which can be called as
the endpoints of pipelines in SingleStore and allow our customers to apply
complex transformations to streaming data, including filtering, joins, grouping
aggregations and writes into multiple tables. Since they serve as pipeline
endpoints, a single, partitioned writer works on batches of data, facilitating
parallelism.



Here’s an example of a stored procedure that maintains a custom running SUM (or
AVG) aggregation on grouped data from a pipeline carrying CDC data (where the
‘action’ column may contain ‘DELETED’ and ‘INSERTED’).

CREATE PROCEDURE my_custom_sum (
    cdc QUERY(col1 INT, col2 TEXT, action VARCHAR(16))
)
AS
BEGIN
  INSERT INTO my_custom_mv
  SELECT col2, SUM( IF(action='DELETED', -col1, col1) ) AS sum,
               SUM( IF(action='DELETED', -1, 1) ) AS num_rows
  FROM cdc
  GROUP BY col2
  HAVING sum != 0 OR num_rows != 0
  ON DUPLICATE KEY UPDATE sum=sum+VALUES(sum),
                          num_rows=num_rows+VALUES(num_rows);
  DELETE FROM my_custom_mv WHERE num_rows = 0;
END
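
For context, here is a hedged sketch of how this procedure might be wired up
end to end. The target table definition and the pipeline’s Kafka source are
illustrative assumptions, not from the original post; the primary key on col2
is what the procedure’s ON DUPLICATE KEY UPDATE clause relies on:

-- Hypothetical target table for the running aggregate.
CREATE ROWSTORE TABLE my_custom_mv (
  col2 VARCHAR(255) NOT NULL,
  sum BIGINT NOT NULL,
  num_rows BIGINT NOT NULL,
  PRIMARY KEY (col2)
);

-- Attach the stored procedure as the pipeline's endpoint.
CREATE PIPELINE cdc_pipeline AS
LOAD DATA KAFKA 'kafka-broker:9092/cdc_topic'
INTO PROCEDURE my_custom_sum
FORMAT JSON
(col1 <- col1, col2 <- col2, action <- action);

START PIPELINE cdc_pipeline;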



After it’s transformed, data is written into Tier 1, the memory layer of the
LSM tree (the main data structure backing SingleStoreDB tables). These writes
use a replicated write-ahead log (WAL) to persist to Tier 2, the local disk;
persistence to Tier 3 is done lazily in the background — not on the
latency-critical path. The net result? The data becomes consistently queryable
in single-digit milliseconds.






KEY DIFFERENCES BETWEEN SINGLESTORE AND DATABRICKS ARCHITECTURE

Why can’t Databricks offer comparable real-time capabilities? There are two
main reasons:

 1. For writes, Tiers 1 + 2 don’t exist
 2. For reads, Tier 1 doesn’t exist and Tier 2 is off by default, harder to use
    and adds latency

Let’s examine the write path first. In SingleStore, writes arrive in Tier 1, the
logs are written to Tier 2 and data is replicated throughout the system and
instantly queryable.  Contrast this with Databricks, where writes have to go all
the way to the cloud object store before they are acknowledged.



The read path has similar limitations. In SingleStore, Universal Storage takes
advantage of both Tiers 1 and 2, and purely in-memory rowstore tables can also
be used for maximum performance (a sketch of both table types follows below).
Compare this with Databricks, which famously stores nothing in its Spark memory
layer — which is great, until you want to read really fast.
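
As a hedged illustration (these table definitions are ours, not from the
original post), the default Universal Storage table and the purely in-memory
rowstore option might be declared like this:

-- Default Universal Storage (columnstore) table, backed by Tiers 1 and 2.
CREATE TABLE events (
  event_id BIGINT,
  user_id BIGINT,
  ts DATETIME(6),
  amount DECIMAL(18,2),
  SORT KEY (ts),
  SHARD KEY (user_id)
);

-- Purely in-memory rowstore table for the hottest, lowest-latency reads.
CREATE ROWSTORE TABLE sessions (
  session_id BIGINT PRIMARY KEY,
  user_id BIGINT,
  last_seen DATETIME(6)
);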



Further, Databricks’ disk layer is off by default; even when enabled, new data
must first be ingested into the object store and only then pulled into the
cache, adding a lot of latency. In SingleStore, new data is written to disk on
the way in, so it’s already there to be read when you need it.



Most importantly, Databricks knows it’s not possible to write to and read from
the cloud object store with low latency — and they have designed their entire
streaming architecture as a way to compensate for the absence of this
capability.  



Databricks recommends their users split their application into two parts,
executed by completely different systems:

 1. Pre-processing data with Spark Structured Streaming pipelines
 2. Lighter weight queries over pre-processed data

However, the first system introduces delays and makes processing less real time,
and the second still doesn’t deliver low enough latency for many scenarios.



SingleStore can do fast, low latency queries either over raw ingested data or
pre-processed data in stored procedures that are endpoints of ingest pipelines.
In the latter case, pre-processing is done in the same environment using SQL.
This results in legitimately real-time processing.




STRENGTHS OF DATABRICKS

Despite all of the above, streaming architectures that never touch a database do
have their uses.  For example, you might have a truly massive amount of data —
more than would ever fit in storage — and you just want to make a few transforms
to events in one Kafka stream, and re-emit another Kafka stream that triggers an
alert. 



Databricks has also made great advances in data exploration, and developers love
the flexibility of their notebook interface.  Furthermore, their product has a
lot of advanced machine learning capabilities. 



Databricks is also widely used to power ETL jobs, although SingleStore has some
performance and cost advantages in this space, so some jobs might make more
sense on SingleStore. We will cover this topic and the best ways to use the two
products together in a future blog in this series.




SUMMARY: REAL-TIME DATA PLATFORMS: SINGLESTORE VS. DATABRICKS

For real-time use cases, Apache Spark Structured Streaming plus another
database is an overly complicated and impractical solution when you can simply
ingest streaming data into SingleStore and query it.



Lower latency

 * SingleStore has an in-memory data tier for freshly ingested trickle inserts
   and updates, as well as faster access to metadata. This layer is absent in
   Databricks 

 * SingleStore has row-level indexes found in operational systems and data
   formats supporting cheap seeks, while Databricks only supports redundant data
   structures that are used to prune read sets on the file level (which
   SingleStore does as well), and not on the row level. This enables SingleStore
   to use significantly less CPU and disk I/O than Databricks — especially on
   queries with high and medium selectivity

 * Data in SingleStore can be stored in hybrid row- and column-centric
   representations, a key area of innovation we began years ago with Universal
   Storage and recently extended with Column Group Indexes. This also allows
   SingleStore to save on disk I/O and CPU compared to Databricks — especially
   on queries that select all or most of the columns in a table
 * Writes to SingleStore become consistently queryable in single-digit
   milliseconds thanks to the in-memory tier and write ahead logging (WAL);
   compare this to a pipeline that terminates in a Delta table, which is backed
   by an object store. Each blob write to S3 could take up to 100 ms, there are
   likely multiple blob writes for each update, and that’s after the data has
   been translated to Parquet — another step not needed in SingleStore on the
   latency-critical code path. Finally, end to end, this means writes to
   Databricks will be one-to-two orders of magnitude slower than SingleStore
 * Add up all the preceding advantages, and it’s not surprising that SingleStore
   queries are exceptionally fast compared to Databricks, as you can see in this
   TPC-H benchmark



More throughput

 * There are two key factors that influence throughput, the most important being
   latency.  If a SingleStore query takes 10 ms, and the same query on a
   similarly sized Databricks cluster takes 1 second then, all other things
   being equal, SingleStore will have 100x the throughput of Databricks. See the
   above section for details on the superiority of SingleStore in terms of
   latency

 * The other factor is concurrency. A system in which queries interfere with
   each other will have less throughput — again, all other things being equal.
   SingleStore has advantages over Databricks in this regard as well. For
   example, SingleStore has default row-level locking, which you can compare to
   the equivalent write-conflict functionality in Databricks that only operates
   at the table level (except in a few heavily caveated cases only available in
   preview). This type of feature is much harder for Databricks because anyone
   can write to their open tables at any time, which means they have to add a
   lot of additional steps to avoid write conflicts

 * The most popular benchmark to test throughput is derived from TPC-C, which
   delivers its results in “transactions per minute”. We’ve published
   SingleStore’s performance on TPC-C; as far as we can tell, neither
   Databricks nor any third party has ever done the same



More cost effective

 * To meet the same real-time SLA as SingleStoreDB, Databricks requires an extra
   database and an extra messaging bus. And whether you choose open-source
   software or a managed solution, you are going to end up paying more either
   way because the former takes more employees and the latter costs money
 * SingleStore can often execute the same query 10x - 100x faster than
   Databricks (see latency section), and SingleStore has better concurrency (see
   throughput section). Since no amount of money will let Databricks match
   SingleStore latency, throughput can only be matched if Databricks users scale
   up and spend a lot more money to achieve the same result.  Net / net, CSPs
   charge by the hour, and if you can make your job take way less time, it will
   cost you way less money



More available

 * Databricks can’t serve applications and use cases that need RPO=0 and very
   low RTO, because they don’t have high-availability features like
   replication, cross-AZ redundancy, two hot copies of the data always ready
   for querying and incremental backups



Much simpler

 * SingleStore is more real time. If an aggregate on streaming data uses a
   windowing function with a 5-second or 1-minute window, SingleStore will
   surface the data immediately on a partial time window in the next query
   (see the sketch after this list). Contrast this with Databricks users
   computing the result of an aggregation in a streaming pipeline — they will
   only see the result of the aggregation once the window ends and the result
   is inserted into a database

 * We won’t force you to reason about joining streams — joining tables is much
   easier to reason about

 * You won’t need to worry about late-arriving data. If some events arrive
   late, the next query will reflect those changes wherever they fall in the
   event timeline

 * We support exactly once, so we won’t lose your data — unlike Databricks,
   where “Exactly once end-to-end processing will not be supported.”
 * Pipelines ending in stored procedures can perform transformations and
   maintain running aggregates
 * SingleStore supports read-modify-write so the final use case can be simpler,
   without the need to stick to a pure event-based programming and data modeling
   paradigm
 * SingleStore can store and execute code in notebooks or stored procedures,
   whereas Databricks only has notebooks
 * And finally, at the risk of repeating ourselves (it bears repeating): no
   extra databases are needed
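
As a hedged sketch of the partial-window point above (reusing the hypothetical
events table from earlier; the bucketing expression is ours), a query like this
would include the current, still-open minute immediately:

-- Per-minute running aggregates over freshly ingested events; the current
-- (partial) minute appears in the results as soon as rows are ingested.
SELECT
  FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(ts) / 60) * 60) AS minute_bucket,
  COUNT(*) AS events,
  SUM(amount) AS total
FROM events
WHERE ts >= NOW() - INTERVAL 5 MINUTE
GROUP BY minute_bucket
ORDER BY minute_bucket DESC;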



To put it simply, SingleStore’s queries are so efficient and reliably fast that
we can support high concurrency and, combined with our high availability, even
power applications. This is why companies like LiveRamp and Outreach (which
also use Databricks) trust SingleStore to power their mission-critical,
real-time analytics workloads.



Here’s a table to help you keep track of everything we’ve discussed:



Capability                                               | Databricks                  | SingleStoreDB
Storage layers                                           | 2 (only 1 automatic)        | 3
Ingest layer                                             | Object store (high latency) | Local disk with replication (low latency)
Products needed for streaming                            | Databricks + another db     | One; only SingleStoreDB
TPC-H SF-10 benchmark                                    | 58.4 seconds                | 33.2 seconds
TPC-C benchmark                                          | Unavailable                 | 12,545
Can serve low RPO / RTO applications                     | No                          | Yes
Can transform streaming data                             | Yes (Structured Streaming)  | Yes (pipelines -> stored proc)
Exactly-once supported                                   | No                          | Yes
Easy relational queries                                  | Not in Structured Streaming | Yes
Best solution for data exploration and machine learning  | Yes                         | No
Best solution for real-time analytics, operations and applications | No                | Yes

Stick around for part 2 of this series, in which we will add more details about
the best ways to use SingleStore and Databricks together, and SingleStore’s
performance and cost advantages in the non real-time, batch ETL space.

Product

--------------------------------------------------------------------------------

Dave Eyler
Dave Eyler is Senior Director of Product Management at SingleStore.
Eugene Kogan
Eugene Kogan is a Principal Architect at SingleStore.
Adam Prout
Adam Prout was a co-founding engineer and former head of SingleStore Engineering.
Adam spent five years as a senior database engineer at Microsoft SQL Server
where he led engineering efforts on kernel development. Adam holds a bachelor’s
and master’s in computer science from the University of Waterloo and is an
expert in distributed database systems.

