Masood Krohy, Senior Solutions Architect (AI & Big Data)

I am a Senior Solutions Architect and have led several large-scale data projects
at large corporations across a number of industries. I am also the CEO of
PatternedScience and the chief architect of UniAnalytica, an advanced data
science platform (Big Data and AI). I hold a Ph.D. in computer engineering. This
site contains my writings on various projects; feel free to get in touch
if you have a question.


 * 13 Mar, 2024


DESIGNING AN ENTERPRISE SOLUTION ARCHITECTURE FOR ML / AI / GENAI USE CASES

INTRODUCTION

Designing a fit-for-purpose solution architecture in an enterprise requires
considering the following three elements:

 1. Requirements specific to the new use case (ML / AI / GenAI)
 2. The existing IT architecture landscape
 3. New candidate platforms and infrastructure that could help fill the gaps

This short article highlights the importance of identifying the intersection
of these three elements, so that the resulting solution architecture meets the
needs of the new use case, reuses as many components as possible from the
existing IT architecture landscape, and fills the remaining gaps with the tools,
platforms or infrastructure that must be brought in to arrive at a viable,
working solution.
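The intersection of the three elements can be made concrete with plain sets. The sketch below is only an illustration under stated assumptions: every capability, component and platform name in it is hypothetical, and the greedy choice of candidate platforms (pick whichever closes the most remaining gaps) is a simple set-cover heuristic chosen for illustration, not a method prescribed by the article.

```python
def plan_architecture(required, existing, candidates):
    """Split the use case's required capabilities into reuse, new platforms and open gaps.

    required:   set of capabilities the new use case needs (element 1)
    existing:   dict of existing component -> set of capabilities it provides (element 2)
    candidates: dict of candidate platform -> set of capabilities it provides (element 3)
    """
    # Elements 1 and 2 intersected: existing components covering part of the requirements.
    reuse = {comp: caps & required for comp, caps in existing.items() if caps & required}
    covered = set().union(*reuse.values())
    remaining = required - covered

    # Element 3: greedily bring in the candidate that closes the most remaining gaps.
    bring_in = {}
    while remaining:
        best = max(candidates, key=lambda c: len(candidates[c] & remaining), default=None)
        if best is None or not candidates[best] & remaining:
            break  # no candidate helps; these gaps stay open
        bring_in[best] = candidates[best] & remaining
        remaining = remaining - candidates[best]
    return reuse, bring_in, remaining


# Hypothetical GenAI use case.
required = {"feature_store", "model_training", "model_serving",
            "vector_db", "llm_gateway", "monitoring"}
existing = {"data_lake": {"feature_store"},
            "k8s_cluster": {"model_serving", "monitoring"},
            "bi_tool": {"dashboards"}}
candidates = {"ml_platform": {"model_training", "model_serving", "monitoring"},
              "vector_store": {"vector_db"},
              "genai_gateway": {"llm_gateway"}}

reuse, bring_in, open_gaps = plan_architecture(required, existing, candidates)
print(sorted(reuse), sorted(bring_in), sorted(open_gaps))
```

Here `bi_tool` is left out because it contributes nothing the use case needs, and `ml_platform` is brought in only for model training, since serving and monitoring are already covered by the existing cluster.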


 * architecture
 * data-science
 * ML
 * AI
 * GenAI

 * 12 Jun, 2019


TALK/DEMO: BIG GEOSPATIAL DATA WITH OPEN-SOURCE TECH (VECTORS, RASTERS & MAP-MATCHING)

SUMMARY

Geospatial datasets (i.e., geocoded data points) are everywhere nowadays and
often add enormous value to data analytics/mining and machine learning projects.
In this era of Big Data, libraries and engines such as GeoPandas and PostGIS,
as well as equivalent commercial products, often fall short and cannot scale up
sufficiently to let us tap into the Big Data that many organizations are
collecting across many use cases. In this talk/demo, we explore free,
open-source, Big Data-ready technologies and workflows such as GeoMesa, GeoPySpark
and OSRM-on-Spark, and show how to use these Apache Spark-based technologies
for key geospatial operations and use cases. We start by introducing GeoMesa and
demo-ing how it can be used to ingest Big Geospatial Data and perform operations
on vectors. Next, we briefly introduce GeoPySpark, the Python interface to
GeoTrellis, for performing operations on rasters. Finally, we turn to
map-matching, the process of associating geocoded data points with elements of
an underlying network (e.g., determining which street a particular GPS point
belongs to). We describe and demo how OSRM can be combined with Spark to
perform scalable map-matching on Big Data, which opens up many possibilities
for advanced data mining and machine learning projects.
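To make the map-matching idea concrete, the sketch below snaps a single GPS point to the nearest street segment. It is a deliberately naive, point-by-point stand-in and not OSRM's algorithm: OSRM's matching considers the whole GPS trace and the road graph, whereas this just projects one point onto each segment. It also assumes planar (projected) coordinates rather than raw latitude/longitude, and the street names and segments are hypothetical. In a Spark workflow, a function like this would be applied to many points in parallel.

```python
import math


def project_onto_segment(point, a, b):
    """Return (distance, closest_point) from point to segment a-b in planar coordinates."""
    (px, py), (ax, ay), (bx, by) = point, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Parameter t of the orthogonal projection, clamped to the segment [0, 1].
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    qx, qy = ax + t * dx, ay + t * dy
    return math.hypot(px - qx, py - qy), (qx, qy)


def snap_to_network(point, segments):
    """segments: dict street name -> (start, end). Return (name, distance, snapped point)."""
    best = None
    for name, (start, end) in segments.items():
        distance, snapped = project_onto_segment(point, start, end)
        if best is None or distance < best[1]:
            best = (name, distance, snapped)
    return best


streets = {"Main St": ((0.0, 0.0), (2.0, 0.0)), "Oak Ave": ((0.0, 3.0), (2.0, 3.0))}
print(snap_to_network((1.0, 1.0), streets))  # ('Main St', 1.0, (1.0, 0.0))
```

Snapping each point independently can jump between parallel streets on noisy traces, which is exactly why trace-level matching engines such as OSRM are preferred for real workloads.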

Slides

VIDEO



 * talk
 * data-science

 * 22 May, 2019


TALK/DEMO: LARGE-SCALE EXPERIMENTATION WITH SPARK & PRODUCTIONIZING NATIVE SPARK ML MODELS

SUMMARY

Apache Spark is a state-of-the-art engine for distributed processing, analytics
and ML, and in this talk we present and demo two useful ways to use Spark in ML
projects: 1) We use Spark to distribute the grid-search optimization of a
generic ML model (from a regular, single-machine ML library). We show how Spark
can distribute processing tasks over the CPU cores of a cluster, which gives a
near-linear speedup and lowers processing times; this makes it practical to
explore a much larger space when searching for the optimal hyperparameters of
the ML model. This use case suits projects that do not involve Big Data but use
a Big Data technology, i.e., Spark, to speed up processing. 2) We demonstrate
how to train an example model using Spark's own ML library and how to serve the
model with MLeap, a production-quality, low-latency serving engine. This second
use case/workflow suits projects that do involve Big Data.
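The first workflow boils down to mapping an evaluation function over a grid of hyperparameter combinations and keeping the best one. The sketch below shows that shape with only the standard library: a thread pool stands in for the cluster, and `toy_score` is a hypothetical placeholder for training and validating a single-machine model. With Spark, the map step would typically become `sc.parallelize(grid).map(score_fn).collect()`, so that each combination is evaluated on a cluster core; that substitution is an assumption here, not the talk's code. Threads keep the sketch self-contained and illustrate the structure, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def toy_score(params):
    """Hypothetical stand-in for 'fit a single-machine model, return its validation score'."""
    learning_rate, depth = params
    return -((learning_rate - 0.1) ** 2) - (depth - 5) ** 2


def grid_search(score_fn, grid, map_fn=map):
    """Score every hyperparameter combination via map_fn; return (best_params, best_score).

    map_fn is any map-like callable: the built-in map, an executor's map, etc.
    """
    scores = list(map_fn(score_fn, grid))
    best_index = max(range(len(grid)), key=scores.__getitem__)
    return grid[best_index], scores[best_index]


grid = list(product([0.01, 0.1, 1.0], [3, 5, 7]))
with ThreadPoolExecutor(max_workers=4) as pool:
    best_params, best_score = grid_search(toy_score, grid, pool.map)
print(best_params, best_score)
```

Because every grid point is independent, the work is embarrassingly parallel, which is what makes the near-linear speedup on a cluster plausible.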

Slides

VIDEO



 * talk
 * data-science

 * 1 May, 2019


TALK/DEMO: SUPERCHARGING ANALYTICS WITH GPUS: OMNISCI/CUDF VS POSTGRES/PANDAS/PDAL

SUMMARY

GPUs are known to significantly accelerate machine learning model training,
especially with deep learning libraries like TensorFlow. But did you know that
there are now solid options for also accelerating data analytics workloads, BI
tools and dashboards with GPUs? Join us for a presentation of performance
benchmarks of GPU-based options and their CPU-based counterparts. We compare the
performance of OmniSci Core DB (a GPU database) with that of Postgres (for data
analytics) and PDAL (for LiDAR processing). On the in-memory side, we benchmark
cuDF (NVIDIA’s GPU DataFrame) against the widely popular Pandas DataFrame. We
will share results and include some code walk-throughs and live benchmarking. By
the end of this technical talk, you will have insight into how GPUs can
accelerate your data analytics and geospatial workloads.
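Benchmarks like these hinge on timing the same operation on each engine in the same way. Below is a minimal, engine-agnostic timing harness built on Python's `timeit`; the two workloads in the usage example are hypothetical stand-ins (a pure-Python loop vs. the built-in `sum`) so that the sketch runs without a GPU. In a real comparison, `fn_a` and `fn_b` might wrap the same groupby-aggregation in Pandas and in cuDF, whose DataFrame API largely mirrors Pandas. The warm-up call matters for GPU code in particular, since first calls typically include one-off initialization costs.

```python
import statistics
import timeit


def median_seconds(fn, repeat=5, number=1):
    """Median wall-clock seconds per call of fn, over `repeat` trials of `number` calls."""
    trials = timeit.repeat(fn, repeat=repeat, number=number)
    return statistics.median([t / number for t in trials])


def compare(name_a, fn_a, name_b, fn_b, repeat=5, warmup=True):
    """Time two implementations of one workload; speedup > 1 means fn_b is faster."""
    if warmup:
        fn_a()  # keep one-off costs (e.g., lazy initialization) out of the timings
        fn_b()
    t_a = median_seconds(fn_a, repeat)
    t_b = median_seconds(fn_b, repeat)
    return {name_a: t_a, name_b: t_b, "speedup": t_a / max(t_b, 1e-12)}


# Hypothetical stand-in workloads that must produce identical results.
data = list(range(200_000))


def loop_sum():
    total = 0
    for x in data:
        total += x
    return total


def builtin_sum():
    return sum(data)


assert loop_sum() == builtin_sum()  # compare like with like before timing anything
print(compare("loop", loop_sum, "builtin", builtin_sum))
```

Reporting the median rather than the mean keeps a single slow trial (e.g., a garbage-collection pause) from skewing the comparison.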

 * Slides
 * GitHub Repo

VIDEO



 * talk
 * data-science

 * 10 Apr, 2019


TALK/DEMO: SEQ2SEQ MODEL ON TIME-SERIES DATA: TRAINING AND SERVING WITH TENSORFLOW

SUMMARY

Seq2seq models are a class of Deep Learning models that have recently provided
state-of-the-art solutions to language problems. They also perform very well on
numerical time-series data, which is of particular interest in finance and IoT,
among other fields. In this hands-on demo/code walkthrough, we explain model
development and optimization with TensorFlow's low-level API. We then serve the
model with TensorFlow Serving and show how to write a client that communicates
with TF Serving over the network and uses/plots the received predictions.
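A client for the serving step can be small. The sketch below assumes TensorFlow Serving's REST API (`POST /v1/models/<name>:predict` with a JSON body of the form `{"instances": [...]}` and a `{"predictions": [...]}` response, on port 8501 as in the official Docker setup); the talk's client may use a different interface, such as gRPC. The model name `seq2seq` and the flat list-of-floats instance shape are hypothetical and must match the exported model's signature. `sliding_windows` shows one common way to cut a series into encoder-input/decoder-target pairs for a seq2seq model.

```python
import json
import urllib.request


def sliding_windows(series, input_len, output_len):
    """Cut a 1-D series into (encoder_input, decoder_target) pairs with stride 1."""
    pairs = []
    for start in range(len(series) - input_len - output_len + 1):
        split = start + input_len
        pairs.append((series[start:split], series[split:split + output_len]))
    return pairs


def build_predict_body(instances):
    """JSON body for TF Serving's REST predict API (row format)."""
    return json.dumps({"instances": instances}).encode("utf-8")


def parse_predictions(raw_body):
    """Extract the predictions list from a TF Serving REST response body."""
    return json.loads(raw_body)["predictions"]


def predict(host, model_name, instances, port=8501, timeout=10.0):
    """POST instances to a running TF Serving instance and return its predictions."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    request = urllib.request.Request(
        url,
        data=build_predict_body(instances),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_predictions(response.read())


# Usage (requires a running server; "seq2seq" is a hypothetical model name):
# series = [0.1, 0.4, 0.3, 0.8, 0.6, 0.9]
# encoder_inputs = [enc for enc, _ in sliding_windows(series, 4, 2)]
# forecasts = predict("localhost", "seq2seq", encoder_inputs)
```

Keeping request building and response parsing in separate functions lets them be tested without a live server; the returned forecasts can then be plotted against the held-out decoder targets.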

 * Slides
 * GitHub Repo

VIDEO



 * talk
 * data-science

 * 1 Jul, 2018


NEWS BRIEF: 2013-2018 SUMMARY

Most projects on this site cover the pre-PhD period. The projects done during
the PhD are documented under PhD Thesis as well as under Publications. Projects
after the PhD, i.e., from 2013 onward, are largely undocumented due to
confidentiality restrictions, as I have been working in industry. The projects
I have worked on in this period, including recent ones, are in Big Data and
Deep Learning. There are a few items that I worked on as side projects and
could publish:

 * Productionization of TensorSpark in yarn-cluster mode (tested in an HDP
   cluster): I contributed to the TensorSpark project, helping people run it in
   a YARN-based production environment. TensorSpark implements Downpour SGD, an
   asynchronous stochastic gradient descent (SGD) method proposed by Google.
   Asynchronous SGD is intuitively better suited to cloud-based Spark clusters:
   cluster workers are typically scattered across the data center, and you want
   to avoid a network bottleneck that affects a few workers from slowing down
   the whole model training. See the GitHub issue/PR for details.
 * A Class Activation Map (CAM) is a great tool for fine-tuning and better
   understanding Deep Learning models (ConvNets). I created a notebook to help
   with this. Tech setup: Jupyter notebook / Python / TensorFlow / VGG model /
   Caltech256 dataset

While the above items were developed in my own time, I subsequently used them in
projects at the companies I was working for at the time.
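For reference on the Class Activation Map item: a CAM for class c weights the K feature maps f_k of the last convolutional layer by that class's output-layer weights w_k^c and sums them, M_c(x, y) = sum over k of w_k^c * f_k(x, y). This assumes a network whose last conv layer feeds global average pooling followed by a dense output layer; a stock VGG model typically needs such a head added and fine-tuned first. The pure-Python sketch below computes and min-max normalizes such a map from already-extracted feature maps and weights (the inputs are hypothetical); it is not the notebook's code. In practice the map is then upsampled to the image size and overlaid as a heatmap.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W nested lists) using one class's K weights."""
    if len(feature_maps) != len(class_weights):
        raise ValueError("need exactly one weight per feature map")
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam


def min_max_normalize(cam):
    """Rescale the map to [0, 1] for display as a heatmap."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    if hi == lo:
        return [[0.0] * len(row) for row in cam]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


# Two hypothetical 2x2 feature maps and their weights for one class.
maps = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 2.0]]]
weights = [0.5, 1.0]
print(min_max_normalize(class_activation_map(maps, weights)))
# [[0.25, 0.0], [0.0, 1.0]]
```

The brightest cells mark the spatial locations that contributed most to the class score, which is what makes the map useful for checking whether a fine-tuned ConvNet looks at the right part of the image.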


 * news-brief

 * 12 Dec, 2009


PROJECT: A PRESENCE-BASED MESSAGING APPLICATION

Date Completed: December 2009

Here is the Report of this project.

As per the specification, a presence-based messaging and file-exchange
application has been designed and implemented. Here is the scenario:

 * Clients connect to the server and immediately declare their presence.
 * A connected client can initiate a session by sending the request to the
   server along with the preferred number of clients in the session.
 * The server application checks the number of available clients for the
   session.
 * The server application initiates a session between the clients if the
   preferred number of clients is available.
 * When the session is underway, participants can exchange messages and files.
 * Only the session initiator can terminate the session.
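The session rules above can be captured in a small in-memory model of the server's bookkeeping. This is a sketch, not the project's implementation: networking and file transfer are omitted, the class and method names are hypothetical, and it assumes that the preferred number of clients includes the initiator and that members become available again once a session ends.

```python
class PresenceServer:
    """In-memory bookkeeping for presence, sessions and message exchange (no networking)."""

    def __init__(self):
        self.available = []  # present clients not currently in a session, in arrival order
        self.sessions = {}   # session id -> {"initiator", "members", "log"}
        self._next_id = 1

    def connect(self, client):
        # A client declares its presence immediately after connecting.
        if client not in self.available:
            self.available.append(client)

    def request_session(self, initiator, preferred_size):
        """Start a session if enough clients are available; return its id, else None."""
        if preferred_size < 2:
            raise ValueError("a session needs at least two clients")
        if initiator not in self.available:
            raise ValueError(f"{initiator!r} is not an available client")
        others = [c for c in self.available if c != initiator]
        if len(others) + 1 < preferred_size:
            return None  # not enough available clients for the preferred size
        members = [initiator] + others[:preferred_size - 1]
        for member in members:
            self.available.remove(member)
        session_id = self._next_id
        self._next_id += 1
        self.sessions[session_id] = {"initiator": initiator, "members": members, "log": []}
        return session_id

    def send(self, session_id, sender, payload):
        # Messages and files are both modeled as opaque payloads.
        session = self.sessions[session_id]
        if sender not in session["members"]:
            raise PermissionError(f"{sender!r} is not in session {session_id}")
        session["log"].append((sender, payload))

    def terminate(self, session_id, requester):
        session = self.sessions[session_id]
        if requester != session["initiator"]:
            raise PermissionError("only the session initiator can terminate the session")
        del self.sessions[session_id]
        self.available.extend(session["members"])  # members become available again


server = PresenceServer()
for name in ("alice", "bob", "carol"):
    server.connect(name)
sid = server.request_session("alice", preferred_size=2)
server.send(sid, "bob", b"report.pdf bytes")
server.terminate(sid, "alice")
```

In this example carol stays available throughout, since a session of size 2 only needs alice and bob.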


 * project
 * networks

 * 27 Nov, 2009


PROJECT: SIMULATION OF SLOTTED ALOHA

Date Completed: November 2009

Here is the Report of this project.
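Since the report itself is not reproduced here, the following is an independent sketch of what a slotted ALOHA simulation measures, not the project's code. It assumes N always-backlogged nodes that each transmit in a slot with probability p; a slot carries a successful frame only when exactly one node transmits, so the expected throughput is S = N p (1 - p)^(N - 1), which approaches the classic S = G e^(-G) for offered load G = N p as N grows (maximum 1/e, about 0.368, at G = 1).

```python
import math
import random


def simulate_slotted_aloha(num_nodes, p, num_slots, seed=0):
    """Fraction of slots with exactly one transmission (i.e., a successful frame)."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(num_slots):
        transmitters = sum(1 for _ in range(num_nodes) if rng.random() < p)
        if transmitters == 1:
            successes += 1
    return successes / num_slots


def expected_throughput(num_nodes, p):
    """Success probability per slot: one node transmits, the other N - 1 stay silent."""
    return num_nodes * p * (1 - p) ** (num_nodes - 1)


def poisson_limit(load):
    """Classic slotted ALOHA throughput S = G * exp(-G) for offered load G."""
    return load * math.exp(-load)


n, p = 50, 0.02  # offered load G = n * p = 1.0
print(simulate_slotted_aloha(n, p, 20_000), expected_throughput(n, p), poisson_limit(n * p))
```

Sweeping p (and hence G) and plotting the simulated throughput against the two formulas reproduces the familiar curve that peaks at G = 1.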


 * project
 * networks

 * 30 Aug, 2009


TECH REPORT: PEER-TO-PEER TRAFFIC

Date Completed: August 2009

This Tech Report deals with Peer-to-Peer (P2P) protocols. We start by giving a
brief account of the history of P2P applications and then cite some P2P traffic
measurement studies. P2P traffic identification methods and recent P2P traffic
optimization schemes constitute the core of this report, in which we examine the
state of the art in this field.


 * tech-report
 * networks

 * 30 Jul, 2009


TECH REPORT: BITTORRENT

Date Completed: July 2009

The BitTorrent protocol has emerged as the most popular P2P protocol in recent
years. The core BitTorrent protocol was designed and implemented by Bram Cohen
in 2001.

The protocol is especially useful for distributing large, popular files (like
open-source operating system distributions), because its performance improves
as the number of interested, connected peers increases. The way in which
BitTorrent operates lessens the burden (hardware costs and bandwidth resources)
on servers hosting the files and distributes that burden among all currently
connected peers, significantly reducing costs for original content distributors
as a result. Connected peers share the task of serving the content to newly
connected peers, and a “tit-for-tat” mechanism encourages fairness among all the
peers. This method of content sharing also improves redundancy in the overlay
network (formed around that specific content), since a failure of the original
content provider does not necessarily render the content unavailable. In this
Tech Report, we explain the functionality of the BitTorrent protocol and its
various system components.
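The tit-for-tat mechanism is implemented by BitTorrent's choking algorithm: a peer uploads to (unchokes) mainly the interested peers that give it the best download rates, plus one "optimistic unchoke" chosen regardless of rate, which lets newcomers bootstrap and lets the peer discover better partners. The sketch below is a simplified version of that rule under stated assumptions: the slot counts are parameters (three regular slots plus one optimistic slot is a common description), peer names are hypothetical, and the periodic re-evaluation and rotation timers of real clients are omitted.

```python
import random


def choose_unchoked(download_rates, interested, regular_slots=3, rng=None):
    """Pick peers to upload to: best reciprocators first, then one optimistic unchoke.

    download_rates: dict peer -> observed download rate from that peer (bytes/s)
    interested:     set of peers that currently want data from us
    """
    rng = rng or random.Random()
    # Tit-for-tat: rank interested peers by how fast they upload to us
    # (ties broken by peer id so the ranking is deterministic).
    ranked = sorted(interested, key=lambda peer: (-download_rates.get(peer, 0.0), peer))
    unchoked = ranked[:regular_slots]
    # Optimistic unchoke: one remaining interested peer, chosen at random.
    others = ranked[regular_slots:]
    if others:
        unchoked.append(rng.choice(others))
    return unchoked


rates = {"a": 100.0, "b": 50.0, "c": 200.0, "d": 10.0, "e": 0.0, "f": 900.0}
interested = {"a", "b", "c", "d", "e"}  # "f" uploads fast but wants nothing from us
print(choose_unchoked(rates, interested, rng=random.Random(42)))
```

Peers that upload little to us still get an occasional chance through the optimistic slot, which is what keeps the scheme from locking newcomers out.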


 * tech-report
 * networks
