
Panning for Gold

Michael Kleinhaus


Technologist

When we think about the use cases we are trying to solve with data, the data ends up going through common pillars of data management.

For example, business intelligence and machine learning:

  • BI: find the data, blend it from different systems, model it to see how it fits, curate and clean it, and build beautiful visualizations and charts
  • ML: find the data, blend it, model it, clean it, choose algorithms, deploy to production, annotate and label data, and expose models to third-party systems

It's easy to see that there is a common journey, which can be summed up as core data management.

No matter the use case, the data has to be integrated, governed, cleaned, and so on.
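To make the "common journey" concrete, here is a minimal Python sketch in which one shared pipeline (integrate, clean, deduplicate) prepares data once, and two different consumers, a BI-style view and an ML-style feature view, reuse the result. All names here (`Record`, `core_data_management`, `bi_view`, `ml_view`) are hypothetical and illustrate the idea only; they are not CluedIn's API.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical record type for illustration; not a CluedIn API."""
    source: str
    data: dict


def integrate(*sources):
    """Blend records from several source systems into one list."""
    blended = []
    for records in sources:
        blended.extend(records)
    return blended


def clean(records):
    """Trim whitespace from string values and lower-case email addresses."""
    cleaned = []
    for r in records:
        data = {k: v.strip() if isinstance(v, str) else v for k, v in r.data.items()}
        if isinstance(data.get("email"), str):
            data["email"] = data["email"].lower()
        cleaned.append(Record(r.source, data))
    return cleaned


def deduplicate(records, key="email"):
    """Keep the first record seen for each key value; records without the key are kept."""
    seen, unique = set(), []
    for r in records:
        value = r.data.get(key)
        if value is not None:
            if value in seen:
                continue
            seen.add(value)
        unique.append(r)
    return unique


def core_data_management(*sources):
    """The shared journey: integrate, then clean, then deduplicate."""
    return deduplicate(clean(integrate(*sources)))


# Two different consumers reuse the same prepared data.
def bi_view(records):
    return sorted(r.data.get("name", "") for r in records)


def ml_view(records):
    # Toy feature vector: name length and whether an email is present.
    return [[len(r.data.get("name", "")), int("email" in r.data)] for r in records]


crm = [Record("crm", {"name": " Ada ", "email": "ADA@example.com"})]
erp = [Record("erp", {"name": "Ada", "email": "ada@example.com "})]
core = core_data_management(crm, erp)
print(bi_view(core))  # ['Ada']
print(ml_view(core))  # [[3, 1]]
```

The point of the design is that the BI and ML steps only start after the shared preparation, so cleaning and deduplication are written once rather than per use case.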

Data stewarding tools are aimed at something closer to a citizen developer.

CluedIn is like a piece of middleware.

It can show potential data privacy issues at the organization level.
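As a rough illustration of what "privacy issues at the organization level" can mean, the sketch below scans values collected from every connected system and counts email-like values per source system and field. The regex is deliberately crude and `find_privacy_issues` is a hypothetical helper; real PII detection covers many more data types and is not how CluedIn is claimed to work here.

```python
import re
from collections import Counter

# Deliberately rough email pattern for illustration; real PII detection needs far more.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def find_privacy_issues(values):
    """Count email-like values per (source system, field).

    `values` is an iterable of (source, field, value) tuples gathered from
    every connected system, so the result is an organization-wide view.
    """
    findings = Counter()
    for source, field, value in values:
        if isinstance(value, str) and EMAIL_PATTERN.search(value):
            findings[(source, field)] += 1
    return findings


values = [
    ("crm", "contact", "ada@example.com"),
    ("crm", "notes", "call ada@example.com tomorrow"),
    ("erp", "sku", "A-100"),
    ("crm", "contact", "bob@example.org"),
]
findings = find_privacy_issues(values)
print(findings.most_common())
print(sorted({source for source, _ in findings}))  # ['crm']
```

Note that the free-text `notes` field is flagged too: personal data often hides in places no schema declares, which is why a cross-system scan is useful.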

The business has the best context for its data, so it makes sense to hand cleaning tasks to business users rather than developers.
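One way to picture handing cleaning work to the business is a simple routing table from business domain to the team that stewards it, with developers only as a fallback. The ownership map, team names, and `assign_tasks` function below are hypothetical, a sketch of the idea rather than a description of any product feature.

```python
from dataclasses import dataclass


@dataclass
class CleaningTask:
    domain: str       # business area that owns the data
    description: str


# Hypothetical ownership map: which business team stewards which domain.
STEWARDS = {"sales": "sales-team", "finance": "finance-team"}


def assign_tasks(tasks, stewards, fallback="data-engineering"):
    """Group tasks by owning business team; unowned domains fall back to developers."""
    queues = {}
    for task in tasks:
        owner = stewards.get(task.domain, fallback)
        queues.setdefault(owner, []).append(task.description)
    return queues


tasks = [
    CleaningTask("sales", "Merge duplicate customer accounts"),
    CleaningTask("finance", "Fix invalid currency codes"),
    CleaningTask("hr", "Normalize job titles"),
]
print(assign_tasks(tasks, STEWARDS))
```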

CluedIn uses a polyglot persistence layer: it stores data in many different types of databases.
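Polyglot persistence means writing the same data into several stores, each suited to a different kind of query. The Python sketch below uses in-memory stand-ins; mapping them to the graph, search, relational, and cache technologies in CluedIn's stack is an assumption for illustration, and none of these classes reflect those products' real APIs.

```python
from collections import defaultdict


class GraphStore:
    """Stand-in for a graph database: answers 'what is connected to what'."""
    def __init__(self):
        self.edges = defaultdict(set)

    def save(self, entity):
        for other in entity.get("related", []):
            self.edges[entity["id"]].add(other)
            self.edges[other].add(entity["id"])

    def neighbours(self, entity_id):
        return sorted(self.edges.get(entity_id, set()))


class SearchIndex:
    """Stand-in for a search engine: an inverted index from words to ids."""
    def __init__(self):
        self.index = defaultdict(set)

    def save(self, entity):
        for word in entity["name"].lower().split():
            self.index[word].add(entity["id"])

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))


class RelationalStore:
    """Stand-in for a relational table keyed by id."""
    def __init__(self):
        self.rows = {}

    def save(self, entity):
        self.rows[entity["id"]] = (entity["id"], entity["name"])

    def get(self, entity_id):
        return self.rows.get(entity_id)


class Cache:
    """Stand-in for a key-value cache holding whole entities."""
    def __init__(self):
        self.items = {}

    def save(self, entity):
        self.items[entity["id"]] = entity

    def get(self, entity_id):
        return self.items.get(entity_id)


class PolyglotPersistence:
    """Write each entity to every store; read from whichever suits the query."""
    def __init__(self):
        self.graph = GraphStore()
        self.search = SearchIndex()
        self.sql = RelationalStore()
        self.cache = Cache()

    def save(self, entity):
        for store in (self.graph, self.search, self.sql, self.cache):
            store.save(entity)


db = PolyglotPersistence()
db.save({"id": "c1", "name": "Acme Corp", "related": ["p1"]})
db.save({"id": "p1", "name": "Ada Lovelace"})
print(db.graph.neighbours("c1"))  # ['p1']
print(db.search.search("Acme"))   # ['c1']
print(db.sql.get("p1"))           # ('p1', 'Ada Lovelace')
```

The trade-off is extra writes and the need to keep the stores consistent, in exchange for each query type hitting a store built for it.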

What CluedIn does not do

  • No data warehousing: send the data on to a data warehouse as known, structured data
  • Not a BI tool
  • Not a machine learning platform
  • No data virtualization
  • Not millisecond-level (real-time) data processing: CluedIn is more about getting data ready to use

All of this is called a Data Fabric: stitching together the core data management layers.

Compute and storage are separated: some parts of CluedIn are stateless and others are stateful.
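Separating compute from storage usually means the processing components keep no data between requests, while a distinct stateful component owns the data. The sketch below shows that split with hypothetical `Worker` and `StateStore` classes: because workers hold no state, messages can go to any replica and the results are the same.

```python
class StateStore:
    """Stateful component: owns the data (a stand-in for a database)."""
    def __init__(self):
        self._counts = {}

    def increment(self, key):
        self._counts[key] = self._counts.get(key, 0) + 1

    def get(self, key):
        return self._counts.get(key, 0)


class Worker:
    """Stateless component: holds no data between messages, so replicas are interchangeable."""
    def __init__(self, name):
        self.name = name  # identity only; no processing state

    def handle(self, message, store):
        source = message["source"].strip().lower()
        store.increment(source)  # all state lives in the external store
        return source


store = StateStore()
workers = [Worker("replica-a"), Worker("replica-b")]
messages = [{"source": "CRM"}, {"source": "erp "}, {"source": "crm"}]
for i, message in enumerate(messages):
    workers[i % len(workers)].handle(message, store)  # round-robin across replicas
print(store.get("crm"), store.get("erp"))  # 2 1
```

Scaling compute then means adding more workers, without moving or copying any data.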

Technology Stack

  • .NET Core / C#
  • Docker / Kubernetes
  • Neo4j / JanusGraph / OngDB
  • Elasticsearch
  • SQL Server / Oracle / MySQL
  • Redis
  • RabbitMQ (enterprise service bus)
  • React
  • Web API
  • GraphQL
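RabbitMQ's role as a service bus is to decouple producers from consumers. The toy sketch below is loosely modeled on the idea of a fanout exchange, where every bound queue receives its own copy of a message. `InMemoryBus` and the queue names are hypothetical and this is not the RabbitMQ client API; it only shows why one published event can feed several independent consumers.

```python
import queue
from collections import defaultdict


class InMemoryBus:
    """Toy stand-in for a message broker; this is NOT the RabbitMQ client API."""
    def __init__(self):
        self._bindings = defaultdict(list)  # exchange name -> bound queue names
        self._queues = {}

    def bind(self, exchange, queue_name):
        self._queues.setdefault(queue_name, queue.Queue())
        self._bindings[exchange].append(queue_name)

    def publish(self, exchange, message):
        # Fanout: every queue bound to the exchange gets its own copy.
        for name in self._bindings[exchange]:
            self._queues[name].put(message)

    def drain(self, queue_name):
        """Return and remove every message currently waiting in a queue."""
        q = self._queues[queue_name]
        messages = []
        while True:
            try:
                messages.append(q.get_nowait())
            except queue.Empty:
                return messages


bus = InMemoryBus()
bus.bind("entities", "search-indexer")
bus.bind("entities", "graph-writer")
bus.publish("entities", {"id": "c1"})
print(bus.drain("search-indexer"))  # [{'id': 'c1'}]
print(bus.drain("graph-writer"))    # [{'id': 'c1'}]
```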