Panning for Gold
Michael Kleinhaus
Technologist
When we think about the use cases we are trying to solve with data, that data ends up going through common pillars of data management.
For example, business intelligence (BI) and machine learning (ML):
- BI - find the data, blend it from different systems, model it to see how it fits, curate and clean the data, build beautiful visualizations and charts
- ML - find the data, blend it, model it, clean it, choose algorithms, deploy to prod, annotate and label data, expose models to third-party systems
It's easy to see that there is a common journey, which can be summed up as core data management: no matter the use case, you integrate, govern, clean, etc.
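The shared journey can be illustrated with a minimal Python sketch. This is not CluedIn code: the stage functions (`integrate`, `clean`, `govern`), the record fields, and the email-based privacy flag are all hypothetical, chosen only to show BI and ML reusing the same core steps before they diverge.

```python
def integrate(records):
    """Blend records from different systems by merging on a shared 'id'."""
    merged = {}
    for record in records:
        merged.setdefault(record["id"], {}).update(record)
    return list(merged.values())


def clean(records):
    """Trim whitespace from string values and drop empty fields."""
    cleaned = []
    for record in records:
        tidy = {}
        for key, value in record.items():
            if isinstance(value, str):
                value = value.strip()
            if value is None or value == "":
                continue
            tidy[key] = value
        cleaned.append(tidy)
    return cleaned


def govern(records):
    """Flag records that hold an 'email' field (a made-up privacy rule)."""
    return [{**record, "contains_pii": "email" in record} for record in records]


# The core data management steps shared by every use case.
CORE_STAGES = [integrate, clean, govern]


def run_core(records):
    for stage in CORE_STAGES:
        records = stage(records)
    return records


def bi_pipeline(records):
    """BI-specific tail: summarise the prepared data for a chart."""
    prepared = run_core(records)
    return {"row_count": len(prepared)}


def ml_pipeline(records):
    """ML-specific tail: turn the prepared data into a toy feature matrix."""
    prepared = run_core(records)
    return [[len(str(record.get("name", "")))] for record in prepared]
```

Both pipelines call `run_core` first, so only the last step differs between the BI and ML use cases.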
Data stewarding tools - aimed at something closer to a citizen developer
CluedIn is like a piece of middleware.
It can show potential data privacy issues at the organization level.
Business users have the best context for their data, so it makes sense to hand cleaning tasks to them rather than to developers.
CluedIn uses a polyglot persistence layer: it stores data in many different types of databases.
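As a rough illustration of polyglot persistence in general, the Python sketch below writes one record into several stores, each suited to a different kind of query. The `PolyglotStore` class, its fields, and the `links` convention are assumptions made for this example; the in-memory structures only stand in for real graph, search, and relational databases and do not reflect CluedIn's actual API.

```python
class PolyglotStore:
    """Toy persistence layer: each structure stands in for a different database type."""

    def __init__(self):
        self.documents = {}     # relational / key-value stand-in: id -> full record
        self.edges = set()      # graph stand-in: (source_id, relation, target_id)
        self.search_index = {}  # search stand-in: token -> set of record ids

    def save(self, record):
        record_id = record["id"]
        # Keep the full record for lookups by id.
        self.documents[record_id] = dict(record)
        # Store relationships separately so they can be traversed.
        for relation, target in record.get("links", []):
            self.edges.add((record_id, relation, target))
        # Index every word of every string value for free-text search.
        for value in record.values():
            if isinstance(value, str):
                for token in value.lower().split():
                    self.search_index.setdefault(token, set()).add(record_id)

    def search(self, term):
        return sorted(self.search_index.get(term.lower(), set()))

    def neighbours(self, record_id):
        return sorted(target for source, _, target in self.edges if source == record_id)


store = PolyglotStore()
store.save({"id": "p1", "name": "Ada Lovelace", "links": [("works_at", "c1")]})
store.save({"id": "c1", "name": "Analytical Engines"})
```

The same `save` call feeds all three stores, so a text search, a relationship lookup, and a read by id can each hit the structure best suited to it.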
What does CluedIn not do
- no data warehousing - send it to a data warehouse with known structured data
- not a BI tool
- not a machine learning platform
- no virtualization
- does not supply millisecond-level data processing - it is more about getting data ready to use
All this is called a Data Fabric - stitching together the core data management layers
Separate compute / storage - some parts of CluedIn are stateless and others are stateful
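The stateless / stateful split can be sketched in a few lines of Python. The names here (`normalise_email`, `RecordStore`) are hypothetical and are not taken from CluedIn; the point is only that a stateless step can be replicated freely, while a stateful component owns data that must persist between calls.

```python
def normalise_email(record):
    """Stateless step: the output depends only on the input record."""
    return {**record, "email": record["email"].strip().lower()}


class RecordStore:
    """Stateful component: it owns data that must survive between calls."""

    def __init__(self):
        self._by_email = {}

    def upsert(self, record):
        # Records with the same email overwrite each other.
        self._by_email[record["email"]] = record

    def count(self):
        return len(self._by_email)


store = RecordStore()
for raw in [{"email": " Ada@Example.com "}, {"email": "ada@example.com"}]:
    store.upsert(normalise_email(raw))
```

Because `normalise_email` keeps nothing between calls, many copies of it could run side by side, whereas `RecordStore` is the part that would need durable storage.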
Technology Stack
- .NET Core / C#
- Docker / Kubernetes
- Neo4j / Janus / OngDB
- ElasticSearch
- SQL Server / Oracle / MySQL
- Redis
- RabbitMQ - enterprise service bus
- React
- Web API
- GraphQL