Data Engineering Hub
Welcome! This is a place for data engineers to learn and to share knowledge and ideas through ⭐ our GitHub repository and 🗣️ the shared wiki.
Let’s get started~
There are many types of databases, including relational, key-value, document, wide-column, graph, and analytical databases.
Data ingestion, or data integration, is the first step in a data platform. Both open-source tools and cloud service providers offer data integration services for different situations.
The look and feel can be easily customized with CSS custom properties (variables), and features can be adjusted through Hugo parameters.
Data computing covers the ETL operations involved in building a data warehouse. The focus here is on open-source Spark and Flink and on common cloud data computing services.
Data analysis is closely tied to the business value of the data platform. The focus here is on that relationship and on how business value can drive the construction of the entire data platform.
Data ultimately needs to be presented to users in a suitable format, such as line charts, pie charts, or funnel charts. This section covers several common open-source data visualization tools and commercial suites.
Data governance is a broad field that spans many separate concerns, such as data quality management, metadata management, data compliance management, data permission control, and data lineage. This chapter introduces each of them.
Data science is a much broader topic. This section covers only common machine learning algorithms and their application scenarios, along with related areas such as natural language processing and computer vision.
This chapter introduces how to build data platforms in the cloud with vendors such as AWS and Alibaba Cloud, along with the related services involved.