This article walks through the four modules of the data life cycle and gives a brief introduction to data collection, processing, storage and analysis. I hope it helps.

In the previous four steps we used OSM and the first key indicator method to determine the core indicators. Now we turn to the full life cycle of data, which consists of these major modules:

Data Collection
Data Preprocessing - ETL
Data Storage - Data Warehouse
Data Analysis - OLAP/Business Models

1. Data collection

According to the data source, data can be divided into the following types:

Event-tracking behavior data: behavioral data collected through tracking points embedded in the product, such as page views, clicks and dwell time.
Business data: data generated by the business itself; its core is the business form data stored in the production system.
Log data: generally the data recorded in web-side logs.
Externally accessed data: data obtained from third parties.

According to the data type, data can be divided into structured data, semi-structured data and unstructured data.

(1) Structured data

Structured data is generally obtained from internal databases and from the open database interfaces of external parties. It usually holds product operation data and the results of user actions, such as the number of registered users, the number of orders placed and the number of completed orders. Its format is well defined and is typified by data in a relational database: it can be stored in a two-dimensional table with a fixed number of fields, each field has a fixed data type (number, character, date, etc.), and the length of each field is relatively fixed. This kind of data is easy to maintain and manage, and it is also the most convenient format for querying, display and analysis.
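To make the idea of a fixed-field, two-dimensional table concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name orders, its columns and the sample rows are hypothetical and chosen only for illustration; they do not come from any real system described in this article.

```python
import sqlite3


def build_orders_table(conn):
    """Create a hypothetical orders table with a fixed set of typed fields."""
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"  # number
        " user_id INTEGER NOT NULL,"      # number
        " status TEXT NOT NULL,"          # characters
        " created_on TEXT NOT NULL)"      # date stored as ISO-8601 text
    )
    # Made-up sample rows: every record has the same four fields.
    rows = [
        (1, 101, "placed", "2023-01-05"),
        (2, 102, "completed", "2023-01-05"),
        (3, 101, "completed", "2023-01-06"),
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)


def order_counts(conn):
    """Return (orders placed, orders completed) from the orders table."""
    placed = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    completed = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE status = 'completed'"
    ).fetchone()[0]
    return placed, completed


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
build_orders_table(conn)
placed, completed = order_counts(conn)
print(placed, completed)
```

Because every row shares the same fields and types, indicators such as orders placed and orders completed can be read with a plain query and no parsing step.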
(2) Semi-structured data

Semi-structured data usually refers to data output as logs, XML and similar formats, such as an application's click logs and some user behavior data. The format is relatively standardized and is usually plain text, but it has to be parsed before it can be queried or analyzed. Each record follows a predefined specification, yet records may carry different information: a different number of fields, different field names and field types, or nested structures.

(3) Unstructured data

Unstructured data refers to non-plain-text data that has no standard format and whose values cannot be parsed directly. Common examples include rich text, pictures, sound and video. Unless the goal is advanced text mining or multimedia data mining, this type of data offers little value for day-to-day statistics and analysis. Generally,