Metrics Advisor for Equipment requires input data with three columns:

- **Timestamp**, which indicates the date and time. The timestamp column should contain date-time values in the yyyy-MM-ddTHH:mm:ss format and conform to ISO 8601. All timestamps passed to the service must be formatted as UTC timestamps.
- **Variable name**. The variable name column should contain the name of the variable for each data point.
- **Variable value**. The value column should contain numerical metrics such as revenue, number of users, latency, and error rate.

Timestamp (Date-Time) | Variable name (String) | Variable value (Numeric) |
---|---|---|
2020-01-01T00:05:00 | Sensor 1 | 2.0456 |
2020-01-01T00:10:00 | Sensor 2 | 6.4948 |
2020-01-01T00:15:00 | Sensor 3 | 4.2938 |
2020-01-01T00:20:00 | Sensor 2 | 4.3894 |
2020-01-01T00:25:00 | Sensor 1 | 5.4098 |
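
Exports from historians and spreadsheets often use a locale-style timestamp such as `1/1/2020 0:05` rather than the required yyyy-MM-ddTHH:mm:ss form. The following is a minimal sketch of normalizing such rows into the three-column layout. It assumes the raw timestamps are already in UTC and use a month/day/year order; the function names `to_iso_utc` and `to_long_rows` are hypothetical helpers, not part of any service SDK.

```python
from datetime import datetime, timezone


def to_iso_utc(raw: str, fmt: str = "%m/%d/%Y %H:%M") -> str:
    """Parse `raw` using `fmt` and return it as yyyy-MM-ddTHH:mm:ss.

    Assumption: the raw value is already expressed in UTC, so the
    timezone is attached rather than converted.
    """
    parsed = datetime.strptime(raw, fmt).replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%S")


def to_long_rows(rows):
    """Return (timestamp, variable name, numeric value) tuples."""
    return [(to_iso_utc(ts), name, float(value)) for ts, name, value in rows]


raw_rows = [
    ("1/1/2020 0:05", "Sensor 1", "2.0456"),
    ("1/1/2020 0:10", "Sensor 2", "6.4948"),
    ("1/1/2020 0:15", "Sensor 3", "4.2938"),
]

for row in to_long_rows(raw_rows):
    print(row)
```

If the source data is recorded in local time, convert it to UTC before formatting instead of attaching the timezone as shown here.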
The easiest way to prevent problems with column headers is to take the following precautions:

- Character length limit: 200 characters.
- Valid characters: 0-9, a-z, A-Z, and _ (underscore).
- Make sure that you don't have any duplicated column headers.
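
The header precautions above can be checked before upload. This is a small sketch using only the Python standard library; `check_headers` is a hypothetical helper name, and the three rules are taken directly from the list above.

```python
import re

MAX_HEADER_LENGTH = 200
# Only 0-9, a-z, A-Z, and underscore are allowed.
VALID_HEADER = re.compile(r"[0-9A-Za-z_]+")


def check_headers(headers):
    """Return a list of human-readable problems found in `headers`."""
    problems = []
    seen = set()
    for header in headers:
        if len(header) > MAX_HEADER_LENGTH:
            problems.append(f"too long ({len(header)} characters): {header[:20]}...")
        if not VALID_HEADER.fullmatch(header):
            problems.append(f"invalid characters: {header!r}")
        if header in seen:
            problems.append(f"duplicated header: {header!r}")
        seen.add(header)
    return problems


for problem in check_headers(["timestamp", "sensor_1", "sensor 2", "sensor_1"]):
    print(problem)
```

An empty result means all headers pass the three checks.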
:white_check_mark: Your dataset should contain time-series data that's generated from an industrial asset such as a pump, compressor, motor, and so on. Each asset should generate data from one or more sensors. The data that Metrics Advisor for Equipment uses for training should represent the asset's condition and operation.
:white_check_mark: We recommend that you remove unnecessary sensor data. With data from too few sensors, you might miss critical information. With data from too many sensors, your model might overfit the data and miss out on critical patterns.
**Missing data:** In general, the missing value ratio of training data should be under 20%. Metrics Advisor for Equipment automatically fills in missing data (known as imputing) by forward filling previous sensor readings. If too much of the original data is missing, the automatically filled values (usually linear or constant values) may be learned as normal patterns, which can cause real (not missing) data points to be detected as anomalies.
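
To make the 20% guideline and forward filling concrete, here is a minimal sketch for a single sensor's readings, where `None` marks a missing value. It illustrates the general forward-fill technique only; the service's own imputation may differ in detail, and `missing_ratio` and `forward_fill` are hypothetical helper names.

```python
MAX_MISSING_RATIO = 0.2  # guideline from the text: keep missing values under 20%


def missing_ratio(values):
    """Fraction of entries that are missing (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def forward_fill(values):
    """Replace each missing entry with the most recent earlier reading.

    Leading gaps stay None because there is no earlier reading to copy.
    """
    filled = []
    last = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


readings = [2.0, None, None, 4.5, None]
ratio = missing_ratio(readings)
print(f"missing ratio: {ratio:.0%}, within guideline: {ratio < MAX_MISSING_RATIO}")
print(forward_fill(readings))
```

In this example three of five readings are missing, so most of the filled series is copied data, which is exactly the situation the guideline warns against.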
**Data granularity:** Ensure that each variable has at most one data point within each interval. For example, suppose that some sensors record data every minute and others record every 5 minutes. In this case, set the data granularity to an interval of 1 minute. If your data granularity unit is minutes, make sure that the granularity is a multiple or a factor of 60.
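
Both granularity rules can be checked offline. The sketch below assumes timestamps in the yyyy-MM-ddTHH:mm:ss format and a granularity expressed as a positive whole number of minutes; `granularity_is_valid` and `crowded_intervals` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def granularity_is_valid(minutes: int) -> bool:
    """True if `minutes` is a factor of 60 or a multiple of 60."""
    return 60 % minutes == 0 or minutes % 60 == 0


def crowded_intervals(rows, granularity_minutes: int):
    """Map (variable, interval start) to its count when the count exceeds 1."""
    counts = Counter()
    for timestamp, variable in rows:
        parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S")
        minutes = int((parsed - EPOCH).total_seconds() // 60)
        bucket = minutes // granularity_minutes
        start = EPOCH + timedelta(minutes=bucket * granularity_minutes)
        counts[(variable, start.strftime("%Y-%m-%dT%H:%M:%S"))] += 1
    return {key: n for key, n in counts.items() if n > 1}


rows = [
    ("2020-01-01T00:05:00", "Sensor 1"),
    ("2020-01-01T00:05:30", "Sensor 1"),
    ("2020-01-01T00:06:00", "Sensor 1"),
    ("2020-01-01T00:05:00", "Sensor 2"),
]

print(granularity_is_valid(5), granularity_is_valid(7))
print(crowded_intervals(rows, granularity_minutes=1))
```

Any entry returned by `crowded_intervals` points to a variable with more than one data point in the same interval, which means the data should be aggregated or the granularity reconsidered.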
**Note:** Finding the right granularity is important. With very fine granularity (short intervals), you have more data, but it is noisier. With coarser granularity (longer intervals), anomalies become harder to detect.
Once you have your data, you must establish how Metrics Advisor for Equipment will ingest it. A Metrics Advisor for Equipment setup can look very different depending on how fast you need results and how much data goes into detecting an anomaly. To establish data ingestion, we'll consider two states of the data: batch and streaming. The data's state affects the speed and accuracy of the anomaly detection algorithm(s).