Anomaly Detection
Anomaly
Anomaly-related algorithms.
- class nupic.algorithms.anomaly.Anomaly(slidingWindowSize=None, mode='pure', binaryAnomalyThreshold=None)
Utility class for generating anomaly scores in different ways.
Parameters: - slidingWindowSize – [optional] how many elements are summed up; enables a moving average on the final anomaly score; int >= 0
- mode – (string) [optional] how to compute the anomaly score; one of 'pure', 'likelihood', or 'weighted' (see the MODE_* constants below)
- binaryAnomalyThreshold – [optional] if set to a value in [0,1], the anomaly score is discretized to 1/0 (1 if >= binaryAnomalyThreshold). The transformation is applied after the moving average is computed.
- MODE_LIKELIHOOD = 'likelihood'
Uses the AnomalyLikelihood class, which models the probability of receiving this value and anomalyScore
- MODE_PURE = 'pure'
Default mode. The raw anomaly score as computed by computeRawAnomalyScore()
- MODE_WEIGHTED = 'weighted'
Multiplies the likelihood result with the raw anomaly score that was used to generate the likelihood (anomaly * likelihood)
- compute(activeColumns, predictedColumns, inputValue=None, timestamp=None)
Compute the anomaly score as the fraction of active columns not predicted.
Parameters: - activeColumns – array of active column indices
- predictedColumns – array of column indices predicted in this step (used for anomaly in step T+1)
- inputValue – (optional) value of current input to encoders (e.g. “cat” for category encoder) (used in anomaly-likelihood)
- timestamp – (optional) date timestamp when the sample occurred (used in anomaly-likelihood)
Returns: the computed anomaly score; float 0..1
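The slidingWindowSize and binaryAnomalyThreshold parameters describe a post-processing step: average the most recent scores, then optionally discretize the result. The following is a minimal sketch of that order of operations, not the library's implementation. The class name ScorePostProcessor and its method names are hypothetical, and treating a window of None or 0 as "no averaging" is an assumption.

```python
from collections import deque


class ScorePostProcessor(object):
    """Hypothetical helper: moving average, then optional binarization."""

    def __init__(self, sliding_window_size=None, binary_threshold=None):
        # Assumption: a window of None or 0 disables averaging.
        if sliding_window_size:
            self._window = deque(maxlen=sliding_window_size)
        else:
            self._window = None
        self._threshold = binary_threshold

    def process(self, raw_score):
        score = raw_score
        if self._window is not None:
            # deque(maxlen=n) drops the oldest score automatically.
            self._window.append(raw_score)
            score = sum(self._window) / float(len(self._window))
        if self._threshold is not None:
            # The threshold is applied after the moving average.
            score = 1.0 if score >= self._threshold else 0.0
        return score


post = ScorePostProcessor(sliding_window_size=3, binary_threshold=0.5)
smoothed = [post.process(s) for s in [0.0, 0.0, 1.0, 1.0, 1.0]]
```

With a window of 3, a single raw score of 1.0 averages to about 0.33 and stays below the 0.5 threshold; the output only switches to 1 once two of the last three scores are anomalous.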
- nupic.algorithms.anomaly.computeRawAnomalyScore(activeColumns, prevPredictedColumns)
Computes the raw anomaly score.
The raw anomaly score is the fraction of active columns not predicted.
Parameters: - activeColumns – array of active column indices
- prevPredictedColumns – array of column indices predicted in the previous step
Returns: anomaly score 0..1 (float)
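As an illustration of "the fraction of active columns not predicted", here is a small standalone sketch. It does not call the library; the function name raw_anomaly_score is hypothetical, and returning 0.0 when no columns are active is an assumption made here to avoid dividing by zero.

```python
def raw_anomaly_score(active_columns, prev_predicted_columns):
    """Fraction of active columns that were not predicted (0..1)."""
    # Column indices are assumed to be unique, so sets are safe here.
    active = set(active_columns)
    if not active:
        # Assumption: with no active columns there is nothing to flag.
        return 0.0
    unpredicted = active - set(prev_predicted_columns)
    return len(unpredicted) / float(len(active))


# Columns 5 and 12 were predicted; columns 1 and 9 were not.
score = raw_anomaly_score([1, 5, 9, 12], [5, 12, 20])
```

A score of 0 means every active column was predicted in the previous step; a score of 1 means none were.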
AnomalyLikelihood
This module analyzes and estimates the distribution of averaged anomaly scores
from a given model. Given a new anomaly score s, estimates P(score >= s).
The number P(score >= s) represents the likelihood of the current state of
predictability. For example, a likelihood of 0.01 or 1% means we see this much
predictability about one out of every 100 records. The number is not as unusual
as it seems. For records that arrive every minute, this means once every hour
and 40 minutes. A likelihood of 0.0001 or 0.01% means we see it once out of
10,000 records, or about once every 7 days.
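The arithmetic behind these figures can be checked directly. The sketch below assumes one record per minute, as in the example above; the helper names are hypothetical.

```python
def records_between_events(likelihood):
    """Average number of records between events of this likelihood."""
    return 1.0 / likelihood


def minutes_to_days(minutes):
    return minutes / (60.0 * 24.0)


# Likelihood 0.01: one event per 100 records, i.e. 100 minutes.
records_1pct = records_between_events(0.01)
hours, minutes = divmod(int(round(records_1pct)), 60)

# Likelihood 0.0001: one event per 10,000 records, just under 7 days.
records_001pct = records_between_events(0.0001)
days = minutes_to_days(records_001pct)
```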
USAGE
There are two ways to use the code: using the anomaly_likelihood.AnomalyLikelihood helper class or using the raw individual functions estimateAnomalyLikelihoods() and updateAnomalyLikelihoods().
Low-Level Function Usage
There are two primary interface routines:
- estimateAnomalyLikelihoods(): batch routine, called initially and once in a while
- updateAnomalyLikelihoods(): online routine, called for every new data point
Initially:
likelihoods, avgRecordList, estimatorParams = \
    estimateAnomalyLikelihoods(metric_data)
Whenever you get new data:
likelihoods, avgRecordList, estimatorParams = \
    updateAnomalyLikelihoods(data2, estimatorParams)
And again (make sure you use the new estimatorParams returned in the above call to updateAnomalyLikelihoods!):
likelihoods, avgRecordList, estimatorParams = \
    updateAnomalyLikelihoods(data3, estimatorParams)
Every once in a while, update the estimator with a lot of recent data:
likelihoods, avgRecordList, estimatorParams = \
    estimateAnomalyLikelihoods(lots_of_metric_data)
PARAMS
The parameters dict returned by the above functions has the following structure. Note: the client does not need to know the details of this.
{
  "distribution":               # describes the distribution
    {
      "name": STRING,           # name of the distribution, such as 'normal'
      "mean": SCALAR,           # mean of the distribution
      "variance": SCALAR,       # variance of the distribution
      # There may also be some keys that are specific to the distribution
    },
  "historicalLikelihoods": [],  # Contains the last windowSize likelihood
                                # values returned
  "movingAverage":              # stuff needed to compute a rolling average
                                # of the anomaly scores
    {
      "windowSize": SCALAR,     # the size of the averaging window
      "historicalValues": [],   # list with the last windowSize anomaly
                                # scores
      "total": SCALAR,          # the total of the values in historicalValues
    },
}
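To make the role of the "distribution" entry concrete, the following simplified sketch fits a normal distribution to a list of scores and evaluates the tail probability P(score >= s) from it. It is an illustration only and not the library's estimator: the function names fit_normal and tail_probability are hypothetical, the dict reuses only the "name", "mean" and "variance" keys shown above, and the small variance floor is an assumption added to avoid division by zero.

```python
import math


def fit_normal(scores, min_variance=1e-6):
    """Return a dict with the same keys as the "distribution" entry."""
    n = float(len(scores))
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / n
    # Assumption: floor the variance so constant data stays usable.
    return {"name": "normal", "mean": mean,
            "variance": max(variance, min_variance)}


def tail_probability(score, distribution):
    """P(X >= score) for X drawn from the fitted normal distribution."""
    std = math.sqrt(distribution["variance"])
    z = (score - distribution["mean"]) / std
    # Upper-tail probability of a standard normal via erfc.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


dist = fit_normal([0.1, 0.2, 0.3, 0.2, 0.2])
typical = tail_probability(0.2, dist)   # score at the mean
unusual = tail_probability(0.9, dist)   # score far above the mean
```

A score at the mean has a tail probability of about one half, while a score many standard deviations above the mean has a tail probability close to zero, which is the situation the module is designed to flag.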
- class nupic.algorithms.anomaly_likelihood.AnomalyLikelihood(claLearningPeriod=None, learningPeriod=288, estimationSamples=100, historicWindowSize=8640, reestimationPeriod=100)
Bases: nupic.serializable.Serializable
Helper class for running anomaly likelihood computation. To use it, simply create an instance and then feed it successive anomaly scores:
anomalyLikelihood = AnomalyLikelihood()
while still_have_data:
    # Get anomaly score from model
    # Compute probability that an anomaly has occurred
    anomalyProbability = anomalyLikelihood.anomalyProbability(
        value, anomalyScore, timestamp)
- anomalyProbability(value, anomalyScore, timestamp=None)
Compute the probability that the current value plus anomaly score represents an anomaly given the historical distribution of anomaly scores. The closer the number is to 1, the higher the chance it is an anomaly.
Parameters: - value – the current metric (“raw”) input value, e.g. “orange”, or ‘21.2’ (deg. Celsius), ...
- anomalyScore – the current anomaly score
- timestamp – [optional] timestamp of the occurrence; the default (None) results in using the iteration step.
Returns: the anomalyLikelihood for this record.
- static computeLogLikelihood(likelihood)
Compute a log scale representation of the likelihood value. Since the likelihood computations return low probabilities that often go into four 9’s or five 9’s, a log value is more useful for visualization, thresholding, etc.
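To show why a log scale helps with values such as 0.9999 and 0.99999, here is one possible mapping. It is assumed for illustration, and the library's exact formula and constants may differ. The function name log_scale_likelihood and the epsilon parameter are hypothetical.

```python
import math


def log_scale_likelihood(likelihood, epsilon=1e-10):
    """Map a likelihood in [0, 1] onto a log scale in roughly [0, 1].

    epsilon keeps the logarithm finite when likelihood == 1.0.
    """
    return math.log(1.0 + epsilon - likelihood) / math.log(epsilon)


four_nines = log_scale_likelihood(0.9999)
five_nines = log_scale_likelihood(0.99999)
```

On the linear scale, four 9's and five 9's differ by less than 0.0001. On this log scale they land about 0.1 apart, which makes a threshold between them easier to choose and to plot.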
- classmethod read(proto)
capnp deserialization method for the anomaly likelihood object
Parameters: proto – (Object) capnp proto object specified in nupic.regions.AnomalyLikelihoodRegion.capnp
Returns: (Object) the deserialized AnomalyLikelihood object
- write(proto)
capnp serialization method for the anomaly likelihood object
Parameters: proto – (Object) capnp proto object specified in nupic.regions.AnomalyLikelihoodRegion.capnp
- nupic.algorithms.anomaly_likelihood.estimateAnomalyLikelihoods(anomalyScores, averagingWindow=10, skipRecords=0, verbosity=0)
Given a series of anomaly scores, compute the likelihood for each score. This function should be called once on a batch of historical anomaly scores for an initial estimate of the distribution. It should be called again every so often (say every 50 records) to update the estimate.
Parameters: - anomalyScores – a list of records. Each record is a list with the following three elements: [timestamp, value, score]
Example:
[datetime.datetime(2013, 8, 10, 23, 0), 6.0, 1.0]
For best results, the list should contain between 1,000 and 10,000 records.
- averagingWindow – integer number of records to average over
- skipRecords – integer specifying the number of records to skip when estimating distributions. If skipRecords >= len(anomalyScores), a very broad distribution is returned that makes everything pretty likely.
- verbosity – integer controlling the extent of printouts for debugging:
0 = none
1 = occasional information
2 = print every record
Returns: 3-tuple consisting of:
- likelihoods – numpy array of likelihoods, one for each aggregated point
- avgRecordList – list of averaged input records
- params – a small JSON dict that contains the state of the estimator
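The relationship between the input records, averagingWindow, and avgRecordList can be pictured with a trailing moving average over the score field. This is a sketch under assumptions, not the library's code: the function name moving_average_records is hypothetical, and averaging over fewer than window_size records at the start of the series is an assumed behavior.

```python
import datetime


def moving_average_records(records, window_size):
    """Replace each record's score with a trailing moving average.

    records: list of [timestamp, value, score] lists.
    Returns a new list of [timestamp, value, averagedScore] lists.
    """
    averaged = []
    window = []   # the most recent scores, at most window_size of them
    total = 0.0   # running sum of the scores currently in the window
    for timestamp, value, score in records:
        window.append(score)
        total += score
        if len(window) > window_size:
            total -= window.pop(0)
        averaged.append([timestamp, value, total / len(window)])
    return averaged


start = datetime.datetime(2013, 8, 10, 23, 0)
step = datetime.timedelta(minutes=5)
records = [[start + i * step, 6.0, s]
           for i, s in enumerate([1.0, 0.0, 0.0, 1.0])]
avg_records = moving_average_records(records, window_size=2)
```

The window list and running total here play the same role as the "historicalValues" and "total" entries of the "movingAverage" state in the parameters dict.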
- nupic.algorithms.anomaly_likelihood.updateAnomalyLikelihoods(anomalyScores, params, verbosity=0)
Compute updated probabilities for anomalyScores using the given params.
Parameters: - anomalyScores – a list of records. Each record is a list with the following three elements: [timestamp, value, score]
Example:
[datetime.datetime(2013, 8, 10, 23, 0), 6.0, 1.0]
- params – the JSON dict returned by estimateAnomalyLikelihoods
- verbosity (int) – integer controlling the extent of printouts for debugging
Returns: 3-tuple consisting of:
- likelihoods – numpy array of likelihoods, one for each aggregated point
- avgRecordList – list of averaged input records
- params – an updated JSON object containing the state of this metric.