Anomaly Detection

Anomaly

Anomaly-related algorithms.

class nupic.algorithms.anomaly.Anomaly(slidingWindowSize=None, mode='pure', binaryAnomalyThreshold=None)

Utility class for generating anomaly scores in different ways.

Parameters:
  • slidingWindowSize – (optional) enables a moving average over the final anomaly score
  • mode – (optional) how to compute the anomaly; one of MODE_PURE (default), MODE_LIKELIHOOD, or MODE_WEIGHTED
  • binaryAnomalyThreshold – (optional) if set to a float in (0, 1), the final anomaly score is discretized to 1.0 or 0.0 depending on whether it exceeds this threshold
MODE_LIKELIHOOD = 'likelihood'

Uses the AnomalyLikelihood class, which models the probability of receiving this input value and anomaly score

MODE_PURE = 'pure'

Default mode. The raw anomaly score as computed by computeRawAnomalyScore()

MODE_WEIGHTED = 'weighted'

Multiplies the likelihood result with the raw anomaly score that was used to generate the likelihood (anomaly * likelihood)
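
In weighted mode the final score is a simple product of the two quantities; a minimal illustration (the numeric values below are made up — in practice both come from the model):

```python
# Illustrative values only; in practice both numbers come from the model.
raw_score = 0.75       # fraction of active columns that were not predicted
likelihood = 0.98      # output of the anomaly-likelihood model
weighted_score = raw_score * likelihood  # anomaly * likelihood
```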

compute(activeColumns, predictedColumns, inputValue=None, timestamp=None)

Compute the anomaly score as the fraction of active columns that were not predicted.

Parameters:
  • activeColumns – array of active column indices
  • predictedColumns – array of column indices predicted in this step (used for the anomaly score at step T+1)
  • inputValue – (optional) value of the current input to the encoders (e.g. “cat” for a category encoder); used in anomaly-likelihood mode
  • timestamp – (optional) timestamp of when the sample occurred; used in anomaly-likelihood mode
Returns:

the computed anomaly score; float 0..1

nupic.algorithms.anomaly.computeRawAnomalyScore(activeColumns, prevPredictedColumns)

Computes the raw anomaly score.

The raw anomaly score is the fraction of active columns not predicted.

Parameters:
  • activeColumns – array of active column indices
  • prevPredictedColumns – array of column indices predicted in the previous step
Returns:

anomaly score 0..1 (float)
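
The computation itself is straightforward; a small self-contained sketch (not the NuPIC implementation, which operates on numpy arrays):

```python
def raw_anomaly_score(activeColumns, prevPredictedColumns):
    """Fraction of currently active columns that were not predicted
    in the previous step. Returns 0.0 when nothing is active."""
    active = set(activeColumns)
    if not active:
        return 0.0
    predicted = set(prevPredictedColumns)
    return len(active - predicted) / float(len(active))

# 4 active columns, 3 of them predicted -> 1/4 of the activity was unexpected
score = raw_anomaly_score([2, 5, 7, 9], [2, 5, 9])  # 0.25
```

A score of 0.0 means every active column was predicted (no surprise); 1.0 means none were.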

AnomalyLikelihood

This module analyzes and estimates the distribution of averaged anomaly scores from a given model. Given a new anomaly score s, estimates P(score >= s).

The number P(score >= s) represents the likelihood of the current state of predictability. For example, a likelihood of 0.01 or 1% means we see this much predictability about one out of every 100 records. The number is not as unusual as it seems. For records that arrive every minute, this means once every hour and 40 minutes. A likelihood of 0.0001 or 0.01% means we see it once out of 10,000 records, or about once every 7 days.
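
The arithmetic above can be checked directly. A small helper, assuming (as the text does) one record per minute:

```python
def expected_interval_days(likelihood, records_per_minute=1.0):
    """Expected recurrence interval, in days, for a given likelihood."""
    records = 1.0 / likelihood            # records between occurrences
    minutes = records / records_per_minute
    return minutes / (60 * 24)            # convert minutes to days

expected_interval_days(0.0001)  # about 6.9 days, as stated above
```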

USAGE

There are two ways to use the code: using the anomaly_likelihood.AnomalyLikelihood helper class or using the raw individual functions estimateAnomalyLikelihoods() and updateAnomalyLikelihoods().

Low-Level Function Usage

There are two primary interface routines.

Initially:

likelihoods, avgRecordList, estimatorParams = \
  estimateAnomalyLikelihoods(metric_data)

Whenever you get new data:

likelihoods, avgRecordList, estimatorParams = \
  updateAnomalyLikelihoods(data2, estimatorParams)

And again, as further data arrives (make sure you use the new estimatorParams returned by the previous call to updateAnomalyLikelihoods!):

likelihoods, avgRecordList, estimatorParams = \
  updateAnomalyLikelihoods(data3, estimatorParams)

Every once in a while, re-estimate on a larger batch of recent data:

likelihoods, avgRecordList, estimatorParams = \
  estimateAnomalyLikelihoods(lots_of_metric_data)
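
The idea behind these two routines can be sketched with a toy re-implementation. This is not the NuPIC code — the function names and the plain normal fit below are simplifications for illustration only:

```python
import math

def toy_estimate_params(scores, window=10):
    # Average the raw anomaly scores over a sliding window, then fit a
    # normal distribution to the averaged values.
    averaged, hist = [], []
    for s in scores:
        hist.append(s)
        if len(hist) > window:
            hist.pop(0)
        averaged.append(sum(hist) / len(hist))
    mean = sum(averaged) / len(averaged)
    var = sum((a - mean) ** 2 for a in averaged) / len(averaged)
    return {"mean": mean, "variance": max(var, 1e-6)}

def toy_likelihood(score, params):
    # P(score >= s) under the fitted normal: the complementary CDF.
    z = (score - params["mean"]) / math.sqrt(params["variance"])
    return 0.5 * math.erfc(z / math.sqrt(2))
```

A score far above the historical mean then yields a likelihood near 0, i.e. a rare, highly anomalous state; a typical score yields a likelihood near 0.5.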

PARAMS

The parameters dict returned by the above functions has the following structure. Note: the client does not need to know the details of this.

{
  "distribution":               # describes the distribution
    {
      "name": STRING,           # name of the distribution, such as 'normal'
      "mean": SCALAR,           # mean of the distribution
      "variance": SCALAR,       # variance of the distribution

      # There may also be some keys that are specific to the distribution
    },

  "historicalLikelihoods": [],  # the last windowSize likelihood
                                # values returned

  "movingAverage":              # stuff needed to compute a rolling average
                                # of the anomaly scores
    {
      "windowSize": SCALAR,     # the size of the averaging window
      "historicalValues": [],   # list with the last windowSize anomaly
                                # scores
      "total": SCALAR,          # the total of the values in historicalValues
    },

}
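
For illustration, a params dict of this shape can be consumed like so (all values below are made up):

```python
# A made-up params dict with the structure documented above.
params = {
    "distribution": {"name": "normal", "mean": 0.05, "variance": 0.002},
    "historicalLikelihoods": [0.9, 0.8, 0.95],
    "movingAverage": {
        "windowSize": 10,
        "historicalValues": [0.1, 0.0, 0.2, 0.1],
        "total": 0.4,
    },
}

ma = params["movingAverage"]
# The rolling mean of recent anomaly scores is total / number of stored values.
rolling_mean = ma["total"] / len(ma["historicalValues"])  # 0.1
```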
class nupic.algorithms.anomaly_likelihood.AnomalyLikelihood(claLearningPeriod=None, learningPeriod=288, estimationSamples=100, historicWindowSize=8640, reestimationPeriod=100)

Helper class for running the anomaly likelihood computation. To use it, simply create an instance and then feed it successive anomaly scores:

anomalyLikelihood = AnomalyLikelihood()
while still_have_data:
  # Get anomaly score from model

  # Compute the probability that an anomaly has occurred
  anomalyProbability = anomalyLikelihood.anomalyProbability(
      value, anomalyScore, timestamp)

anomalyProbability(value, anomalyScore, timestamp=None)

Compute the probability that the current value plus anomaly score represents an anomaly given the historical distribution of anomaly scores. The closer the number is to 1, the higher the chance it is an anomaly.

Parameters:
  • value – the current (“raw”) metric input value, e.g. “orange” for a category or 21.2 for a scalar (deg. Celsius)
  • anomalyScore – the current anomaly score
  • timestamp – (optional) timestamp of the occurrence; the default (None) uses the iteration step instead
Returns:

the anomalyLikelihood for this record.

static computeLogLikelihood(likelihood)

Compute a log scale representation of the likelihood value. Since the likelihood computations return low probabilities that often go into four 9’s or five 9’s, a log value is more useful for visualization, thresholding, etc.
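
One common way to do this (a sketch of the idea, not necessarily NuPIC's exact formula) is a negative log10 of the complement:

```python
import math

def log_scale(likelihood, eps=1e-10):
    # Spreads out values near 1.0: four 9's map to ~4.0, five 9's to ~5.0.
    # eps guards against log(0) when likelihood is exactly 1.0.
    return -math.log10(max(1.0 - likelihood, eps))

log_scale(0.9999)   # ~4.0
log_scale(0.99999)  # ~5.0
```

A fixed threshold on this log value (say, anything above 4) is often easier to tune than a threshold on raw likelihoods like 0.9999 vs 0.99999.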

classmethod read(proto)

capnp deserialization method for the anomaly likelihood object

Parameters:
  • proto – (Object) capnp proto object specified in nupic.regions.AnomalyLikelihoodRegion.capnp
Returns:

(Object) the deserialized AnomalyLikelihood object

write(proto)

capnp serialization method for the anomaly likelihood object

Parameters:
  • proto – (Object) capnp proto object specified in nupic.regions.AnomalyLikelihoodRegion.capnp
nupic.algorithms.anomaly_likelihood.estimateAnomalyLikelihoods(anomalyScores, averagingWindow=10, skipRecords=0, verbosity=0)

Given a series of anomaly scores, compute the likelihood for each score. This function should be called once on a bunch of historical anomaly scores for an initial estimate of the distribution. It should be called again every so often (say every 50 records) to update the estimate.

Parameters:
  • anomalyScores

    a list of records. Each record is a list with the following three elements: [timestamp, value, score]

    Example:

    [datetime.datetime(2013, 8, 10, 23, 0), 6.0, 1.0]
    

    For best results, the list should be between 1000 and 10,000 records

  • averagingWindow – integer number of records to average over
  • skipRecords – integer number of records to skip when estimating the distribution. If skipRecords >= len(anomalyScores), a very broad distribution is returned that makes everything pretty likely.
  • verbosity

    integer controlling extent of printouts for debugging

    0 = none; 1 = occasional information; 2 = print every record

Returns:

3-tuple consisting of:

  • likelihoods

    numpy array of likelihoods, one for each aggregated point

  • avgRecordList

    list of averaged input records

  • params

    a small JSON dict that contains the state of the estimator

nupic.algorithms.anomaly_likelihood.updateAnomalyLikelihoods(anomalyScores, params, verbosity=0)

Compute updated probabilities for anomalyScores using the given params.

Parameters:
  • anomalyScores

    a list of records. Each record is a list with the following three elements: [timestamp, value, score]

    Example:

    [datetime.datetime(2013, 8, 10, 23, 0), 6.0, 1.0]
    
  • params – the JSON dict returned by estimateAnomalyLikelihoods
  • verbosity (int) – integer controlling extent of printouts for debugging
Returns:

3-tuple consisting of:

  • likelihoods

    numpy array of likelihoods, one for each aggregated point

  • avgRecordList

    list of averaged input records

  • params

    an updated JSON object containing the state of this metric.