Algorithms API

See the Algorithms API for an overview of this API.

Here is the complete program we are going to use as an example. Descriptions of the algorithm parameters we’re using in this Quick Start can be found here. In sections below, we’ll break it down into parts and explain what is happening (without some of the plumbing details).

import csv
import datetime
import numpy
import os
import yaml

from nupic.algorithms.sdr_classifier_factory import SDRClassifierFactory
from nupic.algorithms.spatial_pooler import SpatialPooler
from nupic.algorithms.temporal_memory import TemporalMemory
from nupic.encoders.date import DateEncoder
from nupic.encoders.random_distributed_scalar import \
  RandomDistributedScalarEncoder

_NUM_RECORDS = 3000
_EXAMPLE_DIR = os.path.dirname(os.path.abspath(__file__))
_INPUT_FILE_PATH = os.path.join(_EXAMPLE_DIR, os.pardir, "data", "gymdata.csv")
_PARAMS_PATH = os.path.join(_EXAMPLE_DIR, os.pardir, "params", "model.yaml")



def runHotgym(numRecords):
  with open(_PARAMS_PATH, "r") as f:
    modelParams = yaml.safe_load(f)["modelParams"]
    enParams = modelParams["sensorParams"]["encoders"]
    spParams = modelParams["spParams"]
    tmParams = modelParams["tmParams"]

  timeOfDayEncoder = DateEncoder(
    timeOfDay=enParams["timestamp_timeOfDay"]["timeOfDay"])
  weekendEncoder = DateEncoder(
    weekend=enParams["timestamp_weekend"]["weekend"])
  scalarEncoder = RandomDistributedScalarEncoder(
    enParams["consumption"]["resolution"])

  encodingWidth = (timeOfDayEncoder.getWidth()
                   + weekendEncoder.getWidth()
                   + scalarEncoder.getWidth())

  sp = SpatialPooler(
    inputDimensions=(encodingWidth,),
    columnDimensions=(spParams["columnCount"],),
    potentialPct=spParams["potentialPct"],
    potentialRadius=encodingWidth,
    globalInhibition=spParams["globalInhibition"],
    localAreaDensity=spParams["localAreaDensity"],
    numActiveColumnsPerInhArea=spParams["numActiveColumnsPerInhArea"],
    synPermInactiveDec=spParams["synPermInactiveDec"],
    synPermActiveInc=spParams["synPermActiveInc"],
    synPermConnected=spParams["synPermConnected"],
    boostStrength=spParams["boostStrength"],
    seed=spParams["seed"],
    wrapAround=True
  )

  tm = TemporalMemory(
    columnDimensions=(tmParams["columnCount"],),
    cellsPerColumn=tmParams["cellsPerColumn"],
    activationThreshold=tmParams["activationThreshold"],
    initialPermanence=tmParams["initialPerm"],
    connectedPermanence=spParams["synPermConnected"],
    minThreshold=tmParams["minThreshold"],
    maxNewSynapseCount=tmParams["newSynapseCount"],
    permanenceIncrement=tmParams["permanenceInc"],
    permanenceDecrement=tmParams["permanenceDec"],
    predictedSegmentDecrement=0.0,
    maxSegmentsPerCell=tmParams["maxSegmentsPerCell"],
    maxSynapsesPerSegment=tmParams["maxSynapsesPerSegment"],
    seed=tmParams["seed"]
  )

  classifier = SDRClassifierFactory.create()
  results = []
  with open(_INPUT_FILE_PATH, "r") as fin:
    reader = csv.reader(fin)
    # The first three rows of the input file are headers: field names, field
    # types, and special flags. Skip past them.
    headers = next(reader)
    next(reader)
    next(reader)

    for count, record in enumerate(reader):

      if count >= numRecords: break

      # Convert the date string into a Python datetime object.
      dateValue = datetime.datetime.strptime(record[0], "%m/%d/%y %H:%M")
      # Convert the consumption value string into a float.
      consumption = float(record[1])

      # To encode, we need to provide zero-filled numpy arrays for the encoders
      # to populate.
      timeOfDayBits = numpy.zeros(timeOfDayEncoder.getWidth())
      weekendBits = numpy.zeros(weekendEncoder.getWidth())
      consumptionBits = numpy.zeros(scalarEncoder.getWidth())

      # Now we call the encoders to create bit representations for each value.
      timeOfDayEncoder.encodeIntoArray(dateValue, timeOfDayBits)
      weekendEncoder.encodeIntoArray(dateValue, weekendBits)
      scalarEncoder.encodeIntoArray(consumption, consumptionBits)

      # Concatenate all these encodings into one large encoding for Spatial
      # Pooling.
      encoding = numpy.concatenate(
        [timeOfDayBits, weekendBits, consumptionBits]
      )

      # Create an array to represent active columns, all initially zero. This
      # will be populated by the compute method below. It must have the same
      # dimensions as the Spatial Pooler.
      activeColumns = numpy.zeros(spParams["columnCount"])

      # Execute Spatial Pooling algorithm over input space.
      sp.compute(encoding, True, activeColumns)
      activeColumnIndices = numpy.nonzero(activeColumns)[0]

      # Execute Temporal Memory algorithm over active mini-columns.
      tm.compute(activeColumnIndices, learn=True)

      activeCells = tm.getActiveCells()

      # Get the bucket info for this input value for classification.
      bucketIdx = scalarEncoder.getBucketIndices(consumption)[0]

      # Run classifier to translate active cells back to scalar value.
      classifierResult = classifier.compute(
        recordNum=count,
        patternNZ=activeCells,
        classification={
          "bucketIdx": bucketIdx,
          "actValue": consumption
        },
        learn=True,
        infer=True
      )

      # Print the best prediction for 1 step out.
      oneStepConfidence, oneStep = sorted(
        zip(classifierResult[1], classifierResult["actualValues"]),
        reverse=True
      )[0]
      print("1-step: {:16} ({:4.4}%)".format(oneStep, oneStepConfidence * 100))
      results.append([oneStep, oneStepConfidence * 100, None, None])

    return results


if __name__ == "__main__":
  runHotgym(_NUM_RECORDS)

Encoding Data

For this quick start, we’ll be using the same raw input data file described here in detail and used for the OPF Quick Start. But we will be ignoring the file format and just looping over the CSV, encoding one row at a time programmatically before sending encodings to the Spatial Pooler and Temporal Memory algorithms.

One Row Of Data

Each row of data in this input file is formatted like this:

7/2/10 9:00,41.5

We need to turn each row into three encodings:

  • time of day
  • weekend or not
  • scalar value for energy consumption
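
The date and the consumption value both arrive as strings, so before encoding we parse them with the standard library (a minimal sketch; the full loop below does the same thing for every row):

import datetime

# Parse the timestamp and the consumption value from one raw row.
dateValue = datetime.datetime.strptime("7/2/10 9:00", "%m/%d/%y %H:%M")
consumption = float("41.5")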

Creating Encoders

First, let’s create the encoders we’ll use to encode different semantics of our input data stream:

from nupic.encoders.date import DateEncoder
from nupic.encoders.random_distributed_scalar import \
    RandomDistributedScalarEncoder

timeOfDayEncoder = DateEncoder(timeOfDay=(21,1))
weekendEncoder = DateEncoder(weekend=21)
scalarEncoder = RandomDistributedScalarEncoder(0.88)
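
Each encoder reports the width of its output, in bits, via getWidth(). As a quick sanity check (the exact widths depend on the encoder parameters above):

# Inspect how many bits each encoder will produce.
print(timeOfDayEncoder.getWidth())
print(weekendEncoder.getWidth())
print(scalarEncoder.getWidth())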

Encoding Data

With these encoders created, we can loop over each row of data, encode it into bits, and concatenate them together to form a complete representation for the next step:

with open(_INPUT_FILE_PATH) as fin:
  reader = csv.reader(fin)
  # Skip the three header rows before reading data.
  next(reader)
  next(reader)
  next(reader)
  for count, record in enumerate(reader):
    # Convert the date string into a Python datetime object.
    dateValue = datetime.datetime.strptime(record[0], "%m/%d/%y %H:%M")
    # Convert the consumption value string into a float.
    consumption = float(record[1])

    # To encode, we need to provide zero-filled numpy arrays for the encoders
    # to populate.
    timeOfDayBits = numpy.zeros(timeOfDayEncoder.getWidth())
    weekendBits = numpy.zeros(weekendEncoder.getWidth())
    consumptionBits = numpy.zeros(scalarEncoder.getWidth())

    # Now we call the encoders to create bit representations for each value.
    timeOfDayEncoder.encodeIntoArray(dateValue, timeOfDayBits)
    weekendEncoder.encodeIntoArray(dateValue, weekendBits)
    scalarEncoder.encodeIntoArray(consumption, consumptionBits)

    # Concatenate all these encodings into one large encoding for Spatial
    # Pooling.
    encoding = numpy.concatenate(
      [timeOfDayBits, weekendBits, consumptionBits]
    )

    # Print complete encoding to the console as a binary representation.
    print(encoding.astype('int16'))

Each encoding will print to the console and look something like this:

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

By visually inspecting this output, you can see the blocks of contiguous on bits representing the different encodings for time of day and weekend. Near the bottom of the encoding are the bits representing the scalar data, distributed throughout the space by the RandomDistributedScalarEncoder.
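
Because the encodings are simply concatenated, you can slice the combined array back into per-encoder segments using each encoder's width (a small sketch using the variables from the loop above):

# Recover each encoder's segment from the combined encoding.
timeOfDayWidth = timeOfDayEncoder.getWidth()
weekendWidth = weekendEncoder.getWidth()

timeOfDaySegment = encoding[:timeOfDayWidth]
weekendSegment = encoding[timeOfDayWidth:timeOfDayWidth + weekendWidth]
consumptionSegment = encoding[timeOfDayWidth + weekendWidth:]

# Just the scalar portion, produced by the RandomDistributedScalarEncoder.
print(consumptionSegment.astype('int16'))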

Spatial Pooling

Now that we have data encoded into a binary format with semantic meaning, we can pass each encoding into the Spatial Pooling algorithm.

Creating the SP

First, we must identify parameters for the creation of the SpatialPooler instance. We will be using the same parameters identified in the OPF Quick Start document’s model parameters (see the spParams section).

from nupic.algorithms.spatial_pooler import SpatialPooler

encodingWidth = (timeOfDayEncoder.getWidth()
                 + weekendEncoder.getWidth()
                 + scalarEncoder.getWidth())

sp = SpatialPooler(
  # How large the input encoding will be.
  inputDimensions=(encodingWidth,),
  # How many mini-columns will be in the Spatial Pooler.
  columnDimensions=(2048,),
  # What percent of a column's receptive field is available for potential
  # synapses?
  potentialPct=0.85,
  # Each column may form potential synapses anywhere in the input space.
  potentialRadius=encodingWidth,
  # With global inhibition, winning columns are selected across the whole
  # region, so any topology in the input space is ignored.
  globalInhibition=True,
  localAreaDensity=-1.0,
  # Roughly 2% of columns will be active, given that there is only one
  # inhibition area because we have turned on globalInhibition
  # (40 / 2048 = 0.0195).
  numActiveColumnsPerInhArea=40.0,
  # How quickly synapses grow and degrade.
  synPermInactiveDec=0.005,
  synPermActiveInc=0.04,
  synPermConnected=0.1,
  # boostStrength controls the strength of boosting. Boosting encourages
  # efficient usage of SP columns.
  boostStrength=3.0,
  # Random number generator seed.
  seed=1956,
  # Determines if inputs at the beginning and end of an input dimension should
  # be considered neighbors when mapping columns to inputs.
  wrapAround=True
)

Running the SP

The SpatialPooler instance should be created once, and each encoded row of data passed into its compute() function.

# Create an array to represent active columns, all initially zero. This
# will be populated by the compute method below. It must have the same
# dimensions as the Spatial Pooler.
activeColumns = numpy.zeros(2048)

# Execute Spatial Pooling algorithm over input space.
sp.compute(encoding, True, activeColumns)
activeColumnIndices = numpy.nonzero(activeColumns)[0]

print(activeColumnIndices)

This will print out the indices of the active mini-columns in the SP at each time step, and will look something like this:

[ 929  932  938  939  940  941  942  943  944  945  946  949  950  951  953
  955  956  957  958  960  961  962  964  965  966  968  969  970  971  973
  974  975  977  978  979  980 1105 1114 1120 1129]
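
Exactly 40 of the 2,048 mini-columns are active at each time step, matching the numActiveColumnsPerInhArea parameter. You can verify the sparsity directly:

# Confirm the ~2% sparsity of the Spatial Pooler output.
sparsity = float(len(activeColumnIndices)) / 2048
print("{} active columns, {:.2%} sparsity".format(
  len(activeColumnIndices), sparsity))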

Temporal Memory

The TemporalMemory algorithm works within the active mini-columns created by the SpatialPooler. Given a list of active columns, it performs sequence memory operations by activating individual cells within each mini-column structure.

Creating the TM

Just like the SP, we must create an instance of the TemporalMemory with parameters we identified in the OPF Quick Start document’s model parameters (see the tmParams section).

from nupic.algorithms.temporal_memory import TemporalMemory

tm = TemporalMemory(
  # Must be the same dimensions as the SP
  columnDimensions=(2048, ),
  # How many cells in each mini-column.
  cellsPerColumn=32,
  # A segment is active if it has >= activationThreshold connected synapses
  # that are active due to the current input.
  activationThreshold=16,
  initialPermanence=0.21,
  connectedPermanence=0.5,
  # Minimum number of active synapses for a segment to be considered during
  # search for the best-matching segments.
  minThreshold=12,
  # The max number of synapses added to a segment during learning
  maxNewSynapseCount=20,
  permanenceIncrement=0.1,
  permanenceDecrement=0.1,
  predictedSegmentDecrement=0.0,
  maxSegmentsPerCell=128,
  maxSynapsesPerSegment=32,
  seed=1960
)

Running the TM

Now, after SpatialPooler.compute(), we will run the TemporalMemory.compute() function using the active columns presented by the SP. Then we can call TemporalMemory.getActiveCells() to return the indices of the active cells within the structure. All active cells will fall within active mini-columns.

# Execute Temporal Memory algorithm over active mini-columns.
tm.compute(activeColumnIndices, learn=True)
activeCells = tm.getActiveCells()
print(activeCells)

Now we have the indices of every active cell in the HTM structure. When printed to the console, each step looks like this:

[13817, 13856, 14003, 14078, 14104, 14159, 14259, 14313, 14351, 14377, 14415,
14524, 14563, 14594, 14652, 14686, 14715, 14763, 14863, 14950, 15037, 15053,
15107, 15209, 15258, 15412, 15449, 35008, 35009, 35010, 35011, 35012, 35013,
35014, 35015, 35016, 35017, 35018, 35019, 35020, 35021, 35022, 35023, 35024,
35025, 35026, 35027, 35028, 35029, 35030, 35031, 35032, 35033, 35034, 35035,
35036, 35037, 35038, 35039, 35040, 35041, 35042, 35043, 35044, 35045, 35046,
35047, 35048, 35049, 35050, 35051, 35052, 35053, 35054, 35055, 35056, 35057,
35058, 35059, 35060, 35061, 35062, 35063, 35064, 35065, 35066, 35067, 35068,
35069, 35070, 35071, 35360, 35361, 35362, 35363, 35364, 35365, 35366, 35367,
35368, 35369, 35370, 35371, 35372, 35373, 35374, 35375, 35376, 35377, 35378,
35379, 35380, 35381, 35382, 35383, 35384, 35385, 35386, 35387, 35388, 35389,
35390, 35391, 35424, 35425, 35426, 35427, 35428, 35429, 35430, 35431, 35432,
35433, 35434, 35435, 35436, 35437, 35438, 35439, 35440, 35441, 35442, 35443,
35444, 35445, 35446, 35447, 35448, 35449, 35450, 35451, 35452, 35453, 35454,
35455, 35488, 35489, 35490, 35491, 35492, 35493, 35494, 35495, 35496, 35497,
35498, 35499, 35500, 35501, 35502, 35503, 35504, 35505, 35506, 35507, 35508,
35509, 35510, 35511, 35512, 35513, 35514, 35515, 35516, 35517, 35518, 35519,
35584, 35585, 35586, 35587, 35588, 35589, 35590, 35591, 35592, 35593, 35594,
35595, 35596, 35597, 35598, 35599, 35600, 35601, 35602, 35603, 35604, 35605,
35606, 35607, 35608, 35609, 35610, 35611, 35612, 35613, 35614, 35615, 35616,
35617, 35618, 35619, 35620, 35621, 35622, 35623, 35624, 35625, 35626, 35627,
35628, 35629, 35630, 35631, 35632, 35633, 35634, 35635, 35636, 35637, 35638,
35639, 35640, 35641, 35642, 35643, 35644, 35645, 35646, 35647, 35648, 35649,
35650, 35651, 35652, 35653, 35654, 35655, 35656, 35657, 35658, 35659, 35660,
35661, 35662, 35663, 35664, 35665, 35666, 35667, 35668, 35669, 35670, 35671,
35672, 35673, 35674, 35675, 35676, 35677, 35678, 35679, 35840, 35841, 35842,
35843, 35844, 35845, 35846, 35847, 35848, 35849, 35850, 35851, 35852, 35853,
35854, 35855, 35856, 35857, 35858, 35859, 35860, 35861, 35862, 35863, 35864,
35865, 35866, 35867, 35868, 35869, 35870, 35871, 35936, 35937, 35938, 35939,
35940, 35941, 35942, 35943, 35944, 35945, 35946, 35947, 35948, 35949, 35950,
35951, 35952, 35953, 35954, 35955, 35956, 35957, 35958, 35959, 35960, 35961,
35962, 35963, 35964, 35965, 35966, 35967, 35968, 35969, 35970, 35971, 35972,
35973, 35974, 35975, 35976, 35977, 35978, 35979, 35980, 35981, 35982, 35983,
35984, 35985, 35986, 35987, 35988, 35989, 35990, 35991, 35992, 35993, 35994,
35995, 35996, 35997, 35998, 35999, 36352, 36353, 36354, 36355, 36356, 36357,
36358, 36359, 36360, 36361, 36362, 36363, 36364, 36365, 36366, 36367, 36368,
36369, 36370, 36371, 36372, 36373, 36374, 36375, 36376, 36377, 36378, 36379,
36380, 36381, 36382, 36383, 36576, 36577, 36578, 36579, 36580, 36581, 36582,
36583, 36584, 36585, 36586, 36587, 36588, 36589, 36590, 36591, 36592, 36593,
36594, 36595, 36596, 36597, 36598, 36599, 36600, 36601, 36602, 36603, 36604,
36605, 36606, 36607]

That’s a lot of active cells! But remember our structure is 65,536 cells total (2,048 columns, each with 32 cells), so these active cells represent only about 2% of the total number of cells.
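
Cells are indexed column by column, so any cell index maps back to its mini-column by integer division by cellsPerColumn (TemporalMemory also exposes a columnForCell() helper for this). A quick sketch confirming that the active cells fall within the SP's active columns:

# Map each active cell back to its mini-column (32 cells per column).
activeCellColumns = set(cell // 32 for cell in activeCells)
print(sorted(activeCellColumns))

# Every one of these columns is among the SP's active columns.
assert activeCellColumns.issubset(set(activeColumnIndices))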

Predictive Cells

The TemporalMemory interface provides several methods for retrieving cellular state information. In the section above, we used the TemporalMemory.getActiveCells() function to get the indices of the active cells. We can also get predictive cells by calling TemporalMemory.getPredictiveCells(), which returns an array of indices of cells in a depolarized, or predictive, state.
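
For example, after each call to compute() you can fetch the cells predicted to become active at the next time step, and map them to the mini-columns they would activate (a sketch using the same 32-cells-per-column layout):

# Cells in a depolarized (predictive) state after this time step.
predictiveCells = tm.getPredictiveCells()

# The mini-columns these cells predict will become active next.
predictedColumns = set(cell // 32 for cell in predictiveCells)
print(sorted(predictedColumns))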

Getting Predictions

In order to associate the predictive cells in the TM with an input pattern, we use a non-biological method of classification: we add a classifier, in this case the SDRClassifier, to do this work.

The goal is to extract a prediction for the value of consumption that was passed into the system.

Creating an SDR Classifier

We will create the classifier with the SDRClassifierFactory, using the default factory settings.

from nupic.algorithms.sdr_classifier_factory import SDRClassifierFactory

classifier = SDRClassifierFactory.create()

Running the Classifier

In order to call SDRClassifier.compute() on the classifier, we need to pass it both the actual consumption value and the bucketIdx (bucket index), which we can get from the encoder itself. This allows the classifier to map its predictions back to previously seen values.

# Get the bucket info for this input value for classification.
bucketIdx = scalarEncoder.getBucketIndices(consumption)[0]

# Run classifier to translate active cells back to scalar value.
classifierResult = classifier.compute(
  recordNum=count,
  patternNZ=activeCells,
  classification={
    "bucketIdx": bucketIdx,
    "actValue": consumption
  },
  learn=True,
  infer=True
)

# Print the best prediction for 1 step out.
probability, value = sorted(
  zip(classifierResult[1], classifierResult["actualValues"]),
  reverse=True
)[0]
print("1-step: {:16} ({:4.4}%)".format(value, probability * 100))

The classifierResult contains the predicted values and their probabilities. Running the code above will print the best prediction and its associated probability to the console:

1-step:    5.11804943107 (52.12%)
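
The probabilities in classifierResult[1] cover every bucket the classifier has seen so far, aligned index-for-index with classifierResult["actualValues"], so you can look at more than the single best guess. For example, to print the three most likely 1-step predictions:

# Show the top three 1-step predictions with their probabilities.
topPredictions = sorted(
  zip(classifierResult[1], classifierResult["actualValues"]),
  reverse=True
)[:3]
for probability, value in topPredictions:
  print("  {:16} ({:4.4}%)".format(value, probability * 100))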

Congratulations! You’ve got HTM predictions for a scalar data stream!