
Algorithms API¶
See the Algorithms API for an overview of this API.
Here is the complete program we are going to use as an example. Descriptions of the algorithm parameters we’re using in this Quick Start can be found here. In sections below, we’ll break it down into parts and explain what is happening (without some of the plumbing details).
import csv
import datetime
import numpy
import os
import yaml
from nupic.algorithms.sdr_classifier_factory import SDRClassifierFactory
from nupic.algorithms.spatial_pooler import SpatialPooler
from nupic.algorithms.temporal_memory import TemporalMemory
from nupic.encoders.date import DateEncoder
from nupic.encoders.random_distributed_scalar import \
  RandomDistributedScalarEncoder

_NUM_RECORDS = 3000
_EXAMPLE_DIR = os.path.dirname(os.path.abspath(__file__))
_INPUT_FILE_PATH = os.path.join(_EXAMPLE_DIR, os.pardir, "data", "gymdata.csv")
_PARAMS_PATH = os.path.join(_EXAMPLE_DIR, os.pardir, "params", "model.yaml")

def runHotgym(numRecords):
  with open(_PARAMS_PATH, "r") as f:
    modelParams = yaml.safe_load(f)["modelParams"]
    enParams = modelParams["sensorParams"]["encoders"]
    spParams = modelParams["spParams"]
    tmParams = modelParams["tmParams"]

  timeOfDayEncoder = DateEncoder(
    timeOfDay=enParams["timestamp_timeOfDay"]["timeOfDay"])
  weekendEncoder = DateEncoder(
    weekend=enParams["timestamp_weekend"]["weekend"])
  scalarEncoder = RandomDistributedScalarEncoder(
    enParams["consumption"]["resolution"])

  encodingWidth = (timeOfDayEncoder.getWidth()
                   + weekendEncoder.getWidth()
                   + scalarEncoder.getWidth())

  sp = SpatialPooler(
    inputDimensions=(encodingWidth,),
    columnDimensions=(spParams["columnCount"],),
    potentialPct=spParams["potentialPct"],
    potentialRadius=encodingWidth,
    globalInhibition=spParams["globalInhibition"],
    localAreaDensity=spParams["localAreaDensity"],
    numActiveColumnsPerInhArea=spParams["numActiveColumnsPerInhArea"],
    synPermInactiveDec=spParams["synPermInactiveDec"],
    synPermActiveInc=spParams["synPermActiveInc"],
    synPermConnected=spParams["synPermConnected"],
    boostStrength=spParams["boostStrength"],
    seed=spParams["seed"],
    wrapAround=True
  )

  tm = TemporalMemory(
    columnDimensions=(tmParams["columnCount"],),
    cellsPerColumn=tmParams["cellsPerColumn"],
    activationThreshold=tmParams["activationThreshold"],
    initialPermanence=tmParams["initialPerm"],
    connectedPermanence=spParams["synPermConnected"],
    minThreshold=tmParams["minThreshold"],
    maxNewSynapseCount=tmParams["newSynapseCount"],
    permanenceIncrement=tmParams["permanenceInc"],
    permanenceDecrement=tmParams["permanenceDec"],
    predictedSegmentDecrement=0.0,
    maxSegmentsPerCell=tmParams["maxSegmentsPerCell"],
    maxSynapsesPerSegment=tmParams["maxSynapsesPerSegment"],
    seed=tmParams["seed"]
  )

  classifier = SDRClassifierFactory.create()

  results = []
  with open(_INPUT_FILE_PATH, "r") as fin:
    reader = csv.reader(fin)
    headers = next(reader)
    # Skip the two remaining header rows (field types and flags).
    next(reader)
    next(reader)

    for count, record in enumerate(reader):

      if count >= numRecords: break

      # Convert the date string into a Python datetime object.
      dateString = datetime.datetime.strptime(record[0], "%m/%d/%y %H:%M")
      # Convert the data value string into a float.
      consumption = float(record[1])

      # To encode, we need to provide zero-filled numpy arrays for the
      # encoders to populate.
      timeOfDayBits = numpy.zeros(timeOfDayEncoder.getWidth())
      weekendBits = numpy.zeros(weekendEncoder.getWidth())
      consumptionBits = numpy.zeros(scalarEncoder.getWidth())

      # Now we call the encoders to create bit representations for each value.
      timeOfDayEncoder.encodeIntoArray(dateString, timeOfDayBits)
      weekendEncoder.encodeIntoArray(dateString, weekendBits)
      scalarEncoder.encodeIntoArray(consumption, consumptionBits)

      # Concatenate all these encodings into one large encoding for Spatial
      # Pooling.
      encoding = numpy.concatenate(
        [timeOfDayBits, weekendBits, consumptionBits]
      )

      # Create an array to represent active columns, all initially zero. This
      # will be populated by the compute method below. It must have the same
      # dimensions as the Spatial Pooler.
      activeColumns = numpy.zeros(spParams["columnCount"])

      # Execute Spatial Pooling algorithm over input space.
      sp.compute(encoding, True, activeColumns)
      activeColumnIndices = numpy.nonzero(activeColumns)[0]

      # Execute Temporal Memory algorithm over active mini-columns.
      tm.compute(activeColumnIndices, learn=True)
      activeCells = tm.getActiveCells()

      # Get the bucket info for this input value for classification.
      bucketIdx = scalarEncoder.getBucketIndices(consumption)[0]

      # Run classifier to translate active cells back to scalar value.
      classifierResult = classifier.compute(
        recordNum=count,
        patternNZ=activeCells,
        classification={
          "bucketIdx": bucketIdx,
          "actValue": consumption
        },
        learn=True,
        infer=True
      )

      # Print the best prediction for 1 step out.
      oneStepConfidence, oneStep = sorted(
        zip(classifierResult[1], classifierResult["actualValues"]),
        reverse=True
      )[0]
      print("1-step: {:16} ({:4.4}%)".format(oneStep, oneStepConfidence * 100))
      results.append([oneStep, oneStepConfidence * 100, None, None])

  return results


if __name__ == "__main__":
  runHotgym(_NUM_RECORDS)
Encoding Data¶
For this quick start, we’ll be using the same raw input data file described here in detail and used for the OPF Quick Start. But we will be ignoring the file format and just looping over the CSV, encoding one row at a time programmatically before sending encodings to the Spatial Pooler and Temporal Memory algorithms.
One Row Of Data¶
Each row of data in this input file is formatted like this:
7/2/10 9:00,41.5
We need to encode this row into three encodings:
- time of day
- weekend or not
- scalar value for energy consumption
Creating Encoders¶
First, let’s create the encoders we’ll use to encode different semantics of our input data stream:
from nupic.encoders.date import DateEncoder
from nupic.encoders.random_distributed_scalar import \
RandomDistributedScalarEncoder
timeOfDayEncoder = DateEncoder(timeOfDay=(21, 1))
weekendEncoder = DateEncoder(weekend=21)
scalarEncoder = RandomDistributedScalarEncoder(0.88)
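
Each encoder reports the width of its output via getWidth(), which we will need later to size the Spatial Pooler input. As a quick sanity check (an illustrative addition, not part of the original program), you can print each width:

print("time of day width: {}".format(timeOfDayEncoder.getWidth()))
print("weekend width: {}".format(weekendEncoder.getWidth()))
print("consumption width: {}".format(scalarEncoder.getWidth()))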
Encoding Data¶
With these encoders created, we can loop over each row of data, encode it into bits, and concatenate the results into one complete representation for the next step:
import csv
import datetime

import numpy

with open(_INPUT_FILE_PATH, "r") as fin:
  reader = csv.reader(fin)
  # Skip the three header rows of the NuPIC input file format.
  next(reader)
  next(reader)
  next(reader)
  for count, record in enumerate(reader):
    # Convert the date string into a Python datetime object.
    dateString = datetime.datetime.strptime(record[0], "%m/%d/%y %H:%M")
    # Convert the data value string into a float.
    consumption = float(record[1])

    # To encode, we need to provide zero-filled numpy arrays for the encoders
    # to populate.
    timeOfDayBits = numpy.zeros(timeOfDayEncoder.getWidth())
    weekendBits = numpy.zeros(weekendEncoder.getWidth())
    consumptionBits = numpy.zeros(scalarEncoder.getWidth())

    # Now we call the encoders to create bit representations for each value.
    timeOfDayEncoder.encodeIntoArray(dateString, timeOfDayBits)
    weekendEncoder.encodeIntoArray(dateString, weekendBits)
    scalarEncoder.encodeIntoArray(consumption, consumptionBits)

    # Concatenate all these encodings into one large encoding for Spatial
    # Pooling.
    encoding = numpy.concatenate(
      [timeOfDayBits, weekendBits, consumptionBits]
    )

    # Print the complete encoding to the console as a binary representation.
    print(encoding.astype("int16"))
Each encoding will print to the console and look something like this:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
By visually inspecting this output, you can see the runs of contiguous ON bits representing the different encodings for time of day and weekend. Near the bottom of the encoding are the bits representing the scalar data, distributed throughout the space by the RandomDistributedScalarEncoder.
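
You may wonder how randomly distributed bits can still carry meaning. The RandomDistributedScalarEncoder gives values that fall into the same bucket (the bucket width is the resolution, 0.88 here) identical encodings, and nearby buckets share many ON bits. Here is a minimal sketch, not part of the example program, that encodes two nearby consumption values and counts the shared bits:

valueA = numpy.zeros(scalarEncoder.getWidth())
valueB = numpy.zeros(scalarEncoder.getWidth())
scalarEncoder.encodeIntoArray(41.5, valueA)
scalarEncoder.encodeIntoArray(43.0, valueB)

# Count the ON bits the two encodings have in common. The overlap shrinks
# as the encoded values move further apart.
overlap = int(numpy.sum(numpy.logical_and(valueA, valueB)))
print("{} of {} on bits overlap".format(overlap, int(numpy.sum(valueA))))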
Spatial Pooling¶
Now that we have data encoded into a binary format with semantic meaning, we can pass each encoding into the Spatial Pooling algorithm.
Creating the SP¶
First, we must identify parameters for the creation of the SpatialPooler instance. We will be using the same parameters identified in the OPF Quick Start document’s model parameters (see the spParams section).
from nupic.algorithms.spatial_pooler import SpatialPooler

encodingWidth = timeOfDayEncoder.getWidth() \
  + weekendEncoder.getWidth() \
  + scalarEncoder.getWidth()

sp = SpatialPooler(
  # How large the input encoding will be.
  inputDimensions=(encodingWidth,),
  # How many mini-columns will be in the Spatial Pooler.
  columnDimensions=(2048,),
  # What percent of a column's receptive field is available for potential
  # synapses?
  potentialPct=0.85,
  # This means that the input space has no topology.
  globalInhibition=True,
  localAreaDensity=-1.0,
  # Roughly 2%, given that there is only one inhibition area because we have
  # turned on globalInhibition (40 / 2048 = 0.0195).
  numActiveColumnsPerInhArea=40.0,
  # How quickly synapses grow and degrade.
  synPermInactiveDec=0.005,
  synPermActiveInc=0.04,
  synPermConnected=0.1,
  # boostStrength controls the strength of boosting. Boosting encourages
  # efficient usage of SP columns.
  boostStrength=3.0,
  # Random number generator seed.
  seed=1956,
  # Determines if inputs at the beginning and end of an input dimension should
  # be considered neighbors when mapping columns to inputs.
  wrapAround=False
)
Running the SP¶
The SpatialPooler instance should be created once, and each encoded row of data passed into its compute() function.
# Create an array to represent active columns, all initially zero. This
# will be populated by the compute method below. It must have the same
# dimensions as the Spatial Pooler.
activeColumns = numpy.zeros(2048)
# Execute Spatial Pooling algorithm over input space.
sp.compute(encoding, True, activeColumns)
activeColumnIndices = numpy.nonzero(activeColumns)[0]
print(activeColumnIndices)
This will print out the indices of the active mini-columns in the SP at each time step, and will look something like this:
[ 929 932 938 939 940 941 942 943 944 945 946 949 950 951 953
955 956 957 958 960 961 962 964 965 966 968 969 970 971 973
974 975 977 978 979 980 1105 1114 1120 1129]
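
Because we enabled global inhibition with numActiveColumnsPerInhArea=40, every time step should activate 40 of the 2,048 mini-columns, or roughly 2% of them. A quick check (an illustrative addition, not part of the example program):

sparsity = len(activeColumnIndices) / 2048.0
print("{} active columns ({:.2%} of all columns)".format(
  len(activeColumnIndices), sparsity))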
Temporal Memory¶
The TemporalMemory algorithm works within the active mini-columns created by the SpatialPooler. Given a list of active columns, it performs sequence memory operations by activating individual cells within each mini-column structure.
Creating the TM¶
Just like the SP, we must create an instance of the TemporalMemory with the parameters we identified in the OPF Quick Start document’s model parameters (see the tmParams section).
from nupic.algorithms.temporal_memory import TemporalMemory

tm = TemporalMemory(
  # Must be the same dimensions as the SP.
  columnDimensions=(2048,),
  # How many cells in each mini-column.
  cellsPerColumn=32,
  # A segment is active if it has >= activationThreshold connected synapses
  # that are active due to infActiveState.
  activationThreshold=16,
  initialPermanence=0.21,
  connectedPermanence=0.5,
  # Minimum number of active synapses for a segment to be considered during
  # search for the best-matching segments.
  minThreshold=12,
  # The max number of synapses added to a segment during learning.
  maxNewSynapseCount=20,
  permanenceIncrement=0.1,
  permanenceDecrement=0.1,
  predictedSegmentDecrement=0.0,
  maxSegmentsPerCell=128,
  maxSynapsesPerSegment=32,
  seed=1960
)
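
One detail this example never needs, because the hot gym data is a single continuous stream: if your input contained several independent sequences, you would tell the TM where one ends and the next begins, so that it does not learn spurious transitions across the boundary. A sketch:

# Call this between independent sequences. It clears all cell activity
# without erasing anything the TM has learned.
tm.reset()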
Running the TM¶
Now, after SpatialPooler.compute(), we will run the TemporalMemory.compute() function using the active columns presented by the SP. Then we can call TemporalMemory.getActiveCells() to return the indices of the active cells within the structure. All active cells will fall within active mini-columns.
# Execute Temporal Memory algorithm over active mini-columns.
tm.compute(activeColumnIndices, learn=True)
activeCells = tm.getActiveCells()
print(activeCells)
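
Since all active cells fall within active mini-columns, you can sanity-check the relationship by mapping each cell index back to its column with the TemporalMemory.columnForCell() helper (an illustrative addition, not part of the example program):

# The set of columns containing active cells should match the set of
# columns the Spatial Pooler activated at this time step.
cellColumns = set(tm.columnForCell(cell) for cell in activeCells)
print(sorted(cellColumns) == sorted(activeColumnIndices.tolist()))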
Printing activeCells gives us the index of every active cell in the HTM structure. At each time step, the output looks something like this:
[13817, 13856, 14003, 14078, 14104, 14159, 14259, 14313, 14351, 14377, 14415,
14524, 14563, 14594, 14652, 14686, 14715, 14763, 14863, 14950, 15037, 15053,
15107, 15209, 15258, 15412, 15449, 35008, 35009, 35010, 35011, 35012, 35013,
35014, 35015, 35016, 35017, 35018, 35019, 35020, 35021, 35022, 35023, 35024,
35025, 35026, 35027, 35028, 35029, 35030, 35031, 35032, 35033, 35034, 35035,
35036, 35037, 35038, 35039, 35040, 35041, 35042, 35043, 35044, 35045, 35046,
35047, 35048, 35049, 35050, 35051, 35052, 35053, 35054, 35055, 35056, 35057,
35058, 35059, 35060, 35061, 35062, 35063, 35064, 35065, 35066, 35067, 35068,
35069, 35070, 35071, 35360, 35361, 35362, 35363, 35364, 35365, 35366, 35367,
35368, 35369, 35370, 35371, 35372, 35373, 35374, 35375, 35376, 35377, 35378,
35379, 35380, 35381, 35382, 35383, 35384, 35385, 35386, 35387, 35388, 35389,
35390, 35391, 35424, 35425, 35426, 35427, 35428, 35429, 35430, 35431, 35432,
35433, 35434, 35435, 35436, 35437, 35438, 35439, 35440, 35441, 35442, 35443,
35444, 35445, 35446, 35447, 35448, 35449, 35450, 35451, 35452, 35453, 35454,
35455, 35488, 35489, 35490, 35491, 35492, 35493, 35494, 35495, 35496, 35497,
35498, 35499, 35500, 35501, 35502, 35503, 35504, 35505, 35506, 35507, 35508,
35509, 35510, 35511, 35512, 35513, 35514, 35515, 35516, 35517, 35518, 35519,
35584, 35585, 35586, 35587, 35588, 35589, 35590, 35591, 35592, 35593, 35594,
35595, 35596, 35597, 35598, 35599, 35600, 35601, 35602, 35603, 35604, 35605,
35606, 35607, 35608, 35609, 35610, 35611, 35612, 35613, 35614, 35615, 35616,
35617, 35618, 35619, 35620, 35621, 35622, 35623, 35624, 35625, 35626, 35627,
35628, 35629, 35630, 35631, 35632, 35633, 35634, 35635, 35636, 35637, 35638,
35639, 35640, 35641, 35642, 35643, 35644, 35645, 35646, 35647, 35648, 35649,
35650, 35651, 35652, 35653, 35654, 35655, 35656, 35657, 35658, 35659, 35660,
35661, 35662, 35663, 35664, 35665, 35666, 35667, 35668, 35669, 35670, 35671,
35672, 35673, 35674, 35675, 35676, 35677, 35678, 35679, 35840, 35841, 35842,
35843, 35844, 35845, 35846, 35847, 35848, 35849, 35850, 35851, 35852, 35853,
35854, 35855, 35856, 35857, 35858, 35859, 35860, 35861, 35862, 35863, 35864,
35865, 35866, 35867, 35868, 35869, 35870, 35871, 35936, 35937, 35938, 35939,
35940, 35941, 35942, 35943, 35944, 35945, 35946, 35947, 35948, 35949, 35950,
35951, 35952, 35953, 35954, 35955, 35956, 35957, 35958, 35959, 35960, 35961,
35962, 35963, 35964, 35965, 35966, 35967, 35968, 35969, 35970, 35971, 35972,
35973, 35974, 35975, 35976, 35977, 35978, 35979, 35980, 35981, 35982, 35983,
35984, 35985, 35986, 35987, 35988, 35989, 35990, 35991, 35992, 35993, 35994,
35995, 35996, 35997, 35998, 35999, 36352, 36353, 36354, 36355, 36356, 36357,
36358, 36359, 36360, 36361, 36362, 36363, 36364, 36365, 36366, 36367, 36368,
36369, 36370, 36371, 36372, 36373, 36374, 36375, 36376, 36377, 36378, 36379,
36380, 36381, 36382, 36383, 36576, 36577, 36578, 36579, 36580, 36581, 36582,
36583, 36584, 36585, 36586, 36587, 36588, 36589, 36590, 36591, 36592, 36593,
36594, 36595, 36596, 36597, 36598, 36599, 36600, 36601, 36602, 36603, 36604,
36605, 36606, 36607]
That’s a lot of active cells! But remember that our structure contains 65,536 cells in total (2,048 columns, each with 32 cells), so these active cells still represent at most about 2% of the total number of cells.
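
To make that percentage concrete, here is a quick back-of-the-envelope check (an illustrative addition; the exact count varies per time step, because columns whose cells were correctly predicted activate fewer than 32 cells):

totalCells = 2048 * 32   # 65,536 cells in the whole structure
burstBound = 40 * 32     # upper bound: every active column fully active
print("upper bound: {:.2%}".format(float(burstBound) / totalCells))
print("this step:   {:.2%}".format(float(len(activeCells)) / totalCells))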
Predictive Cells¶
The TemporalMemory interface has many methods of getting cellular state information. In the section above, we used the TemporalMemory.getActiveCells() function to get the indices of the active cells. We can also get predictive cells by calling TemporalMemory.getPredictiveCells(), which returns an array of indices of cells in a depolarized, or predictive, state.
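
For example, immediately after tm.compute() you can inspect how much the TM currently predicts (an illustrative addition, not part of the example program):

# Cells in the predictive state are depolarized: they will activate ahead
# of others if their column receives input at the next time step.
predictiveCells = tm.getPredictiveCells()
print("{} cells are in the predictive state".format(len(predictiveCells)))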
Getting Predictions¶
In order to associate the predictive cells in the TM with an input pattern, we use a non-biological method of classification. This requires that we add a classifier to do this work; we will be using the SDRClassifier. The goal is to extract a prediction for the value of consumption that was passed into the system.
Creating an SDR Classifier¶
We will use the SDRClassifierFactory for this, with the default factory settings.
from nupic.algorithms.sdr_classifier_factory import SDRClassifierFactory
classifier = SDRClassifierFactory.create()
Running the Classifier¶
In order to call SDRClassifier.compute() on the classifier, we need to pass it both the actual consumption value and the bucketIdx (bucket index), which we can get from the encoder itself. This allows the classifier to map predictions back to previously seen values.
# Get the bucket info for this input value for classification.
bucketIdx = scalarEncoder.getBucketIndices(consumption)[0]

# Run classifier to translate active cells back to scalar value.
classifierResult = classifier.compute(
  recordNum=count,
  patternNZ=activeCells,
  classification={
    "bucketIdx": bucketIdx,
    "actValue": consumption
  },
  learn=True,
  infer=True
)

# Print the best prediction for 1 step out.
probability, value = sorted(
  zip(classifierResult[1], classifierResult["actualValues"]),
  reverse=True
)[0]
print("1-step: {:16} ({:4.4}%)".format(value, probability * 100))
The classifierResult contains predicted values. Running the code above will print the best prediction and its associated probability to the console:
1-step: 5.11804943107 (52.12%)
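
The single best guess is not everything the classifier knows. classifierResult[1] holds a probability for every bucket seen so far, and classifierResult["actualValues"] holds a representative value for each bucket, so you can also inspect the rest of the 1-step distribution (an illustrative addition, not part of the example program):

# Show the five most likely predictions with their probabilities.
allPredictions = sorted(
  zip(classifierResult[1], classifierResult["actualValues"]),
  reverse=True
)
for probability, value in allPredictions[:5]:
  print("  {:16} => {:4.4}%".format(value, probability * 100))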
Congratulations! You’ve got HTM predictions for a scalar data stream!