
Online Prediction Framework (OPF)¶
See the OPF Guide for an overview of this API.
Here is the complete program we are going to use as an example. In sections below, we’ll break it down into parts and explain what is happening (without some of the plumbing details).
import csv
import datetime
import os
import yaml
from itertools import islice
from nupic.frameworks.opf.model_factory import ModelFactory
_NUM_RECORDS = 3000
_EXAMPLE_DIR = os.path.dirname(os.path.abspath(__file__))
_INPUT_FILE_PATH = os.path.join(_EXAMPLE_DIR, os.pardir, "data", "gymdata.csv")
_PARAMS_PATH = os.path.join(_EXAMPLE_DIR, os.pardir, "params", "model.yaml")
def createModel():
with open(_PARAMS_PATH, "r") as f:
modelParams = yaml.safe_load(f)
return ModelFactory.create(modelParams)
def runHotgym(numRecords):
model = createModel()
model.enableInference({"predictedField": "consumption"})
with open(_INPUT_FILE_PATH) as fin:
reader = csv.reader(fin)
headers = reader.next()
reader.next()
reader.next()
results = []
for record in islice(reader, numRecords):
modelInput = dict(zip(headers, record))
modelInput["consumption"] = float(modelInput["consumption"])
modelInput["timestamp"] = datetime.datetime.strptime(
modelInput["timestamp"], "%m/%d/%y %H:%M")
result = model.run(modelInput)
bestPredictions = result.inferences["multiStepBestPredictions"]
allPredictions = result.inferences["multiStepPredictions"]
oneStep = bestPredictions[1]
oneStepConfidence = allPredictions[1][oneStep]
fiveStep = bestPredictions[5]
fiveStepConfidence = allPredictions[5][fiveStep]
result = (oneStep, oneStepConfidence * 100,
fiveStep, fiveStepConfidence * 100)
print "1-step: {:16} ({:4.4}%)\t 5-step: {:16} ({:4.4}%)".format(*result)
results.append(result)
return results
if __name__ == "__main__":
runHotgym(_NUM_RECORDS)
Model Parameters¶
Before you can create an OPF model, you need to have model parameters defined in a file. These model parameters contain many details about how the HTM network will be constructed, what encoder configurations will be used, and individual algorithm parameters that can drastically affect how a model operates. The model parameters we’re using in this Quick Start can be found here.
To use model parameters, they can be written to a file and imported into
your script. In this example, our model parameters existing in a
YAML file called params.yaml
and are identical to
those linked above.
Create an OPF Model¶
The easiest way to create a model once you have access to model parameters is by
using the ModelFactory
.
import yaml
from nupic.frameworks.opf.model_factory import ModelFactory
_PARAMS_PATH = "/path/to/model.yaml"
with open(_PARAMS_PATH, "r") as f:
modelParams = yaml.safe_load(f)
model = ModelFactory.create(modelParams.MODEL_PARAMS)
# This tells the model the field to predict.
model.enableInference({'predictedField': 'consumption'})
The resulting model
will be an instance of HTMPredictionModel
.
Feed the Model Data¶
The raw input data file is described here in detail.
Our model parameters define how this data will be
encoded in the encoders
section:
# List of encoders and their parameters.
encoders:
consumption:
fieldname: consumption
name: consumption
resolution: 0.88
seed: 1
type: RandomDistributedScalarEncoder
timestamp_timeOfDay:
fieldname: timestamp
name: timestamp_timeOfDay
timeOfDay: [21, 1]
type: DateEncoder
timestamp_weekend:
fieldname: timestamp
name: timestamp_weekend
type: DateEncoder
weekend: 21
Notice that three semantic values are being encoded into the input space. The
first is the scalar energy consumption
value, which is being encoded with
the RandomDistributedScalarEncoder
. The next two values represent two
different aspects of time using the DateEncoder
. The encoder called
timestamp_timeOfDay
encodes the time of day, while the timestamp_weekend
encoder will output different representations for weekends vs weekdays. The
HTMPredictionModel
will combine these encodings using the
MultiEncoder
.
For details about encoding and how these encoders work, see the HTM School episodes on encoders.
Now that you see the raw input data and how it is configured to be encoded into
binary arrays for the HTM to process it, let’s see the code that actually reads
the CSV data file and runs it through our model
.
import csv
import datetime
# Open the file to loop over each row
with open ("gymdata.csv") as fileIn:
reader = csv.reader(fileIn)
# The first three rows are not data, but we'll need the field names when
# passing data into the model.
headers = reader.next()
reader.next()
reader.next()
for record in reader:
# Create a dictionary with field names as keys, row values as values.
modelInput = dict(zip(headers, record))
# Convert string consumption to float value.
modelInput["consumption"] = float(modelInput["consumption"])
# Convert timestamp string to Python datetime.
modelInput["timestamp"] = datetime.datetime.strptime(
modelInput["timestamp"], "%m/%d/%y %H:%M"
)
# Push the data into the model and get back results.
result = model.run(modelInput)
Extract the results¶
In the classifier configuration of our model parameters,
identified as modelParams.clParams
, the steps
value tells the model how
many steps into the future to predict. In this case, we are predicting both one
and five steps into the future as shown by the value 1,5
.
This means the results
object will have prediction information keyed by both
1
and 5
.
result = model.run(modelInput)
bestPredictions = result.inferences['multiStepBestPredictions']
allPredictions = result.inferences['multiStepPredictions']
oneStep = bestPredictions[1]
fiveStep = bestPredictions[5]
# Confidence values are keyed by prediction value in multiStepPredictions.
oneStepConfidence = allPredictions[1][oneStep]
fiveStepConfidence = allPredictions[5][fiveStep]
result = (oneStep, oneStepConfidence * 100,
fiveStep, fiveStepConfidence * 100)
print "1-step: {:16} ({:4.4}%)\t 5-step: {:16} ({:4.4}%)".format(*result)
As you can see in the example above, the results
object contains an
inferences
property that contains all the information about predictions.
This includes the following keys:
multiStepBestPredictions
: Contains information about the best prediction- that was returned for the last row of data.
multiStepPredictions
: Contains information about all predictions for the- last row of data, including confidence values for each prediction.
Each of these dictionaries should have a key corresponding to the steps ahead
for each prediction. In this example, we are retrieving predictions for both
1
and 5
steps ahead (which was defined in the Model Parameters).
In order to get both the best prediction as well as the confidence in the
prediction, we need to find the value for the best prediction from the
multiStepBestPredictions
structure, then use it to find the confidence in
the multiStepPredictions
(for both 1
and 5
step predictions).
When this example program is run, you can see both predictions and their confidences in the console output, which should look something like this:
1-step: 35.7 (65.53%) 5-step: 35.7 (99.82%)
1-step: 38.9 (65.73%) 5-step: 23.5 (99.82%)
1-step: 36.6 (99.11%) 5-step: 35.7 (99.81%)
1-step: 38.9 (85.73%) 5-step: 36.6 (99.96%)
1-step: 38.2 (89.59%) 5-step: 38.2 (92.61%)
Congratulations! You’ve got HTM predictions for a scalar data stream!