https://noisebridge.net/index.php?title=Machine_Learning/HMM_R_Example&feed=atom&action=historyMachine Learning/HMM R Example - Revision history2015-07-28T03:36:07ZRevision history for this page on the wikiMediaWiki 1.19.1https://noisebridge.net/index.php?title=Machine_Learning/HMM_R_Example&diff=12250&oldid=prevThomasLotze: Created page with 'Examples of using HMM R packages, based on the model in "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.7020&rep=rep1&type=pdf A Bayes Net Toolkit for Student Modelâ€¦'2010-08-05T04:22:40Z<p>Created page with 'Examples of using HMM R packages, based on the model in "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.7020&rep=rep1&type=pdf A Bayes Net Toolkit for Student Modelâ€¦'</p>
<p><b>New page</b></p><div>Examples of using HMM R packages, based on the model in "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.7020&rep=rep1&type=pdf A Bayes Net Toolkit for Student Modeling in Intelligent Tutoring Systems]" by Chang, et. al. We're trying to come up with an estimate for how well each student knows a certain area of knowledge (which we're calling a skill). We observe each student's performance on answering some number of questions that use this skill, and mark whether they got them correct or incorrect.<br />
<br />
We assume that at each time point, a student is in one of two states: either they "know" the skill, or they "do not know" the skill. If they know they skill, they are more likely to generate a correct output; if not, they are less likely; but in each case, it is stochastic (a student has a probability of guessing the correct answer even if they don't know the skill, and of slipping/getting it wrong even if they do know the skill). Between each time point, there is a transition probability from know -> don't know (forgetting, which Change et. al constrain to 0) and from don't know -> know (learning). Finally, there is a probability that the student enters already knowing the skill. So we have five parameters: two transition probabilities (learn and forget), two outcome probabilities based on state (guess and slip), and initial state probabilities (already know).<br />
<br />
The data (student_outcomes.csv) is for a single skill, measuring various students' performance on that skill: a series of correct/incorrect responses, at various times. We're ignoring the time data for the moment (other than for ordering purposes), and trying to fit the HMM model. Once we have it, we can then figure out, for each student, an estimated likelihood of being in the "know" state at their last observed output.<br />
<br />
===hmm.discnp===<br />
<pre><br />
require("hmm.discnp")<br />
student_outcomes = read.csv("student_outcomes.csv", header=TRUE)<br />
<br />
# convert created_at from a string<br />
student_outcomes$created_at = as.POSIXct(as.character(student_outcomes$created_at))<br />
<br />
# remove users with few observations on this skill<br />
by_user = split(student_outcomes, student_outcomes$student_id)<br />
obs_by_user = sapply(by_user, nrow)<br />
valid_users = names(obs_by_user[obs_by_user > 10])<br />
student_outcomes = student_outcomes[student_outcomes$student_id %in% valid_users,]<br />
<br />
by_good_user = split(student_outcomes, student_outcomes$student_id)<br />
<br />
# attempt to estimate model parameters<br />
my_hmm = hmm(by_good_user, yval=c(0,1),<br />
par0=list(tpm=rbind(c(0.8,0.2),c(0.01,0.99)),<br />
Rho=rbind(c(0.75,0.25),c(0.25,0.75))),<br />
stationary=FALSE)<br />
if (!my_hmm$converged) {<br />
print(sprintf("Error! HMM did not converge for skill %s!", skill))<br />
} else {<br />
for (user_id in valid_users) {<br />
student_est = sp(correct_by_user[[user_id]], object = my_hmm, means=TRUE)<br />
print(sprintf("%s/%s: %f chance know, %f chance correct", skill, user_id, student_est$probs[2,ncol(student_est$probs)], student_est$means[length(student_est$means)]))<br />
# print(correct_by_user[[user_id]])<br />
}<br />
}<br />
# transition probability matrix<br />
my_hmm$tpm<br />
# output probabilities<br />
my_hmm$Rho<br />
# initial probabilities (don't know/know)<br />
my_hmm$ispd<br />
</pre><br />
<br />
===msm===<br />
<pre><br />
student_outcomes = read.csv("student_outcomes.csv", header=TRUE)<br />
<br />
# convert created_at from a string<br />
student_outcomes$created_at = as.POSIXct(as.character(student_outcomes$created_at))<br />
<br />
# remove users with few observations on this skill<br />
min_observations = 10<br />
by_user = split(student_outcomes, student_outcomes$student_id)<br />
obs_by_user = sapply(by_user, nrow)<br />
valid_users = names(obs_by_user[obs_by_user >= min_observations])<br />
student_outcomes = student_outcomes[student_outcomes$student_id %in% valid_users,]<br />
<br />
require("msm")<br />
# convert time to simple sequence<br />
student_outcomes$created_index = c(sapply(by_user, function(df) {1:nrow(df)}), recursive=TRUE)<br />
my_hmm = msm(correct ~ created_index, subject = student_id, data = student_outcomes,<br />
qmatrix = rbind(c(NA,0.25),c(0.25,NA)),<br />
hmodel = list(hmmBinom(1,0.3), hmmBinom(1,0.7)),<br />
obstype = 2,<br />
initprobs = c(0.5,0.5),<br />
est.initprobs = TRUE,<br />
method="BFGS"<br />
)<br />
# display final probability for each user<br />
for (user_id in valid_users) {<br />
student_est = estimate_knowledge(correct_by_user[[user_id]], my_msm)<br />
print(sprintf("%s/%s: %f chance know, %f chance correct", skill, user_id, student_est[["p_know"]], student_est[["p_correct"]]))<br />
print(correct_by_user[[user_id]])<br />
}<br />
<br />
</pre></div>ThomasLotze