Telecom Paris
Dep. Informatique & Réseaux

Nils Holzenberger

April 2024

Neuro-Symbolic Artificial Intelligence
with Simon Coumes and Zacchary Sadeddine

Statistical Machine Learning in ProbLog


We saw how ProbLog programs can be used to express probabilistic models, and then to compute the probabilities of certain predicates. Sometimes the structure of the underlying mechanism is known, but not its exact parameters. Let’s start with the example below.







Dice

Alice and Bob like to roll dice. They play this simple game: each has two dice. Alice starts, rolls her pair of dice, and records the sum. Then it’s Bob’s turn: he rolls his pair of dice, also recording the sum. Whoever rolls the larger sum wins the round. Bob is using biased dice. While he would like his dice to always land on 6, that would be too obvious, so the bias is subtler, only slightly skewing the probabilities towards 6 and increasing the expected sum of both dice. Alice suspects this, and would like to estimate the bias of Bob’s dice, assuming both dice are biased in the same way.
The structure
Write ProbLog code that describes one throw of two unbiased dice. You should write two predicates:
1. die(+Id, -Value) where Id can be any integer, and Value is the result of rolling one die (so, an integer from 1 to 6).
2. roll(?Outcome) where Outcome is the result of one round of the game above: roll two dice, then sum their values. Outcome is an integer.

    
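For reference, here is one possible encoding, given as a sketch rather than the definitive answer. Although the exercise allows Id to be any integer, the between/3 call in the body grounds Id to the two dice of a pair, since ProbLog needs ground instances of its probabilistic facts:

% One throw of an unbiased die with identifier Id.
1/6::die(Id, 1); 1/6::die(Id, 2); 1/6::die(Id, 3);
1/6::die(Id, 4); 1/6::die(Id, 5); 1/6::die(Id, 6) :- between(1, 2, Id).

% One round: roll both dice and sum their values.
roll(Outcome) :-
    die(1, V1),
    die(2, V2),
    Outcome is V1 + V2.

query(roll(_)).

Querying roll(_) should return every outcome from 2 to 12 with its probability, e.g. 1/36 for outcome 2.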

Learnable parameters
Alice expects that both of Bob’s dice are biased, with the same bias. Knowing that t(_)::a. means that a is a probabilistic ProbLog atom with a learnable parameter, edit the previous die/2 predicate to have learnable probabilities.

    
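A sketch of one way to do it: replace each probability with a learnable parameter. Since the parameters are attached to the clause rather than to its individual groundings, both dice (Id = 1 and Id = 2) share the same learned probabilities, which matches Alice’s assumption that both of Bob’s dice have the same bias:

% Each outcome's probability is now a learnable parameter.
t(_)::die(Id, 1); t(_)::die(Id, 2); t(_)::die(Id, 3);
t(_)::die(Id, 4); t(_)::die(Id, 5); t(_)::die(Id, 6) :- between(1, 2, Id).

Writing t(1/6) instead of t(_) would initialize each parameter at 1/6 rather than leaving the initial value up to ProbLog.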







Hidden Markov Models in ProbLog

If you have been outside of a building sometime in the past month, you know that weather can change quickly. Some changes are predictable, some aren’t. One simple way of modeling the weather is to use a Hidden Markov Model. Here is a simple summary of what an HMM is.
Assume that there is an automaton with \(n\) states. It can shift from one state to another with a certain probability, or stay in its current state. These probabilities can be described by a matrix \(T\), where \(T_{i,j}\) is the probability that the automaton goes from state \(i\) to state \(j\). Naturally, \(\sum \limits_{j=1}^n T_{i,j} = 1\). In each state, the automaton can emit a symbol from a finite list of \(m\) symbols, with probability \(E_{i,j}\) where \(1 \leq i \leq n\) and \(1 \leq j \leq m\): \(E_{i,j}\) is the probability that the automaton emits symbol number \(j\) when in state number \(i\). Of course, \(\sum \limits_{j=1}^m E_{i,j} = 1\). The last ingredient of this automaton: it starts from state \(i\) with probability \(s_i\), with \(\sum \limits_{i=1}^n s_i = 1\).
An HMM starts in some state, then emits a symbol. Then, it moves on to the next state, and emits another symbol. Then, it moves to another state, and so on. This happens in discrete time, i.e. \(t=0, 1, 2, 3, ...\). We only observe the symbols that were emitted, without knowledge of the HMM’s state (hence the name Hidden Markov Model).

Describing an HMM using ProbLog


Let’s write an HMM that represents a fictitious weather model. In that fictitious model, there are 6 types of weather: 'cloudy-dry', 'sunshine', 'sunshine+rain', 'cloudy-rainy', 'fog' and 'foggy-rainy'. These are mutually exclusive; it cannot be both 'fog' and 'sunshine' at the same time. The HMM that generates these observable states of the weather has 3 hidden states (0, 1 and 2), respectively corresponding to dry, wet and funny weather.
First, the weather can start in state 0 or 1, each with probability .5: .5::start(0); .5::start(1). In the dry state, the weather can be 'cloudy-dry' with probability .3 or 'sunshine' with probability .7, but not both at the same time. We express this with the following ProbLog annotated disjunction: .3::emit(Time, 0, 'cloudy-dry'); .7::emit(Time, 0, 'sunshine'). In the funny state, the weather is invariably 'sunshine+rain', which we encode with this deterministic fact: emit(_, 2, 'sunshine+rain').
Wet weather
Write a probabilistic clause to express that in the wet state, the weather is 'cloudy-rainy' with probability \(\frac{4}{7}\), 'fog' with probability \(\frac{2}{7}\) and 'foggy-rainy' with probability \(\frac{1}{7}\).

    
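One possible answer, following the pattern of the emit clauses above; note that the three probabilities sum to 1, as required for an annotated disjunction over mutually exclusive outcomes:

4/7::emit(Time, 1, 'cloudy-rainy'); 2/7::emit(Time, 1, 'fog'); 1/7::emit(Time, 1, 'foggy-rainy').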


The weather changes from one hidden state to the next. To say that from state 0 (dry), the weather stays in state 0 with probability .5, goes to state 1 (wet) with probability .3, and to state 2 (funny) with probability .2, we use this annotated disjunction:
.5::trans(Time, 0, 0); .3::trans(Time, 0, 1); .2::trans(Time, 0, 2).

To say that from state 1, the weather stays in state 1 with probability .5, goes to state 0 with probability .3, and to state 2 with probability .2, we write:
.3::trans(Time, 1, 0); .5::trans(Time, 1, 1); .2::trans(Time, 1, 2).

Funny weather
Write a ProbLog clause expressing that from state 2, the weather stays in state 2 with probability .2, and goes to state 0 or 1 with probability .4 each.

    
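Again as a sketch, mirroring the trans clauses shown above (.4 + .4 + .2 = 1):

.4::trans(Time, 2, 0); .4::trans(Time, 2, 1); .2::trans(Time, 2, 2).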


Combine the code you have written -- together with what was shown above about start, trans and emit -- with this implementation of HMMs in ProbLog. The query query(observe_sequence([X1,X2,X3])). will return all possible sequences of 3 weather types, along with their probabilities. However, the number of possible sequences is \(m^t\), where \(t\) is the length of the sequence, so the query query(observe_sequence([X1,X2,X3,X4,X5,X6,X7,X8])). is unlikely to finish in a reasonable time on your laptop. It is, however, possible to sample from this program by running
problog sample filename.pl -N 100

You can also estimate the probabilities of 8-symbol sequences, using
problog sample filename.pl --estimate

and setting a maximum sampling time or a maximum number of samples -- very convenient for intractable problems.
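To give an idea of what the linked implementation does, here is a minimal sketch of an HMM unroller in ProbLog. It assumes the start/1, trans/3 and emit/3 clauses defined above; the actual linked file may differ in its details:

% The HMM picks a start state, then emits one symbol per time step.
observe_sequence(Symbols) :-
    start(State),
    observe_sequence(0, State, Symbols).

% No symbols left to emit: stop.
observe_sequence(_, _, []).
% Emit a symbol from the current state, transition, and recurse.
observe_sequence(Time, State, [Symbol|Rest]) :-
    emit(Time, State, Symbol),
    trans(Time, State, NextState),
    NextTime is Time + 1,
    observe_sequence(NextTime, NextState, Rest).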

Sampling
Try sampling from your program, and paste 10 samples of 8-symbol sequences into the answer form. You can make ProbLog print the probability of each sample using the flag --with-probability.
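For instance, assuming your program is saved as weather.pl (the file name here is just an example), the following command draws 10 samples and prints each with its probability:

problog sample weather.pl -N 10 --with-probability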

    

Here is a file containing samples from this weather program, formatted as evidence for parameter learning. Let’s see if we can recover the parameters of the HMM that was used to produce them.







Learning parameters

Our goal is to learn the parameters of an HMM, in order to fit the data sampled from the previous weather model. Of course, this is not something you would do in practice, since we already know those parameters, but it is a good exercise to get familiar with ProbLog’s parameter learning. Hopefully, ProbLog will be able to recover the parameters of the HMM used to produce the data.
Sanity check (1)
Take all the rules about start, trans and emit that we had so far, and change them so that their probabilities become learnable parameters. Paste those lines of code into the answer form.

    
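A possible rewrite, given as a sketch. Each probability becomes a learnable parameter written t(_), which lets ProbLog pick the initial value; the deterministic emit fact for the funny state stays as it is, since there is nothing to learn there:

t(_)::start(0); t(_)::start(1).

t(_)::emit(Time, 0, 'cloudy-dry'); t(_)::emit(Time, 0, 'sunshine').
t(_)::emit(Time, 1, 'cloudy-rainy'); t(_)::emit(Time, 1, 'fog'); t(_)::emit(Time, 1, 'foggy-rainy').
emit(_, 2, 'sunshine+rain').

t(_)::trans(Time, 0, 0); t(_)::trans(Time, 0, 1); t(_)::trans(Time, 0, 2).
t(_)::trans(Time, 1, 0); t(_)::trans(Time, 1, 1); t(_)::trans(Time, 1, 2).
t(_)::trans(Time, 2, 0); t(_)::trans(Time, 2, 1); t(_)::trans(Time, 2, 2).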

Now let’s learn those parameters. Take the code with the learnable parameters, combine it with weather-base.pl, and save the result as weather-learn.pl. Using evidence.pl, which contains samples from the weather program, run
problog lfi weather-learn.pl evidence.pl -O learned-model.pl -v

learned-model.pl contains the probabilities learned by ProbLog. Compare them to the original probabilities -- do they match?
To learn the parameters, ProbLog adjusts them so as to maximize the probability of the observed evidence. This is maximum likelihood estimation of the parameters. To achieve it, ProbLog uses expectation maximization.
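In symbols: writing \(\theta\) for the vector of learnable parameters and \(E_1, \ldots, E_K\) for the evidence samples, assumed independent, ProbLog looks for \(\hat{\theta} = \arg\max \limits_{\theta} \prod \limits_{k=1}^K P(E_k \mid \theta)\). Since the hidden states are never observed, this maximum cannot be computed in closed form, hence the use of expectation maximization.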

Sanity check (2)
Rewrite the clauses about start, trans and emit, so that
  • The HMM can start from any of the 3 states (0, 1, or 2)
  • The HMM can transition from any state to any other state
  • The HMM can emit any symbol from any state
  • All probabilities are learnable.
Paste those new clauses into the answer form. A possible encoding is sketched below.

    
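A possible encoding, sketched under the assumption that the emitted symbols are the six weather types listed earlier. The HMM is now fully connected: every state can be the start state, every state can transition to every state, and every state can emit every symbol, with all probabilities learnable:

t(_)::start(0); t(_)::start(1); t(_)::start(2).

t(_)::trans(Time, 0, 0); t(_)::trans(Time, 0, 1); t(_)::trans(Time, 0, 2).
t(_)::trans(Time, 1, 0); t(_)::trans(Time, 1, 1); t(_)::trans(Time, 1, 2).
t(_)::trans(Time, 2, 0); t(_)::trans(Time, 2, 1); t(_)::trans(Time, 2, 2).

t(_)::emit(Time, 0, 'cloudy-dry'); t(_)::emit(Time, 0, 'sunshine');
t(_)::emit(Time, 0, 'sunshine+rain'); t(_)::emit(Time, 0, 'cloudy-rainy');
t(_)::emit(Time, 0, 'fog'); t(_)::emit(Time, 0, 'foggy-rainy').

t(_)::emit(Time, 1, 'cloudy-dry'); t(_)::emit(Time, 1, 'sunshine');
t(_)::emit(Time, 1, 'sunshine+rain'); t(_)::emit(Time, 1, 'cloudy-rainy');
t(_)::emit(Time, 1, 'fog'); t(_)::emit(Time, 1, 'foggy-rainy').

t(_)::emit(Time, 2, 'cloudy-dry'); t(_)::emit(Time, 2, 'sunshine');
t(_)::emit(Time, 2, 'sunshine+rain'); t(_)::emit(Time, 2, 'cloudy-rainy');
t(_)::emit(Time, 2, 'fog'); t(_)::emit(Time, 2, 'foggy-rainy').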

Parameter learning
Use the learnable HMM from the previous question to learn the parameters of the weather model:
problog lfi weather-learn.pl evidence.pl -O learned-model.pl -v -n 500

This command will run up to 500 learning iterations, which should take about 10 minutes. Take that opportunity to get up and stretch your legs. You can find the learned parameters in learned-model.pl. Compare the learned parameters to the parameters of the original model. What do you notice?

    

Sleep

We have worked with an HMM whose transition and emission probabilities are fixed over time, i.e. the matrices \(T\) and \(E\) are the same at every time step. For an example of probabilities that change over time, see this simplified model of sleep I coded.






