
Stud Farm

By: Hugin Expert

In the stud farm example, a Bayesian network is used to calculate the probability that each horse in a stud farm is a carrier of a recessive gene causing a life-threatening disease.


A Constructed Example from a Stud Farm

The stallion Alan has sired Betsy with the mare Ann and Benny with the mare Alice. Betsy has borne Carl by Bill, and Benny has sired Cecily with Bonnie. Both Bill and Bonnie were born to Ann, but their sires (A1 and A2) are in no way related. Carl and Cecily have just had a colt, Dennis.


Figure 1: Dennis's genealogy

It turns out that Dennis suffers from a life-threatening hereditary disease carried by a recessive gene a. The corresponding dominant gene is A. The disease is so serious that Dennis is put down immediately, and as the stud farm wants the gene out of production, Carl and Cecily are taken out of breeding: since Dennis is aa, both of them must be carriers of the gene, that is, have genotype Aa.

Now the problem is: Which other horses are to be taken out of breeding? Bonnie is a very fine mare, whereas Alan can be replaced more easily in the production. What will the stud farm be best off doing? It would be nice to know the probabilities of each of the horses being a carrier of the sick gene. Normally the probability of being a carrier is known to be 0.01.

Bayesian Networks

The domain of the inheritance of genes in the stud farm can easily be modeled by a Bayesian network (BN). Actually, the genealogy in figure 1 only needs a conditional probability table (CPT) on each node to be a BN. First we specify the states of the nodes: All horses except Dennis are either carriers (Aa) or not (AA) since none of them are sick. We give them states "AA" and "Aa". Each of the nodes in the upper layer in figure 1 has the CPT shown in table 1. The others except for Dennis have the CPT shown in table 2. Dennis has the CPT shown in table 3. In table 2, the column for two carrier parents reads 0.33/0.67 rather than the Mendelian 0.25/0.50/0.25 because the offspring is known not to be sick: the aa outcome is excluded and the remaining probabilities are renormalized.

             Alan="AA"   Alan="Aa"
             0.99        0.01

Table 1: CPT of the nodes in the upper layer (Alan used as an example).

             Alan="AA"               Alan="Aa"
             Ann="AA"    Ann="Aa"    Ann="AA"    Ann="Aa"
Betsy="AA"   1.00        0.50        0.50        0.33
Betsy="Aa"   0.00        0.50        0.50        0.67

Table 2: CPT of the nodes in the middle layers (Betsy used as an example).

             Cecily="AA"             Cecily="Aa"
             Carl="AA"   Carl="Aa"   Carl="AA"   Carl="Aa"
Dennis="AA"  1.00        0.50        0.50        0.25
Dennis="Aa"  0.00        0.50        0.50        0.50
Dennis="aa"  0.00        0.00        0.00        0.25

Table 3: CPT of the node Dennis: P(Dennis | Carl, Cecily).
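
The entries of tables 2 and 3 can be reproduced from Mendelian inheritance. The following is a minimal sketch, not part of the Hugin sample: it assumes each parent passes on one of its two alleles with probability 1/2, and the function names (gametes, offspring_distribution, healthy_child_cpt) are made up for this illustration.

```python
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """Probability of passing on each allele of a genotype such as 'Aa'."""
    probs = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, Fraction(0)) + Fraction(1, 2)
    return probs


def offspring_distribution(father, mother):
    """Mendelian distribution over the child's genotype (the shape of table 3)."""
    dist = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
    for (fa, pf), (ma, pm) in product(gametes(father).items(), gametes(mother).items()):
        child = "".join(sorted(fa + ma))  # 'A' sorts before 'a', so 'aA' becomes 'Aa'
        dist[child] += pf * pm
    return dist


def healthy_child_cpt(father, mother):
    """Exclude the sick outcome 'aa' and renormalize (the shape of table 2)."""
    dist = offspring_distribution(father, mother)
    healthy = dist["AA"] + dist["Aa"]
    return {g: dist[g] / healthy for g in ("AA", "Aa")}


if __name__ == "__main__":
    for father, mother in product(("AA", "Aa"), repeat=2):
        print(father, mother,
              offspring_distribution(father, mother),
              healthy_child_cpt(father, mother))
```

For two carrier parents the renormalized values are exactly 1/3 and 2/3, which table 2 rounds to 0.33 and 0.67.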

This BN has been implemented using the Hugin GUI in less than half an hour. Then, the evidence that Dennis is aa is entered and sum propagation is performed. The result is shown in figure 2.


Figure 2: The probabilities of the horses being carriers (Aa) of the sick gene.
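
The network is small enough that the same posterior probabilities can be checked without Hugin by summing the joint distribution over all configurations. The sketch below is only an illustration under stated assumptions: it uses brute-force enumeration rather than Hugin's propagation algorithm, it encodes the genealogy as described in the text, it uses the exact values 1/3 and 2/3 where table 2 shows 0.33 and 0.67, and all names (PARENTS, joint, posterior_carrier) are invented here.

```python
from itertools import product

PRIOR_CARRIER = 0.01  # prior probability of being a carrier, as stated in the text

FOUNDERS = ["A1", "Ann", "Alan", "Alice", "A2"]  # upper layer of figure 1

# child -> (sire, dam), read off the genealogy described in the text
PARENTS = {
    "Betsy": ("Alan", "Ann"),
    "Benny": ("Alan", "Alice"),
    "Bill": ("A1", "Ann"),
    "Bonnie": ("A2", "Ann"),
    "Carl": ("Bill", "Betsy"),
    "Cecily": ("Benny", "Bonnie"),
    "Dennis": ("Carl", "Cecily"),
}

# Table 2, indexed by the number of carrier parents (assumption: exact thirds)
HEALTHY_CPT = {
    0: {"AA": 1.0, "Aa": 0.0},
    1: {"AA": 0.5, "Aa": 0.5},
    2: {"AA": 1 / 3, "Aa": 2 / 3},
}

# Table 3, indexed the same way
DENNIS_CPT = {
    0: {"AA": 1.0, "Aa": 0.0, "aa": 0.0},
    1: {"AA": 0.5, "Aa": 0.5, "aa": 0.0},
    2: {"AA": 0.25, "Aa": 0.5, "aa": 0.25},
}

STATES = {name: ("AA", "Aa") for name in FOUNDERS + list(PARENTS)}
STATES["Dennis"] = ("AA", "Aa", "aa")


def joint(assign):
    """Joint probability of one complete assignment of genotypes."""
    p = 1.0
    for horse in FOUNDERS:
        p *= PRIOR_CARRIER if assign[horse] == "Aa" else 1.0 - PRIOR_CARRIER
    for child, (sire, dam) in PARENTS.items():
        carriers = (assign[sire] == "Aa") + (assign[dam] == "Aa")
        table = DENNIS_CPT if child == "Dennis" else HEALTHY_CPT
        p *= table[carriers][assign[child]]
    return p


def posterior_carrier(evidence):
    """P(horse = 'Aa' | evidence) for every node, by exhaustive enumeration."""
    names = list(STATES)
    totals = {name: 0.0 for name in names}
    norm = 0.0
    for combo in product(*(STATES[name] for name in names)):
        assign = dict(zip(names, combo))
        if any(assign[k] != v for k, v in evidence.items()):
            continue  # configuration contradicts the evidence
        p = joint(assign)
        norm += p
        for name in names:
            if assign[name] == "Aa":
                totals[name] += p
    return {name: totals[name] / norm for name in names}


if __name__ == "__main__":
    for horse, prob in posterior_carrier({"Dennis": "aa"}).items():
        print(f"{horse:7s} {prob:.4f}")
```

Passing additional evidence, for example posterior_carrier({"Dennis": "aa", "Alan": "Aa"}), gives the scenario in which Alan is assumed to be a carrier. Enumeration visits a few thousand configurations here and grows exponentially with the number of nodes, so it is only a cross-check for small networks.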

In figure 2, we can see that it is very likely that Betsy is a carrier of the sick gene. Both her parents (Ann and Alan) also have a high probability of being carriers. However, a more thorough investigation shows that it is very unlikely that both of them are carriers at the same time. In figure 3 we see that if Alan is known to be a carrier, it becomes most unlikely that Ann is also a carrier. This is because each copy of the sick gene is inherited from only one parent, so a single carrier among the ancestors is enough to explain the evidence. The figure shows the gene passing from Alan to Betsy and Benny, and from them to Carl and Cecily.

The conclusion to be drawn from the results depends very much on how sure the farmer wants to be of getting the sick gene out of production. He can never be absolutely sure that he gets rid of the right horses, but he should at least get rid of Betsy, Ann and Bonnie. If he also wants to get rid of Alan because he is easily replaced, this will have no effect unless he also gets rid of Benny, since Benny has probably inherited the sick gene if Alan has it.


Figure 3: If we assume that Alan carries the sick gene, Ann is probably not a carrier.

This network has been installed on your computer with the Hugin software. Open the network in the Hugin GUI. You can find the network in the Samples subdirectory of your Hugin installation.

Comments

A long list of areas have essential characteristics in common with the above example, e.g. medical diagnosis and treatment, credit valuation of customers, search for minerals, monitoring of biological production plants, image understanding, information retrieval and fault analysis.

The areas are characterized by a cause-effect structure, where effects are not completely determined. Sometimes an event has one effect and sometimes it has another. This phenomenon is called causal uncertainty. A domain characterized by causal uncertainty can be modeled by a BN.

Another characteristic of the areas is that a number of essential properties cannot be observed directly. This is the diagnosis problem: You know only the symptoms, and from them you must conclude the causes. You must, so to speak, reason against the direction of the arrows in the network.
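
Reasoning against the arrows is an application of Bayes' rule. As a sketch, with symbols introduced here only for illustration, let C be an unobserved cause and s an observed symptom. The network stores P(s | C) along the arrow, and the diagnosis is obtained as:

```latex
P(C = c \mid s) \;=\; \frac{P(s \mid C = c)\, P(C = c)}{\sum_{c'} P(s \mid C = c')\, P(C = c')}
```

In the stud farm example, s corresponds to the observation Dennis = aa and C to the genotype of any of his ancestors.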