Is this the old query-and-reply style of software from the '70s and '80s, or more like a wizard? Either way, I would approach it by baselining the replies to specific queries using data-driven techniques, and then, for future versions, comparing the current results against that baseline. Of course, any links or special GUI functions would have to be tested separately.
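To make the baselining idea concrete, here is a minimal sketch in Python. It assumes the system under test can be wrapped in a function that takes a query string and returns a reply string; the names `record_baseline`, `compare_to_baseline`, and the `answer` callable are hypothetical and not taken from any particular tool.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple


def record_baseline(answer: Callable[[str], str],
                    queries: Iterable[str],
                    path: Path) -> None:
    """Run every query once and store the replies as the baseline file."""
    baseline = {query: answer(query) for query in queries}
    path.write_text(json.dumps(baseline, indent=2, sort_keys=True))


def compare_to_baseline(answer: Callable[[str], str],
                        path: Path) -> Dict[str, Tuple[str, str]]:
    """Re-run the baselined queries and return only the replies that changed.

    The result maps each changed query to (baseline reply, current reply).
    An empty dict means the current version matches the baseline.
    """
    baseline = json.loads(path.read_text())
    differences = {}
    for query, expected in baseline.items():
        actual = answer(query)
        if actual != expected:
            differences[query] = (expected, actual)
    return differences
```

The queries themselves would come from a data file (that is the data-driven part), so adding a case means adding a row rather than writing new test code. Any difference reported still needs a person to decide whether it is a regression or an intended change, in which case the baseline gets re-recorded.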
A complete system test would mean examining every imaginable situation (case). An exhaustive test like this would be ideal, but unfortunately it is infeasible. Even a quasi-exhaustive test would be practically impossible.
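A quick back-of-the-envelope calculation shows why. The sketch below assumes, purely for illustration, that the system has a fixed number of independent inputs, each with a known number of possible values, and that one test case takes one second to run; none of these figures come from a real system.

```python
from math import prod
from typing import Sequence

SECONDS_PER_YEAR = 365 * 24 * 3600


def exhaustive_case_count(values_per_input: Sequence[int]) -> int:
    """Number of cases needed to cover every combination of input values."""
    return prod(values_per_input)


def years_to_run(cases: int, seconds_per_case: float = 1.0) -> float:
    """Time to run all cases back to back, in years."""
    return cases * seconds_per_case / SECONDS_PER_YEAR


# Hypothetical system: 10 inputs with 10 possible values each.
cases = exhaustive_case_count([10] * 10)   # 10,000,000,000 combinations
print(cases, round(years_to_run(cases)))   # about 317 years at 1 s per case
```

Since the count is a product, every extra input multiplies it again, which is why a selected set of representative cases is the practical alternative.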