What is it that you would like to test on the site? The content itself? One option is to use regular expressions, as dlai was getting at. Another is to run the page through a function that strips out all formatting, HTML, etc., reducing it to a single string of text with no spaces, no line breaks, and so on. Comparing that string against the expected copy lets you verify that the correct text is being displayed. From there, work backwards and try to verify the formatting of the document, if that is part of your requirements.
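Here's a minimal sketch of the "strip everything, then compare" idea in Python, using only the standard library's `html.parser` and `re`. The function names `normalize_page_text` and `text_matches` are hypothetical, and it assumes you already have the page's HTML as a string (how you fetch it depends on your tooling).

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>, whose text is not visible

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def normalize_page_text(html: str) -> str:
    """Hypothetical helper: strip tags, then remove all whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    return re.sub(r"\s+", "", text)


def text_matches(html: str, expected: str) -> bool:
    """Hypothetical helper: compare page text to expected copy, ignoring markup and whitespace."""
    return normalize_page_text(html) == re.sub(r"\s+", "", expected)


page = "<html><body><h1>Welcome</h1>\n<p>Hello, <b>world</b>!</p></body></html>"
print(normalize_page_text(page))  # WelcomeHello,world!
print(text_matches(page, "Welcome\nHello, world!"))  # True
```

Removing every space is deliberately blunt: it makes the content check immune to markup and layout changes, but it also means "foo bar" and "foobar" compare equal. Spacing and layout problems are the kind of thing you would catch in the later formatting pass rather than here.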
Try to get your step-by-step process for each case down on paper in the simplest form possible. Then evaluate what you can do with those steps. What can you accomplish easily, what is difficult, what is impossible/improbable? It might be easier for you to automate smaller tasks first and leave the more difficult ones to be built over time, as your automation matures a little more. What is feasible to do now?