"Hidden" Information Stored in Word Documents


CSI Staff
Staff Writer
Center for Support of Instruction
Published: 0 2003


Computer security is all over the news. With hardware and software security on everyone's mind, it is easy to overlook the problems a simple Word document can cause.

Word "macro" viruses have made the news in recent years, but many people don't realize that even a standard, virus-free Word document sent as an attachment contains more information about its author and its history than most people would expect or be comfortable with.

Take any Word document you've been working on for some time. Click File -> Properties. Click through the various tabs, and you'll see the kind of information that's available: when the document was first created, last modified, and most recently accessed; how many times it has been revised; and who last edited it. This could be a problem if, for example, an instructor posts exam questions as a Word document and a student opens it to find that the document was first created two years ago. The student may suspect that these questions were used in a previous semester, and an unscrupulous student might then start hunting for answers to that previous exam. If the document was also last modified a long time ago, the student's suspicions may be aroused even further. If a document has multiple authors, that can show up here too, along with the total amount of time the file has been worked on.
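The same properties can also be read without opening Word at all. The following is a minimal sketch in Python, and it rests on an assumption that goes beyond this article: that the file is in the newer .docx format, which is a zip archive holding its properties in docProps/core.xml and docProps/app.xml. Older .doc files store the same kind of information differently, and this sketch will not read them. The function name and field labels are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the property files inside a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Illustrative label -> (file inside the package, element holding the value).
FIELDS = {
    "author": ("docProps/core.xml", "dc:creator"),
    "last_edited_by": ("docProps/core.xml", "cp:lastModifiedBy"),
    "created": ("docProps/core.xml", "dcterms:created"),
    "modified": ("docProps/core.xml", "dcterms:modified"),
    "revision": ("docProps/core.xml", "cp:revision"),
    "total_editing_minutes": ("docProps/app.xml", "ep:TotalTime"),
}


def read_docx_properties(path):
    """Return the properties found in a .docx file as a dict of strings."""
    found = {}
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())
        roots = {}  # parsed XML, one entry per property file
        for label, (part, tag) in FIELDS.items():
            if part not in names:
                continue  # this package has no such property file
            if part not in roots:
                roots[part] = ET.fromstring(package.read(part))
            element = roots[part].find(tag, NS)
            if element is not None and element.text:
                found[label] = element.text
    return found
```

Calling read_docx_properties("exam.docx") on a file of your own (the file name here is only an example) returns whichever of these fields the document actually contains, which is a quick way to see what a student could learn from a posted exam.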

And that's just the information that's easy to access. BBC News Online recently reported, in an article entitled "The Hidden Dangers of Documents," that Microsoft Word stores hidden, embedded "metadata tags" in its document files. The author, Mark Ward, writes that if a Word document is converted to text, this hidden data can be revealed. The metadata is not necessarily sensitive, and it takes time and special software to access it, but the danger is there nonetheless.

According to the article, information can be extracted not only from Word documents but also from Excel and PowerPoint files, which include the same metadata feature. The hidden data can include such details as:

  • Text from other documents open at the same time
  • Previously deleted text
  • E-mail headers and server information
  • Printer names
  • Data about the machine where the document was written
  • Where the document was saved
  • Word version number and document format
  • Names and usernames of document authors
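To get a feel for what "converting to text" can turn up, here is a rough sketch in Python that pulls readable runs of text out of a file's raw bytes, much as the Unix strings utility does. It is not the software the BBC article refers to. It assumes that leftover text sits in the file either as plain 8-bit characters or as 16-bit (UTF-16LE) characters, and the function name and the minimum run length of six characters are arbitrary choices made for illustration.

```python
import re


def extract_strings(data, min_run=6):
    """Return readable text runs found in raw bytes, in file order."""
    # Printable ASCII stored one byte per character.
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_run)
    # The same characters stored as UTF-16LE: each one followed by a zero byte.
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_run)

    hits = [(m.start(), m.group().decode("ascii"))
            for m in ascii_run.finditer(data)]
    hits += [(m.start(), m.group().decode("utf-16-le"))
             for m in utf16_run.finditer(data)]
    # Sort by position so the output follows the layout of the file.
    return [text for _, text in sorted(hits)]
```

To try it, read a document in binary mode and print the result, for example: for line in extract_strings(open("exam.doc", "rb").read()): print(line). Most of the output will be formatting noise, but file paths, user names, and printer names tend to stand out.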

There are, however, ways to protect yourself. Instead of attaching Word documents, you could use the text editor feature in WebTycho, or you could use Netscape Composer to create a web page. You don't necessarily have to publish the web page to a server; you could cut and paste the HTML code into WebTycho, so long as you don't use images.

And, if you must use Word and are concerned about sharing this kind of information, there are ways to "sanitize" your documents. See: http://support.microsoft.com/default.aspx?scid=kb;EN-US;223396 for additional information.
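For readers who like to see the idea in code, the sketch below copies a document while blanking the author and last-edited-by fields. It is not the procedure described at the Microsoft link above, and it assumes the newer .docx format (a zip archive with its properties in docProps/core.xml), which this article does not cover. It leaves tracked changes, comments, and every other kind of hidden data untouched, so treat it as an illustration of the idea, not as a complete cleaner. The function names are made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"

# Namespaces found in core.xml; registering them keeps the usual prefixes
# when the XML is written back out.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}

# Only these two fields are blanked in this sketch.
PERSONAL_FIELDS = ("dc:creator", "cp:lastModifiedBy")


def scrub_core_xml(xml_bytes):
    """Return core.xml with the personal-name fields emptied."""
    for prefix, uri in NAMESPACES.items():
        ET.register_namespace(prefix, uri)
    root = ET.fromstring(xml_bytes)
    for tag in PERSONAL_FIELDS:
        for element in root.findall(tag, NAMESPACES):
            element.text = ""
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def copy_without_personal_properties(source, target):
    """Copy a .docx to a new file, changing nothing except core.xml."""
    with zipfile.ZipFile(source) as src, \
            zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == CORE_PART:
                data = scrub_core_xml(data)
            dst.writestr(item, data)
```

Writing to a new file instead of changing the original in place is deliberate: if the copy turns out to be damaged, the original is still there.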
