Have you ever copied text from a Word document or another source and pasted it into your web content management system (CMS), only to find that the formatting of the text gets messy and full of odd characters? If you are not careful you may end up with text that looks like this in your Word document:
This is text that was keyed in Microsoft® Word. “It’s possible, you can never know, that the universe exists only for me. If so — it’s sure going well for me, I must admit,” Bill Gates, ©TIME magazine, January 13, 1997. Available for £5 at newsstands everywhere.
That happens because text copied from Word (or other rich text documents) contains formatting that comes along for the ride and can wreak havoc on your web page text formatting in the process. And once it happens it’s hard to undo. An ounce of prevention will save you a pound of cure…just don’t paste formatted text into a CMS or web authoring tool. There are three methods to solve this problem.
Method 1: copy-paste-copy-paste using a plain text document
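Copy the text from your Word document, paste it into a plain text document, then copy it from there and paste it into your CMS. It’s that easy, or at least it can be. Even with this copy-paste-copy-paste method, some special characters may still come along for the ride. More about that later.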
Method 2: copy-paste-copy-paste using a WYSIWYG control
Many content management systems (CMS) and html editors use a WYSIWYG (what you see is what you get) tool bar like the one shown below. If your system has something similar, see if there is a button that allows you to “Paste as plain text” or “Paste from Word.”
Use these tools just as described in method #1. Select and copy the text in your Word document, click the “Paste from Word” or “Paste as plain text” button, paste the text into the resulting popup window, then click OK. This method should get rid of text formatting like font specification, point size, style, etc. But it doesn’t always clean up all the potential problems. In particular you may still need to change special characters to web-friendly formatting.
Debugging after you copy-paste-copy-paste
Most of your “phantom formatting” problems should be solved by using one of the above methods. Even so, some unwanted special characters may survive the copy-paste-copy-paste. Among the more common ones are “curly” quotation marks and apostrophes (also called “smart quotes”), which should be changed to “straight” quotation marks and apostrophes. To prevent this, you can search your Word document and replace all the quotation marks and apostrophes with straight ones. Microsoft also offers a helpful article on turning off smart quotes so you don’t end up with them in your Word document in the first place. Other characters that may not play nicely with text pasted from Word include dashes (—) and symbols such as copyright marks (©), trademarks (™) and registered trademarks (®), and there are dozens of other potential offenders.
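If you would rather clean the text with a script before pasting it, the search-and-replace described above can be sketched in a few lines of Python. This is only an illustration: the mapping and the function name straighten_quotes are made up for this example, and the mapping covers just the four curly quote characters, so extend it with whatever else causes trouble for you.

```python
# Hypothetical mapping for illustration: curly characters -> straight ones.
STRAIGHT_EQUIVALENTS = {
    "\u201c": '"',  # left curly double quote
    "\u201d": '"',  # right curly double quote
    "\u2018": "'",  # left curly single quote
    "\u2019": "'",  # right curly single quote, also used as an apostrophe
}


def straighten_quotes(text: str) -> str:
    """Replace curly quotes and apostrophes with their straight versions."""
    return text.translate(str.maketrans(STRAIGHT_EQUIVALENTS))


sample = "\u201cIt\u2019s possible,\u201d he said."
print(straighten_quotes(sample))
```

Everything that is not in the mapping passes through untouched, so dashes and symbols such as © still need separate handling.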
Note that you may not notice these problem characters until you view your web pages in a variety of browsers on different platforms (See The 10 Cs of Great Content, #9: Compatible). That’s because the special characters look good to you, since your system knows how to display them. That may not be the case in other browsers or operating systems.
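Because these characters can look fine on your own screen, it may help to list them before you publish. The sketch below treats every character outside the basic ASCII range as a potential offender, which is an assumption of this example and not a rule from any particular CMS. The function name find_special_characters is likewise hypothetical.

```python
import unicodedata


def find_special_characters(text: str) -> list[tuple[int, str, str]]:
    """Return (position, character, Unicode name) for every non-ASCII character."""
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(text)
        if ord(char) > 127  # assumption: anything beyond ASCII deserves a look
    ]


sample = "Microsoft\u00ae Word costs \u00a35"
for position, char, name in find_special_characters(sample):
    print(position, char, name)
```

A report like this only tells you where to look; whether a given character really needs replacing depends on how your pages are served and displayed.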
If you see problem characters show up on your web pages you must seek them out and eradicate them. Your html or WYSIWYG tool bar should have a button for “special characters” that gives you a character selector like the one shown here. Delete the bad characters and use this tool to insert web-friendly replacements.
If you have the ability to work directly on the html source code, you can also fix special character problems by using html code known as “ampersand characters” or “character entities.” This html source website offers a helpful list of all the special character codes. For example, to produce a © symbol, you place this code in the html: &copy;
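If you have many characters to convert, typing each entity by hand gets tedious. As a rough sketch, the Python standard library ships a table of named entities (html.entities.codepoint2name) that can do the lookup for you. The function name to_character_entities is invented for this example, and the numeric fallback for characters that have no name in that table is a design choice of this sketch.

```python
from html.entities import codepoint2name


def to_character_entities(text: str) -> str:
    """Replace each non-ASCII character with a named or numeric HTML entity."""
    pieces = []
    for char in text:
        code = ord(char)
        if code < 128:
            pieces.append(char)  # plain ASCII passes through unchanged
        elif code in codepoint2name:
            pieces.append(f"&{codepoint2name[code]};")  # named entity
        else:
            pieces.append(f"&#{code};")  # numeric fallback
    return "".join(pieces)


print(to_character_entities("\u00a9TIME magazine, \u00a35"))
# prints: &copy;TIME magazine, &pound;5
```

Note that this sketch leaves ASCII characters such as & and < alone, so it is meant for text that is going into html source you already control, not for escaping arbitrary input.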
Another helpful tool that some WYSIWYG toolbars offer is a “Remove Format” tool (in the example below it’s an eraser icon).
If you have such a tool, select all the text after it is pasted into the CMS, and click the eraser button. Voila! Unwanted formatting that may have been pasted in from Word is erased. You can also use this button to erase formatting that you applied using the CMS’s or html editor’s WYSIWYG tools.
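If your editor has no such button but you can get at the html source, the same idea can be approximated with a short script. The sketch below is not how any particular CMS implements “Remove Format”; it simply throws away every tag and keeps the text, using Python’s built-in html.parser module. The names FormatStripper and remove_format, and the sample markup, are made up for illustration.

```python
from html.parser import HTMLParser


class FormatStripper(HTMLParser):
    """Collect only the text content, discarding tags and their attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.pieces: list[str] = []

    def handle_data(self, data: str) -> None:
        # Called for the text between tags; the tags themselves are ignored.
        self.pieces.append(data)


def remove_format(markup: str) -> str:
    """Return the text of an html fragment with all markup removed."""
    stripper = FormatStripper()
    stripper.feed(markup)
    stripper.close()  # flush any text still buffered by the parser
    return "".join(stripper.pieces)


# Illustrative sample of the kind of markup a Word paste might leave behind.
pasted = '<p class="MsoNormal"><span style="font-family:Calibri"><b>Hello</b> world</span></p>'
print(remove_format(pasted))
# prints: Hello world
```

Be aware that this removes all markup, including links and paragraph breaks you may want to keep, so treat it as a blunt instrument.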