How do I remove HTML from an MS Word 2010 document with Find/Replace Wildcards/Regex?
2014-07
I found a website to help me choose domain names. I have my shortlist which I can't export, but I need to share the list with some other team members first. It also won't let me copy and paste the list of domains.
With my limited knowledge, I clicked on Inspect Element, chose Edit as HTML, then copied the markup and pasted it into MS Word 2010. I am now left with a bunch of HTML looking like this:
<div id="cartList">
<div id="cartdomain_mydomain1.com" class="wordDiv">
<img class="deleteImage" src="/images/trans.gif">
<button class="buyButton">Buy</button>
<div title="mydomain1.com">mydomain1.com</div>
</div>
<div id="cartdomain_mydomain2.com" class="wordDiv">
<img class="deleteImage" src="/images/trans.gif">
<button class="buyButton">Buy</button>
<div title="mydomain2.com">mydomain2.com</div>
</div>
How do I remove all the HTML code so I am only left with mydomain1.com, mydomain2.com in a plain text list?
Be sure to have the "More >>" panel unfolded, and to select "Use wildcards". You can then use this expression:
Find what: \<div id="cartdomain?*\<div title="([!"]*)"?*\</div\>?*\</div\>
Replace with: \1
When clicking "Replace all", you will be left with your first line (<div id="cartList">) followed by everything that is in the "title" attribute.
Word 2010 doesn't use the standard regex syntax, and its wildcard search is very limited. The way the expression works is:
- < and > are special characters (they mark the start and end of a word), so they have to be escaped with \
- ?* is roughly the equivalent of .*? : ? matches any single character and * matches any string of characters, non-greedily
- the parentheses define a capturing group, which is referred to as \1 in the "Replace with:" field
- [!"] means "any character which is not a double quote"; followed by *, it captures the text up to the closing double quote
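For comparison, here is a minimal sketch of the same substitution with a standard regex engine, using Python's re module. The translation from Word wildcards is only approximate, and the sample string is simply the markup from the question; it is meant to illustrate how the pieces of the pattern map across, not to replace the Word approach.

```python
import re

# Sample markup copied from the question.
html = '''<div id="cartList">
<div id="cartdomain_mydomain1.com" class="wordDiv">
<img class="deleteImage" src="/images/trans.gif">
<button class="buyButton">Buy</button>
<div title="mydomain1.com">mydomain1.com</div>
</div>
<div id="cartdomain_mydomain2.com" class="wordDiv">
<img class="deleteImage" src="/images/trans.gif">
<button class="buyButton">Buy</button>
<div title="mydomain2.com">mydomain2.com</div>
</div>'''

# Word wildcard  ->  standard regex (rough translation)
#   \< and \>    ->  < and >   (no escaping needed in Python)
#   ?*           ->  .*?       (lazy "anything"; DOTALL lets it span lines)
#   [!"]*        ->  [^"]*     (everything up to the closing quote)
pattern = re.compile(
    r'<div id="cartdomain.*?<div title="([^"]*)".*?</div>.*?</div>',
    re.DOTALL,
)

# Equivalent of "Replace all" with \1 in the "Replace with" field.
result = pattern.sub(r"\1", html)

# Or collect the captured titles directly as a list.
domains = pattern.findall(html)

print(result)
print(domains)
```

As in Word, the first line (<div id="cartList">) is left in place because the pattern only starts matching at each cartdomain block.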
How can I replace a word with Regular Expression in Microsoft Word 2007?
For example, I want to find and replace all \n with some spaces.
I found the answer: I had to use ^13 instead of \n in the "Find what" text box.
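The reason ^13 works is that it refers to character code 13 (carriage return), which is what Word uses as its paragraph mark, whereas \n conventionally means character code 10. A small Python sketch of the difference, using a made-up sample string rather than a real Word document:

```python
# Hypothetical sample: two paragraphs separated by a Word-style paragraph mark.
PARAGRAPH_MARK = chr(13)  # character code 13, the same character as "\r"
text = "first paragraph" + PARAGRAPH_MARK + "second paragraph"

# "\n" is character code 10, which does not occur in this text,
# so replacing it changes nothing.
unchanged = text.replace("\n", " ")

# Replacing character code 13 joins the paragraphs with a space.
joined = text.replace(PARAGRAPH_MARK, " ")

print(joined)
```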