How do I remove HTML from an MS Word 2010 document with Find/Replace wildcards/regex?

2014-07
  • user325124

    I found a website to help me choose domain names. I have my shortlist which I can't export, but I need to share the list with some other team members first. It also won't let me copy and paste the list of domains.

    With my limited knowledge, I clicked Inspect Element, chose Edit as HTML, and copied and pasted the markup into MS Word 2010. I am now left with a bunch of HTML that looks like this:

    <div id="cartList">
    <div id="cartdomain_mydomain1.com" class="wordDiv">
        <img class="deleteImage" src="/images/trans.gif">
        <button class="buyButton">Buy</button>
        <div title="mydomain1.com">mydomain1.com</div>
    </div>
    <div id="cartdomain_mydomain2.com" class="wordDiv">
        <img class="deleteImage" src="/images/trans.gif">
        <button class="buyButton">Buy</button>
        <div title="mydomain2.com">mydomain2.com</div>
    </div>
    

    How do I remove all the HTML code so I am only left with mydomain1.com, mydomain2.com in a plain text list?

  • Answers
  • m4573r

    Make sure the "More >>" panel is expanded and "Use wildcards" is checked. You can then use this expression:

    Find what: \<div id="cartdomain?*\<div title="([!"]*)"?*\</div\>?*\</div\>

    Replace with: \1

    After clicking "Replace All", you will be left with the first line (<div id="cartList">) followed by the value of each "title" attribute.

    Word 2010 doesn't use standard regex syntax, and its wildcard syntax is very limited. The expression works as follows:

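    • < and > are special characters in wildcard mode (they mark the start and end of a word), so literal ones have to be escaped with \
    • ?* is basically the equivalent of .*?: ? matches any single character and * matches any run of characters, non-greedily (strictly, ?* needs at least one character)
    • the parentheses define a capture group, which is referred to as \1 in the "Replace with" field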
    • [!"]* means "any number of any character which is not a double quote"
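
    For comparison, here is a minimal sketch of the same idea in standard regex syntax, using Python. It is not part of the Word solution; the names `PATTERN`, `extract_domains`, and `html` are illustrative assumptions, and the sample markup is the snippet from the question.

```python
import re

# Sample markup copied from the question.
html = '''<div id="cartList">
<div id="cartdomain_mydomain1.com" class="wordDiv">
    <img class="deleteImage" src="/images/trans.gif">
    <button class="buyButton">Buy</button>
    <div title="mydomain1.com">mydomain1.com</div>
</div>
<div id="cartdomain_mydomain2.com" class="wordDiv">
    <img class="deleteImage" src="/images/trans.gif">
    <button class="buyButton">Buy</button>
    <div title="mydomain2.com">mydomain2.com</div>
</div>
'''

# Rough regex counterpart of the Word wildcard expression:
#   .*?     plays the role of ?*   (lazy "anything")
#   [^"]*   plays the role of [!"]* (anything except a double quote)
# re.DOTALL lets "." match newlines, since each entry spans several lines.
PATTERN = re.compile(
    r'<div id="cartdomain.*?<div title="([^"]*)".*?</div>.*?</div>',
    re.DOTALL,
)

def extract_domains(text):
    """Return the value of every captured title attribute, in order."""
    return PATTERN.findall(text)

# Equivalent of "Replace All" with \1 in the replacement field.
replaced = PATTERN.sub(r"\1", html)

print("\n".join(extract_domains(html)))
```

    As in Word, the substitution leaves the unmatched first line (<div id="cartList">) in place; `extract_domains` skips it by collecting only the captured groups.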

  • Related Question

    regex - How to replace a word with Regular Expression in Microsoft Word 2007?
  • Mohammad

    How can I replace a word with Regular Expression in Microsoft Word 2007?

    For example, I want to find and replace all \n with some spaces.


  • Related Answers
  • Mohammad

    I found the answer.
    I had to use ^13 instead of \n in the "Find what" box.
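
    The ^13 code refers to character code 13 (carriage return), which is what Word uses for a paragraph mark. As a rough illustration outside Word, the same replacement on plain text could be sketched in Python as below; `replace_line_breaks` and `sample` are hypothetical names, and handling \r\n and \n as well is an assumption made so the sketch works on text from other sources.

```python
import re

def replace_line_breaks(text, replacement=" "):
    """Replace every line break (CR, CRLF, or LF) with the given string."""
    # \r\n? covers a bare carriage return (code 13) and CRLF; \n covers bare LF.
    return re.sub(r"\r\n?|\n", replacement, text)

sample = "first paragraph\rsecond paragraph\r\nthird paragraph\nfourth"
print(replace_line_breaks(sample))
```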
