python - create multiple files from single text file based on headings

25
2013-11
  • Dane

    I have what I think is a really simple problem to solve (preferably with OS X / *nix terminal tools rather than the Windows command line).

    I have a manuscript formatted in markdown with # (H1) denoting a chapter heading. I would like to break this large file up into smaller chapter files, as I'll later be converting the manuscript to epub / mobi with pandoc. This separate-file-per-chapter format is the recommended format according to the pandoc docs (http://johnmacfarlane.net/pandoc/epub.html ) and also makes the editing process a little less unwieldy. Interestingly, it is similar to what some other projects have done, e.g. https://github.com/visionmedia/masteringnode

    I was thinking that this would be possible with a simple Python script or with something like sed, but I just don't know where to start. Any help would be great!

  • Answers
  • Ignacio Vazquez-Abrams

    You're looking for the csplit tool.
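
    If a Python script is preferred over csplit, the same split can be sketched with the standard library. This is only a minimal sketch under some assumptions: chapters start at lines beginning with "# ", any text before the first heading becomes its own first file, and the function names and the "chapter-NN.md" naming scheme are hypothetical choices rather than anything pandoc requires.

```python
from pathlib import Path


def split_chapters(text):
    """Split Markdown text into chunks, each starting at a '# ' (H1) line."""
    chapters = []
    current = []
    in_fence = False
    for line in text.splitlines(keepends=True):
        # Track fenced code blocks so a '# comment' inside one is not
        # mistaken for a chapter heading.
        if line.startswith("```"):
            in_fence = not in_fence
        if not in_fence and line.startswith("# ") and current:
            chapters.append("".join(current))
            current = []
        current.append(line)
    if current:
        chapters.append("".join(current))
    return chapters


def write_chapters(source, out_dir="chapters"):
    """Write each chunk of `source` to out_dir/chapter-NN.md (hypothetical naming)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    text = Path(source).read_text(encoding="utf-8")
    paths = []
    for i, chunk in enumerate(split_chapters(text), start=1):
        path = out / f"chapter-{i:02d}.md"
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
    return paths
```

    Calling write_chapters("manuscript.md") (the file name is a placeholder) would write one file per chapter into a chapters/ directory, in reading order, which can then be passed to pandoc in that order.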


  • Related Question

    python - To make a table of contents for a markdown document
  • Masi

    I have the following markdown document

    Heading-a
    ==========
    
    ---text---
    
    Heading-b
    ------------
    
    --- text ---
    
    Heading-c
    ----------
    
    --- text---
    
    Heading-d
    =======
    
    --- text----
    
    Heading-e
    ---
    
    ...
    

    I could not find a tool that generates a clickable table of contents the way LaTeX does.

    This suggests to me that we should build one. The tool should collect the 'H1' and 'H2' headings and number them, so that it assigns 1. to Heading-a, 1.1. to Heading-b, 1.2. to Heading-c, 2. to Heading-d, 2.1. to Heading-e, and so on.

    We should get the following table of contents:

      1. Heading-a
      1.1. Heading-b
      1.2. Heading-c
      2. Heading-d
      2.1. Heading-e
    

    How can you make such a table of contents with Python, AWK, or sed?


  • Related Answers
  • Tyler

    The Python implementation of Markdown supports extensions, one of which generates a table of contents. Additionally, Pandoc (a Haskell tool that converts between markup formats) supports Markdown, in addition to a bunch of other formats, and can output pretty HTML, LaTeX, PDFs, etc.
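
    If neither tool fits, the numbering scheme from the question can be sketched in plain Python. This is a rough sketch, assuming the document uses only underlined headings (a line of "=" for H1, a line of "-" for H2) as in the question's sample; build_toc is a hypothetical name. It produces the numbered entries only. Turning them into clickable links depends on how the renderer names its heading anchors, so that part is left out.

```python
def build_toc(text):
    """Return numbered TOC lines for underlined (setext) H1/H2 headings."""
    lines = text.splitlines()
    toc = []
    h1 = h2 = 0
    # Look at each line together with the line that follows it.
    for prev, line in zip(lines, lines[1:]):
        title = prev.strip()
        underline = line.strip()
        if not title or not underline:
            continue  # a heading needs text directly above its underline
        if set(underline) == {"="}:
            h1 += 1
            h2 = 0  # restart sub-numbering under each new H1
            toc.append(f"{h1}. {title}")
        elif set(underline) == {"-"}:
            h2 += 1
            toc.append(f"{h1}.{h2}. {title}")
    return toc


if __name__ == "__main__":
    import sys
    # Usage: python toc.py document.md
    with open(sys.argv[1], encoding="utf-8") as f:
        print("\n".join(build_toc(f.read())))
```

    Lines such as "---text---" in the sample are not treated as underlines, because they contain characters other than "-".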

  • Dennis Williamson

    See this article for a comparison of lightweight markup languages with some information on tables of contents that might lead you in the direction of a solution.