forensics - .ddd file - Verity Documentum?

2014-07
  • Mat

    I work in computer forensics - one of the data sets that I have recently been asked to analyse contains a number of .ddd files that I have so far been unable to open.

    Reading through these files in a text/hex editor reveals various mentions of 'Verity Inc version 5.5.0'. Some intense googling reveals they may be related to some old document management software called 'verity documentum'.

    These files are dated from back in 2003, a little before my time! Verity has since been bought by a company called 'Autonomy Corp', which was then purchased by HP. As expected, no one at HP has any idea what I'm talking about, and all the Verity/Autonomy contacts I have tried to communicate with have been dead ends.

    A question for the 'more experienced' members: has anyone come across these kinds of files or this software before? If so, do you have any idea how to open them or convert them to a more readable format?

  • Answers
  • and31415

    Verity collections

    Verity, Inc. is the company behind the K2 enterprise search engine. Verity's technology has been included in various third-party software such as ColdFusion (from version 5 all the way to version 9.0.1), PeopleSoft, OrCAD, and PaperPort.

    An individual collection represents a logical group of documents plus a set of metadata about those documents. The specific information stored for a collection includes various word indexes, an internal documents table containing document field information, and logical pointers to the actual document files.

    Source: Features of Collections - Contents of Collection Indexes

    Directory structure

    From the Verity Collection Reference:

    Each collection includes the following subdirectories:

    • assists Contains files that give general collection information and assist in optimizing searches, such as spanning word lists (*.wld), the collection "about" file (*.abt), and ngram indexes (*.ngm).

    • morgue Contains collection files scheduled for deletion.

    • parts Contains the internal fields table (*.ddd) and the word index (*.did) for each of the partitions in the collection.

    • pdd Contains the partition map file (*.pdd) for the collection.

    • style The style set that configures the collection. Contains both gateway style files and collection style files.

    • temp Temporary storage used by Verity Spider and K2 Spider.

    • topicidx Contains indexed topic sets, if they exist for this collection.

    • trans Contains files (*.trn) that store information on pending indexing transactions.

    • work Temporary storage for files being processed.

    Source: Verity Collection Reference

    Depending on the collection, some of the folders listed above might be empty or missing entirely. The style and the parts folders are the most relevant ones.
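    For a forensic data set that may contain several collections, a quick inventory of which of these folders exist (and what file types they hold) helps decide where to look first. The following is a minimal Python sketch, assuming the folder names are the lowercase ones listed above and that you run it against a read-only copy of the evidence; `inventory_collection` is a hypothetical helper, not part of any Verity tool.

```python
import os
from collections import Counter

# Subdirectory names as given in the Verity Collection Reference excerpt above.
EXPECTED_SUBDIRS = ["assists", "morgue", "parts", "pdd", "style",
                    "temp", "topicidx", "trans", "work"]


def inventory_collection(root):
    """Map each expected subdirectory to None (missing) or a dict of
    file-extension counts found beneath it, e.g. {".ddd": 3, ".did": 3}."""
    report = {}
    for name in EXPECTED_SUBDIRS:
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            report[name] = None  # folder is missing entirely
            continue
        counts = Counter()
        for _dirpath, _dirnames, filenames in os.walk(path):
            for filename in filenames:
                ext = os.path.splitext(filename)[1].lower() or "(none)"
                counts[ext] += 1
        report[name] = dict(counts)  # empty dict: present but empty
    return report
```

    Usage: `for sub, counts in inventory_collection(r"X:\collection").items(): print(sub, counts)`.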

    Partitions

    When indexing documents, the Verity engine stores document metadata in units called partitions. Each partition contains metadata (typically a full-word index) for a set of documents consisting of anywhere from 1 to 64K documents. The Verity engine does not actually copy your document; rather, a partition contains all of the metadata associated with the documents that make them searchable, including:

    • The internal documents table including fields; some fields are defined by default, and custom fields may be defined, like "Title" and "Author".

    • The full word index of the words (sometimes referred to as the word list) in the documents of that partition.

    Source: Inside a Verity Collection - What Are Partitions?

    Each partition consists of a word list and a documents table, which are named after a sequential 8-digit number (e.g. 00000001.did and 00000001.ddd). Both are stored as binary files.
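    Because the word list and the documents table of a partition share the same 8-digit base name, you can check a parts folder for partitions where one half is missing. A minimal Python sketch, assuming file names exactly of the form NNNNNNNN.did / NNNNNNNN.ddd; `pair_partitions` and `unpaired` are hypothetical helpers.

```python
import re

# Partition files are named after a sequential 8-digit number, e.g. 00000001.ddd
PART_RE = re.compile(r"^(\d{8})\.(did|ddd)$", re.IGNORECASE)


def pair_partitions(filenames):
    """Group partition files by number: {number: {"did": name|None, "ddd": name|None}}."""
    parts = {}
    for name in filenames:
        m = PART_RE.match(name)
        if not m:
            continue  # not a partition file
        number, kind = m.group(1), m.group(2).lower()
        parts.setdefault(number, {"did": None, "ddd": None})[kind] = name
    return dict(sorted(parts.items()))


def unpaired(parts):
    """Partition numbers missing either the word list (.did) or the documents table (.ddd)."""
    return [n for n, files in parts.items() if None in files.values()]
```

    For example, `unpaired(pair_partitions(["00000001.did", "00000001.ddd", "00000002.ddd", "readme.txt"]))` returns `['00000002']`. You would normally feed it `os.listdir()` of the parts folder.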

    The fields within the documents table are defined by the following collection style files:

    • style.ddd: defines fields used internally by the Verity engine, identified by an initial underscore character (_).

    • style.sfl: defines standard fields (many of which are commented out to limit the size of the documents table).

    • style.ufl: defines custom fields that are not included in style.sfl.

    The value of each field can be filled in from source documents or can be provided explicitly. If a field is blank, it has not been populated.

    Source: Using browse



    Viewing partition data

    All Verity products come bundled with some maintenance and troubleshooting tools, among them didump and browse. The former can be used to display the contents of the word lists (*.did); the latter can be used to display indexed document fields (*.ddd).

    browse

    The program accepts a single parameter, which is the path of a .ddd file:

    browse.exe "X:\collection\parts\00000001.ddd"
    

    After successfully opening a file it will display the available options:

    BROWSE OPTIONS
      ?) help
      q) quit
      c) Number of entries in field
      _) Toggle viewing fields beginning with '_'
      v) Toggle viewing selected fields
     ##) Display all fields in specified record number
    Dispatch/Compound field options:
      n) No dispatch
      d) Dispatch
      s) Dispatch as stream
    

    Count the number of records

    To check the number of indexed records, type c and then specify VdkVgwKey as the field, which is the primary key used to identify each entry in the documents table:

    Action (? for help): c
    Number of entries in field named: VdkVgwKey
    There are (58) entries in the field (VdkVgwKey)
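    If you copy or redirect browse's console output into text files while working through many partitions, the count line can be picked out mechanically. A small Python sketch, assuming the message wording is exactly as shown above; `parse_count` is a hypothetical helper.

```python
import re

# Summary line printed after the "c" command, e.g.
# "There are (58) entries in the field (VdkVgwKey)"
COUNT_RE = re.compile(r"There are \((\d+)\) entries in the field \(([^)]+)\)")


def parse_count(output):
    """Return (field_name, count) from captured browse output, or None if absent."""
    m = COUNT_RE.search(output)
    if m is None:
        return None
    return m.group(2), int(m.group(1))
```

    For the session above, `parse_count(...)` gives `("VdkVgwKey", 58)`; since record numbers are zero-based, those records are numbered 0 to 57.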
    

    Display a specific record

    All indexes are zero-based. For example, to get the first entry, type 0 and press Enter:

    Record number: 0
    0  _DDFLAG          FIX-unsg (  1) = 0x00
    1  _DDVALUE         VAR-text (  0) =
    2  _DDVALUE_OF      FIX-unsg (  4) = 0
    3  _DDVALUE_SZ      FIX-unsg (  2) = 0
    4  _DBVERSION       CON-text (  7) = vdk060
    5  _DDDSTAMP        FIX-date (  4) = 17-Apr-2003 01:51:06 pm
    6  _DOCIDX          FIX-text ( 12) = ☺
    7  _PARTDESC        FIX-text ( 32) = vdk150.dll (Verity, Inc. Version
    8  _STYLE           AUT-text ( 58) = C:/Users/Test/Desktop/coll/style/style.ddd
    9  _DOCID           FIX-unsg (  4) = 1
    10 _SECURITY        FIX-unsg (  4) = 0
    12 VdkVgwKey_IX     FIX-unsg (  3) = 53
    13 VdkVgwKey_MI     WRM-text ( 93) = C:\Documents and Settings\khakkara.RATIONAL\Desktop\DOCCD\rational_clearcase_lt\cc_admin.pdf
    14 VdkVgwKey_MX     WRM-text ( 75) = C:\Documents and Settings\khakkara.RATIONAL\Desktop\DOCCD\using_search.pdf
    15 VdkVgwKey_OF     FIX-unsg (  4) = 32
    16 VdkVgwKey_SZ     FIX-unsg (  2) = 75
    17 Exists           FIX-unsg (  1) = 100
    18 IsAChunk         FIX-unsg (  1) = 0
    19 LargeDoc         FIX-unsg (  1) = 187
    20 StartPage        FIX-unsg (  4) = 1
    21 EndPage          FIX-unsg (  4) = 0
    22 StartPageFrom    FIX-unsg (  4) = 0
    23 EndPageAt        FIX-unsg (  4) = 0
    24 FileName         VAR-text ( 24) = ()(.)(using_search.pdf)
    25 PageMap          VAR-text (  4) = D
    26 NumPages         FIX-unsg (  4) = 2
    27 PermanentID      FIX-text ( 32) = 177032712d4a99426aa238bdad896ba2
    28 WXEVersion       FIX-unsg (  1) = 2
    29 FTS_Title        VAR-text ( 41) = Using Search with Rational Documentation
    30 FTS_Subject      VAR-text (  0) =
    31 FTS_Author       VAR-text ( 18) = Rational Software
    32 FTS_Keywords     VAR-text ( 57) = search, find, full-text Rational Version 2003.06.00 Beta
    33 FTS_Creator      VAR-text ( 15) = FrameMaker 7.0
    34 FTS_Producer     VAR-text ( 34) = Acrobat Distiller 5.0.5 (Windows)
    35 FTS_CreationDate FIX-xdat (  4) = 02-Jul-2002 09:01:00 pm
    36 FTS_ModificationDate FIX-xdat (  4) = 03-Apr-2003 10:08:00 pm
    37 DOC              DSP-text ( -1) = C:\Documents and Settings\khakkara.RATIONAL\Desktop\DOCCD\using_search.pdf
    38 DOC_FN           VAR-text ( 75) = C:/Documents and Settings/khakkara.RATIONAL/Desktop/DOCCD/using_search.pdf
    39 FileName_OF      FIX-unsg (  4) = 32
    40 FileName_SZ      FIX-unsg (  2) = 24
    41 PageMap_OF       FIX-unsg (  4) = 105
    42 PageMap_SZ       FIX-unsg (  2) = 4
    43 FTS_Title_OF     FIX-unsg (  4) = 32
    44 FTS_Title_SZ     FIX-unsg (  2) = 41
    45 FTS_Subject_OF   FIX-unsg (  4) = 0
    46 FTS_Subject_SZ   FIX-unsg (  2) = 0
    47 FTS_Author_OF    FIX-unsg (  4) = 32
    48 FTS_Author_SZ    FIX-unsg (  2) = 18
    49 FTS_Keywords_OF  FIX-unsg (  4) = 32
    50 FTS_Keywords_SZ  FIX-unsg (  2) = 57
    51 FTS_Creator_OF   FIX-unsg (  4) = 90
    52 FTS_Creator_SZ   FIX-unsg (  2) = 15
    53 FTS_Producer_OF  FIX-unsg (  4) = 56
    54 FTS_Producer_SZ  FIX-unsg (  2) = 34
    55 DOC_OF           FIX-unsg (  4) = 0
    56 DOC_SZ           FIX-unsg (  4) = 4294967295
    57 DOC_FN_OF        FIX-unsg (  4) = 32
    58 DOC_FN_SZ        FIX-unsg (  2) = 75
    59 InstanceID       FIX-text ( 32) = 77b25f03d16bf386317bd13c3eba7d5e
    60 InstanceID_IX    FIX-unsg (  3) = 22
    61 DirID            VAR-text (  6) = ()(.)
    62 DirID_IX         FIX-unsg (  3) = 0
    63 DirID_OF         FIX-unsg (  4) = 32
    64 DirID_SZ         FIX-unsg (  2) = 6
    

    By pressing Enter again you can display the next record.
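    To compare many records, it helps to turn this listing into structured data. The Python sketch below assumes the line layout shown above (index, name, type token such as FIX-unsg, size in parentheses, then the value) and treats any other non-empty line as a wrapped continuation of the previous value. It also separates internal fields (leading underscore, defined by style.ddd) from the rest, and cross-checks each X_SZ field against the size shown for X. That pairing is an assumption inferred from this one sample record, where for example FTS_Title_SZ = 41 matches FTS_Title (41), and DOC_SZ = 4294967295 is simply -1 (the size shown for DOC) stored as an unsigned 32-bit value. All function names are hypothetical.

```python
import re

# Field line layout seen in the sample output, e.g.
# "26 NumPages         FIX-unsg (  4) = 2"
FIELD_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+([A-Z]{3}-[a-z]{4})\s+\(\s*(-?\d+)\)\s*=\s?(.*)$")


def parse_record(text):
    """Turn one record printed by browse into a list of field dicts."""
    fields = []
    for line in text.splitlines():
        if line.strip().startswith("Record number:"):
            continue  # the prompt line, not a field
        m = FIELD_RE.match(line)
        if m:
            idx, name, ftype, size, value = m.groups()
            fields.append({"index": int(idx), "name": name, "type": ftype,
                           "size": int(size), "value": value.rstrip()})
        elif fields and line.strip():
            # Assumed to be a long value wrapped onto the next line.
            fields[-1]["value"] += line.strip()
    return fields


def split_internal(fields):
    """Internal fields start with '_' (defined by style.ddd); the rest don't."""
    internal = [f for f in fields if f["name"].startswith("_")]
    others = [f for f in fields if not f["name"].startswith("_")]
    return internal, others


def size_mismatches(fields):
    """List (field, shown_size, sz_value) where X_SZ disagrees with X's size.

    X_SZ is read as a 32-bit two's-complement number, so 4294967295 -> -1.
    """
    by_name = {f["name"]: f for f in fields}
    problems = []
    for f in fields:
        if not f["name"].endswith("_SZ") or not f["value"].isdigit():
            continue
        base = by_name.get(f["name"][:-3])
        if base is None:
            continue  # e.g. VdkVgwKey itself is not shown in the sample
        raw = int(f["value"])
        signed = raw - 2**32 if raw >= 2**31 else raw
        if signed != base["size"]:
            problems.append((base["name"], base["size"], signed))
    return problems
```

    On the full sample record above, `size_mismatches(parse_record(...))` should come back empty; a non-empty result on evidence data is worth a closer look, though it may just mean the assumed pairing does not hold for that collection.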



    Obtaining the Verity utilities

    The easiest way to get a copy is to download some software which includes them, for example the PaperPort application bundled with some Dell multifunction printers, or old ColdFusion trial versions.

    Manual installation

    I'll use the PaperPort 15-day trial as an example.

    1. Download the trial.

    2. Open the executable using 7-Zip, and extract the PaperPort folder somewhere.

    3. Open a command prompt and navigate to the folder you just extracted:

      cd /d "X:\Whatever\PaperPort"
      
    4. Extract all the files by running the MSI installer in administrative mode:

      msiexec /a "Nuance PaperPort 14.msi" targetdir="%cd%\Temp"
      
    5. Proceed with the installation. When the installer has finished you'll find the Verity tools in the following folder:

      X:\Whatever\PaperPort\Temp\program files\Nuance\PaperPort\Verity\vdk\_nti40\bin
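    The exact folder may differ between products and versions (an assumption; only the PaperPort layout above is given here), so it can be easier to search the extracted tree for the tools. A minimal Python sketch; `find_verity_tools` is a hypothetical helper, and the .exe names assume the Windows build of browse and didump.

```python
import os

# Tool names from the answer; the .exe suffix assumes the Windows build.
TOOL_NAMES = ("browse.exe", "didump.exe")


def find_verity_tools(root, names=TOOL_NAMES):
    """Walk an extracted install tree and return {tool_name: [full paths]}."""
    wanted = {n.lower(): n for n in names}  # case-insensitive match on file names
    found = {n: [] for n in names}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            key = wanted.get(filename.lower())
            if key is not None:
                found[key].append(os.path.join(dirpath, filename))
    return found
```

    For example, `find_verity_tools(r"X:\Whatever\PaperPort\Temp")` should list the bin folder shown above.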
      

    Sample collections

    Here are some Verity collections I found around the web. They might be useful for testing purposes or simply to better understand how collections work:


  • Related Question

    Site with list of most popular file formats
  • Željko Filipin

    Is there a site that lists popular file formats?

    I am aware that it would be hard to find out which file formats are popular, and I do not need exact popularity, just approximation. I guess companies that develop anti-malware software could have some information about files that are scanned.

    I have found a few pages that list all file formats (like http://en.wikipedia.org/wiki/List_of_file_formats), but I need just a few popular ones.

    It would be nice if the list could be filtered by type (audio, video...) or platform (Windows, Linux, Mac...), but that is optional.

    Some background: I am testing file uploads for a web application, and I do not want to test all file formats, just the popular ones.


  • Related Answers
  • M4dRefluX
  • Bruce McLeod

    If you are testing file uploads, I wouldn't be too concerned with file formats per se, unless you are post-processing them after they are uploaded. What I would test is:

    • Very large files
    • Files that may be insecure, such as buffer overflow payloads
    • Use some sort of checksum to ensure that the files are uploaded without error, particularly on a flaky connection
    • Disconnecting partway through an upload and seeing what state that leaves the server and clients in
    • Uploads that take longer than the web server session timeout
    • Files from different file systems: HFS+, ext3, NTFS and FAT32
    • Very long filenames
    • Filenames with multiple dots
    • Filenames with punctuation, underscores, dashes

    etc.
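    For the checksum point above, one simple approach is to hash the original file and the uploaded copy and compare the digests. A minimal Python sketch using SHA-256 from the standard library; the choice of SHA-256 and the helper names are illustrative, not part of the answer.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so very large test files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(original_path, uploaded_path):
    """True if the uploaded copy is byte-for-byte identical to the original."""
    return sha256_of(original_path) == sha256_of(uploaded_path)
```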

  • nik

    Couldn't resist this one: TXT format!
    The 'plain-text' files that manage to get messed up across Unix and Windows platforms all the time.


    +1 to Bruce for approaching the question correctly.
    @Željko Filipin, if you know there is different behavior for some formats, get that list specifically and check for it. Why look at all the formats in the world?
    That list itself should suggest whether other formats need to be checked.
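    A common way plain-text files get mangled between Unix and Windows is the line-ending convention (LF versus CRLF). A minimal Python sketch that classifies the line endings in a file's raw bytes, so you can compare a text file before and after upload; `line_ending_style` is a hypothetical helper.

```python
def line_ending_style(data: bytes) -> str:
    """Classify line endings in raw bytes as 'CRLF', 'LF', 'CR', 'mixed' or 'none'."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # bare LF (Unix style)
    cr = data.count(b"\r") - crlf  # bare CR (classic Mac style)
    kinds = [name for name, n in (("CRLF", crlf), ("LF", lf), ("CR", cr)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"
```

    Usage: `line_ending_style(open(path, "rb").read())`; a file that was "CRLF" before upload and "LF" or "mixed" afterwards has been rewritten in transit.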

  • TankorSmash

    Here's a handmade CSV of all the most popular filetypes according to FileInfo.com, and here's a CSV of all the file types listed.