forensics - .ddd file - Verity Documentum?

file-format forensics

2014-07

Mat

I work in computer forensics - one of the data sets that I have recently been asked to analyse contains a number of .ddd files that I have so far been unable to open.

Reading through these files in a text/hex editor reveals various mentions of 'Verity Inc version 5.5.0'. Some intense googling reveals they may be related to some old document management software called 'verity documentum'.

These files date back to 2003, a little before my time! Verity has since been bought by a company called 'Autonomy Corp', which was then purchased by HP. As expected, no one at HP has any idea what I'm talking about, and all the Verity/Autonomy contacts I have tried to communicate with have been dead ends.

Asking the 'more experienced' members, has anyone come across these kinds of files or this software before? If so, do you have any idea how to open them or convert them to a more readable format?

Answers

and31415

Verity collections

Verity, Inc. is the company behind the K2 enterprise search engine. Verity's technology has been included in various third-party software such as ColdFusion (from version 5 all the way to version 9.0.1), PeopleSoft, OrCAD, and PaperPort.

An individual collection represents a logical group of documents plus a set of metadata about those documents. The specific information stored for a collection includes various word indexes, an internal documents table containing document field information, and logical pointers to the actual document files.

Source: Features of Collections - Contents of Collection Indexes

Directory structure

From the Verity Collection Reference:

Each collection includes the following subdirectories:

- assists: Contains files that give general collection information and assist in optimizing searches, such as spanning word lists (*.wld), the collection "about" file (*.abt), and ngram indexes (*.ngm).
- morgue: Contains collection files scheduled for deletion.
- parts: Contains the internal fields table (*.ddd) and the word index (*.did) for each of the partitions in the collection.
- pdd: Contains the partition map file (*.pdd) for the collection.
- style: The style set that configures the collection. Contains both gateway style files and collection style files.
- temp: Temporary storage used by Verity Spider and K2 Spider.
- topicidx: Contains indexed topic sets, if they exist for this collection.
- trans: Contains files (*.trn) that store information on pending indexing transactions.
- work: Temporary storage for files being processed.

Source: Verity Collection Reference

Depending on the collection, some of the folders listed above might be empty or missing entirely. The style and the parts folders are the most relevant ones.
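
To get a quick overview of an unknown collection, something like the following Python sketch can report which of the folders listed above are present and how many files each one holds. The function name `inventory_collection` is my own, and the folder names are simply copied from the list above; treat it as a starting point rather than a validated tool.

```python
from pathlib import Path

# Subdirectory names copied from the Verity Collection Reference list above.
EXPECTED_SUBDIRS = (
    "assists", "morgue", "parts", "pdd", "style",
    "temp", "topicidx", "trans", "work",
)


def inventory_collection(root):
    """Return {subdir: number of files inside it, or None if it is missing}."""
    root = Path(root)
    report = {}
    for name in EXPECTED_SUBDIRS:
        folder = root / name
        if folder.is_dir():
            # Count regular files at any depth below the subdirectory.
            report[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            report[name] = None
    return report
```

Calling `inventory_collection(r"X:\collection")` returns a dictionary in which a value of `None` marks a missing folder and `0` marks an empty one.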

Partitions

When indexing documents, the Verity engine stores document metadata in units called partitions. Each partition contains metadata (typically a full-word index) for a set of documents consisting of anywhere from 1 to 64K documents. The Verity engine does not actually copy your document; rather, a partition contains all of the metadata associated with the documents that make them searchable, including:

- The internal documents table including fields; some fields are defined by default, and custom fields may be defined, like "Title" and "Author".
- The full word index of the words (sometimes referred to as the word list) in the documents of that partition.

Source: Inside a Verity Collection - What Are Partitions?

Each partition consists of a word list and a documents table, which are named after a sequential 8-digit number (e.g. 00000001.did and 00000001.ddd). Both are stored as binary files.
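
When a data set contains many partitions, it can help to check that every documents table has its matching word list. Here is a minimal Python sketch that relies only on the 8-digit naming convention described above; the helper names `list_partitions` and `incomplete_partitions` are hypothetical, and the sketch does not read the binary contents at all.

```python
import re
from pathlib import Path

# Eight digits followed by .ddd (documents table) or .did (word list).
PARTITION_NAME = re.compile(r"^(\d{8})\.(ddd|did)$", re.IGNORECASE)


def list_partitions(parts_dir):
    """Map each 8-digit partition number to the set of extensions found."""
    partitions = {}
    for path in Path(parts_dir).iterdir():
        match = PARTITION_NAME.match(path.name)
        if match and path.is_file():
            number, ext = match.group(1), match.group(2).lower()
            partitions.setdefault(number, set()).add(ext)
    return dict(sorted(partitions.items()))


def incomplete_partitions(partitions):
    """Return partition numbers that lack either the .ddd or the .did file."""
    return [n for n, exts in partitions.items() if exts != {"ddd", "did"}]
```

Running `incomplete_partitions(list_partitions(r"X:\collection\parts"))` lists any partition where one of the two files is missing, which may be worth noting before trying to open the rest.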

The fields within the documents table are defined by the following collection style files:

- style.ddd: defines fields used internally by the Verity engine, identified by an initial underscore character (_).
- style.sfl: defines standard fields (many of which are commented out to limit the size of the documents table).
- style.ufl: defines custom fields that are not included in style.sfl.

The value of each field can be filled in from source documents or can be provided explicitly. If a field is blank, it has not been populated.

Source: Using browse

Viewing partition data

All Verity products come bundled with maintenance and troubleshooting tools, among them didump and browse. The former displays the contents of the word lists; the latter displays the indexed document fields.

browse

The program accepts a single parameter, which is the path of a .ddd file:

```
browse.exe "X:\collection\parts\00000001.ddd"
```

After successfully opening a file it will display the available options:

```
BROWSE OPTIONS
  ?) help
  q) quit
  c) Number of entries in field
  _) Toggle viewing fields beginning with '_'
  v) Toggle viewing selected fields
 ##) Display all fields in specified record number
Dispatch/Compound field options:
  n) No dispatch
  d) Dispatch
  s) Dispatch as stream
```

Count the number of records

To check the number of indexed records, type c and then specify VdkVgwKey as the field, which is the primary key used to identify each entry in the documents table:

```
Action (? for help): c
Number of entries in field named: VdkVgwKey
There are (58) entries in the field (VdkVgwKey)
```
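
If you capture the console output to a text file, the count can be pulled out with a small regular expression. This Python sketch assumes the exact wording of the line shown above; other versions of browse may phrase it differently, and `parse_entry_count` is just a name I picked.

```python
import re

# Pattern modelled on the sample line "There are (58) entries in the field (VdkVgwKey)".
COUNT_LINE = re.compile(r"There are \((\d+)\) entries in the field \((\w+)\)")


def parse_entry_count(text):
    """Return (field name, entry count), or None if no count line is found."""
    match = COUNT_LINE.search(text)
    if match is None:
        return None
    return match.group(2), int(match.group(1))
```

The function returns `None` instead of raising when the line is absent, so it can be run over a whole log without special handling.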

Display a specific record

All indexes are zero-based. For example, to get the first entry, type 0 and press Enter:

```
Record number: 0
0  _DDFLAG          FIX-unsg (  1) = 0x00
1  _DDVALUE         VAR-text (  0) =
2  _DDVALUE_OF      FIX-unsg (  4) = 0
3  _DDVALUE_SZ      FIX-unsg (  2) = 0
4  _DBVERSION       CON-text (  7) = vdk060
5  _DDDSTAMP        FIX-date (  4) = 17-Apr-2003 01:51:06 pm
6  _DOCIDX          FIX-text ( 12) = ☺
7  _PARTDESC        FIX-text ( 32) = vdk150.dll (Verity, Inc. Version
8  _STYLE           AUT-text ( 58) = C:/Users/Test/Desktop/coll/style/style.ddd
9  _DOCID           FIX-unsg (  4) = 1
10 _SECURITY        FIX-unsg (  4) = 0
12 VdkVgwKey_IX     FIX-unsg (  3) = 53
13 VdkVgwKey_MI     WRM-text ( 93) = C:\Documents and Settings\khakkara.RATIONAL
\Desktop\DOCCD\rational_clearcase_lt\cc_admin.pdf
14 VdkVgwKey_MX     WRM-text ( 75) = C:\Documents and Settings\khakkara.RATIONAL
\Desktop\DOCCD\using_search.pdf
15 VdkVgwKey_OF     FIX-unsg (  4) = 32
16 VdkVgwKey_SZ     FIX-unsg (  2) = 75
17 Exists           FIX-unsg (  1) = 100
18 IsAChunk         FIX-unsg (  1) = 0
19 LargeDoc         FIX-unsg (  1) = 187
20 StartPage        FIX-unsg (  4) = 1
21 EndPage          FIX-unsg (  4) = 0
22 StartPageFrom    FIX-unsg (  4) = 0
23 EndPageAt        FIX-unsg (  4) = 0
24 FileName         VAR-text ( 24) = ()(.)(using_search.pdf)
25 PageMap          VAR-text (  4) = D
26 NumPages         FIX-unsg (  4) = 2
27 PermanentID      FIX-text ( 32) = 177032712d4a99426aa238bdad896ba2
28 WXEVersion       FIX-unsg (  1) = 2
29 FTS_Title        VAR-text ( 41) = Using Search with Rational Documentation
30 FTS_Subject      VAR-text (  0) =
31 FTS_Author       VAR-text ( 18) = Rational Software
32 FTS_Keywords     VAR-text ( 57) = search, find, full-text Rational Version 20
03.06.00 Beta
33 FTS_Creator      VAR-text ( 15) = FrameMaker 7.0
34 FTS_Producer     VAR-text ( 34) = Acrobat Distiller 5.0.5 (Windows)
35 FTS_CreationDate FIX-xdat (  4) = 02-Jul-2002 09:01:00 pm
36 FTS_ModificationDate FIX-xdat (  4) = 03-Apr-2003 10:08:00 pm
37 DOC              DSP-text ( -1) = C:\Documents and Settings\khakkara.RATIONAL
\Desktop\DOCCD\using_search.pdf
38 DOC_FN           VAR-text ( 75) = C:/Documents and Settings/khakkara.RATIONAL
/Desktop/DOCCD/using_search.pdf
39 FileName_OF      FIX-unsg (  4) = 32
40 FileName_SZ      FIX-unsg (  2) = 24
41 PageMap_OF       FIX-unsg (  4) = 105
42 PageMap_SZ       FIX-unsg (  2) = 4
43 FTS_Title_OF     FIX-unsg (  4) = 32
44 FTS_Title_SZ     FIX-unsg (  2) = 41
45 FTS_Subject_OF   FIX-unsg (  4) = 0
46 FTS_Subject_SZ   FIX-unsg (  2) = 0
47 FTS_Author_OF    FIX-unsg (  4) = 32
48 FTS_Author_SZ    FIX-unsg (  2) = 18
49 FTS_Keywords_OF  FIX-unsg (  4) = 32
50 FTS_Keywords_SZ  FIX-unsg (  2) = 57
51 FTS_Creator_OF   FIX-unsg (  4) = 90
52 FTS_Creator_SZ   FIX-unsg (  2) = 15
53 FTS_Producer_OF  FIX-unsg (  4) = 56
54 FTS_Producer_SZ  FIX-unsg (  2) = 34
55 DOC_OF           FIX-unsg (  4) = 0
56 DOC_SZ           FIX-unsg (  4) = 4294967295
57 DOC_FN_OF        FIX-unsg (  4) = 32
58 DOC_FN_SZ        FIX-unsg (  2) = 75
59 InstanceID       FIX-text ( 32) = 77b25f03d16bf386317bd13c3eba7d5e
60 InstanceID_IX    FIX-unsg (  3) = 22
61 DirID            VAR-text (  6) = ()(.)
62 DirID_IX         FIX-unsg (  3) = 0
63 DirID_OF         FIX-unsg (  4) = 32
64 DirID_SZ         FIX-unsg (  2) = 6
```

By pressing Enter again you can display the next record.
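
For more than a handful of records, it is easier to save the console output and parse it. Below is a rough Python sketch that turns one record dump into a dictionary. The line layout (index, field name, type, size in parentheses, then `= value`) is inferred purely from the sample output above, and I am assuming that any line not matching that layout is the wrapped tail of the previous value. The names `parse_record` and `parse_browse_date` are my own.

```python
import re
from datetime import datetime

# Layout inferred from the sample: "5  _DDDSTAMP   FIX-date (  4) = value"
FIELD_LINE = re.compile(
    r"^(?P<index>\d+)\s+(?P<name>\S+)\s+(?P<type>[A-Z]{3}-\w+)\s*"
    r"\(\s*(?P<size>-?\d+)\)\s*=\s?(?P<value>.*)$"
)


def parse_record(dump):
    """Parse one record dump into {field name: {index, type, size, value}}."""
    fields = {}
    current = None
    for line in dump.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            current = match.group("name")
            fields[current] = {
                "index": int(match.group("index")),
                "type": match.group("type"),
                "size": int(match.group("size")),
                "value": match.group("value"),
            }
        elif current is not None and line.strip():
            # Assumption: a non-matching line is console wrapping of the
            # previous value, so it is appended without a separator.
            fields[current]["value"] += line
    return fields


def parse_browse_date(value):
    """Convert a date such as '17-Apr-2003 01:51:06 pm' to a datetime."""
    return datetime.strptime(value, "%d-%b-%Y %I:%M:%S %p")
```

A wrapped line that happens to begin with digits, whitespace, and something resembling a type code would be misread as a new field, so spot-check the results against the raw output. The date parser depends on English month abbreviations being recognized by the current locale.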

Obtaining the Verity utilities

The easiest way to get a copy is to download software that includes them, such as the PaperPort application bundled with some Dell multifunction printers, or an old ColdFusion trial version.

Manual installation

I'll use the PaperPort 15-day trial as an example.

1. Download the trial. Here are the direct links:
   - http://content.nuance.com/PP14/PP14_Pro_BEFIGSD_Trial.exe
   - http://imagingcontent.nuance.com/PaperPort14/PP14_Pro_BEFIGSD_Trial.exe
2. Open the executable using 7-Zip, and extract the PaperPort folder somewhere.
3. Open a command prompt and navigate to the folder you just extracted:
   ```
   cd /d "X:\Whatever\PaperPort"
   ```
4. Extract all the files by running the MSI installer in administrative mode:
   ```
   msiexec /a "Nuance PaperPort 14.msi" targetdir="%cd%\Temp"
   ```
5. Proceed with the installation. When the installer has finished you'll find the Verity tools in the following folder:
   ```
   X:\Whatever\PaperPort\Temp\program files\Nuance\PaperPort\Verity\vdk\_nti40\bin
   ```
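
The folder layout may differ between versions or other bundling products, so instead of relying on the exact path you could search the extracted tree for the tools. A small Python sketch follows; `browse.exe` matches the command used earlier, while `didump.exe` is my assumption about how the didump tool is named on disk, and `find_verity_tools` is a hypothetical helper.

```python
from pathlib import Path

# browse.exe appears in the command shown earlier; didump.exe is an assumed name.
TOOL_NAMES = {"browse.exe", "didump.exe"}


def find_verity_tools(extract_root):
    """Return {tool name: [paths]} for tools found anywhere below extract_root."""
    found = {}
    for path in Path(extract_root).rglob("*"):
        # Compare in lowercase so the search also works on case-sensitive systems.
        if path.is_file() and path.name.lower() in TOOL_NAMES:
            found.setdefault(path.name.lower(), []).append(path)
    return found
```

For example, `find_verity_tools(r"X:\Whatever\PaperPort\Temp")` returns the locations of whichever of the two tools exist under the extracted folder.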

Sample collections

Here are some Verity collections I found around the web. They might be useful for testing purposes or simply to better understand how they work:
