notepad - Powershell Split Large Text File to Individual Files

2014-07
  • Edgarion

    I am working on a PowerShell script that lets a user specify a large text file and then splits it into individual files. Each portion is saved to a defined directory under a unique file name that the script reads from the file itself. This is what I have so far.

    $inputfile = Read-Host "Please type the FULL path and file extension of the file you want to split"
    
    #contents of input file
    $filecontent = Get-Content $inputfile
    
    #Find the line numbers of each line containing specified text
    $Lines = (Select-String -InputObject $filecontent -Pattern "Defined Pattern"| Select-Object LineNumber)
    
    [int[]]$LineNumbers = @()
    #remove the extra text from the beginning and end of each array element in the Lines array
    
    foreach ($line in $Lines)
    {
    [string]$value = $line
    $value = ($value.trim("@{LineNumber="))
    $value = ($value.Substring(0,$value.length-1))
    $LineNumbers += $value
    }
    
    
    
    #loop through array and write lines to new files
    for ($i=0;$i -lt $LineNumbers.Count + 1;$i++) 
    {
    
        $startLine = $LineNumbers[$i]
        $endLine = (($LineNumbers[$i + 1])-1)
    
        #Create the file to write into using the first word in the second line as the filename
        $AnimalIDLine = ($fileContent | Select-Object -Index ($startLine + 1))
        $AnimalID = ($AnimalIDLine -split '\s+')[0]
    
        New-Item -type file -Force ("C:\folder\folder" + $AnimalID + ".txt")
    
        #create a new .Net streamwriter object
        $stream = [System.IO.StreamWriter] ("C:\folder\folder" + $AnimalID + ".txt")
    
        #loop through start and end lines and write to file
        for ($h=$startLine; $h -le $endline; $h++)
            {
                #read the line
                $readLine = ($fileContent | Select-Object -Index $startLine)
    
                #write the line to the file
                $stream.WriteLine($readLine)
    
                $startLine++
            }
        $stream.close
    
    }
    $readline
    

    The problem I am having is that when I run the script, it creates only 2 files: one that is blank with an incorrect filename (0.txt), and another that is blank with the correct filename (7HO12423.txt).

    The message I am getting from PowerShell when running the script is as follows (it stops running after it creates the two blank files).

    -a---         6/19/2014   8:45 AM          0 7HO12423.txt                                                            
    
    MemberType          : Method
    OverloadDefinitions : {void Close()}
    TypeNameOfValue     : System.Management.Automation.PSMethod
    Value               : void Close()
    Name                : Close
    IsInstance          : True
    
    -a---         6/19/2014   8:45 AM          0 .txt                                                                    
    
    MemberType          : Method
    OverloadDefinitions : {void Close()}
    TypeNameOfValue     : System.Management.Automation.PSMethod
    Value               : void Close()
    Name                : Close
    IsInstance          : True
    

    In addition to fixing this issue and getting the script to run correctly, I would ideally also like the user to define where the files get saved instead of having the location hard-coded into the script.

    I am still fairly new to PowerShell, so any help would be appreciated. Please let me know if there is any additional information you require.

    Thanks!

  • Answers
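    Not a definitive fix, but a sketch of one way to restructure the script. Two things in the posted code stand out. `$stream.close` without parentheses does not call the method; PowerShell just prints the method's description, which is the MemberType/OverloadDefinitions block in the output above. And when a collection is passed through -InputObject, Select-String treats it as a single combined string, so the line numbers are not per-line; pointing Select-String at the file with -Path gives a real LineNumber property (1-based, whereas array indexes are 0-based). The function name Split-TextFile and its parameters are made up for illustration, Set-Content is swapped in for the StreamWriter, and the sketch assumes every marker line is followed by a line whose first word is a valid file name.

```powershell
# Hypothetical helper: the name and parameters are illustrative, not from the original post.
function Split-TextFile {
    param(
        [Parameter(Mandatory=$true)][string]$InputFile,
        [Parameter(Mandatory=$true)][string]$OutputDirectory,
        [string]$Pattern = 'Defined Pattern'   # placeholder kept from the question
    )

    # Wrap in @() so a one-line file is still treated as an array.
    $fileContent = @(Get-Content -LiteralPath $InputFile)

    # Searching the file itself (-Path) yields one match per line, each with a
    # 1-based LineNumber; subtract 1 to get 0-based array indexes.
    $starts = @(Select-String -Path $InputFile -Pattern $Pattern |
        ForEach-Object { $_.LineNumber - 1 })

    if (-not (Test-Path -LiteralPath $OutputDirectory)) {
        New-Item -ItemType Directory -Path $OutputDirectory | Out-Null
    }

    for ($i = 0; $i -lt $starts.Count; $i++) {
        $startLine = $starts[$i]

        # A section ends just before the next marker; the last one runs to end of file.
        if ($i + 1 -lt $starts.Count) {
            $endLine = $starts[$i + 1] - 1
        } else {
            $endLine = $fileContent.Count - 1
        }

        # Assumption: the first word of the line after the marker is the file name.
        $animalId = (-split $fileContent[$startLine + 1])[0]
        $target = Join-Path $OutputDirectory ($animalId + '.txt')

        # Set-Content opens, writes and closes the file in one call,
        # so there is no stream left to close by hand.
        Set-Content -LiteralPath $target -Value $fileContent[$startLine..$endLine]
    }
}

# Example: let the user choose both locations at run time.
# Split-TextFile -InputFile (Read-Host 'File to split') -OutputDirectory (Read-Host 'Folder to save into')
```

    If the StreamWriter is kept instead, the one change that matters for the output shown in the question is calling `$stream.Close()` with parentheses.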

    Related Question

    HDI Powershell: If detect line in file, then delete associated file
  • User 92873492834

    I'm trying to remove a huge amount of corrupt jpegs from an image library. Using jpegsnoop.exe, I created a [$jpgname.txt] file for every picture. The corrupt jpegs will have "ERROR" somewhere in the jpgname.txt file.

    So far, I can detect all of the .txt files that flag bad files with the below:

    gci ./ "*.txt" | Select-String -pattern "ERROR" | Format-Table -GroupBy Path

    It outputs something like this for every file it detects (there are thousands):

    Path: H:\library\001.AE3923.jpg.txt
    
    IgnoreCase     LineNumber Line           Filename       Path           Pattern        Context        Matches
    ----------     ---------- ----           --------       ----           -------        -------        -------
         True            285     ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            286 *** ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            287 *** ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            288 *** ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            290 *** ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            291 *** ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            292 *** ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            294 *** ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            295 *** ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            296 *** ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            298 *** ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            299 *** ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            301 *** ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            302 *** ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            304   ERROR: Ex... 001.AE3923.... H:\library... ERROR                         {ERROR}
         True            307     ERROR: ... 001.AE3923.... H:\library... ERROR                         {ERROR}
    

    The question is: How do I go from here to deleting the file returned in the "Path" line AND its jpeg equivalent? That is, deleting both H:\library\001.AE3923.jpg.txt and H:\library\001.AE3923.jpg for every file returned by the gci / grep.

    Thank you.

    Response to EBGreen:

    Thanks for responding: It's much closer. I get the following errors, though:

    Remove-Item : Cannot bind argument to parameter 'Path' because it is null.
    At line:1 char:171
    + gci ./ "*.txt" | Select-String -pattern "ERROR" | %{$txtFile = Get-Item $_.Path; $jpgFile = Get-Item ('{0}\{1}' -f $t
    xtFile.DirectoryName, $txtFile.BaseName); Remove-Item <<<<  $jpgFile; Remove-Item $txtFile}
        + CategoryInfo          : InvalidData: (:) [Remove-Item], ParameterBindingValidationException
        + FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Microsoft.PowerShell.Commands.RemoveItemC
       ommand
    
    Remove-Item : Cannot remove item H:\library\0010712x1024.jpg.txt: The process cannot acces
    s the file 'H:\library\0010712x1024.jpg.txt' because it is being used by another process.
    At line:1 char:193
    + gci ./ "*.txt" | Select-String -pattern "ERROR" | %{$txtFile = Get-Item $_.Path; $jpgFile = Get-Item ('{0}\{1}' -f $t
    xtFile.DirectoryName, $txtFile.BaseName); Remove-Item $jpgFile; Remove-Item <<<<  $txtFile}
        + CategoryInfo          : WriteError: (H:\library\__I...12x1024.jpg.txt:FileInfo) [Remove-Item], IOException
        + FullyQualifiedErrorId : RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand
    Remove-Item : Cannot remove item H:\library\0010_54165.jpg.txt: The process cannot access
    the file 'H:\library\0010_54165.jpg.txt' because it is being used by another process.
    At line:1 char:193
    + gci ./ "*.txt" | Select-String -pattern "ERROR" | %{$txtFile = Get-Item $_.Path; $jpgFile = Get-Item ('{0}\{1}' -f $t
    xtFile.DirectoryName, $txtFile.BaseName); Remove-Item $jpgFile; Remove-Item <<<<  $txtFile}
        + CategoryInfo          : WriteError: (H:\library\__I...0_54165.jpg.txt:FileInfo) [Remove-Item], IOException
        + FullyQualifiedErrorId : RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand
    

  • Related Answers
  • EBGreen

    Try this:

    gci ./ "*.txt" | Select-String -pattern "ERROR" | %{$txtFile = Get-Item $_.Path; $jpgFile = Get-Item ('{0}\{1}' -f $txtFile.DirectoryName, $txtFile.BaseName); Remove-Item $jpgFile; Remove-Item $txtFile}
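
    A possible explanation for the two errors in the follow-up, plus a sketch that sidesteps them. Select-String emits one match per matching line, so the ForEach-Object body runs many times for the same file; after the first pass the .jpg is already gone, Get-Item finds nothing, and Remove-Item receives null. The "being used by another process" error is likely Select-String still holding the .txt open while its remaining matches are piped through, though that is an inference from the output rather than something confirmed here. The sketch below uses -List (first match per file only) and collects the paths before deleting anything. Remove-FlaggedJpeg is a made-up name, and the code assumes each report is named <image>.jpg.txt and sits beside its image.

```powershell
# Hypothetical helper: the name and parameters are illustrative only.
function Remove-FlaggedJpeg {
    param(
        [string]$Directory = '.',
        [string]$Pattern = 'ERROR'
    )

    # -List stops at the first match in each file, so every report appears once.
    # @(...) makes the whole search finish before any file is deleted.
    $flagged = @(Get-ChildItem -Path $Directory -Filter '*.txt' |
        Select-String -Pattern $Pattern -List |
        ForEach-Object { $_.Path })

    foreach ($txtPath in $flagged) {
        # 001.AE3923.jpg.txt -> 001.AE3923.jpg (drop only the trailing .txt)
        $folder = [System.IO.Path]::GetDirectoryName($txtPath)
        $jpgName = [System.IO.Path]::GetFileNameWithoutExtension($txtPath)
        $jpgPath = Join-Path $folder $jpgName

        foreach ($path in @($jpgPath, $txtPath)) {
            # Skip anything that is already missing instead of passing null on.
            if (Test-Path -LiteralPath $path) {
                # Add -WhatIf here for a dry run before deleting for real.
                Remove-Item -LiteralPath $path
            }
        }
    }
}

# Example: Remove-FlaggedJpeg -Directory 'H:\library'
```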