notepad - Powershell Split Large Text File to Individual Files
2014-07
I am working on a PowerShell script that lets a user specify a large text file, splits that file into sections, and saves each section as its own file in a defined directory. Each output file gets a unique file name that the script reads from the section itself. This is what I have so far.
$inputfile = Read-Host "Please type the FULL path, including the file extension, of the file you want to split"
#contents of input file
$filecontent = Get-Content $inputfile
#Find the line numbers of each line containing specified text
$Lines = (Select-String -InputObject $filecontent -Pattern "Defined Pattern"| Select-Object LineNumber)
[int[]]$LineNumbers = @()
#remove the extra text from the beginning and end of each array element in the Lines array
foreach ($line in $Lines)
{
    [string]$value = $line
    $value = ($value.trim("@{LineNumber="))
    $value = ($value.Substring(0,$value.length-1))
    $LineNumbers += $value
}
#loop through array and write lines to new files
for ($i=0;$i -lt $LineNumbers.Count + 1;$i++)
{
    $startLine = $LineNumbers[$i]
    $endLine = (($LineNumbers[$i + 1])-1)
    #Create the file to write into using the first word in the second line as the filename
    $AnimalIDLine = ($fileContent | Select-Object -Index ($startLine + 1))
    $AnimalID = ($AnimalIDLine -split '\s+')[0]
    New-Item -type file -Force ("C:\folder\folder" + $AnimalID + ".txt")
    #create a new .Net streamwriter object
    $stream = [System.IO.StreamWriter] ("C:\folder\folder" + $AnimalID + ".txt")
    #loop through start and end lines and write to file
    for ($h=$startLine; $h -le $endline; $h++)
    {
        #read the line
        $readLine = ($fileContent | Select-Object -Index $startLine)
        #write the line to the file
        $stream.WriteLine($readLine)
        $startLine++
    }
    $stream.close
}
$readline
The problem is that when I run the script, it only creates two files: one that is blank with an incorrect filename (0.txt), and another that is blank with the correct filename (7HO12423.txt).
The output I get from PowerShell when running the script is as follows (it stops running after it creates the two blank files).
-a--- 6/19/2014 8:45 AM 0 7HO12423.txt
MemberType : Method
OverloadDefinitions : {void Close()}
TypeNameOfValue : System.Management.Automation.PSMethod
Value : void Close()
Name : Close
IsInstance : True
-a--- 6/19/2014 8:45 AM 0 .txt
MemberType : Method
OverloadDefinitions : {void Close()}
TypeNameOfValue : System.Management.Automation.PSMethod
Value : void Close()
Name : Close
IsInstance : True
In addition to fixing this issue and getting the script to run correctly, I would ideally also like for a user to define where the file gets saved instead of having it hard-coded into the script.
I am still fairly new at using Powershell, so any help would be appreciated. Please let me know if there is any additional information you require.
Thanks!
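One way to approach this, as a rough sketch rather than a drop-in fix. As far as I can tell, three things in the script above work against it: `$stream.close` without parentheses only prints the method's description (the MemberType / OverloadDefinitions output) and never closes the writer, so nothing is flushed to disk; `Select-String -InputObject $filecontent` treats the whole array as one combined string instead of one line at a time; and the outer loop runs one step past the end of `$LineNumbers`. The function name `Split-TextFileByMarker` and its parameters are made up for illustration. The default `-NameLineOffset 2` is an assumption that mirrors the original `$startLine + 1` index (the second line after the matched line); adjust it if the ID sits elsewhere.

```powershell
function Split-TextFileByMarker {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [string] $Pattern,
        [Parameter(Mandatory = $true)] [string] $OutputDirectory,
        # Hypothetical knob: how many lines below the marker the ID line sits.
        [int] $NameLineOffset = 2
    )

    $content = @(Get-Content -LiteralPath $Path)

    # Piping the lines yields one match per line. LineNumber is 1-based,
    # so subtract 1 to get a 0-based array index.
    $starts = @($content | Select-String -Pattern $Pattern |
        ForEach-Object { $_.LineNumber - 1 })

    if (-not (Test-Path -LiteralPath $OutputDirectory)) {
        New-Item -ItemType Directory -Path $OutputDirectory | Out-Null
    }

    for ($i = 0; $i -lt $starts.Count; $i++) {
        $start = $starts[$i]
        # A section ends just before the next marker, or at the end of the file.
        $end = if ($i + 1 -lt $starts.Count) { $starts[$i + 1] - 1 } else { $content.Count - 1 }

        $nameIndex = $start + $NameLineOffset
        if ($nameIndex -gt $end) { continue }   # section too short to contain an ID line

        # First whitespace-separated word of the ID line becomes the file name.
        $id = ($content[$nameIndex].Trim() -split '\s+')[0]
        $target = Join-Path $OutputDirectory ($id + '.txt')

        $content[$start..$end] | Set-Content -LiteralPath $target
        $target   # emit the path of each file written
    }
}
```

To let the user choose both the input file and the save location, the two values could come from Read-Host, for example `$in = Read-Host "File to split"` and `$out = Read-Host "Directory to save into"`, followed by `Split-TextFileByMarker -Path $in -Pattern "Defined Pattern" -OutputDirectory $out`. Set-Content opens and closes each file itself, which sidesteps the StreamWriter handling entirely.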
I'm trying to remove a huge amount of corrupt jpegs from an image library. Using jpegsnoop.exe, I created a [$jpgname.txt] file for every picture. The corrupt jpegs will have "ERROR" somewhere in the jpgname.txt file.
So far, I can detect all of the .txt files that flag bad files with the below:
gci ./ "*.txt" | Select-String -pattern "ERROR" | Format-Table -GroupBy Path
It outputs something like this for every file it detects (there are thousands):
Path: H:\library\001.AE3923.jpg.txt
IgnoreCase LineNumber Line Filename Path Pattern Context Matches
---------- ---------- ---- -------- ---- ------- ------- -------
True 285 ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
True 286 *** ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
True 287 *** ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
True 288 *** ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
True 290 *** ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
True 291 *** ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
True 292 *** ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
True 294 *** ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
True 295 *** ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
True 296 *** ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
True 298 *** ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
True 299 *** ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
True 301 *** ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
True 302 *** ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
True 304 ERROR: Ex... 001.AE3923.... H:\library... ERROR {ERROR}
True 307 ERROR: ... 001.AE3923.... H:\library... ERROR {ERROR}
The question is: how do I go from here to deleting the file returned in the "Path" line AND its jpeg equivalent? That is, deleting both H:\library\001.AE3923.jpg.txt and H:\library\001.AE3923.jpg for every file returned by the gci / grep.
Thank you.
Response to EBGreen:
Thanks for responding: It's much closer. I get the following errors, though:
Remove-Item : Cannot bind argument to parameter 'Path' because it is null.
At line:1 char:171
+ gci ./ "*.txt" | Select-String -pattern "ERROR" | %{$txtFile = Get-Item $_.Path; $jpgFile = Get-Item ('{0}\{1}' -f $txtFile.DirectoryName, $txtFile.BaseName); Remove-Item <<<< $jpgFile; Remove-Item $txtFile}
+ CategoryInfo : InvalidData: (:) [Remove-Item], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Microsoft.PowerShell.Commands.RemoveItemCommand
Remove-Item : Cannot remove item H:\library\0010712x1024.jpg.txt: The process cannot access the file 'H:\library\0010712x1024.jpg.txt' because it is being used by another process.
At line:1 char:193
+ gci ./ "*.txt" | Select-String -pattern "ERROR" | %{$txtFile = Get-Item $_.Path; $jpgFile = Get-Item ('{0}\{1}' -f $txtFile.DirectoryName, $txtFile.BaseName); Remove-Item $jpgFile; Remove-Item <<<< $txtFile}
+ CategoryInfo : WriteError: (H:\library\__I...12x1024.jpg.txt:FileInfo) [Remove-Item], IOException
+ FullyQualifiedErrorId : RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand
Remove-Item : Cannot remove item H:\library\0010_54165.jpg.txt: The process cannot access the file 'H:\library\0010_54165.jpg.txt' because it is being used by another process.
At line:1 char:193
+ gci ./ "*.txt" | Select-String -pattern "ERROR" | %{$txtFile = Get-Item $_.Path; $jpgFile = Get-Item ('{0}\{1}' -f $txtFile.DirectoryName, $txtFile.BaseName); Remove-Item $jpgFile; Remove-Item <<<< $txtFile}
+ CategoryInfo : WriteError: (H:\library\__I...0_54165.jpg.txt:FileInfo) [Remove-Item], IOException
+ FullyQualifiedErrorId : RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand
Try this:
gci ./ "*.txt" | Select-String -pattern "ERROR" | %{$txtFile = Get-Item $_.Path; $jpgFile = Get-Item ('{0}\{1}' -f $txtFile.DirectoryName, $txtFile.BaseName); Remove-Item $jpgFile; Remove-Item $txtFile}
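The two errors reported for that one-liner most likely share a cause. Select-String emits one match per ERROR line, so each report file passes through the pipeline many times, and after the first pass the jpg is already gone, which leaves `$jpgFile` null. Select-String also appears to still have the report open while the pipeline tries to delete it, hence "being used by another process". A possible variation is sketched below, under those assumptions; the function name `Remove-FlaggedJpeg` and its parameters are invented for illustration. It uses `-List` so each file is reported once, and collects all paths into an array before deleting anything.

```powershell
function Remove-FlaggedJpeg {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [Parameter(Mandatory = $true)] [string] $Directory,
        [string] $Pattern = 'ERROR'
    )

    # -List stops at the first match in each file, so every report shows up once.
    # Wrapping the pipeline in @() lets Select-String finish reading all files
    # before any deletion starts.
    $reports = @(Get-ChildItem -LiteralPath $Directory -Filter '*.txt' |
        Select-String -Pattern $Pattern -List |
        ForEach-Object { $_.Path })

    foreach ($report in $reports) {
        # "001.AE3923.jpg.txt" -> "001.AE3923.jpg" in the same folder
        $jpg = Join-Path (Split-Path -Parent $report) `
            ([System.IO.Path]::GetFileNameWithoutExtension($report))

        foreach ($file in @($jpg, $report)) {
            # Skip files that are already gone; honor -WhatIf / -Confirm.
            if ((Test-Path -LiteralPath $file) -and $PSCmdlet.ShouldProcess($file, 'Remove')) {
                Remove-Item -LiteralPath $file
            }
        }
    }
}
```

Running `Remove-FlaggedJpeg -Directory H:\library -WhatIf` first lists what would be deleted without touching anything; dropping `-WhatIf` performs the deletion.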