In a project at work I had to compare two files and generate a list of missing lines so that a spreadsheet could be created. Sounds easy and probably is. For example, file A has the lines 1, 2, 4 and file B has 1, 2, so file B is missing 4.
So, if args[] has the file paths, a simple approach is:
MissingLines.groovy script
// MissingLines.groovy
def firstList = new File(args[0]).readLines()
new File(args[1]).readLines().each {
    if (!firstList.contains(it)) {
        println(it)
    }
}
That was easy! 🙂 Testing it is not so easy. 🙁
Alternative approach using Map
// MissingLines.groovy using Map
def inFirst = [:]
new File(args[0]).eachLine { inFirst.put(it, null) }
new File(args[1]).eachLine {
    if (!inFirst.containsKey(it)) {
        println(it)
    }
}
Still easy …
The script above compares two files and prints the lines that appear in the second file (args[1]) but not in the first (args[0]). It uses a simple string comparison. To find out what's missing in each file, a better recourse is a real file-diffing tool. The script also reads one of the files into memory, so it may balk at very large files. How to test a Groovy script is covered in the Testing it section.
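For small files, the "what's missing in each file" question can also be sketched with Groovy's list subtraction. This is only an illustration, not a replacement for a diff tool: the helper name `missingInEach` and the map keys are made up for this example, and both files are assumed to fit in memory as lists of lines.

```groovy
// Hypothetical helper (not part of the original script): reports the lines
// unique to each side. List minus removes every element found in the other list.
Map<String, List<String>> missingInEach(List<String> first, List<String> second) {
    [onlyInFirst: first - second, onlyInSecond: second - first]
}

// The example from the start of the post: A has 1, 2, 4 and B has 1, 2.
def result = missingInEach(['1', '2', '4'], ['1', '2'])
println result   // prints [onlyInFirst:[4], onlyInSecond:[]]
```

With real files the two lists would come from `new File(path).readLines()`, as in the script.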
From a computer science point of view the list-based script is suboptimal, of course: every call to contains() scans the whole list, so the run time grows with the product of the two file sizes. But this was a throwaway script to solve a single task.
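The Map version above already avoids the repeated list scans. The same idea can be written a bit more directly with a `HashSet`; the sketch below assumes the lines are already in memory, and `missingLines` is a hypothetical helper name, not something from the original script.

```groovy
// Hypothetical helper: returns the lines of `second` that do not occur in
// `first`, keeping the order in which they appear in `second`.
List<String> missingLines(List<String> first, List<String> second) {
    Set<String> seen = new HashSet<String>(first)   // constant-time lookups on average
    second.findAll { !seen.contains(it) }
}

def a = ['1', '2', '4']
def b = ['1', '2']
println missingLines(b, a)   // prints [4]
```

For a one-off job on small files the difference hardly matters, which is the point made above.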
For some reason I could not do it with the available tools. I tried different file comparison tools (KDiff3, diff), some editors, and simple command-line utilities. Maybe I was having a bad hair day. It's not that they won't show the missing lines; they gave no way to reuse that information. Hint to tool builders: allow the user to repurpose the result.
Yes, I know this can be done in one line on Linux using the incredible Linux utilities and shell, or a Perl script so tiny it can be etched using atoms. And, since I’m running on Windows, can also be done using Cygwin. I searched and saw how. I’m sure I could have figured it out too, using sort, diff, and all that.
So this post is about how to write a simple script that does this in Groovy, a JVM-based language, and how to test it with JUnit. The testing part was a learning exercise; the compare script is what I really used. But why is it so hard to do with common tools? Oh well. Groovy offers an easy-to-use syntax and, for Java developers, a comfortable near-superset of standard Java.
Testing it
Below is a test of this script. Note that this is just a JUnit 3 test. JUnit is embedded in Groovy (that is cool). As I wrote above, this simple script does not require a test class. But, it gave me the opportunity to figure out how to test a Groovy script using JUnit. This illustrates:
- how to test a simple script
- how to provide command line args
- how to capture and test the console output
- how to locate test data relative to the project folder, the default working directory when Eclipse runs a test
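The console-capture point from the list can be sketched on its own. This is a minimal illustration, assuming a hypothetical helper named `captureOut`; it swaps `System.out` for an in-memory buffer and restores it in a `finally` block so that a failing body cannot leave the console redirected.

```groovy
// Hypothetical helper: runs the closure with System.out redirected to a
// buffer and returns everything that was printed as one String.
String captureOut(Closure body) {
    PrintStream original = System.out
    ByteArrayOutputStream buffer = new ByteArrayOutputStream()
    System.setOut(new PrintStream(buffer, true))
    try {
        body()
    } finally {
        System.setOut(original)   // always restore, even if body throws
    }
    buffer.toString()
}

def text = captureOut {
    System.out.println('Alfa')
    System.out.println('Bravo')
}
assert text.readLines() == ['Alfa', 'Bravo']
```

Because `System.out` is global state, this kind of capture is best kept to single-threaded tests.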
Test data
a.txt      b.txt
-----      -----
Alfa       India
Bravo      Juliet
Charlie    Kilo
Delta      Lima
Echo       Mike
Foxtrot    November
Golf       Oscar
Hotel      Papa
Romeo      Quebec
Sierra
Tango
Uniform
Victor
Whiskey
Xray
Yankee
Zulu
MissingLinesTest.groovy class
// MissingLinesTest.groovy
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import groovy.util.GroovyTestCase;
import groovy.lang.GroovyShell;

/**
 * Unit Test the MissingLines.groovy script.
 * Also illustrates techniques
 * 1. Capturing console out.
 * 2. Accessing test data via Eclipse's project folder.
 * 3. Supplying command line args to script.
 *
 * The project structure is:
 * .
 * +---.hg
 * +---.settings
 * +---bin
 * +---docs
 * +---data
 * \---src
 *
 */
class MissingLinesTest extends GroovyTestCase {
    def basedir = new java.io.File(".").getAbsolutePath()
    def originalOut                  // console output
    ByteArrayOutputStream captured   // what the script printed
    def script = new GroovyShell()

    @Override
    protected void setUp() throws Exception {
        originalOut = System.out
        captured = new ByteArrayOutputStream()
        System.setOut(new PrintStream(captured)) // can fail
    }

    /** first test: lines of b.txt that are not in a.txt */
    public final void testAB() {
        callTest("data/a.txt", "data/b.txt",
            ["India","Juliet","Kilo","Lima","Mike","November","Oscar","Papa","Quebec"])
    }

    /** Second test: lines of a.txt that are not in b.txt */
    public final void testBA() {
        callTest("data/b.txt", "data/a.txt",
            ["Alfa","Bravo","Charlie","Delta","Echo","Foxtrot",
             "Golf","Hotel","Romeo","Sierra","Tango","Uniform","Victor","Whiskey","Xray","Yankee","Zulu"])
    }

    /** */
    public final void testNothingFellOut() {
        // todo: combining missing from each should give total lines from both files.
    }

    /** avoids repeating test code */
    def callTest(String file1Path, String file2Path, List expected) {
        script.run(new File(buildPath("src/MissingLines.groovy")),
            [buildPath(file1Path), buildPath(file2Path)])

        def list = captured.toString().readLines()
        def flag = list.containsAll(expected)
        assertTrue("Did not find missing strings", flag)
    }

    /** builds a path relative to the project folder */
    private String buildPath(name) {
        return basedir + File.separator + name
    }

    @Override
    protected void tearDown() throws Exception {
        System.setOut(originalOut)   // restore the real console
    }
}
Compare the size and complexity of the test class with the script under test. Could it have been smaller too?
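One possible answer, sketched under a few assumptions: a Groovy script's `println` writes to a binding variable named `out` when one is set, so the output can be collected without touching `System.out`. The helper names `tempFile` and `runMissingLines` are hypothetical, the script text is inlined here instead of being loaded from the project's src folder, and temporary files stand in for the data folder.

```groovy
// Hypothetical helper: writes the given lines to a temporary file.
File tempFile(List<String> lines) {
    File f = File.createTempFile('missing-lines', '.txt')
    f.deleteOnExit()
    f.text = lines.join('\n')
    f
}

// Hypothetical helper: runs the list-based script against two sets of lines
// and returns what it printed, one list entry per line.
List<String> runMissingLines(List<String> first, List<String> second) {
    // The script under test, inlined so the sketch is self-contained.
    String scriptText = '''
        def firstList = new File(args[0]).readLines()
        new File(args[1]).readLines().each { if (!firstList.contains(it)) { println(it) } }
    '''
    StringWriter printed = new StringWriter()
    // The script's println goes to the 'out' binding variable when it exists.
    Binding binding = new Binding([out: new PrintWriter(printed, true)])
    new GroovyShell(binding).run(scriptText, 'MissingLines.groovy',
            [tempFile(first).path, tempFile(second).path])
    printed.toString().readLines()
}

assert runMissingLines(['Alfa', 'Bravo'], ['Alfa', 'India']) == ['India']
```

Comparing with `==` instead of `containsAll` also catches lines that were printed but should not have been.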