Use Groovy to find missing lines in a file

Compare two files and print the lines missing from one of them. Also shows how to test a Groovy script.

In a project at work I had to compare two files and generate a list of missing lines so that a spreadsheet could be created. It sounds easy, and it probably is. For example, file A has lines 1, 2, 4 and file B has 1, 2, so file B is missing 4.

So, if args holds the two file paths, a simple approach that prints every line of the second file not found in the first is:

MissingLines.groovy script

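// MissingLines.groovy
// Usage: groovy MissingLines.groovy <first file> <second file>
// Prints each line of the second file that does not appear in the first.

def firstList = new File(args[0]).readLines()

new File(args[1]).readLines().each {
    if (!firstList.contains(it)) {   // scans firstList for every line
        println(it)
    }
}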

That was easy! 🙂 Testing it is not so easy. 🙁
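One common way to make a script like this easier to test is to move the comparison into a method that works on plain lists, and keep only the file handling at the top level. A minimal sketch; the method name `missingLines` is hypothetical and not part of the original script:

```groovy
// Returns the lines of `second` that do not occur in `first`, in their original order.
// Same logic as MissingLines.groovy, but on lists, so it can be tested without files.
List<String> missingLines(List<String> first, List<String> second) {
    second.findAll { line -> !first.contains(line) }
}

// Only touch the file system when the script is run with two file paths.
if (binding.hasVariable('args') && args.length == 2) {
    missingLines(new File(args[0]).readLines(), new File(args[1]).readLines())
        .each { println it }
}
```

A test can then call `missingLines` directly and compare lists, instead of capturing console output.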

Alternative approach using Map

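// MissingLines.groovy using a Map
// The keys of inFirst act as a lookup table for the lines of the first file.

def inFirst = [:]

new File(args[0]).eachLine {
	inFirst.put(it, null)   // only the key matters
}

new File(args[1]).eachLine {
	if (!inFirst.containsKey(it)) {   // hash lookup instead of a list scan
		println(it)
	}
}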

Still easy …
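Groovy's list `minus` operator can shorten this further: `second - first` keeps every line of `second` that does not occur in `first`, which matches the output of the scripts above. A sketch, with literal lists standing in for `File.readLines()` and reusing the 1, 2, 4 example from the start of this post:

```groovy
// Stand-ins for new File(args[0]).readLines() and new File(args[1]).readLines().
def firstLines = ['1', '2']          // file B from the example
def secondLines = ['1', '2', '4']    // file A from the example

// List minus removes every element that also appears in the right-hand list.
def missing = secondLines - firstLines
missing.each { println it }   // prints 4
```

With real files this becomes `(new File(args[1]).readLines() - new File(args[0]).readLines()).each { println it }`.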

The scripts above compare two files and print the lines that appear in the second file but not in the first. They use a simple string comparison, so lines that differ only in whitespace or case count as different. To find out what is missing from each file, a real file-diffing tool is a better choice. The scripts also hold one of the files in memory, so they may balk at very large files. And how do you test a Groovy script?
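If memory use or two-way output matters, one alternative (the approach behind the Unix `comm` utility) is to sort both files first and then walk them in step, so only the current line of each needs to be held. A sketch, assuming both inputs are already sorted in natural String order; `compareSorted` is a hypothetical name:

```groovy
// Walks two sorted line iterators in step, like a merge, collecting the lines that
// occur only in the first input and only in the second.
// Assumes both inputs are sorted in the same (natural String) order.
Map<String, List<String>> compareSorted(Iterator<String> first, Iterator<String> second) {
    def onlyFirst = []
    def onlySecond = []
    String a = first.hasNext() ? first.next() : null
    String b = second.hasNext() ? second.next() : null
    while (a != null && b != null) {
        int cmp = a <=> b
        if (cmp < 0) {          // a has no partner in the second input
            onlyFirst << a
            a = first.hasNext() ? first.next() : null
        } else if (cmp > 0) {   // b has no partner in the first input
            onlySecond << b
            b = second.hasNext() ? second.next() : null
        } else {                // present in both: advance both sides
            a = first.hasNext() ? first.next() : null
            b = second.hasNext() ? second.next() : null
        }
    }
    // Whatever remains on either side has no partner in the other input.
    while (a != null) { onlyFirst << a; a = first.hasNext() ? first.next() : null }
    while (b != null) { onlySecond << b; b = second.hasNext() ? second.next() : null }
    [onlyFirst: onlyFirst, onlySecond: onlySecond]
}
```

Unlike the scripts above, this matches duplicate lines one-for-one, as `comm` does. For real files, Groovy adds an `iterator()` method to `Reader`, so `new File(path).newReader().iterator()` yields lines lazily from a file that was sorted beforehand.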

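From a computer science point of view the first script is of course suboptimal: each contains() call scans the whole list, so the work grows with the product of the two file sizes. The Map version avoids that with hash lookups. But this was a script to solve a single task, a throwaway.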
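For comparison, here is the same check with the first file's lines held in a `java.util.HashSet`, so each lookup is a hash probe rather than a scan of the whole list. This is essentially what the Map version does; a set just skips the dummy `null` values. The method name `missingLinesFast` is hypothetical:

```groovy
// Lines of `second` that are not in `first`, in their original order.
// Building the HashSet is linear in the size of `first`; each contains()
// on it is then constant time on average, instead of a full list scan.
List<String> missingLinesFast(List<String> first, List<String> second) {
    Set<String> seen = new HashSet<String>(first)
    second.findAll { line -> !seen.contains(line) }
}
```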

For some reason, I could not do this with the available tools. I tried different file comparison tools, including KDiff3, diff, some editors, and simple command line utilities. Maybe I was having a bad hair day. It's not that they didn't show the missing lines; they just gave no way to reuse that information. Hint to tool builders: let the user repurpose the result.

Yes, I know this can be done in one line on Linux using the incredible Linux utilities and shell, or with a Perl script so tiny it could be etched in atoms. And since I'm running on Windows, it could also be done using Cygwin. I searched and saw how. I'm sure I could have figured it out too, using sort, diff, and all that.

So this post shows how to write a simple script that does this in Groovy, the JVM-based language, and how to test it with JUnit. The testing part was a learning exercise; the compare script is what I actually used. Why this is so hard to do with common tools, I don't know. Groovy offers an easy-to-use syntax and, for Java developers, a comfortable, largely Java-compatible superset of standard Java.

Testing it

Below is a test of this script. Note that this is just a JUnit 3 test; JUnit is bundled with Groovy (that is cool). As I wrote above, this simple script does not really need a test class, but it gave me the opportunity to figure out how to test a Groovy script using JUnit. This illustrates:

  • how to test a simple script
  • how to provide command line args
  • how to capture and test the console output
  • how to use the project location that Eclipse's default test launch uses as its working directory
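As an alternative to swapping `System.out`, Groovy's `Script.println` writes to a binding variable named `out` when one is present, and command line args can be passed through the `Binding(String[])` constructor. A sketch, with an inline script text standing in for MissingLines.groovy:

```groovy
// Capture a script's println output through its binding instead of System.out.
def buffer = new StringWriter()
def binding = new Binding(['first', 'second'] as String[])   // becomes the script's `args`
binding.setVariable('out', new PrintWriter(buffer, true))     // println inside the script writes here

// Stand-in script text; a real test could pass the MissingLines.groovy File to evaluate().
new GroovyShell(binding).evaluate('args.each { println it }')

def printed = buffer.toString().readLines()
println printed   // prints [first, second]
```

This keeps the capture local to one script run, so nothing global has to be restored afterwards.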
Test data
a.txt             b.txt 
-----             ----- 
Alfa              India
Bravo             Juliet
Charlie           Kilo
Delta             Lima
Echo              Mike
Foxtrot           November
Golf              Oscar
Hotel             Papa
Romeo             Quebec
Sierra            
Tango             
Uniform           
Victor            
Whiskey           
Xray              
Yankee            
Zulu              

MissingLinesTest.groovy class

// MissingLinesTest.groovy
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import groovy.util.GroovyTestCase;
import groovy.lang.GroovyShell;

/**
 * Unit test the MissingLines.groovy script.
 * Also illustrates these techniques:
 * 1.  Capturing console output.
 * 2.  Accessing test data via Eclipse's project folder.
 * 3.  Supplying command line args to a script.
 * 
 * The project structure is:
 * .
 * +---.hg
 * +---.settings
 * +---bin
 * +---docs 
 * +---data
 * \---src
 * 
 */
class MissingLinesTest extends GroovyTestCase {
	// Eclipse's default JUnit launch uses the project folder as the working directory
	def basedir = new File(".").getAbsolutePath()
	PrintStream originalOut          // real console output, restored in tearDown()
	ByteArrayOutputStream captured   // receives everything the script prints
	def script = new GroovyShell()   // runs MissingLines.groovy with args

	@Override
	protected void setUp() throws Exception {
		originalOut = System.out
		captured = new ByteArrayOutputStream()
		System.setOut(new PrintStream(captured)) // can throw SecurityException under a security manager
	}
	
	/** Lines of b.txt that are missing from a.txt */
	public final void testAB(){		
		callTest("data/a.txt", "data/b.txt", 
			["India","Juliet","Kilo","Lima","Mike","November","Oscar","Papa","Quebec"])
	}
	
	/** Lines of a.txt that are missing from b.txt */
	public final void testBA(){		
		callTest("data/b.txt", "data/a.txt", 
			["Alfa","Bravo","Charlie","Delta","Echo","Foxtrot",
			"Golf","Hotel","Romeo","Sierra","Tango","Uniform","Victor","Whiskey","Xray","Yankee","Zulu"])
	}

	/** Not implemented yet. */
	public final void testNothingFellOut(){
		// todo: combining missing from each should give total lines from both files.
	}
	
	/** Runs the script on two data files and checks its console output; avoids repeating test code. */
	def callTest(String file1Path, String file2Path, List expected){
		script.run(new File(buildPath("src/MissingLines.groovy")), [buildPath(file1Path), buildPath(file2Path)])
		def list = captured.toString().readLines()
		// exact comparison: fails on extra lines as well as missing ones
		assertEquals("Unexpected missing lines", expected, list)
	}
	
	/** Resolves a project-relative path against the working directory. */
	private String buildPath(String name){
		return new File(basedir, name).getPath()
	}
	
	@Override
	protected void tearDown() throws Exception {
		System.setOut(originalOut)
	}
}

Compare the size and complexity of the test class with the script it tests. Could the test have been smaller too?
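The empty testNothingFellOut could check an invariant along these lines. Note that "missing from each should give total lines from both files" only holds when the files share no lines, as a.txt and b.txt happen not to; in general, the lines missing in each direction plus the lines common to both should make up every distinct line of the two files. A sketch on plain lists; `missingLines` and `nothingFellOut` are hypothetical helpers with the script's semantics:

```groovy
// Same semantics as MissingLines.groovy: lines of `second` absent from `first`.
List<String> missingLines(List<String> first, List<String> second) {
    second.findAll { !first.contains(it) }
}

// Invariant: missing(a, b) + missing(b, a) + common lines == every distinct line of a and b.
boolean nothingFellOut(List<String> a, List<String> b) {
    def common = a.findAll { b.contains(it) }
    def recombined = (missingLines(a, b) + missingLines(b, a) + common) as Set
    recombined == ((a + b) as Set)
}

def a = ['Alfa', 'Bravo', 'Romeo']
def b = ['India', 'Bravo']
println nothingFellOut(a, b)   // prints true
```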

Further reading

Use Eclipse project dir for Unit Test data

Use the default launch configuration of a unit test in Eclipse to access Unit Test data.

Best practice is not to access test data through relative or absolute file-system paths, but to load it as classpath resources. But maybe you have valid reasons to do things your own way: you want to get at test data in a folder within your project without adding file paths to the launch configuration and all that stuff.
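For reference, the classpath route is short in Groovy too. A minimal sketch, assuming the data files are on the test classpath; `readResourceLines` is a hypothetical helper, and resource names passed to a `ClassLoader` take no leading slash (unlike `Class.getResource`):

```groovy
import java.nio.file.Files

// Hypothetical helper: load a text resource from the classpath and split it into lines.
List<String> readResourceLines(String name,
                               ClassLoader loader = Thread.currentThread().contextClassLoader) {
    URL url = loader.getResource(name)
    if (url == null) {
        throw new FileNotFoundException("Not on the classpath: " + name)
    }
    url.readLines()   // Groovy adds readLines() to java.net.URL
}

// Demo: put a temporary folder on a throwaway class loader and read a file from it.
def root = Files.createTempDirectory('testdata').toFile()
new File(root, 'alpha.txt').text = 'Alfa\nBravo\n'
def loader = new URLClassLoader([root.toURI().toURL()] as URL[], (ClassLoader) null)
println readResourceLines('alpha.txt', loader)   // prints [Alfa, Bravo]
```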

OK, who am I to diminish your mojo. Just use the working directory that Eclipse sets for the test launch, which by default is the project folder. For example, you have this:

class FileCompareTest extends GroovyTestCase {
    /** Before: hard-coded absolute path */
    public void testSomething(){
        def aFileName =
            "C:\\path\\to\\projects\\data\\alpha.txt"  // backslashes must be escaped in Groovy strings
        // use file in test.
    }
}

Change it to:

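class FileCompareTest extends GroovyTestCase {
    def basedir

    @Override
    protected void setUp() throws Exception {
        // working directory of the test run; the project folder under Eclipse's default launch
        basedir = new java.io.File(".").getAbsolutePath()
    }

    /** After: path relative to the project folder */
    public void testSomething(){
        def aFileName = new File(basedir, "data/alpha.txt").getPath()
        // use file in test.
    }
}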

Updates

17JAN11: Could basedir = System.getProperty("user.dir"); also be used?
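It should work: `java.io.File` resolves relative paths against the `user.dir` system property, so `new File(".").getAbsolutePath()` is `user.dir` plus a trailing `.` segment. A quick check (the example paths in the comments are illustrative only):

```groovy
// Both name the JVM's working directory. File resolves relative paths against
// the user.dir system property, so only a trailing "." segment differs.
def viaFile = new File('.').absolutePath          // e.g. C:\projects\demo\.
def viaProperty = System.getProperty('user.dir')  // e.g. C:\projects\demo

// Normalizing away the "." segment shows they point at the same folder.
def same = new File(viaFile).canonicalPath == new File(viaProperty).canonicalPath
println same   // prints true
```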

Related links

How to default the working directory for JUnit launch configurations in Eclipse?