Use Groovy to find missing lines in a file

Compare two files and print the lines missing from one of them. Also shows how to test a Groovy script.

In a project at work I had to compare two files and generate a list of missing lines so that a spreadsheet could be created. Sounds easy, and probably is. For example, file A has the lines 1, 2, 4 and file B has 1, 2, so file B is missing 4.
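As a quick illustration of that example, Groovy's list subtraction gives the same answer for small in-memory lists. This is only a sketch: the variable names are made up for illustration, and it is not the script from this post.

```groovy
// Illustrative values taken from the example: file A has 1, 2, 4; file B has 1, 2.
def fileA = ['1', '2', '4']
def fileB = ['1', '2']

// List minus removes every element that also occurs in the right-hand list.
def missingFromB = fileA - fileB

assert missingFromB == ['4']
println missingFromB   // prints [4]
```

Note that minus drops all occurrences of a matching line, so duplicate lines are not counted separately.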

So, if args[] has the file paths, a simple approach is:

MissingLines.groovy script

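// MissingLines.groovy

def firstList = new File(args[0]).readLines()

new File(args[1]).readLines().each {
	// Loop body lost from this listing; reconstructed from the contains()
	// remark later in the post: print lines the first file lacks.
	if (!firstList.contains(it)) {
		println it
	}
}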
That was easy! 🙂 Testing it is not so easy. 🙁

Alternative approach using Map

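// MissingLines.groovy  using Map

def inFirst = [:]

new File(args[0]).eachLine {
	inFirst.put(it, null)	// only the keys matter
}

new File(args[1]).eachLine {
	// Body reconstructed; the original was lost from this listing.
	if (!inFirst.containsKey(it)) {
		println it
	}
}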

Still easy …

The script above compares two files and prints each line of the second file that does not appear in the first. It uses a simple string comparison. To find out what's missing in each file, a real file diffing tool is the better choice. The script reads one of the files into memory, so it may balk at very large files. That leaves the question of how to test a Groovy script, which the Testing it section covers.

From a Computer Science point of view this script is suboptimal, of course: every call to contains() scans the whole list, so the work grows with the product of the two file sizes. But this was a script to solve a single task, so a throwaway.
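If the contains() cost ever mattered, one common alternative (in the same spirit as the Map version) is to load the first file's lines into a HashSet so that each lookup no longer scans a list. A minimal sketch, assuming the lines are already in memory; missingFrom is a hypothetical helper name, not part of the original script.

```groovy
// Hypothetical helper: returns the lines of 'candidate' that 'reference' lacks.
List<String> missingFrom(List<String> reference, List<String> candidate) {
    // Hash-based membership test instead of a linear list scan.
    Set<String> seen = new HashSet<String>(reference)
    candidate.findAll { !seen.contains(it) }
}

assert missingFrom(['1', '2'], ['1', '2', '4']) == ['4']
```

With real files, the two arguments could come from new File(path).readLines(), which still reads both files into memory.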

For some reason I could not do it with the available tools. I tried different file comparison tools, including KDiff3 and diff, some editors, and simple command line utilities. Maybe I was having a bad hair day. It’s not that they won’t show the missing lines; they gave no way to reuse that info. Hint to tool builders: allow the user to repurpose the result.

Yes, I know this can be done in one line on Linux using the incredible Linux utilities and shell, or a Perl script so tiny it can be etched using atoms. And, since I’m running on Windows, it can also be done using Cygwin. I searched and saw how. I’m sure I could have figured it out too, using sort, diff, and all that.

So this post is about how to write a simple script that does this in Groovy, a JVM-based language, and how to test it with JUnit. The testing part was a learning exercise; the compare script I really used. But why is it so hard to do with common tools? Oh well. Groovy offers an easy-to-use syntax and, for Java developers, a comfortable superset of standard Java.

Testing it

Below is a test of this script. Note that this is just a JUnit 3 test. JUnit is embedded in Groovy (that is cool). As I wrote above, this simple script does not require a test class. But, it gave me the opportunity to figure out how to test a Groovy script using JUnit. This illustrates:

  • how to test a simple script
  • how to provide command line args
  • how to capture and test the console output
  • how to use Eclipse's default test invocation directory (the project folder) to locate test data
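The console capture technique from the list can be sketched on its own. This is a minimal sketch, not the test class from this post: captureOut is a hypothetical helper name, and it assumes the code under test writes to System.out.

```groovy
// Hypothetical helper: runs the closure with System.out redirected
// and returns everything that was printed as a String.
String captureOut(Closure work) {
    PrintStream original = System.out
    ByteArrayOutputStream buffer = new ByteArrayOutputStream()
    System.setOut(new PrintStream(buffer, true))
    try {
        work()
    } finally {
        System.setOut(original)   // always restore the real console
    }
    buffer.toString()
}

String text = captureOut {
    System.out.println('Alfa')
    System.out.println('Bravo')
}
assert text.readLines() == ['Alfa', 'Bravo']
```

The try/finally matters: if the code under test throws, the console is still restored for later tests.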
Test data
a.txt             b.txt 
-----             ----- 
Alfa              India
Bravo             Juliet
Charlie           Kilo
Delta             Lima
Echo              Mike
Foxtrot           November
Golf              Oscar
Hotel             Papa
Romeo             Quebec

MissingLinesTest.groovy class

// MissingLinesTest.groovy
import groovy.util.GroovyTestCase
import groovy.lang.GroovyShell

/**
 * Unit Test the MissingLines.groovy script.
 * Also illustrates techniques
 * 1.  Capturing console out.
 * 2.  Accessing test data via Eclipse's project folder.
 * 3.  Supplying command line args to script.
 * The project structure is:
 * .
 * +---.hg
 * +---.settings
 * +---bin
 * +---docs
 * +---data
 * \---src
 */
class MissingLinesTest extends GroovyTestCase {
	def basedir = new File(".").getAbsolutePath()
	def originalOut // console output stream, restored in tearDown
	ByteArrayOutputStream captured // collects what the script prints
	def script = new GroovyShell() // runs the script under test

	protected void setUp() throws Exception {
		originalOut = System.out
		captured = new ByteArrayOutputStream()
		System.setOut(new PrintStream(captured)) // can fail
	}

	/** first test */
	public final void testAB(){
		// test body missing from this listing
	}

	/** Second test */
	public final void testBA(){
		// test body missing from this listing
	}

	/**  */
	public final void testNothingFellOut(){
		// todo: combining missing from each should give total lines from both files.
	}

	/** avoids repeating test code */
	def callTest(String file1Path, String file2Path, List expected){
		// Path separator restored as "/"; it was stripped from the original listing.
		script.run(new File(buildPath("src/MissingLines.groovy")), [buildPath(file1Path), buildPath(file2Path)])
		def list = captured.toString().readLines()
		def flag = list.containsAll(expected)
		assertTrue("Did not find missing strings",flag)
	}

	private String buildPath(name){
		return basedir + File.separator + name
	}

	protected void tearDown() throws Exception {
		System.setOut(originalOut)
	}
}


Compare the size and complexity of the test class with the script under test. Could it have been smaller too?
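One possible answer, sketched under assumptions: a Groovy script's println looks for an out variable in its binding before falling back to System.out, so a test can hand the script a writer instead of swapping System.out. The inline script text, the temp files, and the sample lines below are stand-ins for illustration, not the project's real script or data.

```groovy
// Stand-in for the script under test (illustrative, not the real file).
def scriptText = '''
def firstList = new File(args[0]).readLines()
new File(args[1]).readLines().each {
    if (!firstList.contains(it)) println it
}
'''

// Throwaway input files with made-up sample lines.
File first = File.createTempFile('first', '.txt')
File second = File.createTempFile('second', '.txt')
first.deleteOnExit()
second.deleteOnExit()
first.text = 'Alfa\nBravo\n'
second.text = 'Alfa\nBravo\nCharlie\n'

// Supply the command line args and the output writer through the binding.
def buffer = new StringWriter()
def binding = new Binding()
binding.setVariable('args', [first.path, second.path] as String[])
binding.setVariable('out', new PrintWriter(buffer))

new GroovyShell(binding).evaluate(scriptText)

def printed = buffer.toString().readLines()
assert printed == ['Charlie']
```

This only captures println calls made by the script itself; code that writes to System.out directly would still need the System.setOut approach.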
