What is the simplest data file metaformat you can create that can still handle future complexity? I started puzzling over this yet again.
Also see follow up post: Simple Java Data File
An example application is given here: Java testing using XStream, AspectJ, Vsdf
Scenario
I had some maintenance to do. So, to reduce the big ball of mud I decided to use external data files. This is where the complexity came in. If I have d data files for each component, and c “components” then the total number of data files is d*c. Future maintenance of so many files is not optimal.
The required data files in this scenario varied: some would contain lists, others key-value pairs, and so forth. Could these be combined into one data file? I looked at JSON, YAML, XML, and even GRON. Though good, they seemed excessive. What if, for example, I needed a simple list? In a simple text file this could be stored with one item per line, or with simple separators. The aforementioned metaformats require more syntax than that.
Solution
I revisited the Windows INI file format and just added metadata to a section. A section, indicated with a header “[…]”, also indicates what its data format is. Also, we allow subsections: [type:identifier/section]. This is similar to a URI. The subsection, which can be a hierarchical ‘path’, is optional. The type is optional, default being list (update: text). If the file has no sections, it is just a line oriented file of data in a list. (Update: line oriented string data).
In the original ini file format, the section data were key=value pairs. Here we follow the freedom of a HEREDOC.
The data type indication is practical when standard collections are being created, such as lists, maps, arrays, and so forth. We can use a generic “text” type for a non-typed string payload. Since a host application will know what data it is extracting from a data file, the higher-level types such as XML, JSON, and others are of limited value.
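To make the point about collection types concrete, here is a minimal Groovy sketch of how a host application might turn a section's raw text into a collection based on the type tag. The `toCollection` helper and its handling of `list`, `csv`, and `properties` are assumptions for illustration only; they are not part of the Vsdf code in this post.

```groovy
// Hypothetical helper: converts a section's raw text into a collection
// chosen by the type named in the section header.
def toCollection(String dataType, String text) {
    // Non-blank payload lines, trimmed.
    def lines = text.readLines().collect { it.trim() }.findAll { it }
    switch (dataType) {
        case 'list':
            return lines
        case 'csv':
            // One list of fields per line; no quoting or escaping support.
            return lines.collect { it.split(',').collect { f -> f.trim() } }
        case 'properties':
            def props = new Properties()
            props.load(new StringReader(text))
            return new HashMap(props)
        default:
            // Unknown or generic types are handed back as plain text.
            return text
    }
}

assert toCollection('list', "one\ntwo\nthree\n") == ['one', 'two', 'three']
assert toCollection('properties', "one=alpha\ntwo=beta\n") == [one: 'alpha', two: 'beta']
```

Anything the helper does not recognize falls through as a string, which matches the idea that the host application already knows what it is extracting.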
The use of subsections in the section name allows scoping, but this was also possible in the original INI file format, just not “formalized”. True subsections should probably be nested sections, i.e. hierarchy. But, then we are now losing the simplicity.
Subsections (though not nested) allow the use of cascading data. Data in a section is automatically reused or available in matching subsections.
See Cascading Configuration Pattern.
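One possible reading of the cascading idea, sketched in Groovy: the data for a subsection is built by merging each enclosing section in turn, with the more specific section winning. The `cascade` function and the sample map are hypothetical and assume each section has already been parsed into key/value pairs.

```groovy
// Hypothetical cascade: data for 'credit/config' is built by merging
// 'credit' first and then 'credit/config', so the deeper section overrides.
Map cascade(Map<String, Map> sections, String path) {
    def result = [:]
    def parts = path.split('/')
    for (int i = 0; i < parts.length; i++) {
        // Prefixes are 'credit', 'credit/config', and so on.
        def prefix = parts[0..i].join('/')
        result.putAll(sections.get(prefix) ?: [:])
    }
    return result
}

def sections = [
    'credit'       : [timeout: '30', region: 'us'],
    'credit/config': [timeout: '60']
]
assert cascade(sections, 'credit/config') == [timeout: '60', region: 'us']
```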
Example
# Example very simple data file
#
[>list:credit/tests]
one
two
three
[>csv:credit/report]
one,two,three
[>properties:credit/config]
one=alpha
two=beta
three=charlie
[>xml:credit/beans]
<description>
  <item>one</item>
  <item>two</item>
  <item>three</item>
</description>
[>json:credit/alerts]
["one","two","three"]
[>credit]
one
two
three
[>gron:credit/coverages]
["one","two","three"]
**** Note: this is an incorrect production *****
file ::= section* ;
section ::= ‘[>’ (type:)? identifier (‘/’ subsection)? ‘]’ (data)+ sectionTerminator;
data ::= (line lineEnd)*;
identifier ::= name
subsection ::= name [/name]* ;
sectionTerminator ::= ‘[<’ identifier (‘/’ subsection)? ‘]’ ;
name ::= [a-zA-Z0-9-_]+
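As a rough companion to the grammar, the section header can be matched with a single regular expression in which both the type and the subsection are optional. This is only a sketch: `parseHeader` is a hypothetical helper, and the default type of `list` is an assumption that follows the implementation shown later in this post.

```groovy
import java.util.regex.Matcher

// Hypothetical parser for a header such as [>list:credit/tests].
// Returns null when the line is not a section header.
Map parseHeader(String line) {
    Matcher m = line.trim() =~ /^\[>(?:([A-Za-z0-9_-]+):)?([A-Za-z0-9_-]+)(?:\/([A-Za-z0-9_\/-]+))?\]$/
    if (!m.matches()) {
        return null
    }
    return [
        type      : m.group(1) ?: 'list',   // type is optional
        id        : m.group(2),
        subsection: m.group(3) ?: ''        // subsection path is optional
    ]
}

assert parseHeader('[>list:credit/tests]') == [type: 'list', id: 'credit', subsection: 'tests']
assert parseHeader('[>credit]') == [type: 'list', id: 'credit', subsection: '']
```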
What we have now is a line-oriented data file that can contain other data formats; with no sections, the file is just a line-oriented list. The listing below is a demo in the form of a Groovy JUnit 4 test.
import com.octodecillion.vsdf.*
import static com.octodecillion.vsdf.Vsdf.EventType.*
import org.junit.Test
import static org.junit.Assert.assertEquals

/** Test Vsdf */
class VsdfTest /*extends GroovyTestCase*/ {
    def LINESEP = System.properties.get("line.separator")

    @Test
    void testshouldGetListData(){
        def reader = new Vsdf()
        reader.reader = new BufferedReader(
            new FileReader(new File("data-1.vsdf")))

        def theEvent = reader.next()
        while(theEvent != Vsdf.EventType.END){
            def event = reader.getEvent()
            if(isSectionCreditWithList(event)){
                def data = event.text.split(LINESEP)
                assert data.size() == 3
                assert ( (data[0] == 'one') && (data[1] == 'two') && (data[2] == 'three') )
            }
            theEvent = reader.next()
        }
    }

    /** */
    def isSectionCreditWithList(evt){
        return evt.id == 'credit' && evt.dataType == 'list'
    }
}
Sample run:
groovy -cp . VsdfTest.groovy
JUnit 4 Runner, Tests: 1, Failures: 0, Time: 281
Limitations
Not quite correct yet. One issue is the file encoding. If we want to include other formats, they have their own requirements: Java properties, JSON, XML, and so forth. For example, JSON is Unicode. I don’t think this is a major issue; this solution is meant for config data, so ASCII files are adequate.
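If non-ASCII payloads ever do matter, one option is to open the data file with an explicit charset. The Groovy sketch below assumes UTF-8 and uses a hypothetical `openUtf8` helper; the resulting reader could be handed to a parser in place of a `FileReader`.

```groovy
// Hypothetical helper: opens a data file with an explicit charset instead of
// relying on the default charset that FileReader(File) uses.
BufferedReader openUtf8(File file) {
    return file.newReader('UTF-8')
}

// Round trip through a temporary file to show non-ASCII data surviving.
def file = File.createTempFile('vsdf-demo', '.vsdf')
file.setText("[>list:credit/tests]\nz\u00e4h\n", 'UTF-8')
def lines = openUtf8(file).withReader { it.readLines() }
assert lines == ['[>list:credit/tests]', 'z\u00e4h']
file.delete()
```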
Also, should the sections have terminators? Right now, the end of a section is simply the start of another. (Update: the version of this concept in actual use is terminator based, i.e., [<] or [<id/subsection...])
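With terminators, reading a section body needs no read-ahead at all. The following Groovy sketch illustrates the idea under the assumption that any line starting with `[<` ends the section; `readUntilTerminator` is a hypothetical name, and this is not the code of the version in actual use.

```groovy
// Hypothetical reader for terminator-based sections: collects lines until
// a line starting with "[<" is seen, so no mark/reset read-ahead is needed.
String readUntilTerminator(BufferedReader reader) {
    def buffer = new StringBuilder()
    String line
    while ((line = reader.readLine()) != null) {
        if (line.trim().startsWith('[<')) {
            break   // the terminator is consumed and not part of the data
        }
        buffer.append(line).append('\n')
    }
    return buffer.toString()
}

def input = new BufferedReader(new StringReader(
    "one\ntwo\nthree\n[<credit/tests]\n[>csv:credit/report]\n"))
assert readUntilTerminator(input) == "one\ntwo\nthree\n"
// The reader is now positioned at the next section header.
assert input.readLine() == '[>csv:credit/report]'
```

The trade-off is that a payload line that itself begins with `[<` would end the section early, so such data would need some form of escaping.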
Implementation
Below in listing 3 is a very simple implementation in the Groovy language to show how easy this data file is to use. Note that this is just a proof of concept and has not been thoroughly tested. I don’t think the use of mark and reset in the file reading is robust: how do you determine the correct read-ahead buffer size? To make it easier to parse, I think the format will need section terminators, as heredocs have in Unix shells.
Source code available as a gist.
// File Vsdf.groovy
// Author: Josef Betancourt
//
package com.octodecillion.vsdf

import groovy.transform.TypeChecked
import java.util.regex.Matcher

/**
 * @author Josef Betancourt
 */
class Vsdf {
    String currentFolder
    String iniFilePath
    Reader reader
    int lineNum
    int sectionNum
    VsdfEvent data
    int READAHEADSIZE = 8*1024
    def LINESEP = System.properties.get("line.separator")

    enum State { INIT, ACCEPT, SHIFT, END }
    public enum EventType { COMMENT, SECTION, END }

    def state = State.INIT

    public Vsdf(){
        currentFolder = new File(".").getAbsolutePath()
    }

    /**
     * Value object for parsed sections.
     */
    class VsdfEvent {
        EventType event
        String dataType
        String namespace
        String id
        String text
        int lineNum
        int sectionNum
        String sectionString
    }

    VsdfEvent getEvent(){
        return data
    }

    /**
     * Reads the next comment or section from the reader.
     * @return the event type, or END when the input is exhausted
     */
    @TypeChecked
    EventType next(){
        String line = reader.readLine()
        lineNum++
        if(line == null){
            return EventType.END
        }

        String type = ''
        String namespace = ''
        String id = ''
        data = new VsdfEvent()
        EventType eventType

        def isBlank = !line.trim()
        // skip blank lines
        if(isBlank){
            while(true){
                line = reader.readLine()
                lineNum++
                if(line || line == null){
                    break
                }
            }
        }

        // the input may have ended while skipping blank lines
        if(line == null){
            return EventType.END
        }

        def isComment = line =~ /^\s*#/
        if(isComment){
            data.text = line
            data.event = EventType.COMMENT
            data.lineNum = lineNum
            eventType = EventType.COMMENT
        }

        if(line.trim() =~ /^\[>.*\]/){ // section?
            eventType = EventType.SECTION
            data.sectionString = line
            sectionNum++
            processSection(line, sectionNum, data)
        } // end if section head

        return eventType
    }

    /**
     * Parses a section header and reads the section's data.
     * @param line the section header line
     */
    def processSection(String line, int sectionNum, VsdfEvent data){
        data.event = EventType.SECTION
        data.sectionNum = sectionNum
        Matcher m = (line.trim() =~ /^\[>(.*)\]/)
        String mString = m[0][1]
        def current = mString.trim()
        if(!current){
            def msg = "section $sectionNum is blank"
            throw new IllegalArgumentException(msg)
        }

        // Only the full form type:identifier/subsection matches here; any
        // other header is treated as a bare identifier of type 'list'.
        def parts = (current =~ /^(.*):(.*)\/(.*)$/)
        if(!parts){
            data.id = current
            data.dataType = 'list'
        }else{
            long size = ((String[])parts[0]).length
            data.dataType = size > 0 ? parts[0][1] : ''
            data.id = size > 1 ? parts[0][2] : ''
            data.namespace = size > 2 ? parts[0][3] : ''
        }

        String readData = readSectionData()
        data.text = readData
    }

    /**
     * Reads lines up to, but not including, the next section header.
     * @return the section data, one line per LINESEP-terminated entry
     */
    String readSectionData(){
        StringBuilder buffer = new StringBuilder(READAHEADSIZE)
        while(true){
            // mark so the next section header can be pushed back
            reader.mark(READAHEADSIZE)
            String line = reader.readLine()
            lineNum++
            if(line == null){
                reader.reset()
                break
            }

            if(line.trim() =~ /^\[>.*\]/){ // section?
                reader.reset()
                break
            }else{
                buffer.append(line + LINESEP)
            }
        }

        return buffer.toString()
    }
}