## Declarative Object Graphs: programs as queries

If there were no physical constraints, could a program look up its solution by firing off a set of queries into a solution space?

While taking a shower this morning I was pondering some things I read recently.

First, some history. Programs were created to run within constrained resources. As such, a program is just a collection of small data transforms. Yes, it's more complicated than that, don't interrupt.

What if there were no constraints? In the future we will have more memory, and network speeds will be much higher.

With no constraints we could map inputs into outputs directly, with no computation. A program becomes a query into possible solution spaces. A functionless approach.

This is already the case with database technology. For example, relational databases use relational algebra as a basis for declarative programming. In some data centers a terabyte-sized database fits entirely in memory.

What is required is a mathematical basis for unconstrained programming. Here is my intuitive notion. A program is a dot in a vast tensor state space. When a user clicks a button, that dot moves to a different subspace and all its queries are evaluated with new basis vectors. When the dots are related to other dots, we have a system.

Since a program is just a bunch of queries from an “instance” to a ‘space’, it should be possible to use big data techniques (“AI”) to generate that space. This would make it possible to automatically generate applications based on a transition graph of how objects move through space.

Well, I have no time or expertise to make the above work. It was probably some chemical in the soap. Doesn’t seem possible to create such a space.


## Why not store numbers as a diff of the previous number?

This morning I had a thought. We store numbers in a fixed-size memory space. So if we use four bytes per number, we need eight bytes to store the numbers 5 and 6. But what if we store 5 in four bytes and then 6 in two bits, as a delta? The bits can indicate +1, 0, or -1; here, an increment of the 5 by one. Larger increments would use more bits, of course. Thus, we naturally get compression.
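As a minimal sketch of the idea (the class name and method shapes are illustrative, and the deltas are kept as full ints here; real compression would come from bit-packing them), here is a delta encoder that stores the first value in full and every subsequent value as its difference from the previous one:

```java
import java.util.Arrays;

/** Sketch: store the first value in full, the rest as deltas. */
public class DeltaCodec {
    /** Encode: out[0] is the base value, out[i] = in[i] - in[i-1]. */
    public static int[] encode(int[] values) {
        int[] out = new int[values.length];
        for (int i = values.length - 1; i > 0; i--) {
            out[i] = values[i] - values[i - 1]; // small deltas need few bits
        }
        if (values.length > 0) {
            out[0] = values[0];
        }
        return out;
    }

    /** Decode by accumulating the deltas back into absolute values. */
    public static int[] decode(int[] deltas) {
        int[] out = Arrays.copyOf(deltas, deltas.length);
        for (int i = 1; i < out.length; i++) {
            out[i] += out[i - 1];
        }
        return out;
    }
}
```

Storing 5 and 6 this way yields the pair (5, +1); the +1 needs only a couple of bits once packed, versus four bytes for the absolute value.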

Subject areas: mobile computing, Internet of Things.

True, this wouldn’t work as a ‘live’ memory storage, the I/O would be complex. Or would it? In constrained devices such as Wearable Computing, for example, a smart watch, or in an Internet Of Things remote device, memory limitations may require compressed storage.

As usual, this is not a new idea. It is related to “Delta Encoding” or “Data differencing”. An interesting article on how delta coding could be used for compression is “Effective compression using frame-of-reference and delta coding”.

I have not seen this delta encoding memory approach mentioned anywhere yet.

## Application logging using unique id

For programs that write to log files, a best practice is to include a tracking ID. What should this ID be, and how should it be used? The following is presented as a 'design pattern'.

## TL;DR

At the earliest inception of a 'process', create an ID which is a combination of an 'origin id' and a 'unique id'. This non-business-related global inception ID (GIID) should be used in every log output to provide machine readability and enable tool use. When each new business-level or operational ID is obtained or generated, it is logged to provide an 'association' with the GIID. This is similar to various existing practices, so it is presented as a Software Design Pattern.

Author: Josef Betancourt, Jan 11, 2015

CR Categories: H.3.4 [Systems and Software]; D1.3 [Concurrent Programming]; D.2.9 [Management]; K.6 [MANAGEMENT OF COMPUTING AND INFORMATION SYSTEMS]; K.6.3 [Software Management]

# Context

## App Logs

An application log file records runtime programmatic events: details, exceptions, debug output, and performance-related measures. This is different from specialized server log files, such as a webserver's access logs, error.log, etc. The latter are more standardized, for example with the W3C Extended Log File Format, and well supported.

App logs usually capture details at specific logging levels and named contexts. In the Java ecosystem there are plenty of libraries to support this and now the JDK supports this as part of the java.util.logging package.

Despite the advances made in logging APIs and support within operating systems and frameworks, app logs are at a primitive level of software maturity. What A. Chuvakin and G. Peterson describe as the “… horrific world of application logging” is composed of ad hoc, minimally documented, unstructured, untested, and under-specified delivered components.

Attempts to create widely used industry standards have failed, and every business, software project, dev team, or industry reinvents solutions and attempts to tackle the same problems.

# Forces

In the context of server app logs, multiple sessions will output log information that can be intermixed. These sessions can be part of an ongoing process, such as a user interaction with a web site.

External integration points (web services, database, etc) may also be invoked. Unless each session is identified in the log and integration logs, subsequent support, debug, and auditing are very difficult.

The problem is not just tracking where and when ‘integration’ occurred or its non-functional integration criteria (i.e., timing), but the tracking of subsequent logging, if any, at that location.

App logs are used extensively during development. Their importance is illustrated by the old meme "debuggers are for wimps". As such, logs with impeccable tracking, used during design and test, are a good Return On Investment (ROI).

The same is true for deployed systems. In many cases the only information available on historical behavior is in a log file.

This seems like a programming 101 lesson, but it is widely disregarded in practice. That log output is treated as a minor concern, and sometimes is not even part of a "code review", is puzzling.

## Scenarios

1. A service must invoke a distributed call to another system. The service has retry logic, and logs each failure. If each log output does not identify the session or operation, the retries could get mixed with other threads' output. Identifying an end user's request becomes a hit-or-miss bracketing of time stamps if the development team did not include enough identifiable data in each log output.

2. A computer-savvy end user or family may attempt to register with your system from multiple browsers simultaneously. This could cause problems if multiple logins are supported and an error occurs. How do you track this and debug it?

3. The app server makes a remote call to a service integration point and that service fails. How is the owner of that service informed of the specific invocation? There are probably deployed systems where one would have to compare time stamps on log output just to coordinate where the two systems communicated, and even then it is vague. Some systems may not even do any logging unless there is a fault of some kind.

4. You have to identify time periods based on hazy user complaints, search through multiple log files with regular expressions, then walk each output to recreate a specific error scenario. Isn’t this manual drudgery what computers were supposed to eliminate?

# Solution

## Global Inception ID

Logging with unique identifiers is encouraged as a best practice:

“Unique identifiers such as transaction IDs and user IDs are tremendously helpful when debugging, and even more helpful when you are gathering analytics. Unique IDs can point you to the exact transaction. Without them, you might only have a time range to use. When possible, carry these IDs through multiple touch points and avoid changing the format of these IDs between modules. That way, you can track transactions through the system and follow them across machines, networks, and services.” — http://dev.splunk.com/view/logging-best-practices/SP-CAAADP6

This unique identifier is generalized so that on first entry into a system, or at the start of a process, a Global Inception ID (GIID) is assigned to distinguish that ongoing process from others. A more descriptive term would be Global Tracking ID, but that conjures up privacy and security concerns, and the term is already used in commerce for a different purpose. Then again, 'inception ID' brings up visions of barcodes on people's foreheads. OK, how about "bludzwknxxkjysjkajskjjj"?

The term “Global” is to indicate that this ID is unique in a specific enterprise system. The uniqueness comes from its creation on a first contact basis on a specific subsystem. In essence this is a log tracking ID.

For example, a web server or an app server would be the first point of contact for a request from an external user. The GIID, consisting of a combination of an origin ID and a unique ID, would be created at this point:

GIID ::= originID uniqueID

In article “Write Logs for Machines, use JSON” Paul Querna uses the term “txnId” for this type of ID:

“… this is essentially a unique identifier that follows a request across all our of services. When a User hits our API we generate a new txnId and attach it to our request object. Any requests to a backend service also include the txnId. This means you can clearly see how a web request is tied to multiple backend service requests, or what frontend request caused a specific Cassandra query.”

Another term for this GIID, or ‘txnId’ is Correlation ID. This terminology is used in SharePoint.

The correlation ID is a GUID (globally unique identifier) that is automatically generated for every request that the SharePoint web server receives.

Basically, it is used to identify your particular request, and it persists throughout further communications, if any, between the other servers in the farm. Technically, this correlation ID is visible at every level in the farm, even at a SQL profiler level and possibly on a separate farm from which your SharePoint site consumes federated services. So for example, if your request needs to fetch some information from an application server (say, if you are using the web client to edit an Excel spreadsheet), then all the other operations that occur will be linked to your original request via this unique correlation ID, so you can trace it to see where the failure or error occurred, and get something more specific than “unknown error”. — https://support.office.com/en-nz/article/SharePoint-2010-Correlation-ID-in-error-messages-what-it-is-and-how-to-use-it-5bf2dba7-43d2-484c-8ef4-e059f76e3efa

## Contextualized IDs

Another related term is "Contextualized IDs", from Michael Nygard:

"… it's important that IDs carry along their context. It isn't enough to have an alphanumeric Policy ID field. You need a URN or URI to identify which policy system issued that policy number." — Michael Nygard, 'Inverted Ownership', http://www.michaelnygard.com/blog/2015/05/inverted-ownership/

Various 'Structured Logging' efforts and syslog implementations already contain a 'sending' field specification. The GIID incorporates the sender ID as the Origin ID, and this combination is more amenable to parsing by humans and text tools.

# Consequences

## Size

A good candidate for a GIID must be large enough to satisfy uniqueness requirements. This could be, for example, a 36-character field. Where log files are manually inspected with a text editor, this lengthens each log line, which already contains many common fields such as a time stamp.

## Security

Unintentionally, "bad" logging practices make it harder to track and correlate personally identifiable information (PII). With the use of a trans-system GIID, correlation between various business-related identifiers is made easier.

The correlation ID is not necessarily a secret, but like other tracking objects such as cookies, it can be used for information discovery or questionable information storage. Then again, if an attacker can already access your log files, there are other, more serious issues.

## Redundancy

What determines the first-contact subsystem? A true distributed system could be configured, or dynamically configured, so that any system could be the first-contact system. If so, then each subsystem is creating GIIDs and passing them to other systems that are themselves creating GIIDs.

One approach to handle this is that a system will only create a GIID if none is present in the incoming request.
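That rule can be sketched as a small helper (the method and parameter names are illustrative; in practice the incoming GIID would come from a request header or message property):

```java
import java.util.UUID;

public class GiidSupport {
    /**
     * Return the incoming GIID if an upstream system already created one;
     * otherwise create a new one for this subsystem.
     */
    public static String ensureGiid(String incomingGiid, String originId) {
        if (incomingGiid != null && !incomingGiid.isEmpty()) {
            return incomingGiid; // honor the first-contact system's ID
        }
        // originID followed by a hyphen-stripped UUID, per the GIID grammar
        return originId + "-" + UUID.randomUUID().toString().replace("-", "");
    }
}
```

With this in place, only the true first-contact system ever mints a GIID; every downstream system just propagates it.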

## Feedback

For user interfaces, exposing the GIID or parts of it in exception situations can be beneficial:

"We also send the txnId to our users in our 500 error messages and the X-Response-Id header, so if a user reports an issue, we can quickly see all of the related log entries." — https://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/

Compare this to the Hunt the Wumpus adventure in enterprises that only have an approximate time of an issue and must track it over multiple systems.

## Accuracy

If a GIID is part of a support system and, as above, the ID is shared with users in some way, does the value need some form of validity testing? Should it be tested that it is well-formed, and should it include a checksum?

Example CRC calculation for a UUID, based on the textual version of the ID:

```
$ groovy -e "java.util.zip.Adler32 crc = new java.util.zip.Adler32(); crc.update(UUID.randomUUID().toString().getBytes()); println Long.toHexString(crc.getValue())"
af9c09a3
```

# Implementation

## Origin ID

An OID uniquely identifies a system in an enterprise. This could be a web server or a messaging system. Using a code for the actual system is recommended. Thus, instead of Acctsys, it would be a code, for example PE65K. Using a code is more 'durable' than a human-readable name.

An extension is to also encode other information in the origin ID, such as application or subsystem identifiers.

This could even reuse various ‘naming’ standards, as found, for example in Directory services such as LDAP.

## Unique ID

This ID must not be a business related entity such as user id or account number. The simple reason is that these may occur in the logging record multiple times for different sessions or transactions. For example, user Jean Valjean with account number 24601 may log in multiple times into a web site. Tracking a specific session if a problem occurs is easier if we use a unique ID.

A business level ID may not even be relevant in another system that interacts with the origin point. In one system the ID could be one type of ID, and in the other the same party or user could be identified with a different ID.

Note that as soon as determined, accessed, or generated, a business level ID should be associated with the GIID. This could be a simple log output of that business ID which, since every log output has a GIID, will associate the business ID or value with the GIID.

Similarly when the same process communicates with another system, that system’s unique identifiers and related business IDs will also be associated with the GIID. For example, a web service will take the GIID and relate it to its own ID(s). Now a support engineer can follow the complete path of the system response to a user session.

## ID creation

The easiest approach is to use the entry system's session ID, created by the server. A potential problem is that this session ID is not guaranteed to be unique, and it ends when the session expires. A UUID solves most of these problems.

Sample UUID generation in Groovy language:

```
$ groovy -e "println UUID.randomUUID().toString().replace('-','')"
```

If the system ID is 3491, then the above UUID is combined with it to create the GIID used in logging:
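A minimal sketch (the log line layout is illustrative, not a standard format):

```java
import java.util.UUID;

public class GiidExample {
    public static void main(String[] args) {
        String originId = "3491"; // the system ID from the example above
        String uniqueId = UUID.randomUUID().toString().replace("-", "");
        String giid = originId + "-" + uniqueId;

        // Every log output carries the GIID so tools can correlate entries.
        System.out.println(giid + " INFO  Session started");
        System.out.println(giid + " DEBUG Account lookup for user 24601");
    }
}
```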

### Alternative to UUID use?

A UUID, with hyphens removed, is a 32-character string. Could something smaller be used? Perhaps, but eventually the complexity of threaded systems would push the uniqueness constraint of any ID approach toward a comparable length.

Other approaches are possible. Most of them incorporate a timestamp in some way. Note that a version-1 UUID contains a timestamp.

An example of a 'unique' ID is MongoDB's ObjectID specification. That spec calls for a 12-byte BSON type consisting of:
• a 4-byte value representing the seconds since the Unix epoch,
• a 3-byte machine identifier,
• a 2-byte process id, and
• a 3-byte counter, starting with a random value.
An example of an ObjectID string representation is ObjectId("507f1f77bcf86cd799439011").
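A rough sketch of generating an ObjectID-style 12-byte identifier follows the layout above. The machine and process values are passed in as stubs here; a real implementation would derive them from the host and PID:

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.concurrent.atomic.AtomicInteger;

public class ObjectIdSketch {
    // 3-byte counter, starting with a random value per the spec
    private static final AtomicInteger COUNTER =
            new AtomicInteger(new SecureRandom().nextInt());

    public static String next(int machineId, int processId) {
        ByteBuffer buf = ByteBuffer.allocate(12);
        buf.putInt((int) (System.currentTimeMillis() / 1000L));  // 4-byte seconds
        buf.put((byte) (machineId >> 16));                       // 3-byte machine id
        buf.put((byte) (machineId >> 8));
        buf.put((byte) machineId);
        buf.putShort((short) processId);                         // 2-byte process id
        int c = COUNTER.getAndIncrement();                       // 3-byte counter
        buf.put((byte) (c >> 16)).put((byte) (c >> 8)).put((byte) c);

        StringBuilder hex = new StringBuilder(24);
        for (byte b : buf.array()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```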

## Log Framework support for GIID

The use of a GIID is a ‘cross-cutting’ concern. Requiring programmatic use of this ID would be burdensome and error-prone, even if stored in a thread-safe context.

Some logging frameworks support the concept of “nested diagnostic contexts”. This is a way of storing an ID so that interleaved logging is properly identified. See http://wiki.apache.org/logging-log4j/NDCvsMDC for more information.
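The idea behind those diagnostic contexts can be sketched with a plain ThreadLocal. This mimics what log4j's MDC does; it is not the real API, and the class and field names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal sketch of a mapped diagnostic context, in the style of log4j's MDC. */
public class MiniMdc {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    public static void put(String key, String value) {
        CTX.get().put(key, value);
    }

    public static String get(String key) {
        return CTX.get().get(key);
    }

    /** A logger would prepend the context automatically on every call. */
    public static void log(String message) {
        System.out.println("[giid=" + get("giid") + "] " + message);
    }
}
```

After a single `MiniMdc.put("giid", giid)` at request entry, every `log()` call on that thread carries the GIID with no per-call bookkeeping, which is exactly the cross-cutting behavior wanted.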

## Example usage

In a conventional Java server application, a JSP or template system would obtain a GIID and insert it into generated pages and associated client-side scripts. The GIID would also be stored in the server-side session, making it accessible to the various services and components on a server.

This ID is embedded in requests to other distributed servers and provides event correlation. Thus the logging systems will have access to the GIID, and client- or server-side errors can then display or use it for tracking and reporting to assist support engineers.

Since the client also has the GIID, it can display or use this ID for customer service processing.

Of course, this would make more sense if it is a part of a wider Application Service Management (ASM) system.

## Standards for IDs

Though many standards specify ID handling, modern architectures, especially web based or distributed, emphasize a stateless protocol. A GIID requirement could be one of those legerdemain stateful practices.

## Development impacts

If logging is a deployed feature of an application, then it too needs testing. But since log output is an integration point, it does not fall under the "unit" testing umbrella. There is even some doubt whether it should be tested at all! Here is one example: "Unit Testing: Logging and Dependency Injection".

If log files can contain security flaws, convey data, impact support, and impair performance, then they should be tested for conformance to standards. Project management spreadsheets need to add rows for logging concerns.

## Technology

Log output can be developer-tested using the appropriate xUnit framework, such as JUnit. Mocking frameworks provide a means of avoiding actually sending the output of a logger to an external 'appender'. See "Use JMockit to Unit test logging output".

## Issues

During development of a project, the log output changes rapidly as the code changes. Selecting where in the software development life cycle (SDLC) to test logging, or even specifying what logs should contain, is difficult.

One approach is that the deployed system will not do any app logging that was not approved by the stakeholders. These outputs must be "unit" tested, and all development-support logging is removed or disabled except for use in a development environment.

## Deployment

There is no need to change every subsystem to use this log tracking approach. If the GIID is created somewhere in the “path” of a process or service, it adds value. Other systems can gradually start to use a tracking ID. Thus, the tools and training to use this capability can also be progressively introduced.

I was going to title this article 'Logging in Anger', as a response to my own experiences with application logging. Alas, there are so many issues that I had time to focus on only one, the result of a recent stint supporting an application that exhibits the typical logging anti-patterns. Example: it's bad to get a null pointer exception, but worse not to know which argument to the function caused it.

# Appendix

## Structured Logging

Structured Logging is a style of app log file that is data-based rather than prose-based. Thus, it is machine readable and amenable to high-level tools, not just a text editor.

Treating logs as data gives us greater insight into the operational activity of the systems we build. Structured logging, which is using a consistent, predetermined message format containing semantic information, builds on this technique …. We recommend adopting structured logging because the benefits outweigh the minimal effort involved and the practice is becoming the default standard. — http://www.thoughtworks.com/radar/techniques/structured-logging

An example is a system where the log output uses a predetermined message format. An overview of such systems is found in chapter 5 of "Common Event Expression", http://cee.mitre.org/docs/Common_Event_Expression_White_Paper_June_2008.pdf
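As a concrete illustration (the field names and values here are made up for the sketch), a structured log entry is a single machine-parseable record, one JSON object per line, rather than free prose:

```java
public class StructuredLogExample {
    public static void main(String[] args) {
        String giid = "3491-0c9a77e41b2f4d7f"; // illustrative GIID
        // One JSON object per line: trivially parseable by log tools.
        String entry = String.format(
            "{\"ts\":\"%s\",\"level\":\"%s\",\"giid\":\"%s\",\"msg\":\"%s\"}",
            "2015-01-11T10:15:30Z", "ERROR", giid, "Payment gateway timeout");
        System.out.println(entry);
    }
}
```

Combined with the GIID, such records let tools filter a whole multi-system transaction with one query instead of a regex hunt.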

Note this should not be confused with a similar sounding technology called “Log-structured file system”.

1. Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems
2. Dapper, A Large Scale Distributed Systems Tracing Infrastructure
3. Log management and intelligence, http://en.wikipedia.org/wiki/Log_management_and_intelligence
4. Logging a global ID in multiple components, http://stackoverflow.com/questions/1701493/logging-a-global-id-in-multiple-components
5. Application Service Management (ASM) system
6. Application performance management, http://en.wikipedia.org/wiki/Application_performance_management
7. The art of application logging, http://www.victor-gartvich.com/2012/05/art-of-application-logging.html
8. Patterns For Logging Diagnostic Messages, http://c2.com/cgi/wiki?PatternsForLoggingDiagnosticMessages
9. UUID, UUID
10. How to test valid UUID/GUID?
11. Log Data as a Valuable Tool in the DevOps Lifecycle (and Beyond), http://devops.com/features/log-data-valuable-tool-devops-lifecycle-beyond/
12. OWASP – Logging Cheat Sheet, https://www.owasp.org/index.php/Logging_Cheat_Sheet
13. How to Do Application Logging Right, http://arctecgroup.net/pdf/howtoapplogging.pdf
14. Request for comment Structured Logging, http://www.mediawiki.org/wiki/Requests_for_comment/Structured_logging
15. 6 – Logging What You Mean: Using the Semantic Logging Application Block, http://msdn.microsoft.com/en-us/library/dn440729(v=pandp.60).aspx
16. A Review of Event Formats as Enablers of event-driven BPM, http://udoo.uni-muenster.de/downloads/publications/2526.pdf
17. Basic Android Debugging with Logs, http://www.androiddesignpatterns.com/2012/05/intro-to-android-debug-logging.html
18. Mapped diagnostic context vs Nested diagnostic context, http://wiki.apache.org/logging-log4j/NDCvsMDC
19. Building Secure Applications: Consistent Logging, http://www.symantec.com/connect/articles/building-secure-applications-consistent-logging
20. Log for machines in JSON, https://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/
21. Logging Discussion, http://c2.com/cgi/wiki?LoggingDiscussion
22. CEE, http://cee.mitre.org/docs/Common_Event_Expression_White_Paper_June_2008.pdf
23. CEE is a Failure., https://gist.github.com/jordansissel/1983121
24. Centralized Logging Architecture, http://jasonwilder.com/blog/2013/07/16/centralized-logging-architecture/
25. Centralized Logging, http://jasonwilder.com/blog/2012/01/03/centralized-logging/
26. Logging and the utility of message patterns, http://calliopesounds.blogspot.com/2014/07/the-power-of-javatextmessageformat.html?m=1
27. Payment Application Data Security Standard, https://www.pcisecuritystandards.org/documents/pa-dss_v2.pdf
Payment application must facilitate centralized logging.
Note: Examples of this functionality may include, but are not limited to:
• Logging via industry standard log file mechanisms such as Common Log File System (CLFS), syslog, delimited text, etc.
• Providing functionality and documentation to convert the application’s proprietary log format into industry standard log formats suitable for prompt, centralized logging.
Aligns with PCI DSS Requirement 10.5.3
28. NIST 800-92 “Guide to Computer Security Log Management”, http://csrc.nist.gov/publications/nistpubs/800-92/SP800-92.pdf
29. UniqueID Generator: A Pattern for Primary Key Generation, http://java.sys-con.com/node/36169
30. java.util.logging, http://docs.oracle.com/javase/7/docs/api/java/util/logging/package-summary.html

## Future of Google Glass: Continuum TV series

The TV series Continuum, starring Rachel Nichols, has some fancy high-tech augmented reality from the year 2077. There are no nerd glasses; the corporate state actually does something to the eyes and other neural circuitry, used in conjunction with a fancy super-computer skin-tight (of course) costume. Pretty neat. The tech is not new; SciFi literature has been using these ideas for a very long time already.

To the naysayers of the Google Glass type of devices: If Science Fiction and other forms of imagination are any judge, we haven’t seen anything yet. The downside is that it seems privacy and security issues will not keep up with tech advances.

Alas, the TV series was cancelled (?). Yet 'Two and a Half Creatures' is still on. Go figure.

## List sorting using topological ordering of a digraph

By creating a pseudo-weighted directed graph from a list of elements, it is possible to apply graph algorithms to the sorting problem. This offers an alternative to approaches such as Quicksort or Merge sort. As an experiment, a Java implementation is shown.

Terms: topological sorting, directed acyclic graph, Java, JUnit, algorithm

This subject is continued in Linear list sorting using weighted digraph possible?

## Pseudo weighted DAG

Given a set of numbers, for example [9, 2, 12, 40, 12, 3, 1], we can create a directed acyclic graph (DAG) where each number is a vertex with an outgoing edge to every element greater than it. "9" will have an edge to 12 and 40. "12" will have an edge to 40 and also to the other "12" element. This latter edge is a special 'equal' edge and indicates duplicates. If it were an ordinary edge, the graph would have cycles.

Transforming the list into such a DAG gives us the structure we need.

Note that once this transformation is complete we essentially know the order of the vertices; we just don't have a linear arrangement yet. We don't have to revisit any element and compare it with another, as in some other sorting algorithms. Of course, this initial step is not efficient: O(n^2).

## Topological sorting

A topological sort (toposort) of a directed graph is a linear ordering of its vertices. An advantage of a toposort is that it runs in linear time. To this must be added the graph construction phase and, in this implementation, a final step of inserting duplicate elements into the result.

A toposort of a complex graph can have many solutions. As applied here, there is only one solution; but this remains to be proven, or may already be known in graph theory. See uniqueness.

## Implementation

Listing 1 below contains the Java implementation that sorts the list of elements given above. The program contains an embedded JUnit test. In the test, the method loadDag(map, data) creates the DAG. While creating the edges, if a source and sink have equal values, the 'equal' edge is created and the sink vertex is marked as a duplicate.

The modified toposort in the program skips vertices that are marked as duplicates. The resulting list is sorted but missing the duplicate vertices. In the final step, each node in the result set is visited; if it has any outgoing 'equal' edges, the destination nodes are inserted into the final result list.

The resultant output of the program is:

```
Sort with dupes: [one:1, two:2, three:3, nine:9, twelve:12, twelve2:12, forty:40]
Sort with unique: [one:1, two:2, three:3, nine:9, twelve:12, forty:40]
```

## Nested JUnit test classes

Though not part of the topic, the programming example uses a nested JUnit test class, an interesting approach.

## Summary

Presented was a possible approach to list sorting by use of graph algorithms. An example implementation in the Java programming language was given.

Listing 1, source available here.

GraphSort.java:

```java
package com.octodecillion.sort;

import static org.hamcrest.core.Is.is;
import static org.hamcrest.core.IsEqual.equalTo;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertThat;
import static org.junit.Assert.assertTrue;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;

/**
 * Sorting using a directed-graph embedding and topological ordering.
 *
 * @author Josef Betancourt
 * @since 20120101T1212
 *
 * Example code based on code created by M. Jessup, which is itself an
 * implementation of the algorithm presented in Wikipedia.
 */
public class GraphSort {

/**
 * Modification of the algorithm presented in the Wikipedia article:
 *
 * <pre>
 * L ← Empty list that will contain the sorted elements
 * S ← Set of all nodes with no incoming edges
 * while S is non-empty do
 *     remove a node n from S
 *     insert n into L
 *     for each node m with an edge e from n to m do
 *         remove edge e from the graph
 *         if m has no other incoming edges then
 *             insert m into S
 * if graph has edges then
 *     return error (graph has at least one cycle)
 * else
 *     return L (a topologically sorted order)
 * </pre>
 *
 * See also R. Sedgewick and K. Wayne.
 *
 * @param <T> element type
 * @param allNodes all nodes of the graph
 * @return sorted list, or empty list if not possible
 * @throws IllegalArgumentException
 *             if graph has cycles
 */
public static <T> ArrayList<Node<T>> topoSort(final Node<T>[] allNodes) {

ArrayList<Node<T>> workList = new ArrayList<Node<T>>();

HashSet<Node<T>> sinks = new HashSet<Node<T>>();
for (Node<T> n : allNodes) {
if (n.inEdges.size() == 0) { // in-degree zero: a source node
sinks.add(n);
}
}

while (!sinks.isEmpty()) {
Node<T> nextNode = sinks.iterator().next();
sinks.remove(nextNode);

if (!nextNode.duplicate) {
// duplicates are skipped here and re-inserted later via 'equal' edges
workList.add(nextNode);
}

for (Iterator<Edge<T>> it = nextNode.outEdges.iterator(); it
.hasNext();) {

Edge<T> outNode = it.next();
Node<T> dest = outNode.to;
it.remove();
dest.inEdges.remove(outNode);

if (dest.inEdges.isEmpty()) {
sinks.add(dest); // dest has become a source
}
}
}

// All edges are removed? If not, we have a cycle.
for (Node<T> n : allNodes) {
if (!n.inEdges.isEmpty()) {
throw new IllegalArgumentException("Cycle present. Node, "
+ n.name
+ " has inEdges. Topological sort not possible");
}
}

ArrayList<Node<T>> resultList = new ArrayList<Node<T>>();
for (Node<T> node : workList) {
resultList.add(node);

// re-insert duplicates directly after their representative element
if (!node.equiEdges.isEmpty()) {
for (Iterator<Edge<T>> iterator = node.equiEdges.iterator(); iterator
.hasNext();) {
Edge<T> edge = iterator.next();
resultList.add(edge.to);
}
}

}

return resultList;
} // topoSort

/**
* A node.
*
* @param <T>
*/
static class Node<T> {
public final String name;
public final T value;
public final HashSet<Edge<T>> inEdges;
public final HashSet<Edge<T>> outEdges;
public final HashSet<Edge<T>> equiEdges;
public boolean duplicate;

public Node(final String name, final T value) {
this.name = name;
this.value = value;
this.duplicate = false;
inEdges = new HashSet<Edge<T>>();
outEdges = new HashSet<Edge<T>>();
equiEdges = new HashSet<Edge<T>>();
}

public Node<T> addEdge(final Node<T> node) {
Edge<T> e = new Edge<T>(this, node);
this.outEdges.add(e);
node.inEdges.add(e);
return this;
}

@SuppressWarnings("unchecked")
public Node<T> addEdges(final Node<T>... nodes) {
for (int i = 0; i < nodes.length; i++) {
Node<T> node = nodes[i];
Edge<T> e = new Edge<T>(this, node);
this.outEdges.add(e);
node.inEdges.add(e);
}

return this;
}

public Node<T> addEquiEdge(final Node<T> node) {
Edge<T> e = new Edge<T>(this, node);
this.equiEdges.add(e);
node.duplicate = true;
return this;
}

@Override
public String toString() {
return String.format("%s:%s", name, value);
}
} // Node<T>

/**
* A directed edge.
*/
static class Edge<T> {
public final Node<T> from;
public final Node<T> to;

public Edge(final Node<T> from, final Node<T> to) {
this.from = from;
this.to = to;
}

@Override
public boolean equals(final Object obj) {
if (!(obj instanceof Edge)) {
return false;
}
Edge<?> e = (Edge<?>) obj;
return (e.from == from) && (e.to == to);
}

@Override
public int hashCode() {
// Must be consistent with equals(), which compares node identity.
return 31 * System.identityHashCode(from) + System.identityHashCode(to);
}

@Override
public String toString() {
return String.format("%s->%s", from, to);
}
} // Edge<T>

// ====================================================================
// Nested JUnit test.
// ====================================================================

/**
*
*
* Uses nested JUnit testing approach advocated by Ben J. Christensen in <a
* href=
* "http://benjchristensen.com/2011/10/23/junit-tests-as-inner-classes/"
* >"JUnit Tests as Inner Classes"</a>
*
* @author jbetancourt
*
*/
@RunWith(JUnit4.class)
public static class GraphSortTest {
@SuppressWarnings("unchecked")
private Node<Integer>[] allNodes = new Node[0];

/**
* Sort list with no duplicate nodes.
*/
@SuppressWarnings("boxing")
@Test
public void should_sort_without_duplicates() {
uniqueList();
ArrayList<Node<Integer>> result = GraphSort.topoSort(allNodes);

assertFalse("empty result", result.isEmpty());
assertThat(Integer.valueOf(result.size()), is(equalTo(6)));
assertTrue("result not sorted", inOrder(result));

System.out.println("Sort with unique: "
+ Arrays.toString(result.toArray()));
}

/**
* Sort list containing duplicate nodes.
*/
@SuppressWarnings("boxing")
@Test
public void should_sort_with_duplicates() {
dupeList();
ArrayList<Node<Integer>> result = GraphSort.topoSort(allNodes);

assertFalse("empty result", result.isEmpty());

System.out.println("Sort with dupes: "
+ Arrays.toString(result.toArray()));

assertThat(Integer.valueOf(result.size()), is(equalTo(7)));
assertTrue("result not sorted", inOrder(result));

}

@SuppressWarnings("unchecked")
private void uniqueList() {
String nodeData = "nine,9;two,2;forty,40;twelve,12;three,3;one,1;";
String dagData = "one,three;one,twelve;one,forty;one,two;one,nine;two,twelve;two,forty;two,three;two,nine;three,nine;three,forty;three,twelve;nine,twelve;nine,forty;twelve,forty";

Map<String, Node<Integer>> nodes = loadNodes(nodeData);
loadDag(nodes, dagData);

List<Node<Integer>> list = new ArrayList<Node<Integer>>();

for (Entry<String, Node<Integer>> entry : nodes.entrySet()) {
list.add(entry.getValue());
}

allNodes = list.toArray(new Node[list.size()]);
}

@SuppressWarnings("unchecked")
private void dupeList() {
String nodeData = "nine,9;two,2;forty,40;twelve,12;three,3;one,1;twelve2,12";
String dagData = "one,three;one,twelve;twelve,twelve2;one,forty;one,two;one,nine;two,twelve;two,forty;two,three;two,nine;three,nine;three,forty;three,twelve;nine,twelve;nine,forty;twelve,forty;twelve2,forty";

Map<String, Node<Integer>> nodes = loadNodes(nodeData);
loadDag(nodes, dagData);

List<Node<Integer>> list = new ArrayList<Node<Integer>>();

for (Entry<String, Node<Integer>> entry : nodes.entrySet()) {
list.add(entry.getValue());
}

allNodes = list.toArray(new Node[list.size()]);
}

/**
*
* @param nodes
* @param dagData
*/
private void loadDag(final Map<?, ?> nodes, final String dagData) {
String[] dagStrings = dagData.split(";");

for (int i = 0; i < dagStrings.length; i++) {
String[] spec = dagStrings[i].split(",");

@SuppressWarnings("unchecked")
Node<Integer> source = (Node<Integer>) nodes.get(spec[0]);
@SuppressWarnings("unchecked")
Node<Integer> sink = (Node<Integer>) nodes.get(spec[1]);

// Equal values mean the sink is an equivalent duplicate of the source.
if (!source.value.equals(sink.value)) {
source.addEdge(sink);
} else {
source.addEquiEdge(sink);
}

}

}

/**
*
* @param nodeData
* @return
*/
private Map<String, Node<Integer>> loadNodes(final String nodeData) {
Map<String, Node<Integer>> nodeMap = new HashMap<String, Node<Integer>>();

String[] nodeSpec = nodeData.split(";");

for (int i = 0; i < nodeSpec.length; i++) {
String[] string = nodeSpec[i].split(",");
String name = string[0];
String value = string[1];

@SuppressWarnings("boxing")
Node<Integer> node = new Node<Integer>(name,
Integer.parseInt(value.trim()));

nodeMap.put(name, node);
}

return nodeMap;

}

/**
* Check that list is sorted.
*
* @param nodes
* @return
*/
@SuppressWarnings("boxing")
private boolean inOrder(final ArrayList<Node<Integer>> nodes) {
boolean result = true;
Iterator<Node<Integer>> iterator = nodes.iterator();
Node<Integer> prev = iterator.next();

while (iterator.hasNext()) {
Node<Integer> current = iterator.next();

// in order?
if (!(current.value >= prev.value)) {
result = false;
break;
}

prev = current;
}

return result;
}
}

} // End class GraphSort.java
```

[/expand]

The graph was created with GraphViz using the input:

```## File: listDAG.gv
## Command to generate: dot -Tpng -o listDAG.png listDAG.gv
digraph {

rankdir=LR

A [label="9"];
B [label="2"];
C [label="12"];
D [label="40"];
E [label="12"];
F [label="3"];
G [label="1"];
G->B
G->F
G->A
G->C
G->D
G->E
B->F
B->A
B->C
B->E
B->D
F->A
F->C
F->E
F->D
A->C
A->D
A->E
C->D
C->E [label="equi", fontcolor=darkgreen]

label="\nlist: [9, 2, 12, 40, 12, 3, 1] transformed to DAG by Josef Betancourt";
}
```

Genesis of this idea
One day I was wondering if randomness could be used to sort. The idea I had was, as it turns out, a “better” Bozosort: instead of swapping indiscriminately, I would swap elements only if they were out of order. I had other ideas, but then I started drawing some diagrams, and that led to other ideas based on graph construction.

I gave up the use of randomization. Is the technique presented here, using graphs, unique? Does it work? Can it scale? Or does it just add to the list of bozo sorts?

Funny xkcd comic: “Ineffective Sorts”

20130104:

• Recreated the image using LR ranking in GraphViz.
• Fixed algorithm which broke when I added missing arc from second 12 to 40. TG for Unit tests.

## Parallel Threaded Interpretation of Sequential Code

Here is another web page I’m moving to this blog for storage. I was elaborating an idea I had in 1989 about hardware architecture.

Parallel Threaded Interpretation of Sequential Code (May 1989)

Abstract
Sequential code can be dramatically accelerated by parallel processing: every “interruption” of sequential execution into non-working code, such as a branch, signals the execution of the affected code in parallel on available processors.

In May 1989 or earlier, while reading some descriptions of a proposed stack processor, the Harris Semiconductor RTX 32P, I had an idea for making use of multiple processors in a system. The system I was reading about used two bits to indicate what type of branch to perform: “The RTX 32P has only one instruction format, shown in Figure 5.4. Every instruction contains a 9-bit opcode which is used as the page number for addressing microcode. It also contains a 2-bit program flow control field that invokes either an unconditional branch, a subroutine call, or a subroutine exit. In the case of either a subroutine call or unconditional branch, bits 2-22 are used to specify the high 21 bits of a 23-bit word-aligned target address. ….” See Architecture of the RTX 32P.

“Wherever possible, the RTX 32P’s compiler compacts an opcode followed by a subroutine call, return, or jump into a single instruction. In those cases where such compaction is not possible, a NOP opcode is compiled with a call, jump, or return, or a jump to next in-line instruction is compiled with an opcode. Tail-end recursion elimination is performed by compressing a subroutine call followed by a subroutine return into a simple jump to the beginning of the subroutine that was to be called, saving the cost of the return that would otherwise be executed in the calling routine. ” — Philip Koopman

What I noticed was that one combination of bits, 11, was not being used.

So I thought, why not use that bit combination to indicate on which processor to execute the code? This thought led to other ideas, and I was off thinking of how this could be used, with very fast communication and cache, such as optical interconnects, to parallelize sequential code. In a nutshell, another processor would take over execution whenever the current one hit a branch or other interruption of linear code. That way all processors would be used to run sequential code.
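The gist can be illustrated with a software analogy (a toy sketch with invented names, not the hardware mechanism described above): while a branch condition is being resolved, spare “processors” speculatively run both continuations, and the untaken path’s work is simply discarded.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Software analogy of speculative branch execution: both branch targets
 *  run on spare threads before the condition is known; the losing path's
 *  result is thrown away. All names here are illustrative only. */
public class SpeculativeBranch {

    public static int run(boolean condition) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Both continuations start "in parallel" with condition evaluation.
            Callable<Integer> thenTask = SpeculativeBranch::thenWork;
            Callable<Integer> elseTask = SpeculativeBranch::elseWork;
            Future<Integer> takenPath = pool.submit(thenTask);
            Future<Integer> otherPath = pool.submit(elseTask);
            // Once the condition resolves, keep one result, discard the other.
            return condition ? takenPath.get() : otherPath.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdownNow(); // abandon the untaken path's speculative work
        }
    }

    static int thenWork() { return 1; } // stand-in for the taken branch target
    static int elseWork() { return 2; } // stand-in for the untaken branch target

    public static void main(String[] args) {
        System.out.println(run(true));  // 1
        System.out.println(run(false)); // 2
    }
}
```

Real scout threads also have to resolve data dependencies and squash side effects, which this toy omits.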

In effect, each processor would be running in its own “thread”, queueing results, and eventually asking for results to ‘fire’ actual computation, resolving data dependencies. I think I got sidetracked by being limited to a load/store stack architecture, so I had to resolve the direct manipulation of stack frames and so forth. Keep in mind that I had only a little knowledge of computer architecture; very naive perhaps.

I didn’t solve many of the problems and didn’t continue with it. It was fun, but I thought: if this had any relevance it would already be in use in the industry, and what did I know about this subject? Also, I was doing this in the context of a stack processor architecture, which commercially was not part of the mainstream (though the JVM is a stack machine). To read more on this architectural approach, see Stack Computers: The New Wave, by Philip J. Koopman, Jr.

Well, years later I read about new architectures coming out, such as Sun Microsystems’ forthcoming chip codenamed Rock, where “Simultaneous Speculative Threading (SST)”, “speculative execution”, or “scout threads” are utilized for high performance. See “Rock: A SPARC CMT Processor” by S. Chaudhry. (The Rock chip project was later cancelled by Oracle.) Further, I also read that these ideas became projects in the academic research community.

Ok, so I may have been on to something.

1. Stack machine
2. The Stanford Hydra CMP
3. “Rock: A SPARC CMT Processor” by S. Chaudhry
4. Stack Computers: The New Wave, by Philip J. Koopman, Jr.
5. Architecture of the RTX 32P
6. GreenArrays, Inc.’s Common Architecture

## Simple Java Data File

In a prior post “A very simple data file metaformat” I presented an idea for an INI type of file and used an example implementation in the Groovy language.

I rewrote it in Java. One change was the use of section terminators. I’m still not sure about specifying the data type in the section header.

Note the only testing of this is the embedded JUnit tests as shown in the source.
Update: A version of this is being successfully used to store test data for a suite of unit tests. Each file holds several sections with hundreds of lines each of XML and other types.

API

• T loadAs(final Class type, final String id, final String subsection): load a section as the given class type
• next(): Pull event style reading.
• void parse(final String filePath, final Callable callable)
• void parse(final String filePath, final Runnable runnable): Push event style reading.

Update 1 20121208T1809-5
Java jar manifest files have a very useful property: attributes which are in the ‘main’ section are inherited by individual entries unless redefined in the entry. For the application I envisioned this metaformat for, that would be ideal. One way of doing this is that any data contained in a section ‘x’ is also applicable, or in scope, for any section x/y. And so forth: x/y/z would have the data in x/y available.
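That cascading scope can be sketched in a few lines of Java (hypothetical names, not the inix implementation): a key looked up in section x/y/z falls back to x/y, then to x.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of cascading section lookup: a key missing in "x/y/z"
 *  falls back to "x/y", then "x". Names are illustrative only. */
public class CascadingLookup {
    private final Map<String, Map<String, String>> sections = new HashMap<>();

    public void put(String section, String key, String value) {
        sections.computeIfAbsent(section, s -> new HashMap<>()).put(key, value);
    }

    /** Walk the section path upward until the key is found. */
    public String get(String section, String key) {
        String path = section;
        while (true) {
            Map<String, String> data = sections.get(path);
            if (data != null && data.containsKey(key)) {
                return data.get(key);
            }
            int slash = path.lastIndexOf('/');
            if (slash < 0) {
                return null; // not found in any enclosing section
            }
            path = path.substring(0, slash);
        }
    }

    public static void main(String[] args) {
        CascadingLookup cfg = new CascadingLookup();
        cfg.put("credit", "timeout", "30");
        cfg.put("credit/config", "retries", "5");
        // "timeout" is inherited from the enclosing "credit" section
        System.out.println(cfg.get("credit/config", "timeout")); // 30
        System.out.println(cfg.get("credit/config", "retries")); // 5
    }
}
```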

Update 2 20130107T2000-5
Added method Map load() throws IOException. This is an easier-to-use method of accessing inix file section data. This is only in the source Gist repo.

Update 3 20130113T1300-5

Update 4 20131028T0939-5
Doh, forgot about mentioning Apache Commons Configuration. It does support hierarchical INI files. However, the sections in that format only support property mappings.

The test data file is:

```# Example very simple data file
! Another comment

#[>map:/junk]
#[<]

[>list:credit/tests]
one
two
three
[<]

[>credit/report]
one,two,three
[<]

[>properties:credit/config]
one=alpha
two=beta
three=charlie
[<]
[>xml:credit/beans]
<description>
<item>one</item>
<item>two</item>
<item>three</item>
</description>
[<]

[>notype/sub]
[<]

["one","two","three"]
[<]
[>credit]
one
two
three
[<]

[>credit/coverages]
["one","two","three"]
[<]

```

[expand title=”Click to expand implementation source”]

[/expand]

## I Am, Therefore I Think

Consciousness is one of those nebulous concepts, at least in the ‘normal’ non-academic world. Some say there is no such thing, a delusion; it’s just how the body refers to itself and organizes its sensory processing. There are so many theories. On the other side there are the religious, spiritual views that posit that there is something more than the meat body, more than just electro-chemical discharges. In between there are some investigators finding that the traditional views are lacking. Some even claim that consciousness derives from deep quantum mechanical processes; quantum effects are relevant at room temperature and macro scales.

Heck if I have an answer. I just Am.

I’ll make one observation though. Seems that people argue about consciousness from the mental processing angle. That is, if something is conscious it exhibits certain properties. Yet, stop the flow of thoughts and what remains is awareness. How can awareness be awareness of being aware? Or is it just the body all along that encases the waking state dream?

## application security using a copy-on-write virtual machine

An architecture is possible that uses a lightweight VM as an application sandbox. Instead of duplicating an OS plus a run-time environment, this virtual machine uses the host environment as a read-only resource. This allows the VM to serve as a sandbox that permits reads and writes to the file system, but only the VM address space is modified. Since the host OS environment supplies a priori values, the total VM footprint is minimal. This architecture is able to serve as a base for secure application solutions.

In practice an application is installed into a host OS, and via installation and use it creates a cache mirror of the changed OS data and resources that it would normally have modified in a traditional installation. This application and the ‘cache’ are then versioned and mirrored. If the application is compromised, it is deleted or the cache is rolled back to a point before the compromise.
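The copy-on-write behavior can be sketched as a two-layer lookup (a toy model with invented names, with string contents standing in for OS resources): reads fall through to the read-only host unless the application has written its own copy, and a rollback just discards the overlay.

```java
import java.util.HashMap;
import java.util.Map;

/** Copy-on-write overlay sketch: the host "file system" is read-only;
 *  writes land in the overlay, and rollback discards the overlay.
 *  Names and structure are illustrative only. */
public class CowOverlay {
    private final Map<String, String> base; // host resources, never modified
    private final Map<String, String> overlay = new HashMap<>();

    public CowOverlay(Map<String, String> base) {
        this.base = base;
    }

    public String read(String path) {
        // Prefer the application's own copy, fall through to the host.
        return overlay.containsKey(path) ? overlay.get(path) : base.get(path);
    }

    public void write(String path, String content) {
        overlay.put(path, content); // host copy stays untouched
    }

    public void rollback() {
        overlay.clear(); // back to the pristine host view
    }

    public static void main(String[] args) {
        Map<String, String> host = new HashMap<>();
        host.put("/etc/app.conf", "original");
        CowOverlay vm = new CowOverlay(host);
        vm.write("/etc/app.conf", "changed");
        System.out.println(vm.read("/etc/app.conf")); // changed
        vm.rollback();
        System.out.println(vm.read("/etc/app.conf")); // original
    }
}
```

Versioning the overlay (rather than clearing it) would give the rollback-to-a-point behavior described above.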

There are many types of virtual machines. Two examples are the system VM types such as VMWare or Oracle VirtualBox, and the focused process VM such as the Java Virtual Machine, Dalvik VM, or the Common Language Runtime. The former must “dupe” an OS and so are large and complex. The latter, application-level VMs, are smaller and optimized for a single runtime environment. Each of these has corresponding security issues.

A virtual machine is usually a sandbox in implementation and provides a level of security. However, the cost is that it must duplicate OS resources. In contrast the sandboxed process VM type being discussed here depends on a real OS host and does as little duplication of the environment as possible. It is not generic, but integral to a specific application program or system.

Though this may possibly help in making an application survive destruction by protecting the storage address space, there is still the issue of active infiltration and use of system resources such as networks accessible to the application. Perhaps this type of VM will make conventional security practices and tools more useful?

Summary
Just an idea off the top of my head. I haven’t looked to see if it is unique or even remotely makes sense.

June 12, 2013: “Security Anti-Pattern – Mobile Hypervisors (for user facing VM’s)”
August 31, 2013: Was just reading about Docker, which uses LXC (LinuX Containers). Maybe that is what I had in mind.

## A very simple data file metaformat

What is the simplest data file metaformat you can create and yet be able to handle future complexity? I started puzzling about this yet again.

Also see follow up post: Simple Java Data File
An example application is given here: Java testing using XStream, AspectJ, Vsdf

Scenario
I had some maintenance to do. So, to reduce the big ball of mud, I decided to use external data files. This is where the complexity came in: if I have d data files for each component, and c components, then the total number of data files is d*c. Future maintenance of so many files is not optimal.

One thing about the required data files in this scenario: some would contain lists, others key/value pairs, and so forth. Could these be combined into one data file? I looked at JSON, YAML, XML, and even GRON. Though good, they seemed excessive. What if, for example, I needed a simple list? In a simple text file this could be stored with an item per line, or using simple separators. Not so in the aforementioned metaformats.

Solution
I revisited the Windows INI file format and just added metadata to a section. A section, indicated with a header “[…]”, also indicates what its data format is. Also, we allow subsections: [type:identifier/section]. This is similar to a URI. The subsection, which can be a hierarchical ‘path’, is optional. The type is optional, default being list (update: text). If the file has no sections, it is just a line oriented file of data in a list. (Update: line oriented string data).

In the original ini file format, the section data were key=value pairs. Here we follow the freedom of a HEREDOC.

The data type indication is practical when standard collections are being created, such as lists, maps, arrays, and so forth. We can use a generic “text” type for a non-typed string payload. Since a host application will know what data it is extracting from a data file, higher-level types such as XML, JSON, and others are of limited value.

The use of subsections in the section name allows scoping, but this was also possible in the original INI file format, just not “formalized”. True subsections should probably be nested sections, i.e. hierarchy. But, then we are now losing the simplicity.

Subsections (though not nested) allow the use of cascading data. Data in a section is automatically reused or available in matching subsections.
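As a sketch of the header syntax only (not the parser used in the listings below), a section header of the form [>type:identifier/subsection] can be picked apart with a small regex, with the type defaulting to “text” per the update above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: parse "[>type:identifier/subsection]" headers.
 *  Type and subsection are optional; type defaults to "text". */
public class SectionHeader {
    // group 1: optional type, group 2: identifier, group 3: optional subsection path
    private static final Pattern HEADER =
        Pattern.compile("^\\[>(?:([\\w-]+):)?([\\w-]+)(?:/([\\w/-]+))?\\]$");

    /** Returns {type, identifier, subsection} or null if not a header. */
    public static String[] parse(String line) {
        Matcher m = HEADER.matcher(line.trim());
        if (!m.matches()) {
            return null; // ordinary data line, not a section header
        }
        String type = m.group(1) != null ? m.group(1) : "text";
        return new String[] { type, m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] h = parse("[>list:credit/tests]");
        System.out.println(h[0] + " " + h[1] + " " + h[2]); // list credit tests
    }
}
```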

Example

```# Example very simple data file
#
[>list:credit/tests]
one
two
three
[>csv:credit/report]
one,two,three
[>properties:credit/config]
one=alpha
two=beta
three=charlie
[>xml:credit/beans]
<description>
<item>one</item>
<item>two</item>
<item>three</item>
</description>
["one","two","three"]
[>credit]
one
two
three
[>gron:credit/coverages]
["one","two","three"]
```

Section Production Rule
**** Note: this is an incorrect production *****
file ::= section* ;
section ::= '[>' (type ':')? identifier ('/' subsection)? ']' (data)+ sectionTerminator ;
data ::= (line lineEnd)* ;
identifier ::= name ;
subsection ::= name ('/' name)* ;
sectionTerminator ::= '[<' (identifier ('/' subsection)?)? ']' ;
name ::= [a-zA-Z0-9-_]+ ;

What we have now is a line oriented data file that can contain other data formats, and with no sections the file is just a line oriented list. Listing 2 is a demo in the form of a Groovy language JUnit 4 test.

[expand title=”Listing 2, Groovy language JUnit test as a demo”]

```import com.octodecillion.vsdf.*
import static com.octodecillion.vsdf.Vsdf.EventType.*
import org.junit.Test
import static org.junit.Assert.assertEquals

/** Test Vsdf */
class VsdfTest /*extends GroovyTestCase*/ {
def LINESEP =  System.properties.get("line.separator")

@Test
void testshouldGetListData(){
// Note: wiring the parser to the data file is elided in this listing.
def vsdf = new Vsdf()
def theEvent = vsdf.next()

while(theEvent != Vsdf.EventType.END){
def event = vsdf.event

if(isSectionCreditWithList(event)){
def data = event.text.split(LINESEP)
assert data.size() == 3
assert ( (data[0] == 'one') &&
(data[1] == 'two') &&
(data[2] == 'three') )
}

theEvent = vsdf.next()
}

}

/** */
def isSectionCreditWithList(evt){
return evt.id == 'credit' && evt.dataType == 'list'
}

}
```

[/expand]

Sample run:

```groovy -cp . VsdfTest.groovy
JUnit 4 Runner, Tests: 1, Failures: 0, Time: 281
```

Limitations
Not quite correct yet. One issue is file encoding. If we want to include other formats, they have their own requirements: Java properties, JSON, XML, and so forth. For example, JSON is Unicode. I don’t think this is a major issue; this solution is meant for config data, so ASCII files are adequate.

Also, should the sections have terminators? Right now, the end of a section is simply the start of another. (Update: the version of this concept in actual use is terminator based, i.e., [<] or [<id/subsection...])

Implementation
Below in listing 3 is a very simple implementation in the Groovy language to show how easy this data file is to use. Note this is just a proof of concept and has not been thoroughly tested. I don’t think the use of mark and reset in the file reading is robust; how do you determine the correct read-ahead buffer size? To make it easier to parse, I think the format will need to have section terminators, as do heredocs in Linux.

Source code available as a gist.

[expand title=”Listing 3, Groovy implementation”]

```// File Vsdf.groovy
// Author: Josef Betancourt
//

package com.octodecillion.vsdf
import groovy.transform.TypeChecked;
import java.util.regex.Matcher

/**
* @author Josef Betancourt
*
*/
class Vsdf {
String currentFolder
String iniFilePath
int lineNum
int sectionNum
VsdfEvent data
def LINESEP = System.properties.get("line.separator")

enum State {
INIT, ACCEPT, SHIFT, END
}

public enum EventType {
COMMENT, SECTION, END
}

def state = State.INIT

public Vsdf(){
currentFolder = new File(".").getAbsolutePath()
}

/**
* Value object for parsed sections.
*
*/
class VsdfEvent {
EventType event
String dataType
String namespace
String id
String text
int lineNum
int sectionNum
String sectionString
}

VsdfEvent getEvent(){
return data
}

/**
*
* @return
*/
@TypeChecked
EventType next(){
lineNum++

if(line == null){
return EventType.END
}

String type =''
String namespace = ''
String id = ''
data = new VsdfEvent()
EventType eventType

def isBlank = !line.trim()

// skip blank lines
if( isBlank){
while(true){
lineNum++
if(line  || line == null){
break;
}
}
}

def isComment = line =~ /^\s*#/
if(isComment){
data.text = line
data.event = EventType.COMMENT
data.lineNum = lineNum
eventType = EventType.COMMENT
}

if( line.trim() =~ /^\[>.*\]/){ // section?
eventType = EventType.SECTION
data.sectionString = line
sectionNum++
processSection(line, sectionNum, data)
} // end if section head

return eventType
}

/**
*
* @param line
* @return
*/
def processSection(String line, int sectionNum, VsdfEvent data){
data.event = EventType.SECTION
data.sectionNum = sectionNum

Matcher m = (line.trim() =~ /^\[>(.*)\]/)
String mString = m[0][1]
def current = mString.trim()
if(!current){
def msg = "section $sectionNum is blank"
throw new IllegalArgumentException(msg)
}

def parts = (current =~ /^(.*):(.*)\/(.*)$/)
if(!parts){
data.id = current
data.dataType='list'
}else{
long size = ((String[])parts[0]).length
data.dataType = size > 0? parts[0][1] : ''
data.id = size > 1  ? parts[0][2] : ''
data.namespace = size > 2 ? parts[0][3] : ''
}

}

/**
* Gather a section's data lines until the next section header or EOF.
* Note: the method name and buffer declaration here are reconstructed;
* the original listing elides them.
*/
def readSectionText(){
def buffer = new StringBuilder()

while(true){
lineNum++

if(line == null){
break
}

if( line.trim() =~ /^\[>.*\]/){ // section?
break
}else{
buffer.append(line + LINESEP)
}
}

return buffer.toString()
}
}
```

[/expand]