In the previous post, “How to Measure User Interface Efficiency”, I stated that it is easy to create a User Experience Design (UXD) or Interaction Design (IxD) interface that minimizes the cognitive and manipulative load of executing a specific task. Such an interface must be usable in the three most used interaction modes: graphical, voice, and text.
Let’s review the problem. A user desires some action X. To trigger X, there must be one or many sub-steps that supply the information or trigger sub-processes so that X can be successful. X can be anything: an ATM transaction, insurance forms on a website, or sharing a web page. Let’s use the last example for a concrete discussion.
On my Android phone (Samsung Galaxy Note) when I am viewing a web page, I can share it by:
Click the menu button
View the resulting menu
Find “Share page”
Click “Share page”
Get a menu “Share via”
Can’t find it
Scroll menu down
Get Message app, ‘Enter recipient’
Click Contact button
Get ‘Select a contact’ app
Click ‘Favorites’ button
Search for who you want to send to
Put check box on contact’s row
Click ‘Done’ button.
Get back to Message app
Click ‘Send’ button
And that is just a high-level view. Of course, systems can use recently used lists or search to reduce the complexity. If you include the decision making going on, the list is much longer. Other phones will have similar task steps, hopefully far fewer, but that is not the point. The interaction diagram is shown in figure 1. TODO: show interaction diagram.
Each step in this interaction is quick and easy. The fact that it has so many steps, though, is symptomatic of current user interfaces and has many drawbacks:
Cognitive load: Despite all warnings and prohibitions, mobile devices will be used in places they should not be, like cars. These task manipulations just make things much worse.
Effort: All of these tasks eventually add up to a lot of effort. That is OK for a social activity, but when it is part of a job it is not profitable.
Accuracy: The more choices, the more possibility of error. As modern user interfaces are used in more situations, this can be a problem. Does one want to launch a nuke or order lunch?
Time: These tasks add up to a lot of time.
Performance: As we do more multitasking (the good kind), these interactions slow down our performance. The computer's own processing time is negligible by comparison.
Interacting with computer interfaces is just too complex and manipulative. How can this be made simpler?
In the industry there has been a lot of progress in this area. However, the predominant technique used is the Most Recently Used (MRU) strategy. This is found in task bars, drop-down menus, and so forth. More recently, on one Android phone, the Share menu item shows an image of the last application used to share a resource. The user can click “share…” and use the subsequent cascading menu, or click on the image to reuse that app to share again.
This is an improvement; however, as discussed below, further optimizations are possible in actually invoking the selected sharing application.
Use prior actions to determine current possible actions. What could be simpler? In the current scenario, as soon as I select the ‘Share’ option, the system will generate a proposal that is based on historical past actions. Note this is not just a “Most Recently Used” strategy, but is also based on context. If I am viewing a web page on cooking and click share, most likely I am targeting a subset of my contacts that I have previously shared “cooking” related pages with.
Now I can just switch to that proposal and with one click accomplish my task. If the proposal is not quite what I had in mind, I can click on the aspect or detail that is incorrect, or I can continue with my ongoing task selections, and each successive action will enhance the proposal.
The result is that in the best-case scenario, the task will be completed in two steps versus twenty, a 90% improvement. In the worst case, the user can continue with the task as usual or modify the proposal. But the next time the same task is begun, the generated proposal will be more accurate.
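To make the idea of a context-based proposal concrete, here is a minimal sketch in Python. It is not an implementation from any real phone or OS; the class name ProposalModel, the string context labels such as "cooking", and the (app, recipient) pairs are all assumptions made for illustration. It simply counts what was shared with whom in each context and proposes the most frequent combination.

```python
from collections import Counter, defaultdict


class ProposalModel:
    """Hypothetical helper: remembers past share actions per context."""

    def __init__(self):
        # history[context] counts how often each (app, recipient) pair was used
        self.history = defaultdict(Counter)

    def record(self, context, app, recipient):
        """Store one completed share action so later proposals improve."""
        self.history[context][(app, recipient)] += 1

    def propose(self, context):
        """Return the most frequent (app, recipient) for this context, or None."""
        counts = self.history.get(context)
        if not counts:
            return None  # no history: the user continues with the usual steps
        return counts.most_common(1)[0][0]


model = ProposalModel()
model.record("cooking", "sms", "Mary")
model.record("cooking", "sms", "Mary")
model.record("cooking", "email", "Bob")
model.record("work", "email", "Bob")
print(model.propose("cooking"))  # the pair used most often for cooking pages
```

Every completed task is fed back through record, which is how the next proposal for the same kind of task becomes more accurate. A real system would need a much richer notion of context than a single label.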
What does a proposal look like? Dependent on the interaction mode (voice, graphical, gestural, text), the proposal will be presented to the user in the appropriate manner. Each device or computer will have a different way of doing this which is dependent on the interface/OS.
Let’s look at a textual output. When I make the first selection, ‘Share’, another panel in the user interface will open and present the proposal based on past actions. If there was no past action with a close enough match, the proposal is presented in stages. In its simplest form it could look like this:
Of course, it would look much better and follow the GUI L&F of the host device (Android, iOS, Windows, …). In a responsive design the proposal component would be vertical in a portrait orientation.
The fields on the Proposal will be links to the associated field’s data type: email address, URL, phone, and so forth. This gives the user a shortcut to invoke the registered application for that data type. In the above example, if I am not sending to Mary, I just click on her name and enter the contacts application and/or get a list of the most likely person(s) I am sending the web page to (based on web page content, URL, etc.). Also, if I am not sending an SMS message, when I click something else, like email, the proposal changes accordingly. When I send email, I am generally sending to a co-worker, for example.
To present an analogy of a similar approach, in Microsoft’s Outlook application one can create rules that control the handling of incoming email. A rule has many predefined actions in the rule domain specific language (VB code in this case). See figure 3. Of course, the Outlook rule interface is not proactively driven. You could select the same options a million times and the interface will never change to predict that.
A proposal is an automatically and dynamically generated rule whose slots are filled in by probabilities of past action. That rule is translated into an appropriate Proposal in the current UI mode. When that rule is triggered, that is, when the user agrees with the proposal, the associated apps that perform the desired task are activated.
Predictive interfaces are not a new idea. A lot of research has gone into their various types and technologies. Amazingly, in popular computing systems these are nowhere to be found.
Interestingly, games are at the forefront of this capability. To provide the best game play, creators have had to use applied Artificial Intelligence techniques and actually make them work, not just serve as fodder for academic discussions.
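As a rough illustration of a rule whose slots are filled by probabilities of past action, the following Python sketch treats each past action as a dictionary of slot values. The function name fill_slots, the slot names "app" and "recipient", and the min_probability threshold are hypothetical; the sketch also assumes slots can be estimated independently, which a real system probably could not.

```python
from collections import Counter


def fill_slots(past_actions, slot_names, min_probability=0.0):
    """Fill each slot with its most frequent past value and that value's share.

    Slots whose best value falls below min_probability are left out, so the
    user would be asked for them instead (a proposal "in stages").
    """
    proposal = {}
    for slot in slot_names:
        counts = Counter(a[slot] for a in past_actions if slot in a)
        observed = sum(counts.values())
        if observed == 0:
            continue  # this slot was never seen before
        value, hits = counts.most_common(1)[0]
        probability = hits / observed
        if probability >= min_probability:
            proposal[slot] = (value, probability)
    return proposal


past = [
    {"app": "sms", "recipient": "Mary"},
    {"app": "sms", "recipient": "Bob"},
    {"app": "email", "recipient": "Mary"},
    {"app": "sms", "recipient": "Mary"},
]
print(fill_slots(past, ["app", "recipient"]))
```

The resulting dictionary is the "rule"; rendering it as a panel, a spoken prompt, or text is left to whatever UI mode is active.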
Even Microsoft has had a predictive computing initiative, “Decision Theory & Adaptive Systems Group”, and had efforts like the Lumiere project. Has anything made it into Windows? Maybe the ordering of a menu changed based on frequency.
I came up with this idea while using my Samsung Galaxy Note smartphone or “phablet”. Using the same phone I brainstormed the idea. Here is one of the diagrams created using the stylus:
“A Comparison between Decision Trees and Markov Models to Support Proactive Interfaces“; Joan De Boeck, Kristof Verpoorten, Kris Luyten, Karin Coninx; Hasselt University, Expertise centre for Digital Media, and transnationale Universiteit Limburg, Wetenschapspark 2, B-3590 Diepenbeek, Belgium; https://lirias.kuleuven.be/bitstream/123456789/339818/1/2007+A+Comparison+between+Decision+Trees+and+
“On-line Case-Based Planning”, http://www.cc.gatech.edu/faculty/ashwin/papers/er-09-08.pdf, Santi Ontañón and Kinshuk Mishra and Neha Sugandh and Ashwin Ram, CCL, Cognitive Computing Lab, Georgia Institute of Technology, Atlanta, GA 30322/0280, USA
Can “faster than realtime” computing, prediction, massive complex-event-processing and correlation also enable the powers that be to also control?
In a previous post, “Synergistic Social Agent Network Cloud”, I discussed how a web of ‘agents’ could optimize ‘apps’ to be more responsive, proactive, and multipliers of our intents. I was just reading “Should we fear mind-reading future tech?” by Andrew Keen, and was thinking of the possible negative aspects. Still reading the article, so it may cover this. (Finished reading it; this was not mentioned.)
Privacy is the usual concern about this high-tech stuff. This is very important. But can “faster than realtime” computing, prediction, massive complex-event-processing and correlation also enable the powers that be to control? We already know that advertising in all its forms can control; else why, for example, is the American presidential election a feeding fest of political money contributions?
Could that same advertising and fake news reporting via social media and apps that employ predictive quasi-AI morph into controlling media, an Orwellian manifestation of Newspeak? In a scenario that would make a great sci-fi novel, Big Interests like political parties, business groups, and political organizations use social media not only to advertise, but to gently guide one toward having programmed epiphanies.
Can it be even more “physical” and intrusive? For example, by prediction, these groups can arrange that one will meet a certain someone at the right time. You're a bleeding-heart influential liberal? No problem, the future Fox News will arrange that you meet this gorgeous strong-willed conservative who will change your mind.
Silly example, but you get the point: when you know, you can make nano-adjustments, meaning unnoticeable, personalized, massive lobbying. Ads are old school; here come the psych-bots.
Tags are usually non-hierarchical fine grained descriptors of a resource. They are the opposite of categories which are usually part of a semantic hierarchy. Categories are really old-school, killed by the effectiveness of Search and the expanding mash-up universe.
However, while tags provide rapid access to resources and easier sharing of them, tags do not enrich knowledge. We can see this if we consider knowledge as one of the tiers toward wisdom: data, information, knowledge, wisdom. Tags are data on information, metadata. Categories are information on knowledge.
Since data is used to create information, the data on information, tags, can be harnessed to increase the information on knowledge by the automated creation of categories. We simply create the directed cyclic graphs of tags around resources and detect clusters. The naming of categories comes from the resources themselves.
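One very simple way to picture "graphs of tags around resources" and cluster detection is sketched below in Python. This is an assumption-laden stand-in, not the method itself: it builds an undirected co-occurrence graph (two tags are linked when they appear on the same resource) and treats each connected component as a candidate category. The function name tag_clusters and the sample resources are made up, and naming the resulting categories is not attempted.

```python
from collections import defaultdict
from itertools import combinations


def tag_clusters(resources):
    """resources maps a resource name to its set of tags.

    Returns a list of clusters, each a sorted list of tags that are
    connected through shared resources.
    """
    graph = defaultdict(set)
    for tags in resources.values():
        for tag in tags:
            graph[tag]  # make sure tags with no neighbours still appear
        for a, b in combinations(sorted(tags), 2):
            graph[a].add(b)
            graph[b].add(a)

    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # depth-first walk collects everything reachable from this tag
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


resources = {
    "page1": {"pasta", "recipe"},
    "page2": {"recipe", "baking"},
    "page3": {"python", "code"},
}
print(tag_clusters(resources))
```

Connected components are the crudest possible clustering; on real tag data nearly everything ends up connected, so a weighted or community-detection approach would likely be needed.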
This is how the internet will wake up. It will create an ontology and it will act. Hopefully, humans are part of a necessary category.
2014-10-28: I see Google is doing a new “Physical Web” effort. Intro is here.
Mobile apps have not been very gratifying. Testing an app last year gave some clarity to what I felt to be a problem with the current App ecosystem. And, this is not just a mobile issue, but also for traditional computing platforms. I have been thinking of this subject for years. This is just, finally, a very simple and pragmatic example.
Last year I downloaded an app that locates the cheapest gas based on my current location. Whether cheap gas should be used in one’s car is not the point here. The app could have been one for finding the best licensed massage therapist or bookstore. The point is: is this using mobile computing to its full potential?
What if the cheap gas station is located in an area where crime is very high? Should I risk a carjacking just to save 3 cents? What if I’m about to run out of gas now; is the cheapest gas too far away? We can get even more complicated, of course. What if I have to be at an appointment? Shouldn’t the cheapness criterion be augmented with route info, so that the cheapest gas is the one easiest to get to on my way to or from my appointment?
In short, the current app is one-dimensional. Real life is multidimensional and the human brain easily makes decisions within this mostly analog fuzzy chaos. If an app cannot make decisions or recommendations in that same world, it collapses the dimensions, it is a dumbing down.
How can the app be made more dimensional? AGENTS. The app should really be an Agent that cooperates with other agents to fulfill a need, in this case finding cheaper gas. Thus, it should talk to other autonomous agents, such as:
law enforcement to grade destination
vehicle network for fuel requirements
hours of operation
map routing, and so forth.
It should also be informed by human agents in a trusted relationship with the user. What we then have is An Ad Hoc Dynamic Network of Social Agent Recommenders (AhDyNoSAR).
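The gas example can be sketched as a handful of cooperating scoring agents. The Python below is only an illustration under stated assumptions: the Station fields, the four agent functions, the 0.0 to 1.0 scores, and combining them by multiplication (so any agent can veto with a zero) are all invented here. Real agents would be autonomous services rather than local functions.

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    price: float        # price per unit of fuel
    distance_km: float
    crime_risk: float   # 0.0 (safe) to 1.0 (risky), e.g. from a law enforcement agent
    is_open: bool


# Each hypothetical agent grades a station between 0.0 and 1.0.
def price_agent(s, ctx):
    return min(1.0, ctx["cheapest"] / s.price)

def safety_agent(s, ctx):
    return 1.0 - s.crime_risk

def range_agent(s, ctx):
    return 1.0 if s.distance_km <= ctx["range_km"] else 0.0

def hours_agent(s, ctx):
    return 1.0 if s.is_open else 0.0

AGENTS = [price_agent, safety_agent, range_agent, hours_agent]


def recommend(stations, range_km, agents=AGENTS):
    """Return the station with the best combined score, or None if all are vetoed."""
    if not stations:
        return None
    ctx = {"cheapest": min(s.price for s in stations), "range_km": range_km}
    best, best_score = None, 0.0
    for s in stations:
        score = 1.0
        for agent in agents:
            score *= agent(s, ctx)  # a zero from any agent vetoes the station
        if score > best_score:
            best, best_score = s, score
    return best


stations = [
    Station("A", 3.00, 2.0, 0.8, True),    # cheap and near, but high crime
    Station("B", 3.03, 3.0, 0.1, True),    # 3 cents more, much safer
    Station("C", 2.90, 40.0, 0.1, True),   # cheapest, but out of fuel range
]
print(recommend(stations, range_km=10.0).name)
```

With these made-up numbers the one-dimensional answer (station C, the cheapest) loses to station B once range and safety are taken into account.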
The Mind Map Diagram shown previously gives a contextual view of this idea.
Let’s look at another example. Someone is walking in a neighborhood that has a few restaurants. The embedded Agent notes that the last time the person ate was a few hours ago (based on shopping venue, Calendar, etc.). The shops’ agents are contacted and a decision processing workspace is created. Is the person currently viable; do they have cash or credit available? Each store will check inventory and accounting ratios: does it need to offer a discount or promotion to this person? More agents mobilize to assert their criteria. What are the person’s tastes, dietary restrictions and allergies, past intake (who wants pizza twice in one day?), and other multidimensional agents in a problem space hierarchy are evoked.
After all agents complete their reckonings and the spontaneous net reaches a stable resonance, the person’s intimate personal soft computing agents make a decision. It turns out that the person is currently following their spiritual observance and is fasting today. This result is sent into the local agent milieu and starts a new search for resonance, so no food, how about some clothing or reading material? Again a new recommendation graph is created, religious and political leanings are queried, clothing and accessory rules are fired, ah, that is a very old turban, here are some suggestions.
Unfortunately, the person has now walked into a new map space, a neighborhood park. Now new agents awake: social engagement, entertainment, sexual, defensive.
It would be so gross if the information that this new cloud offers were shown as ads. A better approach is for this information space to be entered as a virtual world, using technologies like those of massively multiplayer online role-playing games (MMORPGs). The consumer becomes an Avatar moving through Recommendation Space, a superimposed view on current locality-based environments. Instead of, or in addition to, other consumers, the other characters are the various agents' most visible recommendation goals.
Unlike Apps, an Agent should always be considered adversarial. That is, even when an agent provides a benefit, it can also allow, intentionally or through weaknesses, a loss of security and privacy, since it must negotiate information with other agents. Thus, though current or future standards may be used, they must be in virtual application spaces that use encrypted anonymous data. As with viruses and other malware, this will be an ongoing battle.
It would not be optimal to require a download of an agent to each user’s location or device. Instead, agents will exist in the cloud as a multi-agent system. A user will have a private cloud virtual machine and address space for agent storage and recommendation space. To handle disconnected use, an agent will have a mobile agent shadow. It will provide simple assistance and will punt decisions and actions it cannot handle until connection to the cloud is established.
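The "mobile agent shadow" could look something like the following Python sketch. The class ShadowAgent, its method names, and the idea of passing the cloud agent in as a plain callable are assumptions for illustration only. The shadow answers what it can locally while offline, queues what it cannot, and replays the queue once the cloud is reachable.

```python
from collections import deque


class ShadowAgent:
    """Hypothetical on-device stand-in for an agent that lives in the cloud."""

    def __init__(self, local_handlers, cloud_agent):
        self.local_handlers = local_handlers  # request kind -> simple local function
        self.cloud_agent = cloud_agent        # callable(kind, payload), used only when online
        self.online = False
        self.deferred = deque()               # requests punted until reconnection

    def request(self, kind, payload):
        if self.online:
            return self.cloud_agent(kind, payload)
        handler = self.local_handlers.get(kind)
        if handler is not None:
            return handler(payload)           # simple assistance while disconnected
        self.deferred.append((kind, payload))
        return None                           # no answer yet; decision is punted

    def reconnect(self):
        """Mark the cloud as reachable and replay everything that was punted."""
        self.online = True
        results = []
        while self.deferred:
            kind, payload = self.deferred.popleft()
            results.append(self.cloud_agent(kind, payload))
        return results


cloud = lambda kind, payload: f"cloud:{kind}:{payload}"
shadow = ShadowAgent({"remind": lambda p: f"local:{p}"}, cloud)
print(shadow.request("remind", "buy gas"))  # handled on the device
print(shadow.request("route", "home"))      # punted while offline
print(shadow.reconnect())                   # replayed against the cloud agent
```

A first-in, first-out queue is the simplest choice here; a real shadow would also have to decide which deferred requests are still worth sending by the time the connection returns.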
With Apps, the app provider may require purchase or try to enforce lock-in or an advertising monopoly. This can also be accomplished by centralizing the App marketplace. This may not work directly with Agents. Agents may not even provide an obvious visible function. For example, an agent may just contribute parking meter locations and status to other agents that use a map agent.
In the real world, eventually someone has to pay the piper. So too must the development and use of agents be rewarded. Some options are:
An agent can contribute to an advertising stream that ultimately reaches the consumer facing user interface device.
Agents will negotiate among their collaborators to maintain a balance of payments, an agent of agents, and this payment is satisfied by the user or the user’s fee structure that the network provider maintains.
The consumer will purchase agents. If the fidelity and number of agents are adequate, the quality of service is greater.
Of course, the internet is currently wide open and thus this opens up predation to another level if Agent “sandboxes” are porous, if personal data is not secure.
The present cavalier attitudes regarding personal privacy exhibited by the large Internet service providers are a big warning sign that giving agents access to even more information would be just another data mining delicacy ripe for exploitation.
And now for an even more far out scenario. In a classic Science Fiction novel, before a character dies, a copy of their knowledge is captured. This intelligence is then available for implantation into someone as an “Aspect”, an agent that can add its unique expertise and judgment to the human host. That is a more radical direct means for accomplishing something that the social networking may evolve into, a means to collect knowledge and translate that into a ubiquitous intelligence.
Presented was a critique of conventional app centric mobile computing and a suggestion that Agent technology can provide a more realistic computing environment. The term Agent was not defined here. Perhaps the difference with an App is just intent or where the output is ingested. The experts are still debating Agent technology and its applications.
Nov 13, 2011: Is Apple’s Siri, available on the iPhone 4S, an example of this topic?
All rights reserved. No part of this document may be reproduced or transmitted in any form by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from Josef Betancourt.