Next Meeting

The next meeting of 2018 is on Thursday 26 July 2018:

Machines and Morality: Can AI be Ethical? 

The speakers at this meeting will be Stephanie Mathisen, Policy Manager at Sense About Science and Tamara Ansons, Behavioural Science Consultant at Ipsos.

To register, follow the link above.

A PDF (Machines and Morality – 26 July 2018) giving details of the meeting can be downloaded.

Posted in Uncategorized | Leave a comment

Working in Complexity – SenseMaker, Decisions and Cynefin

Account of a NetIKX meeting with Tony Quinlan of Narrate, 7 March 2018, by Conrad Taylor

At this NetIKX afternoon seminar, we got a very thorough introduction to Cynefin, an analytical framework that helps decision-makers categorise problems surfacing in complex social, political and business environments. We also learned about SenseMaker®, an investigative method with software support, which can gather, manage and visualise patterns in large amounts of social intelligence, in the form of ‘narrative fragments’ with quantitative signifier data attached.

Tony Quinlan explaining how to interpret SenseMaker signifiers. The pink objects behind him are the micro-narratives we produced during the exercise, on ‘super-sticky’ Post-It notes. Photo Conrad Taylor.

The leading architect of these analytical frameworks and methods is Dave Snowden, who in 2002 set up IBM’s Cynefin Centre for Organisational Complexity and founded the independent consultancy Cognitive Edge in 2005.

Our meeting was addressed by Tony Quinlan, CEO and Chief Storyteller of consultancy Narrate (https://narrate.co.uk/), which has been using Cognitive Edge methodology since 2008. Tony, with his Narrate colleague Meg Odling-Smee, ran some very engaging hands-on exercises for us, which gave us better insight into what SenseMaker is about. Read on!

What follows is, as usual, my personal account of the meeting, with some added background observations of my own. (I have been lucky enough to have taken part in three Cognitive Edge workshops, including one in which Tony himself taught us about SenseMaker.)

The power of narrative

Tony Quinlan also used to work for IBM, in internal communications and change management; he then left to practise as an independent consultant. Around 2000, he set up Narrate, because he recognised the valuable information that is held in narratives. Then in 2005, as Dave Snowden was setting up Cognitive Edge, Tony became aware of the Cynefin Framework – a stronger theoretical basis for understanding the significance of narrative, and how one might work effectively with it.

There are several ways of working with narratives in organisations, and numerous practitioners. There’s a fruitful workshop technique called ‘Anecdote Circles’, well described in a workbook from the Anecdote consultancy. (See their ‘Ultimate Guide to Anecdote Circles’ in PDF.) There is also the ‘Future Backwards’ exercise, which Ron Donaldson demonstrated to NetIKX at a March 2017 meeting. These methods are good, but they require face-to-face engagement in a workshop environment.

A problem arises with narrative enquiry when you want to scale up – to collect and work with lots of narratives – hundreds, thousands, or more. How do you analyse so many narratives without introducing expert bias? Tony found that the SenseMaker approach offered a sound solution and, so far, he’s been involved in about 50 such projects, in 30 countries around the world.

I was reminded by Tony’s next comment of the words of Karl Marx: ‘The philosophers have only interpreted the world, in various ways. The point, however, is to change it.’

Tony remarked that there is quite a body of theory behind the Cognitive Edge worldview, combining narrative-based enquiry with complexity science and cognitive neuroscience insights. But the real reasons behind any SenseMaker enquiry are: ‘How do we make sense of where we are? What do we do next?’ So we were promised a highly practical focus.

A hands-on introduction to SenseMaker

Tony and Meg had prepared an exercise to give us direct experience of what SenseMaker is about, using an arsenal of stationery: markers, flip-chart pages, sticky notes and coloured spots!

Collecting narratives:   The first step in a SenseMaker enquiry is to pose an open-ended question, relevant to the enquiry, to which people respond with a ‘micro-narrative’. To give us an exercise example, Tony said: ‘Sit quietly, and think of an occasion which inspired/pleased you, or frustrated you, in your use of IT [support] in your organisation (or for freelances, with an external organisation you contact to get support).’

Extra-large Post-It notes had been distributed to our tables. Following instructions, we each took one, and wrote a brief narrative about the experience we’d remembered. After that, we gave our narrative a title. We were also given sheets of sticky-backed, coloured dots. We took seven each, all of the same colour, and wrote our initials on them. We each took one of our dots, and stuck it on our own narrative sticky note. Then, we all came forward and attached our notes to the wall of the room.

Adding signifiers: Tony now drew our attention to where he and Meg had stuck up four posters. On three, large triangles were drawn, each with a single question, and labels at the triangle corners. The fourth was drawn with three stripes, each forming a spectrum. (This description makes better sense if you look at our accompanying diagrams.) In SenseMaker practice these are called ‘triads’ and ‘dyads’ respectively, and they are both kinds of SenseMaker ‘signifiers’.

For example, the first triad asked us: ‘In the story you have written, indicate which needs were being addressed’. The three corners were labelled ‘Business needs’, ‘Technology needs’ and ‘People’s needs’. We were asked each to take one of our initialled, sticky dots and place it within the triangle, based on how strongly we felt each element was present within our story.

As for the dyads, we were to place our dot at a position along a spectrum between opposing extremes. For example, one prompted: ‘People in this example were…?’ with one end of the spectrum labelled ‘too bureaucratic’ and the other ‘too risky’.

In the diagrams below I have represented how our group’s total results plotted out over the triads and dyads, but I have made all the dots the same colour (for equal visual weight); and, obviously, there are no identifying initials.

Figure 1 Triad set

Figure 2 Dyad compilation

A few observations on the exercise

  • When SenseMaker signifiers are constructed, a dyad – also referred to as a ‘polarity’ – has outer poles, which are often equally extreme opposites (‘too bureaucratic – too risky’). But in designing a triad, the corners are made equally positive, equally negative, or neutral.
  • It strikes me that deciding on effective labels to use takes considerable thought and skill, especially for triads. Over the years, Cognitive Edge has developed sets of repurposable triads and dyads, often with assistance from anthropologists.
  • A real-world SenseMaker enquiry would typically have more signifier questions – perhaps six triads and two dyads.
  • For practical operational reasons, we all placed our dots on a common poster. This probably means that people later in the queue were influenced by where others had already placed their dots. In a real SenseMaker implementation, each person sees a blank triangle, for their input only. Then the responses are collated (in software) across the entire dataset.
  • Because the results of a SenseMaker enquiry are collated in a database, the capacity of such an enquiry is practically without limit.
  • There can be further questions, e.g. to ascertain demographics. This allows for explorations of the data, such as, how do opinions of males differ from those of females? Or young people compared to their elders?
  • SenseMaker results are anonymised, but the database structure in which responses are collected means that we can correlate a response on one signifier, with the same person’s response on another. For our paper exercise, we had to forgo that anonymity by using initials on coloured dots.

Our exercise gathered retrospective narratives, collected in one afternoon. But SenseMaker can be set up as an ongoing exercise, with each narrative fragment and its accompanying signifiers time-stamped. So, we can ask questions like ‘were customers more satisfied in May than in April or March?’
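To make the time-stamping idea concrete, here is a minimal sketch in Python of how one might compare months. It is not SenseMaker's own software or data format: the `(date, score)` pairs and the idea of a single 'satisfaction' signifier scored from 0 to 1 are assumptions made purely for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_by_month(responses):
    """Average a signifier score per calendar month.

    `responses` is an iterable of (date, score) pairs. Both the pairing
    and the 0..1 'satisfaction' score are hypothetical, not SenseMaker's
    real export format.
    """
    buckets = defaultdict(list)
    for when, score in responses:
        # Group by (year, month) so that May 2018 and May 2019 stay separate.
        buckets[(when.year, when.month)].append(score)
    # Sort the keys so the months come out in chronological order.
    return {month: mean(scores) for month, scores in sorted(buckets.items())}


# Invented example records, for illustration only.
example = [
    (date(2018, 3, 5), 0.4),
    (date(2018, 3, 20), 0.6),
    (date(2018, 5, 1), 0.9),
]
monthly = mean_by_month(example)
```

With the monthly averages in hand, a question such as 'were customers more satisfied in May than in March?' becomes a simple comparison of two numbers, after which one would still want to read the underlying stories.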

Analysing the results

Calling us to order, Tony talked through our results. At first, he didn’t even look at our narratives on the wall. It’s hard to assess lots of narratives without getting lost in the detail. It’s still more difficult if you have to wade through megabytes of digital audio recordings – another way some narratives have been collected in recent years.

But the signifiers can be thrown up en masse on a computer screen in a visual array, as they were on our posters. Then it’s easy to spot significant clusterings and outliers, and you can drill down to sample the narratives with a single click on a dot. Even with our small sample we could see patterns coming up. One dyad showed that most people thought the IT department was to blame for problems.

With SenseMaker software support, this can scale. Tony recalled a project in Delhi with 1,500 customers of mobile telecoms, about what helped and what didn’t when they needed support. A recent study in Jordan, about how Syrian refugees can be better supported, gathered 4,000 responses.

This was an enlightening exercise, giving NetIKX participants a glimpse of how SenseMaker works. But just a glimpse, cautioned Tony: the training course is typically three days.

Why do we do it like this?

Now it was time for some theory, including cognitive science, to explain the thinking behind SenseMaker.

How do humans make decisions? Not as rationally as we might like to believe, and not just because emotions get in the way. As humans we evolved to be pattern-matching intelligences. We scan information available to us, typically picking up just a tiny bit of what is available to us, and quickly match it against pre-stored response patterns. (And, as Dave Snowden has remarked, any of our hominid ancestors who spent too long pondering the characteristics of the leopard bounding towards them didn’t get to contribute to the gene pool!)

‘But there’s worse news,’ said Tony. ‘We don’t go for the best pattern match; we go for the first one. Then we are into confirmation bias, which is difficult to snap out of.’ (Ironically for knowledge management practice, maybe that means ‘lessons learned’ thinking can set us up for a fall – blocking us from seeing emerging new phenomena.)

Patterns of thinking are influenced by the cultures in which we are embedded, and the narratives we have heard all our lives. Those cultures and stories may be in the general social environment, or in our subcultures (e.g. religious, political, ethnic); they could be formed in the organisation in which we work; they could come at us from the media. All these influences shape what information we take in, and what we filter out; and how we respond and make decisions.

Examining people’s micro-narratives shows us the stories that people tell about their world, which shape opinions and decisions and behaviour. In SenseMaker, unlike in polls and questionnaires, we gather the stories that come to people’s minds when asked a much more open-ended prompting question. SenseMaker questions are deliberately designed to be oblique, without a ‘right answer’, thus hard to gift or game.

You don’t necessarily get clean data by asking straight questions, because there’s that strong human propensity to gift or to game – to give people the answer we think they want to hear, or to be awkward and say something to wind them up. In the Indian project with mobile service customers, when poll questions asked customers whether they would recommend the service to others, the responses were overwhelmingly positive. But in the SenseMaker part of the research, about 20% of those who claimed they would definitely recommend the company’s service were shown by the triads to really think the diametric opposite.

Social research methods that do use straight questions are not without value, but they are reaching the limits of what they can do, and are often used in places where they no longer fit: where dynamics are complex, fluid and unpredictable. But complexity is not universal, said Tony; it is one domain amongst a number identified in the Cynefin Framework.

The Cynefin Framework

Figure 3 Diagram of the Cynefin domains, with annotations

Cynefin, explained Tony, is a Welsh word (Dave Snowden is Welsh). It means approximately ‘The place(s) where I belong.’ Cynefin is a way of making sense of the world: of human organisation primarily. It is represented by a diagram, shown in Fig. 3, and lays out a field of five ‘domains’:

  • Simple. For problems in this domain, the relationship between cause and effect is obvious: if there is a problem, we all know how to fix it. (More recently, this domain is labelled ‘Obvious’, because Simple sounds like Easy. It may be Obvious we need to dig a tunnel through a mountain, but it’s not Easy…) Organisations define ‘best practice’ to tackle Obvious issues.
  • Complicated. In this domain, there are repeatable and predictable chains of cause and effect, but they are not so easy to discern. A jet engine is sometimes given as a metaphorical example of such complicatedness. In this domain, problem-solving often involves the knowledge of an expert, or an analytical process.
  • Chaotic. In this domain, we can’t discern a relationship between cause and effect at all, because the interacting elements are so loosely constrained. Chaos is usually a temporary condition, because a pattern will emerge, or somebody will take control and impose some sort of order. In Chaos, you don’t have time to assess carefully: decisive action is needed, in the hope that good things will emerge and can be encouraged. (And as Dave sometimes says, ‘Light candles and pray!’)
  • Complex. This is a domain in which various ‘actors and factors’ – animate and inanimate – do respond to constraints, and can be attracted to influences, but those constraints and influences are relatively elastic, and there are many interactions and feedback loops that are hard to fathom. Such a system is more like a biological or ecological system than a mechanical one. Cognitive Edge practitioners have a battery of techniques for experimentation in this space, as Tony would soon describe.
  • Disorder – that dark pit in the middle of the diagram represents problems where we cannot decide into which of the other domains this situation fits.

Finally, Tony pointed out a feature on the borderlands between Obvious and Chaotic, typically drawn like a cliff or fold. This is there to remind us that if people act with complete blind conviction that things are really simple and obvious, and Best Practice is followed without question, the organisation can be blindsided to major changes happening in their world. One day, when you pull the levers, you don’t get the response you have come to expect, and you have a crisis on your hands. And if you simply try to re-impose the rules, it can make things worse.

But with chaos may come opportunity. Once you have a measure of control back, you have a chance to be creative and try something new. And as we prepared to take a refreshment break, Tony urged us, ‘Don’t let a good crisis go to waste!’

Working in complex adaptive systems

Tony recalled that his MBA course was predicated on the idea that things are complicated, but there is a system for working things out. The corollary: if things don’t work out, either you didn’t plan well or you failed in implementation (‘are you lazy or stupid?’) Later, when he saw the Cynefin model, he was relieved to note that you can be neither lazy nor stupid and things can still go pear-shaped, in a situation of complexity.

In Cognitive Edge based practice, when you find you are operating in the domain of complexity, the recommendation is to initiate ‘safe-to-fail’ probes and experiments. Here are some working principles:

  • Obliquity — don’t go after an intractable problem directly. A misplaced focus can have massive unintended consequences. Tony has done work around problems of radicalisation and contemporary terrorism, e.g. in Pakistan and the Middle East. Western authorities and media operate as if radicalisation was a fruit of Islamic fundamentalism – but from a Jordanian perspective, a very significant factor is when young people don’t have jobs, are bored and frustrated, and don’t have a stake in society.
  • Diversity — The perspective you bring to a problem shapes how you see it. In the complex domain we can’t rely on solutions from experts alone. On the Jordanian project, to review SenseMaker inquiry results, they brought together experts from the UN, economists and government officials – but also Syrian refugees, and unemployed Jordanian youth. When the experts started to rubbish the SenseMaker data, saying it didn’t fit their experience in the field, the refugees, the youth and government officials were standing up and saying, ‘But that is our experience; we recognise it intimately.’
  • Experimentation — In a complex situation you cannot predict how experiments will work out, so you try a few things at the same time. Some won’t work. That’s why probes and experiments must be safe-to-fail: if an experiment is going to fail, you don’t want catastrophic consequences; you want to pull the plug and recover quickly. (If none of your experiments fail, it probably means you have been too timid with your experimental alternatives.)
  • Feedback — Experimenting in complex situations, we try to nudge and evolve the system. You don’t set up ‘a solution’ and come back in two years to see how it went – by then, things may have evolved to a point you can’t recover from. You need constant feedback to monitor the evolving situation for good or bad effects, and to spot when unexpected things happen.

When monitoring, it’s better to ask people what they do, rather than what they think. You’d be surprised how many respondents claim to think a certain way, but that isn’t what they actually do or choose.

Even with the micro-narrative approach, you have to be careful in your evaluation. Meaning is not only in the words, and responses may be metaphorical, or even ironic. That can be tricky if you are working across cultures.


Safe-to-fail in Broken Hill: My personal favourite Snowden anecdote illustrating ‘safe-to-fail’ experiments comes from work Dave did with Meals on Wheels and the Aboriginal communities around Broken Hill, NSW, Australia. How could that community’s diets be improved to avoid Type II diabetes?

Projects were proposed by community members. 13 were judged ‘coherent’ enough to be given up to Aus$ 6,000 each: bussing elders to eat meals in common; sending troublesome youngsters to the bush to learn how to hunt; farming desert pears; farming wild yabbies (crayfish; see picture).

Yabby

Results? Some flopped (bussing elders); some merged (farming desert pears and yabbies); some turned out to work synergistically (hunting lessons for youth generated a meat surplus to supply a restaurant, using traditional killing and cooking practices). Nothing failed catastrophically.


The crucial role of signifiers

In a SenseMaker enquiry, only the respondents can say what their stories mean; interacting with well designed signifiers is very powerful in this regard. Tony recalled one project with young Ethiopian women; their narratives were presented to UNDP gender experts, who were asked to read them and fill out the SenseMaker signifiers as they thought the young women might. The experts’ ideas are not unimportant; but, they significantly differed from the responses ‘from the ground’, which can be important in policymaking. SenseMaker de-privileges the expert and clarifies the voice of the respondent. Dave Snowden refers to this as ‘disintermediation’.

When you design a SenseMaker framework, you do it in such a way that it doesn’t suggest a ‘right’ answer. As an example of a question that does suggest one, Tony showed a linear scale asking about a conference speaker’s presentation skills, ranging from ‘poor’ to ‘excellent’ (and looking embarrassingly like the NetIKX evaluation sheets!). ‘If I put this up, you know what answer I’m looking for.’

In contrast, he showed a triad version prepared in the course of work with a high street bank. The overall prompt asked ‘This speaker’s strengths were…’ and the three corners of the triad were marked [a] relevant information; [b] clear message; [c] good presentation skills. Tony took a sheaf of about a hundred sheets evaluating speakers at an event, and collated the ‘dots’ onto master diagrams. One speaker had provoked a big cluster of dots in the ‘relevant information’ corner. Well, relevance is good – but evidently, his talk had been unclear, and his presentation skills poor.

Tony showed a triad that was used in a SenseMaker project in Egypt. The question was, ‘What type of justice is shown in your story?’ and the corners were marked [a] revenge, getting your own back; [b] restorative, reconciling justice; and [c] deterrence, to warn others from acting as the perpetrator had done.

Tony then showed a result from a similar project in Libya, which collected about 2,000 micro-narratives. The dominant form of justice? Revenge. This was cross-correlated with responses about whether the respondents felt positively or negatively about their story, and the SenseMaker software displayed that by colouring the dots on a spectrum, green to red. And what this showed was, people felt good in that culture and context about revenge being the basis of justice.

In SenseMaker evaluation software (‘Explorer’, see end), if you want to make even more sense, you click on a dot and up comes the text of the related micro-narrative. Or, you can ask to see a range of stories in which the form of justice people felt good about was of the deterrent type. In this case, those criteria pulled up a subset of 171 stories, which the project team could then page through.
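The kind of query described above can be sketched in a few lines of Python. This is only an illustration of the filtering idea, not how Explorer is implemented: the `Story` record, its field names and the threshold values are all assumptions, and the three sample stories are invented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Story:
    """A micro-narrative with its signifier data (hypothetical structure)."""
    title: str
    justice: Dict[str, float]  # triad weights for each corner, summing to 1
    feeling: float             # -1.0 (negative) to +1.0 (positive)


def stories_matching(stories: List[Story], corner: str,
                     min_weight: float = 0.5,
                     min_feeling: float = 0.0) -> List[Story]:
    """Select stories placed near one triad corner and felt positively."""
    return [
        s for s in stories
        if s.justice[corner] >= min_weight and s.feeling > min_feeling
    ]


# Invented sample data, for illustration only.
SAMPLE = [
    Story("Story A", {"revenge": 0.7, "restorative": 0.2, "deterrence": 0.1}, 0.8),
    Story("Story B", {"revenge": 0.1, "restorative": 0.2, "deterrence": 0.7}, 0.6),
    Story("Story C", {"revenge": 0.1, "restorative": 0.1, "deterrence": 0.8}, -0.4),
]

picked = stories_matching(SAMPLE, "deterrence")
```

Here only 'Story B' is selected: it sits near the deterrence corner and its teller felt positively about it. The selected subset is then what a project team would page through and read.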

From analysis to action: another exercise

SenseMaker wasn’t created for passive social research projects. It is action-oriented. An important question used in a lot of Cognitive Edge projects is, ‘What can we do to have fewer stories like that, and more stories like this?’ That question is a useful way to encourage people to think about designing interventions, without flying away into layers of abstraction. You get stakeholders together, show them the patterns, and ask, ‘What does this mean?’ Using as a guide the idea of ‘more stories like these, fewer like those,’ you then collectively design interventions to work towards that.

Tony had more practical exercises for us, to help us to understand this analytical and intervention-designing process.

Here is the background: about five years ago, a big government organisation was worried about how its staff perceived its IT department. Tony conducted a SenseMaker exercise with about 500 participants, like the one we had done earlier – the same overall question to provoke micro-narratives, and the same or similar triad and dyad signifier questions.

Now we were divided into five table groups. Each group was given sheets of paper with labelled but blank triads on them. We were each of us to think about where on the triad we would expect most answers to have come, then make a mark on the corresponding triad. Then Tony showed us where the results actually did come in.

I’m not going into detail about how this exercise went, but it was interesting to compare our expectations as outsiders, with the actual historical results. This ‘guessing game’ is also useful to do with the stakeholder community in a real SenseMaker deployment, because it raises awareness of the divergence between perceptions and reality.

Ideas can come from the narratives

In SenseMaker, micro-narratives are qualitative data; the signifier responses, which resolve into dimensional co-ordinates, are in numerical form, which can be more easily computed, pattern-matched, compared and visualised with the aid of machines. This assists human cognition to home in on where salient issue clusters are. Even an outsider without direct experience of the language or culture can see those patterns emerging on the chart plots.
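One plausible way for a dot on a triad to 'resolve into co-ordinates' is as barycentric weights: three numbers, one per corner, that sum to 1. The Python sketch below shows that conversion. It is an assumption about the general technique, not a description of SenseMaker's internals; the triangle layout and the corner names (borrowed from our first exercise triad) are chosen for illustration.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]

# Hypothetical layout: an equilateral triangle with sides of length 1.
CORNERS: Dict[str, Point] = {
    "business": (0.0, 0.0),
    "technology": (1.0, 0.0),
    "people": (0.5, 3 ** 0.5 / 2),
}


def triad_weights(dot: Point) -> Dict[str, float]:
    """Return the barycentric weights of a dot placed inside the triad.

    A dot on a corner gives that corner weight 1; a dot in the exact
    centre gives each corner weight 1/3.
    """
    (x1, y1), (x2, y2), (x3, y3) = CORNERS.values()
    x, y = dot
    # Standard barycentric-coordinate formula for a point in a triangle.
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    return dict(zip(CORNERS, (w1, w2, w3)))


centre = triad_weights((0.5, 3 ** 0.5 / 6))
```

Once every response is reduced to weights like these, clusters and outliers can be found by ordinary numerical means, and each set of weights still points back to the story it came from.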

But when it comes to inventing constructive interventions, it pays to dip down into the micro-narratives themselves, where language and culture are very important.

In a project in Bangladesh, the authorities and development agency partners had spent years trying to figure out how to encourage rural families to install latrines in their homes, instead of the prevailing behaviour of ‘open defecation’ in the fields. Tony’s initial consultations were with local experts, who said they would typically focus on one of three kinds of message. First, using family latrines improves public health, avoiding water-borne diseases and parasites. Second, it reduces risk (e.g. avoiding sexual molestation of women). Third, it reduces the disgust factor. Which of those messages would be most effective in making a house latrine a desirable thing to have?

A SenseMaker enquiry was devised, and 500 responses collected. But when the signifier patterns were reviewed, no real magic lights came on. Yes, one of the triads which asked ‘in your story, a hygienic latrine was seen as [a] healthy [b] affordable [c] desirable’ returned a strong pattern of answers indicating ‘healthy’. But that could be put down to years of health campaigns – which had nevertheless not persuaded people to install latrines.

Get a latrine, have a happy marriage!   But behind every dot is a story. The team in the UK asked the team in Dhaka to translate a cluster of some 19 stories from Bengali and send them over. There they found a bunch of stories which conveyed this message: if you install a latrine, you’ve got a better chance of a good marriage! One such story told of a young man, newly married, who got an ear-wigging from his mother-in-law, who told him in no uncertain terms what a low-life he was for not having a latrine in the house for her wonderful daughter…

Another story was from a village where there were many girls of marriageable age. Their families were receiving proposals from nearby villages. A young man came with his family to negotiate for a bride, and after a meal and some conversation, a guest asked to use the toilet. The girl’s father simply indicated some bushes where the family did their business. Immediately, the negotiations were broken off. The young man’s family declared that they could not establish a relationship with a family which did not have a latrine. Before long, the whole village knew that the marriage had been cancelled – and why! Shamed and chastened, the girl’s family did invest in a latrine, and the girl eventually found a husband.

As an outcome of this project, field officers have been equipped with about twenty memorable short stories, along similar lines about the positive social effect of having a latrine, and this is having an effect. If the narratives had not been mined as a resource, this would not have happened.

SenseMaker meets Cynefin

As our final exercise, Tony distributed some of the micro-narratives contributed to the project at that government organisation five years ago. We should identify issues illustrated by the narratives, and for each one we discovered, we should write a summary label on a sticky-backed note.

He placed on the wall a large poster of the Cynefin Framework diagram, and invited us to bring our notes forward, and stick them on the diagram to indicate whether we thought that problem was in the Complex domain, or Complicated, or Obvious or Chaotic, or along one of the borders… That determines whether you think there is an obvious answer, or something where experts need to be consulted, or if we are in the domain of Complexity and it’s most appropriate to devise those safe-to-fail experimental interventions.

We just took five minutes over this exercise; but Tony explained, he has presided over three-hour versions of this. For the government department, he had this exercise done by groups constituted by job function: directors round one table, IT users round another, and so on. All had the same selection of micro-narratives to consider; each group interpreted them according to their shared mind-set. For the directors, just about everything was Obvious or Complicated, soluble by technical means and done by technologists. The system users considered a lot more problems to be in the Complex space, where solutions would involve improving human relations.

On that occasion, table teams were then reformulated to have a diverse mix of people, and the rearranged groups thought up direct actions that could solve the simple problems, research which could be commissioned to help solve complicated problems, and as many as forty safe-to-fail experiments to try out on complex problems. The whole exercise was complete within one day. Many of the practical suggestions which came ‘from the ground up’ were not that expensive or difficult to implement, either.

SenseMaker: some technical and commercial detail

Tony did not have time to go into the ‘nuts and bolts’ of SenseMaker, so I have done some online study to be able to tell our readers more, and give some links.

We had experienced a small exercise with the SenseMaker approach, but the real value of the methods comes when they are deployed on a large scale, either one-off or continuously. Such SenseMaker deployments are supported by a suite of software packages and a database back end, maintained by Cognitive Edge (CE). Normally an organisation wanting to use SenseMaker would go through an accredited CE practitioner consultancy (such as Narrate), which can select the package needed, help set it up, and guide the client all the way through the process to a satisfactory outcome, including helping the client group to design appropriate interventions (which software cannot do).

SenseMaker® Collector   After initial consultations with the client and the development of a signification framework, an online data entry platform called Collector is created and assigned a specific URL. Where all contributors have Internet access, for example an in-company deployment, they can directly add their stories and signifier data into an interface at that URL. Where collection is paper-based, the results will have to be manually entered later by project administrators with Internet access.

A particularly exciting recent trend in Collector is its implementation on mobile smart devices such as Apple iPad, with its multimedia capabilities. Narrative fragment capture can now be done as an audio recording with communities who cannot read or write fluently, so long as someone runs the interview and guides the signification process.

 

My favourite case study is one that Tony was involved in, a study in Rwanda of girls’ experience commissioned by the GirlHub project of the Nike Foundation. A cadre of local female students very quickly learned how to use tablet apps to administer the surveys; the micro-narratives were captured in audio form, stored on the device, and later uploaded to the Collector site when an Internet connection was available.


Using iPads for SenseMaker collecting: The SenseMaker Collector app for iOS was first trialled in Rwanda in 2013. Read Tony’s blog post describing how well it worked. The project as a whole was written up in 2014 by the Overseas Development Institute (‘4,000 Voices: Stories of Rwandan Girls’ Adolescence’) and the 169-page publication is available as a 10.7 MB PDF.


SenseMaker® Explorer   Once all story data has been captured, SenseMaker Explorer software provides a suite of tools for data analysis. These allow for easy visual representation of data, amongst the simplest being the distribution of data points across a single triad to identify clusters and outliers (very similar to what we did with our poster exercise earlier). By drawing on multiple signifier datasets and cross-correlating them, Explorer can also produce more sophisticated data displays, for example a kind of 3D display which Dave Snowden calls a ‘fitness landscape’ (a term probably based on a computation method used in evolutionary biology – see Wikipedia, ‘Fitness landscape’, for examples of such graphs). Explorer can also export data for analysis in other statistical packages.

A useful page to visit for an overview of the SenseMaker Suite of software is http://cognitive-edge.com/sensemaker/ — it features a short video in which Dave Snowden introduces how SenseMaker works, against a series of background images of the software screens, including on mobiles.

That page also gives links to eleven case studies, and further information about ‘SCAN’ deployments. SCANs are preconfigured, standardised SenseMaker packages around recurrent issues (example: safety), which help an organisation to implement a SenseMaker enquiry faster and more cheaply than if a custom tailored deployment is used.

Contacting Narrate

Tony and Meg have indicated that they are very happy to discuss SenseMaker deployments in more detail, and Tony has given us these contact details:

Tony Quinlan, Chief Storyteller
email: tony@narrate.co.uk
mobile: +44 (0) 7946 094 069
Website: https://narrate.co.uk/


First Meeting Outside London: Organising Medical and Health-related Information – Leeds – 7 June 2018

We have now planned our first meeting outside London. This will be in Leeds on Thursday 7 June and the topic will be Medical Information. The meeting will be a joint one with ISKO UK. Speakers will include Ewan Davis.

There will be no charge for attending this meeting, but you must register. For more information and to register, follow the link above.


Next Meeting: Trust and integrity in information – Thursday 24 May 2018

Trust and integrity in information

The speakers will be Hanna Chalmers of Ipsos MORI, Dr Brennan Jacoby of Philosophy at Work and Conrad Taylor.

For more information and to register for this meeting, follow the above link.

A pdf giving details of the meeting will be available shortly.


Making true connections in a complex world – Graph database technology and Linked Open Data

Conrad Taylor writes:

The first NetIKX meeting of 2018, on 25 January, looked at new technologies and approaches to managing data and information, escaping the limitations of flat-file and relational databases. Dion Lindsay introduced the concepts behind ‘graph databases’, and David Clarke illustrated the benefits of the Linked Data approach with case studies, where the power of a graph database had been enhanced by linking to publicly available resources. The two presentations were followed by a lively discussion, which I also report here.

 

The New Graph Technology of Information – Dion Lindsay

Dion is an independent consultant well known to NetIKX members. He offered us a simple introduction to graph database technology, though he avers he is no expert in the subject. He’d been feeling unclear about the differences between managing data and information, and thought one way to explore that could be to study a ‘fashionable’ topic with a bit of depth to it. He finds graph database technology exciting, and thinks data- and information-managers should be excited about it too!

Flat-file and relational database models

In the last 40 years, the management of data with computers has been dominated by the Relational Database model devised in 1970 by Edgar F Codd, an IBM employee at their San José Research Center.

FLAT FILE DATABASES. Until then (and also for some time after), the model for storing data in a computer system was the ‘Flat File Database’ — analogous to a spreadsheet with many rows and columns. Dion presented a made-up example in which each record was a row, with the attributes or values being stored in fields, which were separated by a delimiter character (he used the | sign, which is #124 in most text encoding systems such as ASCII).

Example: Lname, Fname, Age, Salary|Smith, John, 35, £280|
Doe, Jane, 28, £325|Lindsay, Dion, 58, £350…
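
As a rough illustration (not something shown at the meeting), here is how such a flat file might be read and searched in Python. It assumes, as the sample line suggests, that the | character ends each record and that commas separate the fields within it; the function names are my own invention.

```python
# Hypothetical flat-file content modelled on the made-up sample above.
RAW = "Lname, Fname, Age, Salary|Smith, John, 35, £280|Doe, Jane, 28, £325|Lindsay, Dion, 58, £350|"

def parse_flat_file(raw):
    """Split the delimited text into a list of dicts, one per record."""
    chunks = [c for c in raw.split("|") if c.strip()]
    header = [name.strip() for name in chunks[0].split(",")]
    records = []
    for chunk in chunks[1:]:
        values = [v.strip() for v in chunk.split(",")]
        records.append(dict(zip(header, values)))
    return records

def find_by_lname(records, lname):
    """Sequential scan: every record is inspected in turn."""
    return [r for r in records if r["Lname"] == lname]

records = parse_flat_file(RAW)
print(find_by_lname(records, "Doe"))
```

Note that nothing links one record to another: the only way to find something is to scan the whole list, which is the limitation described in the next paragraphs.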

In older flat-file systems, each individual record was typically input via a manually-prepared 80-column punched card, and the ingested data was ‘tabulated’ (made into a table); but there were no explicit relationships between the separate records. The data would then be stored on magnetic tape drives, and searching through those for a specific record was a slow process.

To search such a database with any degree of speed required loading the whole assembled table into RAM, then scanning sequentially for records that matched the terms of the query; but in those early days the limited size of RAM memory meant that doing anything clever with really large databases was not possible. They were, however, effective for sequential data processing applications, such as payroll, or issuing utility bills.

The IBM 2311 (debut 1964) was an early hard drive unit with 7.25 MB storage. (Photo from Wikimedia Commons user ‘I, Deep Silence’.)

HARD DISKS and RELATIONAL DATABASES. Implementing Codd’s relational database management model (RDBM) was made possible by a fast-access technology for indexed file storage, the hard disk drive, which we might call ‘pseudo-RAM’. Hard drives had been around since the late fifties (the first was a component of the IBM RAMAC mainframe, storing 3.75 MB on nearly a ton of hardware), but it always takes time for the paradigm to shift…

By 1970, mainframe computers were routinely being equipped with hard disk packs of around 100 MB (example: IBM 3330). In 1979 Oracle beat IBM to market with the first Relational Database Management System (RDBMS). Oracle still has nearly half the global market share, with competition from IBM’s DB2, Microsoft SQL Server, and a variety of open source products such as MySQL and PostgreSQL.

As Dion pointed out, it was now possible to access, retrieve and process records from a huge enterprise-level database without having to read the whole thing into RAM or even know where it was stored on the disk; the RDBMS software and the look-up tables did the job of grabbing the relevant entities from all of the tables in the system.

TABLES, ATTRIBUTES, KEYS: In Codd’s relational model, which all these RDBMS applications follow, data is stored in multiple tables, each representing a list of instances of an ‘entity type’. For example, ‘customer’ is an entity type and ‘Jane Smith’ is an instance of that; ‘product’ is an entity type and ‘litre bottle of semi-skimmed milk’ is an instance of that. In a table of customer-entities, each row represents a different customer, and columns may associate that customer with attributes such as her address or loyalty-card number.

One of the attribute columns is used as the Primary Key to quickly access that row of the table; in a classroom, the child’s name could be used as a ‘natural’ primary key, but most often a unique and never re-used or altered artificial numerical ID code is generated (which gets around the problem of having two Jane Smiths).

Possible/permitted relationships can then be stated between all the different entity types; a list of ‘Transactions’ brings a ‘Customer’ into relationship with a particular ‘Product’, which has an ‘EAN’ code retrieved at the point of sale by scanning the barcode, and this retrieves the ‘Price’. The RDBMS can create temporary and supplementary tables to mediate these relationships efficiently.
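
To make the tables-and-keys idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names, the EAN code and the price are all invented for illustration; the transactions table is called purchase here because TRANSACTION is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (ean TEXT PRIMARY KEY, description TEXT, price_pence INTEGER);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    ean TEXT REFERENCES product(ean)
);
""")

# Two customers share a name; the artificial ID keeps them distinct.
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Jane Smith"), (2, "Jane Smith")])
conn.execute("INSERT INTO product VALUES (?, ?, ?)",
             ("0000000000000", "litre bottle of semi-skimmed milk", 95))
conn.execute("INSERT INTO purchase (customer_id, ean) VALUES (?, ?)",
             (2, "0000000000000"))

# The join brings customer, product and price together via the keys.
rows = conn.execute("""
    SELECT c.customer_id, c.name, p.description, p.price_pence
    FROM purchase AS t
    JOIN customer AS c ON c.customer_id = t.customer_id
    JOIN product AS p ON p.ean = t.ean
""").fetchall()
print(rows)
```

The relationship between a customer and a product exists only through matching key values in the purchase table, which the database resolves at query time.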

Limitations of RDBMSs, benefits of graphs

However, there are some kinds of data which RDBMSs are not good at representing, said Dion. And many of these are the sorts of thing that currently interest those who want to make good use of the ‘big data’ in their organisations. Dion noted:

  • situations in which changes in one piece of data mean that another piece of data has changed as well;
  • representation of activities and flows.

Suppose, said Dion, we take the example of money transfers between companies. Company A transfers a sum of money to Company B on a particular date; Company B later transfers parts of that money to other companies on a variety of dates. And later, Company A may transfer monies to all these entities, and some of them may later transfer funds in the other direction… (or to somewhere in the British Virgin Islands?)

Graph databases represent these dynamics with circles for entities and lines between them, to represent connections between the entities. Sometimes the lines are drawn with arrows to indicate directionality, sometimes there is none. (This use of the word ‘graph’ is not to be confused with the diagrams we drew at school with x and y axes, e.g. to represent value changes over time.)

This money-transfer example goes some way towards describing why companies have been prepared to spend money on graph data technologies since about 2006 – it’s about money laundering and compliance with (or evasion of?) regulation. And it is easier to represent and explore such transfers and flows in graph technology.

Dion had recently watched a YouTube video in which an expert on such situations said that it is technically possible to represent such relationships within an RDBMS, but it is cumbersome.
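
For readers who think in code, the money-transfer example might be sketched as a directed graph like this. It is only a toy in plain Python, not a real graph database; every company name, amount and date is invented, and the traversal ignores the order in which the transfers happened.

```python
from collections import defaultdict, deque

# Hypothetical transfers: (payer, payee, amount, date). All values are invented.
TRANSFERS = [
    ("Company A", "Company B", 1000, "2018-01-05"),
    ("Company B", "Company C", 400, "2018-01-09"),
    ("Company B", "Company D", 300, "2018-01-12"),
    ("Company D", "Company A", 100, "2018-02-01"),
]

def build_graph(transfers):
    """Each payer maps to its outgoing edges (payee, amount, date)."""
    graph = defaultdict(list)
    for payer, payee, amount, date in transfers:
        graph[payer].append((payee, amount, date))
    return graph

def reachable_from(graph, start):
    """Breadth-first traversal: every entity that money leaving `start` could reach."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for payee, _amount, _date in graph.get(node, []):
            if payee not in seen:
                seen.add(payee)
                queue.append(payee)
    seen.discard(start)
    return sorted(seen)

graph = build_graph(TRANSFERS)
print(reachable_from(graph, "Company A"))
```

Following the arrows is a simple walk from node to node here, whereas the relational equivalent would need a chain of self-joins on a transfers table, one per hop.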


Most NetIKX meetings incorporate one or two table-group sessions to help people make sense of what they have learned. Here, people are drawing graph data diagrams to Dion Lindsay’s suggestions.

Exercise

To get people used to thinking along graph database lines, Dion distributed a sheet of flip chart paper to each table, and big pens were found, and he asked each table group to start by drawing one circle for each person around the table, and label them.

The next part of the exercise was to create a circle for NetIKX, to which we all have a relationship (as a paid-up member or paying visitor), and also circles representing entities to which only some have a relation (such as employers or other organisations). People should then draw lines to link their own circle-entity to these others.

Dion’s previous examples had been about money-flows, and now he was asking us to draw lines to represent money-flows (i.e. if you paid to be here yourself, draw a line from you to NetIKX; but if your organisation paid, that line should go from your organisation-entity to NetIKX). I noted that aspect of the exercise engendered some confusion about the breadth of meaning that lines can carry in such a graph diagram. In fact they can represent any kind of relationship, so long as you have defined it that way, as Dion later clarified.

Dion had further possible tasks up his sleeve for us, but as time was short he drew out some interim conclusions. In graph databases, he summarised, you have connections instead of tables. These systems can manage many more complexities of relationships than either an RDBMS could cope with, or we could cope with cognitively (and you can keep on adding complexity!). The graph database system can then show you what comes out of those complexities of relationship, which you had not been able to intuit for yourself, and this makes it a valuable discovery tool.

HOMEWORK: Dion suggested that as ‘homework’ we should take a look at an online tool and downloadable app which BP have produced to explore statistics of world energy use. The back end of this tool, Dion said, is based on a graph database.

https://www.bp.com/en/global/corporate/energy-economics/energy-charting-tool.html


Building Rich Search and Discovery: User Experiences with Linked Open Data – David Clarke

DAVE CLARKE is the co-founder, with Trish Yancey, of Synaptica LLC, which since 1995 has developed enterprise-level software for building and maintaining many different types of knowledge organisation systems. Dave announced that he would talk about Linked Data applications, with some very practical illustrations of what can be done with this approach.

The first thing to say is that Linked Data is based on an ‘RDF Graph’ — that is, a tightly-defined data structure, following norms set out in the Resource Description Framework (RDF) standards described by the World Wide Web Consortium (W3C).

In RDF, statements are made about resources, in expressions that take the form: subject – predicate – object. For example: ‘daffodil’ – ‘has the colour’ – ‘yellow’. (Also, ‘daffodil’ – ‘is a member of’ – ‘genus Narcissus’; and ‘Narcissus pseudonarcissus’ – ‘is a type of’ – ‘daffodil’.)

Such three-part statements are called ‘RDF triples’ and so the kind of database that manages them is often called an ‘RDF triple store’. The triples can also be represented graphically, in the manner that Dion had introduced us to, and can build up into a rich mass of entities and concepts linked up to each other.
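
A toy version of a triple store can be sketched in a few lines of Python. This is my own simplification: real RDF stores use URIs rather than plain strings, and the match function below is a hypothetical helper, not part of any RDF library.

```python
# The daffodil statements from the text, held as (subject, predicate, object) tuples.
TRIPLES = [
    ("daffodil", "has the colour", "yellow"),
    ("daffodil", "is a member of", "genus Narcissus"),
    ("Narcissus pseudonarcissus", "is a type of", "daffodil"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

print(match(TRIPLES, subject="daffodil"))   # everything stated about daffodils
print(match(TRIPLES, obj="daffodil"))       # everything that points at daffodils
```

Because ‘daffodil’ appears as a subject in some triples and as an object in another, the statements join up into a graph without any table design being needed in advance.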

Describing Linked Data and Linked Open Data

Dion had got us to do an exercise at our tables, but each table’s graph didn’t communicate with any other’s, like separate fortresses. This is the old database model, in which systems are designed not to share data. There are exceptions of course, such as when a pathology lab sends your blood test results to your GP, but those acts of sharing follow strict protocols.

Linked Data, and the resolve to be Open, are tearing down those walls. Each entity, as represented by the circles on our graphs, now gets its own ‘HTTP URI’, that is, its own unique Uniform Resource Identifier, expressed with the methods of the Web’s Hypertext Transfer Protocol. In effect, it gets a ‘Web address’ and becomes discoverable on the Internet, which in turn means that connections between entities are both possible and technically fairly easy and fast to implement.

And there are readily accessible collections of these URIs. Examples include:

We are all familiar with clickable hyperlinks on Web pages – those links are what weaves the ‘classic’ Web. However, they are simple pointers from one page to another; they are one-way, and they carry no meaning other than ‘take me there!’

In contrast, Linked Data links are semantic (expressive of meaning) and they express directionality too. As noted above, the links are known in RDF-speak as ‘predicates’, and they assert factual statements about why and how two entities are related. Furthermore, the links themselves have ‘thinginess’ – they are entities too, and those are also given their own URIs, and are thus also discoverable.

People often confuse Open Data and Linked Data, but they are not the same thing. Data can be described as being Open if it is available to everyone via the Web, and has been published under a liberal open licence that allows people to re-use it. For example, if you are trying to write an article about wind power in the UK, there is text and there are tables about that on Wikipedia, and the publishing licence allows you to re-use those facts.

Stairway through the stars

Tim Berners-Lee, who invented the Web, has more recently become an advocate of the Semantic Web, writing about the idea in detail in 2005, and has argued for how it can be implemented through Linked Data. He proposes a ‘5-star’ deployment scheme for Open Data, with Linked Open Data being the starriest and best of all. Dave in his slide-set showed a graphic shaped like a five-step staircase, often used to explain this five-star system:


The ‘five-step staircase’ diagram often used to explain the hierarchy of Open Data types

  • One Star: this is when you publish your data to the Web under open licence conditions, in whatever format (hopefully one like PDF or HTML for which there is free-of-charge reading software). It’s publishable with minimal effort, and the reader can look at it, print it, download and store it, and share it with others. Example: a data table that has been published as PDF.
  • Two stars: this is where the data is structured and published in a format that the reader can process with software that accesses and works with those structures. The example given was a Microsoft Excel spreadsheet. If you have Excel you can perform calculations on the data and export it to other structured formats. Other two-star examples could be distributing a presentation slide set as PowerPoint, or a document as Word (though when it comes to presentational forms, there are font and other dependencies that can trip us up).
  • Three stars: this is where the structure of a data document has been preserved, but in a non-proprietary format. The example given was of an Excel spreadsheet exported as a CSV file (comma-separated values format, a text file where certain characters are given the role of indicating field boundaries, as in Dion’s example above). [Perhaps the edges of this category have been abraded by software suites such as OpenOffice and LibreOffice, which themselves use non-proprietary formats, but can open Microsoft-format files.]
  • Four stars: this is perhaps the most difficult step to explain, and is when you put the data online in a graph database format, using open standards such as Resource Description Framework (RDF), as described above. For the publisher, this is no longer such a simple process and requires thinking about structures, and new conversion and authoring processes. The advantage to the users is that the links between the entities can now be explored as a kind of extended web of facts, with semantic relationships constructed between them.
  • Five stars: this is when Linked Data graph databases, structured to RDF standards, ‘open up’ beyond the enterprise, and establish semantic links to other such open databases, of which there are increasingly many. This is Linked Open Data! (Note that a Linked Data collection held by an enterprise could be part-open and part-closed. There are often good commercial and security reasons for not going fully open.)

This hierarchy is explained in greater detail at http://5stardata.info/en/
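
The step from three stars to four can be pictured as turning each row of a CSV table into triples whose subjects and predicates are URIs. The sketch below is only an assumption about how one might do this by hand in Python; the CSV content is invented and http://example.org/ is a placeholder namespace, not a real dataset.

```python
import csv
import io

# Hypothetical three-star data: a tiny CSV table with invented values.
CSV_TEXT = "id,name,country\n1,Example Wind Farm,United Kingdom\n"

BASE = "http://example.org/"  # placeholder namespace for illustration only

def csv_to_triples(text, base=BASE):
    """Give each row a URI and emit one triple per remaining column."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"{base}resource/{row['id']}"
        for column, value in row.items():
            if column == "id":
                continue  # the id has become the subject URI
            triples.append((subject, f"{base}property/{column}", value))
    return triples

triples = csv_to_triples(CSV_TEXT)
for triple in triples:
    print(triple)
```

A five-star version would go one step further and replace a literal value such as the country name with the URI of the matching entity in somebody else's open dataset.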

Dave suggested that if we want to understand how many organisations currently participate in the ‘Linked Open Data Cloud’, and how they are linked, we might visit http://lod-cloud.net, where there is an interactive and zoomable SVG graphic version showing several hundred linked databases. The circles that represent them are grouped and coloured to indicate their themes and, if you hover your cursor over one circle, you will see an information box, and be able to identify the incoming and outgoing links as they flash into view. (Try it!)

The largest and most densely interlinked ‘galaxy’ in the LOD Cloud is in the Life Sciences; other substantial ones are in publishing and librarianship, linguistics, and government. One of the most central and most widely linked is DBpedia, which extracts structured data created in the process of authoring and maintaining Wikipedia articles (e.g. the structured data in the ‘infoboxes’). DBpedia is big: it stores nine and a half billion RDF triples!


Screen shot taken while zooming into the heart of the Linked Open Data Cloud (interactive version). I have positioned the cursor over ‘datos.bne.es’ for this demonstration. This brings up an information box, and lines which show links to other LOD sites: red links are ‘incoming’ and green links are ‘outgoing’.

The first case study Dave presented was an experiment conducted by his company Synaptica to enhance discovery of people in the news, and stories about them. A ready-made LOD resource they were able to use was DBpedia’s named graph of people. (Note: the Named Graphs data model is a variant on the RDF data model: it allows RDF triples to talk about RDF graphs. This creates a level of metadata that assists searches within a graph database using the SPARQL query language.)

Many search and retrieval solutions focus on indexing a collection of data and documents within an enterprise – ‘in a box’ if you like – and providing tools to rummage through that index and deliver documents that may meet the user’s needs. But what if we could also search outside the box, connecting the information inside the enterprise with sources of external knowledge?

The second goal of this Synaptica project was about what it could deliver for the user: they wanted search to answer questions, not just return a bunch of relevant electronic documents. Now, if you are setting out to answer a question, the search system has to be able to understand the question…

For the experiment, which preceded the 2016 US presidential elections, they used a reference database of about a million news articles, a subset of a much larger database made available to researchers by Signal Media (https://signalmedia.co). Associated Press loaned Synaptica their taxonomy collection, which contains more than 200,000 concepts covering names, geospatial entities, news topics etc. – a typical and rather good taxonomy scheme.

The Linked Data part was this: Synaptica linked entities in the Associated Press taxonomy out to DBpedia. If a person is famous, DBpedia will have hundreds of data points about that person. Synaptica could then build on that connection to external data.

SHOWING HOW IT WORKS. Dave went online to show a search system built with the news article database, the AP taxonomy, and a link out to the LOD cloud, specifically DBpedia’s ‘persons’ named graph. In the search box he typed ‘Obama meets Russian President’. The results displayed noted the possibility that Barack or Michelle might match ‘Obama’, but unhesitatingly identified the Russian President as ‘Vladimir Putin’ – not from a fact in the AP resource, but by checking with DBpedia.

As a second demo, he launched a query for ‘US tennis players’, then added some selection criteria (‘born in Michigan’). That is a set which includes news stories about Serena Williams, even though the news articles about Serena don’t mention Michigan or her birth-place. Again, the link was made from the LOD external resource. And Dave then narrowed the field by adding the criterion ‘after 1980’, and Serena stood alone.

It may be, noted Dave, that a knowledgeable person searching a knowledgebase, be it on the Web or not, will bring to the task much personal knowledge that they have and that others don’t. What’s exciting here is using a machine connected to the world’s published knowledge to do the same kind of connecting and filtering as a knowledgeable person can do – and across a broad range of fields of knowledge.

NATURAL LANGUAGE UNDERSTANDING. How does this actually work behind the scenes? Dave again focused on the search expressed in text as ‘US tennis players born in Michigan after 1980’. The first stage is to use Natural Language Understanding (NLU), a relative of Natural Language Processing, and long considered as one of the harder problem areas in Artificial Intelligence.

The Synaptica project uses NLU methods to parse extended phrases like this, and break them down into parts of speech and concept clusters (‘tennis players’, ‘after 1980’). Some of the semantics are conceptually inferred: in ‘US tennis players’, ‘US’ is inferred contextually to indicate nationality.

On the basis of these machine understandings, the system can then launch specific sub-queries into the graph database, and the LOD databases out there, before combining them to derive a result. For example, the ontology of DBpedia has specific parameters for birth date, birthplace, death date, place of death… These enhanced definitions can bring back the lists of qualifying entities and, via the AP taxonomy, find them in the news content database.
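
I cannot show Synaptica's actual implementation, but the general idea of running separate sub-queries and then combining them can be sketched as follows. Everything here is a stand-in: the property names, the identifiers, the ‘example’ people and the headlines are invented, and a real system would query DBpedia rather than a Python dictionary.

```python
# Hypothetical stand-in for facts an external Linked Open Data source might hold.
# Only the Serena Williams entry follows the example in the text; the rest is invented.
PEOPLE = {
    "person:serena-williams": {"occupation": "tennis player", "nationality": "US",
                               "birth_place": "Michigan", "birth_year": 1981},
    "person:example-one": {"occupation": "tennis player", "nationality": "US",
                           "birth_place": "Michigan", "birth_year": 1975},
    "person:example-two": {"occupation": "tennis player", "nationality": "US",
                           "birth_place": "Florida", "birth_year": 1985},
}

# Hypothetical news index: taxonomy identifier -> invented article headlines.
ARTICLES = {
    "person:serena-williams": ["Invented headline about a tennis final"],
    "person:example-two": ["Invented headline about a tournament"],
}

def where(people, **criteria):
    """Sub-query: identifiers whose properties equal every given criterion."""
    return {pid for pid, props in people.items()
            if all(props.get(key) == value for key, value in criteria.items())}

def born_after(people, year):
    """Sub-query: identifiers with a birth year later than `year`."""
    return {pid for pid, props in people.items() if props["birth_year"] > year}

# 'US tennis players' AND 'born in Michigan' AND 'after 1980'
matches = (where(PEOPLE, occupation="tennis player", nationality="US")
           & where(PEOPLE, birth_place="Michigan")
           & born_after(PEOPLE, 1980))

# The shared identifiers then lead back into the news content.
stories = {pid: ARTICLES.get(pid, []) for pid in matches}
print(stories)
```

The point to notice is that the articles themselves never mention a birthplace: the filtering happens entirely on the external facts, and the shared identifier is what connects the two.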

Use case: understanding symbolism inside art images

Dave’s second case study concerned helping art history students make searches inside images with the aid of a Linked Open Data resource, the Getty Art and Architecture Thesaurus.

A seminal work in Art History is Erwin Panofsky’s Studies in Iconology (1939), and Dave had re-read it in preparation for building this application, which is built on Panofskyan methods. Panofsky describes three levels of analysis of iconographic art images:

  • Natural analysis gives a description of the visual evidence. It operates at the level of methods of representation, and its product is an annotation of the image (as a whole, and its parts).
  • Conventional analysis (Dave prefers the term ‘conceptual analysis’) interprets the conventional meanings of visual components: the symbolism, allusions and ideas that lie behind them. This can result in semantic indexing of the image and its parts.
  • Intrinsic analysis explores the wider cultural and historical context. This can result in the production of ‘knowledge graphs’.

 


Detail from the left panel of Hieronymus Bosch’s painting ‘The Garden of Earthly Delights’, which is riddled with symbolic iconography.

THE ‘LINKED CANVAS’ APPLICATION.

The educational application which Synaptica built is called Linked Canvas (see http://www.linkedcanvas.org/). Their first step was to ingest the art images at high resolution. The second step was to ingest linked data ontologies such as DBpedia, Europeana, Wikidata, Getty AAT, Library of Congress Subject Headings and so on.

The software system then allows users to delineate Points of Interest (POIs), and annotate them at the natural level; the next step is the semantic indexing, which draws on the knowledge of experts and controlled vocabularies. Finally users get to benefit from tools for search and exploration of the annotated images.

With time running tight, Dave skipped straight to some live demos of examples, starting with the fiendishly complex 15th century triptych painting The Garden of Earthly Delights. At Panofsky’s level of ‘natural analysis’, we can decompose the triptych space into the left, centre and right panels. Within each panel, we can identify ‘scenes’, and analyse further into details, in a hierarchical spatial array, almost the equivalent of a detailed table of contents for a book. For example, near the bottom of the left panel there is a scene in which God introduces Eve to Adam. And within that we can identify other spatial frames and describe what they look like (for example, God’s right-hand gesture of blessing).

To explain semantic indexing, Dave selected an image painted 40 years after the Bosch — Hans Holbein the Younger’s The Ambassadors, which is in the National Gallery in London. This too is full of symbolism, much of it carried by the various objects which litter the scene, such as a lute with a broken string, a hymnal in a translation by Martin Luther, a globe, etc. To this day, the meanings carried in the painting are hotly debated amongst scholars.

If you zoom in and browse around this image in Linked Canvas, as you traverse the various artefacts that have been identified, the word-cloud on the left of the display changes contextually, and what this reveals is how the symbolic and contextual meanings of those objects and visual details have been identified in the semantic annotations.

An odd feature of this painting is the prominent inclusion in the lower foreground of an anamorphically rendered (highly distorted) skull. (It has been suggested that the painting was designed to be hung on the wall of a staircase, so that someone climbing the stairs would see the skull first of all.) The skull is a symbolic device, a reminder of death or memento mori, a common visual trope of the time. That concept of memento mori is an element within the Getty AAT thesaurus, and the concept has its own URI, which makes it connectable to the outside world.

Dave then turned to Titian’s allegorical painting Bacchus and Ariadne, also from the same period and also from the National Gallery collection, and based on a story from Ovid’s Metamorphoses. In this story, Ariadne, who had helped Theseus find his way in and out of the labyrinth where he slew the Minotaur, and who had become his lover, has been abandoned by Theseus on the island of Naxos (and in the background if you look carefully, you can see his ship sneakily making off). And then along comes the God of Wine, Bacchus, at the head of a procession of revellers and, falling in love with Ariadne at first glance, he leaps from the chariot to rescue and defend her.

Following the semantic links (via the LOD database on Iconography) can take us to other images about the tale of Ariadne on Naxos, such as a fresco from Pompeii, which shows Theseus ascending the gang-plank of his ship while Ariadne sleeps. As Dave remarked, we generate knowledge when we connect different data sets.

Another layer built on top of the Linked Canvas application was the ability to create ‘guided tours’ that walk the viewer around an image, with audio commentary. The example Dave played for us was a commentary on the art within a classical Greek drinking-bowl, explaining the conventions of the symposium (Greek drinking party). Indeed, an image can host multiple such audio commentaries, letting a visitor experience multiple interpretations.

In building this image resource, Synaptica made use of a relatively recent standard called the International Image Interoperability Framework (IIIF). This is a set of standardised application programming interfaces (APIs) for websites that aim to do clever things with images and collections of images. For example, it can be used to load images at appropriate resolutions and croppings, which is useful if you want to start with a fast-loading overview image and then zoom in. The IIIF Search API is used for searching the annotation content of images.
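
To give a flavour of how this works, an IIIF Image API request is simply a URL whose path segments name the region, size, rotation, quality and format wanted. The helper below is a minimal sketch of that pattern as I understand it from version 3 of the specification; the server address and image identifier are placeholders, and I do not know how Linked Canvas itself builds its requests.

```python
from urllib.parse import quote

def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Assemble an Image API request URL from its path segments."""
    # The identifier is percent-encoded so that any slashes in it stay in one segment.
    return f"{base}/{quote(identifier, safe='')}/{region}/{size}/{rotation}/{quality}.{fmt}"

BASE = "https://example.org/iiif"   # placeholder server, not a real endpoint

# A fast-loading overview: the whole image scaled to 800 pixels wide.
overview = iiif_image_url(BASE, "painting-1", size="800,")

# A zoomed-in detail: a 512 x 512 pixel region starting at x=1000, y=2000.
detail = iiif_image_url(BASE, "painting-1", region="1000,2000,512,512")

print(overview)
print(detail)
```

Because the cropping and scaling are expressed in the URL, a viewer can ask the server for just the part of a very large image that is currently on screen.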

Searching within Linked Canvas is what Dave described as ‘Level Three Panofsky’. You might search on an abstract concept such as ‘love’, and be presented with a range of details within a range of images, plus links to scholarly articles linked to those.

Post-Truth Forum

As a final example, Dave showed us http://www.posttruthforum.org, which is an ontology of concepts around the ideas of ‘fake news’ and the ‘post-truth’ phenomenon, with thematically organised links out to resources on the Web, in books and in journals. Built by Dave using Synaptica Graphite software, it is Dave’s private project born out of a concern about what information professionals can do as a community to stem the appalling degradation of the quality of information in the news media and social media.

For NetIKX members (and for readers of this post), going to Dave’s Post Truth Forum site is also an opportunity to experience a public Linked Open Data application. People may also want to explore Dave’s thoughts as set out on his blog, www.davidclarke.blog.

Taxonomies vs Graphs

In closing, Dave wanted to show a few examples that might feed our traditional post-refreshment round-table discussions. How can we characterise the difference between a taxonomy and a data graph (or ontology)? His first image was an organisation chart, literally a regimented and hierarchical taxonomy (the US Department of Defense and armed forces).

His second image was the ‘tree of life’ diagram, the phylogenetic tree that illustrates how life forms are related to each other, and to common ancestor species. This is also a taxonomy, but with a twist. Here, every intermediate node in the tree not only inherits characteristics from higher up, but also adds new ones. So, mammals have shared characteristics (including suckling young), placental mammals add a few more, and canids such as wolves, jackals and dogs have other extra shared characteristics. (This can get confusing if you rely too much on appearances: hyenas look dog-like, but are actually more closely related to the big cats.)
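
The ‘inherit and add’ behaviour of such a tree can be sketched with a small Python dictionary in which each node names its parent and the characteristics it adds. The structure and the function are my own illustration; apart from suckling young, the trait strings are deliberate placeholders rather than biological claims.

```python
# Each node records its parent and the characteristics it adds at that level.
TREE = {
    "mammals": {"parent": None, "adds": ["suckle their young"]},
    "placental mammals": {"parent": "mammals",
                          "adds": ["(placeholder: traits added by placental mammals)"]},
    "canids": {"parent": "placental mammals",
               "adds": ["(placeholder: traits added by canids)"]},
}

def all_traits(tree, node):
    """Walk up to the root, then list traits from the root downwards."""
    lineage = []
    while node is not None:
        lineage.append(node)
        node = tree[node]["parent"]
    traits = []
    for name in reversed(lineage):
        traits.extend(tree[name]["adds"])
    return traits

print(all_traits(TREE, "canids"))
```

Asking for the traits of canids returns the mammal trait first, then whatever each lower level contributes.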

So the Tree of Life captures systematic differentiation, which a taxonomy typically cannot. However, said Dave, an ontology can. In making an ontology we specify all the classes we need, and can specify the property sets as we go. And, referring back to Dion’s presentation, Dave remarked that while ontologies do not work easily in a relational database structure, they work really well in a graph database. In a graph database you can handle processes as well as things and specify the characteristics of both processes and things.
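To make the ‘inherits and adds’ idea concrete, here is a minimal sketch in Python. It is not how Synaptica or any ontology tool stores classes; the class names, the PARENT and OWN_TRAITS tables and the characteristics listed are all illustrative assumptions. Each class declares only its own extra characteristics, and the full set is gathered by walking up to the root.

```python
# Hypothetical class tree: each class names its parent (the root has none).
PARENT = {
    "placental mammal": "mammal",
    "canid": "placental mammal",
    "wolf": "canid",
}

# Characteristics each class adds on top of what it inherits (illustrative only).
OWN_TRAITS = {
    "mammal": {"suckles young"},
    "placental mammal": {"nourishes embryo via placenta"},
    "canid": {"non-retractable claws"},
    "wolf": set(),
}


def all_traits(cls):
    """Collect the traits declared on cls and on every ancestor of cls."""
    traits = set()
    while cls is not None:
        traits |= OWN_TRAITS.get(cls, set())
        cls = PARENT.get(cls)  # None once we pass the root
    return traits


print(sorted(all_traits("wolf")))
```

A plain taxonomy would record only the PARENT links; the property sets attached to each class are what move the structure towards an ontology.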

Dave’s third and final image was of the latest version of the London Underground route diagram. This is a graph, specifically a network diagram, that is characterised not by hierarchy, but by connections. Could this be described in a taxonomy? You’d have to get rid of the Circle line, because taxonomies can’t end up where they started from. With a graph, as with the Underground, you can enter from any direction, and there are all sorts of ways to make connections.
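The point about the Circle line can be checked mechanically. The sketch below assumes a graph is given simply as a list of connected pairs and uses a union-find test (my choice of technique, not something shown at the meeting) to report whether any connection closes a loop. The stop names are invented, not real Underground stations.

```python
def has_cycle(edges):
    """Return True if the undirected graph given as (a, b) pairs contains a loop."""
    parent = {}

    def find(node):
        # Follow parent links to the representative of node's group.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # shorten the path as we go
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b were already connected, so this edge closes a loop
        parent[root_a] = root_b
    return False


# A strict hierarchy: every node is reached by exactly one route from the root.
taxonomy_like = [("Root", "Branch 1"), ("Root", "Branch 2"), ("Branch 1", "Leaf")]

# An invented loop line that ends up where it started.
loop_line = [("Stop A", "Stop B"), ("Stop B", "Stop C"),
             ("Stop C", "Stop D"), ("Stop D", "Stop A")]

print(has_cycle(taxonomy_like), has_cycle(loop_line))
```

The hierarchy passes as loop-free, while the loop line does not, which is why it fits a graph but not a taxonomy.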

We shouldn’t think of ditching taxonomies; they are excellent for some information management jobs. Ontologies are superior in some applications, but not all. The ideal is to get them working together. It would be a good thought-experiment for the table groups to think about what, in our lives and jobs, are better suited to taxonomic approaches and what would be better served by graphs and ontologies. And, we should think about the vast amounts of data out there in the public domain, and whether our enterprises might benefit from harnessing those resources.


Discussion

Following NetIKX tradition, after a break for refreshments, people again settled down into small table groups. We asked participants to discuss what they had heard and identify either issues they thought worth raising, or things that they would like to know more about.

I was chairing the session, and I pointed out that even if we didn’t have time in subsequent discussion to feed everyone’s curiosity, I would do my best to research supplementary information to add to this account which you are reading.

I ran the audio recorder during the plenary discussion, so even though I was not party to what the table groups had discussed internally, I can report with some accuracy what came out of the session. Because the contributions jumped about a bit from topic to topic, I have resequenced them to make them easier for the reader to follow.

AI vs Linked Data and ontologies?

Steve Dale wondered if these efforts to compile graph databases and ontologies were worth it, as he believed Artificial Intelligence is reaching the point where a computer can be thrown all sorts of data – structured and unstructured – and left to figure it out for itself through machine learning algorithms. Later, Stuart Ward expressed a similar opinion. Speaking as a business person, not a software wizard, he wondered whether there was anything that he needed to design.

Conrad, in fielding this question, mentioned that at his table (where Dave Clarke also sat), they had looked some more into the use of Natural Language Understanding in Dave’s examples; that is a kind of AI component. But they had also discussed the example of the Hieronymus Bosch painting. Dave himself undertook the background research for this and had to swot up by reading a score of scholarly books. In Conrad’s opinion, we would have to wait another millennium before we’d have an AI able to trace the symbolism in Bosch’s visual world. Someone else wondered how one strikes the right balance between the contributions of AI and human effort.

Later, Dave Clarke returned to the question; in his opinion, AI is heavily hyped – though if you want investment, it’s a good buzz-word to throw about! So-called Artificial Intelligence works very well in certain domains, such as pattern recognition, and even with images (example: face recognition in many cameras). But AI is appalling at semantics. At Synaptica, they believe that if you want to create applications using machine intelligence, you must structure your data. Metadata and ontologies are the enablers for smart applications.

Dion responded to Stuart’s question by saying that it would be logical at least to define what your entities are – or at least, to define what counts as an entity, so that software can identify entities and distinguish them from relationships. Conrad said that the ‘predicates’ (relationships) also need defining, and in the Linked Data model this can be assisted if you link out to publicly-available schemas.

Dave added that, these days, in the Linked Data world, it has become pretty easy to adapt your database structures as you go along. Compared to the pain and disruption of trying to modify a relational database, it is easy to add new types of data and new types of query to a Linked Data model, making the initial design process less traumatic and protracted.
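As a rough illustration of the last few points, the sketch below holds a handful of statements as plain (subject, predicate, object) triples in Python. It is not an RDF database. The http://example.org/ namespace and the people in it are placeholders. The rdf:type and FOAF identifiers are real public vocabulary URIs, used here as an example of linking out to a publicly available schema.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"   # public 'Friend of a Friend' vocabulary
EX = "http://example.org/"            # placeholder namespace for our own entities

# Each statement is one (subject, predicate, object) triple.
triples = [
    (EX + "alice", RDF_TYPE, FOAF + "Person"),
    (EX + "bob", RDF_TYPE, FOAF + "Person"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
]


def entities_of_type(statements, type_uri):
    """List the subjects that have been declared to be of the given type."""
    return sorted(s for s, p, o in statements if p == RDF_TYPE and o == type_uri)


def predicates_used(statements):
    """List every distinct relationship (predicate) appearing in the data."""
    return sorted({p for _, p, _ in statements})


# Extending the model later needs no schema migration: just add another triple.
triples.append((EX + "alice", FOAF + "name", "Alice"))

print(entities_of_type(triples, FOAF + "Person"))
print(predicates_used(triples))
```

Declaring what counts as an entity (here, anything typed as a Person) and which predicates are allowed is the design work that remains, even when the storage itself is flexible.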

Graph databases vs Linked Open Data?

Conrad asked Dave to clarify a remark he had made at table level about the capabilities of a graph database product like Neo4j, compared with Linked Open Data implementations.

Dave explained that Neo4j is indeed a graph database system, but it is not an RDF database or a Linked Data database. When Synaptica started to move from their prior focus on relational databases towards graph databases, Dave became excited about Neo4j (at first). They got it in, and found it was a wonderfully easy system to develop with. However, because its method of data modelling is not based on RDF, Neo4j was not going to be a solution for working with Linked Data; and so fervently did Dave believe that the future is about sharing knowledge, he pulled the plug on their Neo4j development.

He added that he has no particular axe to grind about which RDF database they should use, but it has to be RDF-conforming. There are both proprietary systems (from Oracle, IBM DB2, OntoText GraphDB, MarkLogic) and open-source systems (3store, ARC2, Apache Jena, RDFLib). He has found that the open-source systems can get you so far, but for large-scale implementations one generally has to dip into the coffers and buy a licence for something heavyweight.

Even if your organisation has no intention to publish data, designing and building as Linked Data lets you support smart data and machine reasoning, and benefit from data imported from Linked Open Data external resources.

Conrad asked Dion to say more about his experiences with graph databases. Dion said that he had approached Tableau, who had provided him with sample software and sample datasets. He hadn’t yet had a chance to engage with them, but would be very happy to report back on what he learns.

Privacy and data protection

Clare Parry raised issues of privacy and data protection. You may have information in your own dataset that does not give much information about people, and you may be compliant with all the data protection legislation. However, if you pull in data from other datasets, and combine them, you could end up inferring quite a lot more information about an individual.

(I suppose the answer here is to do with controlling which kinds of datasets are allowed to be open. We are on all manner of databases, sometimes without suspecting it. A motor car’s registration details are held by DVLA, and Transport for London; the police and TfL use ANPR technology to tie vehicles to locations; our banks have details of our debit card transactions and, if we use those cards to pay for bus journeys, that also geolocates us. These are examples of datasets that by ‘triangulation’ could identify more about us than we would like.)
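A small sketch shows how little it takes to make this kind of inference once two datasets share a key. Every record, field name and the link_on helper below are invented for illustration and do not describe any real system.

```python
# Invented records: neither dataset alone says where a named person was.
vehicle_register = [{"plate": "AB12 CDE", "keeper": "J. Smith"}]
camera_sightings = [{"plate": "AB12 CDE", "place": "High Street", "time": "08:15"}]


def link_on(key, left, right):
    """Combine rows from two datasets wherever they share the same key value."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for r in right for l in index.get(r[key], [])]


linked = link_on("plate", vehicle_register, camera_sightings)
print(linked)
```

Once linked, a single row ties a named keeper to a place and a time, which is exactly the combined picture neither source held on its own.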

URI, URL, URN

Graham Robertson reported that on his table they discussed what the difference is between URLs and URIs…

(If I may attempt an explanation: the wider term is URI, Uniform Resource Identifier. It is ‘uniform’ because everybody is supposed to use it the same way, and it is supposed uniquely and unambiguously to identify anything which might be called a ‘resource’. The Uniform Resource Locator (URL) is the most common sub-type of URI, which says where a resource can be found on the Web.

But there can be other kinds of resource identifiers: the URN (Uniform Resource Name) identifies a resource that can be referenced within a controlled namespace. Wikipedia gives as an example ISBN 0-486-27557-4, which refers to a specific edition of Shakespeare’s Romeo and Juliet. In the MeSH schema of medical subject headings, the code D004617 refers to ‘embolism’.)
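The distinction can be seen by looking at the scheme, the part of an identifier before the first colon. The sketch below uses Python’s standard urllib.parse module. The classify function and its rule of thumb (a short list of schemes treated as locators) are simplifying assumptions, not a formal definition.

```python
from urllib.parse import urlparse

# Schemes treated as 'locators' for this sketch; the real list is much longer.
LOCATOR_SCHEMES = {"http", "https", "ftp"}


def classify(identifier):
    """Roughly sort an identifier into URN, URL or some other kind of URI."""
    scheme = urlparse(identifier).scheme.lower()
    if scheme == "urn":
        return "URN"
    if scheme in LOCATOR_SCHEMES:
        return "URL"
    return "other URI" if scheme else "no scheme"


for identifier in ("urn:isbn:0-486-27557-4", "http://www.posttruthforum.org"):
    parts = urlparse(identifier)
    print(classify(identifier), parts.scheme, parts.netloc or parts.path)
```

The ISBN example names a resource without saying where to fetch it, whereas the http address carries a host to contact.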

Trustworthiness

Some people had discussed the issue of the trustworthiness of external data sources to which one might link – Wikipedia (and WikiData and DBpedia) among them, and Conrad later asked Mandy to say more about this. She wondered about the wisdom of relying on data which you can’t verify, and which may have been crowdsourced. But Dave had pointed out that you might have alternative authorities that you can point to. Conrad thought that for some serious applications one would want to consult experts, which is how the Getty AAT has been built up. Knowing provenance, added David Penfold, is very important.

The librarians ask: ontologies vs taxonomies?

Rob Rosset’s table was awash with librarians, who tend to have an understanding about what is a taxonomy and what an ontology. How did Dave Clarke see this, he asked?

Dave referred back to his closing three slides. The organisational chart he had shown is a strict hierarchy, and that is how taxonomies are structured. The diagram of the Tree of Life is an interesting hybrid, because it is both taxonomic and ontological in nature. There are things that mammals have in common, related characteristics, which are different from what other groupings such as reptiles would have.

But we shouldn’t think about abandoning taxonomy in favour of ontology. There will be times where you want to explore things top-down (taxonomically), and other cases where you might want to explore things from different directions.

What is nice about Linked Data is that it is built on standards that support these things. In the W3C world, there is the SKOS standard, Simple Knowledge Organization System, very light and simple, and there to help you build a taxonomy. And then there is OWL, the Web Ontology Language, which will help you ascend to another level of specificity. And in fact, SKOS itself is an ontology.

Closing thoughts and resources

This afternoon was a useful and lively introduction to the overlapping concepts of Graph Databases and Linked Data, and I hope that the above account helps refresh the memories of those who attended, and engage the minds of those who didn’t. Please note that in writing this I have ‘smuggled in’ additionally-researched explanations and examples, to help clarify matters.

Later in the year, NetIKX is planning a meeting all about Ontologies, which will be a way to look at these information and knowledge management approaches from a different direction. Readers may also like to read my illustrated account of a lecture on Ontologies and the Semantic Web, which was given by Professor Ian Horrocks to a British Computer Society audience in 2005. That is still available as a PDF from http://www.conradiator.com/resources/pdf/Horrocks_needham2005.pdf

Ontologies, taxonomies and knowledge organisation systems are meat and drink to the UK Chapter of the International Society for Knowledge Organization (ISKO UK), and in September 2010 ISKO UK held a full day conference on Linked Data: the future of knowledge organization on the Web. There were nine speakers and a closing panel session, and the audio recordings are all available on the ISKO UK Web site, at http://www.iskouk.org/content/linked-data-future-knowledge-organization-web

Recently, the Neo4j team produced a book by Ian Robinson, Jim Webber and Emil Eifrem called ‘Graph Databases’, and it is available for free (PDF, Kindle etc) from https://neo4j.com/graph-databases-book/ Or you can get it published in dead-tree form from O’Reilly Books. See https://www.amazon.co.uk/Graph-Databases-Ian-Robinson/dp/1449356265

 


2018 Programme

The remainder of the 2018 programme is as follows:

  • Fake News / Post-truth (May)
  • AI / Machine Learning (Ethics) (July)
  • Ontology (September)
  • Network Science (December)

We also plan to hold our first meeting outside London. While still in the planning stage, this will probably be in Leeds in June and the topic will be Medical Information. The meeting will be a joint one with ISKO UK.


Next meeting: Working in Complexity – SenseMaker, Decisions and Cynefin – Wednesday 7 March 2018

The next meeting of 2018 is on Wednesday 7 March:

Working in Complexity

The speaker will be Tony Quinlan. To register, follow the link above.

A pdf giving details of the meeting is available at Working in Complexity 7 March 2018

 
