First Meeting Outside London: Organising Medical and Health-related Information – Leeds – 7 June 2018

We have now planned our first meeting outside London. This will be in Leeds on Thursday 7 June and the topic will be Medical Information. The meeting will be a joint one with ISKO UK. Speakers will include Ewan Davis.

There will be no charge for attending this meeting, but you must register. For more information and to register, follow the link above.


Next Meeting: Trust and integrity in information – Thursday 24 May 2018

Trust and integrity in information

The speakers will be Hanna Chalmers of Ipsos MORI, Dr Brennan Jacoby of Philosophy at Work and Conrad Taylor.

For more information and to register for this meeting, follow the above link.

A pdf giving details of the meeting will be available shortly.


Making true connections in a complex world – Graph database technology and Linked Open Data

Conrad Taylor writes:

The first NetIKX meeting of 2018, on 25 January, looked at new technologies and approaches to managing data and information, escaping the limitations of flat-file and relational databases. Dion Lindsay introduced the concepts behind ‘graph databases’, and David Clarke illustrated the benefits of the Linked Data approach with case studies, where the power of a graph database had been enhanced by linking to publicly available resources. The two presentations were followed by a lively discussion, which I also report here.

 

The New Graph Technology of Information – Dion Lindsay

Dion is an independent consultant well known to NetIKX members. He offered us a simple introduction to graph database technology, though he avers he is no expert in the subject. He had been feeling unclear about the differences between managing data and managing information, and thought one way to explore that could be to study a 'fashionable' topic in a bit of depth. He finds graph database technology exciting, and thinks data and information managers should be excited about it too!

Flat-file and relational database models

In the last 40 years, the management of data with computers has been dominated by the Relational Database model devised in 1970 by Edgar F Codd, an IBM employee at their San José Research Center.

FLAT FILE DATABASES. Until then (and also for some time after), the model for storing data in a computer system was the 'Flat File Database' — analogous to a spreadsheet with many rows and columns. Dion presented a made-up example in which each record was a row, with the attributes or values stored in fields separated by a delimiter character (he used the | sign, which is character 124 in ASCII and most ASCII-compatible encodings).

Example: Lname, Fname, Age, Salary | Smith, John, 35, £280 | Doe, Jane, 28, £325 | Lindsay, Dion, 58, £350 …

In older flat-file systems, each individual record was typically input via a manually-prepared 80-column punched card, and the ingested data was ‘tabulated’ (made into a table); but there were no explicit relationships between the separate records. The data would then be stored on magnetic tape drives, and searching through those for a specific record was a slow process.

To search such a database with any degree of speed required loading the whole assembled table into RAM, then scanning sequentially for records that matched the terms of the query; but in those early days the limited size of RAM meant that doing anything clever with really large databases was not possible. They were, however, effective for sequential data processing applications, such as payroll, or issuing utility bills.
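As a minimal sketch (mine, not Dion's), a sequential scan over such pipe-delimited records might look like this in Python; the field names and figures are taken from the made-up example above:

    # Hypothetical flat-file records, pipe-delimited as in Dion's example.
    records = "Smith, John, 35, £280|Doe, Jane, 28, £325|Lindsay, Dion, 58, £350"

    FIELDS = ["Lname", "Fname", "Age", "Salary"]

    def scan(data, field, value):
        """Sequentially scan every record for a matching field value."""
        matches = []
        for record in data.split("|"):
            row = dict(zip(FIELDS, [f.strip() for f in record.split(",")]))
            if row.get(field) == value:
                matches.append(row)
        return matches

    print(scan(records, "Lname", "Doe"))
    # [{'Lname': 'Doe', 'Fname': 'Jane', 'Age': '28', 'Salary': '£325'}]

Every query walks the whole file from start to finish, which is exactly why such systems suited batch jobs like payroll better than ad hoc retrieval.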

The IBM 2311 (introduced in 1964) was an early hard disk drive unit with 7.25 MB of storage. (Photo from Wikimedia Commons user 'I, Deep Silence'.)

HARD DISKS and RELATIONAL DATABASES. Implementing Codd's relational model was made possible by a fast-access technology for indexed file storage, the hard disk drive, which we might call 'pseudo-RAM'. Hard drives had been around since the late fifties (the first was a component of the IBM RAMAC mainframe, storing 3.75 MB on nearly a ton of hardware), but it always takes time for a paradigm to shift…

By 1970, mainframe computers were routinely being equipped with hard disk packs of around 100 MB (example: IBM 3330). In 1979 Oracle beat IBM to market with the first Relational Database Management System (RDBMS). Oracle still has nearly half the global market share, with competition from IBM’s DB2, Microsoft SQL Server, and a variety of open source products such as MySQL and PostgreSQL.

As Dion pointed out, it was now possible to access, retrieve and process records from a huge enterprise-level database without having to read the whole thing into RAM or even know where it was stored on the disk; the RDBMS software and the look-up tables did the job of grabbing the relevant entities from all of the tables in the system.

TABLES, ATTRIBUTES, KEYS: In Codd's relational model, which all these RDBMS applications follow, data is stored in multiple tables, each representing a list of instances of an 'entity type'. For example, 'customer' is an entity type and 'Jane Smith' is an instance of that; 'product' is an entity type and 'litre bottle of semi-skimmed milk' is an instance of that. In a table of customer entities, each row represents a different customer, and columns associate that customer with attributes such as her address or loyalty-card number.

One of the attribute columns is used as the Primary Key to quickly access that row of the table; in a classroom, the child’s name could be used as a ‘natural’ primary key, but most often a unique and never re-used or altered artificial numerical ID code is generated (which gets around the problem of having two Jane Smiths).

Possible/permitted relationships can then be stated between all the different entity types; a list of ‘Transactions’ brings a ‘Customer’ into relationship with a particular ‘Product’, which has an ‘EAN’ code retrieved at the point of sale by scanning the barcode, and this retrieves the ‘Price’. The RDBMS can create temporary and supplementary tables to mediate these relationships efficiently.
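By way of illustration (my sketch, not part of Dion's talk), here is roughly how those entity types, primary keys and relationships might look in a tiny relational database, using Python's built-in sqlite3 module; the table and column names, and the EAN, are invented:

    import sqlite3

    con = sqlite3.connect(":memory:")
    cur = con.cursor()

    # Each table holds instances of one entity type; the id columns are primary keys.
    cur.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product  (product_id INTEGER PRIMARY KEY, ean TEXT, name TEXT, price REAL);
    CREATE TABLE purchase (purchase_id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(customer_id),
                           product_id  INTEGER REFERENCES product(product_id));
    """)

    cur.execute("INSERT INTO customer VALUES (1, 'Jane Smith')")
    cur.execute("INSERT INTO product VALUES (1, '5000000000001', 'Semi-skimmed milk 1l', 0.95)")
    cur.execute("INSERT INTO purchase VALUES (1, 1, 1)")

    # A join across the transaction table mediates the relationship between
    # the Customer and Product entity types.
    for row in cur.execute("""
        SELECT c.name, p.name, p.price
        FROM purchase t
        JOIN customer c ON c.customer_id = t.customer_id
        JOIN product  p ON p.product_id  = t.product_id
    """):
        print(row)   # ('Jane Smith', 'Semi-skimmed milk 1l', 0.95)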

Limitations of RDBMSs, benefits of graphs

However, there are some kinds of data which RDBMSs are not good at representing, said Dion. And many of these are the sorts of thing that currently interest those who want to make good use of the ‘big data’ in their organisations. Dion noted:

  • situations in which changes in one piece of data mean that another piece of data has changed as well;
  • representation of activities and flows.

Suppose, said Dion, we take the example of money transfers between companies. Company A transfers a sum of money to Company B on a particular date; Company B later transfers parts of that money to other companies on a variety of dates. And later, Company A may transfer monies to all these entities, and some of them may later transfer funds in the other direction… (or to somewhere in the British Virgin Islands?)

Graph databases represent these dynamics with circles for entities and lines between them to represent connections between the entities. Sometimes the lines are drawn with arrows to indicate directionality, sometimes not. (This use of the word 'graph' is not to be confused with the diagrams we drew at school with x and y axes, e.g. to represent value changes over time.)

This money-transfer example goes some way towards describing why companies have been prepared to spend money on graph data technologies since about 2006 – it’s about money laundering and compliance with (or evasion of?) regulation. And it is easier to represent and explore such transfers and flows in graph technology.

Dion had recently watched a YouTube video in which an expert on such situations said that it is technically possible to represent such relationships within an RDBMS, but it is cumbersome.
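To give a flavour of the graph alternative (again my sketch, using the open-source networkx library rather than a production graph database), the money-transfer scenario can be held directly as nodes and directed, dated edges, and then traversed without any join tables:

    import networkx as nx  # pip install networkx

    g = nx.MultiDiGraph()  # several dated transfers may link the same two companies

    # Each edge is one transfer: (from, to) plus attributes for amount and date.
    g.add_edge("Company A", "Company B", amount=500_000, date="2017-03-01")
    g.add_edge("Company B", "Company C", amount=200_000, date="2017-04-15")
    g.add_edge("Company B", "Company D", amount=150_000, date="2017-05-02")
    g.add_edge("Company A", "Company D", amount=75_000,  date="2017-06-20")

    # Follow the money: every company reachable from Company A via transfers.
    print(nx.descendants(g, "Company A"))   # {'Company B', 'Company C', 'Company D'}

    # All transfers out of Company B, with their attributes.
    for _, target, attrs in g.out_edges("Company B", data=True):
        print(target, attrs)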


Most NetIKX meetings incorporate one or two table-group sessions to help people make sense of what they have learned. Here, people are drawing graph data diagrams to Dion Lindsay's suggestions.

Exercise

To get people used to thinking along graph database lines, Dion distributed a sheet of flip-chart paper and some big pens to each table, and asked each table group to start by drawing one circle for each person around the table, and to label them.

The next part of the exercise was to create a circle for NetIKX, to which we all have a relationship (as a paid-up member or paying visitor), and also circles representing entities to which only some have a relation (such as employers or other organisations). People should then draw lines to link their own circle-entity to these others.

Dion’s previous examples had been about money-flows, and now he was asking us to draw lines to represent money-flows (i.e. if you paid to be here yourself, draw a line from you to NetIKX; but if your organisation paid, that line should go from your organisation-entity to NetIKX). I noted that this aspect of the exercise engendered some confusion about the breadth of meaning that lines can carry in such a graph diagram. In fact they can represent any kind of relationship, so long as you have defined it that way, as Dion later clarified.

Dion had further possible tasks up his sleeve for us, but as time was short he drew out some interim conclusions. In graph databases, he summarised, you have connections instead of tables. These systems can manage many more complexities of relationship than either an RDBMS could cope with, or than we could cope with cognitively (and you can keep on adding complexity!). The graph database system can then show you what comes out of those complexities of relationship, which you had not been able to intuit for yourself, and this makes it a valuable discovery tool.

HOMEWORK: Dion suggested that as ‘homework’ we should take a look at an online tool and downloadable app which BP have produced to explore statistics of world energy use. The back end of this tool, Dion said, is based on a graph database.

https://www.bp.com/en/global/corporate/energy-economics/energy-charting-tool.html


Building Rich Search and Discovery: User Experiences with Linked Open Data – David Clarke


DAVE CLARKE is the co-founder, with Trish Yancey, of Synaptica LLC, which since 1995 has developed enterprise-level software for building and maintaining many different types of knowledge organisation systems. Dave announced that he would talk about Linked Data applications, with some very practical illustrations of what can be done with this approach.

The first thing to say is that Linked Data is based on an ‘RDF Graph’ — that is, a tightly-defined data structure, following norms set out in the Resource Description Framework (RDF) standards described by the World Wide Web Consortium (W3C).

In RDF, statements are made about resources, in expressions that take the form: subject – predicate – object. For example: ‘daffodil’ – ‘has the colour’ – ‘yellow’. (Also, ‘daffodil’ – ‘is a member of’ – ‘genus Narcissus’; and ‘Narcissus pseudonarcissus’ – ‘is a type of’ – ‘daffodil’.)

Such three-part statements are called ‘RDF triples’ and so the kind of database that manages them is often called an ‘RDF triple store’. The triples can also be represented graphically, in the manner that Dion had introduced us to, and can build up into a rich mass of entities and concepts linked up to each other.
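As a small illustration (my own, with made-up URIs), Dave's daffodil triples can be written and queried with the Python rdflib library, which implements an RDF triple store in miniature:

    from rdflib import Graph, Namespace, Literal

    EX = Namespace("http://example.org/")   # an invented namespace for this sketch
    g = Graph()

    # Each statement is a triple: subject - predicate - object.
    g.add((EX.daffodil, EX.hasColour, Literal("yellow")))
    g.add((EX.daffodil, EX.memberOf, EX.genus_Narcissus))
    g.add((EX.Narcissus_pseudonarcissus, EX.isATypeOf, EX.daffodil))

    # Ask the triple store a question with SPARQL: which things are yellow?
    for row in g.query("""
        SELECT ?thing WHERE { ?thing <http://example.org/hasColour> "yellow" . }
    """):
        print(row.thing)    # http://example.org/daffodil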

Describing Linked Data and Linked Open Data

Dion had got us to do an exercise at our tables, but each table’s graph didn’t communicate with any other’s, like separate fortresses. This is the old database model, in which systems are designed not to share data. There are exceptions of course, such as when a pathology lab sends your blood test results to your GP, but those acts of sharing follow strict protocols.

Linked Data, and the resolve to be Open, are tearing down those walls. Each entity, as represented by the circles on our graphs, now gets its own 'HTTP URI', that is, its own unique Uniform Resource Identifier, expressed with the methods of the Web's Hypertext Transfer Protocol — in effect, it gets a 'Web address' and becomes discoverable on the Internet, which in turn means that connections between entities are both possible and technically fairly easy and fast to implement.

And there are readily accessible collections of these URIs — DBpedia, Wikidata, Europeana and the Getty vocabularies among them, several of which feature in the case studies below.

We are all familiar with clickable hyperlinks on Web pages – those links are what weave the 'classic' Web together. However, they are simple pointers from one page to another; they are one-way, and they carry no meaning other than 'take me there!'

In contrast, Linked Data links are semantic (expressive of meaning) and they express directionality too. As noted above, the links are known in RDF-speak as ‘predicates’, and they assert factual statements about why and how two entities are related. Furthermore, the links themselves have ‘thinginess’ – they are entities too, and those are also given their own URIs, and are thus also discoverable.

People often confuse Open Data and Linked Data, but they are not the same thing. Data can be described as being Open if it is available to everyone via the Web, and has been published under a liberal open licence that allows people to re-use it. For example, if you are trying to write an article about wind power in the UK, there is text and there are tables about that on Wikipedia, and the publishing licence allows you to re-use those facts.

Stairway through the stars

Tim Berners-Lee, who invented the Web, has more recently become an advocate of the Semantic Web, writing about the idea in detail in the mid-2000s, and has argued for how it can be implemented through Linked Data. He proposes a ‘5-star’ deployment scheme for Open Data, with Linked Open Data being the starriest and best of all. Dave in his slide-set showed a graphic shaped like a five-step staircase, often used to explain this five-star system:


The ‘five-step staircase’ diagram often used to explain the hierarchy of Open Data types

  • One Star: this is when you publish your data to the Web under open license conditions, in whatever format (hopefully one like PDF or HTML for which there is free of charge reading software). It’s publishable with minimal effort, and the reader can look at it, print it, download and store it, and share it with others. Example: a data table that has been published as PDF.
  • Two stars: this is where the data is structured and published in a format that the reader can process with software that accesses and works with those structures. The example given was a Microsoft Excel spreadsheet. If you have Excel you can perform calculations on the data and export it to other structured formats. Other two-star examples could be distributing a presentation slide set as PowerPoint, or a document as Word (though when it comes to presentational forms, there are font and other dependencies that can trip us up).
  • Three stars: this is where the structure of a data document has been preserved, but in a non-proprietary format. The example given was of an Excel spreadsheet exported as a CSV file (comma-separated values format, a text file where certain characters are given the role of indicating field boundaries, as in Dion’s example above). [Perhaps the edges of this category have been abraded by software suites such as OpenOffice and LibreOffice, which themselves use non-proprietary formats, but can open Microsoft-format files.]
  • Four stars: this is perhaps the most difficult step to explain, and is when you put the data online in a graph database format, using open standards such as Resource Description Framework (RDF), as described above. For the publisher, this is no longer such a simple process and requires thinking about structures, and new conversion and authoring processes. The advantage to the users is that the links between the entities can now be explored as a kind of extended web of facts, with semantic relationships constructed between them.
  • Five stars: this is when Linked Data graph databases, structured to RDF standards, ‘open up’ beyond the enterprise, and establish semantic links to other such open databases, of which there are increasingly many. This is Linked Open Data! (Note that a Linked Data collection held by an enterprise could be part-open and part-closed. There are often good commercial and security reasons for not going fully open.)

This hierarchy is explained in greater detail at http://5stardata.info/en/

Dave suggested that if we want to understand how many organisations currently participate in the ‘Linked Open Data Cloud’, and how they are linked, we might visit http://lod-cloud.net, where there is an interactive and zoomable SVG graphic version showing several hundred linked databases. The circles that represent them are grouped and coloured to indicate their themes and, if you hover your cursor over one circle, you will see an information box, and be able to identify the incoming and outgoing links as they flash into view. (Try it!)

The largest and most densely interlinked ‘galaxy’ in the LOD Cloud is in the Life Sciences; other substantial ones are in publishing and librarianship, linguistics, and government. One of the most central and most widely linked is DBpedia, which extracts structured data created in the process of authoring and maintaining Wikipedia articles (e.g. the structured data in the ‘infoboxes’). DBpedia is big: it stores nine and a half billion RDF triples!


Screen shot taken while zooming into the heart of the Linked Open Data Cloud (interactive version). I have positioned the cursor over ‘datos.bne.es’ for this demonstration. This brings up an information box, and lines which show links to other LOD sites: red links are ‘incoming’ and green links are ‘outgoing’.

The first case study Dave presented was an experiment conducted by his company Synaptica to enhance discovery of people in the news, and stories about them. A ready-made LOD resource they were able to use was DBpedia’s named graph of people. (Note: the Named Graphs data model is a variant on the RDF data model: it allows RDF triples to talk about RDF graphs. This creates a level of metadata that assists searches within a graph database using the SPARQL query language.)

Many search and retrieval solutions focus on indexing a collection of data and documents within an enterprise – ‘in a box’ if you like – and providing tools to rummage through that index and deliver documents that may meet the user’s needs. But what if we could also search outside the box, connecting the information inside the enterprise with sources of external knowledge?

The second goal of this Synaptica project was about what it could deliver for the user: they wanted search to answer questions, not just return a bunch of relevant electronic documents. Now, if you are setting out to answer a question, the search system has to be able to understand the question…

For the experiment, which preceded the 2016 US presidential elections, they used a reference database of about a million news articles, a subset of a much larger database made available to researchers by Signal Media (https://signalmedia.co). Associated Press loaned Synaptica their taxonomy collection, which contains more than 200,000 concepts covering names, geospatial entities, news topics and so on – a typical and rather good taxonomy scheme.

The Linked Data part was this: Synaptica linked entities in the Associated Press taxonomy out to DBpedia. If a person is famous, DBpedia will have hundreds of data points about that person. Synaptica could then build on that connection to external data.

SHOWING HOW IT WORKS. Dave went online to show a search system built with the news article database, the AP taxonomy, and a link out to the LOD cloud, specifically DBpedia’s ‘persons’ named graph. In the search box he typed ‘Obama meets Russian President’. The results displayed noted the possibility that Barack or Michelle might match ‘Obama’, but unhesitatingly identified the Russian President as ‘Vladimir Putin’ – not from a fact in the AP resource, but by checking with DBpedia.

As a second demo, he launched a query for ‘US tennis players’, then added some selection criteria (‘born in Michigan’). That is a set which includes news stories about Serena Williams, even though the news articles about Serena don’t mention Michigan or her birth-place. Again, the link was made from the LOD external resource. And Dave then narrowed the field by adding the criterion ‘after 1980’, and Serena stood alone.

It may be, noted Dave, that a knowledgeable person searching a knowledgebase, be it on the Web or not, will bring to the task much personal knowledge that others don’t have. What’s exciting here is using a machine connected to the world’s published knowledge to do the same kind of connecting and filtering as a knowledgeable person can do – and across a broad range of fields of knowledge.

NATURAL LANGUAGE UNDERSTANDING. How does this actually work behind the scenes? Dave again focused on the search expressed in text as ‘US tennis players born in Michigan after 1980’. The first stage is to use Natural Language Understanding (NLU), a relative of Natural Language Processing, and long considered as one of the harder problem areas in Artificial Intelligence.

The Synaptica project uses NLU methods to parse extended phrases like this, and break them down into parts of speech and concept clusters (‘tennis players’, ‘after 1980’). Some of the semantics are conceptually inferred: in ‘US tennis players’, ‘US’ is inferred contextually to indicate nationality.

On the basis of these machine understandings, the system can then launch specific sub-queries into the graph database, and the LOD databases out there, before combining them to derive a result. For example, the ontology of DBpedia has specific parameters for birth date, birthplace, death date, place of death… These enhanced definitions can bring back the lists of qualifying entities and, via the AP taxonomy, find them in the news content database.
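Purely as an illustration of that last step (this is not Synaptica's code, and the exact DBpedia properties and paths may differ in practice – birthplaces, for instance, are often recorded at city rather than state level), a sub-query of this kind against DBpedia's public SPARQL endpoint might look roughly like this, using the Python SPARQLWrapper library:

    from SPARQLWrapper import SPARQLWrapper, JSON   # pip install sparqlwrapper

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setReturnFormat(JSON)

    # Tennis players born in Michigan after 1980 (simplified sketch; real data may
    # require traversing city-level birthplaces up to the state).
    sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?player ?dob WHERE {
      ?player a dbo:TennisPlayer ;
              dbo:birthPlace ?place ;
              dbo:birthDate  ?dob .
      ?place dbo:isPartOf* dbr:Michigan .
      FILTER (?dob >= "1980-01-01"^^xsd:date)
    }
    LIMIT 20
    """)

    for result in sparql.query().convert()["results"]["bindings"]:
        print(result["player"]["value"], result["dob"]["value"])

The point is simply that the natural-language query is decomposed into structured sub-queries of this sort, whose answers are then mapped back to the AP taxonomy and the news corpus.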

Use case: understanding symbolism inside art images

Dave’s second case study concerned helping art history students make searches inside images with the aid of a Linked Open Data resource, the Getty Art and Architecture Thesaurus.

A seminal work in Art History is Erwin Panofsky’s Studies in Iconology (1939), and Dave had re-read it in preparation for building this application, which follows Panofskyan methods. Panofsky describes three levels of analysis of iconographic art images:

  • Natural analysis gives a description of the visual evidence. It operates at the level of methods of representation, and its product is an annotation of the image (as a whole, and its parts).
  • Conventional analysis (Dave prefers the term ‘conceptual analysis’) interprets the conventional meanings of visual components: the symbolism, allusions and ideas that lie behind them. This can result in semantic indexing of the image and its parts.
  • Intrinsic analysis explores the wider cultural and historical context. This can result in the production of ‘knowledge graphs’.

 


Detail from the left panel of Hieronymus Bosch’s painting ‘The Garden of Earthly Delights’, which is riddled with symbolic iconography.

THE ‘LINKED CANVAS’ APPLICATION.

The educational application which Synaptica built is called Linked Canvas (see http://www.linkedcanvas.org/). Their first step was to ingest the art images at high resolution. The second step was to ingest linked data ontologies such as DBpedia, Europeana, Wikidata, Getty AAT, Library of Congress Subject Headings and so on.

The software system then allows users to delineate Points of Interest (POIs) and annotate them at the natural level; the next step is the semantic indexing, which draws on the knowledge of experts and controlled vocabularies. Finally, users get to benefit from tools for search and exploration of the annotated images.

With time running tight, Dave skipped straight to some live demos of examples, starting with the fiendishly complex 15th century triptych painting The Garden of Earthly Delights. At Panofsky’s level of ‘natural analysis’, we can decompose the triptych space into the left, centre and right panels. Within each panel, we can identify ‘scenes’, and analyse further into details, in a hierarchical spatial array, almost the equivalent of a detailed table of contents for a book. For example, near the bottom of the left panel there is a scene in which God introduces Eve to Adam. And within that we can identify other spatial frames and describe what they look like (for example, God’s right-hand gesture of blessing).

To explain semantic indexing, Dave selected an image painted 40 years after the Bosch — Hans Holbein the Younger’s The Ambassadors, which is in the National Gallery in London. This too is full of symbolism, much of it carried by the various objects which litter the scene, such as a lute with a broken string, a hymnal in a translation by Martin Luther, a globe, etc. To this day, the meanings carried in the painting are hotly debated amongst scholars.

If you zoom in and browse around this image in Linked Canvas, as you traverse the various artefacts that have been identified, the word-cloud on the left of the display changes contextually, and what this reveals is how the symbolic and contextual meanings of those objects and visual details have been identified in the semantic annotations.

An odd feature of this painting is the prominent inclusion in the lower foreground of an anamorphically rendered (highly distorted) skull. (It has been suggested that the painting was designed to be hung on the wall of a staircase, so that someone climbing the stairs would see the skull first of all.) The skull is a symbolic device, a reminder of death or memento mori, a common visual trope of the time. That concept of memento mori is an element within the Getty AAT thesaurus, and the concept has its own URI, which makes it connectable to the outside world.

Dave then turned to Titian’s allegorical painting Bacchus and Ariadne, from the same period and also in the National Gallery collection, and based on a story from Ovid’s Metamorphoses. In this story, Ariadne, who had helped Theseus find his way in and out of the labyrinth where he slew the Minotaur, and who had become his lover, has been abandoned by Theseus on the island of Naxos (in the background, if you look carefully, you can see his ship sneakily making off). And then along comes the god of wine, Bacchus, at the head of a procession of revellers and, falling in love with Ariadne at first glance, he leaps from his chariot to rescue and defend her.

Following the semantic links (via the LOD database on Iconography) can take us to other images about the tale of Ariadne on Naxos, such as a fresco from Pompeii, which shows Theseus ascending the gang-plank of his ship while Ariadne sleeps. As Dave remarked, we generate knowledge when we connect different data sets.

Another layer built on top of the Linked Canvas application was the ability to create ‘guided tours’ that walk the viewer around an image, with audio commentary. The example Dave played for us was a commentary on the art within a classical Greek drinking-bowl, explaining the conventions of the symposium (Greek drinking party). Indeed, an image can host multiple such audio commentaries, letting a visitor experience multiple interpretations.

In building this image resource, Synaptica made use of a relatively recent standard called the International Image Interoperability Framework (IIIF). This is a set of standardised application programming interfaces (APIs) for websites that aim to do clever things with images and collections of images. For example, it can be used to load images at appropriate resolutions and croppings, which is useful if you want to start with a fast-loading overview image and then zoom in. The IIIF Search API is used for searching the annotation content of images.
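For example (a sketch based on the published IIIF Image API URI pattern, with a made-up server and image identifier), a client requests a particular region of an image at a particular size simply by composing a URL:

    # IIIF Image API URI pattern:
    #   {server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    def iiif_url(server, identifier, region="full", size="max", rotation="0",
                 quality="default", fmt="jpg"):
        return f"{server}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

    # A fast-loading overview: the whole image scaled to 600 pixels wide...
    print(iiif_url("https://iiif.example.org/images", "earthly-delights", size="600,"))
    # ...and a zoom into one scene: a pixel region served at full resolution.
    print(iiif_url("https://iiif.example.org/images", "earthly-delights",
                   region="2048,3072,1024,1024"))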

Searching within Linked Canvas is what Dave described as ‘Level Three Panofsky’. You might search on an abstract concept such as ‘love’, and be presented with a range of details from a range of images, plus links to scholarly articles connected with those.

Post-Truth Forum

As a final example, Dave showed us http://www.posttruthforum.org, which is an ontology of concepts around the ideas of ‘fake news’ and the ‘post-truth’ phenomenon, with thematically organised links out to resources on the Web, in books and in journals. Built by Dave using Synaptica Graphite software, it is Dave’s private project born out of a concern about what information professionals can do as a community to stem the appalling degradation of the quality of information in the news media and social media.

For NetIKX members (and for readers of this post), going to Dave’s Post Truth Forum site is also an opportunity to experience a public Linked Open Data application. People may also want to explore Dave’s thoughts as set out on his blog, www.davidclarke.blog.

Taxonomies vs Graphs

In closing, Dave wanted to show a few examples that might feed our traditional post-refreshment round-table discussions. How can we characterise the difference between a taxonomy and a data graph (or ontology)? His first image was an organisation chart, literally a regimented and hierarchical taxonomy (of the US Department of Defense and armed forces).

His second image was the ‘tree of life’ diagram, the phylogenetic tree that illustrates how life forms are related to each other, and to common ancestor species. This is also a taxonomy, but with a twist. Here, every intermediate node in the tree not only inherits characteristics from higher up, but also adds new ones. So, mammals have shared characteristics (including suckling young), placental mammals add a few more, and canids such as wolves, jackals and dogs have other extra shared characteristics. (This can get confusing if you rely too much on appearances: hyenas look dog-like, but are actually more closely related to the big cats.)

So the Tree of Life captures systematic differentiation, which a taxonomy typically cannot. However, said Dave, an ontology can. In making an ontology we specify all the classes we need, and can specify the property sets as we go. And, referring back to Dion’s presentation, Dave remarked that while ontologies do not work easily in a relational database structure, they work really well in a graph database. In a graph database you can handle processes as well as things and specify the characteristics of both processes and things.

Dave’s third and final image was of the latest version of the London Underground route diagram. This is a graph, specifically a network diagram, that is characterised not by hierarchy, but by connections. Could this be described in a taxonomy? You’d have to get rid of the Circle line, because taxonomies can’t end up where they started from. With a graph, as with the Underground, you can enter from any direction, and there are all sorts of ways to make connections.

We shouldn’t think of ditching taxonomies; they are excellent for some information management jobs. Ontologies are superior in some applications, but not all. The ideal is to get them working together. It would be a good thought-experiment for the table groups to consider which aspects of our lives and jobs are better suited to taxonomic approaches and which would be better served by graphs and ontologies. And we should think about the vast amounts of data out there in the public domain, and whether our enterprises might benefit from harnessing those resources.


Discussion

Following NetIKX tradition, after a break for refreshments, people again settled down into small table groups. We asked participants to discuss what they had heard and to identify either issues they thought worth raising, or things that they would like to know more about.

I was chairing the session, and I pointed out that even if we didn’t have time in subsequent discussion to feed everyone’s curiosity, I would do my best to research supplementary information to add to this account which you are reading.

I ran the audio recorder during the plenary discussion, so even though I was not party to what the table groups had discussed internally, I can report with some accuracy what came out of the session. Because the contributions jumped about a bit from topic to topic, I have resequenced them to make them easier for the reader to follow.

AI vs Linked Data and ontologies?

Steve Dale wondered if these efforts to compile graph databases and ontologies were worth it, as he believed Artificial Intelligence is reaching the point where a computer can be thrown all sorts of data – structured and unstructured – and left to figure it out for itself through machine learning algorithms. Later, Stuart Ward expressed a similar opinion: speaking as a business person, not a software wizard, he wondered whether there was anything he actually needed to design.

Conrad, in fielding this question, mentioned that the table he had been on (Dave Clarke too) had looked some more at the use of Natural Language Understanding in Dave’s examples; that is a kind of AI component. But they had also discussed the example of the Hieronymus Bosch painting. Dave himself undertook the background research for this and had to swot up by reading a score of scholarly books. In Conrad’s opinion, we would have to wait another millennium before we’d have an AI able to trace the symbolism in Bosch’s visual world. Someone else wondered how one strikes the right balance between the contributions of AI and human effort.

Later, Dave Clarke returned to the question; in his opinion, AI is heavily hyped – though if you want investment, it’s a good buzz-word to throw about! So-called Artificial Intelligence works very well in certain domains, such as pattern recognition, and even with images (example: face recognition in many cameras). But AI is appalling at semantics. At Synaptica, they believe that if you want to create applications using machine intelligence, you must structure your data. Metadata and ontologies are the enablers for smart applications.

Dion responded to Stuart’s question by saying that it would be logical at least to define what your entities are – or at least, to define what counts as an entity, so that software can identify entities and distinguish them from relationships. Conrad said that the ‘predicates’ (relationships) also need defining, and in the Linked Data model this can be assisted if you link out to publicly-available schemas.

Dave added that, these days, in the Linked Data world, it has become pretty easy to adapt your database structures as you go along. Compared to the pain and disruption of trying to modify a relational database, it is easy to add new types of data and new types of query to a Linked Data model, making the initial design process less traumatic and protracted.

Graph databases vs Linked Open Data?

Conrad asked Dave to clarify a remark he had made at table level about the capabilities of a graph database product like Neo4j, compared with Linked Open Data implementations.

Dave explained that Neo4j is indeed a graph database system, but it is not an RDF database or a Linked Data database. When Synaptica started to move from their prior focus on relational databases towards graphical databases, Dave became excited about Neo4j (at first). They got it in, and found it was a wonderfully easy system to develop with. However, because its method of data modelling is not based on RDF, Neo4j was not going to be a solution for working with Linked Data; and so fervently did Dave believe that the future is about sharing knowledge, he pulled the plug on their Neo4j development.

He added that he has no particular axe to grind about which RDF database they should use, but it has to be RDF-conforming. There are both proprietary systems (from Oracle, IBM DB2, OntoText GraphDB, MarkLogic) and open-source systems (3store, ARC2, Apache Jena, RDFLib). He has found that the open-source systems can get you so far, but for large-scale implementations one generally has to dip into the coffers and buy a licence for something heavyweight.

Even if your organisation has no intention to publish data, designing and building as Linked Data lets you support smart data and machine reasoning, and benefit from data imported from Linked Open Data external resources.

Conrad asked Dion to say more about his experiences with graph databases. He said that he had approached Tableau, who had provided him with sample software and sample datasets. He hadn’t yet had a chance to engage with them, but would be very happy to report back on what he learns.

Privacy and data protection

Clare Parry raised issues of privacy and data protection. You may have information in your own dataset that does not give much information about people, and you may be compliant with all the data protection legislation. However, if you pull in data from other datasets, and combine them, you could end up inferring quite a lot more information about an individual.

(I suppose the answer here is to do with controlling which kinds of datasets are allowed to be open. We are on all manner of databases, sometimes without suspecting it. A motor car’s registration details are held by DVLA, and Transport for London; the police and TfL use ANPR technology to tie vehicles to locations; our banks have details of our debit card transactions and, if we use those cards to pay for bus journeys, that also geolocates us. These are examples of datasets that by ‘triangulation’ could identify more about us than we would like.)

URI, URL, URN

Graham Robertson reported that on his table they discussed what the difference is between URLs and URIs…

(If I may attempt an explanation: the wider term is URI, Uniform Resource Identifier. It is ‘uniform’ because everybody is supposed to use it the same way, and it is supposed uniquely and unambiguously to identify anything which might be called a ‘resource’. The Uniform Resource Locator (URL) is the most common sub-type of URI, which says where a resource can be found on the Web.

But there can be other kinds of resource identifiers: the URN (Uniform Resource Name) identifies a resource that can be referenced within a controlled namespace. Wikipedia gives as an example ISBN 0-486-27557-4, which refers to a specific edition of Shakespeare’s Romeo and Juliet. In the MeSH schema of medical subject headings, the code D004617 refers to ‘embolism’.)

Trustworthiness

Some people had discussed the issue of the trustworthiness of external data sources to which one might link – Wikipedia (and Wikidata and DBpedia) among them – and Conrad later asked Mandy to say more about this. She wondered about the wisdom of relying on data which you can’t verify, and which may have been crowdsourced. But Dave pointed out that you might have alternative authorities that you can point to. Conrad thought that for some serious applications one would want to consult experts, which is how the Getty AAT has been built up. Knowing provenance, added David Penfold, is very important.

The librarians ask: ontologies vs taxonomies?

Rob Rosset’s table was awash with librarians, who tend to have an understanding of what a taxonomy is and what an ontology is. How did Dave Clarke see this, he asked?

Dave referred back to his closing three slides. The organisational chart he had shown is a strict hierarchy, and that is how taxonomies are structured. The diagram of the Tree of Life is an interesting hybrid, because it is both taxonomic and ontological in nature. There are things that mammals have in common, related characteristics, which are different from what other groupings such as reptiles would have.

But we shouldn’t think about abandoning taxonomy in favour of ontology. There will be times where you want to explore things top-down (taxonomically), and other cases where you might want to explore things from different directions.

What is nice about Linked Data is that it is built on standards that support these things. In the W3C world, there is the SKOS standard, Simple Knowledge Organization Systems, very light and simple, and there to help you build a taxonomy. And then there is OWL, the Web Ontology Language, which will help you ascend to another level of specificity. And in fact, SKOS itself is an ontology.
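As a tiny illustration (my own, with invented concept URIs), SKOS statements are themselves just RDF triples, so the same tooling applies; here is a two-level fragment of a taxonomy expressed in SKOS with rdflib:

    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import SKOS, RDF

    EX = Namespace("http://example.org/scheme/")   # a made-up concept scheme
    g = Graph()

    # A small fragment of a taxonomy expressed in SKOS.
    g.add((EX.mammals, RDF.type, SKOS.Concept))
    g.add((EX.mammals, SKOS.prefLabel, Literal("Mammals", lang="en")))
    g.add((EX.canids, RDF.type, SKOS.Concept))
    g.add((EX.canids, SKOS.prefLabel, Literal("Canids", lang="en")))
    g.add((EX.canids, SKOS.broader, EX.mammals))   # canids sit under mammals

    print(g.serialize(format="turtle"))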

Closing thoughts and resources

This afternoon was a useful and lively introduction to the overlapping concepts of Graph Databases and Linked Data, and I hope that the above account helps refresh the memories of those who attended, and engage the minds of those who didn’t. Please note that in writing this I have ‘smuggled in’ additionally-researched explanations and examples, to help clarify matters.

Later in the year, NetIKX is planning a meeting all about Ontologies, which will be a way to look at these information and knowledge management approaches from a different direction. Readers may also like to read my illustrated account of a lecture on Ontologies and the Semantic Web, which was given by Professor Ian Horrocks to a British Computer Society audience in 2005. That is still available as a PDF from http://www.conradiator.com/resources/pdf/Horrocks_needham2005.pdf

Ontologies, taxonomies and knowledge organisation systems are meat and drink to the UK Chapter of the International Society for Knowledge Organization (ISKO UK), and in September 2010 ISKO UK held a full day conference on Linked Data: the future of knowledge organization on the Web. There were nine speakers and a closing panel session, and the audio recordings are all available on the ISKO UK Web site, at http://www.iskouk.org/content/linked-data-future-knowledge-organization-web

Recently, the Neo4j team produced a book by Ian Robinson, Jim Webber and Emil Eifrem called ‘Graph Databases’, and it is available for free (PDF, Kindle etc.) from https://neo4j.com/graph-databases-book/. Or you can get it in dead-tree form from O’Reilly; see https://www.amazon.co.uk/Graph-Databases-Ian-Robinson/dp/1449356265

 


2018 Programme

The remainder of the 2018 programme is as follows:

  • Fake News / Post-truth (May)
  • AI / Machine Learning (Ethics) (July)
  • Ontology (September)
  • Network Science (December)

We also plan to hold our first meeting outside London. While still in the planning stage, this will probably be in Leeds in June and the topic will be Medical Information. The meeting will be a joint one with ISKO UK.


Next meeting: Working in Complexity – SenseMaker, Decisions and Cynefin – Wednesday 7 March 2018

The next meeting of 2018 is on Wednesday 7 March:

Working in Complexity

The speaker will be Tony Quinlan. To register, follow the link above.

A pdf giving details of the meeting is available at Working in Complexity 7 March 2018

 


The Future of Work for Information and Knowledge Professionals

To celebrate the tenth anniversary of NetIKX, the November 2017 meeting was opened free of charge to people from related knowledge and information management organisations. It featured two speakers, Peter Thomson and Stuart Ward, and an extended panel session with questions from the audience.

Peter Thomson on the changing world of work

Peter Thomson is a consultant who advises clients on how to create a corporate culture that supports new and better working practices. He is the author of ‘Future Work’, a visiting Executive Fellow at Henley Business School, and the Research Director of the Telework Association. Amongst his current engagements, he is helping the Health Foundation to launch a community of 1,000 health improvement practitioners, and working with Médecins Sans Frontières on developing a knowledge-based evaluation process for their humanitarian response missions.

Thirty years ago he worked for Digital Equipment Corporation (DEC), known for its PDP and VAX lines of multi-user minicomputers. DEC experimented with new ways of networking, and with long-distance working. Surely, they thought, nobody in the 21st century would commute to work, getting stuck in traffic jams or packed into suburban trains – they would be sitting comfortably at home and ‘teleworking’. It was a big buzzword at that time, but is now pretty much extinct.

With the benefit of hindsight, Peter notes that technology has changed but people haven’t. Human behaviour is full of ingrained habits, especially so amongst leaders of organisations. So we have the absurdity of people being forced to commute to sit at a desk, and send emails to the person a few metres away.

The younger generation is beginning to question why we continue with these outmoded working practices. The absurdity persists because business leaders want their team around them, under their eye, in a ‘command and control’ notion of how to run a business.

New tech, old habits

He asked: most of the audience have a smartphone, yes? How many had used it that day actually to telephone somebody, compared with sending or reading email or other text-based messages? A show of hands confirmed that the latter was more prevalent than the former.

Although mobile devices and related technologies are now part of our everyday lives, and the world has become more complex, many of our practices in the world of work are still trying to catch up. Businesses may boast of being Agile, but many of the actual processes are Fragile, he said.

Business communication is spread across a spectrum of behaviours. People still like to get together physically in groups to discuss things. In that setting they employ not only words, but also nuances of expression, gesture, body language and so on. At the other end of the spectrum is email: asynchronous, and text-based (with quite a narrow medium of expression). Ranged in the middle we have such things as videoconferencing, audio conference calls, Skype and the like.

Daily business communication is conducted mostly by typing emails – probably slowly – then checking for typing mistakes. Wouldn’t it be quicker to use new technology to send a quick video message? It’s technically possible these days. Look at how people have adopted WhatsApp in their personal lives. But the corporate default is face-to-face physical meetings, email at the other end, and nothing in between. Indeed, the social media tools by which people communicate daily in ordinary life are banned from many workplaces. And then people complain of having too many emails and too many meetings.

Tyranny of 24/7 and the overwork culture

Many people today are the victims of ‘presenteeism’. If you are not already at your desk and working when your managers show up, or if you leave your desk before they do, they won’t be impressed. They can’t sack you for sticking to the hours you are contracted for, but you’ll probably be passed over for promotion. Even if you’re the one who comes up with the best creative ideas, that’s regarded as incidental, secondary to the quantitative measure of how many hours you work.

This has now extended into, or been replaced by, a form of virtual presenteeism. Knowledge and administration work can now be done anywhere. So now we have digital presenteeism, 24/7. ‘Please take your laptop on holiday with you, in case there’s an emergency – and check in every day.’ Or, ‘I tend to send emails over the weekend, and while I don’t insist that you reply to them immediately, those who do are the ones I’ll favour.’ These leadership behaviours force people to be in work mode round the clock. It all builds stress, which the World Health Organization says is a 21st century epidemic.

But many people now won’t work under these conditions. They’d rather quit and set up their own business, or join the ‘gig economy’. They want to own their own time. If you have got used to budgeting your time as you see fit – at university, for example – you don’t want to be treated like a child by an employer and told when to do what.

The typical contract of employment is not about what you achieve – it’s about your hours of work, and you are paid according to the time you spend at work. For example, consider a mother who returns from maternity leave and agrees to work a four-day week rather than five. She benefits from having the extra time; the employer may also benefit, because in trying to get the same work done in four rather than five days a week, she probably skips unproductive meetings and spends less time chatting. But after a while, she finds that however productive she is, she’s being paid four-fifths of what her colleagues are.

At a national level, Peter commented, Britain is quite low on the productivity scale, and yet our working hours are so long.

Challenges to permanent employment

There has been a trend towards outsourcing and subcontracting: consider call centres in India or companies having products made in China. Will there now be a second wave of this, at the management and administration level, in which inefficient layers are taken out of corporate organisations and the organisation gets the professional inputs it needs from subcontractors?

We’re seeing the collapse of the pension paradigm. The conventional model is predicated on the idea of ‘superannuation’, and a few years of retirement before you die. But with today’s longer lifespans, thinking of seniors as being too old to contribute knowledge and skills is increasingly untenable — and anyway, it’s proving impossible to fund a long retirement from the proceeds of 40 years of employment. Nor can the State pension scheme fund extended pensions from the taxes paid by (a declining proportion of) younger people in work. Is retirement then an antiquated idea?

Peter closed by wondering what people’s ideal workplace might be — where they are at their most creative. Within the audience, people mentioned a variety of domestic settings, plus walking and driving. Peter imagines the organisation of the future as a network of people, working in such conducive conditions, and connected by all the liberating means that technology can bring us. Are we ready for this new world?

Stuart Ward on KIM and adding value to organisations

Stuart was the first chair of NetIKX and has been involved with our community throughout. The first meeting of NetIKX was addressed by David Skyrme, who spoke about the value of knowledge and information management (KIM for short, hereafter); that would also be the main focus of Stuart’s presentation. He believes that it can be challenging for KIM professionals (KIPs) to prove their value to the organisations in which they work.

Knowledge and information are the life-blood of organisations; those who use them well will prosper, those who don’t will wither. From that, one might expect the KIPs to be highly valued, but often it is not the case.

Stuart identifies four things he thinks are important if KIPs are to survive and flourish in an organisation: to focus on creating value; to link KIM activities to the organisation’s goals and objectives; to be clear about everyone’s responsibilities in relation to KIM (and there are various such roles, whether in creating and disseminating information products, or managing data and information resources); and, finally, for the organisation to have the right structures, skills and culture to make best use of what KIM can provide.

‘Value’ means different things in different enterprises. Commerce will focus on value for shareholders and other stakeholders, and customer service. In the public sector, and for non-profits, value could mean packaging information so that citizens and other service users can make best use of it.

A six-part model

Stuart has long promoted a model that is structured around six mechanisms through which information and knowledge can be used to deliver value to an organisation. They are:

  • developing information and knowledge products which can be marketed;
  • helping to drive forward the business strategy;
  • enabling organisational flexibility and change;
  • improving corporate decision making;
  • enabling and improving all business processes, especially the key ones; and
  • enhancing the organisation’s ‘intellectual capital’.

Looking at these in turn…

Information and knowledge products: Some businesses (publishers, media companies, law firms etc) create products for sale to the public or to other businesses. Others, such as local government or non-profits, produce reports and studies: though not for sale, these are crucial in their work of informing the public, influencing government or what have you.

Driving business strategy and renewal: Organisations often must change to survive, and here KIM can deliver value by enabling innovation. Apple Computer almost hit the rocks a couple of times, but through KIM and innovative product and service design became highly profitable. It’s important to sense the direction the market is headed: Blackberry and Nokia are examples of companies which failed to do that.

Enabling organisational change and flexibility: Good KIM helps an organisation to be sensitive to changing business opportunities and risks, to improve efficiency and cut costs. Here, the last thing one needs is to have knowledge trapped within silos. Efficient sharing of knowledge and information across the organisation is key.

Improving decision making: The mantra is, ‘Get the right information at the right time, to the right people.’ Good decision making requires an understanding of options available, and the consequences of making those choices – including the risks. Bad decisions are often made because of the prejudices of the decision-makers, who have the power to prevail in the face of evidence, so it’s important that the organisation has the right cultural attitudes towards decision-making, knowledge and information.

Continuous improvement: Almost always, business processes could be done better, and proper curation of information and knowledge is the key to this. Good ideas need not be limited to a discrete business process, and can inspire changes in other activities.

Enhancing intellectual capital: One of the most important realisations in KIM is that, just as money and equipment and premises and people are assets to a business, so are information and knowledge, and they should be managed properly. Yet many organisations don’t have an overview of what those intellectual assets are. As engineer Lewis Platt, 1990s CEO of Hewlett-Packard once said, ‘If we only knew what we know, we’d be three times more profitable.’ (Platt was also famous for his practice of ‘management by walking around’, immersing himself in the operational side of the business rather than staying aloof in his office.)

Linking KIM to key goals: the benefits matrix

Stuart then proposed a methodology for fitting IM and KM to an organisation’s key goals and objectives. As a tool, he recommends a ‘benefits matrix’ diagram, somewhat like a spreadsheet, in which the row headings define the organisation’s goals, its aims and objectives, while column headings define existing or possible future KIM services and capabilities. This creates an array of cells, each one of which represents how one particular KIM service or capability maps to one of the organisation’s goals. In these cells, you can then list the benefits, which may be quantifiable (e.g. increased income, reduced cost) or unquantifiable.

Stuart gave the example of an organisation having a Document Management System (represented as a column on the matrix). How might that map across to the company’s goal of reducing overhead costs? Well, a quantifiable result might be the saving of £160K a year, while unquantifiable benefits could include faster access to information, and a reduced risk of losing critical information.
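One crude way to picture such a matrix in data terms (my illustration; apart from the £160K figure above, the entries are invented) is as a grid of goals against KIM services, with each cell holding the claimed benefits:

    # Rows are organisational goals, columns are KIM services/capabilities;
    # each cell lists quantifiable and unquantifiable benefits.
    benefits_matrix = {
        "Reduce overhead costs": {
            "Document Management System": {
                "quantifiable": "£160K saved per year",
                "unquantifiable": ["faster access to information",
                                   "reduced risk of losing critical information"],
            },
        },
        "Improve customer service": {
            "Knowledge base for front-line staff": {   # invented column for illustration
                "quantifiable": None,
                "unquantifiable": ["more consistent answers", "shorter response times"],
            },
        },
    }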

Stuart likes this kind of exercise, because it stimulates thinking about the way in which information and knowledge management initiatives can generate benefits for the organisation’s self-identified objectives. It isn’t narrow-minded with a focus only on quantifiable benefits, though it does stimulate thinking about how one might define metrics, valuable for monitoring results. Finally, it is strongly visual and easy to assimilate, and as such is good for engaging with senior management.

Responsibilities, capabilities

It would be a mistake to assume that KIM responsibilities attach only to people explicitly employed for a KIM role. Business leaders also have strategic responsibilities for KIM, and every ‘ordinary’ worker has KIM responsibilities too. There are special defined responsibilities for those with ‘steward’ roles, for those who create and manage information and knowledge resources as their main job, and also those who manage IT and other kinds of infrastructure and services which help to manage and deliver the resources to where they are needed.

Stuart’s slide set included several detailed bullet-point slides on the KIM responsibilities that might attach to these various roles, but we skipped over a detailed examination of these due to pressure of time. [The slide set is available at www.netikx.org to NetIKX members and to those who attended the meeting.]

A cyclical process

Stuart’s final diagram suggested that there is a cyclical process between two spheres: the first sphere represents the organisation’s data and information, and what it knows. Through good management and use of these resources, the organisation hopefully performs well and succeeds in the second sphere, that of action. By monitoring performance, the organisation can learn from experience, and that feeds back to the first sphere.

Learning from the Hawley Committee

In 1995 the Hawley Committee, chaired by Sir Robert Hawley, under the auspices of the KPMG IMPACT Programme, published the results of an investigation into information management, and the value assigned to information, in several large UK businesses. Entitled ‘Information as an Asset – The Board Agenda’, the report set out ten agenda points, of which three have to do with responsibilities of the Board of Management. CILIP have recently shown a renewed interest in the Hawley Report, and may soon republish it, with updates to take account of changes between then and now.

Panel Q&A session

The panel discussion session had something of a BBC ‘Any Questions’ flavour. Before the session, people sat in table groups and came up with written questions, which Panel chair David Penfold then collected and collated during our tea break. David then called for questions which were similar to be put to the panel, which consisted of Noeleen Schenk of Metataxis, David Gurteen, David Smith (Government KIM Head of Profession), Karen McFarlane (Chair of CILIP Board), David Haynes (chair of ISKO UK) and Steve Dale.

Will KIM professionals become redundant?

Stuart Ward asked for the panel’s opinion on whether knowledge and information management professionals might soon be redundant, as their skills are diffused to a wider group of people through exposure to technology at school and university. Joanna asked how we can create knowledge from the exponentially growing amount of information, and Alison wondered whether the information available to all on Wikipedia is good enough – or are we looking for a perfect solution which doesn’t exist?

Steve Dale responded mostly to Stuart’s question. He has observed how in most of the organisations with which he works, KIM functions are being spread around the organisation. Organisations like Cadbury’s, the British Council and PwC no longer have a KIM department per se. Knowledge has become embedded in the organisation. But Steve still sees a future role for KIM professionals (or their equivalent – they may be called something else) as organisations turn to machine-augmented decision-making, ‘big data’, and machine learning.

Consider machine learning, in which computer systems are fed with truckloads of data and process it to discover patterns and connections. If there is bias in the data, there will be bias in the outcomes – who is checking for that? This is where wise and knowledgeable humans can and should intervene to manage the quality of the data, and to ensure that any ‘training set’ is truly representative.
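As an illustration of the kind of check such a reviewer might run – a hypothetical sketch, not a method Steve described – the short Python routine below compares the share of each group in a training set against an expected share and flags anything outside a tolerance. The field name, groups and threshold are all assumptions made up for the example.

    # Hypothetical sketch: flag groups that are under- or over-represented in a
    # training set before it is used for machine learning. All names and figures
    # here are invented for illustration.
    from collections import Counter

    def representation_report(records, group_field, expected_shares, tolerance=0.05):
        """Compare each group's share of the data with the share we expect."""
        counts = Counter(r[group_field] for r in records)
        total = sum(counts.values())
        report = {}
        for group, expected in expected_shares.items():
            actual = counts.get(group, 0) / total if total else 0.0
            report[group] = {
                "actual": round(actual, 3),
                "expected": expected,
                "flagged": abs(actual - expected) > tolerance,
            }
        return report

    training_set = [
        {"region": "North", "outcome": 1},
        {"region": "North", "outcome": 0},
        {"region": "North", "outcome": 0},
        {"region": "South", "outcome": 1},
    ]
    print(representation_report(training_set, "region", {"North": 0.5, "South": 0.5}))
    # North: actual 0.75 (flagged); South: actual 0.25 (flagged)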

Karen McFarlane also responded to Stuart’s question. With her GCHQ and National Cybersecurity Centre background, she sees a continued and deepening need for skills in data and information governance, information assurance and cyber-security; also in information risk management, and data quality. KIM professionals have those kinds of skills. As for Stuart’s assertion that exposure to technology at university is enough to impart those skills – she thinks that is definitely not the case. Such people often don’t know how to manage the information on their desktops [let alone at an enterprise level].

Noeleen Schenk, in contrast, replied that she didn’t think it should be necessary to teach people how to manage the information they work with, so long as there were rule-based technical systems to do information governance automatically (for example, through auto-categorisation). But who will write the rules? That’s where the future of KIM work may lie.
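To make that concrete, here is a deliberately minimal sketch of rule-based auto-categorisation in Python; the categories and keyword rules are invented for illustration. It underlines Noeleen’s point: the few lines of code are trivial, and the real KIM work lies in writing and maintaining the rules.

    # Minimal illustrative sketch of rule-based auto-categorisation.
    # The categories and keywords are invented; a real rulebook would be
    # written and maintained by people who know the organisation's information.
    RULES = {
        "Contracts": ["agreement", "terms and conditions", "signature"],
        "HR":        ["payroll", "annual leave", "appraisal"],
        "Finance":   ["invoice", "purchase order", "vat"],
    }

    def categorise(document_text):
        """Return every category whose keyword rules match the document text."""
        text = document_text.lower()
        matches = [category for category, keywords in RULES.items()
                   if any(keyword in text for keyword in keywords)]
        return matches or ["Uncategorised"]

    print(categorise("Please process the attached invoice and purchase order."))
    # ['Finance']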

David Haynes offered the perspective of someone teaching the next generation of KIM professionals (at Dundee and at City University of London). He is impressed by the diversity of backgrounds among people drawn to take courses in Library and Information Studies, and Archives and Records Management: it shows how relevant these skills are to many fields of human activity. He would like to see in future a larger emphasis on information governance skills, because many LIS-trained people go on to take up roles as data protection officers, or working on compliance matters.

David Smith thinks KIM professionals do risk extinction if they don’t change. He remembers that when he joined the Civil Service, doing online search was an arcane craft for specialists – that’s gone! He agrees that information governance is a key skill. Could anyone do that? Yes… Would they make mistakes? Certainly! This is where KIM professionals should be asserting their skills and gaining a reputation for being the go-to person for help in such matters.

David Gurteen doesn’t think the need to manage information and knowledge will go away – quite the reverse. One recently-arising topic has been that of ‘fake news’ and bias, which for him highlights the need for information and media literacy skills and judgement to be taught and learnt.

Training the robots?

Claire Parry referred to the greater availability today of information which has been structured to be ‘understandable’ by machines as well as by humans. What did the panel think KIM professionals’ roles might be in training the machines and in dealing with ‘bots’?

Steve Dale said that artificial intelligence has been a big interest of his in recent years. A lot of young people are out there coding algorithms, and some machines are even crafting their own through machine learning. That’s fine for game-playing, but in matters of more importance, affecting the lives of citizens, we must be concerned when machines evolve their own algorithms that we don’t understand. The House of Commons Science and Technology Committee is requesting information from organisations creating these kinds of tools, so they can consider the implications. Steve said that, when some algorithm is being used to augment decision-making in a way which affects him, he wants to know about it, and about what data is being used to inform it.

Rob Rosset wondered whether it is possible to create an algorithm that does not have some form of bias within it. David Gurteen thought ‘bias’ was inevitable, given that programming always proceeds from assumptions.  Noeleen Schenk thought that good data governance could at least reveal to us the provenance and quality of the data being used to inform decisions. David Haynes agreed, and referred to the ‘ethics of information’, noting that CILIP’s model of the Wheel of Knowledge places ethics at the very centre of that diagram.

Steve Dale mentioned he had just been at an event about whether AI will lead to job losses, and people there discussed the algorithms that Facebook uses. Facebook now realises that it can’t detect online abuse algorithmically, so it is in the process of recruiting 10,000 humans to do the moderation instead! So the adoption of AI may even be creating job opportunities.

The gig economy; face-to-face vs virtual working

David Penfold brought forward four questions which he thought might be related to each other.

Kathy Jacob asked what impact ‘gig economy’ workforce arrangements will have on knowledge and information work in the future, particularly in the aspects of knowledge creation and use. Valerie Petruk asked: is the gig economy a necessary evil? Sarah Culpin wondered how to get the right balance between face-to-face interactions and virtual working spaces; and Jordana Moser similarly wondered how we organise to meet human needs as well as the demands of efficiency and productivity. For example, a face-to-face meeting may not be the most efficient way of getting work done, but it has value on other levels.

David Smith thought that the ‘gig economy’ probably is a necessary evil. Records management for government has become increasingly commoditised: when a task emerges, you buy in people to do it, then they go. It’s a balancing act, because some work is more appropriately done by people on the payroll, and some doesn’t have to be. Procurement skills have therefore become more important – deciding what work you keep in-house, and what you farm out, or get people in for on a temporary basis.

David Haynes noted the loss of rights that comes along with the ‘gig economy’ – being employed has benefits for people. He himself has been both employed and self-employed – it’s worked out just fine for him, but people engaged for more routine tasks can be easily exploited; when they are ill, they aren’t paid; they don’t get holiday pay, etc. Peter Thompson in his talk had proposed being ‘paid by the piece’ rather than for time on the job, but David thinks that going down this path not only imposes on individuals, but brings a cost to the whole of society too.

Noeleen Schenk finds that a ‘gig economy’ approach suits her, because she likes a portfolio lifestyle. If you combine it with the Internet’s opportunities for long-distance working, it’s brilliant that an enterprise can find someone with just the skills they want, who can provide that service from the other side of the world.

Moving to address Kathy Jacobs’s question directly, Noeleen thinks that knowledge capture will move from writing things down towards voice capture, plus voice-to-text conversion, such that there will be fewer low-grade tasks to be assigned to temporary workers. However, what gig work methods do risk losing is the organisational knowledge that comes with continuity of shared experience in the enterprise.

Karen McFarlane said that we need both face-to-face and distant working. We are humans; we work best in a human way. We can blend in virtual meetings, virtual communities; but these virtualised relationships always work best if you have met the other person(s) in real life.

David Gurteen is definitely in favour of face-to-face conversation. He has been experimenting with holding his Knowledge Café meetings using Zoom technology, but he thinks that, if you can meet face to face, it’s better. Doing it remotely is something you do if you have to. Nancy Dixon talks about the ‘oscillation principle’ – if you have a geographically dispersed team, every so often you have to bring them together (see her blog post at https://www.druckerforum.org/blog/p=881 – she talks about ‘blend[ing] sophisticated virtual tools with periodic, in-depth, face-to-face collective sensemaking.’)

Recruitment criteria, and the robots (again)

Judith Stuart, who lectures at the University of the West of England, asked what skills and knowledge the panel look for in new recruits and appointments to knowledge management roles in organisations.

David Haynes replied in one word: ‘flexibility’, and other panellists agreed; David Gurteen would add ‘curiosity’. Noeleen’s answer was similar – adaptability, and the ability to cope with uncertainty.

Karen McFarlane said that when she used to recruit people to roles in records, information or knowledge management, she looked out for people who had a love of information. Yes, flexibility was also amongst her criteria, but also the inter-personal skills to be able to work in a team.

David Penfold thought it was interesting that no-one had mentioned professional skills! Karen replied that of course those were required, but her response to the question was about what would make a candidate ‘stand out’ in her eyes. Noeleen added that professional skills can be learned, but the softer skills cannot be learned so easily.

Steve Dale referred to a company he would shortly be meeting, called HeadStart, which is using artificial intelligence and machine learning working on data (such as exam results, social media interventions) to identify candidates for organisations. They claim to shorten the time and lower the cost of getting the right people into the right jobs. He has been wondering how they would know what ‘a good result’ or ‘a bad result’ looks like…

David Haynes noted that the new data protection regulation will give people the right to challenge how decisions are made by automated systems, and to insist on human intervention if they don’t like the basis on which decisions are made.

Is it good to be averse to risk?

Anna Stothard asked for top tips or recommendations for changing a risk-averse culture, and getting more buy-in to new ideas from senior management.

David Smith remarked that government is keen on risk-aversion! Indeed the best way to get civil service management attention is to say, ‘If you want to avoid risk, do this.’ If he tells them about various benefits that a new approach could bring, he’ll be politely ignored. If he describes all sorts of bad things that could be avoided – then they are all ears (though one shouldn’t overdo it).

It all depends on your organisational culture; you need to assess management’s appetite for risk, and to make sure people understand the nature of the risks. He gave the example of a local government organisation that had turned down a Freedom of Information request on the grounds that it was ‘impertinent’, when what was underlying the response was a risk-averse culture.

Steve Dale said that in his consulting role he has often had to try to convince senior management that a change would be beneficial. His rule of thumb is to pay attention to Return on Investment (ROI); if the investment can be kept modest, the proposal is more likely to find favour.

Noeleen Schenk generally prefers to argue for change because of the benefits it will bring, but she had recently worked with a client where the concern was mostly about risk. So the project on which she was working was converted from a ‘value adding’ one to a ‘risk reduction’ one instead.

The role of knowledge organisation?

Keri Harrowven asked what role knowledge organisation plays in promoting knowledge and information management in the workplace.

Noeleen Schenk replied that, for Metataxis, knowledge organisation has a central role. But many people regard KO as an overhead, and an unnecessary expense. It takes time and effort to get KO right, but Noeleen will ask – ‘If you can’t find it, why have you got it?’ She recalled a client with about 110,000 boxes of documents in offsite storage, with next to no indexing, but they insisted they wanted to keep it all – at huge cost. She asked them, could they find anything? (No.)

Just because knowledge organisation is hard to do doesn’t mean that you shouldn’t do it. She’d say – start with some architecture, then build out. In a recent project, she started by putting in place a set of requirements about how newly generated information is to be handled.

David Haynes noted that the UK Chapter of the International Society for Knowledge Organization often visits these topics. Like Noeleen, he thinks that there is no point in hoarding information if you can’t retrieve it. That leads to such KO questions as how you categorise information, how it is described, what metadata can be captured and attached, and what search and discovery tools you can put in place. It also goes into what the organisation’s needs are, what is the nature of the information you are faced with, and how you make that connection.

Also of increasing importance is how we can exploit information. Linked Data is an approach showing incredible potential, and new applications, such as combining map data with live sensor and update feeds – for example, the data revolution which helps Transport for London passengers know when their next bus is coming and where it is now. But none of these novel forms of exploitation would be possible without robust schemes for classifying and managing the information sources.

Finally, knowledge organisation is key to information governance.
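To make the Linked Data point above a little more concrete, here is a hedged sketch using the rdflib Python library: a tiny, invented dataset joins a ‘static’ description of a bus stop to a ‘live’ arrival value, and one SPARQL query reads both back. The URIs, labels and property names are made up for illustration and are not TfL’s actual data model.

    # Hedged sketch: a tiny Linked Data graph queried with SPARQL (requires the
    # rdflib package). The dataset, URIs and properties are invented for
    # illustration only.
    from rdflib import Graph

    TURTLE = """
    @prefix ex: <http://example.org/ns#> .

    <http://example.org/stop/N123>
        ex:label "Example Street / Stop N" ;
        ex:nextArrival "3 min" .
    """

    graph = Graph()
    graph.parse(data=TURTLE, format="turtle")

    QUERY = """
    PREFIX ex: <http://example.org/ns#>
    SELECT ?label ?eta WHERE {
        ?stop ex:label ?label ;
              ex:nextArrival ?eta .
    }
    """

    for label, eta in graph.query(QUERY):
        print(f"{label}: next bus in {eta}")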

Silos or outstations?

Someone asked: ‘Does having KM roles in an organisation create silos? How can we move towards a more embedded approach?’

Karen McFarlane described a hybrid approach in which her organisation had a central KIM team, which might be considered a silo; but she funded the placing of KIM professionals into teams of analysts for a year, helping them to develop their own information management skills. In every case, the teams that had had the benefit of working with the KIM professional wanted to find the funds to continue that work from within their own budgets.

Information governance directions?

Martin Newman wanted to know where panellists thought information governance was going, as the two initial speakers seemed to predict new ways of working in which information roles would be decentralised.

David Haynes replied that KIM professionals are increasingly being tasked with data protection and information governance framework development. But he doesn’t think that they can work on their own. They have to work with the people on the legal side, and the people delivering the technology. It doesn’t really matter who is ‘in charge’, so long as there is that sort of coalition, and that it is embedded within the organisation.

Noeleen Schenk recounted noticing enormous variability in where information governance tasks are run from – sometimes from legal, sometimes from the executive, sometimes IT. Arguably, all is well if how people collaborate matters more to them than where people are sitting. She has been noticing a trend of information governance roles moving from the centre, along ‘spokes’, towards decentralised clusters of people; but it is even better if the way people work at every level supports good governance rather than it having to be done for them.

David Smith said that the culture of the civil service is already imbued with an instinct to take good care of information. Yes, silos are there – he gave us a picture of ‘fifteen hundred silos in a single department, flying in close formation’. Teams have got smaller – to do with cuts, as much as anything else. Not every information asset is treated equally; treatment depends upon the risks attached. It’s a question of expedience, and of balancing risk against cost.

His own department manages information about the European Regional Development Fund. If the European Court of Auditors asks to see information about any ERDF project in the UK, his department has 72 hours to find it; otherwise there is a fine of up to 10% of the value of the ERDF loan that financed that project. Imagine the prospect of a fine of £100,000 if you can’t find a file in 72 hours! You can bet the department has that information rigorously indexed; whereas other areas are managed with a lighter touch, as they don’t carry the same risks.

There is also variability across government as to whether the work is done at the ‘hub’ or along the ‘spokes’.

Steve Dale pointed out that silos can exist for a reason – an example would be to maintain security in investment banking.

Globalism and process alignment

Emma Bahar had a question: ‘How can processes be managed in global organisations in which alignment is likely impossible?’

Steve Dale used to work for Reuters, with offices in every country. They managed very well in aligning their processes. Indeed, their whole business model relied on good interchange of quality information. He thought most global organisations would wither and die if there wasn’t good interchange and standardisation of processes. Yes, there will be cultural differences, and in Reuters they encountered these and learned to work with them.

Wikipedia again

David Penfold suggested returning to the question about the quality of information available on Wikipedia: are we asking too much in looking for a perfect solution which doesn’t exist? Universities typically talk down Wikipedia, and students are not allowed to quote it as a reference. Is that realistic?

David Haynes pointed out that Wikipedia editing is moderated. A study some years ago compared the accuracy of Wikipedia articles against Encyclopaedia Britannica online, and Wikipedia was found to be superior. He advises students that Wikipedia is a fantastic resource and they should use it – but not quote it! If Wikipedia gives a reference to a source document [according to the Wikipedia ‘no original research’ rules, every assertion should be backed up by a citation], then go to that source and quote that. Wikipedia should be regarded as a secondary source, a good entry point into many subjects. Indeed, David uses it that way in his own research.

Noeleen Schenk hinted at possible double standards. In the analogue world we never relied on Encyclopaedia Britannica for everything. She thought that some of the discomfiture was about how Wikipedia is authored by tens of thousands of volunteers. We should remember that enthusiastic amateurs helped to expand the boundaries of science; they are not necessarily ignorant or inept.

The panel agreed that Wikipedia should be regarded as one source amongst many. Noeleen compared this to reading several newspapers to get an angle on something in politics. How you assess sources brings us back to the topic of Information Literacy – not, perhaps, the best term for it, but as David Haynes confirmed, critical assessment of information sources is actually being taught (to KIM students, anyway).

Generational attitudes

Graham Robertson noted that Peter Thomson had talked about ‘millennials’ and their attitudes, and Graham wondered what the panel thought about the role the younger generations would play in changing attitudes, cultures and practices around KIM in organisations. Do younger people use and process information in a different way?

David Smith said he has been doing a review, in which it was interesting to compare how older and younger people were sharing information within their cohorts. In the case of older members of staff, one could track discussions via email. But the younger staff members appeared to be absent. Why? It turned out that they communicated with work colleagues using WhatsApp. Because it was a medium with which they were familiar, it was a quick way for them to ‘spin up’ a conversation. Of course, this poses new challenges for organisations: discussions and information sharing are absent from the network (and apparently WhatsApp security isn’t up to much).

Noeleen Schenk thought it was a fool’s errand to try to force people to work in a way which they don’t find natural. She doesn’t know what the solution is, but we need to think afresh about how important information and knowledge are kept track of – the current crop of tools seems inadequate.

Facing down ‘alternative facts’

Conrad Taylor asked: what are, or could be, the roles and responsibilities of all who work with knowledge and information – including teachers and journalists – in helping people learn how to weigh evidence and distinguish fact from falsehood and propaganda, both in ‘big media’ and in social media?

David Haynes noted that this was increasingly a focus in meetings of KIM professionals [it was the subject of a panel session at the 2017 ISKO UK conference]. How can people be sure they are receiving unbiased information? Or if, like Steve Dale, we think that there cannot be unbiased information, we will have to be open to a range of information sources, as Noeleen had suggested.

David Penfold noted that in recent partisan political debate on social media, bots had been unleashed as disseminators of propaganda. Conrad noted that Dave Clarke of Synaptica has proposed a taxonomy of different sources of misleading information (see resources at https://davidclarke.blog/?page_id=16). The panel noted that the role played by paid-for posts on, for example, Facebook, and the way Facebook’s personalisation algorithms work, are coming under closer scrutiny.

Conrad regretted that the term ‘Information and Knowledge Professional’ is often used to mean only people who curate information, excluding those whose job it is to create information – writers, and designers and illustrators too. It is all too common to see data graphics that have been created in support of an editorial line, and which are misleading. (Indeed, Steve Dale addressed this at a recent NetIKX meeting.)

Steve Dale remarked that we now have a new weapon to counteract ‘phishing’ attacks where fake online approaches are made in an attempt to defraud us of money, steal our identity, etc. It’s called Re:scam (https://www.rescam.org) and if you forward scammy emails to it, its artificial personalities will engage the scammers in an exchange of emails that will waste their time!

At this point, we ran out of time, but continued discussions over drinks. Face-to-face, naturally!


NetIKX Programme for 2018

The first meeting of 2018 is on Thursday 25 January:

Making true connections in a complex world: new technologies to link facts, concepts and data – Thursday 25 January 2018

A pdf giving details of the meeting will be available shortly.

The full 2018 programme will be announced shortly.
