Tag Archives: KM

Securing knowledge content

When trying to control the inadvertent or fraudulent export of knowledge content, there are three elements of the transaction that can be targeted:

  • people – the individual conducting the activity
  • system – the devices, applications, etc. being used to generate or transfer the copy
  • product – the content being copied


User behaviour is fundamental to any approach to securing knowledge content within your organisation. No initiative should be considered, let alone attempted, without also encouraging users to do the right thing regardless.

Some actions to consider to encourage the correct behaviour include:

  • adoption of a formal content policy
  • inclusion of the policy in induction training/education
  • inclusion of adherence to the policy in employment contracts or agreements

Another set of possible actions may be considered, depending on culture. They target the less altruistic aspects of this issue, encouraging “compliance through fear of getting caught”:

  • publicise all steps taken to secure content
  • publicise instances of users being caught
  • aggressively penalise any user caught doing the wrong thing (i.e. follow through on your policy)

Note that promoting an internal policy to protect firm content is meaningless if the firm is seen to be actively avoiding providing the same respect to external parties, for example by:

  • using content taken from other firms
  • using copyrighted content taken from the internet, etc.

To provide a culture of respect for content, that respect must be seen to be given to all content irrespective of source.


On the face of it, securing content by managing the system would be the obvious choice.  You just determine the possible ways content could be lost and develop technical solutions to block all paths.

For example, to prevent theft by attaching to e-mail:

  • filter all outgoing e-mail
  • ban non-company e-mail (Hotmail, etc)

To prevent theft by copying to a portable media device:

  • ban USB devices
  • disable USB ports
  • disable CD burners

Or, to prevent theft by photocopying:

  • disable printing of knowledge content
  • watermark content so it cannot be copied
  • identify each copy (so it can be tracked to an owner)
  • track all photocopying use

(Note that these last two are as much “people” as “system” approaches, since they don’t directly stop the unwanted copy; they just discourage it.)

However, the issues with a system-based approach become apparent once you start trying to detail the various methods available to the unscrupulous. For example, consider simply trying to prevent theft via e-mail. Even the briefest consideration of the options open to them would include:

  • use a web-based e-mail system (and how many of them are there? Google, Yahoo, Hotmail, ... Would we need to identify them all?)
  • use some form of encryption on the file (can be as simple as putting a password on a ZIP file)
  • hide the file (e.g. changing the file extension from “doc” to “jpg”, or to something meaningless such as “rgk”, so that it’s not obvious it’s a document and it is not intercepted as such)
  • upload the file to an online sharing service and simply e-mail a link to the file

What makes it doubly difficult is that for each method needing to be locked down, you may have to consider instances where the method is legitimately used. For example, encrypting a file before mailing it may be a business requirement and cannot automatically be assumed to be fraudulent.
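To illustrate why filtering on file extension alone is weak, here is a minimal Python sketch that compares the extension a file claims with the leading "magic" bytes of its content. The signature table, the extension mapping and the function names are illustrative assumptions, not part of any particular mail-filtering product, and the list of formats is far from complete.

```python
from pathlib import Path

# Leading "magic" bytes for a few common formats (illustrative, not exhaustive).
MAGIC_SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",  # also the container used by docx/xlsx/pptx
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",  # legacy doc/xls/ppt
    b"\xff\xd8\xff": "jpeg",
}

# Which content family each extension is expected to belong to (assumed mapping).
EXTENSION_FAMILY = {
    "pdf": "pdf",
    "zip": "zip",
    "docx": "zip",
    "doc": "ole2",
    "jpg": "jpeg",
    "jpeg": "jpeg",
}


def sniff_type(data: bytes) -> str:
    """Guess the format from the leading bytes, ignoring the file name."""
    for signature, kind in MAGIC_SIGNATURES.items():
        if data.startswith(signature):
            return kind
    return "unknown"


def extension_matches_content(filename: str, data: bytes) -> bool:
    """Return False when the claimed extension is unknown or disagrees with the content."""
    claimed = Path(filename).suffix.lower().lstrip(".")
    expected = EXTENSION_FAMILY.get(claimed)
    return expected is not None and expected == sniff_type(data)
```

A legacy Word document renamed to “holiday.jpg” would be flagged by this check, as would a file with a meaningless extension such as “rgk”. It does nothing about the other options listed above: a password-protected ZIP file still looks like a perfectly ordinary ZIP file, and a link to an online sharing service contains no attachment to inspect at all.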


The third arm of any content security approach is the most promising: put the onus of restricting inappropriate use on the content itself. In essence, this approach modifies the content so that it checks whether its use is valid each and every time it is accessed. For example, the content:

  • checks it’s being opened on a computer within a specific network (i.e. on a computer owned by your firm)
  • checks it’s being opened by a valid user, or
  • when opened, “phones home” to confirm it’s acceptable for the user to access it at that point.

If the self-aware content identifies an issue, how it responds could vary from simply “saying no” through to logging the inappropriate use and allowing the breach to be investigated.
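The kind of decision a “phone home” service might make can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the fields and the choice to require both the network check and the user check are hypothetical, and the sketch is not how any particular rights-management product works.

```python
import ipaddress
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"                    # simply "saying no"
    ALLOW_AND_LOG = "allow_and_log"  # let it through, but record it for investigation


@dataclass(frozen=True)
class AccessRequest:
    user: str
    client_ip: str
    document_id: str


@dataclass(frozen=True)
class Policy:
    firm_network: str       # hypothetical example: "10.0.0.0/8"
    valid_users: frozenset
    log_only: bool = False  # True: log the breach rather than block it


def check_access(request: AccessRequest, policy: Policy) -> Decision:
    """Decide whether the document may be opened for this request."""
    address = ipaddress.ip_address(request.client_ip)
    on_firm_network = address in ipaddress.ip_network(policy.firm_network)
    known_user = request.user in policy.valid_users
    if on_firm_network and known_user:
        return Decision.ALLOW
    return Decision.ALLOW_AND_LOG if policy.log_only else Decision.DENY
```

The `log_only` switch reflects the range of responses described above: the same failed check can either refuse access outright or allow it and leave a record for later investigation.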

Since the content is secured at the point of access, this approach does not rely on managing the technology of duplication, and so does not upset any legitimate use. It also has advantages for legitimate users. For example, if the “phone home” approach were adopted, a user accessing a superseded knowledge document could be informed and/or directed to its replacement.

For details on one implementation of such a content-based approach, look at the features available with Adobe Acrobat, in particular Adobe’s LiveCycle ES product.


Any comprehensive approach to securing your precious content needs to address people, system and product aspects.  The mix appropriate for your firm will depend on the sensitivity of the content, the skills of your staff and the implications of getting it wrong.

Integrated vs. federated search

One of the key decisions to be made when implementing an Enterprise Search solution, including Recommind, is whether each information source is federated or integrated (or ‘indexed’ as Recommind sometimes describe it).

A recent buckeye post does a good job of explaining the difference.  In essence:

  • federated = multiple searches simultaneously; send the search criteria to multiple sources and get their own, independent search engines to return the results they think are the best matches
  • integrated = single search; index all the sources in the one place and get our one, chosen search engine to check them all and pick the best matches

Note that the choice of federation vs. integration is more about the back-end (how we find matches) than the front-end (how we display results). Normally federated results are returned separately, e.g. a separate tab or listing for each results set. However, there is no technical reason why federated results cannot be returned in one set of matches, or why an integrated search cannot return different sources in different results tabs.
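The back-end difference can be shown with a toy Python sketch. Everything here is an assumption made for illustration: the in-memory “sources”, the substring matching and the crude count-based ranking stand in for real search engines, and none of it reflects how Recommind or any other product is implemented.

```python
from typing import Callable


def make_source_search(docs: list) -> Callable:
    """Stand-in for a source's own, independent search engine (substring match)."""
    def search(query: str) -> list:
        return [doc for doc in docs if query.lower() in doc.lower()]
    return search


def federated_search(query: str, sources: dict) -> dict:
    """Send the same query to every source; each source decides its own matches."""
    return {name: search(query) for name, search in sources.items()}


def build_index(collections: dict) -> list:
    """Copy every document into one central index as (source, document) pairs."""
    return [(name, doc) for name, docs in collections.items() for doc in docs]


def integrated_search(query: str, index: list) -> list:
    """One engine applies one ranking rule (term count) across all sources."""
    q = query.lower()
    hits = [(doc.lower().count(q), name, doc) for name, doc in index if q in doc.lower()]
    hits.sort(key=lambda hit: -hit[0])  # most occurrences first; ties keep index order
    return [(name, doc) for _, name, doc in hits]
```

`federated_search` returns one results set per source, which maps naturally onto separate tabs, while `integrated_search` returns a single ranked list. Either output could be reshaped into the other for display, which is the point made above about the front-end.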

You’d imagine that an integrated search should return a better set of matches, since it applies the same criteria and search logic to all the sources and therefore picks the winners more consistently.

However it is not necessarily the case that you would always choose to integrate rather than federate:

  • not allowed – some sources will not ‘open the bonnet’ enough for you to directly index their content. This is particularly the case with those that provide their own (i.e. competing) search solution.
  • not beneficial – for some sources the provided search tool may be preferred to the single Enterprise approach, as it allows both search criteria and results to better target the content. In such a case federation may be preferable, simply providing pointers from which users can then conduct more detailed searches within the source directly.
  • not appropriate – related to the question of benefit, there may be some content sources that users would not expect to be searched in an integrated manner, and would fully expect or prefer to search independently and separately. For example, no matter how well your chosen search tool works, it is unlikely that anyone completing a knowledge search in your firm would expect the entire internet to be included in the scope.
  • not technically possible – there are some sources (the “internet”, for example) where it would be impossible to index all the content.

So in putting together an Enterprise Search plan, you need to use the above definitions and criteria to help determine which sources to federate, which to integrate, and which to avoid completely.
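As a rough aid to that planning exercise, the four criteria could be recorded per source and turned into a first-pass recommendation. The Python sketch below is one possible reading only: the field names are hypothetical, and mapping “not appropriate” to exclusion rather than federation is an assumption you may well decide differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    indexing_allowed: bool         # will the provider let us index the content?
    indexing_feasible: bool        # is it technically possible to index it all?
    native_search_preferred: bool  # does the source's own tool serve users better?
    users_expect_in_scope: bool    # would users expect it in a knowledge search?


def plan_for(source: Source) -> str:
    """Return a first-pass recommendation: 'integrate', 'federate' or 'exclude'."""
    if not source.users_expect_in_scope:
        return "exclude"   # assumption: "not appropriate" means leave it out
    if not (source.indexing_allowed and source.indexing_feasible):
        return "federate"  # cannot index it, so query its own engine instead
    if source.native_search_preferred:
        return "federate"  # point users to the richer native search
    return "integrate"
```

The output is only a starting point for discussion; the order of the checks encodes a priority between the criteria that your firm may want to change.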