Improving Data Quality

In a previous post we touched on improving data quality through data cleaning and the processes involved. The quality of sensor-derived data depends largely on the sensors themselves, and there isn’t much you can do about it, so here we deal specifically with manually entered data. One point mentioned previously was how to encourage colleagues not to load poor-quality data into systems. Human error is a significant source of problems: manually entered data can be inconsistent, incorrectly formatted, or simply missing. The most common reasons for poor-quality data supplied by people include:

Poor knowledge

‘I have no idea what you want or what I need to do’

This is a training and education issue. A member of your team will inevitably provide poor data, or none at all, if they have no guidance. It is also the most straightforward problem to solve: provide the necessary training.

Ignorance

‘I didn’t know you wanted this info’

Contrary to popular opinion, this is not solely because your colleagues haven’t had any training. There are two factors to understand: the training provided was itself not of a high standard, and/or your co-workers do not understand the importance of providing accurate data.

Bad advice

‘I was told to do it this way’

This is another consequence of poor training. Your colleagues think they are doing the task correctly because they were told to do it that way, either formally or informally. Finding the root cause will take time and may require a review of processes or documentation.

There is a related point where data formats are concerned. Take two very simple examples: the ‘County’ field in an address, and telephone numbers. You can use full or abbreviated county names, but everyone should always use the same format. It doesn’t matter whether your co-workers prefer Shrops or Shropshire; they must all use the same one. Telephone numbers are different in that they have a defined format. This has grown over the years to accommodate different dialling codes for cities, but the format laid down is as follows:

  • 5 & 6 number format: 01234 567890
  • 4 & 7 number format: 0123 567 8901
  • 3 & 8 number format: 012 3456 7890

Not following these formats erodes data quality, which in turn undermines your data analytics work.
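As a minimal sketch of enforcing both conventions, assuming Python and a hypothetical lookup table of agreed county spellings, the functions below map county variants to one form and regroup UK telephone numbers into the three formats listed. The rule used to pick a grouping (02x codes use 3 & 8; 011x and 01x1 codes use 4 & 7; other 01 codes use 5 & 6) is a simplifying assumption that ignores exceptions, and non-geographic numbers are rejected rather than guessed.

```python
import re

# Hypothetical lookup table: add every variant your team actually enters.
COUNTY_ALIASES = {
    "shrops": "Shropshire",
    "shropshire": "Shropshire",
}


def normalise_county(value: str) -> str:
    """Map a known spelling of a county to the single agreed form."""
    key = value.strip().lower().rstrip(".")
    if key not in COUNTY_ALIASES:
        raise ValueError(f"Unknown county: {value!r}")
    return COUNTY_ALIASES[key]


def format_uk_phone(value: str) -> str:
    """Regroup a UK geographic number into the 3 & 8, 4 & 7 or 5 & 6 format.

    Simplified rule (an assumption, with known exceptions ignored):
    02x -> 3 & 8, 011x and 01x1 -> 4 & 7, any other 01 code -> 5 & 6.
    """
    digits = re.sub(r"\D", "", value)  # drop spaces, hyphens, brackets, '+'
    # Convert international forms (+44 / 0044, with or without "(0)") to national.
    for prefix in ("0044", "44"):
        if digits.startswith(prefix):
            rest = digits[len(prefix):]
            digits = rest if rest.startswith("0") else "0" + rest
            break
    if len(digits) != 11 or not digits.startswith("0"):
        raise ValueError(f"Not an 11-digit UK number: {value!r}")
    if digits.startswith("02"):
        return f"{digits[:3]} {digits[3:7]} {digits[7:]}"  # 3 & 8 grouping
    if digits.startswith("011") or (digits.startswith("01") and digits[3] == "1"):
        return f"{digits[:4]} {digits[4:7]} {digits[7:]}"  # 4 & 7 grouping
    if digits.startswith("01"):
        return f"{digits[:5]} {digits[5:]}"  # 5 & 6 grouping
    raise ValueError(f"Only 01 and 02 numbers are handled here: {value!r}")
```

Running `normalise_county` over the County field and `format_uk_phone` over telephone numbers gives every entry the same shape before it reaches any analysis; values that raise an error are the ones worth raising with whoever entered them.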

Application Issues

‘It won’t let me put the information in how you want’

A less common reason (although the telephone number example above applies here too) is a system that seems not to allow your colleague to enter information in the required format.

Time / Workload

‘I have to focus on more important stuff’

This is, of course, very common. For anyone in a fast-paced role, or one that demands high concentration, paperwork is pushed back and gets only the minimum effort needed to keep going.

You can see a common underlying issue here. Uploading data is seen as a minor detail, too often an inconvenience that distracts staff from their work. What gets lost in day-to-day activity is the importance of good data: it supports better decision making, provides evidence of staff members’ work if it is ever queried, reduces subsequent errors by them or their colleagues, lowers costs and improves customer service. And if your organisation wants to become more data-centric, this becomes central to your efforts.

So how do you resolve the problems mentioned above? Let’s look at some practical steps which will help.

Spot Bad Data

Identify the nature of any poor data: what it is, where it comes from, what the issue appears to be, and how important it is to fix. This should give you a list of issues and sources which you can deal with in turn.
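Some of this first pass can be automated. The sketch below, assuming Python and records held as dictionaries with a `source` key, counts issues per source so the most frequent problems surface first. The field names, the agreed county list and the checks themselves are hypothetical placeholders for your own rules.

```python
from collections import Counter

# Hypothetical rules: replace with the fields and formats your organisation agrees on.
REQUIRED_FIELDS = ("name", "county", "phone")
AGREED_COUNTIES = {"Shropshire"}


def find_issues(record: dict) -> list:
    """Return a short label for each problem found in a single record."""
    issues = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            issues.append(f"missing {field}")
    county = record.get("county")
    if county and county not in AGREED_COUNTIES:
        issues.append("non-standard county")
    # Crude check: UK numbers written in national form have 11 digits,
    # so numbers entered as +44 ... are flagged here too.
    digits = "".join(ch for ch in str(record.get("phone") or "") if ch.isdigit())
    if digits and len(digits) != 11:
        issues.append("phone not 11 digits")
    return issues


def issue_summary(records: list) -> list:
    """Count issues per (source, issue) pair, most frequent first."""
    counts = Counter()
    for record in records:
        source = record.get("source", "unknown")
        for issue in find_issues(record):
            counts[(source, issue)] += 1
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        {"source": "web form", "name": "A. Smith", "county": "Shrops", "phone": "01743 123456"},
        {"source": "call centre", "name": "", "county": "Shropshire", "phone": "0174312345"},
    ]
    for (source, issue), n in issue_summary(sample):
        print(f"{source}: {issue} x{n}")
```

Working down the output from the top gives you the ordered list of issues and sources to tackle, and the per-source breakdown shows where a conversation with colleagues is most likely to pay off.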

Find Answers First

Now that you have spotted the possible reasons behind below-par data, think about how to resolve them. Anyone can spot a problem; finding the answer may not be so easy. An ideal route is to spot-check some issues directly with colleagues. Ask them about the issues you have found and get their view on why the errors occur. This will help you suggest solutions to individual problems that they will support. Taking the time to put together a plan to reduce or remove errors gives your colleagues invaluable guidance and makes the task easier for them.

Show What’s Going On

Third, make all this visible to your senior staff. They rely most on the data being of the best possible quality, so they will see the benefit to themselves and will naturally be inclined to support your efforts; their backing will in turn improve adoption throughout their teams. You therefore need to show them the barriers that currently stop them getting exactly that.

Keep Up The Momentum

Once you have senior staff support, you can involve colleagues in implementing simple steps to improve data quality. The visibility you have gained will boost efforts to resolve the problems. Where you meet resistance, reporting progress back to senior staff can help remove barriers. On that point, it’s important to share any positive results with everyone involved. When your colleagues see they are helping the organisation function better, they are much more inclined to take an interest and support your aims.

Where issues relate to a process or application, working with those involved can bring excellent results. Discussing practical issues with a software developer and looking for answers together helps them evolve and improve their product, delivering better solutions for you. In practice, any updates may have to fit within existing development roadmaps, but never underestimate the goodwill your active involvement can generate.
