3.5 Data Analysis Tools
The fifth stage of data analysis is to select the most appropriate tools to analyse the collected data. The method(s) selected will depend on the type of project and the established objectives.
Data Tables
Databases are often split into tables so that they are easier to update, view and manipulate. For example, a supermarket database may include a table of product information, another table of suppliers and another for actual stock levels. Separating the data into tables allows for more straightforward editing and also allows basic patterns to be seen. For example, looking at a table of stock levels in a supermarket can quickly show which products need to be ordered in because they are close to selling out.
Data tables allow for the simplest form of pattern discovery and are an excellent method of speedy, short-term data analysis. However, they only present data as it currently stands and cannot show change or trends over time. For example, a product may have a high stock level because it is popular and has just been ordered in, rather than because no one is buying it.
A simplified data table for a supermarket.
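As a rough illustration of reading a pattern straight from a data table, the sketch below stores a stock table as a list of rows and picks out the products that are close to selling out. The product names, stock figures and the `reorder_level` column are all invented for this example.

```python
# Hypothetical stock table: every name and figure here is invented for illustration.
stock_table = [
    {"product": "Steak pie", "in_stock": 4, "reorder_level": 10},
    {"product": "Baked beans", "in_stock": 120, "reorder_level": 40},
    {"product": "Orange juice", "in_stock": 35, "reorder_level": 30},
    {"product": "Pumpkin", "in_stock": 12, "reorder_level": 15},
]


def products_to_reorder(table):
    """Return the names of products whose stock is at or below the reorder level."""
    return [row["product"] for row in table if row["in_stock"] <= row["reorder_level"]]


low_stock = products_to_reorder(stock_table)
print(low_stock)  # ['Steak pie', 'Pumpkin']
```

Like the table itself, this only describes the stock at one moment. It says nothing about how quickly each product sells.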
Visualisation of Data
Visualising data (for example, by producing a chart or graph of collected data) makes it easier for an audience to see trends and patterns, and quicker for them to interpret the data. A bar chart of the supermarket stock table from the previous tool is one example.
In this example, the chart makes it easier to see that steak pies are low in stock and should be re-ordered soon.
A bar chart of the supermarket data table.
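A bar chart does not need specialist software to demonstrate the idea. The sketch below draws a simple text-based bar chart using only the Python standard library. The stock figures and the `units_per_block` scale are assumptions made up for this example.

```python
# Hypothetical stock levels, invented for illustration.
stock_levels = {"Steak pie": 4, "Baked beans": 120, "Orange juice": 35, "Pumpkin": 12}


def text_bar_chart(levels, units_per_block=5):
    """Build one line per product, using one '#' per `units_per_block` items in stock."""
    width = max(len(name) for name in levels)
    lines = []
    for name, count in levels.items():
        # Ceiling division, so any product with stock shows at least one block.
        blocks = -(-count // units_per_block)
        lines.append(f"{name.ljust(width)} | {'#' * blocks} {count}")
    return lines


for line in text_bar_chart(stock_levels):
    print(line)
```

The short bar for steak pies stands out immediately, which is the same advantage a graphical bar chart gives over reading the raw table.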
Trend & Pattern Identification
This tool links closely to the visualisation of data, as it allows trends and patterns to be viewed in a visual format, such as a line graph of last year’s stock sales.
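One simple way to bring out a trend before plotting it is a moving average, which smooths out month-to-month noise. The sketch below is a minimal example using invented monthly sales figures. The three-month window is an arbitrary choice for illustration.

```python
# Hypothetical monthly sales for one year, invented for illustration.
monthly_sales = [120, 135, 128, 150, 162, 158, 171, 180, 176, 190, 205, 214]


def moving_average(values, window=3):
    """Average each run of `window` consecutive values to smooth short-term noise."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


smoothed = moving_average(monthly_sales)

# Compare the first and last smoothed values to describe the overall direction.
trend = "upward" if smoothed[-1] > smoothed[0] else "downward or flat"
print(trend)
```

Plotting the `smoothed` values as a line graph would show the trend more clearly than plotting the raw monthly figures.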
Statistical analysis enables data analysts to examine numerical data and, if done correctly, can highlight relationships between different data elements, such as the price of a product and how many have been sold. Regression analysis is one technique for measuring links between variables like these.
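To make the price and sales example concrete, the sketch below fits a straight line through a handful of points using ordinary least squares, the simplest form of regression analysis. The prices and sales figures are invented and deliberately tidy, and the function name `linear_regression` is just a label chosen for this example.

```python
# Hypothetical data: price of a product (in pounds) and units sold at that price.
prices = [1.00, 1.50, 2.00, 2.50, 3.00]
units_sold = [200, 170, 140, 110, 80]


def linear_regression(xs, ys):
    """Fit y = slope * x + intercept using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = linear_regression(prices, units_sold)
print(f"Each extra pound on the price changes sales by about {slope:.0f} units")
```

A negative slope suggests that sales fall as the price rises. Real data would be far less neat, and a link between two variables does not prove that one causes the other.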
Data Cleaning
Data cleaning ensures that any stored data is up to date and accurate, as required by the Data Protection Act (1998), which has since been replaced in the UK by the Data Protection Act 2018. Forms of data cleaning include removing customers who have not made a purchase in a certain amount of time (e.g. two years) and periodically checking that user addresses are up to date.
Data cleaning reduces the size of a data table by removing redundant, incorrect or unnecessary data. This makes the table easier to work with and improves the quality of the data that remains.
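The rule about customers who have not made a purchase within two years could be sketched as follows. The customer records, the field names and the 730-day cut-off are assumptions chosen for illustration, and the date is passed in as `today` so the result does not depend on when the code is run.

```python
from datetime import date, timedelta

# Hypothetical customer records, invented for illustration.
customers = [
    {"name": "A. Patel", "last_purchase": date(2024, 3, 14)},
    {"name": "B. Jones", "last_purchase": date(2021, 7, 2)},
    {"name": "C. Smith", "last_purchase": date(2023, 11, 30)},
]


def remove_inactive(records, today, max_inactive_days=730):
    """Keep only the customers who have purchased within the last `max_inactive_days` days."""
    cutoff = today - timedelta(days=max_inactive_days)
    return [r for r in records if r["last_purchase"] >= cutoff]


active = remove_inactive(customers, today=date(2024, 6, 1))
print([r["name"] for r in active])
```

The cleaned table is smaller than the original because the customer who last purchased in 2021 has been dropped.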
GIS / Location Mapping
Geographic Information Systems (GIS) can be used to add geographic data to any analysis. For example, an organisation can track the geographical location of items or staff, such as following the movement of shipping containers around the world to monitor production flow. Courier services can use the same approach to see delays and delivery times in real time.
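A full GIS is a large piece of software, but the underlying idea of analysing location data can be sketched with the haversine formula, which estimates the distance between two latitude and longitude points on a sphere. The container positions below are hypothetical, and treating the Earth as a perfect sphere with a radius of 6371 km is a simplifying assumption.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assuming a perfect sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in kilometres between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical (latitude, longitude) positions reported by one shipping container.
container_positions = [(51.9, 1.3), (36.1, -5.4), (31.3, 32.3)]

# Add up the distance of each leg between consecutive positions.
total_km = sum(
    haversine_km(*start, *end)
    for start, end in zip(container_positions, container_positions[1:])
)
print(f"Distance travelled so far: {total_km:.0f} km")
```

Combining distances like these with timestamps would let an organisation estimate speeds and spot delays along a route.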
Data Analysis Tools:
The supermarket from section 3.4 have reached stage 5 of data analysis and now need to select an appropriate data analysis tool for investigating pumpkin sales. Briefly describe each tool above and explain how the supermarket could use it.