How to Ensure Proper Data Cleaning in Excel?

May 11, 2020

In Research, the emphasis is on the report writing because a good report comprehensively explains all the stages with the relevant outcome and valuable way forwards.

Reports are the product of data gathered either from secondary or primary sources and it is, therefore, very important for the data to be authentic, reliable, and up to date. In order to ensure the reliability of data, it has to be processed for omitting any error or mistake. And before we can work with our data, we need to make sure it’s valid, accurate, and reliable.

In the age of Big Data, companies may spend just as much or more on maintaining the health and cleaning their data as they spend on collecting it in the first place. Consider the issues that can stem from missing or wrong values, duplicates, and typos. The validity, accuracy, and reliability of your calculations depend on your ability to keep your data up-to-date, this is also evident from Ace Research’s projects.

To prepare data for later analysis, it is important to have a clean data table.  Depending on the origin of the data, you may need to do some of the following steps to ensure that the data are as complete and consistent as possible.

  1. Assign unique code to your fields

Unique codes are very useful while sorting and cleaning data because at any stage if trouble arises you can sort out the data from your database with the help of unique codes already assigned to the data set.

  1. Maintain separate sheets if you are working on a huge data set

Often the data is very large and you cannot work on the whole data set at the same time, so it is preferred to maintain separate files for each change you make. This helps when you refer back in case you missed anything at any step.

  1. Get rid of extra spaces

Extra spaces are painfully difficult to spot. While you may somehow spot the extra spaces between words or numbers, trailing spaces are not even visible. Here is a neat way to get rid of these extra spaces

Excel TRIM function takes the cell reference (or text) as the input. It removes leading and trailing spaces as well as the additional spaces between words (except single spaces).

  1. Select and treat all blank cells

Blank cells can create havoc if not treated beforehand. We often face issues with blank cells in a data set that is used to create reports.

You may want to fill all blank cells with ‘0’ or ‘Not Available’, or may simply want to highlight it. If there is a huge data set, doing this manually could take hours. Thankfully, there is a way you can select all the blank cells at once.

  1. Select the entire data set
  2. Press F5 (this opens the Go to dialogue box)
  3. Click on the Special button (at the bottom left).
  4. This opens the Go To Special dialogue box
  5. Select Blank and Click OK

This selects all the blank cells in your data set. If you want to enter 0 or Not Available in all these cells, just type it and press Control + Enter (remember if you press only enter, the value is inserted only in the active cell).

  1. Remove duplicates

There can be 2 things you can do with duplicate data – Highlight It or Delete It.

Highlight Duplicate Data:

Select the data and Go to Home – Conditional Formatting – Highlight Cells Rules – Duplicate Values.

Specify the formatting and all the duplicate values get highlighted.

Delete Duplicates in Data: 

  • Select the data and Go to Data – Remove Duplicates.
  • If your data has headers, ensure that the checkbox at the top right is checked.
  • Select the Column(s) from which you want to remove duplicates and click OK.

This removes duplicate values from the list.

If you want the original list intact, copy-paste the data at some other location and then do this.

  1. Highlight errors

There are 2 ways you can highlight Errors in Data in Excel:

Using Conditional Formatting

  • Select the entire data set
  • Go to Home –Conditional Formatting – New Rule
  • In New Formatting Rule Dialogue Box select ‘Format Only Cells that Contain’
  • In the Rule Description, select Errors from the drop-down
  • Set the format and click OK. This highlights any error value in the selected dataset

Using Go To Special

  • Select the entire data set
  • Press F5 (this opens the Go To Dialogue box)
  • Click on Special Button at the bottom left
  • Select Formulas and uncheck all options except Errors

This selects all the cells that have an error in it. Now you can manually highlight these, delete it, or type anything into it.

  1. Change text to lower/upper/proper case

When you import data from text files, often the names or titles are not consistent. Sometimes all the text could be in lower/upper case or it could be a mix of both. You can easily make it all consistent by using these three functions:

  • LOWER () – Converts all text into Lower Case.
  • UPPER () – Converts all text into Upper Case.
  • PROPER () – Converts all Text into Proper Case.
  1. Parse data using text to column

When you get data from a database or import it from a text file, it may happen that all the text is cramped in one cell. You can parse this text into multiple cells by using Text to Column functionality in Excel.

  • Select the data/text you want to parse
  • Go To Data –Text to Column (This opens the Text to Columns Wizard)

Step 1: Select the data type (select Delimited if your data is not equally spaced, and is separated by characters such as comma, a hyphen, dot.). Click Next

Step 2: Select Delimiter (the character that separates your data). You can select pre-defined delimiter or anything else using the other option

Step 3: Select the data format. Also, select the destination cell. If the destination cell is not selected, the current cell is overwritten.

  1. Spell check

Nothing lowers the credibility of your work than a spelling mistake.

Use the keyboard shortcut F7 to run a spell check for your data set in Excel.

  1. Delete all formatting

In my job, I used multiple databases to get the data in excel. Every database had its own data formatting. When you have all the data in place, here is how you can delete all the formatting at one go:

  • Select the data set
  • Go to Home – Clear –Clear Formats

Similarly, you can also clear only the comments, hyperlinks, or content.

  1. Use find and replace to clean data in excel

Find and replace is indispensable when it comes to data cleansing. For example, you can select and remove all zeros, change references in formulas, find and change formatting, and so on.

Steps In Report Writing – Market Research

April 30, 2020

Report Writing involves many steps but in this blog areas which are very important in every report will be explained. Report Writing normally contains headings such as Introduction, details about secondary Research and/or Primary Research, Analysis based on any of the two research types i.e. qualitative and quantitative followed by recommendation and Conclusion.

The introduction of the report provides an overview of the purpose of conducting any research and also its objectives.

This is followed by Methodology i.e. secondary research, primary research, or a mix of both. Simultaneously, research can be quantitative or qualitative. Secondary research can be quantified using simple statistical operations. Whereas Primary research is based upon qualitative attributes seeking opinions, views, and ideas from respondents. Each has its own way of interpretation.

In this blog, you can find details on how to prepare a very comprehensive report with a step by step guide on Quantitative and Qualitative Research Report Writing as followed by Ace Research.

Quantitative Research

Secondary Research      

  • Make Headings/Sections

If the data has to come from secondary sources make sure you make proper heading and sections for each category of secondary data. Secondary research contains data that is obtained from online available information such as, previous reports, scientific journals, online articles or organizational databases.

  • Search Relevant and Up to date Data

Secondary data should not be more than two years old, except in those cases where data is not available older data not more than 5 years can be cited/ used for reference. Too old data may net help in generalizability and cannot be used to infer outcomes.

  • Proper Referencing is important

Always maintain the record of sources from where you acquired the secondary information. Use authentic sources and double check the validity of data across different data sources if available. It is recommended to use APA format for referencing.

Primary Data

  • Compile Data in Excel

In quantitative research you compile all data in excel, either it is online or paper based. Remember when designing a questionnaire you will have to maintain logic against each question and answer. The proper scale should be adopted for quantification of responses, otherwise if systematically not designed, after gathering all the data you might struggle as to how to arrange the data for extracting meaningful information or cross analysis.

Data Cleaning

Data cleaning comes after when you have exported and compiled data in Excel. When you get the data it might have human errors although in the training phase emphasis is placed to ensure minimum errors in data.

To clear the errors you must have to check all the responses against any spelling/grammatical mistakes, numerical errors, wrong responses, blank responses etc.

  • Adjust Blank Responses

Blank’s data/incomplete data have to be assessed critically. If you have lots of blanks in any question you can either discard them if the information provided is not very meaningful. Similarly, in case of a few blanks questions you can depict the trend of responses by responding and it can be filled accordingly. Remember, this holds true only if trends can help you in filling those blanks otherwise state N/A.

  • Make Individual Excel sheets

For each question in the questionnaire you can create separate sheets as it helps to make analysis easier. This will be done after you have completely cleaned the data.

  • Maintain Separate Files for Complex Questions

In case some questions are complex or have multiple layers of response you may analyze them separately. This is helpful because if you made too many sheets in a single file, analysis can become very difficult.

  • Use a pivot table in Excel

A pivot table is best for quick analysis. If you have questions in which you have to analyze the trend between two variables uses a simple pivot table. You can use a variety of charts to analyze your question. For this you must have Excel 2013 or latest.

  • Make Graphs

Graphs are really helpful in analyzing the situation and if you have made graphs it is suggested to use 3d bars with percentages so you can reflect the assessment in a precise and clear manner.

  • Highlight Information in Graphs

For clarity, you can highlight important information in the Graphs. For instance, the highest bar in the histogram can be highlighted.

  • Copy Graphs in Ms. Word

Once you have run all the operations in excel, now it’s time to embed the workings in the report. For this, you will copy all the graphs into word files with proper labels and headings.

  • Comments on Each Graph

For each graph, you can put comments to explain your analysis.

Qualitative Research

Secondary Research      

Follow the same steps as described in Quantitative analysis.

Primary Research

  • Transcripts (IDIs / FGDs)

Make headings and sections for transcript, this will serve as an outline for transcribing recorded interviews.

Once the interview is conducted you must draft a document in which sections should be maintained for each of the discussion areas and after this, each response from respondents i.e. either a single respondent in case of IDI or group of respondents in FGD should be carefully written adhering to the sequence of discussion.

  • Highlighting Main points

Make a table that consists of very brief and relevant response against each question by respondent. Once you have transcribed all the recordings now you can assess them for the individual’s response. For IDI’s it is easier as there is only one respondent but for FGD as there are many respondents you can make a grid and record each respondents response against questions asked. Remember in this stage you are highlighting important points only.

  • Extract findings and recommendations

While extracting important points from a transcripts highlight relevant findings and recommendations, maintain heading for both at the end of report

As you have prepared a transcript for the whole session and also highlighted each individual’s response in case of FGD now you can extract all relevant findings and recommendations from all discussions for Report Writing.

Final Report Glimpse

  1. Make the content outline
  2. Enter data (Follow the Aforementioned sequence)
  3. After Analysis is performed and Final Draft Prepared, review thoroughly for spelling mistakes, grammatical mistakes or/and any contradictory statements.
  4. Revisit Deliverables of the report and match with the content of the report before submission to the client.