How to Ensure Proper Data Cleaning in Excel?

May 11, 2020

In research, the emphasis falls on report writing, because a good report comprehensively explains every stage of the work along with its outcomes and a useful way forward.

Reports are built on data gathered from primary or secondary sources, so it is very important for that data to be authentic, reliable, and up to date. Before we can work with our data, it has to be processed to remove errors, so that we can be sure it is valid, accurate, and reliable.

In the age of Big Data, companies may spend just as much or more on maintaining and cleaning their data as they spend on collecting it in the first place. Consider the issues that can stem from missing or wrong values, duplicates, and typos. The validity, accuracy, and reliability of your calculations depend on your ability to keep your data clean and up to date; this is also evident from Ace Research’s projects.

To prepare data for later analysis, it is important to have a clean data table. Depending on the origin of the data, you may need some of the following steps to make the data as complete and consistent as possible.

  1. Assign unique codes to your fields

Unique codes are very useful when sorting and cleaning data: if trouble arises at any stage, you can locate the affected records in your database using the codes already assigned to the data set.
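If you keep a copy of the data outside Excel, the same idea can be sketched in Python. This is only an illustration: the function name assign_codes, the "R" prefix, and the four-digit width are assumptions made for the example, not part of any Excel feature.

```python
def assign_codes(rows, prefix="R", width=4):
    """Return a copy of rows with a unique code added as the first column."""
    # Codes follow the original row order: R0001, R0002, ...
    return [
        [f"{prefix}{number:0{width}d}", *row]
        for number, row in enumerate(rows, start=1)
    ]


records = [["Ali", "Lahore"], ["Sara", "Karachi"]]
coded = assign_codes(records)
```

Because the code travels with each row, you can sort or filter the data freely and still restore the original order by sorting on the code column.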

  2. Maintain separate sheets if you are working on a huge data set

Often the data set is too large to work on all at once, so it is preferable to keep a separate sheet or file for each change you make. This lets you refer back if you missed anything at any step.

  3. Get rid of extra spaces

Extra spaces are painfully difficult to spot. While you may notice extra spaces between words or numbers, trailing spaces are not visible at all. Here is a neat way to get rid of them.

Excel’s TRIM function takes a cell reference (or text) as input, for example =TRIM(A2). It removes leading and trailing spaces and reduces any run of spaces between words to a single space.
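For readers who also clean exported data in a script, here is a minimal Python sketch of the same behaviour. The function name excel_trim is an assumption for this example. Like TRIM, it targets only the ordinary space character, so tabs and non-breaking spaces are left alone.

```python
def excel_trim(text: str) -> str:
    """Approximate Excel's TRIM for a single text value."""
    # Splitting on " " (not on all whitespace) leaves tabs and other
    # characters untouched; empty pieces come from repeated spaces.
    return " ".join(part for part in text.split(" ") if part)


cleaned = excel_trim("   Ace   Research  ")
```

Here cleaned holds "Ace Research", with one space between the two words.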

  4. Select and treat all blank cells

Blank cells can wreak havoc if they are not treated beforehand. We often face issues with blank cells in data sets that are used to create reports.

You may want to fill all blank cells with ‘0’ or ‘Not Available’, or simply highlight them. In a huge data set, doing this manually could take hours. Thankfully, there is a way to select all the blank cells at once.

  1. Select the entire data set
  2. Press F5 (this opens the Go To dialogue box)
  3. Click on the Special button (at the bottom left); this opens the Go To Special dialogue box
  4. Select Blanks and click OK

This selects all the blank cells in your data set. If you want to enter 0 or Not Available in all these cells, just type it and press Ctrl + Enter (if you press only Enter, the value is entered in the active cell alone).
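The same fill can be sketched in Python for data held as a list of rows. The function name fill_blanks is hypothetical, and treating whitespace-only cells as blank is an assumption of this sketch; Excel's Go To Special selects only cells that are truly empty.

```python
def fill_blanks(rows, placeholder="Not Available"):
    """Return a copy of rows with empty cells replaced by placeholder."""
    return [
        [
            # None, "" and whitespace-only text count as blank here;
            # a numeric 0 is a real value and is kept.
            placeholder if cell is None or str(cell).strip() == "" else cell
            for cell in row
        ]
        for row in rows
    ]


data = [["Ali", 34, ""], ["Sara", None, "Lahore"]]
filled = fill_blanks(data)
```

Passing placeholder=0 fills the gaps with zeros instead, which is usually the better choice for numeric columns that will be summed.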

  5. Remove duplicates

There are two things you can do with duplicate data: highlight it or delete it.

Highlight Duplicate Data:

Select the data and go to Home – Conditional Formatting – Highlight Cells Rules – Duplicate Values.

Specify the formatting and all the duplicate values get highlighted.

Delete Duplicates in Data:

  • Select the data and go to Data – Remove Duplicates.
  • If your data has headers, ensure that the ‘My data has headers’ checkbox at the top right is checked.
  • Select the column(s) from which you want to remove duplicates and click OK.

This removes duplicate values from the list.

If you want to keep the original list intact, copy the data to another location first and remove the duplicates from the copy.
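As a rough Python equivalent, the sketch below keeps the first row for each combination of values in the chosen columns and returns a new list, so the original stays intact. The name remove_duplicates and the zero-based column indexes are assumptions of this example. It also compares values exactly as stored, which may not match Excel's own comparison in every case (for instance, with differences in letter case).

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of key column values."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[index] for index in key_columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


rows = [["A1", "x"], ["A2", "y"], ["A1", "z"]]
by_first_column = remove_duplicates(rows, [0])
by_both_columns = remove_duplicates(rows, [0, 1])
```

Checking only the first column drops the third row, while checking both columns keeps all three, which mirrors the effect of ticking one or several columns in the Remove Duplicates dialogue box.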

  6. Highlight errors

There are two ways to highlight errors in data in Excel:

Using Conditional Formatting

  • Select the entire data set
  • Go to Home – Conditional Formatting – New Rule
  • In the New Formatting Rule dialogue box, select ‘Format only cells that contain’
  • In the rule description, select Errors from the drop-down
  • Set the format and click OK. This highlights any error value in the selected data set

Using Go To Special

  • Select the entire data set
  • Press F5 (this opens the Go To dialogue box)
  • Click on the Special button at the bottom left
  • Select Formulas and uncheck all options except Errors

This selects all the cells that contain an error. You can now highlight them manually, delete them, or type a replacement value into them.
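If the data has been exported to text (for example as CSV), error values typically arrive as literal strings such as #N/A. Under that assumption, a short Python sketch can list where they are. The function name find_error_cells is hypothetical, and the set below covers the common Excel error values only.

```python
EXCEL_ERRORS = {"#DIV/0!", "#N/A", "#NAME?", "#NULL!", "#NUM!", "#REF!", "#VALUE!"}


def find_error_cells(rows):
    """Return (row_index, column_index) pairs for cells holding an error string."""
    return [
        (row_index, column_index)
        for row_index, row in enumerate(rows)
        for column_index, cell in enumerate(row)
        if isinstance(cell, str) and cell.strip() in EXCEL_ERRORS
    ]


sheet = [["10", "#DIV/0!"], ["#N/A", "ok"]]
error_positions = find_error_cells(sheet)
```

The indexes are zero-based, so (0, 1) refers to the second cell of the first row.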

  7. Change text to lower/upper/proper case

When you import data from text files, often the names or titles are not consistent. Sometimes all the text could be in lower/upper case or it could be a mix of both. You can easily make it all consistent by using these three functions:

  • LOWER() – Converts all text into lower case.
  • UPPER() – Converts all text into upper case.
  • PROPER() – Converts all text into proper case (the first letter of each word capitalized).
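Python's built-in string methods offer rough counterparts to these three functions. Treat title() as an approximation of PROPER rather than an exact match; the sample text is made up for the example.

```python
name = "aCE reSEARCH pvt ltd"

lower_name = name.lower()    # like LOWER()
upper_name = name.upper()    # like UPPER()
proper_name = name.title()   # roughly like PROPER()
```

After this runs, proper_name holds "Ace Research Pvt Ltd".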

  8. Parse data using Text to Columns

When you get data from a database or import it from a text file, all the text may end up crammed into one cell. You can parse this text into multiple cells by using the Text to Columns feature in Excel.

  • Select the data/text you want to parse
  • Go to Data – Text to Columns (this opens the Text to Columns Wizard)

Step 1: Select the data type. Choose Delimited if your data is not equally spaced and is separated by characters such as a comma, hyphen, or dot. Click Next.

Step 2: Select the delimiter (the character that separates your data). You can select a pre-defined delimiter or specify any other character using the Other option. Click Next.

Step 3: Select the data format. Also, select the destination cell. If you do not select a destination cell, the current cell is overwritten.
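The delimited case can be sketched in Python with the standard csv module. The function name text_to_columns is an assumption for this example, and stripping the spaces around each piece is a convenience added here, not something the wizard does by default.

```python
import csv


def text_to_columns(lines, delimiter=","):
    """Split each line of text into a list of fields on the given delimiter."""
    # csv.reader also copes with quoted fields that contain the delimiter.
    reader = csv.reader(lines, delimiter=delimiter)
    return [[field.strip() for field in row] for row in reader]


people = text_to_columns(["Ali, Khan, Lahore", "Sara,Ahmed,Karachi"])
date_parts = text_to_columns(["2020-05-11"], delimiter="-")
```

Each inner list corresponds to one row after the split, with one item per new column.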

  9. Spell check

Nothing lowers the credibility of your work more than a spelling mistake.

Use the keyboard shortcut F7 to run a spell check for your data set in Excel.

  10. Delete all formatting

In my job, I used multiple databases to get data into Excel, and every database had its own formatting. When you have all the data in place, here is how you can delete all the formatting in one go:

  • Select the data set
  • Go to Home – Clear – Clear Formats

Similarly, you can also clear only the comments, hyperlinks, or contents.

  11. Use find and replace to clean data in Excel

Find and replace is indispensable when it comes to data cleansing. For example, you can select and remove all zeros, change references in formulas, find and change formatting, and so on.
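A small Python sketch shows why the "match entire cell contents" option matters when removing zeros. The function name replace_in_cells and its parameters are assumptions for this example. It converts every cell to text and matches case-sensitively, which is a simplification of Excel's Find and Replace.

```python
def replace_in_cells(rows, find, replace_with, match_entire_cell=False):
    """Return a copy of rows (as text) with find replaced by replace_with."""
    result = []
    for row in rows:
        new_row = []
        for cell in row:
            text = str(cell)
            if match_entire_cell:
                # Replace only when the whole cell equals the search text.
                new_row.append(replace_with if text == find else text)
            else:
                # Replace every occurrence inside the cell.
                new_row.append(text.replace(find, replace_with))
        result.append(new_row)
    return result


table = [["0", "100"], ["ltd", "0"]]
whole_cells = replace_in_cells(table, "0", "", match_entire_cell=True)
anywhere = replace_in_cells(table, "0", "")
```

With whole-cell matching, "100" is left untouched; without it, "100" becomes "1", which is rarely what you want.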

Digital Research: “Lifeline for the Survival of Industries”

April 22, 2020

Effect of coronavirus on Businesses

The novel coronavirus has disrupted nearly all operations across the globe, and industries in general, and businesses in particular, have suffered. The virus, which affects the respiratory system, has to date killed many people worldwide, with the highest number of deaths recorded in Italy, at around 10,000.

Travel has been brought to a halt, and virtually all human-to-human interaction has been minimized because the virus is contagious. Social distancing is the only approach that has proved effective so far in mitigating the far-reaching effects of the coronavirus.

Impact on businesses

According to Moody’s assessment, the industries that will bear the brunt of this pandemic are textiles, automotive, consumer durables, gaming, lodging/leisure and tourism, airlines, retail (non-food), and shipping. Those moderately affected include beverages, chemicals, manufacturing, media, metals and mining, oil and gas, property developers, agriculture, services, steel, and technology hardware. Industries that will bear the lowest impact are construction, defense, equipment and transportation, rental, packaging, pharmaceuticals, food, telecom, and waste management.

Industries that may see a positive impact are internet service companies, online retail, and gold mining.

Why is conventional research not an option now?

Conventional research relies on field surveys and human interaction, which cannot take place under the present circumstances. Online methods can be adopted instead to conduct research effectively. Recent statistics indicate that internet usage has increased significantly and that people are spending more time online.

Research has been shifting from conventional paper-and-pencil methods to digital ones in recent times, and with the ever-increasing use of online platforms, companies are using digital means to assess the needs and preferences of their customers. It has been observed that FMCG and telecom companies are less affected by the coronavirus, so this is a good time for such industries to analyze the shift in their customers’ demand and be prepared to meet it with maximum productivity.

How to conduct Digital Research

Online Market Research is a research method in which the data collection process is carried out over the Internet.

Online Market Research can be either Qualitative or Quantitative.  Qualitative Online Tools include Video Ethnography and Market Research Online Communities (MROCs).  Quantitative Online Methods include mobile and app surveys.

This research can evaluate the performance of a product or service and may allow companies to glean insight into consumer purchasing behavior. With the rising use of the Internet, digital research has become a popular tool among market research firms.

Digital research can provide additional information about a buyer, such as their prior purchasing history. Digital research projects can be carried out by a company itself or by a hired research firm, and there are several effective ways to carry them out. Quantitative research can be conducted via online questionnaires and web-based experiments. Qualitative research can be conducted via online in-depth interviews, online focus groups, and participant observation, in which a researcher acts as part of a community to observe behaviors.

Online questionnaires and online polls are some of the most popular digital research tools. Online questionnaires need to be carefully designed in terms of format and length. Some of the key digital research models Ace Research offers include:

1. Brand Health tracking

Brand Analysis: Brand research has similar profiling features (“Who uses this brand?”) and also aims at identifying the reasons for brand loyalty or fickleness.

Scanner Research: Scanner research uses checkout counter scans of transactions to develop patterns for all manner of end uses, including stocking, of course. From a marketing point of view, scans can also help users track the success of coupons and establish linkages between products.

2. Consumer research

Audience Research: Audience research is aimed at discovering who is listening, watching, or reading radio, TV, and print media respectively. Such studies in part profile the audience and in part determine the popularity of the medium or portions of it.

Product Research: Product tests, of course, directly relate to use of the product. Good examples are taste tests used to pick the most popular flavors, and consumer tests of vehicle or device prototypes to uncover problematic features or designs.

Psychological Profiling: Psychological profiling aims at constructing profiles of customers by temperament, lifestyle, income, and other factors and tying such types to consumption patterns and media patronage.

Database Research: Also known as database “mining,” this form of research attempts to exploit all kinds of data on hand on customers, which frequently have other revealing aspects. Purchase records, for example, can reveal the buying habits of different income groups, with accounts classified by income through census tract matching. Data on average income by census tract can be obtained from the Bureau of the Census.

3. Pre/post campaign evaluation

Post-sale or Consumer Satisfaction Research: Post-consumer surveys are familiar to many consumers from telephone calls that follow having a car serviced or calling help-lines for computer- or Internet-related problems. In part such surveys are intended to determine if the customer was satisfied. In part this additional attention is intended also to build good will and word-of-mouth advertising for the service provider.

4. Monitoring and evaluation

M&E can be conducted using a wide array of tools, methods and approaches. These include, for example: performance monitoring indicators; the logical framework; theory-based evaluation; formal surveys such as service delivery surveys, citizen report cards, living standards measurement surveys (LSMS) and core welfare indicators questionnaires (CWIQ); rapid appraisal methods such as key informant interviews, focus group discussions and facilitated brainstorming by staff and officials; participatory methods such as participatory M&E; public expenditure tracking surveys; rigorous impact evaluation; and cost-benefit and cost-effectiveness analysis. With the aid of digital research all these facets of M&E can be conducted online.

5. Online Discussions (Focus Group and In-depth)

Another common practice for online surveys is the use of online panels. An online panel is a group of selected individuals who have agreed to participate in digital research projects for a particular company at specific intervals over a period of time. These participants are selected through a screening process according to their demographics, lifestyles, and habits, and are usually rewarded regularly for their efforts by the research company. Online panels may give companies insight into building long-term relationships with their customers. These panels may also allow customers to give direct feedback about products and services without the reluctance that can occur in face-to-face interactions. Online panels may also mitigate bias caused by peer pressure to agree on a certain viewpoint, a phenomenon that may occur in face-to-face panels.

Benefits of Digital Research

Amid the outbreak of this virus, businesses have been widely affected, and we all hope for better days with the least possible damage to people and businesses alike. Even once things settle down, however, businesses will already have borne the brunt of the virus, and revenues are dwindling. In these circumstances it is crucial for companies to monitor their clients and consumers closely, so that once the outbreak subsides, a strategy based on factual research can be devised immediately to recover the losses effectively.

To sum up, digital research offers the following advantages:

  • Cost advantages
  • Speed advantages
  • Data collection in real-time
  • Advanced analytics
  • Efficient global and multi-country survey management

Conducting digital research can be a complex procedure and may require considerable expertise on the part of researchers to obtain accurate data. It may be challenging to recruit participants for several reasons. People may be reluctant to take part because they fear that the privacy and confidentiality of their personal information may be violated. Since the identity of the researcher cannot be verified completely, people may find it difficult to trust such research methods. Researchers often offer participants monetary or non-monetary rewards for their participation.