Advanced Data Cleaning: Power Query vs. Google Sheets’ REGEX

Let us show you how to use Power Query and Google Sheets’ REGEX functions for advanced data cleaning while comparing them.

Advanced Data Cleaning: Power Query vs. Google Sheets' REGEX

Data cleaning is crucial for performing data analysis and calculations and it is the first step in data preparation and formatting. Data cleaning means removing inconsistencies, errors, and unwanted formatting. Both Microsoft Excel’s Power Query and Google Sheets’ REGEX functions are powerful for advanced cleaning. Let us show you how to use Power Query and Google Sheets’ REGEX functions for advanced data cleaning while comparing them.

Excel’s Power Query

Power Query is a built-in Excel feature that provides a user-friendly interface for data transformation and data cleaning. It cleans and formats data without any complex formulas. Power Query imports data from various sources and offers robust data connectivity. It has flexible features to perform complex data-cleaning tasks.

Google Sheets’ REGEX functions

Google Sheets’ REGEX functions use regular expressions to find patterns within text data, making it ideal for parsing, formatting, or validating text entries.

  1. REGEXREPLACE: Replaces text matching a regular expression with specified text.
    • =REGEXREPLACE(text, regular_expression, replacement)
  2. REGEXMATCH: Checks if a text string matches a specified regular expression.
    • =REGEXMATCH(text, regular_expression)
  3. REGEXEXTRACT: Extracts matching substrings from text based on a regular expression.
    • =REGEXEXTRACT(text, regular_expression)

These functions are useful for data cleaning, pattern recognition, and dynamic text manipulation.

Let’s consider a raw dataset containing wrong formatting, unwanted characters, errors, blank spaces, and so on. By using Excel’s Power Query and Google Sheets’ REGEX functions we will show you the advanced data cleaning while comparing each other.

Removing Unwanted Characters

Clean a dataset of phone numbers containing unwanted characters like parentheses, dashes, or spaces.

Power Query

  • Select the data range.
  • Go to the Data tab >> select From Table/Range.
  • Power Query editor will pop up. Select the column. We selected the Phone Number column.
  • Go to the Transform tab >> select Replace Values.
  • Replace Values dialog box will pop up;
    • In the Value to Find box: Enter unwanted characters ((, ), -, ) individually.
    • In the Replace with box: Keep it empty or if you have a value to replace with you can insert that value.
    • Click OK.

Advanced Data Cleaning: Power Query vs. Google Sheets' REGEX

  • Select Close & Load to bring the cleaned data back to Excel.

Advanced Data Cleaning: Power Query vs. Google Sheets' REGEX

Google Sheets’ REGEXREPLACE Function

Let’s clean data along with formatting in Google Sheets by using the REGREPLACE function. Insert the following formula in cell G2.

=ARRAYFORMULA(IF(LEN(REGEXREPLACE(D2:D6, "[^0-9]", ""))=10,
                 "(" & MID(REGEXREPLACE(D2:D6, "[^0-9]", ""), 1, 3) & ") " &
                 MID(REGEXREPLACE(D2:D6, "[^0-9]", ""), 4, 3) & "-" &
                 MID(REGEXREPLACE(D2:D6, "[^0-9]", ""), 7, 4),
                 "Invalid"))

This formula will remove all non-numeric characters. Then, it will check if the cleaned number has exactly 10 digits. If it’s valid, format it as (XXX) XXX-XXXX; otherwise, it returns “Invalid”.

Advanced Data Cleaning: Power Query vs. Google Sheets' REGEX

Google Sheets’ REGREPLACE Function

You can use the following formula to clean unnecessary special characters.

Formula:

=REGEXREPLACE(D2, "[^a-zA-Z0-9]", "")

This [^a-zA-Z0-9] pattern removes any character that isn’t a letter or digit.

If you are familiar with regular expressions, you can use the REGREPLACE function, it removes any characters except letters or digits within a single formula.

Advanced Data Cleaning: Power Query vs. Google Sheets' REGEX

Standardizing Text Case

The dataset may contain mixed text; you can convert all text entries to lowercase or title case.

Power Query

Open Power Query by selecting the data range and navigating to the Data tab >> select From Table/Range.

  • Select the column with text to transform.
  • Go to the Transform tab >> select Capitalize Each Word.

Advanced Data Cleaning: Power Query vs. Google Sheets' REGEX

Google Sheets PROPER Function

In Google Sheets’ there isn’t a way to capitalize each word purely with regex, as Google Sheets’ REGEXREPLACE doesn’t support UPPER and LOWER functions directly within regex patterns or replacements. You can use external functions like PROPER, UPPER, or LOWER.

=PROPER(B2)

This formula ensures each word is capitalized, and any non-first letters are converted to lowercase.

Advanced Data Cleaning: Power Query vs. Google Sheets' REGEX

Splitting Data by Delimiter

You can split any merged texts or names into separate columns by delimiter. Let’s clean the data by using Power Query and Google Sheets’ REGEX functions.

Power Query

Open Power Query by selecting the data range and navigating to the Data tab >> select From Table/Range.

  • Choose the column containing names.
  • Go to Home tab >> from Split Column > select By Delimiter.

Advanced Data Cleaning: Power Query vs. Google Sheets' REGEX

  • In Split Column by Delimiter dialog box;
    • In Select and enter delimiter: choose a Space( ).
    • Click OK.

Advanced Data Cleaning: Power Query vs. Google Sheets' REGEX

  • Rename the resulting columns to “First Name” and “Last Name”.

Advanced Data Cleaning: Power Query vs. Google Sheets' REGEX

Google Sheets REGEXEXTRACT Function

You can use the REGEXEXTRACT function to split data by delimiter. Insert the following formula in your selected cell.

=REGEXEXTRACT(E9, "^([^ ]+) (.+)$")

This formula separates the last name and first name into two cells. It uses additional columns as needed.

Advanced Data Cleaning: Power Query vs. Google Sheets' REGEX

When to Use Power Query vs. Google Sheets REGEX

Power Query is the most useful for basic to advanced data cleaning. It is ideal for more extensive, structured data transformation with a user-friendly interface. It’s efficient when working with complex datasets, structured joins, and merging data.

Google Sheets REGEX functions are perfect for quick text-based operations and specific text extraction within individual cells. REGEX functions are especially helpful if you’re comfortable with regular expression syntax.

Conclusion

Power Query and Google Sheets’ REGEX functions provide complementary tools for data cleaning. Power Query is ideal for Excel users needing batch transformations on large datasets, while Google Sheets REGEX functions offer flexibility for text-based operations in smaller datasets. You can use both effectively for powerful, streamlined data cleaning! By understanding both tools, you can handle most cleaning tasks whether you’re in Excel or Google Sheets!

Get FREE Advanced Excel Exercises with Solutions!

Shamima Sultana
Shamima Sultana

Shamima Sultana, BSc, Computer Science and Engineering, East West University, Bangladesh, has been working with the ExcelDemy project for 2 years. She has written reviewed 1000+ articles for ExcelDemy. She has also led several teams with Excel VBA and Content Development works. Currently, she is working as the Project Manager and oversees the day-to-day work, leads the services team, allocates resources to the right area, etc. Her work and learning interests vary from Microsoft Office Suites, and Excel... Read Full Bio

We will be happy to hear your thoughts

Leave a reply

Advanced Excel Exercises with Solutions PDF

 

 

ExcelDemy
Logo