You are learning Power Query in MS Excel
How to perform data cleansing and normalization techniques in Power Query?
Data cleansing and normalization in Power Query prepare data for analysis by correcting errors, removing inconsistencies, and standardizing formats. The steps below are carried out in the Power Query Editor, where each action is recorded as an M formula step in the query:
1. Removing Duplicates
1. Remove Duplicate Rows:
- Select the column(s) containing potential duplicates.
- Go to `Home` > `Remove Rows` > `Remove Duplicates`.
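- Power Query keeps the first occurrence of each combination of values in the selected columns and removes the later rows. The comparison is case-sensitive, so `Widget` and `widget` count as different values.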
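The same step can be written directly in M. The following is a minimal sketch, assuming a small hypothetical table built inline; the column names are illustrative and not taken from a real workbook.

```powerquery
let
    // Hypothetical sample data; replace with your own source step
    Source = Table.FromRecords({
        [ProductID = 1, ProductName = "Widget"],
        [ProductID = 1, ProductName = "Widget"],
        [ProductID = 2, ProductName = "Gadget"]
    }),
    // Keep one row per distinct ProductID/ProductName combination
    Deduplicated = Table.Distinct(Source, {"ProductID", "ProductName"})
in
    Deduplicated
```

With this sample input the second row is dropped, leaving two rows. Omitting the column list (`Table.Distinct(Source)`) compares entire rows instead.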
2. Handling Errors and Missing Values
1. Replace Errors:
- Use the `Replace Errors` feature to replace error values in a column with a default value.
- Select the column > `Transform` > `Replace Values` drop-down > `Replace Errors`.
2. Fill Down or Up:
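- Fill down or up to copy the nearest non-null value into the null cells below or above it. Only `null` cells are filled; if blanks are stored as empty text, replace them with `null` first.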
- Select the column > `Transform` > `Fill` > `Down` or `Up`.
3. Replace Values:
- Replace specific values or nulls with other values or formulas.
- Select the column > `Transform` > `Replace Values`.
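In M, these three operations might look like the sketch below. The sample table and its column names (`Region`, `Amount`) are hypothetical, and the replacement value `0` is only an illustrative default.

```powerquery
let
    // Hypothetical sample data with a missing region, a non-numeric amount, and a null amount
    Source = Table.FromRecords({
        [Region = "North", Amount = "100"],
        [Region = null,    Amount = "abc"],
        [Region = "South", Amount = null]
    }),
    // Converting to number turns "abc" into a cell-level error
    Typed = Table.TransformColumnTypes(Source, {{"Amount", type number}}),
    // Replace Errors: substitute 0 for any error in Amount
    NoErrors = Table.ReplaceErrorValues(Typed, {{"Amount", 0}}),
    // Fill Down: copy the last non-null Region into the null cells below it
    Filled = Table.FillDown(NoErrors, {"Region"}),
    // Replace Values: turn remaining nulls in Amount into 0
    NoNulls = Table.ReplaceValue(Filled, null, 0, Replacer.ReplaceValue, {"Amount"})
in
    NoNulls
```

Whether `0` is an appropriate stand-in for an error or a missing amount depends on the analysis; leaving `null` in place is often safer when the value is genuinely unknown.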
3. Text and Date Manipulation
1. Text Transformations:
- Convert text to uppercase, lowercase, or proper case, or extract substrings.
- Use functions like `Text.Upper`, `Text.Lower`, `Text.Proper`, `Text.Middle`, etc.
2. Date Transformations:
- Convert text to date format, extract parts of a date (year, month, day).
- Use functions like `Date.FromText`, `Date.Year`, `Date.Month`, `Date.Day`, etc.
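A short sketch of these functions applied to a table, assuming hypothetical columns `Name` and `Joined` where the dates arrive as ISO-formatted text:

```powerquery
let
    // Hypothetical sample data
    Source = Table.FromRecords({
        [Name = "alice SMITH", Joined = "2023-01-15"],
        [Name = "BOB jones",   Joined = "2023-02-20"]
    }),
    // Proper-case the names and parse the text dates into real date values
    Cleaned = Table.TransformColumns(Source, {
        {"Name", Text.Proper, type text},
        {"Joined", each Date.FromText(_), type date}
    }),
    // Extract the year from the parsed date
    WithYear = Table.AddColumn(Cleaned, "JoinYear", each Date.Year([Joined]), Int64.Type),
    // Text.Middle uses a zero-based offset: take 1 character starting at position 0
    WithInitial = Table.AddColumn(WithYear, "Initial", each Text.Middle([Name], 0, 1), type text)
in
    WithInitial
```

Date text in regional formats such as `15/01/2023` may parse differently depending on locale, so an unambiguous format like the one assumed here is the safer input.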
4. Standardizing Formats
1. Change Data Types:
- Convert data types to ensure consistency (e.g., text to number, text to date).
- Select the column > `Transform` > `Data Type`.
2. Trimming Whitespace:
- Remove leading or trailing spaces from text values.
- Select the column > `Transform` > `Format` > `Trim`.
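The equivalent M steps could be sketched as follows. The table and the column names `Customer`, `Quantity`, and `OrderDate` are assumptions made for illustration.

```powerquery
let
    // Hypothetical sample data: everything arrives as text, some with stray spaces
    Source = Table.FromRecords({
        [Customer = "  Acme Corp ", Quantity = "12", OrderDate = "2024-03-01"],
        [Customer = "Globex  ",     Quantity = "7",  OrderDate = "2024-03-02"]
    }),
    // Trim leading and trailing whitespace from Customer
    Trimmed = Table.TransformColumns(Source, {{"Customer", Text.Trim, type text}}),
    // Set explicit data types for the remaining columns
    Typed = Table.TransformColumnTypes(Trimmed, {
        {"Quantity", Int64.Type},
        {"OrderDate", type date}
    })
in
    Typed
```

`Text.Trim` only removes whitespace at the start and end of a value; spaces inside the text are left untouched.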
5. Handling Text and Numeric Conversions
1. Text to Number:
- Convert text values representing numbers to numeric data type.
- Use `Number.FromText` function.
2. Number Formatting:
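- Round numbers to a fixed number of decimal places, or render them as text with thousand separators.
- Use `Number.Round` to change the stored value and `Number.ToText` to produce formatted text. The result of `Number.ToText` is text, not a number.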
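A minimal sketch of these conversions, assuming a hypothetical `RawAmount` text column and the `en-US` culture (both are illustrative choices, not requirements):

```powerquery
let
    // Hypothetical sample data: numbers stored as text
    Source = Table.FromRecords({
        [RawAmount = "1234.567"],
        [RawAmount = "89.1"]
    }),
    // Parse the text as a number; the culture decides how "." and "," are read
    AsNumber = Table.AddColumn(Source, "Amount", each Number.FromText([RawAmount], "en-US"), type number),
    // Round the stored value to two decimal places
    Rounded = Table.TransformColumns(AsNumber, {{"Amount", each Number.Round(_, 2), type number}}),
    // Produce a display string with thousand separators and two decimals
    Display = Table.AddColumn(Rounded, "AmountText", each Number.ToText([Amount], "N2", "en-US"), type text)
in
    Display
```

Keeping the numeric `Amount` column alongside the text `AmountText` column means later calculations still operate on numbers, while the text version is available for labels.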
6. Data Normalization
1. Splitting Columns:
- Split columns based on delimiters (comma, space, custom delimiter).
- Select the column > `Transform` > `Split Column`.
2. Pivoting and Unpivoting Data:
- Rotate data from rows to columns (pivoting) or from columns to rows (unpivoting) to normalize data structures.
- Use `Transform` > `Pivot Column` or `Transform` > `Unpivot Columns`.
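As an illustration, the sketch below splits a name column on a space and then unpivots two quarterly columns into rows. The table layout and all column names are hypothetical.

```powerquery
let
    // Hypothetical wide table: one column per quarter
    Source = Table.FromRecords({
        [FullName = "Ada Lovelace", Q1 = 10, Q2 = 15],
        [FullName = "Alan Turing",  Q1 = 8,  Q2 = 12]
    }),
    // Split FullName on the space character into two new columns
    Split = Table.SplitColumn(
        Source,
        "FullName",
        Splitter.SplitTextByDelimiter(" "),
        {"FirstName", "LastName"}
    ),
    // Unpivot every column except the name columns into Quarter/Sales pairs
    Unpivoted = Table.UnpivotOtherColumns(Split, {"FirstName", "LastName"}, "Quarter", "Sales")
in
    Unpivoted
```

The result has one row per person per quarter (four rows for this sample), which is usually the easier shape to filter and aggregate.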
7. Custom Column and Conditional Logic
1. Adding Custom Columns:
- Create new columns with calculated values or transformations based on existing data.
- Use `Add Column` > `Custom Column` and define a formula using M language.
2. Conditional Logic:
- Apply conditional logic (IF-THEN-ELSE) to derive values or flags based on specific criteria.
- Use `Add Column` > `Conditional Column`.
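Both kinds of column come down to `Table.AddColumn` in M. The sketch below assumes hypothetical `Quantity` and `UnitPrice` columns and an arbitrary threshold of 100 for the flag.

```powerquery
let
    // Hypothetical sample data
    Source = Table.FromRecords({
        [Product = "Widget", Quantity = 3, UnitPrice = 20],
        [Product = "Gadget", Quantity = 1, UnitPrice = 450]
    }),
    // Custom column: calculated from existing columns
    WithTotal = Table.AddColumn(Source, "Total", each [Quantity] * [UnitPrice], type number),
    // Conditional column: if-then-else on the calculated value
    WithFlag = Table.AddColumn(
        WithTotal,
        "OrderSize",
        each if [Total] >= 100 then "Large" else "Small",
        type text
    )
in
    WithFlag
```

For this sample, the Widget row (total 60) is flagged `Small` and the Gadget row (total 450) is flagged `Large`. M keywords such as `if`, `then`, and `else` are lowercase and case-sensitive.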
Example: Removing Duplicates and Normalizing Data
Suppose you have a dataset `SalesData` with columns `ProductID`, `ProductName`, and `SalesAmount`. Here’s how you can perform basic data cleansing and normalization:
- Remove Duplicates:
  - Select `ProductID` and `ProductName`.
  - Go to `Home` > `Remove Rows` > `Remove Duplicates`.
- Normalize Sales Amount:
  - Convert `SalesAmount` to numeric data type (if needed).
  - Ensure consistent formatting (e.g., round to two decimal places).
- Standardize Product Names:
  - Trim whitespace, convert to proper case if needed (`Text.Trim`, `Text.Proper`).
- Date Normalization (if applicable):
  - Convert text date columns to the date type using `Date.FromText`. Use `Date.ToText` only when a formatted text representation of a date is required, since it returns text rather than a date.
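Put together as a single query, the example might be sketched as follows. The inline rows are hypothetical stand-ins for `SalesData`, the `en-US` culture is an assumption, and the date step is left out because the example table has no date column.

```powerquery
let
    // Hypothetical rows standing in for SalesData; in a workbook the source
    // would typically be an Excel table or range rather than inline records
    Source = Table.FromRecords({
        [ProductID = "P1", ProductName = "  blue widget", SalesAmount = "19.999"],
        [ProductID = "P1", ProductName = "  blue widget", SalesAmount = "19.999"],
        [ProductID = "P2", ProductName = "RED gadget ",   SalesAmount = "5.5"]
    }),
    // Standardize product names: trim, then proper case
    CleanNames = Table.TransformColumns(
        Source,
        {{"ProductName", each Text.Proper(Text.Trim(_)), type text}}
    ),
    // Remove duplicates on ProductID and ProductName
    Deduplicated = Table.Distinct(CleanNames, {"ProductID", "ProductName"}),
    // Convert SalesAmount from text to number
    Typed = Table.TransformColumnTypes(Deduplicated, {{"SalesAmount", type number}}, "en-US"),
    // Round SalesAmount to two decimal places
    Rounded = Table.TransformColumns(Typed, {{"SalesAmount", each Number.Round(_, 2), type number}})
in
    Rounded
```

In this sketch the names are cleaned before duplicates are removed, so rows that differ only in spacing or letter case collapse into one. Running the steps in the order listed above would keep such rows as separate entries.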
By applying these techniques in Power Query, you can clean, transform, and normalize your data effectively to prepare it for further analysis and visualization in Power BI or Excel.