Portfolio - Data Cleaning with SQL

Project Details

This project focuses on cleaning and organizing data from the "NashvilleHousing" table in our database. Here's a summary of the key steps:

Populating Property Address Data:
- Identifying records with missing property addresses.
- Matching records with the same ParcelID but different UniqueID to update missing property addresses.
Breaking out Address into Individual Columns:
- Splitting the Property Address into separate columns for Address and City.
- Creating new columns (PropertySplitAddress and PropertySplitCity) to store the split data.
Breaking out Owner Address into Individual Columns:
- Replacing commas with periods and using PARSENAME to split Owner Address into Address, City, and State.
- Creating new columns (OwnerSplitAddress, OwnerSplitCity, and OwnerSplitState) to store the split data.
Changing 'Y' to 'Yes' and 'N' to 'No' in "Sold as Vacant" Field:
- Updating the "SoldAsVacant" field to standardize values as 'Yes' or 'No'.
Removing Duplicates:
- Using a Common Table Expression (CTE) to identify and delete duplicate records based on specific columns.
Deleting Unused Columns:
- Removing unnecessary columns (OwnerAddress, TaxDistrict, PropertyAddress, SaleDate).

The project aims to enhance data quality and structure, making it more suitable for analysis and reporting. These data cleaning steps ensure consistency, standardization, and elimination of redundant information in the Nashville housing dataset.

Use case:

In a real-world scenario, this cleaned and organized dataset could be used by real estate professionals, analysts, or researchers for various purposes such as market analysis, trend prediction, and decision-making. It could also serve as a foundation for building machine learning models or integrating with other datasets for comprehensive analysis. The project essentially contributes to data quality, reliability, and usability, which are crucial aspects in any data-driven application or business context.

Data Cleaning with SQL

Project Details

Use case:

Email

Github

Location