In this project, we took raw housing data from the Seattle area and transformed it to be more usable for analysis.
Project Details
This project focuses on cleaning and organizing data from the "NashvilleHousing" table in our database. Here's a summary of the key steps:
-
Populating Property Address Data:
- Identifying records with missing property addresses.
- Matching records with the same ParcelID but different UniqueID to update missing property addresses.
-
Breaking out Address into Individual Columns:
- Splitting the Property Address into separate columns for Address and City.
- Creating new columns (PropertySplitAddress and PropertySplitCity) to store the split data.
-
Breaking out Owner Address into Individual Columns:
- Replacing commas with periods and using PARSENAME to split Owner Address into Address, City, and State.
- Creating new columns (OwnerSplitAddress, OwnerSplitCity, and OwnerSplitState) to store the split data.
-
Changing 'Y' to 'Yes' and 'N' to 'No' in "Sold as Vacant" Field:
- Updating the "SoldAsVacant" field to standardize values as 'Yes' or 'No'.
-
Removing Duplicates:
- Using a Common Table Expression (CTE) to identify and delete duplicate records based on specific columns.
-
Deleting Unused Columns:
- Removing unnecessary columns (OwnerAddress, TaxDistrict, PropertyAddress, SaleDate).
The project aims to enhance data quality and structure, making it more suitable for analysis and reporting. These data cleaning steps ensure consistency, standardization, and elimination of redundant information in the Nashville housing dataset.