A Simple Data Cleaning Workflow
🟡 Demo:
Name | Age | Email | Signup Date |
---|---|---|---|
John Doe | 29 | john.doe[at]example.com | 2023-04-15 |
Jane Smith |  | janesmithexample.com | 2023/05/02 |
Alice Johnson | 35 | alice.j@example.com | 15-06-2022 |
Bob Lee | twenty-eight | bob.lee@example | 2021-11-30 |
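In this example, the raw data has several problems:
- Missing values (e.g., Jane's age is missing)
- Inconsistent date formats (e.g., 2023-04-15, 2023/05/02, 15-06-2022)
- Type errors (e.g., Bob's age is given as "twenty-eight")
- Malformed email addresses (e.g., john.doe[at]example.com, janesmithexample.com, bob.lee@example)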
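Before handing the data to an LLM, it can help to flag the rows that need attention programmatically, so the cleaned output can later be checked against them. The following is a minimal Python sketch, assuming the rows are already loaded as dictionaries; the key names (`name`, `age`, `email`, `signup`), the simplified email pattern, and the YYYY/MM/DD target format (borrowed from the cleaned table in this demo) are illustrative assumptions, not part of the original workflow.

```python
import re
from datetime import datetime

# Rows transcribed from the demo table; None marks the empty age cell.
# The key names are illustrative assumptions.
RAW_ROWS = [
    {"name": "John Doe", "age": "29", "email": "john.doe[at]example.com", "signup": "2023-04-15"},
    {"name": "Jane Smith", "age": None, "email": "janesmithexample.com", "signup": "2023/05/02"},
    {"name": "Alice Johnson", "age": "35", "email": "alice.j@example.com", "signup": "15-06-2022"},
    {"name": "Bob Lee", "age": "twenty-eight", "email": "bob.lee@example", "signup": "2021-11-30"},
]

# A deliberately simple pattern of the form user@domain.tld
# (assumption: this is not a full RFC 5322 validator).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")

# Assumed target date format, matching the cleaned table.
TARGET_DATE_FORMAT = "%Y/%m/%d"


def find_problems(row):
    """Return a list of human-readable problems found in one row."""
    problems = []
    age = row["age"]
    if age is None or age.strip() == "":
        problems.append("missing age")
    elif not age.strip().isdigit():
        problems.append(f"non-numeric age: {age!r}")
    if not EMAIL_RE.match(row["email"]):
        problems.append(f"malformed email: {row['email']!r}")
    try:
        datetime.strptime(row["signup"], TARGET_DATE_FORMAT)
    except ValueError:
        problems.append(f"date not in YYYY/MM/DD: {row['signup']!r}")
    return problems


for row in RAW_ROWS:
    print(row["name"], find_problems(row))
```

Note that the date check only reports deviations from the assumed target format; it does not try to interpret the other formats.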
1. Create an input block for the raw data.

2. Split the input into chunks.

3. Use an LLM to normalize the data into consistent formats.
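Steps 2 and 3 can be sketched in Python roughly as follows. This is a minimal sketch, not the tool's own implementation: `call_llm` is a hypothetical callable standing in for whatever model client you use (it takes a prompt string and returns the model's text reply), and the prompt wording, key names, and default chunk size are assumptions.

```python
import json
from typing import Callable, Iterator


def chunk_rows(rows: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Hypothetical prompt wording; adapt the rules to your own schema.
PROMPT_TEMPLATE = """Normalize the following records.
Rules:
- age: an integer, or null if missing
- email: a valid address of the form user@domain.tld
- signup: a date in YYYY/MM/DD format
Return only a JSON array with the same keys, in the same order.
Records:
{records}"""


def build_prompt(chunk: list[dict]) -> str:
    """Embed one chunk of rows, serialized as JSON, in the prompt."""
    return PROMPT_TEMPLATE.format(records=json.dumps(chunk, indent=2))


def clean_with_llm(rows: list[dict], call_llm: Callable[[str], str], size: int = 2) -> list[dict]:
    """Send each chunk to `call_llm` (hypothetical) and collect the parsed rows."""
    cleaned = []
    for chunk in chunk_rows(rows, size):
        reply = call_llm(build_prompt(chunk))
        parsed = json.loads(reply)  # raises if the reply is not valid JSON
        if len(parsed) != len(chunk):
            raise ValueError("LLM returned a different number of rows than it was sent")
        cleaned.extend(parsed)
    return cleaned
```

Keeping chunks small bounds the prompt length, and the row-count check catches the case where the model silently drops or merges records. Parsing with `json.loads` fails loudly if the model wraps its answer in extra prose, which is usually preferable to accepting malformed output.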

After data cleaning, we obtained the following table:
Name | Age | Email | Signup Date |
---|---|---|---|
John Doe | 29 | john.doe@example.com | 2023/04/15 |
Jane Smith |  | janesmith@example.com | 2023/05/02 |
Alice Johnson | 35 | alicej@example.com | 2022/06/15 |
Bob Lee | 28 | boblee@example.com | 2021/11/30 |
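Because LLM output can drift, columns with a well-defined format are worth cross-checking deterministically. The sketch below, a rule-based check rather than part of the original workflow, parses the three date formats seen in the raw data and compares the result with the LLM's value. It assumes that a dashed date ending in the year is DD-MM-YYYY, which matches how the cleaned table reads 15-06-2022; in real data a value such as 03-04-2022 would be ambiguous.

```python
from datetime import datetime

# Input formats observed in the demo data (assumption: dashed dates ending
# in the year are day-first). Extend this tuple for other sources.
INPUT_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d-%m-%Y")
OUTPUT_FORMAT = "%Y/%m/%d"


def normalize_date(text: str) -> str:
    """Convert a date in any known input format to YYYY/MM/DD."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime(OUTPUT_FORMAT)
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date: {text!r}")


def dates_agree(raw: str, llm_value: str) -> bool:
    """True if the LLM's date equals the deterministic normalization."""
    try:
        return normalize_date(raw) == llm_value.strip()
    except ValueError:
        return False
```

Rows where `dates_agree` returns `False` can be flagged for manual review instead of being trusted as cleaned.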