Case 5: A Simple Data Cleaning Workflow


🟡 Demo:

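Name          | Age          | Email                   | Signup Date
------------- | ------------ | ----------------------- | -----------
John Doe      | 29           | john.doe[at]example.com | 2023-04-15
Jane Smith    |              | janesmithexample.com    | 2023/05/02
Alice Johnson | 35           | alice.j@example.com     | 15-06-2022
Bob Lee       | twenty-eight | bob.lee@example         | 2021-11-30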

The demo data has several problems:

  • Missing values (e.g., Jane's age is missing)
  • Inconsistent date formats (e.g., 2023/05/02, 15-06-2022)
  • Type errors (e.g., Bob's age is given as "twenty-eight")
  • Malformed email addresses (e.g., john.doe[at]example.com, bob.lee@example)
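
To make these problems concrete, here is a minimal Python sketch that flags them row by row. It is not part of the workshop workflow itself. The helper name `find_problems`, the simple email pattern, and the choice of YYYY/MM/DD as the target date format (taken from the cleaned table at the end of this case) are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Rows copied from the demo table; an empty string marks a missing value.
ROWS = [
    {"Name": "John Doe", "Age": "29",
     "Email": "john.doe[at]example.com", "Signup Date": "2023-04-15"},
    {"Name": "Jane Smith", "Age": "",
     "Email": "janesmithexample.com", "Signup Date": "2023/05/02"},
    {"Name": "Alice Johnson", "Age": "35",
     "Email": "alice.j@example.com", "Signup Date": "15-06-2022"},
    {"Name": "Bob Lee", "Age": "twenty-eight",
     "Email": "bob.lee@example", "Signup Date": "2021-11-30"},
]

# Assumption: the target date format is the one used in the cleaned table.
TARGET_DATE_FORMAT = "%Y/%m/%d"

# Assumption: a deliberately simple "local@domain.tld" shape check,
# not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def find_problems(row):
    """Return a list of problem labels for one row (hypothetical helper)."""
    problems = []

    age = row["Age"].strip()
    if not age:
        problems.append("missing age")
    elif not age.isdigit():
        problems.append("age is not a number")

    if not EMAIL_PATTERN.match(row["Email"]):
        problems.append("malformed email")

    try:
        datetime.strptime(row["Signup Date"], TARGET_DATE_FORMAT)
    except ValueError:
        problems.append("date not in target format")

    return problems


for row in ROWS:
    print(row["Name"], find_problems(row))
```

A check like this can be useful before and after the LLM step, to see which rows actually need cleaning and whether any problems remain.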

1. Create a block to input your data

[Image: data1]

2. Chunk your input

[Image: data2]
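
The workshop does not spell out how the chunking block splits its input. As a rough sketch, assuming one record per line and a fixed number of records per chunk, the idea looks like this in Python. The function name `chunk_rows` and the chunk size of 2 are hypothetical.

```python
def chunk_rows(rows, chunk_size):
    """Split a list of records into consecutive chunks of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


# One record per line, comma-separated (assumed input layout).
rows = [
    "John Doe,29,john.doe[at]example.com,2023-04-15",
    "Jane Smith,,janesmithexample.com,2023/05/02",
    "Alice Johnson,35,alice.j@example.com,15-06-2022",
    "Bob Lee,twenty-eight,bob.lee@example,2021-11-30",
]

chunks = chunk_rows(rows, 2)
for index, chunk in enumerate(chunks):
    print(f"chunk {index}: {len(chunk)} record(s)")
```

Keeping each record whole within a chunk means the model always sees complete rows, which matters more here than the exact chunk size.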

3. Use LLMs to normalize and unify the data formats

[Image: data3]
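
In the workshop this step is done by an LLM. For comparison, here is a rule-based sketch of the same normalization for the Age and Signup Date columns, which could serve as a baseline to check the model's output against. The function names, the list of accepted input formats, and the day-first reading of dates like 15-06-2022 are assumptions based only on the four demo rows.

```python
from datetime import datetime

# Assumption: the input formats seen in the demo data. "15-06-2022" is read
# as day-month-year, since 15 cannot be a month.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d-%m-%Y")
TARGET_DATE_FORMAT = "%Y/%m/%d"

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def normalize_date(text):
    """Convert a date in any known format to YYYY/MM/DD."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime(TARGET_DATE_FORMAT)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def words_to_int(text):
    """Parse simple English number words below one hundred, e.g. 'twenty-eight'."""
    total = 0
    for part in text.lower().replace("-", " ").split():
        if part in UNITS:
            total += UNITS[part]
        elif part in TENS:
            total += TENS[part]
        else:
            raise ValueError(f"unrecognized number word: {part!r}")
    return total


def normalize_age(text):
    """Return the age as an int, or None when the value is missing."""
    text = text.strip()
    if not text:
        return None  # a missing value stays missing; nothing is invented
    if text.isdigit():
        return int(text)
    return words_to_int(text)


print(normalize_date("15-06-2022"), normalize_age("twenty-eight"))
```

Rules like these only cover the formats they were written for. An LLM can handle unexpected variants, but its output is less predictable, so the two approaches work best as a cross-check on each other.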

After data cleaning, we obtained the following table:

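Name          | Age | Email                 | Signup Date
------------- | --- | --------------------- | -----------
John Doe      | 29  | john.doe@example.com  | 2023/04/15
Jane Smith    |     | janesmith@example.com | 2023/05/02
Alice Johnson | 35  | alicej@example.com    | 2022/06/15
Bob Lee       | 28  | boblee@example.com    | 2021/11/30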
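
An LLM may change more than intended. In the table above, for example, alice.j@example.com became alicej@example.com. A cell-by-cell comparison of the input and output helps when reviewing the result. The following Python sketch shows one way to do that; the helper name `changed_fields` is hypothetical and the rows are copied from the two tables in this case.

```python
FIELDS = ("Name", "Age", "Email", "Signup Date")

# Rows from the original demo table (empty string = missing value).
BEFORE = [
    ("John Doe", "29", "john.doe[at]example.com", "2023-04-15"),
    ("Jane Smith", "", "janesmithexample.com", "2023/05/02"),
    ("Alice Johnson", "35", "alice.j@example.com", "15-06-2022"),
    ("Bob Lee", "twenty-eight", "bob.lee@example", "2021-11-30"),
]

# Rows from the cleaned table.
AFTER = [
    ("John Doe", "29", "john.doe@example.com", "2023/04/15"),
    ("Jane Smith", "", "janesmith@example.com", "2023/05/02"),
    ("Alice Johnson", "35", "alicej@example.com", "2022/06/15"),
    ("Bob Lee", "28", "boblee@example.com", "2021/11/30"),
]


def changed_fields(before_row, after_row):
    """Return the names of the columns whose values differ between two rows."""
    return [
        field
        for field, old, new in zip(FIELDS, before_row, after_row)
        if old != new
    ]


for before_row, after_row in zip(BEFORE, AFTER):
    print(before_row[0], "changed:", changed_fields(before_row, after_row))
```

Every changed cell can then be reviewed by hand, which is practical for a small table and a reasonable spot check for a larger one.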