The goal of this project is to design and implement a serverless data pipeline that ingests CSV files, transforms them into a clean format, and makes them available for visualization in Amazon QuickSight (Quick Suite), all with minimal manual intervention.

CSV files uploaded to the Amazon S3 bucket csv-input-bucket5 trigger a Lambda function, which starts an AWS Glue job to clean and transform the data and save it back to the Amazon S3 bucket csv-output9-bucket5. A manifest file then connects the processed data to Amazon QuickSight, where it can be visualized in a pie chart. The result is a fully automated, scalable workflow that demonstrates cloud engineering, automation, and data visualization skills.
Amazon S3 Input Bucket (csv-input-bucket5):
- Stores the raw CSV files uploaded by the user
- Acts as the entry point for the pipeline
- Amazon S3 is a serverless storage service, and each bucket is created in a single AWS Region
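The upload trigger is configured on the input bucket itself. The sketch below builds the notification configuration that tells S3 to invoke a Lambda function for every new .csv object. The function ARN is a hypothetical placeholder, and S3 also needs a resource-based permission on the Lambda function before it is allowed to invoke it.

```python
import json


def build_notification_config(lambda_arn: str, suffix: str = ".csv") -> dict:
    """Build an S3 NotificationConfiguration that invokes a Lambda function
    whenever an object whose key ends in `suffix` is created."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                # Fire on every object-creation event (PUT, POST, copy, multipart).
                "Events": ["s3:ObjectCreated:*"],
                # Only keys ending in .csv start the pipeline; other uploads are ignored.
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": suffix}]
                    }
                },
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical ARN: replace with the real Lambda function's ARN.
    arn = "arn:aws:lambda:us-east-1:123456789012:function:start-csv-glue-job"
    config = build_notification_config(arn)
    print(json.dumps(config, indent=2))
    # With boto3 and credentials available, the dict could be applied with:
    # boto3.client("s3").put_bucket_notification_configuration(
    #     Bucket="csv-input-bucket5", NotificationConfiguration=config)
```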
Lambda Role:
- This role allows my Lambda function to call the StartJobRun action on AWS Glue
- Without this role, Lambda can't trigger the ETL process when a new CSV is uploaded
- Lastly, IAM roles are global: they are not tied to a specific AWS Region
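A minimal sketch of the Lambda role's permissions, written as Python dicts that serialize to IAM policy JSON. The region, account ID, and job name csv-clean-job are hypothetical, and the CloudWatch Logs statement is an assumption added so the function can write its own logs.

```python
import json


def lambda_role_policy(region: str, account_id: str, job_name: str) -> dict:
    """Least-privilege permissions policy for the Lambda execution role:
    start one specific Glue job and write the function's CloudWatch logs."""
    job_arn = f"arn:aws:glue:{region}:{account_id}:job/{job_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "StartCsvGlueJob",
                "Effect": "Allow",
                "Action": "glue:StartJobRun",
                "Resource": job_arn,
            },
            {
                "Sid": "WriteLambdaLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": f"arn:aws:logs:{region}:{account_id}:*",
            },
        ],
    }


# Trust policy that lets the Lambda service assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(json.dumps(lambda_role_policy("us-east-1", "123456789012", "csv-clean-job"), indent=2))
```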
Glue Role:
- Allows Glue to read input data from the Amazon S3 input bucket
- Grants permission to write output data to the Amazon S3 output bucket
- Without it, Glue can't read, transform, or write data
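A comparable sketch for the Glue role, scoped to the two buckets named above. It assumes the role also trusts glue.amazonaws.com and carries the AWS managed AWSGlueServiceRole policy for Glue's own service needs; a job that overwrites existing output may additionally need s3:DeleteObject.

```python
def glue_role_policy(input_bucket: str, output_bucket: str) -> dict:
    """Permissions policy letting a Glue job read raw CSVs from the input
    bucket and write cleaned files to the output bucket, nothing more."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket ARNs themselves (no /*).
                "Sid": "ListBuckets",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": [
                    f"arn:aws:s3:::{input_bucket}",
                    f"arn:aws:s3:::{output_bucket}",
                ],
            },
            {
                # Read-only access to objects in the input bucket.
                "Sid": "ReadRawCsv",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{input_bucket}/*",
            },
            {
                # Write access to objects in the output bucket.
                "Sid": "WriteCleanCsv",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{output_bucket}/*",
            },
        ],
    }
```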
Lambda Function:
- Triggered automatically when a new CSV file is uploaded to the Amazon S3 input bucket
- Starts the Glue job without manual execution
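One way the trigger could look: a Python Lambda handler that reads the bucket and key from each S3 event record and starts a Glue job run with the object's S3 path as a job argument. The job name csv-clean-job and the --input_path argument are hypothetical, and the glue_client parameter exists only so the handler can be exercised locally with a fake client.

```python
from urllib.parse import unquote_plus

# Hypothetical Glue job name: use the name of the job created in AWS Glue.
GLUE_JOB_NAME = "csv-clean-job"


def lambda_handler(event, context, glue_client=None):
    """Start one Glue job run per CSV object listed in the S3 event."""
    if glue_client is None:
        import boto3  # included in the AWS Lambda Python runtime

        glue_client = boto3.client("glue")
    run_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (e.g. spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.lower().endswith(".csv"):
            continue  # ignore non-CSV uploads
        response = glue_client.start_job_run(
            JobName=GLUE_JOB_NAME,
            # Glue job parameters are passed with a "--" prefix; the job script
            # could read this one with getResolvedOptions(sys.argv, ["input_path"]).
            Arguments={"--input_path": f"s3://{bucket}/{key}"},
        )
        run_ids.append(response["JobRunId"])
    return {"started": run_ids}
```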
AWS Glue Job (ETL):
- Reads the raw CSV from Amazon S3
- Cleans and transforms the data
- Writes the processed data to the output S3 bucket in CSV format
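The cleaning rules are not specified above, so the following is a hypothetical example of the kind of transform the Glue job might apply: normalizing column names, trimming cells, and dropping blank and duplicate rows. It uses only Python's standard csv module so the logic is easy to test; an actual Glue job would typically express the same steps in PySpark.

```python
import csv
import io


def clean_csv(raw_text: str) -> str:
    """Apply hypothetical cleaning rules: normalize headers, trim cells,
    pad/truncate rows to the header width, drop blank and duplicate rows."""
    reader = csv.reader(io.StringIO(raw_text))
    header = next(reader, None)
    if header is None:
        return ""  # empty input: nothing to clean
    # Normalize column names: trim, lowercase, spaces -> underscores.
    columns = [h.strip().lower().replace(" ", "_") for h in header]
    seen = set()
    rows = []
    for row in reader:
        cells = [c.strip() for c in row]
        if not any(cells):
            continue  # skip blank lines and rows of empty cells
        # Pad or truncate so every row matches the header width.
        cells = (cells + [""] * len(columns))[: len(columns)]
        key = tuple(cells)
        if key in seen:
            continue  # skip exact duplicate rows
        seen.add(key)
        rows.append(cells)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return out.getvalue()
```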
Amazon S3 Output Bucket (csv-output9-bucket5):
- Stores the transformed, cleaned data
- Organized into folders for easy access and scalability
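To illustrate the folder organization, this hypothetical helper derives a date-partitioned output key from the uploaded file's name. The processed/ prefix and the year=/month=/day= naming are assumptions; that Hive-style key=value layout is one that Glue crawlers and Athena can recognize as partitions.

```python
from datetime import date
from pathlib import PurePosixPath


def output_key(source_key: str, run_date: date, prefix: str = "processed") -> str:
    """Build a date-partitioned S3 key such as
    processed/year=2024/month=05/day=01/sales.csv (hypothetical layout)."""
    # Keep only the file name from the uploaded object's key.
    filename = PurePosixPath(source_key).name
    return (
        f"{prefix}/year={run_date.year:04d}/month={run_date.month:02d}/"
        f"day={run_date.day:02d}/{filename}"
    )
```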
Manifest File:
- A JSON file that tells QuickSight where to find the processed data in Amazon S3
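A minimal sketch that generates a QuickSight manifest for the output bucket. The fileLocations and globalUploadSettings fields follow QuickSight's manifest format for S3 data sources; the processed/ prefix is a hypothetical location for the cleaned files.

```python
import json


def build_manifest(output_bucket: str, prefix: str) -> str:
    """Return a QuickSight S3 manifest (as JSON text) pointing at every
    CSV file under s3://<output_bucket>/<prefix>."""
    manifest = {
        "fileLocations": [
            # URIPrefixes includes all files under the given S3 prefix.
            {"URIPrefixes": [f"s3://{output_bucket}/{prefix}"]}
        ],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "textqualifier": "\"",
            # The cleaned files keep a header row.
            "containsHeader": "true",
        },
    }
    return json.dumps(manifest, indent=2)
```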
Amazon QuickSight:
- Connects to the output Amazon S3 bucket via the manifest file
- Loads the cleaned dataset
- Provides interactive dashboards, charts, and reports
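Before building the pie chart in QuickSight, it can help to check what it should show: the sum of one measure grouped by one dimension, expressed as a share of the total. This sketch computes those shares from the cleaned CSV; the column names used are hypothetical.

```python
import csv
import io
from collections import defaultdict


def pie_shares(clean_text: str, dimension: str, measure: str) -> dict:
    """Sum `measure` per value of `dimension` and return each group's
    percentage of the grand total, rounded to one decimal place."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(clean_text)):
        totals[row[dimension]] += float(row[measure])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {k: 0.0 for k in totals}  # avoid dividing by zero
    return {k: round(100 * v / grand_total, 1) for k, v in totals.items()}
```

For example, with hypothetical columns region and sales_amount, East totalling 150 and West totalling 50 would be split 75.0 / 25.0 in the chart.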
Key Benefits:
- Automation: Every new CSV upload triggers the pipeline automatically.
- Scalability: Handles multiple files and scales with the underlying AWS services.
- Flexibility: Output can be JSON, CSV, or Parquet depending on needs.
- Visualization: QuickSight dashboards make insights accessible and shareable.
- Security: IAM roles and S3 access points ensure least-privilege access.