Perform data exploration, data cleaning, data imputation, and feature engineering on unstructured and structured data.
Build the infrastructure for optimal extraction, transformation, and loading (ETL) of data from a wide variety of data sources.
Develop and maintain an optimal data pipeline architecture for training statistical and machine learning models, such as regression and classification models.
Develop and maintain evaluations that measure the effectiveness of training data, including the capabilities of the resulting models across a variety of tasks and domains.
Collaborate with data scientists and machine learning engineers to build a comprehensive, end-to-end data science and machine learning pipeline.
Requirements
Bachelor's degree in computer science or a related field, or equivalent software engineering experience.
Proficiency in the Python programming language.
Experience with data processing and feature engineering using tools such as NumPy, pandas, and scikit-learn.
Data visualization skills with tools such as Matplotlib, Seaborn, and Bokeh.
Understanding of deep learning frameworks such as PyTorch and TensorFlow.