Intermediate Data Scientist to identify data quality issues with Machine Learning, Python and Big Data Experience - 26075

Job Type: Contract
Positions to fill: 1
Start Date: May 24, 2022
Job End Date: Oct 31, 2022
Pay Rate: Hourly: Negotiable
Job ID: 119084
Location: Toronto
Contract Duration: Until October 31st (High Chance of Extension)
Location: Downtown Toronto, Hybrid (candidate must be willing to work on site)
Hours: 37.5 Hours 

Project:
The Data Management and Quality team is responsible for Enterprise Data Quality Governance and Issue Management activities throughout the Enterprise. Our mandate is to establish and sustain a culture of Data Governance through continuous measurement, monitoring and controls applied throughout the data lifecycle. The Data Quality Innovation role is responsible for developing, executing and enhancing the existing data quality framework through the use of advanced analytics, in line with business priorities. The incumbent will work closely with lines of business, data stewards and internal stakeholders to drive business value by identifying data quality issues using AI/ML approaches.

Responsibilities:
  • Work with large data sets using distributed computing technologies (MapReduce, Hadoop, Hive, Apache Spark, etc.)
  • Profile and analyze source data to identify opportunities for data quality interventions
  • Work with business stakeholders to understand the problem statement, then develop machine learning algorithms and prototype them for execution in Hadoop and cloud environments
  • Work with data engineers to brainstorm solutions to problems and support others in their goals
  • Collaborate with data engineers to deploy production scale solutions
  • Use sound agile development practices (code reviews, testing, etc.) to develop and deliver data products
Must Have Skills:
  • 5+ years of experience as a Data Scientist with knowledge of general Machine Learning concepts
  • Proficiency in a variety of languages: Python, R, Scala, Java (Python preferred)
  • 3+ years of experience with big data technology (Hadoop, Hive, Spark, Kafka, etc.); Spark and Hive are preferred
  • Strong proficiency in SQL
  • 2+ years of experience with, and understanding of, industry data quality processes and practices
Nice to have:
  • Familiarity with container environments: Docker, OpenShift
  • Familiarity with DevOps processes, pipelines, and tooling
Education:
  • Degree in Engineering, Computer Science or Mathematics/Statistics