
Int. Spark Developer to clean, transform and analyze raw data – 2577-1

Job Type: Contract
Positions to fill: 1
Start Date: Aug 01, 2022
Job End Date: Nov 01, 2022
Pay Rate: Hourly: Negotiable
Job ID: 122052
Location: Toronto

Location: Toronto (REMOTE)
Duration: 3 months (possible extension)

The ideal candidate is a problem-solver who can thrive in an environment with a complex, diverse and rapidly expanding data infrastructure. You’ll know how to fully exploit the potential of Spark and have broad familiarity with its available APIs. You will clean, transform, and analyze vast amounts of raw data from various systems using Spark to provide ready-to-use data to our Data Scientists. You’ll need to be comfortable managing multiple deliverables and communicating results to management, as your efforts will directly impact the data science decision-making process and the planning of strategic initiatives. The role will focus on extending the existing framework to leverage novel large-scale datasets and various distributed technologies, supporting statistical model development that will aid in the optimization of business operations.

Must Haves:

· 3-5 years of technical experience working with large data sets.
· 3-5 years of experience working in distributed compute/storage environments including the development/maintenance of streaming applications (Apache Kafka preferred).
· 3-5 years of experience working with Python and PySpark with a focus on large scale data warehousing, data integration and/or development experience.
· 3-5 years of experience with Apache Spark with an emphasis on Spark query tuning and performance optimization.
· Demonstrated use of the Spark APIs (Spark 2.x/3.x) including RDDs, SQL/DataFrames, MLlib, GraphX and Streaming.
· Experience working with distributed file systems (HDFS, S3, etc.).
· AWS cloud services experience is an asset (EMR, EC2, etc.).

Nice to Have:

· Experience in relational database logical modeling (Oracle, Postgres, Snowflake, Teradata) and integration into distributed Spark-based workflows.
· Background in software engineering with strong skills in parallel data processing, data flows, REST APIs, etc.
· Strong knowledge and hands-on experience authoring and auditing advanced SQL and shell scripts.