Senior Software Engineer - Scraping/Web Crawling Platform
project44
This job is no longer accepting applications
Why project44?
At project44 we’re on a mission - to make supply chains work.
project44 optimizes the movement of products globally, delivering better resiliency, sustainability, and value for our customers. As the supply chain connective tissue, we operate the most trusted end-to-end visibility platform that tracks more than 1 billion shipments annually for the world’s leading brands. The undisputed leader in the market, project44 was named the Leader in the Gartner Magic Quadrant, #1 in FreightWaves’ FreightTech 25, and the Customer’s Choice in Gartner Peer Insights’ Voice of the Customer report. project44 is headquartered in Chicago with a diverse and fast-growing, global workforce.
If you’re eager to be part of a winning team that works together every day to solve some of the toughest supply chain challenges, let’s talk.
We're looking for a Staff Engineer who has experience as a technical lead delivering projects with a team of 5-7 engineers. You will work directly with engineering leadership to shape the future of our technology stack and products, collaborating with product managers, product designers, data scientists, and other engineering teams. You will be a thought leader designing, building, and implementing our best-in-class integrations platform, with a strong focus on accelerating how project44 connects to the world’s logistics networks.
Responsibilities:
- Capture of large-scale data from the web and mobile devices, and design of architectures for extraction, deduplication, classification, clustering, and filtering;
- Design and development of distributed web crawlers, with the ability to independently solve the problems encountered during development;
- Research and development of web page information extraction algorithms to improve the efficiency and quality of data capture;
- Analysis and warehousing of crawled data, and monitoring of the crawler system with alerts on anomalies;
- Design and development of data collection strategies and anti-blocking rules to improve the efficiency and quality of data collection;
- Design and development of core algorithms according to the system data processing flow and business function requirements;
- Own the development of tools, services, and workflows to enhance data management, crawl/scrape analysis, and reporting
- Oversee testing of the data and the scraping process to guarantee compliance, quality, and accuracy
- Monitor the scraping process to detect and fix breakages, and scale scrapes as necessary
- Create systems for handling large and unstructured data while developing a regulatory update tool for legal clients
- Create a tool that gathers information on regulatory updates for legal clients by using scraping bots on websites, especially regulatory websites
Qualifications
- Proficient in Python and familiar with one or more commonly used crawler frameworks, such as Scrapy or other web scraping frameworks, with independent development experience
- 10+ years of overall experience, including 7+ years working with web scraping, crawlers, and data extraction
- Familiar with vertical search crawlers and distributed web crawlers, with a deep understanding of how web crawlers work, extensive experience in data crawling, parsing, cleaning, and storage projects, and a strong command of anti-crawler techniques and ways to work around them
- Familiarity with common data storage and data processing technologies is preferred
- A solid foundation in data structure and algorithms is preferred
- Experience in distributed crawler architecture design, IP farms, and proxies is preferred
- Familiar with commonly used frameworks such as ssh, and knowledgeable in multi-threading and network communication programming
- Experience mentoring 2-3 engineers is preferred
Diversity & Inclusion
At project44, we're designing the future of how the world moves and is connected through trade and global supply chains. As we work to deliver a truly world-class product and experience, we are also intentionally building teams that reflect the unique communities we serve. We’re focused on creating a company where all team members can bring their authentic selves to work every day.
We’re building a company that every one of us at project44 is proud to work for, and our journey of becoming a more diverse, equitable and inclusive organization, where all have a sense of belonging, is shaped through the actions of our leadership, global teams and individual team members. We are resolute in our belief that each team member has an equal responsibility to mold and uphold our culture.
project44 is an equal opportunity employer seeking to enrich our work environment by creating opportunities for individuals of all backgrounds and experiences to thrive. If you share our values and our passion for helping the way the world moves, we’d love to review your application!