[python]Multithreading to scan for port status version 2

Background
This is a follow-up to this post. In the previous post the threads printed the results, but I had problems getting the returned value from the function each thread was running. lama4ok suggested in his comment to use concurrent.futures, and his suggestion works: I can now get the actual returned result from the function the thread executed.

Code modification
This section presents the modified code. Although the return values are now obtained successfully, speed is still a problem: the script collects a set of subnets from the laptop's interfaces, iterates over each subnet, then over each host address in the subnet, and the range of ports for that host is “scanned” before moving on to the next host, so only one host's ports are checked in parallel at a time.

network_discovery.py
In the previous post network_discovery.py only worked if all interfaces had IP addresses; in this version I added checks to ensure a valid IPv4 address is assigned before an interface is used.

import netifaces
from ipaddress import IPv4Interface
import re

# matches loopback addresses (127.0.0.0/8) so they can be excluded
pattern = r'127\.\d+\.\d+\.\d+/\d+'
loopback_regex = re.compile(pattern)

def collect_interface_ip_addresses(interface):
    # netifaces.AF_INET (== 2); the key is only present if the interface
    # actually has an IPv4 address assigned
    ipv4 = netifaces.ifaddresses(interface).get(netifaces.AF_INET)
    if ipv4:
        return {'intf_id': interface,
                'ip_address': ipv4[0]['addr'],
                'netmask': ipv4[0]['netmask']}


def netmask_to_cidr(netmask):
    # count the set bits of each octet, e.g. 255.255.255.0 -> 24
    return sum(bin(int(octet)).count('1') for octet in netmask.split('.'))


def get_host_network():
    host_collection = []
    host_network_collection = []
    for interface in netifaces.interfaces():
        host_collection.append(collect_interface_ip_addresses(interface))
    for host in host_collection:
        if host:  # skip interfaces without an IPv4 address (returned as None)
            host['netmask'] = str(netmask_to_cidr(host['netmask']))
            if not loopback_regex.match(host['ip_address'] + "/" + host['netmask']):
                host_network_collection.append(
                    str(IPv4Interface(host['ip_address'] + "/" + host['netmask']).network))
    return host_network_collection
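
A quick way to check the result before wiring it into the scanner (the subnets printed below are purely illustrative and depend on your interfaces):

from network_discovery import get_host_network

# prints something like ['192.168.1.0/24', '10.10.10.0/24'];
# loopback networks are filtered out and interfaces without IPv4 are skipped
print(get_host_network())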

scanner.py
I ditched the threading.Thread class and used concurrent.futures instead.
There is a good reference on how to use concurrent.futures with as_completed and submit within a list comprehension here.
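
In a nutshell, the pattern looks like this (a minimal standalone sketch with a made-up square() function, not part of the scanner): submit() returns a Future immediately, and as_completed() yields each Future as soon as its result is ready.

from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=4) as ex:
    # the list comprehension queues all jobs and collects their Futures
    futures = [ex.submit(square, n) for n in range(10)]
    for future in as_completed(futures):
        # results arrive in completion order, not submission order
        print(future.result())

scanner.py applies the same pattern to each host: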

from socket import *
from concurrent.futures import as_completed, ThreadPoolExecutor
from network_discovery import get_host_network
from ipaddress import IPv4Network


hosts = []
setdefaulttimeout(2)  # give up on a connection attempt after 2 seconds


def connect_tcp_host(host, port):
    # connect_ex() returns 0 on success, so a falsy value means the port is open
    with socket(AF_INET, SOCK_STREAM) as sock:
        if not sock.connect_ex((host, port)):
            # print(f"{host} {port}/tcp open")
            return port


def start_scan(minport, maxport):
    with ThreadPoolExecutor(max_workers=10) as ex:
        for subnet in get_host_network():
            for addr in IPv4Network(subnet).hosts():
                port_list = []
                # one future per port; range() excludes maxport itself
                futures = [ex.submit(connect_tcp_host, str(addr), port) for port in range(minport, maxport)]
                for future in as_completed(futures):
                    result = future.result()  # port number if open, None otherwise
                    print(f"{addr}: {result}")
                    if result:
                        port_list.append(result)
                hosts.append({'host': str(addr), 'ports': port_list})

    return [host for host in hosts if host.get('ports')]

start_now.py
This start_now.py script tests the functions written above and reports how long the scan took.

from scanner import start_scan
from time import time
from pprint import pprint


start_time = time()
if __name__ == '__main__':
    pprint(start_scan(21, 25))
    # report how long the scan of ports 21-24 took
    print(f"scan completed in {time() - start_time:.2f} seconds")
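
The pretty-printed result is a list of dictionaries, one per host that has at least one open port in the scanned range. The addresses and ports below are made up, only the shape is real:

[{'host': '192.168.1.1', 'ports': [23]},
 {'host': '192.168.1.10', 'ports': [22, 23]}]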

5 thoughts on “[python]Multithreading to scan for port status version 2”

  1. I’ve used ThreadPoolExecutor to grab data from multiple devices via NETCONF, and after the number of devices exceeded 90 I ran into speed troubles. My app just could not grab data as fast as the devices sent it to me.
    I found out that this is because of the GIL. The Python interpreter just could not switch between threads as fast as I needed it to.
    The solution is to use processes via ProcessPoolExecutor from concurrent.futures. It has absolutely the same API as ThreadPoolExecutor. I guess this may be a solution for you too.

    1. You are right, I will have to use ProcessPoolExecutor (see the sketch after these comments). The GIL is a good thing to have, or else we would end up with unsynchronized data; the GIL is something like spanning tree protocol: with it you get limitations, without it you get disaster.. lol.
      I see you are doing network automation, do you use ansible?

      1. Yes. I use Ansible for deploying KVM VMs (Juniper vRR) on Ubuntu hosts (somewhat ZTP) and for configuring them via NETCONF (netconf_config Ansible module).
        I like Ansible for doing parallel tasks for me and for its pretty flexible inventory and tag system, but in general I prefer pure Python, because sometimes I find myself trying to solve Ansible issues instead of my production issues =)

      2. Actually this is the reason why I want to learn writing threading: if I do not know how this is scripted, I will need to rely on very high level modules like ansible. I am going to use ansible, I am just wondering whether it is worth writing logic around ansible modules, or whether I should just write templates, use jinja2 to fill in the blanks, and run the automation through the command line with ansible-playbook… do you have samples for doing logic with ansible modules?
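
Following up on the ProcessPoolExecutor suggestion in the comments above, here is a minimal sketch of the swap (not the scanner itself, just the pattern): the submit()/as_completed() API is unchanged, but the worker function must be defined at module level and the calls guarded by __main__, because each worker process re-imports the module.

from concurrent.futures import ProcessPoolExecutor, as_completed


def work(n):
    # stand-in for a task that keeps the interpreter busy
    return n * n


if __name__ == '__main__':
    # same submit()/as_completed() usage as with ThreadPoolExecutor
    with ProcessPoolExecutor(max_workers=4) as ex:
        futures = [ex.submit(work, n) for n in range(10)]
        for future in as_completed(futures):
            print(future.result())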
