Converting a list of phone numbers from range format to prefix format.

In the fall of 2014 I got a task for complete redesign of telephony accounting system at my former work.
One of the stages was automatically parse price-lists of uplink operators. Those prices usually were Excel spreadsheets that consisted of rows where each row represents a bunch of call directions that have the same price.
But price-lists might be of different types.
First type – where the group of direction is indicated by it’s prefix:

DEST COUNTRY CODE ROLLUP RATE COMMENTS EFFECTIVED
Great Britain (premium services) 44 9 0.4734 No Change 12.07.2014

Second type – where a group of directions is indicated by a range, as the beginning and the ending phone numbers of those group:

DEST COUNTRY CODE ROLLUP RATE COMMENTS EFFECTIVED
Great Britain (mob) 44 71-75, 77-79 12.7068 Decrease 12.07.2014

Furthermore, there are (they really are!) price-lists where both mentioned types are combined:

DEST COUNTRY CODE ROLLUP RATE COMMENTS EFFECTIVED
Great Britain (mob) – Hutchison 3G 44 7400-7403, 7411-7414, 74170, 7426-7429, 7915-7916, 7988 0.9637 Increase 12.07.2014

We had the restriction that our accounting system should use prefixes so I faced the problem of converting range form directions to prefix form.
I am sorry to mention this, but despite the fact that I studied physics and mathematics at the University – now I’m very bad in mathematics. I don’t know why. It may be the peculiar trait of my brain, or it maybe some other reason. So I couldn’t create algorithm for such conversion by myself in a reasonable time.
Anyway, I was lucky and found this Blog of Ahmad Rifky, the guy from the telecommunication business, who already solved this problem.
In his blog he clearly explained the problem and the way for its solution.
I took the Perl script, he provided, and made some modifications.
I also left reply on his blog with modified version of the script, but there were some errors and lack of indentation.
So I am posting the modified script here.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
#! /usr/bin/perl -CS
 
use strict;
use utf8;
use Getopt::Long;
use Data::Dumper;
 
my $usage = "
This script converts the range of numbers into the list of prefixes.
The range can be described by it's beginning and ending numbers,
or by it's beginning number and capacity (length).
Usage: rangetopref.pl [options]
Options:
--beg   <ther first number of the range>  required
--end   <the last number of the range>    optional
--cap   <the capacity of the range>       optional
 
The --beg option is reqired, but the end of the range
can be sat either by it's last number or it's capacity.
";
 
 
my ($range_beg, $range_end, $range_capacity) = ();
 
GetOptions ("beg=i" => \$range_beg,
            "end=i" => \$range_end,
            "cap=i" => \$range_capacity) or die("$usage\n");
 
 
  my @prefix_list   = ();
  my @tmp_list      = ();
 
 
  s/\D//g foreach( $range_beg, $range_end, $range_capacity );
 
  die "ERROR. You must provide the initial number of the range\n$usage"
    unless $range_beg;
  die "ERROR. You must provide the last number or capacity of the range\n$usage"
    if !$range_end and !$range_capacity;
 
 
  unless( $range_capacity ){
    $range_capacity = $range_end - $range_beg + 1;
  }
 
  foreach (0..$range_capacity-1){
    push @prefix_list, $range_beg + $_;
  }
 
  (my $exponent =  () = ($range_capacity =~ /\d/g))--;
 
  foreach my $exp (reverse ( 1..$exponent )){
 
      my $comp_length = 10 ** $exp;
 
      while( @prefix_list >= $comp_length ){
 
        my $subrange_bad    =
            ( $prefix_list[$comp_length-1] - $prefix_list[0] + 1 ) % $comp_length;
        my $first_elem_bad  =
            $prefix_list[0] % $comp_length;
 
        if ( !$subrange_bad and !$first_elem_bad ) {
          $prefix_list[0] = $prefix_list[0] / $comp_length ;
          push @tmp_list, $prefix_list[0];
          splice (@prefix_list, 0, $comp_length);
        }
        else{
          push @tmp_list, $prefix_list[0];
          shift @prefix_list;
        }
      }
      push @tmp_list, @prefix_list;
      $#prefix_list     = -1;
      @prefix_list      = @tmp_list;
      $#tmp_list        = -1;
 
  }
 
  $_ + 0 foreach @prefix_list;
 
  print Dumper(\@prefix_list);

Now, I’ll try to explain my version of what happens here line by line 🙂

The algorithm:
– we take our range and convert it to the (maybe huge!) list of the sequential numbers.
– by the count of digits in the number, which represents range (or list) size (range_capacity) we assume the greatest round number that can be used as rollup.
– we move from the first to the last number of the list and trying to compactify it’s to rollup-sized sublists.

str. 1-41
Common script beginning. Reading commandline parameters, checking values.
str. 42-44
Calculating range capacity, because internally we use the capacity of the range instead of the last number.
str. 46-48
Unwrapping our range to a list of individual numbers (prefixes).
str. 50
calculating the number of digits in the range capacity to find the biggest round number (power of 10) for compactification or the biggest subarray’s length for searching.
For example, if we have range length of 1000 numbers, the exponent would be 3 so the first length to search would be 10 ** 3 = 1000. That means we would try to roll up the whole range in our first attempt. If we not succeed – we would try a smaller piece.
str. 52-54
cycle from the largest possible round number to 10 for the search of subarrays that can be rolled up.
str. 56
during the process the size of the analyzed array is decreased so we have to check it’s length.
str. 58-61
Here we are checking from the first number of unwrapped array – can it be rolled up from this point by the current compactification length or not. I’m not really sure that this criteria (checking the difference and the first number) is necessary and sufficient, but as far as I know – it works.
str. 63-67
Here we decided that first $comp_length numbers of unwrapped range can be compactified to their prefix.
We calculate this prefix from the first element of array, saving prefix to temporary array and remove compactified range from the original array.
str. 69-71
Here we decided that first $comp_length numbers do not fit our compactification rules, so we move first number from the original array to temporary array, and so the next check will be performed from the 2nd number of the original array.
str. 73-76
All attempts to find rollups for the certain compactification length are ended here. Prefixes that had been found and mismatched number sequences are moved to the temporary array.
So we move the rest of the numbers from the main array to the temporary array and then swap main and temporary arrays.
After that, we begin to search rollups for a smaller and smaller sequences until we reach the sequence length of 10 numbers.

It passed tests from the original blog:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
user@host:~/perltests$ ./rtp_test.pl --beg 628161400000 --end 628161999999
$VAR1 = [
          6281614,
          6281615,
          6281616,
          6281617,
          6281618,
          6281619
        ];
user@host:~/perltests$ ./rtp_test.pl --beg  628166490000 --cap 110000
$VAR1 = [
          62816649,
          6281665
        ];
user@host:~/perltests$ ./rtp_test.pl --beg  628127000000 --cap 760000
$VAR1 = [
          6281270,
          6281271,
          6281272,
          6281273,
          6281274,
          6281275,
          6281276,
          62812770,
          62812771,
          62812772,
          62812773,
          62812774,
          62812775
        ];
user@host:~/perltests$ ./rtp_test.pl --beg 2353596 --end 2353999
$VAR1 = [
          2353596,
          2353597,
          2353598,
          2353599,
          23536,
          23537,
          23538,
          23539
        ];