
I need to expand a range from its start number to its end number. For example, given [1,4] I need the output [1,2,3,4]. I have been using the code block below to work out the logic, but I am unable to make it dynamic: when I pass many lists through it, I get an error.

# Create an empty list
My_list = []


# Value to begin and end with
start = 10
print(start)
end = 20
print(end)

# Check if start value is smaller than end value
if start < end:
    # unpack the result
    My_list.extend(range(start, end))
    # Append the last value
    # My_list.append(end)

# Print the list
print(My_list) 

Output: 10 20 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

This is what I need! But...

I am trying to do this:

import pandas as pd
My_list = []
isarray = []
pd_df = draft_report.toPandas()
for index, row in pd_df.iterrows():
   My_list = row[14] #14 is the place of docPage in the df
   start = My_list[1] #reads the 1st element eg: 1 in [1,16]
   print(start)
   end = My_list[3] #reads the last element eg: 16 in [1,16]
   print(end)
   if start < end:
       isarray.extend(range(int(start, end)))
       isarray.append(int(end))
   print(isarray)

Output:

An error was encountered:
'str' object cannot be interpreted as an integer
Traceback (most recent call last):
TypeError: 'str' object cannot be interpreted as an integer

The data looks like this:

docPages
[1,16]
[17,22]
[23,24]
[25,27]

1 Answer


Since the source column is of StringType(), you first need to convert the string to an array, which can be done with the from_json function. Then pass the resulting array elements to the sequence function, which generates every integer from the first bound to the second, inclusive.

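from pyspark.sql import functions as func

# Parse the string column into array<int> and sort it so the smaller bound comes first,
# then expand the two bounds into an inclusive sequence with step 1.
data_sdf. \
    withColumn('arr',
               func.sort_array(func.from_json('arr_as_str', 'array<integer>'))
               ). \
    withColumn('arr_range', func.expr('sequence(arr[0], arr[1], 1)')). \
    show(truncate=False)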

# +----------+--------+-------------------------------------------------------+
# |arr_as_str|arr     |arr_range                                              |
# +----------+--------+-------------------------------------------------------+
# |[1,16]    |[1, 16] |[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]|
# |[17,22]   |[17, 22]|[17, 18, 19, 20, 21, 22]                               |
# |[23,24]   |[23, 24]|[23, 24]                                               |
# |[25,27]   |[25, 27]|[25, 26, 27]                                           |
# +----------+--------+-------------------------------------------------------+
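For intuition, the same two steps (parse the string, then expand the bounds) can be sketched in plain Python. This is only an illustration, assuming each cell holds a JSON-style string such as '[1,16]'; the helper name expand_page_range is hypothetical and not part of the code above.

```python
import json


def expand_page_range(pages_as_str):
    """Parse a string such as '[1,16]' and return the inclusive range as a list."""
    # Parse the text into real integers first; indexing the raw string
    # would only yield single characters such as '[' or '1'.
    start, end = sorted(json.loads(pages_as_str))
    # range() excludes its stop value, so add 1 to include the end page.
    return list(range(start, end + 1))


print(expand_page_range("[23,24]"))  # [23, 24]
```

Parsing before indexing is the key difference from the loop in the question, where the cell was still a string when it was indexed.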

If the source column is already an ArrayType() field, you can apply the sequence function to it directly, as in the example below.

data_sdf. \
    withColumn('doc_range', func.expr('sequence(doc_pages[0], doc_pages[1], 1)')). \
    show(truncate=False)

# +---------+-------------------------------------------------------+
# |doc_pages|doc_range                                              |
# +---------+-------------------------------------------------------+
# |[1, 16]  |[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]|
# |[17, 22] |[17, 18, 19, 20, 21, 22]                               |
# |[23, 24] |[23, 24]                                               |
# |[25, 27] |[25, 26, 27]                                           |
# +---------+-------------------------------------------------------+
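If a single flat list of all pages is wanted (as the isarray variable in the question suggests), the per-row ranges can be chained together once the values are real integer pairs. The following is a minimal plain-Python sketch under that assumption; the sample data and the helper name flatten_ranges are hypothetical.

```python
from itertools import chain

# Hypothetical sample data mirroring the docPages values in the question,
# already parsed into pairs of integers.
doc_pages = [[1, 16], [17, 22], [23, 24], [25, 27]]


def flatten_ranges(pairs):
    """Expand each [start, end] pair inclusively and join the results into one list."""
    return list(chain.from_iterable(range(start, end + 1) for start, end in pairs))


all_pages = flatten_ranges(doc_pages)
print(len(all_pages))  # 27
```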
samkart