
Suppose I have a lattice

import numpy as np

a = np.array([[1, 1, 1, 1],
              [2, 2, 2, 2],
              [3, 3, 3, 3],
              [4, 4, 4, 4]])

I'd like to write a function func(lattice, start, end) that takes three inputs, where start and end are the indices of the rows whose elements the function should sum (both ends inclusive). For example, func(a,1,3) sums all the elements of rows 1 to 3, so func(a,1,3) = 2+2+2+2+3+3+3+3+4+4+4+4 = 36.

Now I know this can be done easily with slicing and np.sum(). But crucially, I also want func to be able to wrap around. Namely, func(a,2,4) should return 3+3+3+3+4+4+4+4+1+1+1+1. A couple more examples:

func(a,3,4) = 4+4+4+4+1+1+1+1
func(a,3,5) = 4+4+4+4+1+1+1+1+2+2+2+2
func(a,0,1) = 1+1+1+1+2+2+2+2

In my situation I'm never going to reach a point where it sums the whole lattice, i.e. a case like

func(a,3,6) = sum of all elements
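
For concreteness, a naive loop version that pins down exactly what I mean would look like the sketch below (func_naive is just an illustrative name, and it's obviously too slow for my use case, where the function gets called a huge number of times):

def func_naive(lattice, start, end):
    # Sum rows start..end inclusive, wrapping past the last row.
    rows = lattice.shape[0]
    total = 0
    for i in range(start, end + 1):
        total += lattice[i % rows].sum()
    return total

# func_naive(a, 1, 3) -> 36, func_naive(a, 2, 4) -> 32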

Update: For my algorithm

for i in range(MC_STEPS_NODE):
    sweep(lattice, prob, start_index_master, end_index_master,
          rows_for_master)
    # calculate the energy
    Ene = subhamiltonian(lattice, start_index_master, end_index_master)
    # calculate the magnetisation
    Mag = mag(lattice, start_index_master, end_index_master)
    E1 += Ene
    M1 += Mag
    E2 += Ene*Ene
    M2 += Mag*Mag

    if i % sites_for_master == 0:
        comm.Send([lattice[start_index_master:start_index_master+1], L, MPI.INT],
                  dest=(rank-1) % size, tag=4)
        comm.Recv([lattice[end_index_master:end_index_master+1], L, MPI.INT],
                  source=(rank+1) % size, tag=4)
        start_index_master = start_index_master + 1
        end_index_master = end_index_master + 1

        if start_index_master > 100:
            start_index_master = start_index_master % L

        if end_index_master > 100:
            end_index_master = end_index_master % L

The function I want is the mag() function, which calculates the magnetisation of a sublattice, i.e. just the sum of all its elements. Imagine an LxL lattice split up into two sublattices, one belonging to the master and the other to the worker. Each sweep sweeps the corresponding sublattice of lattice, with start_index_master and end_index_master determining the start and end row of that sublattice. Every time i % sites_for_master == 0, the indices move down by 1; once they grow past 100 they are reduced modulo L, to prevent overflow issues in mpi4py. So while the sublattice sits in the middle of the main lattice, start_index_master < end_index_master. As the sublattice keeps moving down, end_index_master will eventually run past L, i.e. past the bottom of the main lattice; in that case, on a lattice with L = 10, an index of 10 refers back to the first row ([0]) of the main lattice, so the bottom rows of the sublattice wrap around to the top of the main lattice.
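
In other words, mag() just needs the wrap-around row sum described at the top of the question. A sketch of what I have in mind, assuming end_index_master is inclusive in the same way as end in my examples above (that convention is the only assumption here):

def mag(lattice, start, end):
    # Magnetisation of the sublattice = sum of its spins; row indices are
    # taken modulo the lattice size so the sublattice can wrap around.
    rows = lattice.shape[0]
    return lattice[np.arange(start, end + 1) % rows].sum()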

Energy function:

def subhamiltonian(lattice: np.ndarray, col_len_start: int,
                   col_len_end: int) -> float:
    # Energy of rows col_len_start..col_len_end (inclusive); L is the
    # global lattice dimension, so row indices wrap around the lattice.
    energy = 0
    for i in range(col_len_start, col_len_end+1):
        for j in range(len(lattice)):
            spin = lattice[i % L, j]
            # sum of the four nearest neighbours with periodic boundaries
            nb_sum = lattice[(i%L+1) % L, j] + lattice[i%L, (j+1) % L] + \
                     lattice[(i%L-1) % L, j] + lattice[i%L, (j-1) % L]
            energy += -nb_sum*spin

    return energy/4.

This is my function for computing the energy of the sublattice.
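
As a quick sanity check (not part of the MPI code), calling it over the whole 4x4 example lattice from above, with the global L set to 4, gives:

L = 4  # global lattice dimension that subhamiltonian relies on
print(subhamiltonian(a, 0, 3))   # -108.0 for the example lattice a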

user3613025
  • Not very efficient, but you could `np.vstack((a, a))` before doing the slicing and summing. If it can wrap around multiple times you can `vstack` multiple times. – Boris Verkhovskiy Feb 09 '20 at 20:45
  • Unfortunately I'd be calling this function potentially over 10^6 times so I'm afraid `np.vstack()` is not ideal – user3613025 Feb 09 '20 at 20:58
  • _Unfortunately I'd be calling this function potentially over 10^6 times_ What for? – AMC Feb 09 '20 at 21:05
  • Do you need it to wrap around just once or an arbitrary number of times? Like will the second value never exceed twice the length of the array? – Boris Verkhovskiy Feb 09 '20 at 21:06
  • Like @Boris said, if it's an arbitrary number of times then using a *recursive* function is helpful. – Ch3steR Feb 09 '20 at 21:09
  • @AMC I'm using mpi4py on the Ising model. I use domain decomposition to split my main lattice into sublattices for different cores, so they each sweep their own sublattices. I've left a row of values that don't get updated between cores. In order to fulfil ergodicity I have to move the sublattices, and thus the non-updated row, down too. The problem comes when one of the sublattices eventually has the top half of its rows at the bottom of the main lattice and the bottom half of its rows at the top of the main lattice. I need to find the magnetisation, which sums up all elements of the sublattice. – user3613025 Feb 09 '20 at 21:15
  • @Boris The second value sometimes will exceed twice the length of the array unfortunately – user3613025 Feb 09 '20 at 21:17
  • @user3613025 what should happen when `end` exceeds twice the length of the array? For example `func(a,3,8)` = sum of `[4 4 4 4], [1 1 1 1], [2 2 2 2], [3 3 3 3], [4 4 4 4], [1 1 1 1]`? – Ch3steR Feb 09 '20 at 21:28
  • @Ch3steR That'll never happen, at least that's what I thought given the way my algorithm is set up. `start` and `end` are the start and end index of each sublattice, so at any point `(end-start)` will never be bigger than the dimension of the full lattice. In fact `(end-start) <= L/size` where `L` is the dimension of the full lattice and `size` is the number of cores that I'm splitting my main lattice into. – user3613025 Feb 09 '20 at 22:35

2 Answers


You could use np.arange to create the indexes to be summed.

>>> def func(lattice, start, end):
...     rows = lattice.shape[0]
...     return lattice[np.arange(start, end+1) % rows].sum()
... 
>>> func(a,3,4) 
20
>>> func(a,3,5)
28
>>> func(a,0,1)
12
abc
  • It works for now, but I'm not sure if it will completely hold up when I scale up my simulation. I f*cking love you man. Keep ballin'. – user3613025 Feb 09 '20 at 21:28
  • It's elegant notation, but creating and using a separate index array noticeably hurts performance for larger input arrays compared to slice notation. A simple `if` branch gets you the same result for cases that do not wrap more than one time, without the overhead of an index array. – a_guest Feb 09 '20 at 21:47

You can check whether the stop index wraps around and, if it does, add the sum from the beginning of the array to the result. This is efficient because it relies on slice indexing and only does extra work when necessary.

def func(a, start, stop):
    stop += 1  # include the row pointed to by the stop index
    result = np.sum(a[start:stop])
    if stop > len(a):  # the requested range wrapped past the last row
        result += np.sum(a[:stop % len(a)])
    return result
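
A quick check against the examples from the question:

>>> func(a, 1, 3)
36
>>> func(a, 2, 4)
32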

The above version works for stop - start < len(a), i.e. no more than one full wrap-around. For an arbitrary number of wrap-arounds (i.e. arbitrary values for start and stop) the following version can be used:

def multi_wraps(a, start, stop):
    result = 0
    # Adjust both indices in case the start index wrapped around.
    stop -= (start // len(a)) * len(a)
    start %= len(a)
    stop += 1  # Include the element pointed to by the stop index.
    n_wraps = (stop - start) // len(a)
    if n_wraps > 0:
        result += n_wraps * a.sum()
    stop = start + (stop - start) % len(a)
    result += np.sum(a[start:stop])
    if stop > len(a):
        result += np.sum(a[:stop % len(a)])
    return result

In case n_wraps > 0, some parts of the array are summed twice, which is unnecessarily inefficient, so we can instead compute the sums of the various array parts only as needed. The following version sums every array element at most once:

def multi_wraps_efficient(a, start, stop):
    # Adjust both indices in case the start index wrapped around.
    stop -= (start // len(a)) * len(a)
    start %= len(a)
    stop += 1  # Include the element pointed to by the stop index.
    n_wraps = (stop - start) // len(a)
    stop = start + (stop - start) % len(a)  # Eliminate the wraps since they will be accounted for separately.
    tail_sum = a[start:stop].sum()
    if stop > len(a):
        head_sum = a[:stop % len(a)].sum()
        if n_wraps > 0:
            remaining_sum = a[stop % len(a):start].sum()
    elif n_wraps > 0:
        head_sum = a[:start].sum()
        remaining_sum = a[stop:].sum()
    result = tail_sum
    if stop > len(a):
        result += head_sum
    if n_wraps > 0:
        result += n_wraps * (head_sum + tail_sum + remaining_sum)
    return result
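
As a quick sanity check on the question's 4x4 example lattice, including a case where the start index itself has already wrapped past the end of the array:

print(multi_wraps(a, 3, 5), multi_wraps_efficient(a, 3, 5))   # 28 28
print(multi_wraps(a, 6, 8), multi_wraps_efficient(a, 6, 8))   # 32 32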

The following plot shows a performance comparison between using index arrays and the two multi-wrap methods presented above. The tests are run on a (1_000, 1_000) lattice. One can observe that for the multi_wraps method there is an increase in runtime when going from 1 to 2 wrap-arounds, since it unnecessarily sums the array twice. The multi_wraps_efficient method has the same performance regardless of the number of wrap-arounds, since it sums every array element no more than once.

Performance plot

The performance plot was generated using the perfplot package (index_arrays here refers to the np.arange-based function from the other answer):

import perfplot

perfplot.show(
    setup=lambda n: (np.ones(shape=(1_000, 1_000), dtype=int), 400, n*1_000 + 200),
    kernels=[
        lambda x: index_arrays(*x),
        lambda x: multi_wraps(*x),
        lambda x: multi_wraps_efficient(*x),
    ],
    labels=['index_arrays', 'multi_wraps', 'multi_wraps_efficient'],
    n_range=range(1, 11),
    xlabel="Number of wrap-arounds",
    equality_check=lambda x, y: x == y,
)
a_guest
  • You seem to have put a lot of effort into your answer, so I'll have a look tomorrow since it's very late, but if it's quicker for larger dimensions then I'll re-accept yours as the answer. – user3613025 Feb 09 '20 at 22:37
  • Using the `%timeit` command on all 3 of `func`, `multi_wraps` and `multi_wraps_efficient` on lattice sizes of `10x10` and `100x100` seems to show that they all run in about the same time, apart from `multi_wraps_efficient` being slightly quicker when `stop - start` is not too big. On closer inspection, both `multi_wraps` and `multi_wraps_efficient` seem to give the wrong value of 4 for `(a,6,8)` when they should give 32. Thanks for the effort tho! – user3613025 Feb 10 '20 at 16:40
  • @user3613025 Oh well you didn't mention that the start index can wrap as well. But that's easily solved by adjusting the stop index via `stop -= (start // len(a)) * len(a)` and the start index via `start %= len(a)`, please see my updated answer. The performance difference will be more noticeable for larger lattices and especially growing with the number of wraps. – a_guest Feb 10 '20 at 18:48
  • beautiful. thank you. I'm getting twice the speed up on certain cases. – user3613025 Feb 10 '20 at 21:23