Javascript – loop efficiency

I want to run through a scenario that highlights how loop performance can be very poor…when poorly considered, and you might not realise it if the code “works”. Consider this scenario – the database has a table of 30,000 bookings for a collection of tennis courts, for each booking there is a startDate and an endDate. For each booking there is also a courtId. There was a problem with the application, it was discovered that for 2 months the system has allowed users to create duplicate bookings. We have been tasked to write a utility that can identify any duplicates using momentjs.

Let’s have a go:

type Booking = { startDate: string; endDate: string; courtId: number }

const findDuplicateBookings = (bookings: Booking[]) => {
  let foundDuplicateIds: number[] = []

  bookings.forEach((booking) => {
    const currentBookingStart = moment(booking.startDate)
    const currentBookingEnd = moment(booking.endDate)
    const currentBookingRange = moment.range(currentBookingStart, currentBookingEnd)

    const newDuplicates = bookings.filter((otherBooking) => {
      const otherBookingStart = moment(otherBooking.startDate)
      const otherBookingEnd = moment(otherBooking.endDate)
      const otherBookingRange = moment.range(otherBookingStart, otherBookingEnd)
      return currentBookingRange.overlaps(otherBookingRange) && 
        booking.courtId === otherBooking.courtId 
    })
    foundDuplicateIds = [...foundDuplicateIds.map((id) => id), ...newDuplicates.map((d) => d.id)]
  })
}

type Booking = { startDate: string; endDate: string; courtId: number }

const findDuplicateBookings = (bookings: Booking[]) => {

let foundDuplicateIds: number[] = []

bookings.forEach((booking) => {

const currentBookingStart = moment(booking.startDate)

const currentBookingEnd = moment(booking.endDate)

const currentBookingRange = moment.range(currentBookingStart, currentBookingEnd)

const newDuplicates = bookings.filter((otherBooking) => {

const otherBookingStart = moment(otherBooking.startDate)

const otherBookingEnd = moment(otherBooking.endDate)

const otherBookingRange = moment.range(otherBookingStart, otherBookingEnd)

return currentBookingRange.overlaps(otherBookingRange) &&

booking.courtId === otherBooking.courtId

})

foundDuplicateIds = [...foundDuplicateIds.map((id) => id), ...newDuplicates.map((d) => d.id)]

})

}

This code is horribly inefficient for a number of reasons, for a start we probably shouldn’t be using momentjs (date-fns is nicer), but for the purpose of this exercise I will use it, and I want to focus solely on one specific issue. That issue is how many times we are running heavy parsing operations on the dates. We know there are 30,000 bookings, and within our loop we are looping again…yikes, based on the code we have, think how many times we are parsing these start and end times, way too many. Let’s try to improve things a bit:

type Booking = { startDate: string; endDate: string; courtId: number; id: number }
type ParsedBooking = Pick<Booking, 'courtId' | 'id'> & {
  startDate: Moment
  endDate: Moment
}

const findDuplicateBookings = (bookings: Booking[]) => {
  const bookingsWithMoments = bookings.map((booking) => ({
    ...booking,
    startDate: moment(booking.startDate),
    endDate: moment(booking.endDate),
  }))

  let foundDuplicateIds: number[] = []

  bookingsWithMoments.forEach((booking) => {
    const { startDate, endDate } = booking
    const currentBookingRange = moment.range(startDate, endDate)

    const newDuplicates = bookingsWithMoments.filter((otherBooking) => {
      const otherBookingRange = moment.range(otherBooking.startDate, otherBooking.endDate)
      return currentBookingRange.overlaps(otherBookingRange) &&
          booking.courtId === otherBooking.courtId 
    })

    foundDuplicateIds = [...foundDuplicateIds.map((id) => id), ...newDuplicates.map((d) => d.id)]
  })
}

type Booking = { startDate: string; endDate: string; courtId: number; id: number }

type ParsedBooking = Pick<Booking, 'courtId' | 'id'> & {

startDate: Moment

endDate: Moment

}

const findDuplicateBookings = (bookings: Booking[]) => {

const bookingsWithMoments = bookings.map((booking) => ({

...booking,

startDate: moment(booking.startDate),

endDate: moment(booking.endDate),

}))

let foundDuplicateIds: number[] = []

bookingsWithMoments.forEach((booking) => {

const { startDate, endDate } = booking

const currentBookingRange = moment.range(startDate, endDate)

const newDuplicates = bookingsWithMoments.filter((otherBooking) => {

const otherBookingRange = moment.range(otherBooking.startDate, otherBooking.endDate)

return currentBookingRange.overlaps(otherBookingRange) &&

booking.courtId === otherBooking.courtId

})

foundDuplicateIds = [...foundDuplicateIds.map((id) => id), ...newDuplicates.map((d) => d.id)]

})

}

Ok that’s at least a bit better, you can see that we now proactively create a new array of bookings before the loop which already contains parsed versions of the start and end dates. This means that we now run these parsing operations MUCH LESS, resulting in less duplicated effort and effort in general. Maybe there is more to be done, let’s see if we can improve a bit more:

type Booking = { startDate: string; endDate: string; courtId: number; id: number }
type ParsedBooking = Pick<Booking, 'courtId' | 'id'> & {
  range: DateRange
  startDate: Moment
  endDate: Moment
}

const findDuplicateBookings = (bookings: Booking[]) => {
  const bookingsWithMoments: ParsedBooking[] = bookings.map((booking) => ({
    ...booking,
    startDate: moment(booking.startDate),
    endDate: moment(booking.endDate),
    range: moment.range(booking.startDate, booking.startDate),
  }))

  let foundDuplicateIds: number[] = []

  bookingsWithMoments.forEach((booking) => {
    const newDuplicates = bookingsWithMoments.filter((otherBooking) => {
      return booking.range.overlaps(otherBooking.range) 
         && booking.courtId === otherBooking.courtId
    })
    foundDuplicateIds = [...foundDuplicateIds.map((id) => id), ...newDuplicates.map((d) => d.id)]
  })
}

type Booking = { startDate: string; endDate: string; courtId: number; id: number }

type ParsedBooking = Pick<Booking, 'courtId' | 'id'> & {

range: DateRange

startDate: Moment

endDate: Moment

}

const findDuplicateBookings = (bookings: Booking[]) => {

const bookingsWithMoments: ParsedBooking[] = bookings.map((booking) => ({

...booking,

startDate: moment(booking.startDate),

endDate: moment(booking.endDate),

range: moment.range(booking.startDate, booking.startDate),

}))

let foundDuplicateIds: number[] = []

bookingsWithMoments.forEach((booking) => {

const newDuplicates = bookingsWithMoments.filter((otherBooking) => {

return booking.range.overlaps(otherBooking.range)

&& booking.courtId === otherBooking.courtId

})

foundDuplicateIds = [...foundDuplicateIds.map((id) => id), ...newDuplicates.map((d) => d.id)]

})

}

A little bit better again, we are now also proactively creating the range so we don’t have to do it inside the nested loops, again this results in much less duplicated effort and effort in general. There is one more things I would like to do, let’s take a look at one line in particular:

return booking.range.overlaps(otherBooking.range) && 
  booking.courtId === otherBooking.courtId

1 2	return booking.range.overlaps(otherBooking.range) && booking.courtId === otherBooking.courtId

This line could be a bit more efficient, the problem with the current version is, it will often do a lot of extra work that is redundant, we only care about bookings that share the same courtId, but right now we are evaluating the overlap before we check for this match, let’s switch things around:

return booking.courtId === otherBooking.courtId &&
  booking.range.overlaps(otherBooking.range)

1 2	return booking.courtId === otherBooking.courtId && booking.range.overlaps(otherBooking.range)

This is much more efficient because now the first check is the one that performs a very lightweight operation in the form of comparing primitive values – two strings, and when they are not the same, the evaluation will stop.

Can you think of any more ways to improve this code? I am sure there are many ways, for example, when iterating maybe we could skip bookings that have already been detected as duplicates, there is no value in adding them twice. We might also consider creating a smaller subset of bookings that will only contain bookings in the future, or before running the second loop we could filter out bookings that don’t have the same courtId, and overall these changes in approach could result in a massively reduced number of iterations.