
Creating A Timeseriesgenerator With Multiple Inputs

I'm trying to train an LSTM model on daily fundamental and price data from ~4000 stocks. Due to memory limits, I cannot hold everything in memory after converting to sequences for t

Solution 1:

What I ended up doing was to do all the preprocessing manually and save one .npy file per stock containing the preprocessed sequences. A hand-written generator then produces batches like this:

import numpy as np
import tensorflow as tf

class seq_generator():

  def __init__(self, list_of_filepaths):
    self.usedDict = dict()
    for path in list_of_filepaths:
      self.usedDict[path] = []

  def generate(self):
    while True: 
      path = np.random.choice(list(self.usedDict.keys()))
      stock_array = np.load(path) 
      random_sequence = np.random.randint(stock_array.shape[0])
      if random_sequence not in self.usedDict[path]:
        self.usedDict[path].append(random_sequence)
        yield stock_array[random_sequence, :, :]

train_generator = seq_generator(list_of_filepaths)

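# Pass the generator method itself (a callable), not the result of calling it.
# Each yielded item is one float32 array of shape (n_timesteps, n_features).
train_dataset = tf.data.Dataset.from_generator(train_generator.generate,
                                               output_types=tf.float32,
                                               output_shapes=(n_timesteps, n_features))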

train_dataset = train_dataset.batch(batch_size)

Where list_of_filepaths is simply a list of paths to preprocessed .npy data.
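
The answer does not show how the per-stock .npy files were produced. As a minimal sketch, assuming each stock's data is already a 2-D array of shape (n_days, n_features) and that overlapping windows with a stride of 1 are wanted, the preprocessing could look roughly like this. The names make_sequences and save_stock_sequences are hypothetical and do not come from the original post.

```python
import numpy as np

def make_sequences(data, n_timesteps):
    """Slice a (n_days, n_features) array into overlapping windows.

    Returns an array of shape (n_days - n_timesteps + 1, n_timesteps, n_features).
    Assumption: stride-1 windows; the original post does not say how it windowed.
    """
    n_days = data.shape[0]
    if n_days < n_timesteps:
        # Not enough history for even one window
        return np.empty((0, n_timesteps, data.shape[1]), dtype=np.float32)
    windows = [data[i:i + n_timesteps] for i in range(n_days - n_timesteps + 1)]
    return np.stack(windows).astype(np.float32)

def save_stock_sequences(data, n_timesteps, path):
    """Window one stock's data and write the result to `path` as an .npy file."""
    np.save(path, make_sequences(data, n_timesteps))
```

Calling save_stock_sequences once per stock, with a separate path each time, would give the kind of file list that list_of_filepaths is expected to hold.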


This will:

  • Load a random stock's preprocessed .npy data
  • Pick a sequence at random
  • Check if the index of the sequence has already been used in usedDict
  • If not:
    • Append the index of that sequence to usedDict, so that the same data is not fed to the model twice
    • Yield the sequence

This means that the generator feeds a single unique sequence from a random stock at each "call", which lets me use the .from_generator() and .batch() methods of TensorFlow's Dataset type.
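
One thing to watch with the approach described above: the loop runs under while True and retries whenever it draws an index that was already used, so once every sequence has been yielded it keeps spinning without producing anything, and it reloads a whole file on every draw. A possible variant, sketched here under the assumption that every file holds an array of shape (n_sequences, n_timesteps, n_features), is to shuffle all (file, index) pairs up front and walk through them once. The function name unique_sequence_generator and the seed parameter are hypothetical.

```python
import numpy as np

def unique_sequence_generator(list_of_filepaths, seed=None):
    """Yield every sequence from every file exactly once, in random order."""
    rng = np.random.default_rng(seed)
    # Enumerate all (file, sequence index) pairs. mmap_mode="r" memory-maps
    # the file, so only the header is read here rather than the whole array.
    pairs = []
    for path in list_of_filepaths:
        n_sequences = np.load(path, mmap_mode="r").shape[0]
        pairs.extend((path, i) for i in range(n_sequences))
    # Visit the pairs in a random order; the generator ends after one pass.
    for k in rng.permutation(len(pairs)):
        path, i = pairs[k]
        stock_array = np.load(path, mmap_mode="r")
        # Copy just the selected sequence out of the memory-mapped file
        yield np.asarray(stock_array[i], dtype=np.float32)
```

Because from_generator expects a callable, this could be passed as lambda: unique_sequence_generator(list_of_filepaths). Each pass then covers the data once and stops, which is closer to what an epoch usually means.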
