• 0 Posts
  • 2 Comments
Joined 1 year ago
Cake day: October 28th, 2023

  • EveryThingPlay@alien.top to Startups · Server Set up
    1 year ago

    Agreed with you, profiling is really needed to see where the bottleneck is. My guess is it's the LLM itself (some of these models are really heavy), but it could also just be wrong server settings. And yeah, running the whole neural network inside the Flask app is a bad idea not only for performance but also for stability: if something crashes in the model, there's a high risk it takes the Flask app down with it and the whole application gets stuck. Keeping the model in its own service avoids that, something like the sketch below.
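    A minimal sketch of what I mean, the Flask app only forwards requests to a separate inference service instead of loading the model itself. The endpoint names, port numbers, and URL are placeholders, not anything from the original post:

    ```python
    # Hypothetical setup: the Flask app stays thin and talks to a dedicated
    # model server over HTTP, so a crash or stall in the model process
    # cannot take the web app down with it.
    import requests
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Assumed address of a separate inference service (placeholder).
    INFERENCE_URL = "http://localhost:8001/generate"

    @app.route("/chat", methods=["POST"])
    def chat():
        prompt = request.get_json(force=True).get("prompt", "")
        try:
            # A timeout keeps a hung model from blocking Flask workers forever.
            resp = requests.post(INFERENCE_URL, json={"prompt": prompt}, timeout=30)
            resp.raise_for_status()
            return jsonify(resp.json())
        except requests.RequestException as exc:
            # The web app keeps responding even if the model service is down.
            return jsonify({"error": str(exc)}), 503

    if __name__ == "__main__":
        app.run(port=8000)
    ```

    With that split you can restart or scale the model service on its own, and the Flask side just returns an error instead of freezing when the model misbehaves.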


  • EveryThingPlay@alien.top to Startups · Server Set up
    1 year ago

    As far as I know, LLMs are pretty GPU-hungry. So I'd go with a dedicated GPU server for the best results, but they can get pretty expensive. Anyway, I think you should check them out; maybe there are some affordable options for your team.
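    Just as a rough back-of-envelope illustration of "GPU-hungry" (assumed numbers, weights only, ignoring activations and KV cache), the memory for the weights alone scales with parameter count times bytes per parameter:

    ```python
    # Hypothetical sizing sketch: approximate VRAM needed just to hold the
    # weights of a model in fp16 (2 bytes per parameter).
    def estimate_weight_vram_gb(num_params_billions: float, bytes_per_param: int = 2) -> float:
        return num_params_billions * 1e9 * bytes_per_param / 1024**3

    if __name__ == "__main__":
        for size in (7, 13, 70):
            print(f"{size}B params ~ {estimate_weight_vram_gb(size):.0f} GB VRAM for weights alone")
    ```

    Even a 7B model needs on the order of 13-14 GB just for fp16 weights, which is why a dedicated GPU server (or at least a big enough GPU) usually ends up being the requirement.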