Hey all! Firstly a huge thanks in advance to anyone who spends time responding to this.

So I’m working on my MVP, which I’m about to launch (in its simplest form it’s an AI-based news aggregator).

To date my server set up has been:

  1. Data storage, scraping, and app API calls go to my DigitalOcean server: 2 GB memory, 1 AMD vCPU, 50 GB disk, running LAMP on Ubuntu 20.04.

  2. All my LLM work (preprocessing and cleaning text, and running Hugging Face models locally) is done on a Scaleway PLAY2-PICO instance.

A few issues I’m facing:

  1. The API calls to the DigitalOcean server are incredibly slow. It takes 5 seconds to load posts, and I’m the only one using the app.

  2. The LLM processes on the Scaleway server just get killed, which I assume is due to running out of memory.

So now to the question: what server architecture / providers do you use? It needs to handle large MySQL tables quickly as well as run large LLM models (the two don’t need to be the same setup).

Much appreciated!

  • HandrMo@alien.topB · 11 months ago

    You could use a tool like New Relic to get an idea of where the bottleneck is. But there are so many things that can slow down an app that you’ll probably need some technical help to diagnose and resolve the issue.

  • DarkInTwisted@alien.topB · 11 months ago

    go AWS

    trust me bro… why trust me?

    microsoft = feasible TECHNICALLY, but the worst customer support in the world.

    aws is just as scalable and has amazing customer support. it’s easier to understand

    digital ocean… this is good for what? they’re a little cheaper, but that’s because they’re… cheaper. you’re obviously not satisfied with them

    you could also check out google cloud, they’re probably really good but i don’t know about them much

  • maxip89@alien.topB · 11 months ago

    5 seconds?

    It looks to me like you’re querying the database once per item (ping-pong between the application server and the database server, where the round-trip latency adds up with the number of items). Junior developers who use ORM layers often run into exactly this before they find out what’s going on. Search for the “N+1 query problem”; maybe you can fix it that way.
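
    To make that concrete, here is a rough sketch of the N+1 pattern versus a single batched query; the table and column names (posts, tags, post_tags) and credentials are invented for illustration.

    ```python
    # Hypothetical example of an N+1 access pattern vs. one batched query.
    # Table/column names and credentials are placeholders.
    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="app",
                                   password="secret", database="news")
    cur = conn.cursor()

    cur.execute("SELECT id FROM posts ORDER BY published_at DESC LIMIT 50")
    post_ids = [row[0] for row in cur.fetchall()]

    # N+1: one extra round trip per post; 50 posts means 50 extra queries,
    # and the per-query latency adds up.
    for post_id in post_ids:
        cur.execute(
            "SELECT t.name FROM tags t "
            "JOIN post_tags pt ON pt.tag_id = t.id WHERE pt.post_id = %s",
            (post_id,))
        cur.fetchall()

    # Batched: a single round trip for all 50 posts.
    placeholders = ", ".join(["%s"] * len(post_ids))
    cur.execute(
        "SELECT pt.post_id, t.name FROM tags t "
        "JOIN post_tags pt ON pt.tag_id = t.id "
        f"WHERE pt.post_id IN ({placeholders})",
        post_ids)
    tags_by_post = cur.fetchall()
    ```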

    This is not infrastructure related (unless you have so little memory that the system is forced to swap).

    hope this helps.

  • tony-berg@alien.topB · 11 months ago

    Speaking as a techie having dealt with clients with that problem: You need a techie to help you.

    Your whole concept could be flawed to the point where you will find it prohibitively expensive to create the user experience that you’re going for, or there might be a switch to flip.

    And the reason I’m saying you need help from someone to figure this out is that if you had the skills to find the underlying cause, you wouldn’t ask what you just asked; your question would be a lot more specific, it would be in a more relevant tech subreddit, or you’d be reading relevant posts on Stack Overflow right now.

    You really need to dig deeper into what’s causing that delay, and you need to try things like emulating 100+ concurrent users to see what happens with that delay. And you need that not only to figure out how to minimize the delay, but also to calculate what resources (CPU, GPU, RAM, network, and so on) will cost you how much on different platforms, so that you can get your numbers right and figure out whether your business plan is realistic. Can you afford to operate your business? Is your target market willing to pay what you must charge? How much capital do you need to invest in your own hardware (or perhaps operate at a loss on a cloud provider to get early scalability)? Do you need to rework your software to do some calculations during off-peak hours?
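
    As a starting point for that kind of emulation, here’s a very rough sketch that fires 100 concurrent requests at one endpoint and reports latency percentiles. The URL is a placeholder; a real load test would use something like Locust or k6.

    ```python
    # Crude concurrency test: 100 simultaneous GETs against one endpoint.
    # URL is a placeholder; point it at whatever your posts API actually is.
    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    URL = "https://your-api.example.com/posts"  # placeholder

    def one_request(_):
        start = time.perf_counter()
        requests.get(URL, timeout=30)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=100) as pool:
        latencies = sorted(pool.map(one_request, range(100)))

    print(f"p50={latencies[49]:.2f}s  p95={latencies[94]:.2f}s  max={latencies[-1]:.2f}s")
    ```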

  • EveryThingPlay@alien.topB · 11 months ago

    As far as I know, LLMs are pretty GPU-hungry. So I’d go with a dedicated GPU server for the best results, but they cost quite a lot. Anyway, I think you should check them out; maybe there are some affordable options for your team.

  • PrepxI@alien.topB · 11 months ago

    As u/victrolla said, you need to do some profiling, but I suspect it’s a lack of indexes on your SQL database tables.
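
    If you want to check that yourself, here is a quick sketch of what that looks like. The query, table, and column names are made up; in the EXPLAIN output, "type: ALL" means a full table scan.

    ```python
    # Sketch: ask MySQL how it executes the slow query, then add an index if
    # it's doing a full table scan. Names and credentials are placeholders.
    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="app",
                                   password="secret", database="news")
    cur = conn.cursor()

    # "type: ALL" in the EXPLAIN output means a full table scan.
    cur.execute("EXPLAIN SELECT * FROM posts "
                "WHERE published_at > NOW() - INTERVAL 1 DAY "
                "ORDER BY published_at DESC LIMIT 50")
    for row in cur.fetchall():
        print(row)

    # If the column in WHERE/ORDER BY isn't indexed, adding one usually helps.
    cur.execute("CREATE INDEX idx_posts_published_at ON posts (published_at)")
    ```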

    • thewanitz@alien.topOPB · 11 months ago

      Thanks! Sorry for the noob question, but what would you need to know as part of the profiling?

      • xasdfxx@alien.topB · 11 months ago

        what would you need to know as part of the profiling

        what’s taking the time

        Obvious things:

        1 - your server is out of RAM (free -g; ps aux --sort -rss)

        2 - your server is serving a ton of requests (what, though?)

        3 - database slow

        4 - database full

        5 - DB out of connections (are you using connection pooling?)

        6 - poor choices inside whatever PHP you’ve written

        7 - the network between Scaleway and DO

        That said… all this is optimized to be as cheap as possible. Value your time, use bigger servers, and go figure out if you’ve created any value. If you’ve created value, hosting costs can be optimized.
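
        If it helps, a minimal way to see which of those is eating the five seconds is to time each step of a request separately (Python shown here as a sketch; in PHP you’d do the same with microtime()). The connection details and query are placeholders.

        ```python
        # Time the DB connection and the query separately to see where the
        # request time actually goes. Credentials and query are placeholders.
        import time

        import mysql.connector

        def timed(label, fn):
            start = time.perf_counter()
            result = fn()
            print(f"{label}: {(time.perf_counter() - start) * 1000:.1f} ms")
            return result

        conn = timed("connect", lambda: mysql.connector.connect(
            host="localhost", user="app", password="secret", database="news"))

        def fetch_posts():
            cur = conn.cursor()
            cur.execute("SELECT id, title FROM posts "
                        "ORDER BY published_at DESC LIMIT 50")
            return cur.fetchall()

        rows = timed("query", fetch_posts)
        # If "connect" dominates, look at connection pooling (item 5 above);
        # if "query" dominates, look at indexes and the N+1 pattern.
        ```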

  • victrolla@alien.topB · 11 months ago

    Needs more profiling. I doubt the API call itself is the bulk of the processing time. I’m completely blind to your app architecture, but I’m wondering if you’ve looked into something like Google Vertex AI. You’d benefit from hardware acceleration and horizontal scaling of your model. The API component is best run in something like Cloud Run.

    Usually what I see for folks like you is that they’ve got a Python web app like Flask that’s running the model directly. This is a really bad plan. You need to decompose. Push the model into Vertex and make an API call to it from the backend.
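
    A minimal sketch of that split, assuming the API process just proxies to a separate inference service (the URL and payload shape below are invented; with Vertex you’d call its endpoint or SDK instead):

    ```python
    # The web app only forwards requests to a separate inference service,
    # which can be scaled (and given a GPU) independently. The URL and JSON
    # shape are hypothetical.
    import requests
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    INFERENCE_URL = "http://inference.internal:8080/summarize"  # placeholder

    @app.post("/summarize")
    def summarize():
        article_text = request.get_json()["text"]
        # The heavy LLM work happens in the inference service, not here.
        resp = requests.post(INFERENCE_URL, json={"text": article_text}, timeout=30)
        resp.raise_for_status()
        return jsonify(resp.json())
    ```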

    There’s a lot of possible optimizations but there’s not enough detail to know specifics.

    • EveryThingPlay@alien.topB · 11 months ago

      Agreed with you, profiling is really needed to see where the bottleneck is. My guess is the LLM itself (some of them are really heavy), but it may actually just be the wrong server settings. And yeah, running the whole neural network inside the Flask app is a bad idea, not only for performance but for stability (if something crashes in the model, there’s a high risk the Flask app crashes too and the whole application gets stuck).

  • SkullTech101@alien.topB · 11 months ago

    Hi, this is very interesting. I’d like to talk to you more about this since I’m building something in this space. Would love to help you as much as I can in exchange for more details about your use case and issues. Can we talk in DMs?

  • soulsurvivor97@alien.topB · 11 months ago

    I’m in a similar position to the one you’re in now: developing a mobile app that uses a couple of computer vision models, using the AWS ecosystem for prototyping, and finding that server-side development has a pretty big learning curve.