I have a platform with a cool free feature that helps companies (or people) get the datasets they need. We’re talking about commercial, valuable datasets here, so you don’t have to spend weeks or months searching. I know there are tons of startups out there that need datasets (e.g., for AI/ML models), but there isn’t really a place where you can find them, since they don’t post their data requests online. Here are a few things I’ve done:
- Reaching out to data-related communities on Reddit (and LinkedIn, with less success)

- Cold emailing startups with our value proposition

- Going to networking events to explain our value proposition

These options do work sometimes but are hard to scale. I’m iterating on how I deliver our message to increase the response rate, but I’m wondering if there are any other channels or ideas out there that I haven’t explored.

  • ivanmartinvalle@alien.topB
    10 months ago

    Did you try running ads? This would be interesting to me, but I’d have concerns about whether the data is valid and was acquired legally.

  • TheWorldIsOnlyYours@alien.topB
    10 months ago

    I am a potential customer. But reading your post, my only thought was: that sounds sketchy. I would recommend that you state how these datasets were collected directly in your value proposition.

  • boxxa@alien.topB
    10 months ago

    I did send you a DM about this. I do ad hoc requests for datasets and scraping, so the idea of being able to list data somewhere is interesting, but I’m curious how you differ from Kaggle, for example.

  • boxxa@alien.topB
    10 months ago

    I’ve sent examples of datasets I thought would be useful to companies by cold email. I had some success, but not much. In hindsight, I’m more amazed at how many people opened the attachments from an external sender.

    • nobilis_rex_@alien.topOPB
      10 months ago

      Hmmm, I might try that: include some images or something dynamic to increase the open rate. That’s a good idea.

  • bpliv@alien.topB
    10 months ago

    Why would a company risk training their models on data that they have not gathered and validated themselves? Garbage in, garbage out.

    • boxxa@alien.topB
      10 months ago

      There are a lot of steps before the “training of the models” when it comes to AI.

      Data processing is a major part, and dealing with datasets is a pain. In a similar business, we scrape, capture, and sanitize a lot of data at the request of companies to give them a consistent format they don’t need to continually manage.

  • mr_house7@alien.topB
    10 months ago

    Why shouldn’t I just stick with open-source datasets, like those on Hugging Face and other platforms? What do you offer that is unique? Basically, my question is: what is your unique value prop?