So I have a question for anyone familiar with the General Transit Feed Specification (GTFS)...

Why is the data provided in text files? Is there not a way to format the data to allow for random access to it?

I'm currently writing a transit app for a school project, and as far as I can tell, the only way to get all the stops for a route is to first look up all the trips for that route in trips.txt, then look up all the stop_ids associated with those trips in stop_times.txt (filtering out duplicates, since the goal is to get stop IDs, not stop times), and then look up those stop_ids in stops.txt.

The stop_times.txt file alone is over 500,000 lines long. Is there a much better way to parse the data that I'm not aware of? Currently I'm just loading the entire file into a data structure in memory, because the extra bit of RAM used seems negligible compared to the load time I'm saving...
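For what it's worth, the three-step lookup can be collapsed into one pass that builds a route-to-stops index in memory. This is only a minimal sketch: the function name `build_route_stop_index` and the tiny inline sample data are made up for illustration, and it assumes the feed uses the standard GTFS column names (`route_id`, `trip_id`, `stop_id`).

```python
import csv
import io
from collections import defaultdict


def build_route_stop_index(trips_file, stop_times_file):
    """Return {route_id: set of stop_ids} from open trips / stop_times CSV files."""
    # Step 1: map every trip to its route.
    trip_to_route = {
        row["trip_id"]: row["route_id"] for row in csv.DictReader(trips_file)
    }
    # Step 2: walk stop_times once; the set drops duplicate stop_ids for us.
    route_stops = defaultdict(set)
    for row in csv.DictReader(stop_times_file):
        route_id = trip_to_route.get(row["trip_id"])
        if route_id is not None:
            route_stops[route_id].add(row["stop_id"])
    return route_stops


# Hypothetical sample data standing in for the real files.
TRIPS = "route_id,service_id,trip_id\nR1,WK,T1\nR1,WK,T2\nR2,WK,T3\n"
STOP_TIMES = (
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "T1,08:00:00,08:00:00,S1,1\n"
    "T1,08:05:00,08:05:00,S2,2\n"
    "T2,09:00:00,09:00:00,S1,1\n"
    "T2,09:05:00,09:05:00,S3,2\n"
    "T3,10:00:00,10:00:00,S4,1\n"
)

index = build_route_stop_index(io.StringIO(TRIPS), io.StringIO(STOP_TIMES))
print(sorted(index["R1"]))
```

With a real feed you would pass file objects instead, e.g. `open("trips.txt", newline="", encoding="utf-8-sig")`, where `utf-8-sig` is just a guard in case the file starts with a byte order mark.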

Would it be faster if I just parsed all the data once and threw it into a database? (And then updated the database once a month when the new data comes in?)
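A rough sketch of what the database route could look like with SQLite from the Python standard library. The table layout, the helper names `load_table` and `stops_for_route`, and the sample rows are all assumptions for illustration; a real importer would carry more columns and read from the actual feed files.

```python
import csv
import io
import sqlite3

# Assumed minimal schema: only the columns needed for "stops on a route".
SCHEMA = """
CREATE TABLE stops (stop_id TEXT PRIMARY KEY, stop_name TEXT);
CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);
CREATE INDEX idx_trips_route ON trips (route_id);
CREATE INDEX idx_stop_times_trip ON stop_times (trip_id);
"""


def load_table(conn, table, columns, csv_file):
    """Copy the named columns of a GTFS CSV file into a table."""
    reader = csv.DictReader(csv_file)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    conn.executemany(sql, ([row[c] for c in columns] for row in reader))


def stops_for_route(conn, route_id):
    """Return distinct (stop_id, stop_name) pairs served by a route."""
    cursor = conn.execute(
        """
        SELECT DISTINCT s.stop_id, s.stop_name
        FROM trips t
        JOIN stop_times st ON st.trip_id = t.trip_id
        JOIN stops s ON s.stop_id = st.stop_id
        WHERE t.route_id = ?
        ORDER BY s.stop_id
        """,
        (route_id,),
    )
    return cursor.fetchall()


# Hypothetical sample data standing in for the real feed.
STOPS = "stop_id,stop_name\nS1,First St\nS2,Second St\nS3,Third St\n"
TRIPS = "route_id,trip_id\nR1,T1\nR1,T2\nR2,T3\n"
STOP_TIMES = "trip_id,stop_id,stop_sequence\nT1,S1,1\nT1,S2,2\nT2,S1,1\nT3,S3,1\n"

conn = sqlite3.connect(":memory:")  # use a file path to keep it between runs
conn.executescript(SCHEMA)
load_table(conn, "stops", ["stop_id", "stop_name"], io.StringIO(STOPS))
load_table(conn, "trips", ["trip_id", "route_id"], io.StringIO(TRIPS))
load_table(conn, "stop_times", ["trip_id", "stop_id", "stop_sequence"], io.StringIO(STOP_TIMES))
conn.commit()

print(stops_for_route(conn, "R1"))
```

The two indexes are what give you the random access the text files lack: the query no longer scans every stop_times row to answer a question about one route. The monthly update could then be as simple as rebuilding the database file from the new feed.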

  • 1
    I'd normalise it in a db and then work with the data from there.

    Maintain or automate a regular update (weekly/monthly) and then sell it back to them in the form of an API 😏

    Now hurry up before someone else does it.
  • 0
    @C0D4 Well, if I decide to keep maintaining and improving this app after I graduate, I'll probably throw it into a database (also, I'm not sure I could legally sell it back to them XD)
  • 0

    Reason is portability...

    Depending on the export (I don't know the transit spec you mentioned), the file format usually never changes once it has been released.

    Many, many companies still use ASCII instead of UTF-8, which is even worse than the file format.

    Always. Really always. Abstract the import of your data from the data layer that you want to use, and abstract that data layer from the services that consume the data.

    It might increase the workload at the beginning, but it solves many, many problems afterwards.

    When you design the data layer, especially when utilizing a database, ignore the format of the file.

    Let the importer service prepare the content and transform it to your needs, and then let the data layer handle validation, consistency checks, and persistence.

    Always expect surprises :)
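To make the layering advice from the last comment a bit more concrete, here is one possible shape for it. Everything in this sketch is hypothetical (the `Stop`, `StopRepository` and `StopService` names, the dict used as storage, the sample rows); it only illustrates the split between importer, data layer and consuming service, not a definitive design.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    stop_id: str
    name: str


def import_stops(csv_file):
    """Importer: the only code that knows about the file format."""
    for row in csv.DictReader(csv_file):
        # Transform raw text into the shape the data layer expects.
        yield Stop(stop_id=row["stop_id"].strip(), name=row["stop_name"].strip())


class StopRepository:
    """Data layer: validation, consistency checks and persistence.

    A dict stands in for storage here; it could be swapped for a database.
    """

    def __init__(self):
        self._stops = {}

    def save(self, stop):
        if not stop.stop_id:
            raise ValueError("stop_id must not be empty")
        if stop.stop_id in self._stops:
            raise ValueError(f"duplicate stop_id: {stop.stop_id}")
        self._stops[stop.stop_id] = stop

    def get(self, stop_id):
        return self._stops.get(stop_id)


class StopService:
    """Service: consumes the data layer and never sees the file format."""

    def __init__(self, repo):
        self._repo = repo

    def display_name(self, stop_id):
        stop = self._repo.get(stop_id)
        return stop.name if stop else "Unknown stop"


# Hypothetical sample data; note the stray spaces the importer cleans up.
SAMPLE = "stop_id,stop_name\nS1, First St \nS2,Second St\n"

repo = StopRepository()
for stop in import_stops(io.StringIO(SAMPLE)):
    repo.save(stop)

service = StopService(repo)
print(service.display_name("S1"))
```

If the feed format ever changes, only `import_stops` needs touching; if the storage changes, only `StopRepository` does.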