I am working on building a catalog of products based on OpenIcecat (the Icecat Open Catalog), and I am looking for advice from someone who may have experience with this, or possibly with another similar service (like C-Net maybe). My question is: what is a good model for populating a product catalog's database?

I GET the XML feed of the entire catalog. I extract the data about the products that I need based on the category ID. Easy enough. At this point I have inserted all the data into a table, so now I have a table for, say, 'Printer cats', which contains the URL to the images and the ID of the XML for each product in the category.

Here is where I run into a question/concern. I find it easy to write an ad hoc script that uses a GET request for each XML file and image and then dumps them into directories, but Icecat does NOT want you to rip very large amounts. My categories contain thousands of products (over 40k, for instance). What I feel I need to do is GET the XML for a product, grab the image, and store them. I feel this way because it's an obvious solution and that's what the client keeps asking for. That doesn't mean it's correct, though. Then I could parse the individual XML to extract the description, SKU, etc. into a table so I can build the catalog, for use with Magento for example, later adding/changing things as needed (prices, related products, etc.).

Seems easy enough, but after around 3-4k GET requests I get booted because I'm ripping too much data. Once I have the entire catalog (my catalog of wanted categories), it will be easy enough to grab the update files (XML, and small in comparison) and make changes accordingly. That would be a great point to be at, but first I need to get all the data and build the product table(s).

One idea is to get the data in real time as needed, but this is not desired by the client or myself. The client wants the catalog, understandably, and I notice that real time adds a performance hit and does not plug in easily to many solutions. But, expanding on the 'real-time' idea: do a real-time GET of the XML data and then store the data as it comes in, with some logic like 'if it's not present locally, go get it and then store it; if it is present locally, check whether it is up to date, and if not, update it'. Of course, if I'm going to check whether it's up to date, then there really is no point in storing the data, because I'm doing a request every time no matter what. I may as well just fetch it and throw it away, which seems inefficient.

Maybe everything is in real time: the products are fetched and displayed in real time, when the admin views products for manipulation they are presented in real time, etc. Always grabbing what's needed in real time, based on the metadata in the database that was (is) already populated from the 'main' catalog file that describes the entire catalog available from Icecat. But this doesn't plug into many solutions and will take a performance hit, plus some hosts won't let us GET anyhow. So there are many limitations here, but it sounds like an awesome solution for making sure you always have super current info (which is not needed here, though).

Here is where I am sort of headed already. I have the metadata based on the main catalog (over 500K items). I have already populated tables based on the desired categories. Now I am kind of headed towards this: building an app (tool) that will better refine what I am working with, such as a single category. Then produce a job: 'use category ID and get all XML feeds', then 'use cat. ID (probably the same again) and fetch images', then take the same cat. ID and build products by grabbing SKU, description, image filename, etc. At this point in the workflow I have all the info and can use the SKU (or whatever is needed) to grab the price, etc. from the other feed, manipulate descriptions, rename images if needed (SEO), or whatever.

Then I will need to build a model for updating prices and shipping weights from a different feed, Synnex in this case, but that seems much easier because shipping and price should be real-time. So that is a different story and much less data at once; only what's in the cart, I'm thinking.

Supposedly others have built a catalog like this for the same client by ripping the Icecat repository, but they never made or provided tools for future manipulation etc., which is where I am headed. Still not sure how to go about doing this. Plus, the old catalog is very old/stale, and I have never seen 'proof' that they actually did rip the data and build a catalog, not the full set anyhow.
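For what it's worth, the 'if it's not present locally, go get it and then store it' logic from the post could be sketched roughly as below, in Python. This is only a sketch under assumptions, not something Icecat prescribes: `ThrottledCache`, `fetch`, `min_interval` and `max_age` are hypothetical names, `fetch` stands in for whatever actually performs the GET for one product's XML, and the one-second pause is a placeholder rather than a known Icecat limit. Freshness is judged from the age of the local file, so a stored copy does not cost a request on every lookup.

```python
import time
from pathlib import Path
from typing import Callable, Optional


class ThrottledCache:
    """Store each product's XML locally and pause between real requests."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes],
                 min_interval: float = 1.0, max_age: Optional[float] = None):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch                # hypothetical: performs the actual GET
        self.min_interval = min_interval  # assumed pause (seconds) between requests
        self.max_age = max_age            # None: a stored copy never expires
        self._last_request: Optional[float] = None
        self.requests_made = 0

    def _path(self, product_id: str) -> Path:
        # Assumes product IDs are safe to use as file names (e.g. numeric).
        return self.cache_dir / f"{product_id}.xml"

    def _is_fresh(self, path: Path) -> bool:
        if not path.exists():
            return False
        if self.max_age is None:
            return True
        # Age comes from the local file timestamp, so no request is needed.
        return (time.time() - path.stat().st_mtime) <= self.max_age

    def get(self, product_id: str) -> bytes:
        path = self._path(product_id)
        if self._is_fresh(path):
            return path.read_bytes()
        # Not stored yet, or too old: wait out the pause, then fetch and store.
        if self._last_request is not None:
            wait = self.min_interval - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)
        data = self.fetch(product_id)
        self._last_request = time.monotonic()
        self.requests_made += 1
        path.write_bytes(data)
        return data
```

A job for one category would then loop over the product IDs already in the category table and call `get()` for each. Re-running the job after being cut off only requests what is still missing, and the same object can serve the 'real-time' idea by setting `max_age` to something like a day.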