Created with ❤️ by Sebastian Pieczyński © 2023-2024.
Published on: 11/30/2023
This is the continuation of part 1 of our project to create a web scraper with a local database and TypeScript.
Arrrr... you ready for boarding?
We began our journey following the trail left by the bright sun and the dark moon. Then we sent messages to our crew and found a map and a key to the treasures hidden on this island. Finally, we unearthed all the gold and precious gems, and today we're going to present them to everyone on board.
In the first part of the project we created a simple but reliable web scraper. In this part we'll build the backend and the frontend to show our users the data we have unearthed from the depths of the web.
We will be using the same repo (it's updated with new features) as in part 1: https://github.com/ethernal/web-scraper-stepped-solution
Every step has a separate branch, and main is the finished version.
Keen-eyed among you might have noticed that there is a typo in the model name in the schema.prisma file. Instead of ScrapedData the model name is ScrappedData. We'll address that issue in the bonus section at the end of the article.
Since Prisma cannot run on the frontend, we need some way to get the data. We can build an API server that exposes a route which returns the data we are interested in. An additional benefit is that we gain more control over what data is returned and how.
To make our lives a bit easier while developing a live running process (which requires restarting every time a change is introduced), we can install the nodemon package. Nodemon automatically restarts the server when a change is introduced. It will be helpful if you want to tinker with the implementation later.
Since we want to use tsx to run our .ts files, we need to tell nodemon to use tsx when executing them. To do that, create a nodemon.json file in the root of the project with a map of executables for specific file extensions:
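A minimal nodemon.json for this could look like the following (it assumes tsx is installed as a dev dependency; execMap is nodemon's standard way to map file extensions to executables):

```json
{
  "execMap": {
    "ts": "tsx"
  }
}
```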
Then in package.json:
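A sketch of the relevant scripts section; the script name `server` is my assumption, not prescribed by the repo:

```json
{
  "scripts": {
    "server": "nodemon server.ts"
  }
}
```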
We'll create the server.ts file in just a second.
If you are using ts-node in your project, creating this configuration is not required; nodemon will use it automatically for .ts files.
To be able to receive requests and respond to them we need to create a server that will listen on a specific port and respond to our queries.
One of the most popular (read: standard) ways to do this is the express package. We'll also use the cors package to allow communication between the frontend and the backend.
Since we are using TypeScript, we'll also add the type definitions for both packages as development dependencies:
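The install commands would be along these lines (standard npm usage; tsx and nodemon included here for completeness if you haven't installed them yet):

```shell
npm install express cors
npm install --save-dev @types/express @types/cors tsx nodemon
```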
Now, in the root of the project, create a server.ts file and paste the code below. We'll go through it line by line in the comments:
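The original listing is not reproduced here, so below is a sketch of what server.ts can look like, reconstructed from the description that follows. The port (3213) and the ScrappedData model name come from the article and repo; the DEFAULT_MAX_PRICE value and field names are assumptions, so treat this as an approximation rather than the exact code:

```typescript
import express from 'express';
import cors from 'cors';
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();
const app = express();

const PORT = 3213;
const DEFAULT_MAX_PRICE = 1000; // assumed default, check the repo for the real value

// Allow cross-origin requests from the Vite dev server (port 5173).
app.use(cors());

app.get('/api/products', async (req, res) => {
  // price_lte arrives as a string (or undefined) from the query string.
  const price_lte = req.query.price_lte as string | undefined;
  const maxPrice = isNaN(parseInt(price_lte ?? ''))
    ? DEFAULT_MAX_PRICE
    : parseInt(price_lte!);

  const sortBy = req.query.sortBy as string | undefined;
  const order = (req.query.order as string | undefined) ?? 'asc';

  // undefined (not null!) makes Prisma skip the orderBy clause entirely.
  const orderByQuery = sortBy ? { [sortBy]: order } : undefined;

  const data = await prisma.scrappedData.findMany({
    where: { price: { lte: maxPrice } },
    orderBy: orderByQuery,
  });

  res.status(200).json(data);
});

app.listen(PORT, () => {
  console.log(`API server listening on http://localhost:${PORT}`);
});
```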
Our API layer is very simple: it waits for a client to visit the route /api/products and returns all data from the DB based on the parameters passed. The req and res parameters are automatically passed to the function by the Express server. req stands for the request sent by the client browser, and res is the response that we will send back to the client according to the parameters passed in req. The req object holds all the information about the request:
The server waits for a client (user/browser) to open the URL http://localhost:3213/api/products, and when that happens it invokes the code in the async (req, res) handler.
If the URL we invoked was formed as http://localhost:5173/?price_lte=80, this is what the request would look like (contents shortened for readability):
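The full dump is not reproduced here, but the interesting parts of an Express req object for such a request would look roughly like this (heavily shortened; field values are illustrative, not captured output):

```json
{
  "method": "GET",
  "url": "/api/products?price_lte=80",
  "headers": {
    "host": "localhost:3213"
  },
  "query": {
    "price_lte": "80"
  }
}
```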
As you can see, the req object has a lot of potentially useful information about both the request and the client sending it. The one of most importance to us is the query object.
In our case that means parsing the query parameters that you have seen in the request object:
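The parsing can be written as below. DEFAULT_MAX_PRICE is an assumed constant, and the function wrapper is mine, added so the logic is easy to read and test in isolation:

```typescript
const DEFAULT_MAX_PRICE = 1000; // assumed fallback, check the repo for the real value

// Query-string values arrive as strings, or undefined when the param is missing.
function parseMaxPrice(price_lte: string | undefined): number {
  // parseInt('') is NaN, so a missing param falls through to the default.
  return isNaN(parseInt(price_lte ?? ''))
    ? DEFAULT_MAX_PRICE
    : parseInt(price_lte!);
}
```

Note that the non-null assertion (`!`) is only reached when the first parseInt succeeded, so price_lte is guaranteed to be a string there.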
Let's break down the maxPrice parsing, as it may be a bit complex at first glance. First we try to parse price_lte as an integer. If it is undefined we substitute an empty string (''), which in turn produces NaN (not a number). If the result is NaN we fall back to DEFAULT_MAX_PRICE; otherwise we parse the value. The exclamation mark after price_lte tells TypeScript that we know what we are doing and price_lte will not be undefined here; we already took care of that.
Then we use the rest of the parameters to build a query for the database:
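A sketch of that query building, with names matching the article (sortBy, orderByQuery); the wrapper function and default value are my additions for clarity:

```typescript
// sortBy is a column name like 'price' or 'name'; order is 'asc' or 'desc'.
function buildOrderBy(
  sortBy: string | undefined,
  order: 'asc' | 'desc' = 'asc',
): Record<string, 'asc' | 'desc'> | undefined {
  // Return undefined (never null!) when sortBy is missing:
  // Prisma simply omits the orderBy clause for undefined, but null throws.
  return sortBy ? { [sortBy]: order } : undefined;
}

const orderByQuery = buildOrderBy('price', 'desc');
```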
If the sortBy param is not supplied, we return undefined as the orderByQuery; that way it will not be included in the Prisma query at all. Using null here would lead to errors.
Finally, the query is executed and the data is returned as a successful (code 200) response.
As you can see, this is enough to create working functionality, and it is quite simple. Of course, a proper implementation would need to take care of many other things, like:
Our goal here is to just get data so our frontend can render it. But there is one thing we must take care of: CORS.
CORS stands for Cross-Origin Resource Sharing. The MDN docs explain it in detail on the CORS documentation page:
Cross-Origin Resource Sharing (CORS) is an HTTP-header based mechanism that allows a server to indicate any origins (domain, scheme, or port) other than its own from which a browser should permit loading resources.
For the demo purposes we allow all requests to pass with:
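The line in question, shown here in a minimal express + cors setup; calling cors() with no options allows every origin:

```typescript
import express from 'express';
import cors from 'cors';

const app = express();

// No options = wide open: fine for a local demo, not for production.
app.use(cors());
```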
You can try to comment out this line and see what happens when we finish the frontend.
This concludes creating a backend service for our use case. Now let's start using this data and show it to the world.
Code of the finished backend.
I'll be using TailwindCSS for styling. You can read more about building a robust CSS system in my previous article here.
So we'll install that first and then use it in our frontend.
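Assuming Tailwind v3's standard Vite setup, the installation looks like this (the second command generates tailwind.config.js and postcss.config.js):

```shell
npm install --save-dev tailwindcss postcss autoprefixer
npx tailwindcss init -p
```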
And finally change the content array in tailwind.config.js so that Tailwind scans our template files:
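A typical Vite + React configuration for this step looks like the following sketch (v3 syntax; the freshly generated file has an empty content array):

```js
/** @type {import('tailwindcss').Config} */
export default {
  content: ['./index.html', './src/**/*.{js,ts,jsx,tsx}'],
  theme: {
    extend: {},
  },
  plugins: [],
};
```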
Now we need to import Tailwind base classes into our frontend.
At the top of index.css add:
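These are Tailwind's three standard directives:

```css
@tailwind base;
@tailwind components;
@tailwind utilities;
```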
We need to add 100% width to the #root element so that we can arrange our products in a grid.
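In index.css that is a one-rule change:

```css
#root {
  width: 100%;
}
```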
Now delete App.css from the src folder and remove it from the imports.
Let's start simple and display only the names of the products returned from the backend.
Find src/App.tsx and replace its contents with the code below:
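The original listing is not shown here; a minimal sketch of such a component could look like the following. The endpoint URL comes from the backend section, while the product field names (id, name) are my assumptions:

```tsx
import { useEffect, useState } from 'react';

type Product = {
  id: number;
  name: string;
};

function App() {
  const [products, setProducts] = useState<Product[]>([]);

  useEffect(() => {
    // Fetch once on mount; filtering and sorting come later.
    fetch('http://localhost:3213/api/products')
      .then((response) => response.json())
      .then((data: Product[]) => setProducts(data))
      .catch(console.error);
  }, []);

  return (
    <ul>
      {products.map((product) => (
        <li key={product.id}>{product.name}</li>
      ))}
    </ul>
  );
}

export default App;
```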
Make sure that the API server is running:
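The exact script name depends on your package.json; with the setup sketched in the backend section it would be:

```shell
npm run server
```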
And run the Vite app with:
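Vite's default dev script:

```shell
npm run dev
```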
The frontend application should be available at: http://localhost:5173
When it loads, the names of the Pokémon products should be displayed:
At this point there are a few issues with the current implementation:
Version displaying product names on GitHub.
So let's add these features now.
Now users of our application have much more flexibility:
And as we add more features (more power), we need to take on more responsibility. Right now, when users change the parameters of the search, the change is not reflected in the URL they see. This is important because:
Adding such a feature is not hard:
Code with URL parameters, images and sorting.
That's a lot of code, but we actually only needed to add one line inside the fetch function to make it work:
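The one line updates the address bar via the History API. The surrounding variable names (maxPrice, sortBy, order) are my assumptions based on the article, shown here with sample values:

```typescript
const maxPrice = 80;
const sortBy = 'price';
const order = 'asc';

// Build the same query string the fetch uses...
const params = new URLSearchParams({
  price_lte: String(maxPrice),
  sortBy,
  order,
});

// ...and mirror it in the address bar without triggering a reload.
// (Guarded so the snippet also runs outside a browser.)
if (typeof window !== 'undefined') {
  window.history.replaceState(null, '', `?${params.toString()}`);
}
```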
Since the fetch happens on every change to the parameters, we can update the history there and be certain it is updated whenever a new request is made. Grrrreat!
It's not all rum and roses though.
Open the Developer Tools (press F12 in most browsers to show them), navigate to the Network tab, and reload the page if needed. Now try changing the price.
Requests stream like water into a cannonball-riddled hull, and whenever we start a new request the old one is still in progress. We need a way to correct that:
we can debounce the price input and cancel stale requests with an AbortController.

Debouncing means that we wait a certain amount of time before making a call to a function. It uses a timer that only fires after a certain amount of time has passed; if the function is called again, the timer resets and starts over. That way the function is not called on every change but runs once after a period of inactivity.
We'll use the useDebounce hook for this functionality.
To debounce the fetch when the price changes, we'll debounce the maxPrice variable with useDebounce and use the debounced value inside the effect instead.
While we are inside the useEffect, we'll also implement the AbortController to cancel the previous request whenever we want to start a new one.
Install the usehooks-ts package:
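```shell
npm install usehooks-ts
```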
Then in App.tsx, right after the maxPrice variable, add:
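With useDebounce from usehooks-ts the addition is a single line inside the component (this fragment assumes maxPrice is already defined as state):

```typescript
import { useDebounce } from 'usehooks-ts';

// Inside the App component, right after the maxPrice state:
const debouncedMaxPrice = useDebounce(maxPrice, 50);
```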
We have set the debounce timeout to 50 ms; feel free to experiment with the value and see when it starts to noticeably slow down the UI and when the delay is barely perceptible.
Finally, change the useEffect to use the debounced value instead of maxPrice:
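A sketch of the updated effect, assuming axios (which the repo uses, per the interceptor note below) and the state names from earlier; the endpoint and setter names are assumptions:

```typescript
useEffect(() => {
  const controller = new AbortController();

  const params = new URLSearchParams({
    price_lte: String(debouncedMaxPrice),
    sortBy,
    order,
  });

  axios
    .get(`http://localhost:3213/api/products?${params.toString()}`, {
      // Passing the signal lets axios cancel the request when we abort.
      signal: controller.signal,
    })
    .then((response) => setProducts(response.data));

  window.history.replaceState(null, '', `?${params.toString()}`);

  // Cleanup runs before the next effect run and on unmount:
  // abort the in-flight request so stale responses never land.
  return () => controller.abort();
}, [debouncedMaxPrice, sortBy, order]);
```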
You can also see that we have created an AbortController to cancel the previous request whenever we fire a new one.
When we return a function from useEffect, React calls it before the effect runs again and when the component is unmounted (destroyed). So whenever the price changes, we cancel the previous request before starting a new one.
We are also intercepting the axios response to check whether the request was aborted; if so, we return a resolved promise instead of an error. You will see this in development, as React fires effects twice to verify that they run correctly. That is a better solution than disabling this behavior, since it is meant to make us aware of issues in our synchronization logic.
With that we have come to an end of our journey. We dug up all the treasures and collected them in our coffers.
In the process we learned how to:
[x] create a simple API server,
[x] work with TailwindCSS and build a frontend,
[x] use the REST API from the frontend,
[x] use useEffect to fetch data,
[x] debounce a state variable to limit network requests,
[x] abort in-flight requests with AbortController.
See you soon!
Code with debounce and abort controller.
Complete implementation with useDebounce and AbortController:
We now have a fully working web application that displays the data we scraped from the website, and it runs on our own infrastructure. That gives us more control over the data and how it is presented and used; e.g. it would be fairly easy to implement infinite scroll for the page, or to display similar products when one is selected from the list.
I hope you had fun and learned something new today.
Please feel free to contact me if you have questions or create an issue or PR if you want to add something fun to the project. I'll gladly cover the changes in another article.
Stay safe and know that you are doing great! Even when no one noticed yet.
In schema.prisma change the model name from ScrappedData to ScrapedData.
Then run npx prisma migrate dev; you will see a prompt asking you to confirm that the database will be reset:
After confirming, you will see that the migration has been generated and applied:
This will require you to scrape the data again from the website as the old table has been deleted.
Before we can do this, though, we need to fix the scraper and the backend.
In scraper.ts replace the call to await prisma.scrappedData.upsert with await prisma.scrapedData.upsert, and in server.ts change const data = await prisma.scrappedData.findMany to const data = await prisma.scrapedData.findMany.
Now run npm run scrap (you can change the script name as well if you want 😉).
It will re-populate the database with data and download the files again.
And now we are really done.
In this bonus section you have seen how to migrate the database and what issues arise when models change. This was a destructive change, but not all migrations will look like this one. It is still worth keeping in mind that, depending on your decisions and what changes you need to make, you may find yourself in a similar situation.
Code with database migration using prisma.