Yesterday I opened my phone gallery and found a photo taken in February 2020, when I visited one of the largest libraries in Seoul, South Korea. The library is located in COEX Mall, in the Gangnam District.
I fell in love with this place at first sight, and I keep wondering: do other people who visit feel the same way? I was curious enough to try web scraping visitor reviews from Tripadvisor using RStudio.
So, let’s start. The first step is to open the Starfield COEX Mall page on Tripadvisor (https://www.tripadvisor.com/Attraction_Review-g294197-d554303-Reviews-Starfield_COEX_Mall-Seoul.html).
From the picture above we can see that there are 810 reviews from visitors. Next, let’s open R and install the package below.
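The package list is not reproduced here. As a minimal sketch, assuming the package in question is rvest (it makes the read_html() function used in this post available), the setup could look like this:

```r
# Assumption: rvest is the scraping package meant here.
# Install it only if it is not already available, then load it.
if (!requireNamespace("rvest", quietly = TRUE)) {
  install.packages("rvest")
}
library(rvest)
```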
Next, to collect the reviews of COEX Mall, copy the following URL (https://www.tripadvisor.com/Attraction_Review-g294197-d554303-Reviews-Starfield_COEX_Mall-Seoul.html).
Pass the URL to R and store the downloaded page in an object named “seoul”.
seoul <- read_html("https://www.tripadvisor.com/Attraction_Review-g294197-d554303-Reviews-Starfield_COEX_Mall-Seoul.html")
Then we use SelectorGadget to find the CSS selector that matches the visitor reviews on the page.
After that, open RStudio and enter the code used to scrape the reviews, as follows:
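The scraping code itself is not reproduced here, so the following is only a sketch of how this step might look with rvest. To keep it runnable offline it uses a tiny stand-in page instead of the real one. The class name "review-text" and the object names sample_page and reviewtext are hypothetical; the real selector is whatever SelectorGadget reports for the Tripadvisor page, and it can change over time.

```r
library(rvest)

# Stand-in HTML so the sketch runs without a network connection.
# The class "review-text" is hypothetical, not Tripadvisor's real markup.
sample_page <- read_html(
  '<html><body>
     <div class="review-text">Beautiful library,
        worth a visit.</div>
     <div class="review-text">Great place for book lovers.</div>
   </body></html>'
)

# Select every node matching the CSS selector, then extract its text.
reviewtext <- html_text(html_nodes(sample_page, ".review-text"))

length(reviewtext)
```

With the real page, sample_page would be replaced by the seoul object created above, and ".review-text" by the selector found with SelectorGadget.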
Next we do some data cleaning, for example removing “\n” and other characters that are not needed, and then save the result as a CSV file in the folder we want. Here I save the file under the name “reviewtextbaru”.
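As a rough sketch of the cleaning step, assuming the scraped reviews are held in a character vector (called reviewtext here, with made-up sample values), base R is enough to strip line breaks and extra spaces:

```r
# Made-up sample reviews standing in for the scraped text.
reviewtext <- c("Beautiful library,\n worth a visit.  ",
                "Great place\nfor book lovers.")

# Replace line breaks with a space, collapse repeated whitespace,
# then trim leading and trailing spaces.
reviewtextbaru <- gsub("\n", " ", reviewtext, fixed = TRUE)
reviewtextbaru <- gsub("\\s+", " ", reviewtextbaru)
reviewtextbaru <- trimws(reviewtextbaru)

print(reviewtextbaru)
```

The cleaned vector can then be written to disk with write.csv.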
> write.csv(reviewtextbaru,"E:/KULIAH/semester 5/BIML/reviewtextbaru.csv")
Then install the additional packages needed for the next analysis with install.packages("package_name"), and load them with library() as follows:
After the installation is complete, the next step is to check the documents in the folder.
Next we create a corpus (a collection of texts that captures the use of language in written or spoken form) from the documents above, and then build a matrix of the words in the documents with the commands in the image below.
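The exact package list is not reproduced here. A plausible set, assumed from the functions used later in this post (VCorpus comes from tm, wordcloud from the wordcloud package, and stemmed words such as “librari” suggest SnowballC), could be installed and loaded like this. RColorBrewer is an extra assumption, since it is commonly used for word cloud colours.

```r
# Assumed package list; adjust to what your analysis actually needs.
pkgs <- c("tm", "SnowballC", "wordcloud", "RColorBrewer")

# Install only the packages that are not available yet.
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Load each package by name.
invisible(lapply(pkgs, library, character.only = TRUE))
```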
dokumen <- VCorpus(VectorSource(dokumen))
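The commands for the word matrix are not reproduced here, so this is only a sketch of how the corpus and matrix steps might fit together with the tm package. The two sample reviews are made up, the object name dtm is an assumption, and the stemming step is included because the output discussed in this post shows stemmed words such as “librari” (stemDocument needs the SnowballC package installed).

```r
library(tm)

# Made-up sample reviews standing in for the cleaned CSV contents.
dokumen <- c("the library in the mall is beautiful",
             "this library provides books for every visit")

# Build the corpus, one document per review.
dokumen <- VCorpus(VectorSource(dokumen))

# Reduce words to their stems, e.g. "library" becomes "librari".
dokumen <- tm_map(dokumen, stemDocument)

# Rows are documents, columns are terms, cells are counts.
dtm <- DocumentTermMatrix(dokumen)
inspect(dtm)
```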
The output shows that the 6 documents contain 91 different words. After that we use the “inspect” command.
The words are sorted alphabetically and form the columns of the matrix. The inspect command displays only the first 6 words. For example, in the 6th document, the word “librari” appears twice, the words “mall”, “provid”, “shop” and “visit” each appear once, and so on. If you want more detail, you can use the following command.
From all the words above, we can display the words that occur 2 or more times, or those that occur 3 or more times, with a command like this.
Then, create a word cloud using the command:
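The frequency filter described above can be sketched with findFreqTerms() from the tm package. This is an assumption about the command used, since the original command is not reproduced here; the sample documents and the object name dtm are made up for illustration.

```r
library(tm)

# Made-up documents: "library" occurs 3 times, "mall" 2, "shop" 1.
sample_docs <- c("library mall library", "library mall shop")
dtm <- DocumentTermMatrix(VCorpus(VectorSource(sample_docs)))

# Terms whose total frequency is at least 2, then at least 3.
freq2 <- findFreqTerms(dtm, lowfreq = 2)
freq3 <- findFreqTerms(dtm, lowfreq = 3)

print(freq2)
print(freq3)
```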
> wordcloud(words = de$word, freq = de$freq, min.freq = 1,
max.words=50, random.order=FALSE, rot.per=0.35,
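The call above expects a data frame de with a word column and a freq column, but the step that builds it is not shown. One way it might be built, assuming a document-term matrix from the tm package (called dtm here, with made-up sample documents), is to sum each term's counts and sort them:

```r
library(tm)

# Made-up documents standing in for the review corpus.
sample_docs <- c("library mall library", "library mall shop")
dtm <- DocumentTermMatrix(VCorpus(VectorSource(sample_docs)))

# Total count per term across all documents, most frequent first.
freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)

# Data frame in the shape the wordcloud call uses: de$word and de$freq.
de <- data.frame(word = names(freq), freq = as.numeric(freq),
                 row.names = NULL, stringsAsFactors = FALSE)
print(de)
```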
The image above shows the words that appear in visitor reviews of Starfield COEX Mall. The larger the font size, the more often the word is mentioned in the reviews. From here we can go on to look at the word associations of frequently occurring words, for example “The”, “Book”, “And”, “Best”, and so on.
That’s enough for an overview of what visitors say in their reviews of Starfield COEX Mall. Hopefully the text mining steps above are useful.
I hope we can visit the Starfield Library soon, because Seoul is so beautiful.
Thank you so much everyone, have a nice day!
- Hakim, RB. F. (2019, September 25). Web Scraping dengan R. https://medium.com/@986110101/text-mining-using-r-28ada2abb883