Your comment seems to assume 1) that employment levels for software engineers are the same in India as where you are, 2) that the OP has overstated the social stigma of layoffs in India, and 3) that the current employment landscape you find yourself in will last indefinitely.
Based on this, I suspect it's highly likely the hubris in this comment will not age well.
1) India is the chief exporter of software engineering to the world. There are over a billion people in India. So naturally, on a per-population basis the employment numbers will be lower. Given that there are nearly entire towns dedicated to tech services, I doubt finding a new job is difficult for those that want one, especially given how huge labor arbitrage is in the west right now.
2) No idea. My Indian contractor coworkers never seemed to care. It's the H1Bs that panic during a layoff (understandably).
3) I don't think it will, that's why I get mine while the gettin' is good and hopefully have enough to not worry about the future whenever the FAANGs finally succeed at driving wages into the ground. However, given there are nearly 4 jobs for every engineer currently in the field according to the BLS I'm not worried at all about the next decade. I'll just continue to stack cash and re-evaluate every once in a while.
Most startups doing these layoffs are not really solving any real problems. Many are wrapping existing solutions in a better UI/UX, which might be nice to have in an upcycle but will be the first thing on the chopping block in a downturn.
My guess is because it was self-submitted, it was held to a higher standard. Fair enough.
And in some universes the title could be considered click bait, although it is accurate in this case.
Moderation is a tough job. You never win.
That said, the revised HN title seems like it was written by bad AI. The point seemed to be to drive it off the homepage. In that, the HN title succeeded.
Regardless, I'm happy the article generated a lot of interesting discussion before manually being deemed unfit.
It is curious how, at the same time the title changed, all the top comments (which were generally supportive) got pushed to the bottom. And now yours as well. I assume this article touched a nerve at the same time someone was having a bad day.
Author here. Frustrating situation. As the title is long at 84 characters, we know that Google is definitely going to rewrite it. The simplest way is to break it into parts and get rid of the shortest part that still makes sense.
So maybe take
'Towards Platform Democracy: Policymaking Beyond Corporate CEOs and Partisan Pressure'
And 1) condense it and 2) lose the colon
'Platform Democracy is Policymaking Beyond CEOs & Partisanship' (60 characters)
If that is too condensed, you could try a short title in the <title> and a longer title in the copy.
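A rough sketch of what that split could look like (hypothetical markup, not taken from the actual page):

    <head>
      <!-- short, condensed title used for the tab and (usually) the search result -->
      <title>Platform Democracy is Policymaking Beyond CEOs &amp; Partisanship</title>
    </head>
    <body>
      <!-- fuller title kept in the visible copy -->
      <h1>Towards Platform Democracy: Policymaking Beyond Corporate CEOs and Partisan Pressure</h1>
    </body>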
So there's an interesting distinction required here! When Google says they use the title 80% of the time, they mean they use the title 80% of the time to create their search result title, which they may or may not modify. The other 20% of the time they use an H1 or other elements on the page.
And I certainly changed my diet over the past 10 years, and my gut bacteria have become more diverse and presumably healthier as a result. I've verified this through various stool tests and 3rd-party microbiome trackers over the years (uBiome, Thryve).
I don't want to get into pseudoscience or offer anyone advice, but I can tell you that increasing specific bacteria that produce butyric acid within the gut has seemed to work for me. In theory, one could do this by eating butyrate-promoting foods such as almonds, apples, barley, kiwifruit, and more, or by taking a bifidobacteria-enhancing fiber such as GOS.
I think what folks are missing is that a lot of these "zero-click" searches happen as a result of Google scraping your website, and displaying the results as a "featured snippet."
Yes, they link to you below the featured snippet.
No, more people don't click, because they've taken the answer from your website and displayed it right in their search results.
For example: If I'm searching for "best nail for cedar wood" Google gives me the answer: STAINLESS STEEL - and I never had to click through to the website that gave the answer: https://bit.ly/2MdovdP
• Yes, this is good for users (it would also be good for users if Netflix gave away movies free)
• Overall, the publishers who "rank" for this query receive fewer clicks
• Google earns more ad revenue as users stick around on Google longer
Ironically, Google has a policy against scraping their results, but their whole business model is predicated on scraping other sites and making money off the content - in many cases never sending traffic (or sending significantly reduced traffic) to the publisher of the content.
> No, more people don't click, because they've taken the answer from your website and displayed it right in their search results.
It's for this reason that I've stopped embedding micro data in the HTML I write.
Micro data only serves Google. Not my clients. Not my sites. Just Google.
Every month or so I get an e-mail from a Google bot warning me that my site's micro data is incomplete. Tough. If Google wants to use my content, then Google can pay me.
If Google wants to go back to being a search engine instead of a content thief and aggregator, then I'm on board.
I just got one of those emails for the first time about my personal site that's basically my resume. Apparently my text is small on mobile (it's not...) and some other crap
I don't get why google thinks it's acceptable to critique my site without prompting. It honestly just feels rude. They want me to do a whole bunch of micro-optimizations on a site that already works fine because it doesn't fit their standard of "high quality". I think I've gotten exactly 0 clicks from Google search results ever and I don't really ever want any.
If it were possible to get a human's attention at Google I'd start sending my own criticism their way but of course it doesn't work like that...
I was curious what it was complaining about, since https://henryfjordan.com looks great to me. I tried to run it through Google's "Mobile Friendly Test" but fetching failed [1] because your robots.txt has:
    User-agent: *
    Disallow: /
This would explain why you've gotten zero clicks from Google (or I would guess anyone else's) search results!
On the other hand, it's surprising that you would get a notification if you had crawling disabled. Did you set this robots.txt up recently?
Google seems to see robots.txt as "more what you call guidelines, than actual rules". Sites that block googlebot or all bots with robots.txt still turn up in google searches, just without a description, and are obviously still indexed.
robots.txt is a tool to control crawling, not to specify how you would like your site to be displayed (or not) in search results. If you don't want search engines to include your site, set:
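    <meta name="robots" content="noindex">

(That's the standard robots noindex meta tag; the equivalent X-Robots-Tag HTTP header works as well.)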
The blog post you link has a bunch of alternatives, but I agree they're not great. If there are a lot of webmasters who want to be able to noindex through robots.txt then making the case for adding noindex to the standard would be a good next step.
I sent you an email, and I'm posting it here but without identifying info:
---
Hi Jeff,
Thank you for your comment. I'm replying via email to send some info I'd rather not share on HN, but will post the same, redacted, on HN. Back when starting my web-dev career I was the one-man development team of a web agency, and all our development/pre-prod sites (that had to be unauthed) had robots.txt files disallowing all bots, but they still popped up in Google. Searching some of the old domains in Google I found an example here: http://***.***/***, and attached is an example of it showing up in a SERP and what the robots.txt looks like (and I'm pretty sure that the robots.txt has looked like that since that page was created).
In this case it is just one page that nobody will care about, and since I'm not working on projects that are open but "robots.txt hidden" anymore I don't know if it is as bad as it used to be, but I regularly see pages with the "No information is available for this page" message whose domains have robots.txt files that disallow all bots but still show up in Google.
Thanks for sending the screenshot! That site shows up with "no information is available for this page", which means that while robots.txt has disallowed bots from crawling it the page is still linked from other pages that do allow crawling.
The robots.txt protocol gives instructions to crawlers about how they should interact with the site. If you instead want to give instructions to indexes, use the noindex meta tag.
You're right, I was wrong about how to expect a "Disallow: /" to work. But isn't it sorta odd to have a protocol to control crawling (which is usually done to index) but (almost) require a compliant indexer to crawl all pages to comply with the indexing rules?
In this example the robots.txt has clearly told all bots to not crawl this site, but the only way to read the meta tag (or equivalent header) is to crawl the site. So I assume that in this case Google either assumes that it is fine to crawl URLs that it has found elsewhere while ignoring the robots.txt, or it assumes that pages disallowed by robots.txt are "open for indexing/linking", which would mean that any page both disallowed by robots.txt and which has a noindex meta tag would still show up, right?
What is the intended behavior if a page is disallowed by robots.txt and still linked by another indexed page? Will it get crawled or just assumed to be okay for indexing/linking? Is there any way to tell Google not to index/link and not to crawl?
If you have a calendar where every month links to the previous and next months, a crawler can get stuck and hammer the server. That's the kind of thing robots.txt is for.
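A minimal robots.txt sketch of that kind of rule (hypothetical path):

    User-agent: *
    Disallow: /calendar/

Crawlers that honor it skip the endless calendar pages but can still crawl the rest of the site.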
>"more what you call guidelines, than actual rules"
They can index without scraping. It is enough that other websites have links to your site. So the Google bot follows the rules in robots.txt to the letter. "no-index" is the way to stay away from Google.
They can't read my no-index if they obey my robots.txt. Do they break the robots.txt to be able to read my no-index or do they assume my "Disallow: /" means I'm fine with them indexing/linking?
Without the noindex part of robots.txt (which google decided to ignore not so long ago) this is not solvable.
Oh, I just added that yesterday as a response to the email. Before that I was actually running Google Analytics but since I get basically 0 clicks it wasn't really useful.
I have a feeling the PDF viewer triggered it, because on mobile it defaults to showing the whole page, which results in tiny text. But that's easily fixed by the user, so I prefer to leave it like that.
Yeah, it's amazing how rapidly and rabidly they show up when the complaint is about one of their paid features, like a Google Cloud (GCE) post about them or a competitor, but nada on the other products. Well, no, it's actually not surprising.
> If Google wants to go back to being a search engine
While I understand the problems with Google scraping content, as a user these snippets help me find what I'm searching for faster. If that's all you're optimizing for, Google is fantastic. There are certainly good arguments to be made for other models, but for search, stealing content helps. I'm not advocating stealing content, I'm just saying that it produces more useful results.
How do you know that the content Google features is the best there is? If we stop clicking on sites and just rely on Google to provide us the content we'll go down a very slippery slope.
I don't really see how this problem is any different from 'how do we know the #1 search result is the best content there is?' If it provides the information you want, then great; otherwise you load #2.
Google lends the weight of its authority to the answers it presents. It's one thing if Infowars says that Obama is planning a coup against Donald Trump, it's another if Google says so.
The first three results lead you to fake Android blogs telling you how you can easily root every Chinese Android device, and specifically the M89 tablet...
The real authoritative result (xda-developers) only appears in the fourth position, below the fold. It will tell you that if you follow the instructions given in the fake blog posts from the first two or three results, you will brick your tablet.
In a similar way the word "cbd" (for cannabidiol) has been hijacked by dubious commercial companies through fake blog posts filling page after page of Google results, telling you how great CBD is for the treatment of every disease on earth... But there is no trace of an actual study in these results. You will have to go with the less popular word "cannabidiol" to start to see some serious articles about it.
Google results can be hijacked and Google does little about it. Maybe because the ads shown in these fake blog posts are from the Google ads network? I don't know...
But Google results have clearly deteriorated these last few years, and the company's authority is no longer what it was.
I know that sort of thing happens sometimes (Google presenting a spurious statement as a categorical answer) but those are bugs. As long as they are very rare, and fixed quickly when they occur, I don’t see them causing much harm.
OK, some people believe anything they read (especially if it confirms their existing biases), but that problem has always existed. I think Google’s occasional snippet fuck-ups are a drop in the ocean compared to the spread of false information through social networks.
There's the modern news-cycle axis, where Google can and should devote full-time engineers.
But the long tail is important too. It's fixed now (yay) but for years you could search for "calories in corn" and Google would confidently present an answer 5x the true value, scraped from a site with profoundly wrong information. As Google moves to present more direct answers and fewer links, this risk increases.
It looks like they have backed off on the direct answers somewhat which is good news.
Very few new blogs and content websites are being set up.
All content is moving into apps and walled gardens. Part of the reason for that is that running a well-researched blog will never pay for your time, so it becomes a hobby thing, and most people are fine using Facebook for that.
> Micro data only serves Google. Not my clients. Not my sites. Just Google.
Well it also serves Google's users, to be clear. Though I should also be clear that I don't think that justifies it, since I think it's bad for the ecosystem in more subtle ways than are expressed in immediate user satisfaction.
That depends on how you define "users". If you define a website creator also as a Google user (by virtue of wanting to be found through Google), then Google is serving part of its users to the detriment of their other users.
And if you view Google instead as a connection broker, i.e. a middleman between publisher and consumer, then Google is destroying its own business by snubbing publishers. Assuming that Google is still making rational, intelligent decisions, it follows that Google no longer sees itself like that.
Did Google ever see itself as prioritizing publishers and consumers equally? I think that’s a false premise and the parent is right; Google’s priority has always been consumer first.
Google (and virtually every other search engine) has always included content with links. What's different now (but not unique to Google, though they are perhaps the most advanced at it) is that it now algorithmically synthesizes content instead of merely aggregating it.
On top of all that, Google's snippets aren't curated and therefore aren't always correct. They can be (and almost certainly are) gamed. Users that don't click through open themselves up to remaining misinformed.
I've found them to be incorrect so often when I clicked through to the actual page or found a better link. I don't trust just the blurb for any answers any more.
> I don't trust just the blurb for any answers any more.
I don't, either.
A site I used to own had a discussion forum on it. It contained a message along the lines of "Real Estate Agent X is a great guy. Real Estate Agent Y is a complete sleazebag."
The blurb that Google displayed for it was "Real Estate Agent X... is a sleazebag." And that was the first result for anyone who searched for that agent's name.
As you can imagine, I received many angry e-mails, phone calls, and legal threats. No, you can't explain to angry people that it's "just" an algorithm that told the world that they're a sleazebag.
I ended up editing the post so that Google would display a different version after its next scrape.
I think there's more to this... Google uses lots of fancy Natural Language Processing stuff to extract that data, and unless the wording was very tortuous, I doubt it could make such a big mistake by chance.
They can get it painfully wrong. I came down with something like optic neuritis a few years ago. It's often one of the first signs of MS in many folks. When I googled something like "MS life expectancy", the blurb said something like "3-7 years" -- with the subtext indicating it's 3-7 years LESS than average, rather than "you're kicking it in 3 years".
I think they’re believable because Google started by providing things that weren’t wrong. If you search for a time zone, Google shows it in your local time; if you search for a currency conversion, Google does that too. It has done all those things for ages, and they were typically correct.
Then the snippets show up, and they are presented in a similarly trustworthy fashion. But the snippets are really just the result of whichever site has the best SEO, and that’s often a really worthless metric these days. The time zone and currency stuff is easy, because it’s math, but opinions aren’t. The thing is, though, that even if Google didn’t have the snippets, those sites that get snippets would still be the top results that we clicked, and we’d still get the wrong information. That would probably be better, because it might be easier to spot obvious bad sources, but I still think there is just a fundamental flaw in how SEO professionals have learned to game the Google bot to bring the world useless information.
I mean, part of it is certainly on google. No one in their right mind wants to comply with Google’s ranking terms, unless you make money from google searches. Which means a lot of useful personal blogs have dropped off the face of the internet, unless you’re really lucky to see them linked on a place like HN.
I wish libraries would band together and make a privacy focused and curated search engine, because librarians are actually kind of good at finding you the correct information.
It sucks. Sometimes the bold text is the exact opposite of the answer to the query I search for. It’s very misleading unless you click through and read the full context.
This is especially true where the answer is time-bound, which happens a lot in technical topics. Many times the snippet is for an earlier version of the language (but still with a high PageRank), or the operating system (especially Android settings), and the most annoying of all: an ancient answer in an undated blog post.
Google is good at dating undated content. They keep track of the first time they've ever seen a bit of text, and assume it was composed then, even if it later gets copied to other sites.
The point is that Google frequently adds another level of incorrectness, that may not be identifiable without checking the source. This is pretty common on Wikipedia, and when people link to things in discussion forums, as well.
And anything Google does is done at vast scale, which makes me, at least, think it might be substantially affecting society.
But that's the responsibility of that website. Of course it's bad if Google lists a site with wrong information as the first hit, but I think it's worse when Google blindly copies that false info and lists it as their own zero-click result. By doing that, Google itself takes responsibility for the information.
Although sometimes the site is actually correct and Google still gets it wrong by copying the info incorrectly or losing some context or qualifiers.
I loved zero-click results back when DuckDuckGo first introduced them, but I'm less enthusiastic about Google's implementation of them.
It's important to note that this is strategically incredibly important for Google because it forms the backbone of their voice AI. The better they get at answering questions directly, the better their voice AI becomes, and that leads to a lot of future products.
AdWords is and always has been the goose that lays the golden eggs, none of Google's other initiatives have ever rivaled that revenue. That's why they put so much effort into bolstering and optimizing their search results pages.
Another reason is the use of add-ons such as: "Google search link fix - Prevents Google and Yandex search pages from modifying search result links when you click them."
I stopped using Google a few years ago, but just in case I keep this (or a similar) add-on in my Firefox.
I have no idea of the popularity of such addons, but they would also impact the tracking that Google does.
It's been this way for ages, although for chrome (iirc) this is managed via hyperlink auditing [1] which allows google to track what you're clicking even though the link appears 'clean'.
The click-through Google redirect also allows them to track things like relevancy of the content and time on site (if you return to the Google SERP by clicking the back button), in case the target site isn't using Google Analytics (unfortunately most sites do).
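For what it's worth, hyperlink auditing is just the HTML ping attribute; a tracked-but-clean-looking link is roughly this (hypothetical URLs):

    <a href="https://example.com/some-result" ping="https://tracker.example/click">Some result</a>

The href stays untouched, and the browser sends a background request to the ping URL when the link is clicked.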
Any search engine is going to want to know what people click on so they can make their product better. For example, I just searched for [test] on DuckDuckGo and when clicking on the first result I see DDG sending a ping back:
https://improving.duckduckgo.com/t/lc?...
which contains which URL I clicked.
(Disclosure: I work for Google, speaking only for myself)
Startpage is an anonymizing proxy for Google Search, not a full search engine. Crucially, it doesn't determine how to rank results. If they decided to try to compete with Google, Bing, Yandex, DDG etc directly by bringing ranking in-house they would have a very hard time serving good results without being able to track which of their links were popular among users.
How safe are all these plugins we install to escape tracking? Are we trying to escape big tech tracking only to hand our information over to extension developers? Looking at network traffic often shows a ton of extensions sending data to some aws server almost perpetually.
Asking because I'm not sure of the answer to this question and lately I've become even warier so I decided to uninstall everything except things I absolutely must have like colorzilla, grammarly and full-page screen capture. For adblocking I use brave and never ever touch firefox, opera or chrome.
There's an extension that appends a share=1 parameter to all Quora links to prevent them from forcing you to sign in in order to view a post. I like it, but I'm trying to minimize my extension footprint and I'd rather write my own script to do the same thing.
The question is, how do you get to be sure that an extension is safe?
> Google has a policy against scraping their results, but their whole business model is predicated off scraping other sites and making money off the content
Yea a couple days ago I was checking the Places API, which they’ve built off user-generated content and scraping Yelp and others. They charge $17 / 1000 calls for certain items and don’t you dare cache anything for too long.
Great way to build a business: get data for free, wall it off and put a hefty price tag on it, then put your best lawyers around the moat for good measure!
I downloaded all the places data for the world while it was still free. In my jurisdiction, the data is considered owned by the place owners rather than Google, so I doubt they'll come after me.
I disagree. There is an implicit contract between website publishers and search engines that it’s ok to do this. The website can set nosnippet in robots if they want to not have the snippet in search results.
You put a resource on an open network and don't use any of the standard, recognized methods to indicate don't index, don't share, (nor lock it away with auth).
It's like putting a sculpture in your front yard and getting upset when a neighborhood tour company points it out on their tour, even worse because yard ornaments don't have standard, accepted ways of saying "don't use".
> You put a resource on an open network and don't use any of the standard, recognized methods to indicate don't index, don't share, (nor lock it away with auth).
This is the kind of argument people used to use as they flagrantly violated your copyright by cloning your article on their own site. "You put it on the Internet, so it's free for everyone to copy."
The law says no such thing, at least not in any jurisdiction that I'm familiar with. Contrary to popular belief in some quarters, normal laws do still apply on the Internet.
If you infringe copyright, it's still infringement even if what you copied was freely available on someone else's site.
And if you state something that is misleading and harmful, it might still be defamation, even if what you stated was just an automatically generated snippet that takes a small part of someone else's site and shows it out of context.
Nah. Take it easy here, there is a long way between indexing and showing the most relevant hit and outright lifting big parts out of the site and using them on their own property:
It is more like if the guide that used to send visitors to your property has set up their own booth on the best spot on the sidewalk next to you and is raking in money because of the useless (often, in the last few years) ads they have plastered all over it.
Even if it is an educational non-profit resource you don't want that, as some of the details get lost when visitors only read the guide's summary instead of taking a closer look for themselves.
And according to people on this thread they will also complain and/or offer suggestions about how you can make it even more useful to them.
I think of it more as if you put a banner with content somewhere in the public, and I take a photo of it, what can I later do with that photo?
And for that, it's a question of copyright. It turns out that in the US, the fact that something is publicly available does not put it in the public domain. The original author still retains copyright unless explicitly stated otherwise.
There is an exception to this though, which is called fair use. And for that, I'd recommend reading this: https://amp.theatlantic.com/amp/article/411058/ Google's book search snippets were deemed fair use.
So the question remains: would website snippets similarly count as fair use? What would the federal courts rule? And when it comes to fair use, that's the only way to know whether it is or not.
It's worth pointing out in this context that the US legal concept of fair use is not universal. In fact, unusually for US IP laws, it's actually much more permissive than most other places. The more usual practice is to enumerate specific situations where copying without the copyright holder's consent is still allowed, instead of defining general tests, which is how fair use works. This has been a controversial point, because it's not clear that the US scheme is sufficient to meet its obligations under international treaties.
In answer to your final question, I'm not sure whether this use of snippets in search engine results has been tested in any US courts yet, but the issue of search engines showing enough content from the sites they link to that users never actually go through to the original site is sufficiently controversial that the EU's recently passed copyright directive includes specific provisions aimed at exactly that sort of situation.
> It is more like if the guide that used to send visitors to your property
Here is where your argument falls apart. The web is a public space - it's not your property or your front yard. It's more akin to going to the town square wearing a fancy hat and getting upset if people look at you and your weirdly shaped headwear.
> The web is a public space - it's not your property or your front yard.
You're wrong here. Just because it's a public space does not mean nobody owns the property. As a simple example, a shopping street is usually a public place. That does not mean that all window displays, doorways and adjacent buildings are automatically a free-for-all.
In fact, only "the tubes" of the web are a public space. The rest is owned property, even if there are no visible fences.
Laws everywhere are pretty much saying your take is wrong. There is no such thing as an implicit contract, and your take on it is plain victim blaming.
It is very surprising to read this on a board where many people write code: if a dev found unlicensed code, they would certainly not think it is public domain.
It's a devil's bargain. If you opt-out of snippets, it simply means somebody else claims the top spot, and you are left with even less traffic (by a significant amount)
> I disagree. There is an implicit contract between website publishers and search engines that it’s ok to do this. The website can set nosnippet in robots if they want to not have the snippet in search results.
Who made this contract? I never signed one. If I came to your place of business and copied your content and provided it somewhere else, I would be infringing your copyright. Do I have to put up signs specifying that at my place of business? Why is this any different? My web content is not the property of someone else and by publishing my information that is in no way an implicit grant of the right to reproduce it.
I wonder how true that assumption really is any more. The quality of traffic Google drives to sites I operate is very low compared to all other major sources, with much less engagement by any metric you like, notably including conversions. The only reliable exception is when we're running marketing campaigns in other places, which often result in spikes in both direct visitors landing on our homepage and search engine visitors arriving at our general landing pages.
There is this conventional wisdom that SEO, and in particular playing by Google's rules to rank highly in its results pages, is the only way you can run a viable commercial site these days. Our experience has been exactly the opposite: our SEO is actually quite effective, in that we do rank very highly for many relevant search terms, but it makes a relatively small contribution to anything that matters. And really, when I write "SEO" here, I'm only talking about general good practices like being fast, having a good information architecture and working well on different devices. We don't change the structure of our pages just because Google's latest blog post says X or Y is now considered a "best practice" or anything like that.
Of course I have no way to know how representative our experience is. YMMV.
Yes you can. There are other ways to market yourself and your website. For instance, the author of “Fearless Negotiation” has appeared in four or five podcasts I follow. The well known pundits in the Apple ecosystem grew an audience organically through word of mouth.
Hoping to stand out in Google results as a business plan is a recipe for failure. You are one algorithm change away from going out of business.
Glad the first-ranked response was this. It's what I came here to say. These days you simply don't need to click as often to get what you need out of a search, and Google's business model doesn't rely on click-throughs to websites, but on the display and click-through of ads.
Searching for "best car engine oil" has certain brands displayed straight on the featured snippet. Who cares about the click if Google found your customer for you and got your message through for free?
In the end, Google should care. If a search for "best car engine oil" got your product featured, that means you won a sale. But assuming the sale happens completely offline, Google lost its opportunity to inform you of the search, and of the successful search->sale conversion.
That means your marketing department can no longer justify investing money in Google SEO, which means less optimization towards Google's crawler, which means less reliable search results, which means less Google searches in the long run.
Increased profits from unknown sources VS decreased profits from known sources. The gained marketing intelligence may come at the cost of the bottom line.
I zero-click search more than I click, for reasons including but not limited to:
1) to get the correct spelling of a word that spell check can't find a suggestion for
2) to avoid going to a site where I might potentially get malware (for example, searching for music lyrics)
3) to avoid having to deal with slow-loading and bloated pages
"• Google earns more ad revenue as users stick around on Google longer"
This one is actually reversed. Google search doesn't net Google any money if people don't actually click a link, since ad revenue for Google search is per click, not per view (per mille).
The incentives for them are actually reversed - increasing the number of clicks to external websites, specifically advertised links, increases their revenue (which is why there are so many advertised links on a search page).
I do a fair amount of grammar and spelling searches. Google often displays tips and examples. And typing "sp500" displays a stock chart right in Google itself. Google has a lot of "instant snippets" like that. Quite convenient. However, near-monopolies do make me nervous about supporting them.
Great point, this was my first thought also. Google has been doing a slow creep of this type of content for the past few years through the featured snippet you mentioned, and other knowledge panel material. They now serve sports, weather, math, translation, flights, etc.
Speaking of scraping, does anyone know where one can get a hold of full text news articles/press releases for nlp research? Most APIs that I have found only offer partial texts.
I know that Aylien has an API for this but it's out of my price range.
If I recall correctly, content publishers (news especially) and some Europeans were very angry about that. I think the consensus was that these businesses don't understand the internet.
I don’t see any way to actually achieve this at scale, let alone any reason to add an opening for more pointless lawsuits. Let’s say they’re liable and you choose to act on incorrect information received for free. Do you really try to take them to court, and on what grounds?
Yes, Google and similar companies should be 100% responsible for anything published on their platforms. No more “safe harbour”. They have chosen to take positions in many issues, that makes them more like newspapers than phone companies.
Positions like what? And no, banning radicals from their platform for violating their terms of service is not a position.
Even if they were responsible, it's still legal to lie. You don't see pseudoscience websites being taken down because they are objectively false either.
It’s OK for the NYT to attempt to “prevent another Trump situation”. They have an editor and that person is legally responsible for what they publish. They don’t even pretend to be non-partisan. But Google takes a position then hides behind “common carrier” status. It’s not reasonable that they can pick and choose. Either they’re the phone company or they’re a publisher. It’s their right to be either of course, but they must choose.
> It’s not reasonable that they can pick and choose. Either they’re the phone company or they’re a publisher. It’s their right to be either of course, but they must choose.
This is 100% wrong, the opposite is true. The law explicitly protects website operators from being liable for content posted by 3rd parties while simultaneously granting them the explicit freedom to curate content that they deem objectionable.
No content on Google is posted there by 3rd parties. Google does select what is displayed and, in the case of snippets, they go beyond their traditional role to promote that content.
The use of the word "post" is my own colloquially imprecise language, the law actually states
> No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.
So content indexed by google absolutely falls under the definition of "provided by another information content provider"
Providing links is indeed within this definition. However, cards go beyond that: selecting one result out of many, promoting it, and possibly altering its meaning by choosing which parts are shown and how they are displayed goes far beyond merely displaying content provided by others.
Of course, there is plenty of room for google attorneys to wiggle, but in the end the objective for them is to 1) give credibility to a source and 2) to get the benefits of being the providers of information.
Common carrier and safe harbor are 100% separate and distinct concepts. Safe harbor is what lets a forum have a theme ("political party X posts only") and still curate at its discretion while bearing no responsibility for illegal posts - and I don't see how one could be against that - and Google is nowhere near that line, whatever "positions" you envision them to have taken.
Google and other tech companies never claimed to be common carriers, and even internet service providers have been cleared of that status - barely anyone is legally required to transmit without discretion (it's pretty much just phone companies). So why make it about Google and Twitter rather than starting with ISPs?
I don’t mean literally alphabetically but according to some objective measures. In the old days it was by incoming links (PageRank). But now it is opaque and many people are finding that it orders by whatever is best for Google, not for the user.
There is not nearly enough room on the front page for everyone who wants to be there. Google has to make subjective decisions about what shows up there; it's impossible to do it any other way.
I don't think that's a bad idea, but the vast majority of google users would not desire this behavior, especially the way google is used today where users try specific terms to relocate content they have looked up before.
Yet there is no spam in the Yellow Pages. It’s very unlikely that if you call Aaron he’ll clone your credit card or install hidden cameras in your house. Also, it’s very likely that he actually is a locksmith, has the accreditation he claims to, is a legitimate business registered at Companies House, fully insured, all the things you expect of a normal business.
Hold on. Google doesn't earn even a penny when you visit their site, find your answer on the search results page, and then leave. That user behavior COSTS Google money, it doesn't earn anything.
If they were trying to monetize you they'd show you an ad that links to your answer and take a profit on the click. Directly giving the user the answer they want is great for the user, but guarantees that Google won't earn any revenue.
So why does Google do it? Simple: because their competitors do. That's the free market for you. Google didn't start that feature, another competitor did; Microsoft made it their primary differentiating feature in fact (remember the "bing and decide" ads?). Google had to adopt the same behavior or lose their customers.
So no, don't blame Google, blame capitalism. This is precisely the kind of feature that you wouldn't get if Google was able to behave as a monopoly.
That is correct and should be considered copyright infringement... I am so tired of the double standard in the US of people vs. corporations... Corporations are considered better people than real people.
But as Google sucks up the consumer surplus, it's going to be harder and harder to make money from internet businesses, and the final result a few years down the road will be toxic.
The internet isn't going to work too well if it's solely reliant on hobbyists.
The funny thing is this used to happen. In the early days, if you asked a simple question you would get the answer in the search results, before they introduced featured snippets. The problem was, because no one was clicking on these useful sites, they were downgraded in the listings below sites that hid the useful info so you had to click through to it.
Perhaps you don't know who Examine.com is, but they are cited by the New York Times, Washington Post, The Guardian, CBC, multiple Wikipedia pages all over the world, and over 12,000 other websites. I assume they did more homework than you.
Some, like Mercola, peddle highly-controversial, near anti-vax content.
But on the other end of the spectrum, Examine.com should be the gold standard. Quality Raters should use it as an example of a site to emulate. Much higher-quality and more informative content than WebMD, IMO.
When I google for astaxanthin (the search suggested in the original post), Mercola ranks higher than Examine. Which is bloody tragic, as Examine gives legit information.
IMHO the mere fact a page links to relevant (to its subject) papers in reputable scientific journals should be considered a positive factor in a page rank computation.
Based on this link I wouldn't trust the critical thinking of the content of this site.
> We also know about Google firing James Damore and their ideological echo chamber.
They are already displaying their own bias, regardless of what they think Google's bias is.
Damore was fired because he violated California labor law. But the post continues with lines like:
> Now, I’m not a conspiracy theorist
...
> But it seems like they decided that there’s no way to algorithmically penalize certain sites — so instead they do it manually, behind the scenes, without telling anyone.
Without any evidence of this whatsoever.
It's one thing to say "Google are not doing a good job of filtering out misinformation/commercialization without penalizing high-quality information from smaller sites/institutions"; it's another to say that this is a conspiracy from the "echo chamber bias" of their employees to suppress the speech of people who don't agree with them.
Google health-related results are straight garbage these days. I noticed the change that dropped Examine in the rankings, as I frequently search for supplements. At the same time, Wikipedia also tanked in my queries, and it's one of the sites I'm almost always going to have a look at if they have a page on the topic. This seems tied into Google's expansion of their built-in snippets. That would be ok if they linked to the source, but it feels like that's true for only 1 in 20 of these boxes.
I’m actually really frustrated with these changes and would like to start using an alternative. I like Startpage, but they use Google search results so that’s not a viable alternative. Guess I’ll have to check out DuckDuckGo’s Bing-powered performance.
So... Google is openly editorialising their results?
They were already doing it with the carousel (google "american inventors") but if they are doing it with what seemingly is the list of organic results this is very, very troubling.
I mean, Google always was, even in the PageRank days. Even if you can perfectly recreate the numbers as to why so and so site is ranked higher in your system, it's still your system choosing to rank so and so site higher.
I see your point, but in this case they are blacklisting domains by hand because of their content, which they don't agree with. And that is very bad. Maybe it was my mistake, thinking that their organic search was holy, which is no longer the case, it seems.
I don't understand this point of view. Google's literal mission since inception was to rank results based on how good Google thought they were. Their purpose is to editorialize results through the order in which they appear. Quality is defined by Google's subjectivity.
Where did the idea of google neutrality come from? Google would be useless if they didn't blacklist what they perceive to be spam.
> Where did the idea of google neutrality come from?
From Google. They've stated time and time again that it's a magic algorithm and they don't hand-pick winners and losers. And it's a good thing, too, otherwise you're just inviting corruption. Top spots are literally worth millions, and if there's a small army of people deciding who ranks where, they are an obvious target for bribes.
This doesn't look that hand-picked, though; it looks more like somebody didn't check what would happen if they rolled out some algo change and targeted way too broadly.
they pick losers by identifying losers, or who SHOULD be a loser, and then modifying the algorithm to derank them and their tactics. believing they can target spam, without first identifying spam, doesn't make any sense.
in this case, some better sites resemble spam enough that they were also hit. a basic false positive, collateral damage.
That's different though. The algorithm applies to all sites, and if apple.com does the same thing a spam-site does, they will be punished by the algorithm as well. Hand picking is very different, in that similar things aren't treated similarly.
> in this case, some better sites resemble spam enough that they were also hit. a basic false positive, collateral damage.
I believe that as well, though not necessarily because of "spam", but because of the topic. I was just trying to explain how people might think that Google doesn't manually curate their results.
This is an extremely circular conversation. Google writes the algorithm that ranks pages.
They absolutely know that if you search Disney, and Disney isn't the first result, they wrote it incorrectly. They also know their product has less value if it returns spam, which is why they fight SEO artists.
They do try to distance themselves from "choosing" the top result for "best construction store" or "best news site" by shouting the word "algorithm" to distract the conversation. That doesn't mean they don't carefully craft the algorithm to return a relevant top result.
> That doesn't mean they don't carefully craft the algorithm to return a relevant top result.
I found https://medium.com/@mikewacker/googles-manual-interventions-... an interesting read on that topic. It's not just a crafted algorithm, but there are different algorithms and employees choose different algorithms for some queries if they/journalists dislike what the original algorithm considered most relevant.
I mean, that's how you'd train the main algorithm, right?
I'd fully expect to see these interventions fed back into the algorithm so Google can better predict "this search term is likely to be targeted by partisan or otherwise suspiciously motivated actors".
See, I don't think there's much difference between writing a deterministic mathematical algorithm to have X site on top, hand-curating a list to have X site on top or writing a magic spell that consults 4 neural nets, a space dragon from Jupiter and the Canadian Prime Minister for weightings that results in X site on top.
That's all implementation details, at the end of the day site X is on top and site Y is not, and Google decided that.
And as mentioned in the sibling thread, that's the value of Google Search. If you disagree that X should be on top, then find an alternative search engine that has some different ranking algorithm, but there's no such thing as an objective search engine.
My way of looking at it has always been there is no such thing as organic search results (other than that the organisation in question did not have a hand in choosing to put it there). The aim is, presumably, to return quality sites, which always involves a subjective judgement - there is no intrinsic or natural ordering of a set of sites (other people could choose to rank them differently). Whether it's writing an algorithm that results in those sites being at the top (which _must_ be rewritten if returns certain sites otherwise it will be gamed), a more direct choice, or a combination of the two, there doesn't seem to be much difference. The algorithm serves to scale Google's subjective opinion, and they will always boot out sites they don't think are of sufficient quality - it enables them to return sites that they may not have hand-judged, but I've always assumed it is trying to approximate what would be returned if they _did_ check every site.
> They were already doing it with the carousel (google "american inventors")
Maybe.
There's a chance you're seeing an unintentional side-effect of inclusion-oriented school projects getting hordes of people to Google stuff like "African American inventors".
That potentially shows the same phenomenon - kids given a Black History Month assignment to write a paper on a black inventor - and it's fairly clear there's some sort of automated threshold at work.
As evidence, I really doubt this difference is editorial-based:
Many similar websites have suffered from a so-called "Medic" update. Some of them recovered (through fixing their website), some did not. The SEO community is full of such stories, for more than a year I think.
FWIW, Dan (the author) has an outstanding reputation for professionalism and integrity in the marketing world. If he says he did something for ethical reasons, to those who know him, he's earned the benefit to be believed. (If you don't know him, you'd be forgiven for being suspicious)
And credit should be given to him for educating everyone on this exploit.
It'd be very easy to make a proof of concept of this exploit which didn't breach copyright or record people's personal information, and then to publicise the problem immediately; instead he chose to operate on real sites, collect real personal data, and then forget about it for 5 years. It's this general lax attitude that gives everyone working in the SEO sector -- and by extension the tech sector as a whole -- a bad reputation. The whole experiment doesn't feel like it was conducted in good faith or with any consideration for the ethics beyond 'hey, this is cool'. Grow up!
While it is clear that you did not have any bad intentions, you should never have published it on the web. Based on your earlier comment "It worked a little too well" it becomes clear that multiple users were tricked by your site and that you possibly even intercepted submitted forms ("I gasped when I realised I can actually capture all form submissions and send them to my own email.").
You misled people and breached their privacy. This is as simple as it gets, even if it was for an experiment (though leaving the site online in some other form still raises a lot of question marks..).
My advice for you is to perform future experiments locally, not on the web, and to make sure people participating in your experiment are aware.
The point of the experiment was the social engineering aspect. The fact that it would work technologically was obvious. The fact that it would work practically was what he set out to prove.